Set up Anaconda + IPython + Tensorflow + Julia on a Google Compute Engine VM

Recently, I had to run heavy experiments that my MacBook Pro just wasn’t up to. Since Google offers a generous credit for signing up to its Cloud Platform, plus a sustained-use discount, I decided to give its cloud services a go. Initially, I tried out the DataLab[1] app, but I quickly found myself wanting more fine-grained control.

It was surprisingly easy to set up a Google Compute Engine Virtual Machine (VM) with the required hardware and software. In my case, I needed a moderately sized server (16-24 cores[2]) with IPython/Jupyter, Tensorflow and Julia. This guide shows you how to set up a similar VM step by step, in 30 minutes or less.

After you sign up to Google Cloud Platform, there are three basic steps to complete:

  1. Create a Linux-based VM instance with the required hardware specs.
  2. Install Software: Anaconda Python, Tensorflow and Julia.
  3. Set up Jupyter (IPython) so that you can do your machine learning/data science magic remotely via a browser.

Prerequisites: I’m going to assume you know your way around a Linux terminal.

1. Create a Linux VM Instance

Follow the Quickstart guide to create a new VM instance, but note the following:

  • Machine type: a micro instance isn’t going to cut it for compute-intensive tasks. I created a 16-vCPU machine; select what works for you. Note: if you need a machine with more than 24 cores, you’ll need to request a quota increase.
  • Boot disk: I’m more familiar with Ubuntu, so that’s what I picked (14.04 LTS). The setup instructions below assume you’re using Ubuntu.
  • Firewall: Allow HTTPS traffic.
  • Take note of the zone and instance name; you’ll need them in the final step. In this example, the zone is us-central1-f and the name is awesomeness.
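If you prefer the command line to the console, the same choices can be expressed as a `gcloud compute instances create` call. The sketch below only assembles the argument list; `build_create_command` is a hypothetical helper, n1-standard-16 is one 16-vCPU machine type, and the Ubuntu image family name is an assumption (14.04 images have since been retired, so check `gcloud compute images list` for what is currently available).

```python
import shlex


def build_create_command(name, zone, machine_type="n1-standard-16",
                         image_family="ubuntu-1404-lts",   # assumed; may be retired
                         image_project="ubuntu-os-cloud"):
    """Assemble a `gcloud compute instances create` call as an argv list."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
        # Assumed to match the network tag the console's "Allow HTTPS traffic" box applies
        "--tags=https-server",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the example instance used in this guide
    print(shlex.join(build_create_command("awesomeness", "us-central1-f")))
```

Passing the list to `subprocess.run(..., check=True)` instead of printing it would create the instance directly.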

The SSH Browser-based Terminal

Google’s Compute Engine has a sweet browser-based SSH terminal you can use to connect to your machine. We’ll be using it for additional setup below.


Optional: Get some extra hard disk space

The VM that we instantiated comes with a 10GB SSD drive. It’s fast, but I needed more space. Follow these instructions to add more disk space.
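Once the extra disk is attached and mounted, it is worth checking how much space each filesystem actually reports. A minimal sketch using only Python’s standard library; `disk_report` is a hypothetical helper, and which mount point you pass in (for example, wherever you mounted the new disk) is up to you.

```python
import shutil


def disk_report(path="/"):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(n / gib, 1) for n in (usage.total, usage.used, usage.free))


if __name__ == "__main__":
    total, used, free = disk_report("/")
    print(f"/: total {total} GiB, used {used} GiB, free {free} GiB")
```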

2. Install Required Software

We’ll install three major software packages: Anaconda Python, Google Tensorflow and Julia. For many data scientists, Anaconda Python alone should suffice, but I wanted to play with deep learning models and needed to run a tensor factorizer I had written in Julia.

Bring up your SSH terminal. Let’s create a downloads directory to keep things organized:

mkdir downloads
cd downloads

Install Anaconda Python for Scientific Computing

There are several Python distributions around, but Anaconda is my favorite. It bundles popular scientific computing libraries into a single, coherent, easy-to-install package.

Note: the following installs Python 3. If you want Python 2.x, replace Anaconda3-X.X.X… with Anaconda2-X.X.X…

In your SSH terminal, enter:

wget http://repo.continuum.io/archive/Anaconda3-4.0.0-Linux-x86_64.sh
bash Anaconda3-4.0.0-Linux-x86_64.sh

and follow the on-screen instructions. The defaults usually work fine, but answer yes to the last question about prepending the install location to PATH:

Do you wish the installer to prepend the 
Anaconda3 install location to PATH 
in your /home/haroldsoh/.bashrc ? 
[yes|no][no] >>> yes

To make use of Anaconda right away, source your bashrc:

source ~/.bashrc
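To check that sourcing your bashrc took effect, you can ask Python where things resolve. A minimal sketch; the helper names are hypothetical, and the "anaconda" substring test assumes you kept the installer’s default location (such as ~/anaconda3). The conda-meta check relies on conda installations keeping that directory under their prefix.

```python
import os
import shutil
import sys


def anaconda_on_path(path_env=None):
    """Return (first `python` on PATH, whether that path mentions anaconda)."""
    found = shutil.which("python", path=path_env)  # None if nothing is found
    return found, found is not None and "anaconda" in found.lower()


def is_conda_python():
    """True if the running interpreter belongs to a conda installation."""
    # conda prefixes (including the Anaconda base install) contain conda-meta/
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))


if __name__ == "__main__":
    print("python on PATH:", anaconda_on_path())
    print("running under conda:", is_conda_python())
```

If the first entry still points at /usr/bin/python, the PATH change has not been picked up in the current shell.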

Install Tensorflow for Deep Learning

Tensorflow is Google’s open-source deep learning/machine intelligence library. I’ve been using it for about a month and, a few issues aside (some components, e.g., dynamic RNN cells, are still under active development), it’s a pleasure to be able to develop state-of-the-art deep models in relatively few lines of code. We’re going to install the conda package contributed by Jonathan Helmus:

conda install -c jjhelmus tensorflow=0.8.0rc0

If you prefer, you can also install via pip: follow these instructions.
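Whichever route you take, a quick check from the Anaconda Python confirms the package is importable. A minimal sketch; `module_version` is a hypothetical helper that returns a package’s `__version__` if it can be imported in the current environment, and None if it cannot be found.

```python
import importlib
import importlib.util


def module_version(name):
    """Return the top-level module's __version__, or None if it isn't installed."""
    if importlib.util.find_spec(name) is None:
        return None  # not installed in this environment
    module = importlib.import_module(name)
    return getattr(module, "__version__", "unknown")
```

Calling `module_version("tensorflow")` after the conda install above should report something like a 0.8 release candidate; None means the package went into a different Python than the one you are running.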

Install Julia for (Fast) Technical Computing

Julia is a fantastic language that I’ve written about before. With its MATLAB-like syntax, Julia is easy to pick up and work in, and the real kicker is that Julia code is fast: often much faster than plain Python or MATLAB, and close to C/C++ speed in far fewer lines of code. To install:

sudo add-apt-repository ppa:staticfloat/juliareleases
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
sudo apt-get install julia
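To confirm the install, you can ask the binary for its version. A minimal sketch; `julia_version` is a hypothetical helper that shells out to `julia --version` and returns None when no julia executable is on PATH.

```python
import shutil
import subprocess


def julia_version(executable="julia"):
    """Return the output of `<executable> --version`, or None if it isn't on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```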

If you want to use Julia via the notebook interface, install the IJulia package.

julia -e 'Pkg.add("IJulia")'
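Once IJulia is installed, a Julia kernel should be registered with Jupyter alongside the Python one. A minimal sketch that lists the registered kernels (the same information `jupyter kernelspec list` prints), assuming the `jupyter_client` package that comes with Anaconda’s Jupyter; `installed_kernels` is a hypothetical helper name.

```python
def installed_kernels():
    """Map kernel name -> kernel directory for every kernel Jupyter can find."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}  # jupyter_client isn't available in this interpreter
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    for name, path in sorted(installed_kernels().items()):
        print(f"{name:20s} {path}")
```

IJulia kernels are typically named after the Julia version (starting with "julia"); if none shows up, re-run the IJulia install.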

3. Set up Jupyter (IPython)

In our final step, we’ll need to set up the Jupyter server and connect to it. The following instructions come mainly from here, with some tweaks.

Set up the Server side (on the VM)

Open an SSH session to your VM. Check whether you already have a Jupyter configuration file:

ls ~/.jupyter/jupyter_notebook_config.py

If it doesn’t exist, create one:

jupyter notebook --generate-config

We’re going to add a few lines to your Jupyter configuration file. It’s plain text, so you can edit it with your favorite editor (e.g., vim or emacs):

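c = get_config()
# Listen on all interfaces: the SOCKS tunnel reaches the server via the VM's hostname, not localhost
c.NotebookApp.ip = '*'
# The VM has no display, so don't try to open a browser on it
c.NotebookApp.open_browser = False
# Fixed port that the client browser will connect to
c.NotebookApp.port = 8123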
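If you would rather script this edit than open an editor, here is a minimal sketch. It appends each of the four settings only if an identical line is not already in the file, so running it twice is harmless. `add_settings` is a hypothetical helper, not part of Jupyter.

```python
from pathlib import Path

# The four settings from the configuration snippet, one per line.
SETTINGS = [
    "c = get_config()",
    "c.NotebookApp.ip = '*'",
    "c.NotebookApp.open_browser = False",
    "c.NotebookApp.port = 8123",
]


def add_settings(config_path, settings=SETTINGS):
    """Append any setting line not already present; return the lines added."""
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [s for s in settings if s not in present]
    if missing:
        # Make sure the first appended line starts on a fresh line.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return missing
```

On the VM you would call `add_settings("~/.jupyter/jupyter_notebook_config.py")`.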

My configuration file looks like this:

[Screenshot: the edited jupyter_notebook_config.py]

Once that’s done, you have two options for starting the server: nohup or screen. Both ensure your server doesn’t die when you log out. Using nohup is slightly easier, but I prefer screen, which I always install on any remote system I use. Choose either Option A or B below, then move on to setting up your client.

Option A: Using nohup

This is the easier option. Create a notebooks directory in your home directory and start the Jupyter server there:

cd ~/
mkdir notebooks
cd notebooks
nohup jupyter notebook > ~/notebook.log &
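For reference, roughly the same effect as `nohup ... &` can be had from Python. This sketch starts the command in a new session (`start_new_session=True`, POSIX only), so it has no controlling terminal and should survive logout; stdout and stderr are appended to a log file. `launch_detached` is a hypothetical helper, not a standard function.

```python
import os
import subprocess


def launch_detached(cmd, log_path, cwd=None):
    """Start cmd in a new session, appending its stdout and stderr to log_path."""
    log_path = os.path.expanduser(log_path)
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            cmd,
            cwd=cwd,
            stdin=subprocess.DEVNULL,   # no terminal input, as with nohup
            stdout=log,
            stderr=subprocess.STDOUT,   # Jupyter's log messages go to stderr
            start_new_session=True,     # setsid(): detach from the terminal
        )
```

Usage on the VM: `launch_detached(["jupyter", "notebook"], "~/notebook.log", cwd=os.path.expanduser("~/notebooks"))`.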

Option B: Using Screen

This is the more complicated option, but you’ll learn how to use screen, which I’ve found to be tremendously useful. Install screen:

sudo apt-get install screen

and start a screen session with the name jupyter:

screen -S jupyter

The -S option names our session (else, screen will assign a numeric ID). I’ve chosen “jupyter” but the name can be anything you want.

Create a notebooks directory and start the jupyter notebook server:

cd ~/
mkdir notebooks
cd notebooks
jupyter notebook

Press CTRL-A, then D, to detach from the screen session and return to the main command line. If you want to reattach to this session later, type:

screen -r jupyter

You can now close your SSH session if you like and Jupyter will keep running.
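Whichever option you chose, you can confirm from the VM that something is listening on the port set in the configuration file. A minimal sketch; `port_is_open` is a hypothetical helper, and 8123 is the port configured earlier.

```python
import socket


def port_is_open(host="127.0.0.1", port=8123, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

`port_is_open()` returning False usually means the server failed to start; check ~/notebook.log or reattach to the screen session.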

Set up the Client Side (on your laptop/desktop)

Now that the server side is up and running, we need to set up an SSH tunnel so that you can access your notebooks securely.

For this, you’ll need to install the Google Cloud SDK on your local machine. Come back after it’s installed.

Now, authenticate yourself:

gcloud init

and open an SSH tunnel from your machine to the server:

gcloud compute ssh  --zone=<host-zone> \
  --ssh-flag="-D" --ssh-flag="1080" --ssh-flag="-N" --ssh-flag="-n" <host-name>

Replace <host-zone> and <host-name> with the zone and name of your VM (which you noted in the first step). You’ll also find this info on your VM Instances page.
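The tunnel command can also be built from Python, which keeps each ssh option in its own `--ssh-flag` and sidesteps shell quoting. A minimal sketch; `tunnel_command` is a hypothetical helper, and awesomeness/us-central1-f are the example values from step 1.

```python
import shlex


def tunnel_command(host_name, zone, socks_port=1080):
    """Build the gcloud argv that opens a SOCKS proxy on localhost:socks_port."""
    ssh_flags = [
        "-D", str(socks_port),  # dynamic (SOCKS) port forwarding
        "-N",                   # don't run a remote command
        "-n",                   # read stdin from /dev/null
    ]
    cmd = ["gcloud", "compute", "ssh", f"--zone={zone}"]
    # One --ssh-flag per token, matching the shell command
    cmd += [f"--ssh-flag={flag}" for flag in ssh_flags]
    cmd.append(host_name)
    return cmd


if __name__ == "__main__":
    print(shlex.join(tunnel_command("awesomeness", "us-central1-f")))
```

Passing the list to `subprocess.run(..., check=True)` opens the tunnel; the call blocks for as long as the tunnel is up, so leave that terminal running.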

Finally, start up your favorite browser with the right configuration:

<browser executable path> \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/

Replace <browser executable path> with the full path to your browser’s executable. See here for some common executable paths on different operating systems. Optional: write a simple bash script called gcenotebook.sh so you don’t have to type that whole long string each time you want to launch the browser. My script looks like this:

#!/bin/bash

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="socks5://localhost:1080" \
 --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
 --user-data-dir=/tmp/

which I made executable the usual way:

chmod +x gcenotebook.sh
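As an alternative to the bash script, here is a Python sketch of the same launcher. Passing an argument list avoids shell-specific quoting and line continuations (a trailing backslash continues a line in Unix shells, not in the Windows command prompt). The per-platform Chrome paths are assumptions based on default install locations, the dedicated profile directory (instead of /tmp/ itself) is an assumption, and `browser_command` is a hypothetical helper.

```python
import os
import sys
import tempfile

# Assumed default Chrome locations; adjust these for your own install.
CHROME_PATHS = {
    "darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "win32": r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    "linux": "google-chrome",
}


def browser_command(host_name, port=8123, socks_port=1080,
                    profile_dir=os.path.join(tempfile.gettempdir(), "gce-notebook-profile"),
                    platform=sys.platform):
    """Build argv for a Chrome instance that routes its traffic through the tunnel."""
    return [
        CHROME_PATHS[platform],
        f"--proxy-server=socks5://localhost:{socks_port}",
        "--host-resolver-rules=MAP * 0.0.0.0 , EXCLUDE localhost",
        f"--user-data-dir={profile_dir}",  # separate profile starts a new Chrome process
        f"http://{host_name}:{port}",      # open the notebook page directly
    ]
```

`subprocess.Popen(browser_command("awesomeness"))` would then launch the browser straight at the notebook page.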

Finally, using the browser that you just launched, head to your server’s main notebook page:

http://<host-name>:8123

and if everything went according to plan, you should see something like this:

[Screenshot: the Jupyter notebook dashboard]

With both Python and Julia installed, you can start a notebook using either one as its kernel and get to work:

[Screenshot: example notebook with a sine plot]

That’s it! Congratulations!

You now have a 16-core machine learning/data science compute server that you didn’t have 30 minutes ago. Have fun!


Footnotes

  1. If you’re exploring Python/Tensorflow, Datalab is a quick way to get up and running.
  2. I couldn’t find a machine with a GPU, but I hear Google is working on this.

16 thoughts on “Set up Anaconda + IPython + Tensorflow + Julia on a Google Compute Engine VM”

  1. Hi! Thanks for your tutorial. Unfortunately, I have problems setting up my SSH tunnel. After entering the “gcloud compute ssh…” command, I always get “unknown option: -D 1080” on my screen.

    Any idea why this ssh-flag is not working?

      1. I had the same issue and got it working by splitting the parameters (maybe only the second one was being used):
        --ssh-flag="-D" --ssh-flag="1080"

      2. FYI, I think you need to split that ssh flag into two, i.e. --ssh-flag="-D" --ssh-flag="1080" instead of --ssh-flag="-D 1080". I also had to include the instance name and port to connect from my laptop to the running instance on GCE, e.g. PATHTOCHROME/chrome.exe "http://[GCEInstance]:8123" --proxy-server="socks5://localhost:1080", but maybe that’s just down to Windows.

  2. I followed your instructions and everything worked except the last command. When I enter http://:8123 (I have entered the proper host-name), it says no internet connection. When Chrome starts up, it says “Your preferences can not be read. Some features may be unavailable and changes to preferences won’t be saved.”

  3. When I launch Chrome, it says “Your preferences can not be read. Some features may be unavailable and changes to preferences won’t be saved.” And then, when I try to connect using http://:8123, it says no internet connection.

  4. Do you keep your instance running or do you turn it off when you’re not doing work? A “n1-standard-16” machine on GCE costs ~$0.6/hr, that is $432 for a typical month if kept on all the time.

  5. Hi Harold,
    This was brilliant.. Followed through and got everything working perfectly. Thank you so much.

    I have one question:
    Is there a way to make the cloud instance start jupyter at startup automatically? It would remove one more step from the workflow that way.. (and would make it more convenient to start and stop instances as needed.)

    Best,
    Can

  6. Hi, thank you for sharing experiences.

    My VM works exactly as described on this page. But where is my lovely Julia? It doesn’t show up in IPython.

    Thank you in advance from Korea.

    1. Hi Jang, “kernel” is just another word for the computational engine. Python and Julia are both kernels. To quote the jupyter website: “Kernels are processes that run interactive code in a particular programming language and return output to the user. Kernels also respond to tab completion and introspection requests.” You can select the kernel under New->Notebooks.

  7. My platform is Windows 10.
    I tried the following command. It does not work. Any idea?

    C:\Program Files (x86)\Google\Chrome\Application\chrome.exe \
    --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/

  8. Thank you, great tutorial, works fine.
    To remove one more step, you could specify the URL to open directly in your script:

    #!/bin/bash

    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/ http://:8123
