Enabling IPython Cluster with Simple Azure

This tutorial shows how the automated functions of Simple Azure set up an IPython Cluster on Windows Azure in a few steps.

Deploy virtual machines for clusters

We use Azure Data Science Core (ADSC) here to deploy virtual machines with IPython preinstalled on Windows Azure.

In [ ]:
from simpleazure.simpleazure import SimpleAzure as saz
In [ ]:
azure = saz()

Create 3 nodes for IPython Cluster with ADSC

A previous tutorial explained how to deploy ADSC with Simple Azure. We use the same steps here to build an IPython Cluster.
If you want more or fewer nodes, simply change the number passed to the create_cluster() function.

In [ ]:
adsc = azure.get_registered_image(name="Azure-Data-Science-Core")
azure.set_location("West Europe")

Import IPython Plugin

Simple Azure loads IPython Cluster support through its plugin. The plugin/ directory contains plugins for external software such as IPython.

In [ ]:
from simpleazure.plugin import ipython
In [ ]:
ipy = ipython.IPython()

Configure ssh settings

IPython Cluster uses SSH tunneling for communication between the master and the engine node(s), so the SSH settings must be configured first.

In [ ]:

Set a master and engines

The master and engine node(s) need to be defined.
We get this information from the azure object that created the cluster.

In [ ]:
from simpleazure import config
master = config.get_azure_domain(azure.results['master'])
engines = [ config.get_azure_domain(x) for x in azure.results.keys()]
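
The role split can also be sketched without Simple Azure. The example below is only an illustration under stated assumptions: the `deployments` mapping, its key names, and the helpers `to_domain()` and `split_roles()` are hypothetical, and it assumes each node is reachable as `<name>.cloudapp.net`. The real layout of `azure.results` and the behavior of `config.get_azure_domain()` may differ.

```python
# Assumption: classic Azure cloud services are reachable under this suffix.
AZURE_DOMAIN_SUFFIX = ".cloudapp.net"


def to_domain(name):
    """Hypothetical stand-in for a name-to-hostname helper."""
    return name + AZURE_DOMAIN_SUFFIX


def split_roles(deployments, master_key="master"):
    """Return (master_host, engine_hosts), keeping the master out of the engines."""
    master = to_domain(deployments[master_key])
    engines = [
        to_domain(name)
        for key, name in sorted(deployments.items())
        if key != master_key
    ]
    return master, engines


# Hypothetical mapping of role label to deployment name.
deployments = {
    "master": "myipc-master",
    "engine-1": "myipc-e1",
    "engine-2": "myipc-e2",
}
master, engines = split_roles(deployments)
print(master)
print(engines)
```

Filtering on the master key keeps the master host out of the engine list, which is worth checking whenever the engines are built from every entry of a mapping.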

Then we assign the names to the IPython plugin.

In [ ]:

Establish SSH

Now we are ready to initialize the IPython Cluster through SSH. The plugin provides functions for this task.

In [ ]:
ipy.init_ssh()

init_ssh() above creates the paramiko objects used to establish SSH.
connect_nodes() then makes the actual connections to the nodes.
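
To make the two stages concrete, here is a minimal sketch of the same idea written directly against paramiko: first create a client object, then connect it to each node. This is not the plugin's implementation. The function names `make_ssh_client()` and `connect_all()`, the user name, and the key path are assumptions chosen for illustration.

```python
def make_ssh_client():
    """Return an unconnected paramiko client (requires the paramiko package)."""
    import paramiko  # imported lazily so this module loads without paramiko

    client = paramiko.SSHClient()
    # Accept unknown host keys automatically. This is convenient for freshly
    # created VMs but weaker than verifying keys against known_hosts.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    return client


def connect_all(hosts, username, key_filename, client_factory=make_ssh_client):
    """Open one SSH connection per host and return a {host: client} dict."""
    clients = {}
    for host in hosts:
        client = client_factory()
        client.connect(host, username=username, key_filename=key_filename)
        clients[host] = client
    return clients
```

A call such as `connect_all([master] + engines, "azureuser", "/home/azureuser/.ssh/id_rsa")` would then return one connected client per node; the user name and key path here are placeholders. The `client_factory` parameter exists only so the connection loop can be exercised without a network.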

In [ ]:
ipy.connect_nodes()

Create IPython profile

We will use a new profile for this cluster.
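
For reference, a parallel-enabled profile is normally created with the `ipython profile create` command. The sketch below only builds that command line; the helper name `profile_create_command()` and the profile name `azure_cluster` are assumptions, and how the plugin runs the command on the master is not shown.

```python
def profile_create_command(profile):
    """Build the argv that creates a parallel-enabled IPython profile."""
    return ["ipython", "profile", "create", "--parallel", "--profile=" + profile]


# Hypothetical profile name used only for this example.
print(" ".join(profile_create_command("azure_cluster")))
```

The `--parallel` flag asks IPython to also write the configuration files used by ipcontroller and ipengine into the new profile directory.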

In [ ]:

IPController on a master

Once you have created the profile, you can run ipcontroller on the master node.
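
As a rough sketch of what this step involves, the snippet below builds the ipcontroller command for a profile and the path where the controller writes its engine connection file. The helper names and the profile name are assumptions, and the path assumes the default IPython directory `~/.ipython`, which may differ between IPython versions and installations.

```python
import posixpath


def ipcontroller_command(profile):
    """Build the argv that starts the controller for a given profile."""
    return ["ipcontroller", "--profile=" + profile]


def engine_connection_file(profile, ipython_dir="~/.ipython"):
    """Path of the connection file the controller writes for engines.

    Assumes the default IPython directory layout on a Linux master node.
    """
    return posixpath.join(
        ipython_dir, "profile_" + profile, "security", "ipcontroller-engine.json"
    )


print(" ".join(ipcontroller_command("azure_cluster")))
print(engine_connection_file("azure_cluster"))
```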

In [ ]:

Configure engine node(s)

The engine nodes need to know which node is the master.
The ipcontroller-engine.json file on the master node contains this connection information.
We will copy the file to each engine node.
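
The sketch below illustrates the two halves of this step in isolation: reading a connection file as JSON, and building an `scp` command that would copy it to an engine node. It is not the plugin's code. The helper names, the user name, the host name, and the sample file content (including its key names) are assumptions; a real connection file holds more fields.

```python
import json
import os
import tempfile


def read_connection_info(path):
    """Load a controller connection file as a dict."""
    with open(path) as handle:
        return json.load(handle)


def scp_command(source, user, host, destination):
    """Build the argv that copies `source` to `destination` on `host`."""
    return ["scp", source, "{0}@{1}:{2}".format(user, host, destination)]


# Illustrative stand-in for the real file; key names are assumptions.
sample = {"location": "10.0.0.4", "ssh": ""}
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "ipcontroller-engine.json")
with open(sample_path, "w") as handle:
    json.dump(sample, handle)

info = read_connection_info(sample_path)
print(info["location"])
print(" ".join(scp_command(sample_path, "azureuser", "myipc-e1.cloudapp.net", "ipcontroller-engine.json")))
```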

In [ ]:
ipy.copy_pkey_to_nodes() # <- Temporary function to distribute id_rsa private key to node(s)

IPEngine on engine(s)

We are almost done. The last step is to run ipengine on each engine node so that the engines can communicate with the master.
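
For orientation, ipengine can be pointed at a copied connection file with `--file` and, when tunneling is needed, at an SSH server with `--ssh`. The sketch below only assembles such a command line; the helper name `ipengine_command()`, the file path, and the host name are assumptions, and the options the plugin actually passes are not known from this tutorial.

```python
def ipengine_command(connection_file, ssh_server=None):
    """Build the argv that starts one engine from a connection file."""
    argv = ["ipengine", "--file=" + connection_file]
    if ssh_server:
        # Tunnel the engine's connections through this SSH server.
        argv.append("--ssh=" + ssh_server)
    return argv


# Hypothetical path and host used only for this example.
print(" ".join(ipengine_command("ipcontroller-engine.json")))
print(" ".join(ipengine_command("ipcontroller-engine.json", "azureuser@myipc-master.cloudapp.net")))
```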

In [ ]:

The cluster is now ready. You can access the master node and use the IPython.parallel.Client class.
Note: these steps can be replaced with a single wrapper function, apply_ipcluster().
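
A quick way to check the running cluster is to ask every engine for its hostname. The helper below is a sketch that takes an already constructed client object; the name `engine_hostnames()` is an assumption, and the function relies only on the client supporting `client[:]` and the resulting view supporting `apply_sync()`, as IPython.parallel clients do.

```python
import socket


def engine_hostnames(client):
    """Return the hostname reported by each engine registered with `client`."""
    view = client[:]  # a view over all registered engines
    # apply_sync runs the function on every engine and waits for the results.
    return view.apply_sync(socket.gethostname)
```

On the master node this could be used as `engine_hostnames(Client(profile="azure_cluster"))` after `from IPython.parallel import Client`, where the profile name is a placeholder. With three nodes running engines, the returned list should contain one hostname per engine.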

In [ ]:
ipy.apply_ipcluster()