Notebooks are where data scientists process, analyse, and visualise data in an iterative, collaborative environment. They typically run environments for languages like Python, R, and Scala. For years, data science notebooks have served academics and research scientists as a scratchpad for writing code, refining algorithms, and sharing and proving their work. Today, it's a workflow that lends itself well to web developers experimenting with data sets in Node.js.
To that end, pixiedust_node is an add-on for Jupyter notebooks that allows Node.js/JavaScript to run inside notebook cells. Not only can web developers use the same workflow for collaborating in Node.js, but they can also use the same tools to work with existing data scientists coding in Python.
pixiedust_node is built on the popular PixieDust helper library. Let’s get started.
Note: Run one cell at a time or unexpected results might be observed.
Install the pixiedust and pixiedust_node packages using pip, the Python package manager.
# install or upgrade the packages
# restart the kernel to pick up the latest version
!pip install pixiedust --upgrade
!pip install pixiedust_node --upgrade
Requirement already up-to-date: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages Requirement not upgraded as not directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: geojson in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust) Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust) Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust) Requirement already up-to-date: pixiedust_node in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages Requirement not upgraded as not directly required: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node) Requirement not upgraded as not directly required: pandas in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node) Requirement not upgraded as not directly required: ipython in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node) Requirement not upgraded as not directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: geojson in 
/opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node) Requirement not upgraded as not directly required: python-dateutil in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node) Requirement not upgraded as not directly required: pytz>=2011k in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node) Requirement not upgraded as not directly required: numpy>=1.9.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node) Requirement not upgraded as not directly required: setuptools>=18.5 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: decorator in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: pickleshare in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: simplegeneric>0.8 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: traitlets>=4.2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: 
prompt_toolkit<2.0.0,>=1.0.4 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: pygments in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: pexpect in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: backports.shutil_get_terminal_size in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: pathlib2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node) Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node) Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node) Requirement not upgraded as not directly required: ipython_genutils in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from traitlets>=4.2->ipython->pixiedust_node) Requirement not upgraded as not directly required: enum34 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from traitlets>=4.2->ipython->pixiedust_node) Requirement not upgraded as not directly required: wcwidth in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from prompt_toolkit<2.0.0,>=1.0.4->ipython->pixiedust_node) Requirement not upgraded as not directly required: scandir in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pathlib2->ipython->pixiedust_node)
Now we can import pixiedust_node into our notebook:
import pixiedust_node
And then we can write JavaScript code in cells whose first line is %%node:
%%node
// get the current date
var date = new Date();
It’s that easy! We can have Python and Node.js in the same notebook. Cells are Python by default, but simply starting a cell with %%node indicates that the next lines will be JavaScript.
We can use the html function to render HTML code in a cell:
%%node
var str = '<h2>Quote</h2><blockquote cite="https://www.quora.com/Albert-Einstein-reportedly-said-The-true-sign-of-intelligence-is-not-knowledge-but-imagination-What-did-he-mean">"Imagination is more important than knowledge"\nAlbert Einstein</blockquote>';
html(str)
"Imagination is more important than knowledge" Albert Einstein
If we have an image we want to render, we can do that with the image function:
%%node
var url = 'https://github.com/IBM/nodejs-in-notebooks/blob/master/notebooks/images/pixiedust_node_schematic.png?raw=true';
image(url);
Print variables using console.log.
%%node
var x = { a:1, b:'two', c: true };
console.log(x);
{ a: 1, b: 'two', c: true }
Calling the print function within your JavaScript code is the same as calling print in your Python code.
%%node
var y = { a:3, b:'four', c: false };
print(y);
{"a": 3, "c": false, "b": "four"}
You can also use PixieDust’s display function to render data graphically. Configured as a line chart, the visualization looks as follows:
%%node
var data = [];
for (var i = 0; i < 1000; i++) {
var x = 2*Math.PI * i/ 360;
var obj = {
x: x,
i: i,
sin: Math.sin(x),
cos: Math.cos(x),
tan: Math.tan(x)
};
data.push(obj);
}
// render data
display(data);
PixieDust presents visualisations of DataFrames using Matplotlib, Bokeh, Brunel, d3, Google Maps, and Mapbox. No code is required on your part because PixieDust presents simple pull-down menus and a friendly point-and-click interface, allowing you to configure how the data is presented.
There are thousands of libraries and tools in the npm repository, Node.js’s package manager. It’s essential that we can install npm libraries and use them in our notebook code.
Let’s say we want to make some HTTP calls to an external API service. We could deal with Node.js’s low-level HTTP library, or an easier option would be to use the ubiquitous request npm module.
Once we have pixiedust_node set up, installing an npm module is as simple as running npm.install in a Python cell:
npm.install('request');
/opt/conda/envs/DSX-Python27/bin/npm install -s request + request@2.87.0 updated 1 package in 0.856s
Once installed, you may require the module in your JavaScript code:
%%node
var request = require('request');
var r = {
method:'GET',
url: 'http://api.open-notify.org/iss-now.json',
json: true
};
request(r, function(err, res, body) {
console.log(body);
});
{ iss_position: { longitude: '42.2119', latitude: '51.1124' }, timestamp: 1531779053, message: 'success' }
Because an HTTP request is an asynchronous action, the request library calls our callback function when the operation has completed. Inside that function, we can call console.log (or print) to render the data.
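To make the ordering concrete, here is a minimal, network-free sketch. `fakeRequest` is a hypothetical stand-in for the request module (it is not part of any library), and the response body is hard-coded for illustration. In a notebook, the cell would start with %%node.

```javascript
// fakeRequest is a hypothetical stand-in for the request module: it
// "responds" on a later tick of the event loop, as a real HTTP call would.
function fakeRequest(options, callback) {
  setTimeout(function () {
    // hard-coded sample body, for illustration only
    var body = { message: 'success', url: options.url };
    callback(null, { statusCode: 200 }, body);
  }, 10);
}

// records the order in which the two code paths run
var order = [];

fakeRequest({ url: 'http://example.com/data.json' }, function (err, res, body) {
  order.push('callback');
  console.log('callback ran with message:', body.message);
});

// This runs first: the call above returns before the response arrives.
order.push('after call');
console.log('request sent, waiting for callback');
```

The line after the call runs before the callback does, which is why any code that needs the response has to live inside the callback.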
We can organise our code into functions to encapsulate complexity and make it easier to reuse code. We can create a function to get the current position of the International Space Station in one notebook cell:
%%node
var request = require('request');
var getPosition = function(callback) {
var r = {
method:'GET',
url: 'http://api.open-notify.org/iss-now.json',
json: true
};
request(r, function(err, res, body) {
var obj = null;
if (!err) {
obj = body.iss_position;
obj.latitude = parseFloat(obj.latitude);
obj.longitude = parseFloat(obj.longitude);
obj.time = new Date().getTime();
}
callback(err, obj);
});
};
And use it in another cell:
%%node
getPosition(function(err, data) {
console.log(data);
});
{ longitude: 42.9493, latitude: 51.1819, time: 1531779061073 }
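The conversion step inside getPosition can also be pulled out into a pure function, which makes it easy to try without a network call. The sketch below is one way to do that; `normalizePosition` and its `now` parameter are hypothetical names introduced here, and the sample body is copied from the output shown earlier.

```javascript
// Convert the string coordinates of an iss-now response body into numbers
// and attach a timestamp. `now` is optional and only exists to make the
// function deterministic when trying it out.
function normalizePosition(body, now) {
  var pos = body.iss_position;
  return {
    latitude: parseFloat(pos.latitude),
    longitude: parseFloat(pos.longitude),
    time: now === undefined ? Date.now() : now
  };
}

// sample body copied from the earlier cell output
var sampleBody = {
  iss_position: { longitude: '42.2119', latitude: '51.1124' },
  timestamp: 1531779053,
  message: 'success'
};

console.log(normalizePosition(sampleBody, 0));
```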
If you prefer to work with JavaScript Promises when writing asynchronous code, then that’s okay too. Let’s rewrite our getPosition function to return a Promise. First we're going to install the request-promise module from npm:
npm.install( ('request', 'request-promise') )
/opt/conda/envs/DSX-Python27/bin/npm install -s request request-promise + request@2.87.0 + request-promise@4.2.2 updated 2 packages in 0.912s
Notice how you can install multiple modules in a single call. Just pass in a Python list or tuple.
Then we can refactor our function a little:
%%node
var request = require('request-promise');
var getPosition = function() {
var r = {
method:'GET',
url: 'http://api.open-notify.org/iss-now.json',
json: true
};
return request(r).then(function(body) {
var obj = null;
obj = body.iss_position;
obj.latitude = parseFloat(obj.latitude);
obj.longitude = parseFloat(obj.longitude);
obj.time = new Date().getTime();
return obj;
});
};
And call it in the Promises style:
%%node
getPosition().then(function(data) {
console.log(data);
}).catch(function(err) {
console.error(err);
});
{ longitude: 44.0843, latitude: 51.2787, time: 1531779072259 }
Or call it in a more compact form:
%%node
getPosition().then(console.log).catch(console.error);
{ longitude: 44.6288, latitude: 51.3208, time: 1531779077984 }
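If you already have a callback-style function and would rather not rewrite it, Node.js's built-in util.promisify (available since Node.js 8) can wrap it. This is an alternative to the request-promise approach used above, not what this notebook does. In the sketch, `getPositionCb` is a hypothetical function that returns a hard-coded position instead of calling the API.

```javascript
var util = require('util');

// A callback-style function; the position is hard-coded sample data
// so that the sketch runs without network access.
function getPositionCb(callback) {
  setTimeout(function () {
    callback(null, { latitude: 51.1819, longitude: 42.9493 });
  }, 10);
}

// util.promisify wraps a function whose last argument is an
// (err, value) callback and returns a Promise-returning version.
var getPositionP = util.promisify(getPositionCb);

var pending = getPositionP().then(function (data) {
  console.log(data);
  return data;
});
```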
In the next part of this notebook we'll illustrate how you can access local and remote data sources from within the notebook.
You can access any data source using your favorite public or home-grown packages. In the second part of this notebook you'll learn how to retrieve data from an Apache CouchDB (or Cloudant) database and visualize it using PixieDust or third-party libraries.
To access data stored in an Apache CouchDB or Cloudant database, we can use the cloudant-quickstart npm module:
npm.install('cloudant-quickstart')
/opt/conda/envs/DSX-Python27/bin/npm install -s cloudant-quickstart + cloudant-quickstart@1.25.5 updated 1 package in 0.983s
With our Cloudant URL, we can start exploring the data in Node.js. First we make a connection to the remote Cloudant database:
%%node
// connect to Cloudant using cloudant-quickstart
const cqs = require('cloudant-quickstart');
const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');
For this code pattern example, a remote database has been pre-configured to accept anonymous connection requests. If you wish to explore the cloudant-quickstart library beyond what is covered in this notebook, we recommend you create your own replica and replace the URL above with your own, e.g. https://myid:mypassword@mycloudanthost/mydatabase.
Now we have an object named cities that we can use to access the database.
We can retrieve all documents using all.
%%node
// If no limit is specified, 100 documents will be returned
cities.all({limit:3}).then(console.log).catch(console.error)
[ { _id: '1000501', name: 'Grahamstown', latitude: -33.30422, longitude: 26.53276, country: 'ZA', population: 91548, timezone: 'Africa/Johannesburg' }, { _id: '1000543', name: 'Graaff-Reinet', latitude: -32.25215, longitude: 24.53075, country: 'ZA', population: 62896, timezone: 'Africa/Johannesburg' }, { _id: '100077', name: 'Abū Ghurayb', latitude: 33.30563, longitude: 44.18477, country: 'IQ', population: 900000, timezone: 'Asia/Baghdad' } ]
By specifying the optional limit and skip parameters, we can paginate through the document list:
cities.all({limit:10}).then(console.log).catch(console.error)
cities.all({skip:10, limit:10}).then(console.log).catch(console.error)
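The relationship between a page number and the skip/limit pair can be sketched without a database. `pageOptions` and `applyPage` below are hypothetical helpers introduced for illustration; they operate on an in-memory array rather than on cloudant-quickstart.

```javascript
// Translate a 1-based page number into skip/limit options.
function pageOptions(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Apply skip/limit to an in-memory array, mimicking a paged listing.
function applyPage(docs, opts) {
  return docs.slice(opts.skip, opts.skip + opts.limit);
}

// 25 placeholder documents with ids '0' to '24'
var docs = [];
for (var i = 0; i < 25; i++) {
  docs.push({ _id: String(i) });
}

console.log(pageOptions(2, 10));
console.log(applyPage(docs, pageOptions(3, 10)).length);
```

An options object built this way could be passed to cities.all in place of the literal objects used above. The last page simply comes back shorter than the page size.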
If we know the IDs of documents, we can retrieve them singly:
%%node
cities.get('2636749').then(console.log).catch(console.error);
{ _id: '2636749', name: 'Stowmarket', latitude: 52.18893, longitude: 0.99774, country: 'GB', population: 15394, timezone: 'Europe/London' }
Or in bulk:
%%node
cities.get(['5913490', '4140963','3520274']).then(console.log).catch(console.error);
[ { _id: '5913490', name: 'Calgary', latitude: 51.05011, longitude: -114.08529, country: 'CA', population: 1019942, timezone: 'America/Edmonton' }, { _id: '4140963', name: 'Washington, D.C.', latitude: 38.89511, longitude: -77.03637, country: 'US', population: 601723, timezone: 'America/New_York' }, { _id: '3520274', name: 'Río Blanco', latitude: 18.83036, longitude: -97.156, country: 'MX', population: 39543, timezone: 'America/Mexico_City' } ]
Instead of just calling print to output the JSON, we can bring PixieDust's display function to bear by passing it an array of data to visualize. Using mapbox as the renderer and satellite as the basemap, we can display the location and population of the selected cities:
%%node
cities.get(['5913490', '4140963','3520274']).then(display).catch(console.error);
We can also query a subset of the data using the query function, passing it a Cloudant Query statement. Using mapbox as the renderer, the customizable output looks as follows:
%%node
// fetch cities in UK above latitude 54 degrees north
cities.query({country:'GB', latitude: { "$gt": 54}}).then(display).catch(console.error);
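To get a feel for what such a selector means, the sketch below applies one to an in-memory array. `matches` is a hypothetical helper that handles only field equality and the $gt operator. It illustrates the semantics and is not how Cloudant evaluates queries. The first two sample documents come from output shown earlier; the third is invented for the example.

```javascript
// Return true if doc satisfies every condition in the selector.
// Only plain equality and the $gt operator are supported in this sketch.
function matches(doc, selector) {
  return Object.keys(selector).every(function (field) {
    var cond = selector[field];
    if (cond !== null && typeof cond === 'object') {
      // operator form, e.g. { "$gt": 54 }
      return Object.keys(cond).every(function (op) {
        if (op === '$gt') { return doc[field] > cond[op]; }
        throw new Error('unsupported operator: ' + op);
      });
    }
    return doc[field] === cond;
  });
}

var sampleCities = [
  { name: 'Stowmarket', country: 'GB', latitude: 52.18893 },
  { name: 'Calgary', country: 'CA', latitude: 51.05011 },
  { name: 'Invented Northern Town', country: 'GB', latitude: 55.0 } // made-up entry
];

var selector = { country: 'GB', latitude: { '$gt': 54 } };
var hits = sampleCities.filter(function (c) { return matches(c, selector); });
console.log(hits.map(function (c) { return c.name; }));
```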
The cloudant-quickstart library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.
Let’s calculate the sum of the population field:
%%node
cities.sum('population').then(console.log).catch(console.error);
2694222973
Or compute the sum of the population, grouped by the country field, displaying the 10 countries with the largest population:
%%node
// helper function
function top10(data) {
// convert input data structure to array
var pop_array = [];
Object.keys(data).forEach(function(n,k) {
pop_array.push({name: n, population: data[n]});
});
// sort array by population in descending order
pop_array.sort(function(a,b) {
return b.population - a.population;
});
// display top 10 entries
pop_array.slice(0,10).forEach(function(e) {
console.log(e.name + ' ' + e.population.toLocaleString());
});
}
// fetch aggregated data and invoke helper routine
cities.sum('population','country').then(top10).catch(console.error);
CN 389,487,480
IN 269,553,896
US 190,515,768
BR 125,426,547
RU 108,885,695
JP 99,000,238
MX 80,474,387
ID 63,161,801
DE 58,884,999
TR 55,733,719
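The top10 helper expects an object keyed by country. For intuition, a grouped sum that produces the same shape can be written over an in-memory array, as sketched below. `sumBy` is a hypothetical helper; the real aggregation runs inside the database. The sample documents are taken from output shown earlier in this notebook.

```javascript
// Sum valueField over docs, grouped by groupField.
// Returns an object such as { ZA: 154444, GB: 15394 }.
function sumBy(docs, valueField, groupField) {
  return docs.reduce(function (totals, doc) {
    var key = doc[groupField];
    totals[key] = (totals[key] || 0) + doc[valueField];
    return totals;
  }, {});
}

var sampleDocs = [
  { name: 'Grahamstown', country: 'ZA', population: 91548 },
  { name: 'Graaff-Reinet', country: 'ZA', population: 62896 },
  { name: 'Stowmarket', country: 'GB', population: 15394 }
];

var totals = sumBy(sampleDocs, 'population', 'country');
console.log(totals);
```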
The cloudant-quickstart package is just one of several Node.js libraries that you can use to access Apache CouchDB or Cloudant. Follow this link to learn more about your options.
npm.install('quiche');
/opt/conda/envs/DSX-Python27/bin/npm install -s quiche + quiche@0.3.0 updated 1 package in 0.957s
%%node
var Quiche = require('quiche');
var pie = new Quiche('pie');
// fetch cities named Cambridge
cities.query({name: 'Cambridge'}).then(function(data) {
var colors = ['ff00ff','0055ff', 'ff0000', 'ffff00', '00ff00','0000ff'];
for (var i in data) {
var city = data[i];
pie.addData(city.population, city.name + '(' + city.country +')', colors[i]);
}
var imageUrl = pie.getUrl(true);
image(imageUrl);
});
You can share variables between Python and Node.js cells. Why would you want to do that? Read on.
The Node.js library ecosystem is extensive. Perhaps you need to fetch data from a database and prefer the syntax of a particular Node.js npm module. You can use Node.js to fetch the data, move it to the Python environment, and convert it into a Pandas or Spark DataFrame for aggregation, analysis and visualisation.
PixieDust and pixiedust_node give you the flexibility to mix and match Python and Node.js code to suit the workflow you are building and the skill sets you have in your team.
Mixing Node.js and Python code in the same notebook is a great way to integrate the work of your software development and data science teams to produce a collaborative report or dashboard.
Define variables in a Python cell.
# define a couple of variables in Python
a = 'Hello from Python!'
b = 2
c = False
d = {'x':1, 'y':2}
e = 3.142
f = [{'a':1}, {'a':2}, {'a':3}]
Access or modify their values in Node.js cells.
%%node
// print variable values
console.log(a, b, c, d, e, f);
// change variable value
a = 'Hello from Node.js!';
// define a new variable
var g = 'Yes, it works both ways.';
Hello from Python! 2 false { y: 2, x: 1 } 3.142 [ { a: 1 }, { a: 2 }, { a: 3 } ]
Inspect the manipulated data.
# display modified variable and the new variable
print('{} {}'.format(a,g))
Hello from Node.js! Yes, it works both ways.
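One way to picture the hand-off is as a JSON round trip. That is an assumption made for this sketch, not something stated here about pixiedust_node's internals. Under that assumption, strings, numbers, booleans, arrays, and plain objects survive the trip, while values that JSON cannot represent do not.

```javascript
// The Python variables from the cell above, written as a JSON document
// (assumed transport format, for illustration only).
var fromPython = '{"a": "Hello from Python!", "b": 2, "c": false, ' +
  '"d": {"x": 1, "y": 2}, "e": 3.142, "f": [{"a": 1}, {"a": 2}, {"a": 3}]}';

var vars = JSON.parse(fromPython);
vars.a = 'Hello from Node.js!';              // modify a shared value
vars.g = 'Yes, it works both ways.';         // add a new one

// What would travel back in the other direction.
var backToPython = JSON.stringify(vars);
console.log(backToPython);

// Functions and undefined have no JSON representation and are dropped.
var lossy = JSON.stringify({ fn: function () {}, u: undefined, n: 1 });
console.log(lossy);
```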
Note: PixieDust natively supports data sharing between Python and Scala, extending the loop for some data types:
%%scala
println(a,b,c,d,e,f,g)
(Hello from Node.js!,2,null,null,null,null,Yes, it works both ways.)
If you wish to transfer data from Node.js to Python from an asynchronous callback, make sure you write the data to a global variable.
Load a CSV file from a GitHub repository.
%%node
// global variable
var sample_csv_data = '';
// load csv file from GitHub and store data in the global variable
request.get('https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv').then(function(data) {
sample_csv_data = data;
console.log('Fetched sample data from GitHub.');
});
Fetched sample data from GitHub.
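The timing matters here: the global variable is only filled in once the asynchronous callback has run, so it still holds its initial value immediately after the call. The sketch below shows this without network access. `fakeFetch` is a hypothetical stand-in for request.get, and the CSV text is a made-up placeholder.

```javascript
// variable declared at the top level of the cell, outside the callback
var sampleCsvData = '';

// fakeFetch is a hypothetical stand-in for request.get: it resolves
// with placeholder CSV text after a short delay.
function fakeFetch() {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('a,b\n1,2'); }, 10);
  });
}

var fetched = fakeFetch().then(function (data) {
  sampleCsvData = data; // assignment happens later, inside the callback
  console.log('data stored, length:', sampleCsvData.length);
});

// Still empty at this point: the callback has not run yet.
console.log('length right after the call:', sampleCsvData.length);
```

This is one reason to run the fetching cell and wait for its confirmation message before running the Python cell that reads the variable.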
Create a Pandas DataFrame from the downloaded data.
import pandas as pd
import io
# create DataFrame from shared csv data
pandas_df = pd.read_csv(io.StringIO(sample_csv_data))
# display first five rows
pandas_df.head(5)
| | mpg | cylinders | engine | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | American | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | American | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | American | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | American | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | American | ford torino |
Note: The example above is for illustrative purposes only. If you want to create a DataFrame from a URL, a much easier solution is to use PixieDust's sampleData method.