#!/usr/bin/env python
# coding: utf-8

# In this post, we will visualize the Paris Vélib bicycle stations using pandas and then, to do interactive exploration, `bokeh`. The goal is to get familiar with the plotting syntax of `bokeh`, which is quite different from `matplotlib`, the classic plotting package in the Python scientific stack.

# # Fetching the data 

# JC Decaux, the company responsible for the Paris shared biking system Vélib, has an open-data service available here: [https://developer.jcdecaux.com/#/opendata/vls?page=static](https://developer.jcdecaux.com/#/opendata/vls?page=static). We can use it to fetch the static data describing the different stations.

# In[1]:


import pandas as pd


# In[2]:


df = pd.read_json("https://developer.jcdecaux.com/rest/vls/stations/Paris.json")


# Let's look at the head of the data:

# In[3]:


df.head()


# Now, let's see what we can do with it!

# # Examining the data 

# A first question that can be asked is "how many stations are there in each city / neighbourhood?". It turns out that we can extract a 5 digit postcode from each address field quite easily using regular expressions. This is because the `pandas.str.findall` function [accepts regular expressions as arguments](http://pandas.pydata.org/pandas-docs/stable/text.html).

# In[4]:


df['postcode'] = [item[0] for item in df.address.str.findall("\d\d\d\d\d")]


# In[5]:


df.head()


# This allows us to easily count the number of stations in given locations:

# In[6]:


get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib.pyplot as plt
plt.style.use('bmh')


# In[7]:


plt.figure(figsize=(10, 6))
df.groupby(by='postcode').size().plot(kind='bar')
plt.tight_layout()


# This allows us to determine that there are the most stations in the 15th arrondissement of Paris.

# We can also decide to plot each station as a dot on a map. Let's try that:

# In[8]:


fig, ax = plt.subplots(figsize=(10, 8))
df.plot(ax=ax, kind='scatter', x='longitude', y='latitude')
plt.tight_layout()


# We can faintly distinguish the Seine River contour, were there are no Vélib stations.

# Finally, a last visualization could be to compute the mean coordinates of stations for each postcode and plot them on a map:

# In[9]:


mean_stations = df.groupby('postcode').mean()
mean_stations.head()


# In[10]:


mean_stations.describe()


# In[11]:


mean_stations['station_count'] = df.groupby(by='postcode').size()


# We can also label the points [as in this SO thread](http://stackoverflow.com/questions/15910019/annotate-data-points-while-plotting-from-pandas-dataframe/15911372#15911372).

# In[12]:


def label_point(x, y, val, ax):
    a = pd.DataFrame({'x': x, 'y': y, 'val': val})
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))


# In[13]:


fig, ax = plt.subplots(figsize=(10, 8))
mean_stations.plot(ax=ax, kind='scatter', x='longitude', y='latitude', s=mean_stations['station_count'], color='red')
label_point(mean_stations.longitude.values, mean_stations.latitude.values, mean_stations.index, ax)
plt.tight_layout()


# In[14]:


mean_stations.latitude.values


# Finally, we can put everything together: stations and mean locations of stations.

# In[15]:


s = df.groupby(by='postcode').size()
cmap = list(s.index.values)


# In[16]:


fig, ax = plt.subplots(figsize=(10, 8))
df.plot(ax=ax, kind='scatter', x='longitude', y='latitude', 
        c=[cmap.index(item) + 1 for item in df.postcode.values],
       colormap='cubehelix', label='index of location')
mean_stations.plot(ax=ax, kind='scatter', x='longitude', y='latitude', s=100, color='red')
label_point(mean_stations.longitude.values, mean_stations.latitude.values, mean_stations.index, ax)
plt.tight_layout()


# # Using Bokeh 

# The maps I plotted in the previous section were static. This is a limiting factor when exploring a dataset. To really come to grips with the data, it is often useful to make it interactive, which is what we will do using `bokeh`. We will follow the [quickstart guide to Bokeh](http://bokeh.pydata.org/en/latest/docs/user_guide/quickstart.html#userguide-quickstart) and try to obtain the same plots as above using this framework.
# 
# To get a feeling for how `bokeh` works, we will first use the high level `bokeh.charts` interface and then the medium and low-level `bokeh.plotting` and `bokeh.models`.

# ## High level version 

# First, we import the different elements we need for bokeh.

# In[17]:


import bokeh.plotting as bp


# Let's tell bokeh to show things in the notebook:

# In[18]:


bp.output_notebook()


# Now, let's use the high level function found the charts module:

# In[19]:


import bokeh.charts


# In[20]:


p = bokeh.charts.Scatter(df, x='longitude', y='latitude', color='postcode',
                        tools="crosshair, hover, wheel_zoom, pan")
bp.show(p)


# That was easy! The visualization is interesting and we didn't have much to do to obtain it.

# What if we want a hover tool displaying the address over each station? I didn't find any easy way to extend the previous chart, so let's switch to a lower level of plotting and do this in detail.

# ## Medium and low-level bokeh 

# We now need to do the following things to make our plot, from the medium or low-level perspective:
# 
# - create a figure
# - add renderers (points in our cases)
# - show the plot
# 
# Let's do a simple scatter plot to show how this goes:

# In[21]:


p = bp.figure(title="simple scatter plot")
p.scatter(x=df.longitude.values, y=df.latitude.values)
bp.show(p)


# Now, let's customize this plot a little more:
# 
# - add colors to each dot according to postcode
# - add labels showing the adress of a station using hovering

# We will start with the colors. I didn't figure out how to apply this easily with bokeh, so I had to resort to a manual generation of each color code using matplotlib classes, in particular a ScalarMappable.

# In[22]:


import matplotlib as mpl

color_index = pd.Series([cmap.index(item) for item in df.postcode.values])
norm = mpl.colors.Normalize()
norm.autoscale(color_index)
sm = mpl.cm.ScalarMappable(norm, 'hot')


# We can test the output into rgba space using `to_rgba`:

# In[23]:


sm.to_rgba(0.1, bytes=True)


# Finally, let's just generate the list of colors we need:

# In[24]:


colors = [
    "#%02x%02x%02x" % (int(r), int(g), int(b)) for r, g, b, a in [sm.to_rgba(item, bytes=True) for item in color_index]
]


# In[25]:


colors[:10]


# Let's now customize the tooltip shown while hovering. The way to do this is well described in the [Bokeh tutorial about interactions](http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/tutorial/04%20-%20interactions.ipynb):
# 
# - we need to build a datasource containing a description field
# - and a hover tool, based on this description field from the data source
# 

# In[26]:


import bokeh.models as bm

source = bm.ColumnDataSource(
        data=dict(
            x=df.longitude.values,
            y=df.latitude.values,
            c=colors,
            desc=df.address.values,
        )
    )

hover = bm.HoverTool(
        tooltips=[
            ("address", "@desc"),
        ]
    )


pan = bm.PanTool()
zoom = bm.WheelZoomTool()


# Finally, here's the scatter plot, in low-level plotting language, with hovering tooltips! 

# In[27]:


p = bp.figure(title="Vélib stations in Paris", 
              tools=[hover, pan, zoom])
p.circle(x='x', y='y', fill_color='c', size=10, source=source)

bp.show(p)


# I've just found out that it is possible to [plot markers on top of a Google Map using `bokeh`](http://bokeh.pydata.org/en/latest/docs/user_guide/geo.html). Let's try and do this:

# In[28]:


geo_source = bm.GeoJSONDataSource(
        data=dict(
            x=df.longitude.values,
            y=df.latitude.values,
            c=colors,
            desc=df.address.values,
        )
    )

hover = bm.HoverTool(
        tooltips=[
            ("address", "@desc"),
        ]
    )


pan = bm.PanTool()
zoom = bm.WheelZoomTool()

p = bp.figure(title="Vélib stations in Paris", 
              tools=[hover, pan, zoom])
p.circle(x='x', y='y', fill_color='c', size=10, source=geo_source)

bp.show(p)


# Unfortunately, this doesn't work, yet. There are several bug reports describing this behaviour (one of them is here: [https://github.com/bokeh/bokeh/issues/3737](https://github.com/bokeh/bokeh/issues/3737)). Hopefully, this will get fixed soon!

# # Using Folium 

# A last thing I wanted to try was to use Folium for displaying interactive maps. It seems very simple to use to get markers on a map using an OpenStreetMap tiling.

# In[29]:


import folium


# In[30]:


map_osm = folium.Map(location=[48.86, 2.35])
for lng, lat, desc in zip(df.longitude.values,
                         df.latitude.values,
                         df.address.values):
    map_osm.circle_marker([lat, lng], radius=100, popup=desc)
map_osm


# That's it for today! I hope you had fun!

# This post was entirely written using the IPython notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at [20160205_VisualizingVelibStations.ipynb](http://nbviewer.ipython.org/urls/raw.github.com/flothesof/posts/master/20160205_VisualizingVelibStations.ipynb).