import ipyparallel as ipp
rc = ipp.Client()
everyone = rc[:]
everyone
<DirectView [0, 1, 2, 3,...]>
Step 1: Initialize the engine state. In this case, that means importing pandas and loading sessions.csv into a data frame on every engine.
%%px
import pandas as pd
# Each row describes one user action; a given user ID
# can appear in several rows.
sessions = pd.read_csv('sessions.csv')
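To make the assumed layout of sessions.csv concrete, here is a minimal sketch. The column names and values are illustrative assumptions, since the article does not show the file itself; the point is that one user ID spans several rows:

```python
import pandas as pd

# Hypothetical miniature of sessions.csv: one row per user action,
# with the same user_id repeated across rows (columns are assumed).
sessions = pd.DataFrame({
    'user_id': ['qtw88d9pbl', 'qtw88d9pbl', 'ucgks2fyez'],
    'action':  ['login', 'click', 'login'],
})

# 'qtw88d9pbl' appears in two rows, 'ucgks2fyez' in one.
counts = sessions['user_id'].value_counts()
print(counts['qtw88d9pbl'])  # 2
```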
Define our task function, to be called on the engines. It resolves sessions in the engine namespaces, which avoids passing the data frame as an argument on every call.
def process(user):
    # Locate all of this user's sessions in the *global* sessions dataframe
    user_session = sessions.loc[sessions['user_id'] == user]
    user_session_data = pd.Series(dtype='float64')
    # Make calculations and append to user_session_data
    return user_session_data
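Before shipping process to the engines, it can help to sanity-check it locally against a tiny stand-in for sessions. The data and the mean-duration calculation below are illustrative assumptions, not the article's actual metrics:

```python
import pandas as pd

# Hypothetical stand-in for the global sessions dataframe.
sessions = pd.DataFrame({
    'user_id': ['a1', 'a1', 'b2'],
    'duration': [10.0, 20.0, 5.0],
})

def process(user):
    # Same shape as the engine-side task: filter the global frame by user.
    user_session = sessions.loc[sessions['user_id'] == user]
    user_session_data = pd.Series(dtype='float64')
    # Example calculation: average session duration for this user.
    user_session_data['mean_duration'] = user_session['duration'].mean()
    return user_session_data

print(process('a1'))  # mean_duration is 15.0 for user 'a1'
```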
Locally, get the unique user IDs:
import pandas as pd
sessions = pd.read_csv('sessions.csv')
user_ids = sessions['user_id'].unique()
Next, create a load-balanced view and map the task over the unique user IDs:
view = rc.load_balanced_view()
ar = view.map(process, user_ids)
for user_id, result in zip(user_ids, ar):
print(user_id, result)
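Because the mapped results come back in the same order as user_ids, they can also be assembled into a single per-user data frame rather than printed one by one. The sketch below simulates view.map with the built-in map so it stays self-contained; the sample data and mean-duration metric are assumptions:

```python
import pandas as pd

sessions = pd.DataFrame({
    'user_id': ['a1', 'a1', 'b2'],
    'duration': [10.0, 20.0, 5.0],
})

def process(user):
    user_session = sessions.loc[sessions['user_id'] == user]
    data = pd.Series(dtype='float64')
    data['mean_duration'] = user_session['duration'].mean()
    return data

user_ids = sessions['user_id'].unique()

# Stand-in for `view.map(process, user_ids)`: results arrive in input
# order, so they can be keyed by user ID and stacked into one frame.
results = pd.DataFrame(dict(zip(user_ids, map(process, user_ids)))).T
print(results)
```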
qtw88d9pbl Series([], dtype: float64)
ucgks2fyez Series([], dtype: float64)
xwxei6hdk4 Series([], dtype: float64)
d1mm9tcy42 Series([], dtype: float64)