import ipyparallel as ipp
rc = ipp.Client()
everyone = rc[:]
everyone
<DirectView [0, 1, 2, 3,...]>
Step 1: Initialize the engine state. In this case, that means importing pandas and loading sessions.csv into a data frame on every engine.
%%px
import pandas as pd
# Each row describes one user action; a given user ID
# can appear in several rows.
sessions = pd.read_csv('sessions.csv')
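To make the assumed layout of sessions.csv concrete, here is a minimal sketch. The column names and values are illustrative assumptions, since the article does not show the file itself; the point is that one user ID spans several rows:

```python
import pandas as pd

# Hypothetical miniature of sessions.csv: one row per user action,
# with the same user_id repeated across rows (columns are assumed).
sessions = pd.DataFrame({
    'user_id': ['qtw88d9pbl', 'qtw88d9pbl', 'ucgks2fyez'],
    'action':  ['login', 'click', 'login'],
})

# 'qtw88d9pbl' appears in two rows, 'ucgks2fyez' in one.
counts = sessions['user_id'].value_counts()
print(counts['qtw88d9pbl'])  # 2
```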
Define our task function, to be called on the engines. It resolves sessions in the engine namespaces, which avoids passing the data frame as an argument on every call.
def process(user):
    # Locate all of this user's sessions in the *global* sessions dataframe
    user_session = sessions.loc[sessions['user_id'] == user]
    user_session_data = pd.Series(dtype='float64')
    # Make calculations and append to user_session_data
    return user_session_data
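Before shipping process to the engines, it can help to sanity-check it locally against a tiny stand-in for sessions. The data and the mean-duration calculation below are illustrative assumptions, not the article's actual metrics:

```python
import pandas as pd

# Hypothetical stand-in for the global sessions dataframe.
sessions = pd.DataFrame({
    'user_id': ['a1', 'a1', 'b2'],
    'duration': [10.0, 20.0, 5.0],
})

def process(user):
    # Same shape as the engine-side task: filter the global frame by user.
    user_session = sessions.loc[sessions['user_id'] == user]
    user_session_data = pd.Series(dtype='float64')
    # Example calculation: average session duration for this user.
    user_session_data['mean_duration'] = user_session['duration'].mean()
    return user_session_data

print(process('a1'))  # mean_duration is 15.0 for user 'a1'
```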
Locally, get the unique user IDs:
import pandas as pd
sessions = pd.read_csv('sessions.csv')
user_ids = sessions['user_id'].unique()
Next, create a load-balanced view and map the task over the unique user IDs:
view = rc.load_balanced_view()
ar = view.map(process, user_ids)
for user_id, result in zip(user_ids, ar):
print(user_id, result)
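Because the mapped results come back in the same order as user_ids, they can also be assembled into a single per-user data frame rather than printed one by one. The sketch below simulates view.map with the built-in map so it stays self-contained; the sample data and mean-duration metric are assumptions:

```python
import pandas as pd

sessions = pd.DataFrame({
    'user_id': ['a1', 'a1', 'b2'],
    'duration': [10.0, 20.0, 5.0],
})

def process(user):
    user_session = sessions.loc[sessions['user_id'] == user]
    data = pd.Series(dtype='float64')
    data['mean_duration'] = user_session['duration'].mean()
    return data

user_ids = sessions['user_id'].unique()

# Stand-in for `view.map(process, user_ids)`: results arrive in input
# order, so they can be keyed by user ID and stacked into one frame.
results = pd.DataFrame(dict(zip(user_ids, map(process, user_ids)))).T
print(results)
```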
qtw88d9pbl Series([], dtype: float64)
ucgks2fyez Series([], dtype: float64)
xwxei6hdk4 Series([], dtype: float64)
d1mm9tcy42 Series([], dtype: float64)