# pipegraph User Guide

## Rationale

scikit-learn provides a useful set of data preprocessors and machine learning models. The Pipeline object can encapsulate a chain of transformers followed by a final model. Other tools, such as GridSearchCV, can then use Pipeline objects to find the set of parameters that yields the best estimator.

### Pipeline + GridSearchCV: an awesome combination

Let's consider a simple example to illustrate the advantages of using Pipeline and GridSearchCV.

First, let's import the libraries we will use and then build an artificial data set that follows a simple polynomial rule:

In [48]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

X = 2*np.random.rand(100,1)-1
y = 40 * X**5 + 3*X*2 +  3*X + 3*np.random.randn(100,1)


Once we have some data ready, we instantiate the transformers and a regressor we want to fit:

In [49]:
scaler = MinMaxScaler()
polynomial_features = PolynomialFeatures()
linear_model = LinearRegression()


We define the steps that form the Pipeline object and then instantiate the Pipeline:

In [50]:
steps = [('scaler', scaler),
         ('polynomial_features', polynomial_features),
         ('linear_model', linear_model)]

pipe = Pipeline(steps=steps)


Now we can pass this pipeline to GridSearchCV. When the GridSearchCV object is fitted, it searches for the best combination of hyperparameters among the values provided in the param_grid parameter:

In [51]:
param_grid = {'polynomial_features__degree': range(1, 11),
              'linear_model__fit_intercept': [True, False]}

grid_search_regressor = GridSearchCV(estimator=pipe, param_grid=param_grid, refit=True)
grid_search_regressor.fit(X, y);
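
The keys of param_grid follow scikit-learn's `<step name>__<parameter name>` convention, and the fitted search reports its choice under the same keys. The following is a minimal, self-contained sketch of reading those values back. The random seed, the reduced grid, `cv=3` and the `demo_` names are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic data; the seed and sizes are illustrative assumptions.
rng = np.random.RandomState(0)
X_demo = 2 * rng.rand(100, 1) - 1
y_demo = 40 * X_demo[:, 0] ** 5 + 3 * X_demo[:, 0] + rng.randn(100)

demo_pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                            ('polynomial_features', PolynomialFeatures()),
                            ('linear_model', LinearRegression())])

# Each key is '<step name>__<parameter name>'.
demo_grid = {'polynomial_features__degree': [1, 3, 5],
             'linear_model__fit_intercept': [True, False]}

demo_search = GridSearchCV(estimator=demo_pipe, param_grid=demo_grid, cv=3)
demo_search.fit(X_demo, y_demo)

# best_params_ is a dict keyed by the same names used in the grid.
best_params = demo_search.best_params_
print(best_params)

# named_steps gives direct access to the steps of the refitted best pipeline.
best_degree = demo_search.best_estimator_.named_steps['polynomial_features'].degree
```

Reading `best_params_` avoids having to dig into the individual steps when only the selected hyperparameter values are of interest.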


And now we can check the results of fitting the Pipeline and the values of the hyperparameters:

In [52]:
y_pred = grid_search_regressor.predict(X)
plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()

In [53]:
coef = grid_search_regressor.best_estimator_.get_params()['linear_model'].coef_
degree = grid_search_regressor.best_estimator_.get_params()['polynomial_features'].degree

print('Information about the parameters of the best estimator: \n degree: {} \n coefficients: {} '.format(degree, coef))

Information about the parameters of the best estimator:
degree: 6
coefficients: [[  -49.54923935   402.82893512 -1420.10739142  2372.8119209  -1442.05060911
-411.51219647   589.22468528]]


### Pipeline weaknesses:

This example shows that Pipeline and GridSearchCV are very useful tools to consider when fitting models. As long as the user's needs can be met by a set of transformers followed by a final model, this approach is highly convenient. Additional advantages are the parallel computation offered by GridSearchCV (through its n_jobs parameter) and the caching of fitted transformers offered by Pipeline (through its memory parameter).
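
As a rough sketch of those two capabilities: `n_jobs` is a GridSearchCV parameter that controls parallelism, and `memory` is a Pipeline parameter that caches fitted transformers. The data, the temporary cache folder and the `cached_` names below are assumptions of this example.

```python
import shutil
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

rng = np.random.RandomState(0)
X_demo = 2 * rng.rand(60, 1) - 1
y_demo = 40 * X_demo[:, 0] ** 5 + rng.randn(60)

cache_dir = tempfile.mkdtemp()  # temporary folder holding the cached transformers
try:
    cached_pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                                  ('polynomial_features', PolynomialFeatures()),
                                  ('linear_model', LinearRegression())],
                           memory=cache_dir)
    # Only the final step varies in this grid, so the fitted transformers
    # can be reused from the cache across candidates on the same fold.
    # n_jobs=1 keeps the sketch portable; a larger value (or -1) spreads
    # the candidate fits over several cores.
    cached_search = GridSearchCV(cached_pipe,
                                 param_grid={'linear_model__fit_intercept': [True, False]},
                                 cv=3, n_jobs=1)
    cached_search.fit(X_demo, y_demo)
    cached_best = cached_search.best_params_
finally:
    shutil.rmtree(cache_dir)  # remove the cache folder when done
```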

Unfortunately, the current implementation of scikit-learn's Pipeline:

• Does not allow postprocessors after the final model
• Does not allow extracting information about intermediate results
• Transforms X at every step, so a step cannot access values of X other than those produced by the immediately preceding step
• Only allows single path workflows
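
The first limitation can be reproduced with a short sketch. Here a scaler is placed after a regressor with the intention of postprocessing its output. The data and step names are assumptions of this example. Pipeline rejects the arrangement because every non-final step must be a transformer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X_demo = rng.rand(20, 1)
y_demo = 2 * X_demo[:, 0] + 0.1 * rng.randn(20)

# A regressor placed *before* a scaler: the scaler is meant to act as a
# postprocessor, but Pipeline only accepts transformers in non-final positions.
steps = [('linear_model', LinearRegression()),
         ('postprocessor', MinMaxScaler())]

error_message = None
try:
    # Depending on the scikit-learn version the check runs at construction
    # time or at fit time, so both calls sit inside the try block.
    Pipeline(steps=steps).fit(X_demo, y_demo)
except TypeError as error:
    error_message = str(error)

print(error_message)
```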

### pipegraph goals:

pipegraph was programmed in order to allow researchers and practitioners to:

• Use multiple path workflows
• Have access to every variable value produced by any step of the workflow
• Use an arbitrary number of models and transformers in the way the user prefers
• Express the model as a graph consisting of transformers, regressors, classifiers or custom blocks
• Build new custom blocks in an easy way
• Provide the community with some adapters to scikit-learn's objects that may help further developments
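
To make the idea of a multiple path workflow concrete, here is a toy sketch in plain Python. It is not pipegraph's API: the node dictionary, the `run_graph` helper and the three small functions are hypothetical names invented for this illustration. Two steps read the same input, a third combines them, and every intermediate value stays accessible afterwards.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def scale(x):
    return [v / max(x) for v in x]


def square(x):
    return [v * v for v in x]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


# Hypothetical toy workflow: each node maps to (function, names of its inputs).
nodes = {
    'scaled': (scale, ['x']),
    'squared': (square, ['x']),               # second path from the same input
    'combined': (add, ['scaled', 'squared']),
}


def run_graph(nodes, inputs):
    """Evaluate every node once its inputs are available; keep all results."""
    results = dict(inputs)
    # Only dependencies on other nodes become edges; external inputs are
    # already present in results.
    dependencies = {name: [d for d in deps if d in nodes]
                    for name, (_, deps) in nodes.items()}
    for name in TopologicalSorter(dependencies).static_order():
        function, deps = nodes[name]
        results[name] = function(*(results[d] for d in deps))
    return results


results = run_graph(nodes, {'x': [1.0, 2.0, 4.0]})
```

After the run, `results` holds the input together with the output of every node, which is the kind of access to intermediate values that a linear chain does not offer.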

## Adapting scikit-learn objects to provide a common interface

pipegraph expresses models as a graph composed of nodes, each holding a step. In each of these steps, the user might want to use a scikit-learn object or a custom process. Doing so raises a few difficulties, which we explain below.

### An interface problem

Consider the following common scikit-learn objects:

In [54]:
import sklearn
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import DBSCAN

classifier = GaussianNB()
scaler = MinMaxScaler()
dbscanner = DBSCAN()


And let's load some data to run the examples. For brevity we will apply the three objects to the same data.

In [55]:
from sklearn.datasets import load_iris

# Load the iris data set before unpacking features and targets
iris = load_iris()
X, y = iris.data, iris.target


Now, let's fit each of the above defined scikit-learn objects and get the output produced afterwards by using the corresponding method (predict, fit_predict, transform):

In [56]:
classifier.fit(X, y)
scaler.fit(X);
dbscanner.fit(X, y)

Out[56]:
DBSCAN(algorithm='auto', eps=0.5, leaf_size=30, metric='euclidean',
metric_params=None, min_samples=5, n_jobs=1, p=None)
In [57]:
classifier.predict(X)

Out[57]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
In [58]:
classifier.predict_proba(X)

Out[58]:
array([[  1.00000000e+000,   1.38496103e-018,   7.25489025e-026],
[  1.00000000e+000,   1.48206242e-017,   2.29743996e-025],
[  1.00000000e+000,   1.07780639e-018,   2.35065917e-026],
[  1.00000000e+000,   1.43871443e-017,   2.89954283e-025],
[  1.00000000e+000,   4.65192224e-019,   2.95961100e-026],
[  1.00000000e+000,   1.52598944e-014,   1.79883402e-021],
[  1.00000000e+000,   1.13555084e-017,   2.79240943e-025],
[  1.00000000e+000,   6.57615274e-018,   2.79021029e-025],
[  1.00000000e+000,   9.12219356e-018,   1.16607332e-025],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   4.48944985e-018,   5.19388089e-025],
[  1.00000000e+000,   1.65734172e-017,   7.24605453e-025],
[  1.00000000e+000,   1.19023891e-018,   3.06690017e-026],
[  1.00000000e+000,   7.39520546e-020,   1.77972179e-027],
[  1.00000000e+000,   2.58242749e-019,   8.73399972e-026],
[  1.00000000e+000,   3.17746623e-017,   1.73684833e-023],
[  1.00000000e+000,   5.70113578e-017,   4.84010372e-024],
[  1.00000000e+000,   2.42054769e-017,   8.45556661e-025],
[  1.00000000e+000,   6.27645419e-015,   1.06276762e-021],
[  1.00000000e+000,   8.94493797e-018,   7.10691894e-025],
[  1.00000000e+000,   1.12843548e-015,   7.60807373e-023],
[  1.00000000e+000,   6.39726172e-016,   2.98066089e-023],
[  1.00000000e+000,   2.01227309e-020,   1.00676223e-027],
[  1.00000000e+000,   1.88370574e-011,   3.47694606e-019],
[  1.00000000e+000,   9.85315738e-015,   6.06138600e-022],
[  1.00000000e+000,   3.37823264e-016,   6.39532840e-024],
[  1.00000000e+000,   1.76045187e-014,   4.11462407e-022],
[  1.00000000e+000,   7.35980232e-018,   4.42389485e-025],
[  1.00000000e+000,   4.16674318e-018,   1.83083484e-025],
[  1.00000000e+000,   4.59768498e-017,   1.25839903e-024],
[  1.00000000e+000,   1.05032415e-016,   2.32677467e-024],
[  1.00000000e+000,   2.19590125e-014,   6.17650711e-022],
[  1.00000000e+000,   6.53087316e-021,   3.11887725e-027],
[  1.00000000e+000,   3.19701924e-020,   1.42881733e-026],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   1.31355747e-018,   2.91614269e-026],
[  1.00000000e+000,   3.69675482e-018,   2.51866027e-025],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   2.08944813e-018,   3.09410939e-026],
[  1.00000000e+000,   9.57268514e-018,   4.26475768e-025],
[  1.00000000e+000,   6.37746927e-018,   1.99216264e-025],
[  1.00000000e+000,   7.48755609e-016,   1.85220582e-024],
[  1.00000000e+000,   6.74316102e-019,   1.54533175e-026],
[  1.00000000e+000,   6.24456357e-011,   1.54295833e-018],
[  1.00000000e+000,   8.14548341e-013,   7.52199540e-020],
[  1.00000000e+000,   1.94244394e-016,   1.96296487e-024],
[  1.00000000e+000,   2.39642309e-018,   3.11909164e-025],
[  1.00000000e+000,   2.30047669e-018,   5.36192288e-026],
[  1.00000000e+000,   2.70414239e-018,   2.86492790e-025],
[  1.00000000e+000,   3.60099614e-018,   1.12304319e-025],
[  1.87127931e-108,   8.04037666e-001,   1.95962334e-001],
[  6.18854779e-101,   9.45169639e-001,   5.48303606e-002],
[  1.52821825e-122,   4.56151317e-001,   5.43848683e-001],
[  2.14997261e-070,   9.99968751e-001,   3.12488556e-005],
[  9.04938222e-107,   9.52441811e-001,   4.75581888e-002],
[  1.29272979e-090,   9.99119627e-001,   8.80372565e-004],
[  2.72490532e-114,   6.58952285e-001,   3.41047715e-001],
[  1.19734767e-034,   9.99999767e-001,   2.33206910e-007],
[  3.02545627e-098,   9.90316309e-001,   9.68369084e-003],
[  1.29477666e-069,   9.99909746e-001,   9.02536083e-005],
[  2.68173680e-041,   9.99999765e-001,   2.35068227e-007],
[  7.51115851e-087,   9.96238286e-001,   3.76171369e-003],
[  6.40165546e-061,   9.99993984e-001,   6.01632484e-006],
[  4.81814146e-105,   9.86090825e-001,   1.39091754e-002],
[  1.72509107e-055,   9.99975387e-001,   2.46126406e-005],
[  1.18941242e-093,   9.80037003e-001,   1.99629974e-002],
[  1.18009940e-098,   9.90687273e-001,   9.31272734e-003],
[  2.31534504e-063,   9.99983663e-001,   1.63372809e-005],
[  5.48394976e-102,   9.94697108e-001,   5.30289217e-003],
[  5.51699136e-059,   9.99993006e-001,   6.99364316e-006],
[  7.43572418e-129,   1.54494085e-001,   8.45505915e-001],
[  2.12417952e-071,   9.99807026e-001,   1.92973847e-004],
[  1.06622383e-120,   9.27077052e-001,   7.29229479e-002],
[  4.79428037e-097,   9.98156519e-001,   1.84348055e-003],
[  2.71707817e-084,   9.98460816e-001,   1.53918416e-003],
[  2.03176962e-093,   9.87471082e-001,   1.25289184e-002],
[  4.95012220e-113,   9.12844444e-001,   8.71555561e-002],
[  2.12531216e-137,   7.52691316e-002,   9.24730868e-001],
[  4.19702663e-100,   9.86480268e-001,   1.35197316e-002],
[  4.63173354e-042,   9.99998762e-001,   1.23794211e-006],
[  2.77274013e-055,   9.99996447e-001,   3.55251831e-006],
[  2.14091116e-048,   9.99998651e-001,   1.34923924e-006],
[  6.63563094e-063,   9.99972348e-001,   2.76523927e-005],
[  2.61124821e-134,   6.12159845e-001,   3.87840155e-001],
[  3.71647418e-098,   9.92476638e-001,   7.52336224e-003],
[  1.13230275e-103,   8.76107551e-001,   1.23892449e-001],
[  1.05786721e-111,   7.99294752e-001,   2.00705248e-001],
[  3.76539608e-089,   9.99385417e-001,   6.14582528e-004],
[  3.07894878e-073,   9.99796270e-001,   2.03730114e-004],
[  4.17712661e-070,   9.99955234e-001,   4.47664632e-005],
[  3.92710689e-082,   9.99873680e-001,   1.26320322e-004],
[  3.30872742e-100,   9.89371467e-001,   1.06285328e-002],
[  8.31545615e-067,   9.99966229e-001,   3.37713204e-005],
[  6.26912483e-035,   9.99999798e-001,   2.02487922e-007],
[  7.66367658e-078,   9.99832329e-001,   1.67671378e-004],
[  1.58557717e-073,   9.99849875e-001,   1.50125137e-004],
[  1.02662082e-077,   9.99714947e-001,   2.85053350e-004],
[  1.72307593e-083,   9.98992363e-001,   1.00763708e-003],
[  4.12872931e-030,   9.99999769e-001,   2.31316897e-007],
[  5.99667528e-074,   9.99847160e-001,   1.52839987e-004],
[  4.13779546e-251,   6.35381030e-011,   1.00000000e+000],
[  5.00845630e-151,   2.50121636e-002,   9.74987836e-001],
[  1.04941686e-218,   1.67915381e-007,   9.99999832e-001],
[  2.13833836e-175,   1.99462374e-003,   9.98005376e-001],
[  7.20399720e-216,   2.30543407e-007,   9.99999769e-001],
[  4.51654712e-271,   2.40976994e-010,   1.00000000e+000],
[  4.59552511e-108,   9.73514345e-001,   2.64856553e-002],
[  2.22191497e-227,   1.34018147e-006,   9.99998660e-001],
[  2.10589122e-190,   4.92901785e-004,   9.99507098e-001],
[  1.20055778e-262,   1.40568402e-012,   1.00000000e+000],
[  3.62359789e-160,   4.12884115e-004,   9.99587116e-001],
[  8.83719953e-165,   2.77742178e-003,   9.97222578e-001],
[  6.87376950e-192,   4.80711862e-006,   9.99995193e-001],
[  4.08220498e-152,   1.28807070e-002,   9.87119293e-001],
[  2.75153031e-187,   1.04253685e-006,   9.99998957e-001],
[  1.44750671e-192,   4.50951786e-007,   9.99999549e-001],
[  2.76680341e-170,   1.87196580e-003,   9.98128034e-001],
[  3.75302289e-285,   1.64574932e-012,   1.00000000e+000],
[  4.69548986e-310,   6.47406861e-013,   1.00000000e+000],
[  5.69697725e-125,   9.58135362e-001,   4.18646381e-002],
[  2.94299535e-219,   1.17116897e-008,   9.99999988e-001],
[  2.82525894e-146,   1.37625971e-002,   9.86237403e-001],
[  1.12237933e-272,   7.58240410e-010,   9.99999999e-001],
[  2.28867567e-136,   1.29986728e-001,   8.70013272e-001],
[  5.61795825e-203,   9.71777952e-007,   9.99999028e-001],
[  8.72622664e-206,   7.39901993e-006,   9.99992601e-001],
[  9.96933448e-131,   1.99928220e-001,   8.00071780e-001],
[  4.66749613e-135,   1.07483532e-001,   8.92516468e-001],
[  6.88743059e-196,   1.09467814e-005,   9.99989053e-001],
[  1.61337601e-181,   7.01805717e-004,   9.99298194e-001],
[  8.55580252e-221,   9.06238440e-007,   9.99999094e-001],
[  1.24722670e-250,   2.96730515e-010,   1.00000000e+000],
[  3.64874362e-203,   1.50788229e-006,   9.99998492e-001],
[  2.19798649e-130,   7.12645144e-001,   2.87354856e-001],
[  3.68949024e-153,   4.86199285e-001,   5.13800715e-001],
[  2.13595212e-251,   8.98578645e-011,   1.00000000e+000],
[  2.75337356e-217,   6.58848819e-009,   9.99999993e-001],
[  1.30868299e-169,   1.90227600e-003,   9.98097724e-001],
[  1.38946382e-129,   1.93183856e-001,   8.06816144e-001],
[  1.71830037e-186,   5.23126458e-006,   9.99994769e-001],
[  5.79667973e-220,   5.01446575e-009,   9.99999995e-001],
[  1.61093140e-184,   4.67798053e-007,   9.99999532e-001],
[  5.00845630e-151,   2.50121636e-002,   9.74987836e-001],
[  2.54029381e-231,   4.42556022e-009,   9.99999996e-001],
[  4.84219075e-234,   1.64602693e-010,   1.00000000e+000],
[  6.47732320e-189,   5.85507961e-007,   9.99999414e-001],
[  5.17352411e-148,   2.54457623e-002,   9.74554238e-001],
[  5.93498263e-166,   3.70166861e-004,   9.99629833e-001],
[  5.58649523e-197,   2.46020434e-007,   9.99999754e-001],
[  9.13863414e-145,   5.60050091e-002,   9.43994991e-001]])
In [59]:
classifier.predict_log_proba(X)

Out[59]:
array([[  0.00000000e+00,  -4.11208597e+01,  -5.78855367e+01],
[  0.00000000e+00,  -3.87505119e+01,  -5.67328319e+01],
[  0.00000000e+00,  -4.13716038e+01,  -5.90125166e+01],
[  0.00000000e+00,  -3.87801966e+01,  -5.65000742e+01],
[  0.00000000e+00,  -4.22118362e+01,  -5.87821546e+01],
[ -1.50990331e-14,  -3.18135483e+01,  -4.77671483e+01],
[  0.00000000e+00,  -3.90168287e+01,  -5.65377225e+01],
[  0.00000000e+00,  -3.95630818e+01,  -5.65385104e+01],
[  0.00000000e+00,  -3.92358214e+01,  -5.74109854e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -3.99448015e+01,  -5.59171461e+01],
[  0.00000000e+00,  -3.86387316e+01,  -5.55841702e+01],
[  0.00000000e+00,  -4.12723776e+01,  -5.87465451e+01],
[  0.00000000e+00,  -4.40508700e+01,  -6.15933405e+01],
[  0.00000000e+00,  -4.28003869e+01,  -5.76999890e+01],
[  0.00000000e+00,  -3.79878625e+01,  -5.24073850e+01],
[  0.00000000e+00,  -3.74032812e+01,  -5.36851061e+01],
[  0.00000000e+00,  -3.82599527e+01,  -5.54298023e+01],
[ -6.21724894e-15,  -3.27019712e+01,  -4.82934105e+01],
[  0.00000000e+00,  -3.92554439e+01,  -5.56035585e+01],
[ -1.11022302e-15,  -3.44179443e+01,  -5.09302471e+01],
[ -6.66133815e-16,  -3.49854914e+01,  -5.18673121e+01],
[  0.00000000e+00,  -4.53524369e+01,  -6.21630580e+01],
[ -1.88369320e-11,  -2.46951950e+01,  -4.25029624e+01],
[ -9.76996262e-15,  -3.22509844e+01,  -4.88549336e+01],
[ -4.44089210e-16,  -3.56240088e+01,  -5.34064744e+01],
[ -1.75415238e-14,  -3.16706208e+01,  -4.92423246e+01],
[  0.00000000e+00,  -3.94504986e+01,  -5.60776068e+01],
[  0.00000000e+00,  -4.00193970e+01,  -5.69598553e+01],
[  0.00000000e+00,  -3.76183937e+01,  -5.50322019e+01],
[  0.00000000e+00,  -3.67922627e+01,  -5.44175592e+01],
[ -2.19824159e-14,  -3.14495987e+01,  -4.88361191e+01],
[  0.00000000e+00,  -4.64777463e+01,  -6.10323244e+01],
[  0.00000000e+00,  -4.48894830e+01,  -5.95103654e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -4.11737926e+01,  -5.87969507e+01],
[  0.00000000e+00,  -4.01390763e+01,  -5.66409002e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -4.07096317e+01,  -5.87377123e+01],
[  0.00000000e+00,  -3.91876179e+01,  -5.61142420e+01],
[  0.00000000e+00,  -3.95937603e+01,  -5.68754065e+01],
[ -8.88178420e-16,  -3.48281190e+01,  -5.46456650e+01],
[  0.00000000e+00,  -4.18405880e+01,  -5.94319738e+01],
[ -6.24451602e-11,  -2.34967248e+01,  -4.10128301e+01],
[ -8.14459611e-13,  -2.78361426e+01,  -4.40338704e+01],
[ -2.22044605e-16,  -3.61774145e+01,  -5.45875862e+01],
[  0.00000000e+00,  -4.05725544e+01,  -5.64270855e+01],
[  0.00000000e+00,  -4.06134153e+01,  -5.81878898e+01],
[  0.00000000e+00,  -4.04517469e+01,  -5.65120841e+01],
[  0.00000000e+00,  -4.01653212e+01,  -5.74485852e+01],
[ -2.48052568e+02,  -2.18109163e-01,  -1.62983281e+00],
[ -2.30738394e+02,  -5.63908550e-02,  -2.90351121e+00],
[ -2.80491279e+02,  -7.84930690e-01,  -6.09084226e-01],
[ -1.60415501e+02,  -3.12493439e-05,  -1.03735278e+01],
[ -2.44173908e+02,  -4.87262645e-02,  -3.04580129e+00],
[ -2.06975902e+02,  -8.80760321e-04,  -7.03516537e+00],
[ -2.61492267e+02,  -4.17104153e-01,  -1.07573288e+00],
[ -7.81077843e+01,  -2.33206936e-07,  -1.52713398e+01],
[ -2.24546277e+02,  -9.73088268e-03,  -4.63731216e+00],
[ -1.58620033e+02,  -9.02576814e-05,  -9.31288698e+00],
[ -9.34195242e+01,  -2.35068255e-07,  -1.52633900e+01],
[ -1.98308513e+02,  -3.76880673e-03,  -5.58288066e+00],
[ -1.38601134e+02,  -6.01634294e-06,  -1.20210340e+01],
[ -2.40199046e+02,  -1.40068145e-02,  -4.27520655e+00],
[ -1.26096900e+02,  -2.46129435e-05,  -1.06122504e+01],
[ -2.13966954e+02,  -2.01649503e-02,  -3.91387485e+00],
[ -2.25487740e+02,  -9.35636191e-03,  -4.67637328e+00],
[ -1.44223302e+02,  -1.63374144e-05,  -1.10220609e+01],
[ -2.33161854e+02,  -5.31700240e-03,  -5.23950292e+00],
[ -1.34144688e+02,  -6.99366761e-06,  -1.18705089e+01],
[ -2.95027181e+02,  -1.86759947e+00,  -1.67820115e-01],
[ -1.62730156e+02,  -1.92992469e-04,  -8.55295589e+00],
[ -2.76246088e+02,  -7.57185971e-02,  -2.61835190e+00],
[ -2.21783330e+02,  -1.84518185e-03,  -6.29609989e+00],
[ -1.92417591e+02,  -1.54036992e-03,  -6.47650277e+00],
[ -2.13431507e+02,  -1.26080671e-02,  -4.37971584e+00],
[ -2.58592703e+02,  -9.11897920e-02,  -2.44006076e+00],
[ -3.14700239e+02,  -2.58668517e+00,  -7.82525369e-02],
[ -2.28824133e+02,  -1.36119554e-02,  -4.30360506e+00],
[ -9.51756427e+01,  -1.23794287e-06,  -1.36020601e+01],
[ -1.25622344e+02,  -3.55252462e-06,  -1.25478538e+01],
[ -1.09762853e+02,  -1.34924015e-06,  -1.35159697e+01],
[ -1.43170407e+02,  -2.76527750e-05,  -1.04957983e+01],
[ -3.07586574e+02,  -4.90761846e-01,  -9.47161995e-01],
[ -2.24340564e+02,  -7.55180548e-03,  -4.88974213e+00],
[ -2.37042011e+02,  -1.32266420e-01,  -2.08834144e+00],
[ -2.55530691e+02,  -2.24025500e-01,  -1.60591787e+00],
[ -2.03604220e+02,  -6.14771461e-04,  -7.39456734e+00],
[ -1.66964124e+02,  -2.03750870e-04,  -8.49871441e+00],
[ -1.59751333e+02,  -4.47674653e-05,  -1.00140513e+01],
[ -1.87444075e+02,  -1.26328301e-04,  -8.97668964e+00],
[ -2.29061946e+02,  -1.06854191e-02,  -4.54421312e+00],
[ -1.52155085e+02,  -3.37718907e-05,  -1.02958986e+01],
[ -7.87548415e+01,  -2.02487942e-07,  -1.54125856e+01],
[ -1.77565145e+02,  -1.67685437e-04,  -8.69350458e+00],
[ -1.67627763e+02,  -1.50136407e-04,  -8.80404136e+00],
[ -1.77272780e+02,  -2.85093986e-04,  -8.16283420e+00],
[ -1.90570452e+02,  -1.00814508e-03,  -6.90014722e+00],
[ -6.76595831e+01,  -2.31316924e-07,  -1.52794772e+01],
[ -1.68600092e+02,  -1.52851668e-04,  -8.78611902e+00],
[ -5.76528695e+02,  -2.34793813e+01,  -6.35380637e-11],
[ -3.46079221e+02,  -3.68839303e+00,  -2.53302835e-02],
[ -5.01915316e+02,  -1.55998057e+01,  -1.67915395e-07],
[ -4.02192362e+02,  -6.21729985e+00,  -1.99661565e-03],
[ -4.95383744e+02,  -1.52828267e+01,  -2.30543434e-07],
[ -6.22492812e+02,  -2.21463196e+01,  -2.40977016e-10],
[ -2.47154107e+02,  -2.68427191e-02,  -3.63115200e+00],
[ -5.21888447e+02,  -1.35227055e+01,  -1.34018237e-06],
[ -4.36746429e+02,  -7.61520062e+00,  -4.93023301e-04],
[ -6.03094508e+02,  -2.72904971e+01,  -1.40598644e-12],
[ -3.67126147e+02,  -7.79234360e+00,  -4.12969376e-04],
[ -3.77747570e+02,  -5.88623220e+00,  -2.78128597e-03],
[ -4.40168625e+02,  -1.22454127e+01,  -4.80713017e-06],
[ -3.48586297e+02,  -4.35202467e+00,  -1.29643826e-02],
[ -4.29571255e+02,  -1.37738535e+01,  -1.04253739e-06],
[ -4.41726495e+02,  -1.46119054e+01,  -4.50951887e-07],
[ -3.90421773e+02,  -6.28076617e+00,  -1.87372012e-03],
[ -6.54914190e+02,  -2.71328253e+01,  -1.64490643e-12],
[ -7.12254776e+02,  -2.80658015e+01,  -6.47482068e-13],
[ -2.86083201e+02,  -4.27662147e-02,  -3.17331377e+00],
[ -5.03186707e+02,  -1.82626784e+01,  -1.17116898e-08],
[ -3.35138824e+02,  -4.28580072e+00,  -1.38581796e-02],
[ -6.26187694e+02,  -2.10000206e+01,  -7.58240581e-10],
[ -3.12323599e+02,  -2.04032292e+00,  -1.39246813e-01],
[ -4.65698806e+02,  -1.38441385e+01,  -9.71778424e-07],
[ -4.72166196e+02,  -1.18141630e+01,  -7.39904730e-06],
[ -2.99339133e+02,  -1.60979688e+00,  -2.23053830e-01],
[ -3.09308365e+02,  -2.23041763e+00,  -1.13710314e-01],
[ -4.49376980e+02,  -1.14224651e+01,  -1.09468413e-05],
[ -4.16289573e+02,  -7.26185395e+00,  -7.02052098e-04],
[ -5.06724696e+02,  -1.39139634e+01,  -9.06238851e-07],
[ -5.75425351e+02,  -2.19381967e+01,  -2.96730640e-10],
[ -4.66130391e+02,  -1.34048044e+01,  -1.50788342e-06],
[ -2.98548520e+02,  -3.38771676e-01,  -1.24703740e+00],
[ -3.50990031e+02,  -7.21136687e-01,  -6.65919804e-01],
[ -5.77189946e+02,  -2.31327920e+01,  -8.98578989e-11],
[ -4.98648138e+02,  -1.88379419e+01,  -6.58848798e-09],
[ -3.88867859e+02,  -6.26470421e+00,  -1.90408763e-03],
[ -2.96704559e+02,  -1.64411292e+00,  -2.14659463e-01],
[ -4.27739492e+02,  -1.21608575e+01,  -5.23127826e-06],
[ -5.04811435e+02,  -1.91109390e+01,  -5.01446573e-09],
[ -4.23198845e+02,  -1.45752291e+01,  -4.67798162e-07],
[ -3.46079221e+02,  -3.68839303e+00,  -2.53302835e-02],
[ -5.30964877e+02,  -1.92358690e+01,  -4.42555992e-09],
[ -5.37227545e+02,  -2.25274865e+01,  -1.64602554e-10],
[ -4.33320275e+02,  -1.43507861e+01,  -5.85508133e-07],
[ -3.39139040e+02,  -3.67120606e+00,  -2.57751046e-02],
[ -3.80448261e+02,  -7.90155668e+00,  -3.70235390e-04],
[ -4.51888911e+02,  -1.52178512e+01,  -2.46020464e-07],
[ -3.31662328e+02,  -2.88231414e+00,  -5.76344191e-02]])
In [60]:
scaler.transform(X)

Out[60]:
array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
[ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
[ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
[ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
[ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
[ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
[ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
[ 0.13888889,  0.41666667,  0.06779661,  0.        ],
[ 0.        ,  0.41666667,  0.01694915,  0.        ],
[ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
[ 0.38888889,  1.        ,  0.08474576,  0.125     ],
[ 0.30555556,  0.79166667,  0.05084746,  0.125     ],
[ 0.22222222,  0.625     ,  0.06779661,  0.08333333],
[ 0.38888889,  0.75      ,  0.11864407,  0.08333333],
[ 0.22222222,  0.75      ,  0.08474576,  0.08333333],
[ 0.30555556,  0.58333333,  0.11864407,  0.04166667],
[ 0.22222222,  0.70833333,  0.08474576,  0.125     ],
[ 0.08333333,  0.66666667,  0.        ,  0.04166667],
[ 0.22222222,  0.54166667,  0.11864407,  0.16666667],
[ 0.13888889,  0.58333333,  0.15254237,  0.04166667],
[ 0.19444444,  0.41666667,  0.10169492,  0.04166667],
[ 0.19444444,  0.58333333,  0.10169492,  0.125     ],
[ 0.25      ,  0.625     ,  0.08474576,  0.04166667],
[ 0.25      ,  0.58333333,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.10169492,  0.04166667],
[ 0.13888889,  0.45833333,  0.10169492,  0.04166667],
[ 0.30555556,  0.58333333,  0.08474576,  0.125     ],
[ 0.25      ,  0.875     ,  0.08474576,  0.        ],
[ 0.33333333,  0.91666667,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.19444444,  0.5       ,  0.03389831,  0.04166667],
[ 0.33333333,  0.625     ,  0.05084746,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.02777778,  0.41666667,  0.05084746,  0.04166667],
[ 0.22222222,  0.58333333,  0.08474576,  0.04166667],
[ 0.19444444,  0.625     ,  0.05084746,  0.08333333],
[ 0.05555556,  0.125     ,  0.05084746,  0.08333333],
[ 0.02777778,  0.5       ,  0.05084746,  0.04166667],
[ 0.19444444,  0.625     ,  0.10169492,  0.20833333],
[ 0.22222222,  0.75      ,  0.15254237,  0.125     ],
[ 0.13888889,  0.41666667,  0.06779661,  0.08333333],
[ 0.22222222,  0.75      ,  0.10169492,  0.04166667],
[ 0.08333333,  0.5       ,  0.06779661,  0.04166667],
[ 0.27777778,  0.70833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.54166667,  0.06779661,  0.04166667],
[ 0.75      ,  0.5       ,  0.62711864,  0.54166667],
[ 0.58333333,  0.5       ,  0.59322034,  0.58333333],
[ 0.72222222,  0.45833333,  0.66101695,  0.58333333],
[ 0.33333333,  0.125     ,  0.50847458,  0.5       ],
[ 0.61111111,  0.33333333,  0.61016949,  0.58333333],
[ 0.38888889,  0.33333333,  0.59322034,  0.5       ],
[ 0.55555556,  0.54166667,  0.62711864,  0.625     ],
[ 0.16666667,  0.16666667,  0.38983051,  0.375     ],
[ 0.63888889,  0.375     ,  0.61016949,  0.5       ],
[ 0.25      ,  0.29166667,  0.49152542,  0.54166667],
[ 0.19444444,  0.        ,  0.42372881,  0.375     ],
[ 0.44444444,  0.41666667,  0.54237288,  0.58333333],
[ 0.47222222,  0.08333333,  0.50847458,  0.375     ],
[ 0.5       ,  0.375     ,  0.62711864,  0.54166667],
[ 0.36111111,  0.375     ,  0.44067797,  0.5       ],
[ 0.66666667,  0.45833333,  0.57627119,  0.54166667],
[ 0.36111111,  0.41666667,  0.59322034,  0.58333333],
[ 0.41666667,  0.29166667,  0.52542373,  0.375     ],
[ 0.52777778,  0.08333333,  0.59322034,  0.58333333],
[ 0.36111111,  0.20833333,  0.49152542,  0.41666667],
[ 0.44444444,  0.5       ,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.33333333,  0.50847458,  0.5       ],
[ 0.55555556,  0.20833333,  0.66101695,  0.58333333],
[ 0.5       ,  0.33333333,  0.62711864,  0.45833333],
[ 0.58333333,  0.375     ,  0.55932203,  0.5       ],
[ 0.63888889,  0.41666667,  0.57627119,  0.54166667],
[ 0.69444444,  0.33333333,  0.6440678 ,  0.54166667],
[ 0.66666667,  0.41666667,  0.6779661 ,  0.66666667],
[ 0.47222222,  0.375     ,  0.59322034,  0.58333333],
[ 0.38888889,  0.25      ,  0.42372881,  0.375     ],
[ 0.33333333,  0.16666667,  0.47457627,  0.41666667],
[ 0.33333333,  0.16666667,  0.45762712,  0.375     ],
[ 0.41666667,  0.29166667,  0.49152542,  0.45833333],
[ 0.47222222,  0.29166667,  0.69491525,  0.625     ],
[ 0.30555556,  0.41666667,  0.59322034,  0.58333333],
[ 0.47222222,  0.58333333,  0.59322034,  0.625     ],
[ 0.66666667,  0.45833333,  0.62711864,  0.58333333],
[ 0.55555556,  0.125     ,  0.57627119,  0.5       ],
[ 0.36111111,  0.41666667,  0.52542373,  0.5       ],
[ 0.33333333,  0.20833333,  0.50847458,  0.5       ],
[ 0.33333333,  0.25      ,  0.57627119,  0.45833333],
[ 0.5       ,  0.41666667,  0.61016949,  0.54166667],
[ 0.41666667,  0.25      ,  0.50847458,  0.45833333],
[ 0.19444444,  0.125     ,  0.38983051,  0.375     ],
[ 0.36111111,  0.29166667,  0.54237288,  0.5       ],
[ 0.38888889,  0.41666667,  0.54237288,  0.45833333],
[ 0.38888889,  0.375     ,  0.54237288,  0.5       ],
[ 0.52777778,  0.375     ,  0.55932203,  0.5       ],
[ 0.22222222,  0.20833333,  0.33898305,  0.41666667],
[ 0.38888889,  0.33333333,  0.52542373,  0.5       ],
[ 0.55555556,  0.54166667,  0.84745763,  1.        ],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.77777778,  0.41666667,  0.83050847,  0.83333333],
[ 0.55555556,  0.375     ,  0.77966102,  0.70833333],
[ 0.61111111,  0.41666667,  0.81355932,  0.875     ],
[ 0.91666667,  0.41666667,  0.94915254,  0.83333333],
[ 0.16666667,  0.20833333,  0.59322034,  0.66666667],
[ 0.83333333,  0.375     ,  0.89830508,  0.70833333],
[ 0.66666667,  0.20833333,  0.81355932,  0.70833333],
[ 0.80555556,  0.66666667,  0.86440678,  1.        ],
[ 0.61111111,  0.5       ,  0.69491525,  0.79166667],
[ 0.58333333,  0.29166667,  0.72881356,  0.75      ],
[ 0.69444444,  0.41666667,  0.76271186,  0.83333333],
[ 0.38888889,  0.20833333,  0.6779661 ,  0.79166667],
[ 0.41666667,  0.33333333,  0.69491525,  0.95833333],
[ 0.58333333,  0.5       ,  0.72881356,  0.91666667],
[ 0.61111111,  0.41666667,  0.76271186,  0.70833333],
[ 0.94444444,  0.75      ,  0.96610169,  0.875     ],
[ 0.94444444,  0.25      ,  1.        ,  0.91666667],
[ 0.47222222,  0.08333333,  0.6779661 ,  0.58333333],
[ 0.72222222,  0.5       ,  0.79661017,  0.91666667],
[ 0.36111111,  0.33333333,  0.66101695,  0.79166667],
[ 0.94444444,  0.33333333,  0.96610169,  0.79166667],
[ 0.55555556,  0.29166667,  0.66101695,  0.70833333],
[ 0.66666667,  0.54166667,  0.79661017,  0.83333333],
[ 0.80555556,  0.5       ,  0.84745763,  0.70833333],
[ 0.52777778,  0.33333333,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.41666667,  0.66101695,  0.70833333],
[ 0.58333333,  0.33333333,  0.77966102,  0.83333333],
[ 0.80555556,  0.41666667,  0.81355932,  0.625     ],
[ 0.86111111,  0.33333333,  0.86440678,  0.75      ],
[ 1.        ,  0.75      ,  0.91525424,  0.79166667],
[ 0.58333333,  0.33333333,  0.77966102,  0.875     ],
[ 0.55555556,  0.33333333,  0.69491525,  0.58333333],
[ 0.5       ,  0.25      ,  0.77966102,  0.54166667],
[ 0.94444444,  0.41666667,  0.86440678,  0.91666667],
[ 0.55555556,  0.58333333,  0.77966102,  0.95833333],
[ 0.58333333,  0.45833333,  0.76271186,  0.70833333],
[ 0.47222222,  0.41666667,  0.6440678 ,  0.70833333],
[ 0.72222222,  0.45833333,  0.74576271,  0.83333333],
[ 0.66666667,  0.45833333,  0.77966102,  0.95833333],
[ 0.72222222,  0.45833333,  0.69491525,  0.91666667],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.69444444,  0.5       ,  0.83050847,  0.91666667],
[ 0.66666667,  0.54166667,  0.79661017,  1.        ],
[ 0.66666667,  0.41666667,  0.71186441,  0.91666667],
[ 0.55555556,  0.20833333,  0.6779661 ,  0.75      ],
[ 0.61111111,  0.41666667,  0.71186441,  0.79166667],
[ 0.52777778,  0.58333333,  0.74576271,  0.91666667],
[ 0.44444444,  0.41666667,  0.69491525,  0.70833333]])
In [61]:
dbscanner.fit_predict(X)

Out[61]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  1,
1,  1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,
-1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1,  1,
1,  1,  1, -1, -1,  1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1,
1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1], dtype=int64)

As can be seen, accessing each object's output requires calling a different method.
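
Without a common interface, code that handles these objects generically has to probe for the method each one offers. A minimal sketch of such a dispatch follows. The `get_output` helper and the three toy classes are hypothetical stand-ins written for this example, not scikit-learn or pipegraph code.

```python
def get_output(fitted_object, X):
    """Return (method name, output) using whichever output method the object offers."""
    # Order matters: prefer predict, then transform, then fit_predict.
    # Note that fit_predict refits the object as a side effect.
    for method_name in ('predict', 'transform', 'fit_predict'):
        method = getattr(fitted_object, method_name, None)
        if callable(method):
            return method_name, method(X)
    raise AttributeError('object offers none of predict, transform, fit_predict')


# Toy stand-ins (hypothetical) mimicking the three kinds of interface.
class ToyClassifier:
    def predict(self, X):
        return [int(v > 0) for v in X]


class ToyScaler:
    def transform(self, X):
        return [v / 10 for v in X]


class ToyClusterer:
    def fit_predict(self, X):
        return [0 for _ in X]


outputs = [get_output(obj, [-5, 5])
           for obj in (ToyClassifier(), ToyScaler(), ToyClusterer())]
```

Each caller would need a helper like this one, which is the repetition that a shared adapter layer is meant to remove.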

### Adapters to provide a homogeneous interface to scikit-learn objects

To offer a homogeneous interface, PipeGraph provides a collection of adapters. They all derive from the AdapterForSkLearnLikeAdaptee base class. This class, which is not meant to be used directly but through its subclasses, describes the interface of the adapters for scikit-learn objects: a common protocol based on fit and predict methods, irrespective of whether the adapted object provides a transform, fit_predict, or predict interface.

In [62]:
import inspect
from abc import abstractmethod

from sklearn.base import BaseEstimator


class AdapterForSkLearnLikeAdaptee(BaseEstimator):
    """
    This class is an adapter for scikit-learn objects in order to provide a common interface based on fit and predict
    methods irrespective of whether the adapted object provides a transform, fit_predict, or predict interface.
    It is also used by the Process class as strategy.
    """

    def fit(self, *pargs, **kwargs):
        ...
        return self

    @abstractmethod
    def predict(self, *pargs, **kwargs):
        """ To be implemented by subclasses """

    def _get_fit_signature(self):
        ...

    @abstractmethod
    def _get_predict_signature(self):
        """ For easier predict params passing"""

    # These two methods work by introspection, do not remove because the __getattr__ trick does not work with them
    def get_params(self, deep=True):
        ...

    def set_params(self, **params):
        ...
        return self

    def __getattr__(self, name):
        ...

    def __setattr__(self, name, value):
        if ...:  # condition omitted in this listing
            self.__dict__[name] = value
        else:
            ...

    def __delattr__(self, name):
        ...

    def __repr__(self):
        ...



As can be seen from the code fragment, the fit and predict methods accept an arbitrary number of positional and keyword parameters. These have to be consistent with the adaptee's expectations, but at least we are not imposing hard constraints on the adapter's interface.

class AdapterForSkLearnLikeAdaptee(BaseEstimator):
    def fit(self, *pargs, **kwargs):
        ...
    def predict(self, *pargs, **kwargs):
        ...
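
The idea behind this interface can be illustrated with a minimal sketch. It is not pipegraph's implementation: `ToyScaler`, `SketchAdapter` and `SketchTransformAdapter` are hypothetical names, and the toy scaler only stands in for a scikit-learn transformer so that the example runs without extra dependencies.

```python
from abc import ABC, abstractmethod


class ToyScaler:
    """Hypothetical stand-in for a scikit-learn transformer (min-max scaling of a list)."""

    def fit(self, X):
        self.min_, self.max_ = min(X), max(X)
        return self

    def transform(self, X):
        span = (self.max_ - self.min_) or 1.0
        return [(x - self.min_) / span for x in X]


class SketchAdapter(ABC):
    """Illustrative adapter base: a uniform fit/predict pair on top of any adaptee."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        # Forward whatever arguments were received to the wrapped object.
        self._adaptee.fit(*pargs, **kwargs)
        return self

    @abstractmethod
    def predict(self, *pargs, **kwargs):
        """Subclasses decide which adaptee method produces the output."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails: delegate to the adaptee.
        adaptee = self.__dict__.get("_adaptee")
        if adaptee is None:
            raise AttributeError(name)
        return getattr(adaptee, name)


class SketchTransformAdapter(SketchAdapter):
    def predict(self, *pargs, **kwargs):
        # A transformer's output is exposed under the common 'predict' key.
        return {"predict": self._adaptee.transform(*pargs, **kwargs)}


adapter = SketchTransformAdapter(ToyScaler()).fit([2.0, 4.0, 6.0])
print(adapter.predict([2.0, 4.0, 6.0]))  # {'predict': [0.0, 0.5, 1.0]}
print(adapter.max_)                      # 6.0, looked up on the wrapped scaler
```

Because both fit and predict take `*pargs, **kwargs`, the adapter does not need to know the adaptee's signature in advance; a mismatch only surfaces when the adaptee itself rejects the arguments.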

### Adapter for scikit-learn objects using the predict method¶

Those sklearn objects following the predict protocol can be wrapped into the class AdapterForFitPredictAdaptee:

In [63]:
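from pipegraph.adapters import AdapterForFitPredictAdaptee

# classifier: the estimator fitted in an earlier cell (variable name assumed)
wrapped_classifier = AdapterForFitPredictAdaptee(classifier)
y_pred = wrapped_classifier.predict(X=X)
y_pred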

Out[63]:
{'predict': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
'predict_log_proba': array([[  0.00000000e+00,  -4.11208597e+01,  -5.78855367e+01],
[  0.00000000e+00,  -3.87505119e+01,  -5.67328319e+01],
[  0.00000000e+00,  -4.13716038e+01,  -5.90125166e+01],
[  0.00000000e+00,  -3.87801966e+01,  -5.65000742e+01],
[  0.00000000e+00,  -4.22118362e+01,  -5.87821546e+01],
[ -1.50990331e-14,  -3.18135483e+01,  -4.77671483e+01],
[  0.00000000e+00,  -3.90168287e+01,  -5.65377225e+01],
[  0.00000000e+00,  -3.95630818e+01,  -5.65385104e+01],
[  0.00000000e+00,  -3.92358214e+01,  -5.74109854e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -3.99448015e+01,  -5.59171461e+01],
[  0.00000000e+00,  -3.86387316e+01,  -5.55841702e+01],
[  0.00000000e+00,  -4.12723776e+01,  -5.87465451e+01],
[  0.00000000e+00,  -4.40508700e+01,  -6.15933405e+01],
[  0.00000000e+00,  -4.28003869e+01,  -5.76999890e+01],
[  0.00000000e+00,  -3.79878625e+01,  -5.24073850e+01],
[  0.00000000e+00,  -3.74032812e+01,  -5.36851061e+01],
[  0.00000000e+00,  -3.82599527e+01,  -5.54298023e+01],
[ -6.21724894e-15,  -3.27019712e+01,  -4.82934105e+01],
[  0.00000000e+00,  -3.92554439e+01,  -5.56035585e+01],
[ -1.11022302e-15,  -3.44179443e+01,  -5.09302471e+01],
[ -6.66133815e-16,  -3.49854914e+01,  -5.18673121e+01],
[  0.00000000e+00,  -4.53524369e+01,  -6.21630580e+01],
[ -1.88369320e-11,  -2.46951950e+01,  -4.25029624e+01],
[ -9.76996262e-15,  -3.22509844e+01,  -4.88549336e+01],
[ -4.44089210e-16,  -3.56240088e+01,  -5.34064744e+01],
[ -1.75415238e-14,  -3.16706208e+01,  -4.92423246e+01],
[  0.00000000e+00,  -3.94504986e+01,  -5.60776068e+01],
[  0.00000000e+00,  -4.00193970e+01,  -5.69598553e+01],
[  0.00000000e+00,  -3.76183937e+01,  -5.50322019e+01],
[  0.00000000e+00,  -3.67922627e+01,  -5.44175592e+01],
[ -2.19824159e-14,  -3.14495987e+01,  -4.88361191e+01],
[  0.00000000e+00,  -4.64777463e+01,  -6.10323244e+01],
[  0.00000000e+00,  -4.48894830e+01,  -5.95103654e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -4.11737926e+01,  -5.87969507e+01],
[  0.00000000e+00,  -4.01390763e+01,  -5.66409002e+01],
[  0.00000000e+00,  -4.02823057e+01,  -5.74425024e+01],
[  0.00000000e+00,  -4.07096317e+01,  -5.87377123e+01],
[  0.00000000e+00,  -3.91876179e+01,  -5.61142420e+01],
[  0.00000000e+00,  -3.95937603e+01,  -5.68754065e+01],
[ -8.88178420e-16,  -3.48281190e+01,  -5.46456650e+01],
[  0.00000000e+00,  -4.18405880e+01,  -5.94319738e+01],
[ -6.24451602e-11,  -2.34967248e+01,  -4.10128301e+01],
[ -8.14459611e-13,  -2.78361426e+01,  -4.40338704e+01],
[ -2.22044605e-16,  -3.61774145e+01,  -5.45875862e+01],
[  0.00000000e+00,  -4.05725544e+01,  -5.64270855e+01],
[  0.00000000e+00,  -4.06134153e+01,  -5.81878898e+01],
[  0.00000000e+00,  -4.04517469e+01,  -5.65120841e+01],
[  0.00000000e+00,  -4.01653212e+01,  -5.74485852e+01],
[ -2.48052568e+02,  -2.18109163e-01,  -1.62983281e+00],
[ -2.30738394e+02,  -5.63908550e-02,  -2.90351121e+00],
[ -2.80491279e+02,  -7.84930690e-01,  -6.09084226e-01],
[ -1.60415501e+02,  -3.12493439e-05,  -1.03735278e+01],
[ -2.44173908e+02,  -4.87262645e-02,  -3.04580129e+00],
[ -2.06975902e+02,  -8.80760321e-04,  -7.03516537e+00],
[ -2.61492267e+02,  -4.17104153e-01,  -1.07573288e+00],
[ -7.81077843e+01,  -2.33206936e-07,  -1.52713398e+01],
[ -2.24546277e+02,  -9.73088268e-03,  -4.63731216e+00],
[ -1.58620033e+02,  -9.02576814e-05,  -9.31288698e+00],
[ -9.34195242e+01,  -2.35068255e-07,  -1.52633900e+01],
[ -1.98308513e+02,  -3.76880673e-03,  -5.58288066e+00],
[ -1.38601134e+02,  -6.01634294e-06,  -1.20210340e+01],
[ -2.40199046e+02,  -1.40068145e-02,  -4.27520655e+00],
[ -1.26096900e+02,  -2.46129435e-05,  -1.06122504e+01],
[ -2.13966954e+02,  -2.01649503e-02,  -3.91387485e+00],
[ -2.25487740e+02,  -9.35636191e-03,  -4.67637328e+00],
[ -1.44223302e+02,  -1.63374144e-05,  -1.10220609e+01],
[ -2.33161854e+02,  -5.31700240e-03,  -5.23950292e+00],
[ -1.34144688e+02,  -6.99366761e-06,  -1.18705089e+01],
[ -2.95027181e+02,  -1.86759947e+00,  -1.67820115e-01],
[ -1.62730156e+02,  -1.92992469e-04,  -8.55295589e+00],
[ -2.76246088e+02,  -7.57185971e-02,  -2.61835190e+00],
[ -2.21783330e+02,  -1.84518185e-03,  -6.29609989e+00],
[ -1.92417591e+02,  -1.54036992e-03,  -6.47650277e+00],
[ -2.13431507e+02,  -1.26080671e-02,  -4.37971584e+00],
[ -2.58592703e+02,  -9.11897920e-02,  -2.44006076e+00],
[ -3.14700239e+02,  -2.58668517e+00,  -7.82525369e-02],
[ -2.28824133e+02,  -1.36119554e-02,  -4.30360506e+00],
[ -9.51756427e+01,  -1.23794287e-06,  -1.36020601e+01],
[ -1.25622344e+02,  -3.55252462e-06,  -1.25478538e+01],
[ -1.09762853e+02,  -1.34924015e-06,  -1.35159697e+01],
[ -1.43170407e+02,  -2.76527750e-05,  -1.04957983e+01],
[ -3.07586574e+02,  -4.90761846e-01,  -9.47161995e-01],
[ -2.24340564e+02,  -7.55180548e-03,  -4.88974213e+00],
[ -2.37042011e+02,  -1.32266420e-01,  -2.08834144e+00],
[ -2.55530691e+02,  -2.24025500e-01,  -1.60591787e+00],
[ -2.03604220e+02,  -6.14771461e-04,  -7.39456734e+00],
[ -1.66964124e+02,  -2.03750870e-04,  -8.49871441e+00],
[ -1.59751333e+02,  -4.47674653e-05,  -1.00140513e+01],
[ -1.87444075e+02,  -1.26328301e-04,  -8.97668964e+00],
[ -2.29061946e+02,  -1.06854191e-02,  -4.54421312e+00],
[ -1.52155085e+02,  -3.37718907e-05,  -1.02958986e+01],
[ -7.87548415e+01,  -2.02487942e-07,  -1.54125856e+01],
[ -1.77565145e+02,  -1.67685437e-04,  -8.69350458e+00],
[ -1.67627763e+02,  -1.50136407e-04,  -8.80404136e+00],
[ -1.77272780e+02,  -2.85093986e-04,  -8.16283420e+00],
[ -1.90570452e+02,  -1.00814508e-03,  -6.90014722e+00],
[ -6.76595831e+01,  -2.31316924e-07,  -1.52794772e+01],
[ -1.68600092e+02,  -1.52851668e-04,  -8.78611902e+00],
[ -5.76528695e+02,  -2.34793813e+01,  -6.35380637e-11],
[ -3.46079221e+02,  -3.68839303e+00,  -2.53302835e-02],
[ -5.01915316e+02,  -1.55998057e+01,  -1.67915395e-07],
[ -4.02192362e+02,  -6.21729985e+00,  -1.99661565e-03],
[ -4.95383744e+02,  -1.52828267e+01,  -2.30543434e-07],
[ -6.22492812e+02,  -2.21463196e+01,  -2.40977016e-10],
[ -2.47154107e+02,  -2.68427191e-02,  -3.63115200e+00],
[ -5.21888447e+02,  -1.35227055e+01,  -1.34018237e-06],
[ -4.36746429e+02,  -7.61520062e+00,  -4.93023301e-04],
[ -6.03094508e+02,  -2.72904971e+01,  -1.40598644e-12],
[ -3.67126147e+02,  -7.79234360e+00,  -4.12969376e-04],
[ -3.77747570e+02,  -5.88623220e+00,  -2.78128597e-03],
[ -4.40168625e+02,  -1.22454127e+01,  -4.80713017e-06],
[ -3.48586297e+02,  -4.35202467e+00,  -1.29643826e-02],
[ -4.29571255e+02,  -1.37738535e+01,  -1.04253739e-06],
[ -4.41726495e+02,  -1.46119054e+01,  -4.50951887e-07],
[ -3.90421773e+02,  -6.28076617e+00,  -1.87372012e-03],
[ -6.54914190e+02,  -2.71328253e+01,  -1.64490643e-12],
[ -7.12254776e+02,  -2.80658015e+01,  -6.47482068e-13],
[ -2.86083201e+02,  -4.27662147e-02,  -3.17331377e+00],
[ -5.03186707e+02,  -1.82626784e+01,  -1.17116898e-08],
[ -3.35138824e+02,  -4.28580072e+00,  -1.38581796e-02],
[ -6.26187694e+02,  -2.10000206e+01,  -7.58240581e-10],
[ -3.12323599e+02,  -2.04032292e+00,  -1.39246813e-01],
[ -4.65698806e+02,  -1.38441385e+01,  -9.71778424e-07],
[ -4.72166196e+02,  -1.18141630e+01,  -7.39904730e-06],
[ -2.99339133e+02,  -1.60979688e+00,  -2.23053830e-01],
[ -3.09308365e+02,  -2.23041763e+00,  -1.13710314e-01],
[ -4.49376980e+02,  -1.14224651e+01,  -1.09468413e-05],
[ -4.16289573e+02,  -7.26185395e+00,  -7.02052098e-04],
[ -5.06724696e+02,  -1.39139634e+01,  -9.06238851e-07],
[ -5.75425351e+02,  -2.19381967e+01,  -2.96730640e-10],
[ -4.66130391e+02,  -1.34048044e+01,  -1.50788342e-06],
[ -2.98548520e+02,  -3.38771676e-01,  -1.24703740e+00],
[ -3.50990031e+02,  -7.21136687e-01,  -6.65919804e-01],
[ -5.77189946e+02,  -2.31327920e+01,  -8.98578989e-11],
[ -4.98648138e+02,  -1.88379419e+01,  -6.58848798e-09],
[ -3.88867859e+02,  -6.26470421e+00,  -1.90408763e-03],
[ -2.96704559e+02,  -1.64411292e+00,  -2.14659463e-01],
[ -4.27739492e+02,  -1.21608575e+01,  -5.23127826e-06],
[ -5.04811435e+02,  -1.91109390e+01,  -5.01446573e-09],
[ -4.23198845e+02,  -1.45752291e+01,  -4.67798162e-07],
[ -3.46079221e+02,  -3.68839303e+00,  -2.53302835e-02],
[ -5.30964877e+02,  -1.92358690e+01,  -4.42555992e-09],
[ -5.37227545e+02,  -2.25274865e+01,  -1.64602554e-10],
[ -4.33320275e+02,  -1.43507861e+01,  -5.85508133e-07],
[ -3.39139040e+02,  -3.67120606e+00,  -2.57751046e-02],
[ -3.80448261e+02,  -7.90155668e+00,  -3.70235390e-04],
[ -4.51888911e+02,  -1.52178512e+01,  -2.46020464e-07],
[ -3.31662328e+02,  -2.88231414e+00,  -5.76344191e-02]]),
'predict_proba': array([[  1.00000000e+000,   1.38496103e-018,   7.25489025e-026],
[  1.00000000e+000,   1.48206242e-017,   2.29743996e-025],
[  1.00000000e+000,   1.07780639e-018,   2.35065917e-026],
[  1.00000000e+000,   1.43871443e-017,   2.89954283e-025],
[  1.00000000e+000,   4.65192224e-019,   2.95961100e-026],
[  1.00000000e+000,   1.52598944e-014,   1.79883402e-021],
[  1.00000000e+000,   1.13555084e-017,   2.79240943e-025],
[  1.00000000e+000,   6.57615274e-018,   2.79021029e-025],
[  1.00000000e+000,   9.12219356e-018,   1.16607332e-025],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   4.48944985e-018,   5.19388089e-025],
[  1.00000000e+000,   1.65734172e-017,   7.24605453e-025],
[  1.00000000e+000,   1.19023891e-018,   3.06690017e-026],
[  1.00000000e+000,   7.39520546e-020,   1.77972179e-027],
[  1.00000000e+000,   2.58242749e-019,   8.73399972e-026],
[  1.00000000e+000,   3.17746623e-017,   1.73684833e-023],
[  1.00000000e+000,   5.70113578e-017,   4.84010372e-024],
[  1.00000000e+000,   2.42054769e-017,   8.45556661e-025],
[  1.00000000e+000,   6.27645419e-015,   1.06276762e-021],
[  1.00000000e+000,   8.94493797e-018,   7.10691894e-025],
[  1.00000000e+000,   1.12843548e-015,   7.60807373e-023],
[  1.00000000e+000,   6.39726172e-016,   2.98066089e-023],
[  1.00000000e+000,   2.01227309e-020,   1.00676223e-027],
[  1.00000000e+000,   1.88370574e-011,   3.47694606e-019],
[  1.00000000e+000,   9.85315738e-015,   6.06138600e-022],
[  1.00000000e+000,   3.37823264e-016,   6.39532840e-024],
[  1.00000000e+000,   1.76045187e-014,   4.11462407e-022],
[  1.00000000e+000,   7.35980232e-018,   4.42389485e-025],
[  1.00000000e+000,   4.16674318e-018,   1.83083484e-025],
[  1.00000000e+000,   4.59768498e-017,   1.25839903e-024],
[  1.00000000e+000,   1.05032415e-016,   2.32677467e-024],
[  1.00000000e+000,   2.19590125e-014,   6.17650711e-022],
[  1.00000000e+000,   6.53087316e-021,   3.11887725e-027],
[  1.00000000e+000,   3.19701924e-020,   1.42881733e-026],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   1.31355747e-018,   2.91614269e-026],
[  1.00000000e+000,   3.69675482e-018,   2.51866027e-025],
[  1.00000000e+000,   3.20344249e-018,   1.12989524e-025],
[  1.00000000e+000,   2.08944813e-018,   3.09410939e-026],
[  1.00000000e+000,   9.57268514e-018,   4.26475768e-025],
[  1.00000000e+000,   6.37746927e-018,   1.99216264e-025],
[  1.00000000e+000,   7.48755609e-016,   1.85220582e-024],
[  1.00000000e+000,   6.74316102e-019,   1.54533175e-026],
[  1.00000000e+000,   6.24456357e-011,   1.54295833e-018],
[  1.00000000e+000,   8.14548341e-013,   7.52199540e-020],
[  1.00000000e+000,   1.94244394e-016,   1.96296487e-024],
[  1.00000000e+000,   2.39642309e-018,   3.11909164e-025],
[  1.00000000e+000,   2.30047669e-018,   5.36192288e-026],
[  1.00000000e+000,   2.70414239e-018,   2.86492790e-025],
[  1.00000000e+000,   3.60099614e-018,   1.12304319e-025],
[  1.87127931e-108,   8.04037666e-001,   1.95962334e-001],
[  6.18854779e-101,   9.45169639e-001,   5.48303606e-002],
[  1.52821825e-122,   4.56151317e-001,   5.43848683e-001],
[  2.14997261e-070,   9.99968751e-001,   3.12488556e-005],
[  9.04938222e-107,   9.52441811e-001,   4.75581888e-002],
[  1.29272979e-090,   9.99119627e-001,   8.80372565e-004],
[  2.72490532e-114,   6.58952285e-001,   3.41047715e-001],
[  1.19734767e-034,   9.99999767e-001,   2.33206910e-007],
[  3.02545627e-098,   9.90316309e-001,   9.68369084e-003],
[  1.29477666e-069,   9.99909746e-001,   9.02536083e-005],
[  2.68173680e-041,   9.99999765e-001,   2.35068227e-007],
[  7.51115851e-087,   9.96238286e-001,   3.76171369e-003],
[  6.40165546e-061,   9.99993984e-001,   6.01632484e-006],
[  4.81814146e-105,   9.86090825e-001,   1.39091754e-002],
[  1.72509107e-055,   9.99975387e-001,   2.46126406e-005],
[  1.18941242e-093,   9.80037003e-001,   1.99629974e-002],
[  1.18009940e-098,   9.90687273e-001,   9.31272734e-003],
[  2.31534504e-063,   9.99983663e-001,   1.63372809e-005],
[  5.48394976e-102,   9.94697108e-001,   5.30289217e-003],
[  5.51699136e-059,   9.99993006e-001,   6.99364316e-006],
[  7.43572418e-129,   1.54494085e-001,   8.45505915e-001],
[  2.12417952e-071,   9.99807026e-001,   1.92973847e-004],
[  1.06622383e-120,   9.27077052e-001,   7.29229479e-002],
[  4.79428037e-097,   9.98156519e-001,   1.84348055e-003],
[  2.71707817e-084,   9.98460816e-001,   1.53918416e-003],
[  2.03176962e-093,   9.87471082e-001,   1.25289184e-002],
[  4.95012220e-113,   9.12844444e-001,   8.71555561e-002],
[  2.12531216e-137,   7.52691316e-002,   9.24730868e-001],
[  4.19702663e-100,   9.86480268e-001,   1.35197316e-002],
[  4.63173354e-042,   9.99998762e-001,   1.23794211e-006],
[  2.77274013e-055,   9.99996447e-001,   3.55251831e-006],
[  2.14091116e-048,   9.99998651e-001,   1.34923924e-006],
[  6.63563094e-063,   9.99972348e-001,   2.76523927e-005],
[  2.61124821e-134,   6.12159845e-001,   3.87840155e-001],
[  3.71647418e-098,   9.92476638e-001,   7.52336224e-003],
[  1.13230275e-103,   8.76107551e-001,   1.23892449e-001],
[  1.05786721e-111,   7.99294752e-001,   2.00705248e-001],
[  3.76539608e-089,   9.99385417e-001,   6.14582528e-004],
[  3.07894878e-073,   9.99796270e-001,   2.03730114e-004],
[  4.17712661e-070,   9.99955234e-001,   4.47664632e-005],
[  3.92710689e-082,   9.99873680e-001,   1.26320322e-004],
[  3.30872742e-100,   9.89371467e-001,   1.06285328e-002],
[  8.31545615e-067,   9.99966229e-001,   3.37713204e-005],
[  6.26912483e-035,   9.99999798e-001,   2.02487922e-007],
[  7.66367658e-078,   9.99832329e-001,   1.67671378e-004],
[  1.58557717e-073,   9.99849875e-001,   1.50125137e-004],
[  1.02662082e-077,   9.99714947e-001,   2.85053350e-004],
[  1.72307593e-083,   9.98992363e-001,   1.00763708e-003],
[  4.12872931e-030,   9.99999769e-001,   2.31316897e-007],
[  5.99667528e-074,   9.99847160e-001,   1.52839987e-004],
[  4.13779546e-251,   6.35381030e-011,   1.00000000e+000],
[  5.00845630e-151,   2.50121636e-002,   9.74987836e-001],
[  1.04941686e-218,   1.67915381e-007,   9.99999832e-001],
[  2.13833836e-175,   1.99462374e-003,   9.98005376e-001],
[  7.20399720e-216,   2.30543407e-007,   9.99999769e-001],
[  4.51654712e-271,   2.40976994e-010,   1.00000000e+000],
[  4.59552511e-108,   9.73514345e-001,   2.64856553e-002],
[  2.22191497e-227,   1.34018147e-006,   9.99998660e-001],
[  2.10589122e-190,   4.92901785e-004,   9.99507098e-001],
[  1.20055778e-262,   1.40568402e-012,   1.00000000e+000],
[  3.62359789e-160,   4.12884115e-004,   9.99587116e-001],
[  8.83719953e-165,   2.77742178e-003,   9.97222578e-001],
[  6.87376950e-192,   4.80711862e-006,   9.99995193e-001],
[  4.08220498e-152,   1.28807070e-002,   9.87119293e-001],
[  2.75153031e-187,   1.04253685e-006,   9.99998957e-001],
[  1.44750671e-192,   4.50951786e-007,   9.99999549e-001],
[  2.76680341e-170,   1.87196580e-003,   9.98128034e-001],
[  3.75302289e-285,   1.64574932e-012,   1.00000000e+000],
[  4.69548986e-310,   6.47406861e-013,   1.00000000e+000],
[  5.69697725e-125,   9.58135362e-001,   4.18646381e-002],
[  2.94299535e-219,   1.17116897e-008,   9.99999988e-001],
[  2.82525894e-146,   1.37625971e-002,   9.86237403e-001],
[  1.12237933e-272,   7.58240410e-010,   9.99999999e-001],
[  2.28867567e-136,   1.29986728e-001,   8.70013272e-001],
[  5.61795825e-203,   9.71777952e-007,   9.99999028e-001],
[  8.72622664e-206,   7.39901993e-006,   9.99992601e-001],
[  9.96933448e-131,   1.99928220e-001,   8.00071780e-001],
[  4.66749613e-135,   1.07483532e-001,   8.92516468e-001],
[  6.88743059e-196,   1.09467814e-005,   9.99989053e-001],
[  1.61337601e-181,   7.01805717e-004,   9.99298194e-001],
[  8.55580252e-221,   9.06238440e-007,   9.99999094e-001],
[  1.24722670e-250,   2.96730515e-010,   1.00000000e+000],
[  3.64874362e-203,   1.50788229e-006,   9.99998492e-001],
[  2.19798649e-130,   7.12645144e-001,   2.87354856e-001],
[  3.68949024e-153,   4.86199285e-001,   5.13800715e-001],
[  2.13595212e-251,   8.98578645e-011,   1.00000000e+000],
[  2.75337356e-217,   6.58848819e-009,   9.99999993e-001],
[  1.30868299e-169,   1.90227600e-003,   9.98097724e-001],
[  1.38946382e-129,   1.93183856e-001,   8.06816144e-001],
[  1.71830037e-186,   5.23126458e-006,   9.99994769e-001],
[  5.79667973e-220,   5.01446575e-009,   9.99999995e-001],
[  1.61093140e-184,   4.67798053e-007,   9.99999532e-001],
[  5.00845630e-151,   2.50121636e-002,   9.74987836e-001],
[  2.54029381e-231,   4.42556022e-009,   9.99999996e-001],
[  4.84219075e-234,   1.64602693e-010,   1.00000000e+000],
[  6.47732320e-189,   5.85507961e-007,   9.99999414e-001],
[  5.17352411e-148,   2.54457623e-002,   9.74554238e-001],
[  5.93498263e-166,   3.70166861e-004,   9.99629833e-001],
[  5.58649523e-197,   2.46020434e-007,   9.99999754e-001],
[  9.13863414e-145,   5.60050091e-002,   9.43994991e-001]])}

As you can see, the wrapper returns a dictionary containing the outputs of predict, predict_proba, and predict_log_proba, where these methods are available.

In [64]:
list(y_pred.keys())

Out[64]:
['predict', 'predict_proba', 'predict_log_proba']
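
How such a dictionary might be assembled can be sketched as follows. This is an assumption about the mechanism, not pipegraph's code: `ToyClassifier` and `collect_predictions` are hypothetical names, and the toy classifier deliberately lacks predict_log_proba to show that only the available methods end up in the result.

```python
import math


class ToyClassifier:
    """Hypothetical stand-in for a probabilistic scikit-learn classifier."""

    def fit(self, X, y):
        # Use the mean of the inputs as a decision threshold (y is ignored here).
        self.threshold_ = sum(X) / len(X)
        return self

    def predict_proba(self, X):
        # Logistic score around the threshold: [P(class 0), P(class 1)].
        p1 = [1.0 / (1.0 + math.exp(-(x - self.threshold_))) for x in X]
        return [[1.0 - p, p] for p in p1]

    def predict(self, X):
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


def collect_predictions(adaptee, *pargs, **kwargs):
    """Return a dict with one entry per prediction method the adaptee offers."""
    result = {"predict": adaptee.predict(*pargs, **kwargs)}
    for name in ("predict_proba", "predict_log_proba"):
        if hasattr(adaptee, name):
            result[name] = getattr(adaptee, name)(*pargs, **kwargs)
    return result


clf = ToyClassifier().fit([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1])
outputs = collect_predictions(clf, [0.0, 3.0])
print(list(outputs.keys()))  # ['predict', 'predict_proba']
print(outputs["predict"])    # [0, 1]
```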

### Adapter for scikit-learn objects using the transform method¶

Those sklearn objects following the transform protocol can be wrapped into the class AdapterForFitTransformAdaptee:

In [65]:
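from pipegraph.adapters import AdapterForFitTransformAdaptee

# scaler: the transformer fitted in an earlier cell (variable name assumed)
wrapped_scaler = AdapterForFitTransformAdaptee(scaler)
y_pred = wrapped_scaler.predict(X)
y_pred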

Out[65]:
{'predict': array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
[ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
[ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
[ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
[ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
[ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
[ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
[ 0.13888889,  0.41666667,  0.06779661,  0.        ],
[ 0.        ,  0.41666667,  0.01694915,  0.        ],
[ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
[ 0.38888889,  1.        ,  0.08474576,  0.125     ],
[ 0.30555556,  0.79166667,  0.05084746,  0.125     ],
[ 0.22222222,  0.625     ,  0.06779661,  0.08333333],
[ 0.38888889,  0.75      ,  0.11864407,  0.08333333],
[ 0.22222222,  0.75      ,  0.08474576,  0.08333333],
[ 0.30555556,  0.58333333,  0.11864407,  0.04166667],
[ 0.22222222,  0.70833333,  0.08474576,  0.125     ],
[ 0.08333333,  0.66666667,  0.        ,  0.04166667],
[ 0.22222222,  0.54166667,  0.11864407,  0.16666667],
[ 0.13888889,  0.58333333,  0.15254237,  0.04166667],
[ 0.19444444,  0.41666667,  0.10169492,  0.04166667],
[ 0.19444444,  0.58333333,  0.10169492,  0.125     ],
[ 0.25      ,  0.625     ,  0.08474576,  0.04166667],
[ 0.25      ,  0.58333333,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.10169492,  0.04166667],
[ 0.13888889,  0.45833333,  0.10169492,  0.04166667],
[ 0.30555556,  0.58333333,  0.08474576,  0.125     ],
[ 0.25      ,  0.875     ,  0.08474576,  0.        ],
[ 0.33333333,  0.91666667,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.19444444,  0.5       ,  0.03389831,  0.04166667],
[ 0.33333333,  0.625     ,  0.05084746,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.02777778,  0.41666667,  0.05084746,  0.04166667],
[ 0.22222222,  0.58333333,  0.08474576,  0.04166667],
[ 0.19444444,  0.625     ,  0.05084746,  0.08333333],
[ 0.05555556,  0.125     ,  0.05084746,  0.08333333],
[ 0.02777778,  0.5       ,  0.05084746,  0.04166667],
[ 0.19444444,  0.625     ,  0.10169492,  0.20833333],
[ 0.22222222,  0.75      ,  0.15254237,  0.125     ],
[ 0.13888889,  0.41666667,  0.06779661,  0.08333333],
[ 0.22222222,  0.75      ,  0.10169492,  0.04166667],
[ 0.08333333,  0.5       ,  0.06779661,  0.04166667],
[ 0.27777778,  0.70833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.54166667,  0.06779661,  0.04166667],
[ 0.75      ,  0.5       ,  0.62711864,  0.54166667],
[ 0.58333333,  0.5       ,  0.59322034,  0.58333333],
[ 0.72222222,  0.45833333,  0.66101695,  0.58333333],
[ 0.33333333,  0.125     ,  0.50847458,  0.5       ],
[ 0.61111111,  0.33333333,  0.61016949,  0.58333333],
[ 0.38888889,  0.33333333,  0.59322034,  0.5       ],
[ 0.55555556,  0.54166667,  0.62711864,  0.625     ],
[ 0.16666667,  0.16666667,  0.38983051,  0.375     ],
[ 0.63888889,  0.375     ,  0.61016949,  0.5       ],
[ 0.25      ,  0.29166667,  0.49152542,  0.54166667],
[ 0.19444444,  0.        ,  0.42372881,  0.375     ],
[ 0.44444444,  0.41666667,  0.54237288,  0.58333333],
[ 0.47222222,  0.08333333,  0.50847458,  0.375     ],
[ 0.5       ,  0.375     ,  0.62711864,  0.54166667],
[ 0.36111111,  0.375     ,  0.44067797,  0.5       ],
[ 0.66666667,  0.45833333,  0.57627119,  0.54166667],
[ 0.36111111,  0.41666667,  0.59322034,  0.58333333],
[ 0.41666667,  0.29166667,  0.52542373,  0.375     ],
[ 0.52777778,  0.08333333,  0.59322034,  0.58333333],
[ 0.36111111,  0.20833333,  0.49152542,  0.41666667],
[ 0.44444444,  0.5       ,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.33333333,  0.50847458,  0.5       ],
[ 0.55555556,  0.20833333,  0.66101695,  0.58333333],
[ 0.5       ,  0.33333333,  0.62711864,  0.45833333],
[ 0.58333333,  0.375     ,  0.55932203,  0.5       ],
[ 0.63888889,  0.41666667,  0.57627119,  0.54166667],
[ 0.69444444,  0.33333333,  0.6440678 ,  0.54166667],
[ 0.66666667,  0.41666667,  0.6779661 ,  0.66666667],
[ 0.47222222,  0.375     ,  0.59322034,  0.58333333],
[ 0.38888889,  0.25      ,  0.42372881,  0.375     ],
[ 0.33333333,  0.16666667,  0.47457627,  0.41666667],
[ 0.33333333,  0.16666667,  0.45762712,  0.375     ],
[ 0.41666667,  0.29166667,  0.49152542,  0.45833333],
[ 0.47222222,  0.29166667,  0.69491525,  0.625     ],
[ 0.30555556,  0.41666667,  0.59322034,  0.58333333],
[ 0.47222222,  0.58333333,  0.59322034,  0.625     ],
[ 0.66666667,  0.45833333,  0.62711864,  0.58333333],
[ 0.55555556,  0.125     ,  0.57627119,  0.5       ],
[ 0.36111111,  0.41666667,  0.52542373,  0.5       ],
[ 0.33333333,  0.20833333,  0.50847458,  0.5       ],
[ 0.33333333,  0.25      ,  0.57627119,  0.45833333],
[ 0.5       ,  0.41666667,  0.61016949,  0.54166667],
[ 0.41666667,  0.25      ,  0.50847458,  0.45833333],
[ 0.19444444,  0.125     ,  0.38983051,  0.375     ],
[ 0.36111111,  0.29166667,  0.54237288,  0.5       ],
[ 0.38888889,  0.41666667,  0.54237288,  0.45833333],
[ 0.38888889,  0.375     ,  0.54237288,  0.5       ],
[ 0.52777778,  0.375     ,  0.55932203,  0.5       ],
[ 0.22222222,  0.20833333,  0.33898305,  0.41666667],
[ 0.38888889,  0.33333333,  0.52542373,  0.5       ],
[ 0.55555556,  0.54166667,  0.84745763,  1.        ],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.77777778,  0.41666667,  0.83050847,  0.83333333],
[ 0.55555556,  0.375     ,  0.77966102,  0.70833333],
[ 0.61111111,  0.41666667,  0.81355932,  0.875     ],
[ 0.91666667,  0.41666667,  0.94915254,  0.83333333],
[ 0.16666667,  0.20833333,  0.59322034,  0.66666667],
[ 0.83333333,  0.375     ,  0.89830508,  0.70833333],
[ 0.66666667,  0.20833333,  0.81355932,  0.70833333],
[ 0.80555556,  0.66666667,  0.86440678,  1.        ],
[ 0.61111111,  0.5       ,  0.69491525,  0.79166667],
[ 0.58333333,  0.29166667,  0.72881356,  0.75      ],
[ 0.69444444,  0.41666667,  0.76271186,  0.83333333],
[ 0.38888889,  0.20833333,  0.6779661 ,  0.79166667],
[ 0.41666667,  0.33333333,  0.69491525,  0.95833333],
[ 0.58333333,  0.5       ,  0.72881356,  0.91666667],
[ 0.61111111,  0.41666667,  0.76271186,  0.70833333],
[ 0.94444444,  0.75      ,  0.96610169,  0.875     ],
[ 0.94444444,  0.25      ,  1.        ,  0.91666667],
[ 0.47222222,  0.08333333,  0.6779661 ,  0.58333333],
[ 0.72222222,  0.5       ,  0.79661017,  0.91666667],
[ 0.36111111,  0.33333333,  0.66101695,  0.79166667],
[ 0.94444444,  0.33333333,  0.96610169,  0.79166667],
[ 0.55555556,  0.29166667,  0.66101695,  0.70833333],
[ 0.66666667,  0.54166667,  0.79661017,  0.83333333],
[ 0.80555556,  0.5       ,  0.84745763,  0.70833333],
[ 0.52777778,  0.33333333,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.41666667,  0.66101695,  0.70833333],
[ 0.58333333,  0.33333333,  0.77966102,  0.83333333],
[ 0.80555556,  0.41666667,  0.81355932,  0.625     ],
[ 0.86111111,  0.33333333,  0.86440678,  0.75      ],
[ 1.        ,  0.75      ,  0.91525424,  0.79166667],
[ 0.58333333,  0.33333333,  0.77966102,  0.875     ],
[ 0.55555556,  0.33333333,  0.69491525,  0.58333333],
[ 0.5       ,  0.25      ,  0.77966102,  0.54166667],
[ 0.94444444,  0.41666667,  0.86440678,  0.91666667],
[ 0.55555556,  0.58333333,  0.77966102,  0.95833333],
[ 0.58333333,  0.45833333,  0.76271186,  0.70833333],
[ 0.47222222,  0.41666667,  0.6440678 ,  0.70833333],
[ 0.72222222,  0.45833333,  0.74576271,  0.83333333],
[ 0.66666667,  0.45833333,  0.77966102,  0.95833333],
[ 0.72222222,  0.45833333,  0.69491525,  0.91666667],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.69444444,  0.5       ,  0.83050847,  0.91666667],
[ 0.66666667,  0.54166667,  0.79661017,  1.        ],
[ 0.66666667,  0.41666667,  0.71186441,  0.91666667],
[ 0.55555556,  0.20833333,  0.6779661 ,  0.75      ],
[ 0.61111111,  0.41666667,  0.71186441,  0.79166667],
[ 0.52777778,  0.58333333,  0.74576271,  0.91666667],
[ 0.44444444,  0.41666667,  0.69491525,  0.70833333]])}

The adapter for transformers does not need to provide as many outputs: it returns only the value obtained by calling the transform method on the adaptee, which for homogeneity is provided as a dictionary with 'predict' as key:

In [66]:
list(y_pred.keys())

Out[66]:
['predict']

### Adapter for scikit-learn objects using the fit_predict method¶

Those sklearn objects following the fit_predict protocol can be wrapped into the class AdapterForAtomicFitPredictAdaptee:

In [67]:
from pipegraph.adapters import AdapterForAtomicFitPredictAdaptee

wrapped_dbscanner = AdapterForAtomicFitPredictAdaptee(dbscanner)
y_pred = wrapped_dbscanner.predict(X=X)
y_pred

Out[67]:
{'predict': array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  1,
1,  1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,
-1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1,  1,
1,  1,  1, -1, -1,  1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1, -1, -1,
1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1], dtype=int64)}

Again, this adapter provides a dictionary with the values of calling fit_predict under the key 'predict'.
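
One plausible way to adapt an object that only offers fit_predict is sketched below. It assumes that fit can be a no-op because fitting and predicting happen in a single call; `ToyClusterer` and `SketchAtomicFitPredictAdapter` are hypothetical names and not part of pipegraph or scikit-learn.

```python
class ToyClusterer:
    """Hypothetical stand-in for a clusterer that only offers fit_predict."""

    def __init__(self, gap=1.0):
        self.gap = gap

    def fit_predict(self, X):
        # Walk the values in sorted order and start a new cluster whenever
        # two neighbours are further apart than `gap`.
        order = sorted(range(len(X)), key=lambda i: X[i])
        labels = [0] * len(X)
        current = 0
        for prev, idx in zip(order, order[1:]):
            if X[idx] - X[prev] > self.gap:
                current += 1
            labels[idx] = current
        return labels


class SketchAtomicFitPredictAdapter:
    """Illustrative adapter exposing fit_predict through fit and predict."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def fit(self, *pargs, **kwargs):
        # Nothing to do here: the adaptee fits and predicts in one step.
        return self

    def predict(self, *pargs, **kwargs):
        return {"predict": self._adaptee.fit_predict(*pargs, **kwargs)}


data = [0.1, 0.2, 5.0, 5.3]
wrapped = SketchAtomicFitPredictAdapter(ToyClusterer(gap=1.0))
result = wrapped.fit(data).predict(data)
print(result)  # {'predict': [0, 0, 1, 1]}
```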

### Adapter for objects providing an output stored in a dictionary, possibly containing several variables¶

In addition to the previous adapters for scikit-learn objects, pipegraph provides another adapter that is frequently used with custom objects whose predict method returns multiple outputs. Such objects can return a dictionary with the names of the outputs as keys. To accommodate this kind of output, the class AdapterForCustomFitPredictWithDictionaryOutputAdaptee is provided:

In [68]:
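from pipegraph.base import Demultiplexer
from pipegraph.adapters import AdapterForCustomFitPredictWithDictionaryOutputAdaptee

demultiplexer = Demultiplexer()
wrapped_demultiplexer = AdapterForCustomFitPredictWithDictionaryOutputAdaptee(demultiplexer)
output = wrapped_demultiplexer.predict(X=X, selection=y)
output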

Out[68]:
{'X_0':       0    1    2    3
0   5.1  3.5  1.4  0.2
1   4.9  3.0  1.4  0.2
2   4.7  3.2  1.3  0.2
3   4.6  3.1  1.5  0.2
4   5.0  3.6  1.4  0.2
5   5.4  3.9  1.7  0.4
6   4.6  3.4  1.4  0.3
7   5.0  3.4  1.5  0.2
8   4.4  2.9  1.4  0.2
9   4.9  3.1  1.5  0.1
10  5.4  3.7  1.5  0.2
11  4.8  3.4  1.6  0.2
12  4.8  3.0  1.4  0.1
13  4.3  3.0  1.1  0.1
14  5.8  4.0  1.2  0.2
15  5.7  4.4  1.5  0.4
16  5.4  3.9  1.3  0.4
17  5.1  3.5  1.4  0.3
18  5.7  3.8  1.7  0.3
19  5.1  3.8  1.5  0.3
20  5.4  3.4  1.7  0.2
21  5.1  3.7  1.5  0.4
22  4.6  3.6  1.0  0.2
23  5.1  3.3  1.7  0.5
24  4.8  3.4  1.9  0.2
25  5.0  3.0  1.6  0.2
26  5.0  3.4  1.6  0.4
27  5.2  3.5  1.5  0.2
28  5.2  3.4  1.4  0.2
29  4.7  3.2  1.6  0.2
30  4.8  3.1  1.6  0.2
31  5.4  3.4  1.5  0.4
32  5.2  4.1  1.5  0.1
33  5.5  4.2  1.4  0.2
34  4.9  3.1  1.5  0.1
35  5.0  3.2  1.2  0.2
36  5.5  3.5  1.3  0.2
37  4.9  3.1  1.5  0.1
38  4.4  3.0  1.3  0.2
39  5.1  3.4  1.5  0.2
40  5.0  3.5  1.3  0.3
41  4.5  2.3  1.3  0.3
42  4.4  3.2  1.3  0.2
43  5.0  3.5  1.6  0.6
44  5.1  3.8  1.9  0.4
45  4.8  3.0  1.4  0.3
46  5.1  3.8  1.6  0.2
47  4.6  3.2  1.4  0.2
48  5.3  3.7  1.5  0.2
49  5.0  3.3  1.4  0.2, 'X_1':       0    1    2    3
50  7.0  3.2  4.7  1.4
51  6.4  3.2  4.5  1.5
52  6.9  3.1  4.9  1.5
53  5.5  2.3  4.0  1.3
54  6.5  2.8  4.6  1.5
55  5.7  2.8  4.5  1.3
56  6.3  3.3  4.7  1.6
57  4.9  2.4  3.3  1.0
58  6.6  2.9  4.6  1.3
59  5.2  2.7  3.9  1.4
60  5.0  2.0  3.5  1.0
61  5.9  3.0  4.2  1.5
62  6.0  2.2  4.0  1.0
63  6.1  2.9  4.7  1.4
64  5.6  2.9  3.6  1.3
65  6.7  3.1  4.4  1.4
66  5.6  3.0  4.5  1.5
67  5.8  2.7  4.1  1.0
68  6.2  2.2  4.5  1.5
69  5.6  2.5  3.9  1.1
70  5.9  3.2  4.8  1.8
71  6.1  2.8  4.0  1.3
72  6.3  2.5  4.9  1.5
73  6.1  2.8  4.7  1.2
74  6.4  2.9  4.3  1.3
75  6.6  3.0  4.4  1.4
76  6.8  2.8  4.8  1.4
77  6.7  3.0  5.0  1.7
78  6.0  2.9  4.5  1.5
79  5.7  2.6  3.5  1.0
80  5.5  2.4  3.8  1.1
81  5.5  2.4  3.7  1.0
82  5.8  2.7  3.9  1.2
83  6.0  2.7  5.1  1.6
84  5.4  3.0  4.5  1.5
85  6.0  3.4  4.5  1.6
86  6.7  3.1  4.7  1.5
87  6.3  2.3  4.4  1.3
88  5.6  3.0  4.1  1.3
89  5.5  2.5  4.0  1.3
90  5.5  2.6  4.4  1.2
91  6.1  3.0  4.6  1.4
92  5.8  2.6  4.0  1.2
93  5.0  2.3  3.3  1.0
94  5.6  2.7  4.2  1.3
95  5.7  3.0  4.2  1.2
96  5.7  2.9  4.2  1.3
97  6.2  2.9  4.3  1.3
98  5.1  2.5  3.0  1.1
99  5.7  2.8  4.1  1.3, 'X_2':        0    1    2    3
100  6.3  3.3  6.0  2.5
101  5.8  2.7  5.1  1.9
102  7.1  3.0  5.9  2.1
103  6.3  2.9  5.6  1.8
104  6.5  3.0  5.8  2.2
105  7.6  3.0  6.6  2.1
106  4.9  2.5  4.5  1.7
107  7.3  2.9  6.3  1.8
108  6.7  2.5  5.8  1.8
109  7.2  3.6  6.1  2.5
110  6.5  3.2  5.1  2.0
111  6.4  2.7  5.3  1.9
112  6.8  3.0  5.5  2.1
113  5.7  2.5  5.0  2.0
114  5.8  2.8  5.1  2.4
115  6.4  3.2  5.3  2.3
116  6.5  3.0  5.5  1.8
117  7.7  3.8  6.7  2.2
118  7.7  2.6  6.9  2.3
119  6.0  2.2  5.0  1.5
120  6.9  3.2  5.7  2.3
121  5.6  2.8  4.9  2.0
122  7.7  2.8  6.7  2.0
123  6.3  2.7  4.9  1.8
124  6.7  3.3  5.7  2.1
125  7.2  3.2  6.0  1.8
126  6.2  2.8  4.8  1.8
127  6.1  3.0  4.9  1.8
128  6.4  2.8  5.6  2.1
129  7.2  3.0  5.8  1.6
130  7.4  2.8  6.1  1.9
131  7.9  3.8  6.4  2.0
132  6.4  2.8  5.6  2.2
133  6.3  2.8  5.1  1.5
134  6.1  2.6  5.6  1.4
135  7.7  3.0  6.1  2.3
136  6.3  3.4  5.6  2.4
137  6.4  3.1  5.5  1.8
138  6.0  3.0  4.8  1.8
139  6.9  3.1  5.4  2.1
140  6.7  3.1  5.6  2.4
141  6.9  3.1  5.1  2.3
142  5.8  2.7  5.1  1.9
143  6.8  3.2  5.9  2.3
144  6.7  3.3  5.7  2.5
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8}

In [69]:
list(output.keys())

Out[69]:
['X_0', 'X_1', 'X_2']

As can be seen, this adapter's predict method returns the dictionary of outputs produced by the adaptee, keeping its original keys.
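The difference between an adapter that stores a single result under a 'predict' key and one that forwards a dictionary untouched can be sketched in plain Python. The classes below are hypothetical stand-ins written for this illustration, not pipegraph's adapters, and they use lists instead of arrays or DataFrames:

```python
class SingleOutputAdapter:
    """Hypothetical stand-in: stores the single result under the 'predict' key."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def predict(self, *pargs, **kwargs):
        return {'predict': self._adaptee.predict(*pargs, **kwargs)}


class DictionaryOutputAdapter:
    """Hypothetical stand-in: the adaptee already returns a dict, so forward it."""

    def __init__(self, adaptee):
        self._adaptee = adaptee

    def predict(self, *pargs, **kwargs):
        return self._adaptee.predict(*pargs, **kwargs)


class Doubler:
    """Toy block with a single output."""

    def predict(self, X):
        return [2 * value for value in X]


class SignSplitter:
    """Toy block with two named outputs."""

    def predict(self, X):
        return {'negative': [v for v in X if v < 0],
                'non_negative': [v for v in X if v >= 0]}


single = SingleOutputAdapter(Doubler()).predict(X=[1, -2, 3])
multiple = DictionaryOutputAdapter(SignSplitter()).predict(X=[1, -2, 3])
print(sorted(single.keys()))
print(sorted(multiple.keys()))
```

In both cases the caller receives a dictionary, which is what lets later steps refer to outputs by name.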

pipegraph uses the wrap_adaptee_in_process(adaptee, adapter_class=None) function to wrap the objects passed to the steps parameter of the PipeGraph constructor, according to these rules:

• If the adapter_class parameter is passed, this class is used as the adapter.
• Else, if the adaptee's class is a key in the pipegraph.base.strategies_for_custom_adaptees dictionary, the class stored as its value is used.
• Else, if the adaptee has a predict method, the AdapterForFitPredictAdaptee class is used.
• Else, if the adaptee has a transform method, the AdapterForFitTransformAdaptee class is used.
• Else, if the adaptee has a fit_predict method, the AdapterForAtomicFitPredictAdaptee class is used.

Thanks to Python's readability, the same rules can be seen just as clearly in the source code:

In [70]:
import inspect
from pipegraph.base import wrap_adaptee_in_process
print(inspect.getsource(wrap_adaptee_in_process))

def wrap_adaptee_in_process(adaptee, adapter_class=None):
    """
    This function wraps the objects defined in Pipegraph's steps parameters in order to provide a common interface for them all.
    This interface declares two main methods: fit and predict. So, no matter whether the adaptee is capable of doing
    predict, transform or fit_predict, once wrapped the adapter uses predict as method for producing output.

    Parameters:
    -----------
    adaptee: a Scikit-Learn object, for instance; or a user made custom estimator may be.
        The object to be wrapped.
    adapter_class:
        The wrapper.

    Returns:
    -------
    An object wrapped into a first adapter layer that provides a common fit and predict interface and then wrapped
    again in a second external layer using the Process class. Besides of being used by PipeGraph itself,
    the user can find this function useful for inserting a user made block as one of the steps
    in PipeGraph step's parameter.
    """
    # ...
    else:
        # ...

    process = Process(strategy)
    return process
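
The selection order described by the rules above can be sketched as a chain of checks. This is a self-contained illustration with stand-in classes that merely reuse the adapter names from the text; it is not pipegraph's code, the function name wrap_sketch is invented, and the final else branch is an assumption, since the text does not say what happens to an object that matches none of the rules:

```python
class _Adapter:
    """Minimal stand-in for an adapter: it just remembers the adaptee."""

    def __init__(self, adaptee):
        self.adaptee = adaptee


class AdapterForFitPredictAdaptee(_Adapter):
    pass


class AdapterForFitTransformAdaptee(_Adapter):
    pass


class AdapterForAtomicFitPredictAdaptee(_Adapter):
    pass


class Process:
    """Stand-in for the outer layer; here it only stores the chosen adapter."""

    def __init__(self, strategy):
        self.strategy = strategy


# Maps adaptee classes to adapter classes; empty in this sketch.
strategies_for_custom_adaptees = {}


def wrap_sketch(adaptee, adapter_class=None):
    if adapter_class is not None:
        strategy = adapter_class(adaptee)
    elif type(adaptee) in strategies_for_custom_adaptees:
        strategy = strategies_for_custom_adaptees[type(adaptee)](adaptee)
    elif hasattr(adaptee, 'predict'):
        strategy = AdapterForFitPredictAdaptee(adaptee)
    elif hasattr(adaptee, 'transform'):
        strategy = AdapterForFitTransformAdaptee(adaptee)
    elif hasattr(adaptee, 'fit_predict'):
        strategy = AdapterForAtomicFitPredictAdaptee(adaptee)
    else:
        # Assumption: the text does not describe this case.
        raise ValueError('no adapter known for this object')
    return Process(strategy)


class OnlyTransform:
    """Toy object offering transform but not predict, like a scaler."""

    def transform(self, X):
        return X


print(type(wrap_sketch(OnlyTransform()).strategy).__name__)
```

Because the explicit class is checked first, a caller can always override the automatic choice, which is the option used for custom blocks later in this guide.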



Let's wrap the scaler object defined previously and see how it responds to the predict method, which in this case acts as a synonym for transform:

In [71]:
wrapped_scaler = wrap_adaptee_in_process(scaler)
wrapped_scaler.predict(X)

Out[71]:
{'predict': array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
[ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
[ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
[ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
[ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
[ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.30555556,  0.70833333,  0.08474576,  0.04166667],
[ 0.13888889,  0.58333333,  0.10169492,  0.04166667],
[ 0.13888889,  0.41666667,  0.06779661,  0.        ],
[ 0.        ,  0.41666667,  0.01694915,  0.        ],
[ 0.41666667,  0.83333333,  0.03389831,  0.04166667],
[ 0.38888889,  1.        ,  0.08474576,  0.125     ],
[ 0.30555556,  0.79166667,  0.05084746,  0.125     ],
[ 0.22222222,  0.625     ,  0.06779661,  0.08333333],
[ 0.38888889,  0.75      ,  0.11864407,  0.08333333],
[ 0.22222222,  0.75      ,  0.08474576,  0.08333333],
[ 0.30555556,  0.58333333,  0.11864407,  0.04166667],
[ 0.22222222,  0.70833333,  0.08474576,  0.125     ],
[ 0.08333333,  0.66666667,  0.        ,  0.04166667],
[ 0.22222222,  0.54166667,  0.11864407,  0.16666667],
[ 0.13888889,  0.58333333,  0.15254237,  0.04166667],
[ 0.19444444,  0.41666667,  0.10169492,  0.04166667],
[ 0.19444444,  0.58333333,  0.10169492,  0.125     ],
[ 0.25      ,  0.625     ,  0.08474576,  0.04166667],
[ 0.25      ,  0.58333333,  0.06779661,  0.04166667],
[ 0.11111111,  0.5       ,  0.10169492,  0.04166667],
[ 0.13888889,  0.45833333,  0.10169492,  0.04166667],
[ 0.30555556,  0.58333333,  0.08474576,  0.125     ],
[ 0.25      ,  0.875     ,  0.08474576,  0.        ],
[ 0.33333333,  0.91666667,  0.06779661,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.19444444,  0.5       ,  0.03389831,  0.04166667],
[ 0.33333333,  0.625     ,  0.05084746,  0.04166667],
[ 0.16666667,  0.45833333,  0.08474576,  0.        ],
[ 0.02777778,  0.41666667,  0.05084746,  0.04166667],
[ 0.22222222,  0.58333333,  0.08474576,  0.04166667],
[ 0.19444444,  0.625     ,  0.05084746,  0.08333333],
[ 0.05555556,  0.125     ,  0.05084746,  0.08333333],
[ 0.02777778,  0.5       ,  0.05084746,  0.04166667],
[ 0.19444444,  0.625     ,  0.10169492,  0.20833333],
[ 0.22222222,  0.75      ,  0.15254237,  0.125     ],
[ 0.13888889,  0.41666667,  0.06779661,  0.08333333],
[ 0.22222222,  0.75      ,  0.10169492,  0.04166667],
[ 0.08333333,  0.5       ,  0.06779661,  0.04166667],
[ 0.27777778,  0.70833333,  0.08474576,  0.04166667],
[ 0.19444444,  0.54166667,  0.06779661,  0.04166667],
[ 0.75      ,  0.5       ,  0.62711864,  0.54166667],
[ 0.58333333,  0.5       ,  0.59322034,  0.58333333],
[ 0.72222222,  0.45833333,  0.66101695,  0.58333333],
[ 0.33333333,  0.125     ,  0.50847458,  0.5       ],
[ 0.61111111,  0.33333333,  0.61016949,  0.58333333],
[ 0.38888889,  0.33333333,  0.59322034,  0.5       ],
[ 0.55555556,  0.54166667,  0.62711864,  0.625     ],
[ 0.16666667,  0.16666667,  0.38983051,  0.375     ],
[ 0.63888889,  0.375     ,  0.61016949,  0.5       ],
[ 0.25      ,  0.29166667,  0.49152542,  0.54166667],
[ 0.19444444,  0.        ,  0.42372881,  0.375     ],
[ 0.44444444,  0.41666667,  0.54237288,  0.58333333],
[ 0.47222222,  0.08333333,  0.50847458,  0.375     ],
[ 0.5       ,  0.375     ,  0.62711864,  0.54166667],
[ 0.36111111,  0.375     ,  0.44067797,  0.5       ],
[ 0.66666667,  0.45833333,  0.57627119,  0.54166667],
[ 0.36111111,  0.41666667,  0.59322034,  0.58333333],
[ 0.41666667,  0.29166667,  0.52542373,  0.375     ],
[ 0.52777778,  0.08333333,  0.59322034,  0.58333333],
[ 0.36111111,  0.20833333,  0.49152542,  0.41666667],
[ 0.44444444,  0.5       ,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.33333333,  0.50847458,  0.5       ],
[ 0.55555556,  0.20833333,  0.66101695,  0.58333333],
[ 0.5       ,  0.33333333,  0.62711864,  0.45833333],
[ 0.58333333,  0.375     ,  0.55932203,  0.5       ],
[ 0.63888889,  0.41666667,  0.57627119,  0.54166667],
[ 0.69444444,  0.33333333,  0.6440678 ,  0.54166667],
[ 0.66666667,  0.41666667,  0.6779661 ,  0.66666667],
[ 0.47222222,  0.375     ,  0.59322034,  0.58333333],
[ 0.38888889,  0.25      ,  0.42372881,  0.375     ],
[ 0.33333333,  0.16666667,  0.47457627,  0.41666667],
[ 0.33333333,  0.16666667,  0.45762712,  0.375     ],
[ 0.41666667,  0.29166667,  0.49152542,  0.45833333],
[ 0.47222222,  0.29166667,  0.69491525,  0.625     ],
[ 0.30555556,  0.41666667,  0.59322034,  0.58333333],
[ 0.47222222,  0.58333333,  0.59322034,  0.625     ],
[ 0.66666667,  0.45833333,  0.62711864,  0.58333333],
[ 0.55555556,  0.125     ,  0.57627119,  0.5       ],
[ 0.36111111,  0.41666667,  0.52542373,  0.5       ],
[ 0.33333333,  0.20833333,  0.50847458,  0.5       ],
[ 0.33333333,  0.25      ,  0.57627119,  0.45833333],
[ 0.5       ,  0.41666667,  0.61016949,  0.54166667],
[ 0.41666667,  0.25      ,  0.50847458,  0.45833333],
[ 0.19444444,  0.125     ,  0.38983051,  0.375     ],
[ 0.36111111,  0.29166667,  0.54237288,  0.5       ],
[ 0.38888889,  0.41666667,  0.54237288,  0.45833333],
[ 0.38888889,  0.375     ,  0.54237288,  0.5       ],
[ 0.52777778,  0.375     ,  0.55932203,  0.5       ],
[ 0.22222222,  0.20833333,  0.33898305,  0.41666667],
[ 0.38888889,  0.33333333,  0.52542373,  0.5       ],
[ 0.55555556,  0.54166667,  0.84745763,  1.        ],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.77777778,  0.41666667,  0.83050847,  0.83333333],
[ 0.55555556,  0.375     ,  0.77966102,  0.70833333],
[ 0.61111111,  0.41666667,  0.81355932,  0.875     ],
[ 0.91666667,  0.41666667,  0.94915254,  0.83333333],
[ 0.16666667,  0.20833333,  0.59322034,  0.66666667],
[ 0.83333333,  0.375     ,  0.89830508,  0.70833333],
[ 0.66666667,  0.20833333,  0.81355932,  0.70833333],
[ 0.80555556,  0.66666667,  0.86440678,  1.        ],
[ 0.61111111,  0.5       ,  0.69491525,  0.79166667],
[ 0.58333333,  0.29166667,  0.72881356,  0.75      ],
[ 0.69444444,  0.41666667,  0.76271186,  0.83333333],
[ 0.38888889,  0.20833333,  0.6779661 ,  0.79166667],
[ 0.41666667,  0.33333333,  0.69491525,  0.95833333],
[ 0.58333333,  0.5       ,  0.72881356,  0.91666667],
[ 0.61111111,  0.41666667,  0.76271186,  0.70833333],
[ 0.94444444,  0.75      ,  0.96610169,  0.875     ],
[ 0.94444444,  0.25      ,  1.        ,  0.91666667],
[ 0.47222222,  0.08333333,  0.6779661 ,  0.58333333],
[ 0.72222222,  0.5       ,  0.79661017,  0.91666667],
[ 0.36111111,  0.33333333,  0.66101695,  0.79166667],
[ 0.94444444,  0.33333333,  0.96610169,  0.79166667],
[ 0.55555556,  0.29166667,  0.66101695,  0.70833333],
[ 0.66666667,  0.54166667,  0.79661017,  0.83333333],
[ 0.80555556,  0.5       ,  0.84745763,  0.70833333],
[ 0.52777778,  0.33333333,  0.6440678 ,  0.70833333],
[ 0.5       ,  0.41666667,  0.66101695,  0.70833333],
[ 0.58333333,  0.33333333,  0.77966102,  0.83333333],
[ 0.80555556,  0.41666667,  0.81355932,  0.625     ],
[ 0.86111111,  0.33333333,  0.86440678,  0.75      ],
[ 1.        ,  0.75      ,  0.91525424,  0.79166667],
[ 0.58333333,  0.33333333,  0.77966102,  0.875     ],
[ 0.55555556,  0.33333333,  0.69491525,  0.58333333],
[ 0.5       ,  0.25      ,  0.77966102,  0.54166667],
[ 0.94444444,  0.41666667,  0.86440678,  0.91666667],
[ 0.55555556,  0.58333333,  0.77966102,  0.95833333],
[ 0.58333333,  0.45833333,  0.76271186,  0.70833333],
[ 0.47222222,  0.41666667,  0.6440678 ,  0.70833333],
[ 0.72222222,  0.45833333,  0.74576271,  0.83333333],
[ 0.66666667,  0.45833333,  0.77966102,  0.95833333],
[ 0.72222222,  0.45833333,  0.69491525,  0.91666667],
[ 0.41666667,  0.29166667,  0.69491525,  0.75      ],
[ 0.69444444,  0.5       ,  0.83050847,  0.91666667],
[ 0.66666667,  0.54166667,  0.79661017,  1.        ],
[ 0.66666667,  0.41666667,  0.71186441,  0.91666667],
[ 0.55555556,  0.20833333,  0.6779661 ,  0.75      ],
[ 0.61111111,  0.41666667,  0.71186441,  0.79166667],
[ 0.52777778,  0.58333333,  0.74576271,  0.91666667],
[ 0.44444444,  0.41666667,  0.69491525,  0.70833333]])}

As another example, we can also wrap the demultiplexer object previously defined:

In [72]:
wrapped_demultiplexer = wrap_adaptee_in_process(demultiplexer)
wrapped_demultiplexer.predict(X=X, selection=y)

Out[72]:
{'X_0':       0    1    2    3
0   5.1  3.5  1.4  0.2
1   4.9  3.0  1.4  0.2
2   4.7  3.2  1.3  0.2
3   4.6  3.1  1.5  0.2
4   5.0  3.6  1.4  0.2
5   5.4  3.9  1.7  0.4
6   4.6  3.4  1.4  0.3
7   5.0  3.4  1.5  0.2
8   4.4  2.9  1.4  0.2
9   4.9  3.1  1.5  0.1
10  5.4  3.7  1.5  0.2
11  4.8  3.4  1.6  0.2
12  4.8  3.0  1.4  0.1
13  4.3  3.0  1.1  0.1
14  5.8  4.0  1.2  0.2
15  5.7  4.4  1.5  0.4
16  5.4  3.9  1.3  0.4
17  5.1  3.5  1.4  0.3
18  5.7  3.8  1.7  0.3
19  5.1  3.8  1.5  0.3
20  5.4  3.4  1.7  0.2
21  5.1  3.7  1.5  0.4
22  4.6  3.6  1.0  0.2
23  5.1  3.3  1.7  0.5
24  4.8  3.4  1.9  0.2
25  5.0  3.0  1.6  0.2
26  5.0  3.4  1.6  0.4
27  5.2  3.5  1.5  0.2
28  5.2  3.4  1.4  0.2
29  4.7  3.2  1.6  0.2
30  4.8  3.1  1.6  0.2
31  5.4  3.4  1.5  0.4
32  5.2  4.1  1.5  0.1
33  5.5  4.2  1.4  0.2
34  4.9  3.1  1.5  0.1
35  5.0  3.2  1.2  0.2
36  5.5  3.5  1.3  0.2
37  4.9  3.1  1.5  0.1
38  4.4  3.0  1.3  0.2
39  5.1  3.4  1.5  0.2
40  5.0  3.5  1.3  0.3
41  4.5  2.3  1.3  0.3
42  4.4  3.2  1.3  0.2
43  5.0  3.5  1.6  0.6
44  5.1  3.8  1.9  0.4
45  4.8  3.0  1.4  0.3
46  5.1  3.8  1.6  0.2
47  4.6  3.2  1.4  0.2
48  5.3  3.7  1.5  0.2
49  5.0  3.3  1.4  0.2, 'X_1':       0    1    2    3
50  7.0  3.2  4.7  1.4
51  6.4  3.2  4.5  1.5
52  6.9  3.1  4.9  1.5
53  5.5  2.3  4.0  1.3
54  6.5  2.8  4.6  1.5
55  5.7  2.8  4.5  1.3
56  6.3  3.3  4.7  1.6
57  4.9  2.4  3.3  1.0
58  6.6  2.9  4.6  1.3
59  5.2  2.7  3.9  1.4
60  5.0  2.0  3.5  1.0
61  5.9  3.0  4.2  1.5
62  6.0  2.2  4.0  1.0
63  6.1  2.9  4.7  1.4
64  5.6  2.9  3.6  1.3
65  6.7  3.1  4.4  1.4
66  5.6  3.0  4.5  1.5
67  5.8  2.7  4.1  1.0
68  6.2  2.2  4.5  1.5
69  5.6  2.5  3.9  1.1
70  5.9  3.2  4.8  1.8
71  6.1  2.8  4.0  1.3
72  6.3  2.5  4.9  1.5
73  6.1  2.8  4.7  1.2
74  6.4  2.9  4.3  1.3
75  6.6  3.0  4.4  1.4
76  6.8  2.8  4.8  1.4
77  6.7  3.0  5.0  1.7
78  6.0  2.9  4.5  1.5
79  5.7  2.6  3.5  1.0
80  5.5  2.4  3.8  1.1
81  5.5  2.4  3.7  1.0
82  5.8  2.7  3.9  1.2
83  6.0  2.7  5.1  1.6
84  5.4  3.0  4.5  1.5
85  6.0  3.4  4.5  1.6
86  6.7  3.1  4.7  1.5
87  6.3  2.3  4.4  1.3
88  5.6  3.0  4.1  1.3
89  5.5  2.5  4.0  1.3
90  5.5  2.6  4.4  1.2
91  6.1  3.0  4.6  1.4
92  5.8  2.6  4.0  1.2
93  5.0  2.3  3.3  1.0
94  5.6  2.7  4.2  1.3
95  5.7  3.0  4.2  1.2
96  5.7  2.9  4.2  1.3
97  6.2  2.9  4.3  1.3
98  5.1  2.5  3.0  1.1
99  5.7  2.8  4.1  1.3, 'X_2':        0    1    2    3
100  6.3  3.3  6.0  2.5
101  5.8  2.7  5.1  1.9
102  7.1  3.0  5.9  2.1
103  6.3  2.9  5.6  1.8
104  6.5  3.0  5.8  2.2
105  7.6  3.0  6.6  2.1
106  4.9  2.5  4.5  1.7
107  7.3  2.9  6.3  1.8
108  6.7  2.5  5.8  1.8
109  7.2  3.6  6.1  2.5
110  6.5  3.2  5.1  2.0
111  6.4  2.7  5.3  1.9
112  6.8  3.0  5.5  2.1
113  5.7  2.5  5.0  2.0
114  5.8  2.8  5.1  2.4
115  6.4  3.2  5.3  2.3
116  6.5  3.0  5.5  1.8
117  7.7  3.8  6.7  2.2
118  7.7  2.6  6.9  2.3
119  6.0  2.2  5.0  1.5
120  6.9  3.2  5.7  2.3
121  5.6  2.8  4.9  2.0
122  7.7  2.8  6.7  2.0
123  6.3  2.7  4.9  1.8
124  6.7  3.3  5.7  2.1
125  7.2  3.2  6.0  1.8
126  6.2  2.8  4.8  1.8
127  6.1  3.0  4.9  1.8
128  6.4  2.8  5.6  2.1
129  7.2  3.0  5.8  1.6
130  7.4  2.8  6.1  1.9
131  7.9  3.8  6.4  2.0
132  6.4  2.8  5.6  2.2
133  6.3  2.8  5.1  1.5
134  6.1  2.6  5.6  1.4
135  7.7  3.0  6.1  2.3
136  6.3  3.4  5.6  2.4
137  6.4  3.1  5.5  1.8
138  6.0  3.0  4.8  1.8
139  6.9  3.1  5.4  2.1
140  6.7  3.1  5.6  2.4
141  6.9  3.1  5.1  2.3
142  5.8  2.7  5.1  1.9
143  6.8  3.2  5.9  2.3
144  6.7  3.3  5.7  2.5
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8}

Users implementing their own custom blocks may find it useful to pass their own adapter class to wrap_adaptee_in_process, as in:

In [73]:
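from pipegraph.adapters import AdapterForCustomFitPredictWithDictionaryOutputAdaptee

wrapped_demultiplexer = wrap_adaptee_in_process(adaptee=demultiplexer,
                                                adapter_class=AdapterForCustomFitPredictWithDictionaryOutputAdaptee)
wrapped_demultiplexer.predict(X=X, selection=y)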

Out[73]:
{'X_0':       0    1    2    3
0   5.1  3.5  1.4  0.2
1   4.9  3.0  1.4  0.2
2   4.7  3.2  1.3  0.2
3   4.6  3.1  1.5  0.2
4   5.0  3.6  1.4  0.2
5   5.4  3.9  1.7  0.4
6   4.6  3.4  1.4  0.3
7   5.0  3.4  1.5  0.2
8   4.4  2.9  1.4  0.2
9   4.9  3.1  1.5  0.1
10  5.4  3.7  1.5  0.2
11  4.8  3.4  1.6  0.2
12  4.8  3.0  1.4  0.1
13  4.3  3.0  1.1  0.1
14  5.8  4.0  1.2  0.2
15  5.7  4.4  1.5  0.4
16  5.4  3.9  1.3  0.4
17  5.1  3.5  1.4  0.3
18  5.7  3.8  1.7  0.3
19  5.1  3.8  1.5  0.3
20  5.4  3.4  1.7  0.2
21  5.1  3.7  1.5  0.4
22  4.6  3.6  1.0  0.2
23  5.1  3.3  1.7  0.5
24  4.8  3.4  1.9  0.2
25  5.0  3.0  1.6  0.2
26  5.0  3.4  1.6  0.4
27  5.2  3.5  1.5  0.2
28  5.2  3.4  1.4  0.2
29  4.7  3.2  1.6  0.2
30  4.8  3.1  1.6  0.2
31  5.4  3.4  1.5  0.4
32  5.2  4.1  1.5  0.1
33  5.5  4.2  1.4  0.2
34  4.9  3.1  1.5  0.1
35  5.0  3.2  1.2  0.2
36  5.5  3.5  1.3  0.2
37  4.9  3.1  1.5  0.1
38  4.4  3.0  1.3  0.2
39  5.1  3.4  1.5  0.2
40  5.0  3.5  1.3  0.3
41  4.5  2.3  1.3  0.3
42  4.4  3.2  1.3  0.2
43  5.0  3.5  1.6  0.6
44  5.1  3.8  1.9  0.4
45  4.8  3.0  1.4  0.3
46  5.1  3.8  1.6  0.2
47  4.6  3.2  1.4  0.2
48  5.3  3.7  1.5  0.2
49  5.0  3.3  1.4  0.2, 'X_1':       0    1    2    3
50  7.0  3.2  4.7  1.4
51  6.4  3.2  4.5  1.5
52  6.9  3.1  4.9  1.5
53  5.5  2.3  4.0  1.3
54  6.5  2.8  4.6  1.5
55  5.7  2.8  4.5  1.3
56  6.3  3.3  4.7  1.6
57  4.9  2.4  3.3  1.0
58  6.6  2.9  4.6  1.3
59  5.2  2.7  3.9  1.4
60  5.0  2.0  3.5  1.0
61  5.9  3.0  4.2  1.5
62  6.0  2.2  4.0  1.0
63  6.1  2.9  4.7  1.4
64  5.6  2.9  3.6  1.3
65  6.7  3.1  4.4  1.4
66  5.6  3.0  4.5  1.5
67  5.8  2.7  4.1  1.0
68  6.2  2.2  4.5  1.5
69  5.6  2.5  3.9  1.1
70  5.9  3.2  4.8  1.8
71  6.1  2.8  4.0  1.3
72  6.3  2.5  4.9  1.5
73  6.1  2.8  4.7  1.2
74  6.4  2.9  4.3  1.3
75  6.6  3.0  4.4  1.4
76  6.8  2.8  4.8  1.4
77  6.7  3.0  5.0  1.7
78  6.0  2.9  4.5  1.5
79  5.7  2.6  3.5  1.0
80  5.5  2.4  3.8  1.1
81  5.5  2.4  3.7  1.0
82  5.8  2.7  3.9  1.2
83  6.0  2.7  5.1  1.6
84  5.4  3.0  4.5  1.5
85  6.0  3.4  4.5  1.6
86  6.7  3.1  4.7  1.5
87  6.3  2.3  4.4  1.3
88  5.6  3.0  4.1  1.3
89  5.5  2.5  4.0  1.3
90  5.5  2.6  4.4  1.2
91  6.1  3.0  4.6  1.4
92  5.8  2.6  4.0  1.2
93  5.0  2.3  3.3  1.0
94  5.6  2.7  4.2  1.3
95  5.7  3.0  4.2  1.2
96  5.7  2.9  4.2  1.3
97  6.2  2.9  4.3  1.3
98  5.1  2.5  3.0  1.1
99  5.7  2.8  4.1  1.3, 'X_2':        0    1    2    3
100  6.3  3.3  6.0  2.5
101  5.8  2.7  5.1  1.9
102  7.1  3.0  5.9  2.1
103  6.3  2.9  5.6  1.8
104  6.5  3.0  5.8  2.2
105  7.6  3.0  6.6  2.1
106  4.9  2.5  4.5  1.7
107  7.3  2.9  6.3  1.8
108  6.7  2.5  5.8  1.8
109  7.2  3.6  6.1  2.5
110  6.5  3.2  5.1  2.0
111  6.4  2.7  5.3  1.9
112  6.8  3.0  5.5  2.1
113  5.7  2.5  5.0  2.0
114  5.8  2.8  5.1  2.4
115  6.4  3.2  5.3  2.3
116  6.5  3.0  5.5  1.8
117  7.7  3.8  6.7  2.2
118  7.7  2.6  6.9  2.3
119  6.0  2.2  5.0  1.5
120  6.9  3.2  5.7  2.3
121  5.6  2.8  4.9  2.0
122  7.7  2.8  6.7  2.0
123  6.3  2.7  4.9  1.8
124  6.7  3.3  5.7  2.1
125  7.2  3.2  6.0  1.8
126  6.2  2.8  4.8  1.8
127  6.1  3.0  4.9  1.8
128  6.4  2.8  5.6  2.1
129  7.2  3.0  5.8  1.6
130  7.4  2.8  6.1  1.9
131  7.9  3.8  6.4  2.0
132  6.4  2.8  5.6  2.2
133  6.3  2.8  5.1  1.5
134  6.1  2.6  5.6  1.4
135  7.7  3.0  6.1  2.3
136  6.3  3.4  5.6  2.4
137  6.4  3.1  5.5  1.8
138  6.0  3.0  4.8  1.8
139  6.9  3.1  5.4  2.1
140  6.7  3.1  5.6  2.4
141  6.9  3.1  5.1  2.3
142  5.8  2.7  5.1  1.9
143  6.8  3.2  5.9  2.3
144  6.7  3.3  5.7  2.5
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8}

Passing an already wrapped object to the steps parameter of PipeGraph's constructor, by using wrap_adaptee_in_process as described above, may be useful for custom blocks built by users, as it avoids the temptation of modifying the pipegraph.base.strategies_for_custom_adaptees dictionary.

## Writing custom objects¶

Custom blocks are easy to write. Using sklearn.base.BaseEstimator as the parent class, the user will normally need to perform these three operations:

• Write an __init__ method following scikit-learn's rules: each configuration parameter given to __init__ is stored in an attribute with exactly the same name. If no configuration parameters are needed, this step can be omitted.

• Write a fit method. For a transformer-like object this method can simply return self, which allows chaining calls.

• Write a predict method in charge of performing the transformation on the inputs passed through the *pargs and **kwargs arguments.

When these objects behave like normal scikit-learn objects, they can be used seamlessly as steps in pipegraph without further modification. If, instead, they provide multiple output variables stored in a dictionary, they have to be adapted by using pipegraph.adapters.AdapterForCustomFitPredictWithDictionaryOutputAdaptee, as explained before.
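
The three operations can be seen together in a minimal sketch. The class name CustomOffset and its offset parameter are invented for this illustration, and the try/except fallback is only there so that the sketch also runs where scikit-learn is not installed:

```python
try:
    from sklearn.base import BaseEstimator
except ImportError:
    # Fallback so the sketch still runs without scikit-learn.
    BaseEstimator = object


class CustomOffset(BaseEstimator):
    """Hypothetical block that adds a constant offset to every value."""

    def __init__(self, offset=0):
        # Stored under the same name as the __init__ parameter.
        self.offset = offset

    def fit(self, *pargs, **kwargs):
        # Nothing to learn; return self to allow chained calls.
        return self

    def predict(self, X):
        return [value + self.offset for value in X]


block = CustomOffset(offset=2)
print(block.fit().predict([1, 2, 3]))
```

Keeping the attribute name identical to the parameter name is what allows scikit-learn utilities to read and set the parameter by name.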

Let's see the source code of several of the custom objects mentioned in this user guide.

### Example: CustomPower¶

In [74]:
from pipegraph.demo_blocks import CustomPower
print(inspect.getsource(CustomPower))

class CustomPower(BaseEstimator):
    """ Raises X data to power defined such as range as parameter
    Parameters
    ----------
    power : range of integers for the powering operation
    """

    def __init__(self, power=1):
        self.power = power

    def fit(self):
        """"
        Returns
        -------
        self : returns an instance of _CustomPower.
        """
        return self

    def predict(self, X):
        """"
        Parameters
        ----------
        X: iterable
            Data to power.
        Returns
        -------
        result of raising power operation
        """
        return X.values.reshape(-1, ) ** self.power



### Example: Concatenator¶

In [75]:
from pipegraph.base import Concatenator
print(inspect.getsource(Concatenator))

class Concatenator(BaseEstimator):
    """
    Concatenate a set of data
    """

    def fit(self):
        """Fit method that does, in this case, nothing but returning self.
        Returns
        -------
        self : returns an instance of _Concatenator.
        """
        return self

    def predict(self, **kwargs):
        """Check the input data type for correct concatenating.

        Parameters
        ----------
        **kwargs :  sequence of indexables with same length / shape[0]
            Allowed inputs are lists, numpy arrays, scipy-sparse
            matrices or pandas dataframes.
            Data to concatenate.
        Returns
        -------
        Pandas series or Pandas DataFrame with the data input concatenated
        """
        df_list = []
        for name in sorted(kwargs.keys()):
            item = kwargs[name]
            if isinstance(item, pd.Series) or isinstance(item, pd.DataFrame):
                df_list.append(item)
            else:
                df_list.append(pd.DataFrame(data=item))
        return pd.concat(df_list, axis=1)



### Example: Demultiplexer¶

In [76]:
from pipegraph.base import Demultiplexer
print(inspect.getsource(Demultiplexer))

class Demultiplexer(BaseEstimator):
    """ Slice data-input data in columns
    Parameters
    ----------
    mapping : list. Each element contains data-column and the new name assigned
    """

    def fit(self):
        return self

    def predict(self, **kwargs):
        selection = kwargs.pop('selection')
        result = dict()
        for variable, value in kwargs.items():
            for class_number in set(selection):
                if value is None:
                    result[variable + "_" + str(class_number)] = None
                else:
                    result[variable + "_" + str(class_number)] = pd.DataFrame(value).loc[selection == class_number, :]
        return result



### Example: Multiplexer¶

In [77]:
from pipegraph.base import Multiplexer
print(inspect.getsource(Multiplexer))

class Multiplexer(BaseEstimator):
    def fit(self):
        return self

    def predict(self, **kwargs):
        # list of arrays with the same dimension
        selection = kwargs['selection']
        array_list = [pd.DataFrame(data=kwargs[str(class_number)],
                                   index=np.flatnonzero(selection == class_number))
                      for class_number in set(selection)]
        result = pd.concat(array_list, axis=0).sort_index()
        return result
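
The Demultiplexer and Multiplexer listings are complementary: the first splits rows according to the value of selection, and the second puts per-class results back in the original row order (through the index and sort_index). A rough pure-Python sketch of that round trip, using lists instead of pandas objects; the function names are illustrative and not part of pipegraph:

```python
def demultiplex(rows, selection):
    """Group rows by label, remembering each row's original position."""
    groups = {}
    for position, (row, label) in enumerate(zip(rows, selection)):
        groups.setdefault(label, []).append((position, row))
    return groups


def multiplex(groups):
    """Merge the groups back and restore the original row order."""
    merged = [pair for group in groups.values() for pair in group]
    return [row for _, row in sorted(merged, key=lambda pair: pair[0])]


rows = ['a', 'b', 'c', 'd', 'e']
selection = [1, 0, 1, 2, 0]

groups = demultiplex(rows, selection)
restored = multiplex(groups)
print(groups[0])
print(restored)
```

Carrying the original position along with each row plays the role that the DataFrame index plays in the pandas-based classes.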



## pipegraph main interface: The PipeGraphRegressor and PipeGraphClassifier classes¶

pipegraph provides the user with two main classes: PipeGraphRegressor and PipeGraphClassifier. Both offer a familiar interface to the raw PipeGraph class, which most users will not need to use directly. The PipeGraph class is more versatile, allowing an arbitrary number of inputs and outputs, and can serve as the base class for users facing applications with such special needs. Most users, though, will be well served by the two former classes, which form the main interface of the library.

As the names imply, PipeGraphRegressor is the class to use for regression models and PipeGraphClassifier is intended for classification problems. Indeed, the only difference between the two classes is the default scoring function, chosen according to scikit-learn's defaults for each case. Apart from that, both classes share the same code. Note, though, that either class can comprise many different regressors or classifiers; it is the final step that determines whether we are defining a classification or a regression problem.
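
In scikit-learn, the default score of a regressor is the coefficient of determination (R²) and the default score of a classifier is the mean accuracy. Assuming these are the defaults referred to here, the two measures can be sketched in plain Python as follows; the function names are chosen for this illustration only:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


perfect_fit = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
three_of_four = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(perfect_fit)
print(three_of_four)
```

A regression score compares continuous values, while a classification score compares labels, which is why the final step decides which of the two classes is appropriate.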

## From a single path workflow to a graph with multiple paths: Understanding connections¶

These two classes provide an interface as similar to scikit-learn's Pipeline as possible, to ease their use for those already familiar with scikit-learn. There is a slight but important difference that empowers them: the PipeGraph-related classes accept extra information about which input variables each step needs, thus allowing workflows with multiple paths.

To clarify the usage of these connections, let's start with a simple pipegraph example that could equally well be expressed using scikit-learn's Pipeline. In this case, the data is transformed using a MinMaxScaler transformer and the preprocessed data is fed to a LinearRegression model. Figure 1 shows the steps of this PipeGraphRegressor and the connections between them: which input variables each step accepts and where they come from. A variable can be provided by a previous step, like the output of scaler, named predict, which is used as linear_model's X variable; or it can be passed by the user in the fit or predict method calls, like y, which is not calculated by any previous block.

Figure 1. PipeGraph diagram showing the steps and their connections

In this first simple example of pipegraph the last step is a regressor, so PipeGraphRegressor is the appropriate class to choose. Other than that, we define the steps as usual for a standard Pipeline: as a list of tuples (label, sklearn object). We are not yet providing any information about the connections, in which case the PipeGraphRegressor object is built assuming that the steps follow a linear workflow, in the same way as a standard Pipeline.

In [78]:
from pipegraph import PipeGraphRegressor

X = 2*np.random.rand(100,1)-1
y = 40 * X**5 + 3*X*2 +  3*X + 3*np.random.randn(100,1)

scaler = MinMaxScaler()
linear_model = LinearRegression()
steps = [('scaler', scaler),
('linear_model', linear_model)]

pgraph = PipeGraphRegressor(steps=steps)
pgraph.fit(X, y)

Out[78]:
PipeGraphRegressor(fit_connections={'scaler': {'X': 'X'}, 'linear_model': {'X': ('scaler', 'predict'), 'y': 'y'}},
log_level=None,
predict_connections={'scaler': {'X': 'X'}, 'linear_model': {'X': ('scaler', 'predict'), 'y': 'y'}},
steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('linear_model', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))])

As the printed output shows, the internal links displayed by the fit_connections and predict_connections parameters are in line with those we saw in Figure 1 and with those expected for a single-path pipeline. As we did not specify these values, they were created by the PipeGraphRegressor.__init__() method as a convenience. We can look at them by inspecting the attribute values directly. As PipeGraphRegressor and PipeGraphClassifier are wrappers of a PipeGraph object stored in the _pipegraph attribute, we have to dig a bit deeper to find fit_connections:

In [79]:
pgraph._pipegraph.fit_connections

Out[79]:
{'linear_model': {'X': ('scaler', 'predict'), 'y': 'y'}, 'scaler': {'X': 'X'}}
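
To make the default behaviour concrete, here is a rough sketch of how single-path connections like the ones printed above could be derived from an ordered list of step labels. The function linear_connections is hypothetical and only mirrors the two-step output shown here; it is not pipegraph's implementation, and giving y only to the final step is an assumption of the sketch:

```python
def linear_connections(labels):
    """Build single-path connections for an ordered list of step labels."""
    connections = {}
    for position, label in enumerate(labels):
        # The first step reads the external X; every later step reads the
        # 'predict' output of the step right before it.
        if position == 0:
            source = 'X'
        else:
            source = (labels[position - 1], 'predict')
        connections[label] = {'X': source}
    # Assumption of this sketch: only the final step receives the external y.
    connections[labels[-1]]['y'] = 'y'
    return connections


default = linear_connections(['scaler', 'linear_model'])
print(default)
```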

Figure 2 should help in understanding the syntax used by the connections dictionary. It goes like this:

• The keys of the top-level entries of the dictionary must be the same as the labels of the previously defined steps.
• The values associated with these keys define the variables from other steps that are going to be used as inputs for the current step. They are dictionaries themselves, where:
    • The keys of the nested dictionary are the input variables as named at the current step.
    • The values associated with these keys define the steps that hold the desired information and the variables as named at those steps. This information can be written as:
        • A tuple with the label of the step in position 0 followed by the name of the output variable in position 1.
        • A string:
            • If the string is one of the labels of the steps, it is interpreted as a tuple, as above, with the label of the step in position 0 and 'predict' as the name of the output variable in position 1.
            • Otherwise, it is considered to be a variable from an external source, such as those provided by the user when invoking the fit, predict or fit_predict methods.
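
The rules for interpreting a connection value can be summarised in a few lines of Python. This is a sketch of the rules as stated, not pipegraph's code: the function name resolve_source is invented, and using None to mark an external variable is a convention of this example only:

```python
def resolve_source(value, step_labels):
    """Return (step_label, variable) for an internal source, or
    (None, variable) for a variable supplied by the caller."""
    if isinstance(value, tuple):
        # Explicit (step label, output variable) pair.
        return value
    if value in step_labels:
        # A bare step label means that step's 'predict' output.
        return (value, 'predict')
    # Anything else is an external variable such as X or y.
    return (None, value)


step_labels = ['scaler', 'linear_model']
print(resolve_source(('scaler', 'predict'), step_labels))
print(resolve_source('scaler', step_labels))
print(resolve_source('y', step_labels))
```

With this reading, {'X': 'scaler'} and {'X': ('scaler', 'predict')} describe the same connection.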

Figure 2. Illustration of the connections of the PipeGraph

The name 'predict' was chosen for default output variables for convenience, as will be illustrated later on. The developers preferred always using the same word for every block, even though the block might be neither a regressor nor a classifier.

Finally, let's get the predicted values from this PipeGraphRegressor for illustrative purposes:

In [80]:
y_pred = pgraph.predict(X)
plt.scatter(X, y, label='Original Data')

plt.scatter(X, y_pred, label='Predicted Data')
plt.title('Plots of original and predicted data')
plt.legend(loc='best')
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Value of Data')
plt.show()


## GridSearchCV compatibility requirements¶

Both PipeGraphRegressor and PipeGraphClassifier are compatible with GridSearchCV provided the last step can be scored, either:

• by using PipeGraphRegressor or PipeGraphClassifier default scoring functions,
• by implementing a custom scoring function capable of handling that last step inputs and outputs,
• by using a NeutralRegressor or NeutralClassifier block as final step.

Pipegraphs whose last step is a scikit-learn estimator work well with the default scoring functions of PipeGraphRegressor or PipeGraphClassifier. The other two alternatives cover those cases in which a custom block with nonstandard inputs is provided. In that case, choosing a neutral regressor or classifier is usually a much simpler approach than writing a custom scoring function. NeutralRegressor and NeutralClassifier are two classes provided for the user's convenience so that no special scoring function is needed. They just allow the user to pick some variables from previous steps as X and y, thus providing compatibility with a default scoring function.
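
One way to picture such a neutral final block is a step that learns nothing and returns its X input unchanged, so that values computed by an earlier custom step can be compared with y by an ordinary scoring function. The sketch below is a guess at the general idea rather than pipegraph's actual NeutralRegressor; the class name PassThroughRegressor and the error function are invented for the example:

```python
class PassThroughRegressor:
    """Hypothetical final block: it learns nothing and returns X unchanged."""

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return X


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Values computed by some earlier custom step, routed to the final block as X.
upstream_predictions = [1.0, 2.5, 2.0]
y = [1.0, 2.0, 3.0]

final_block = PassThroughRegressor().fit(upstream_predictions, y)
error = mean_absolute_error(y, final_block.predict(upstream_predictions))
print(error)
```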

### Example using default scoring functions¶

We will show more complex examples in what follows, but let's first illustrate with a simple example how to use GridSearchCV with the default scoring functions. Figure 3 shows the steps of the model:

• scaler: a preprocessing step using a MinMaxScaler object,
• polynomial_features: a transformer step that generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified one,
• linear_model: the LinearRegression object we want to fit.

Figure 3. Using a PipeGraphRegressor object as estimator by GridSearchCV

Firstly, we import the necessary libraries and create some artificial data.

In [81]:
from sklearn.preprocessing import PolynomialFeatures

X = 2*np.random.rand(100,1)-1
y = 40 * X**5 + 3*X*2 +  3*X + 3*np.random.randn(100,1)

scaler = MinMaxScaler()
polynomial_features = PolynomialFeatures()
linear_model = LinearRegression()


Secondly, we define the steps and a param_grid dictionary as specified by GridSearchCV. In this case we just want to explore a few possibilities, varying the degree of the polynomials and whether or not to use an intercept in the linear model.

In [82]:
steps = [('scaler', scaler),
('polynomial_features', polynomial_features),
('linear_model', linear_model)]

param_grid = {'polynomial_features__degree': range(1, 11),
'linear_model__fit_intercept': [True, False]}
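
To make the size of this search concrete, the grid can be enumerated by hand. The following is only a sketch of the idea using the standard library; scikit-learn performs this expansion internally and may order the candidates differently. The variable names names and candidates are ours.

```python
from itertools import product

param_grid = {'polynomial_features__degree': range(1, 11),
              'linear_model__fit_intercept': [True, False]}

# Every candidate is one combination of a degree and an intercept setting.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]

print(len(candidates))   # 10 degrees x 2 intercept settings
```

Each of these candidates is fitted and scored with cross-validation, which is why the grid should be kept reasonably small.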


Now, we use PipeGraphRegressor as estimator for GridSearchCV and perform the fit and predict operations. As the last step, a linear regressor from scikit-learn, already works with the default scoring functions, no extra effort is needed to make it compatible with GridSearchCV.

In [83]:
pgraph = PipeGraphRegressor(steps=steps)
grid_search_regressor = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
grid_search_regressor.fit(X, y)
y_pred = grid_search_regressor.predict(X)

plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()

coef = grid_search_regressor.best_estimator_.get_params()['linear_model'].coef_
degree = grid_search_regressor.best_estimator_.get_params()['polynomial_features'].degree

print('Information about the parameters of the best estimator: \n degree: {} \n coefficients: {} '.format(degree, coef))

Information about the parameters of the best estimator:
degree: 5
coefficients: [[    0.           380.45666108 -1507.75777671  3160.65259179
-3282.41255035  1335.17757867]]


This example showed how to use GridSearchCV with PipeGraphRegressor in a simple single path workflow with default scoring functions. Let's explore a more complex example in the next section.

## Multiple path workflow examples¶

Until now, all the examples we showed displayed a single path sequence of steps, and thus they could have been done just as easily using scikit-learn's standard Pipeline. In the following examples we show multiple path cases, illustrating some compatibility constraints that occur and how to deal with them successfully.

### Example: Injecting a varying vector in the sample_weight parameter of LinearRegression¶

This example illustrates the case in which a varying vector is injected into a linear regression model as sample_weight, so that several candidate weightings can be evaluated and the one that generates the best results can be selected.

The steps of this model are shown in Figure 4. To perform such an experiment, the following issues must be addressed:

• The shape of the graph is not a single path workflow as those that can be implemented using Pipeline. Thus, we need to use pipegraph.

• The model has 3 input variables: X, y, and sample_weight. The PipeGraph class can accept an arbitrary number of input variables but, in order to use scikit-learn's current implementation of GridSearchCV, only X and y are accepted. We can work around this by first concatenating X and sample_weight into a single pandas DataFrame, for example, in order to comply with GridSearchCV's requirements. That implies that the graph must be capable of splitting the augmented X back into its two components afterwards. The selector step is in charge of this splitting. It features a ColumnSelector, which is not an original scikit-learn object but a custom class that splits an array into columns. In this case, the augmented X is divided column-wise as specified in a mapping dictionary. We will talk about custom blocks later on.

• The information provided to the sample_weight parameter of the LinearRegression step varies across the different scenarios explored by GridSearchCV. In a GridSearchCV with Pipeline, sample_weight can't vary because it is treated as a fit_param instead of a variable. Using pipegraph's connections this is no longer a problem.

• As we need a custom transformer to apply the power function to the sample_weight vector, we implement the custom_power step featuring a CustomPower custom class. Again, we will talk later on about custom blocks.
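
The roles of the two custom steps described above can be sketched in plain Python. The helper names select_columns and elementwise_power are hypothetical and only mimic what we expect ColumnSelector and CustomPower to do; they are not pipegraph's implementation.

```python
# Hypothetical helpers mimicking the roles of ColumnSelector and CustomPower.
def select_columns(rows, mapping):
    """Split each row of a 2-D list into named column blocks."""
    return {name: [row[cols] for row in rows] for name, cols in mapping.items()}


def elementwise_power(rows, power):
    """Raise every value to the given power."""
    return [[value ** power for value in row] for row in rows]


# Augmented X: first column is the feature, second column is the weight.
augmented_X = [[1, 0.01], [2, 0.95], [3, 0.10]]
mapping = {'X': slice(0, 1), 'sample_weight': slice(1, 2)}

parts = select_columns(augmented_X, mapping)
weights = elementwise_power(parts['sample_weight'], power=2)
print(parts['X'])
print(weights)
```

Using slices in the mapping keeps every block two-dimensional, which is the shape scikit-learn transformers expect for X.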

The three other steps from the model are already known:

• scaler: implements MinMaxScaler class
• polynomial_features: Contains a PolynomialFeatures object
• linear_model: Contains a LinearRegression model

Figure 4. A multipath model

Let's import the new components:

In [84]:
import pandas as pd
from pipegraph.base import ColumnSelector
from pipegraph.demo_blocks import CustomPower


We create an augmented X in which all data but y is concatenated. In this case, we concatenate X and sample_weight vector.

In [85]:
X = pd.DataFrame(dict(X=np.array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11]),
sample_weight=np.array([0.01, 0.95, 0.10, 0.95, 0.95, 0.10, 0.10, 0.95, 0.95, 0.95, 0.01])))
y = np.array(                    [  10,    4,   20,   16,   25 , -60,   85,   64,   81,  100,  150])


Next we define the steps and we use PipeGraphRegressor as estimator for GridSearchCV.

In [86]:
scaler = MinMaxScaler()
polynomial_features = PolynomialFeatures()
linear_model = LinearRegression()
custom_power = CustomPower()
selector = ColumnSelector(mapping={'X': slice(0, 1),
'sample_weight': slice(1,2)})

steps = [('selector', selector),
('custom_power', custom_power),
('scaler', scaler),
('polynomial_features', polynomial_features),
('linear_model', linear_model)]

pgraph = PipeGraphRegressor(steps=steps)


Now, we have to define the connections of the model. We could have specified a dictionary containing the connections but, as suggested by Joel Nothman, scikit-learn users might find it more convenient to use a method such as inject, as in this example. Let's see the docstring of inject:

In [87]:
import inspect
print(inspect.getdoc(pgraph.inject))

Adds a connection to the graph.

Parameters:
-----------
sink: Destination
sink_var: Name of the variable at destination that is going to hold the information
source: Origin
source_var: Name of the variable at origin holding the information
into: This can be either 'fit' or 'predict', indicating which connections are described: those belonging
to 'fit_connections' or those belonging to 'predict_connections'. Default is 'fit'.

Returns:
--------
self:  PipeGraphRegressor
Returning self allows chaining operations


inject allows chaining several calls to progressively describe all the connections needed in an easy-to-read manner:

In [88]:
(pgraph.inject(sink='selector', sink_var='X', source='_External', source_var='X')
.inject('custom_power', 'X', 'selector', 'sample_weight')
.inject('scaler', 'X', 'selector', 'X')
.inject('polynomial_features', 'X', 'scaler')
.inject('linear_model', 'X',  'polynomial_features')
.inject('linear_model', 'y', source_var='y')
.inject('linear_model', 'sample_weight', 'custom_power'))

Out[88]:
PipeGraphRegressor(fit_connections={'selector': {'X': ('_External', 'X')}, 'custom_power': {'X': ('selector', 'sample_weight')}, 'scaler': {'X': ('selector', 'X')}, 'polynomial_features': {'X': ('scaler', 'predict')}, 'linear_model': {'X': ('polynomial_features', 'predict'), 'y': ('_External', 'y'), 'sample_weight': ('custom_power', 'predict')}},
log_level=None, predict_connections={},
steps=[('selector', ColumnSelector(mapping={'X': slice(0, 1, None), 'sample_weight': slice(1, 2, None)})), ('custom_power', CustomPower(power=1)), ('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('polynomial_features', PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)), ('linear_model', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))])
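
The chaining mechanism itself is easy to reproduce. The sketch below uses a hypothetical class ConnectionBuilderSketch, not pipegraph's implementation, and omits the into argument. The defaults source='_External' and source_var='predict' are inferred from the output shown above, so please treat them as an assumption.

```python
class ConnectionBuilderSketch:
    """Hypothetical, simplified stand-in for the connection bookkeeping."""

    def __init__(self):
        self.fit_connections = {}

    def inject(self, sink, sink_var, source='_External', source_var='predict'):
        # Record where the sink's input variable comes from.
        self.fit_connections.setdefault(sink, {})[sink_var] = (source, source_var)
        return self  # returning self is what makes chaining possible


builder = (ConnectionBuilderSketch()
           .inject('selector', 'X', '_External', 'X')
           .inject('scaler', 'X', 'selector', 'X')
           .inject('polynomial_features', 'X', 'scaler')
           .inject('linear_model', 'y', source_var='y'))

print(builder.fit_connections)
```

Each call adds one entry to the dictionary, so the chain of calls and the explicit dictionary are two ways of writing the same information.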

Then we define param_grid as expected by GridSearchCV to explore several possibilities of varying parameters.

In [89]:
param_grid = {'polynomial_features__degree': range(1, 3),
'linear_model__fit_intercept': [True, False],
'custom_power__power': [1, 5, 10, 20, 30]}

grid_search_regressor = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
grid_search_regressor.fit(X, y)
y_pred = grid_search_regressor.predict(X)

plt.scatter(X.loc[:,'X'], y)
plt.scatter(X.loc[:,'X'], y_pred)
plt.show()

power = grid_search_regressor.best_estimator_.get_params()['custom_power']
print('Power that obtains the best results in the linear model: \n {}'.format(power))

Power that obtains the best results in the linear model:
CustomPower(power=20)


This example showed how pipegraph overcomes some current limitations of scikit-learn's Pipeline. In particular, it:

• displayed a multipath workflow successfully implemented by pipegraph,
• showed how to circumvent current limitations of standard GridSearchCV, in particular the restriction on the number of input parameters,
• showed the flexibility of pipegraph for specifying the connections in an easy-to-read manner using the inject method,
• demonstrated the capability of injecting the output of previous steps into other models' parameters, as is the case for the sample_weight parameter of the linear regressor.
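
To see why the power applied to sample_weight matters, it may help to fit a weighted straight line by hand. The sketch below uses the closed-form weighted least squares solution for a first-degree model with intercept, written in plain Python with the data of this example. The function weighted_line_fit is ours, and the sketch covers only the degree 1 case, not the whole grid explored above.

```python
def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y ~ intercept + slope * x."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [10, 4, 20, 16, 25, -60, 85, 64, 81, 100, 150]
sample_weight = [0.01, 0.95, 0.10, 0.95, 0.95, 0.10, 0.10, 0.95, 0.95, 0.95, 0.01]

for power in (1, 5, 20):
    # Larger powers shrink the small weights much faster than the large ones.
    w = [value ** power for value in sample_weight]
    intercept, slope = weighted_line_fit(x, y, w)
    print(power, round(intercept, 2), round(slope, 2))
```

As the power grows, the samples with small weights contribute less and less to the fit, which is the effect GridSearchCV exploits when it selects the best value of custom_power__power.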

### Example: Combination of classifiers¶

In this example the data is first transformed by scaling its features. A set of classifiers is then fitted on the scaled data, and their outputs are combined as input to a neural network. Additionally, the scaled inputs are injected into the neural network as well.

Steps of the PipeGraph:

• scaler: A MinMaxScaler data preprocessor
• gaussian_nb: A GaussianNB classifier
• svc: An SVC classifier
• concat: A Concatenator custom class that appends the outputs of the GaussianNB, SVC classifiers, and the scaled inputs.
• mlp: An MLPClassifier object
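
The job of the concat step can be pictured with a few lines of plain Python. The function concatenate_columns is a hypothetical illustration of column-wise concatenation, not the Concatenator class itself, and the numbers are made up for the sketch.

```python
def concatenate_columns(*blocks):
    """Column-wise concatenation of 2-D lists that share the same number of rows."""
    return [sum((list(row) for row in rows), []) for rows in zip(*blocks)]


scaled_X = [[0.1, 0.9], [0.5, 0.2]]    # two samples, two scaled features
nb_labels = [[0], [1]]                 # first classifier's predicted labels as a column
svc_labels = [[0], [2]]                # second classifier's predicted labels as a column

mlp_input = concatenate_columns(scaled_X, nb_labels, svc_labels)
print(mlp_input)
```

The final classifier therefore sees, for each sample, the scaled features followed by one column per upstream classifier.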

Figure 5. PipeGraph diagram showing the steps and their connections

In [90]:
from pipegraph.base import PipeGraphClassifier, Concatenator
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
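from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris

iris = load_iris()  # load the dataset here so that this cell runs on its own
X = iris.data
y = iris.target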

scaler = MinMaxScaler()
gaussian_nb = GaussianNB()
svc = SVC()
mlp = MLPClassifier()
concatenator = Concatenator()

steps = [('scaler', scaler),
('gaussian_nb', gaussian_nb),
('svc', svc),
('concat', concatenator),
('mlp', mlp)]


In this example we use a PipeGraphClassifier because the result is a classification and we want to take advantage of scikit-learn default scoring method for classifiers. Once more, we use the inject chain of calls to define the connections.

In [91]:
pgraph = PipeGraphClassifier(steps=steps)
(pgraph.inject(sink='scaler', sink_var='X', source='_External', source_var='X')
.inject('gaussian_nb', 'X', 'scaler')
.inject('gaussian_nb', 'y', source_var='y')
.inject('svc', 'X', 'scaler')
.inject('svc', 'y', source_var='y')
.inject('concat', 'X1', 'scaler')
.inject('concat', 'X2', 'gaussian_nb')
.inject('concat', 'X3', 'svc')
.inject('mlp', 'X', 'concat')
.inject('mlp', 'y', source_var='y')
)

param_grid = {'svc__C': [0.1, 0.5, 1.0],
'mlp__hidden_layer_sizes': [(3,), (6,), (9,),],
'mlp__max_iter': [5000, 10000]}

grid_search_classifier  = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
grid_search_classifier.fit(X, y)
y_pred = grid_search_classifier.predict(X)

grid_search_classifier.best_estimator_.get_params()

Out[91]:
{'concat': Concatenator(),
'fit_connections': {'concat': {'X1': ('scaler', 'predict'),
'X2': ('gaussian_nb', 'predict'),
'X3': ('svc', 'predict')},
'gaussian_nb': {'X': ('scaler', 'predict'), 'y': ('_External', 'y')},
'mlp': {'X': ('concat', 'predict'), 'y': ('_External', 'y')},
'scaler': {'X': ('_External', 'X')},
'svc': {'X': ('scaler', 'predict'), 'y': ('_External', 'y')}},
'gaussian_nb': GaussianNB(priors=None),
'gaussian_nb__priors': None,
'log_level': None,
'mlp': MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(3,), learning_rate='constant',
learning_rate_init=0.001, max_iter=10000, momentum=0.9,
nesterovs_momentum=True, power_t=0.5, random_state=None,
verbose=False, warm_start=False),
'mlp__activation': 'relu',
'mlp__alpha': 0.0001,
'mlp__batch_size': 'auto',
'mlp__beta_1': 0.9,
'mlp__beta_2': 0.999,
'mlp__early_stopping': False,
'mlp__epsilon': 1e-08,
'mlp__hidden_layer_sizes': (3,),
'mlp__learning_rate': 'constant',
'mlp__learning_rate_init': 0.001,
'mlp__max_iter': 10000,
'mlp__momentum': 0.9,
'mlp__nesterovs_momentum': True,
'mlp__power_t': 0.5,
'mlp__random_state': None,
'mlp__shuffle': True,
'mlp__tol': 0.0001,
'mlp__validation_fraction': 0.1,
'mlp__verbose': False,
'mlp__warm_start': False,
'predict_connections': {'concat': {'X1': ('scaler', 'predict'),
'X2': ('gaussian_nb', 'predict'),
'X3': ('svc', 'predict')},
'gaussian_nb': {'X': ('scaler', 'predict'), 'y': ('_External', 'y')},
'mlp': {'X': ('concat', 'predict'), 'y': ('_External', 'y')},
'scaler': {'X': ('_External', 'X')},
'svc': {'X': ('scaler', 'predict'), 'y': ('_External', 'y')}},
'scaler': MinMaxScaler(copy=True, feature_range=(0, 1)),
'scaler__copy': True,
'scaler__feature_range': (0, 1),
'steps': [('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
('gaussian_nb', GaussianNB(priors=None)),
('svc', SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)),
('concat', Concatenator()),
('mlp',
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(3,), learning_rate='constant',
learning_rate_init=0.001, max_iter=10000, momentum=0.9,
nesterovs_momentum=True, power_t=0.5, random_state=None,
verbose=False, warm_start=False))],
'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False),
'svc__C': 0.1,
'svc__cache_size': 200,
'svc__class_weight': None,
'svc__coef0': 0.0,
'svc__decision_function_shape': 'ovr',
'svc__degree': 3,
'svc__gamma': 'auto',
'svc__kernel': 'rbf',
'svc__max_iter': -1,
'svc__probability': False,
'svc__random_state': None,
'svc__shrinking': True,
'svc__tol': 0.001,
'svc__verbose': False}
In [92]:
# Code for plotting the confusion matrix taken from 'Python Data Science Handbook' by Jake VanderPlas
from sklearn.metrics import confusion_matrix
import seaborn as sns; sns.set()  # for plot styling

mat = confusion_matrix(y, y_pred)  # rows: true labels, columns: predicted labels
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False)
plt.xlabel('true label')
plt.ylabel('predicted label');
plt.show()


This example displayed complex data injections that are successfully managed by pipegraph.

### Example: Demultiplexor - multiplexor¶

This example shows an imaginative layout that uses a classifier to predict cluster labels and fits a separate model for each cluster. The examples that follow elaborate on it by introducing variations. As Figure 6 shows, the steps of the PipeGraph are:

• scaler: A MinMaxScaler data preprocessor
• classifier: A GaussianMixture classifier
• demux: A custom Demultiplexer class in charge of splitting the input arrays according to the selection input vector
• lm_0: A LinearRegression model
• lm_1: A LinearRegression model
• lm_2: A LinearRegression model
• mux: A custom Multiplexer class in charge of combining different input arrays into a single one according to the selection input vector
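
The splitting and recombining performed by the demux and mux steps can be sketched in plain Python. The functions demultiplex and multiplex below are hypothetical illustrations of the idea and do not reproduce the Demultiplexer and Multiplexer classes.

```python
def demultiplex(values, selection):
    """Group values by their selection label."""
    groups = {}
    for value, label in zip(values, selection):
        groups.setdefault(label, []).append(value)
    return groups


def multiplex(groups, selection):
    """Rebuild a single sequence, taking the next item of each group in turn."""
    iterators = {label: iter(items) for label, items in groups.items()}
    return [next(iterators[label]) for label in selection]


X = [0.1, 3.2, 6.5, 0.4, 3.9]
selection = [0, 1, 2, 0, 1]   # cluster label assigned to each sample

groups = demultiplex(X, selection)
restored = multiplex(groups, selection)
print(groups)
print(restored)
```

In the pipegraph, each group is handed to its own linear model between these two operations, and the multiplexer puts the per-group predictions back in the original sample order.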

Figure 6. PipeGraph diagram showing the steps and their connections

In [93]:
from pipegraph.base import PipeGraphRegressor, Demultiplexer, Multiplexer
from sklearn.mixture import GaussianMixture

X_first = pd.Series(np.random.rand(100,))
y_first = pd.Series(4 * X_first + 0.5*np.random.randn(100,))
X_second = pd.Series(np.random.rand(100,) + 3)
y_second = pd.Series(-4 * X_second + 0.5*np.random.randn(100,))
X_third = pd.Series(np.random.rand(100,) + 6)
y_third = pd.Series(2 * X_third + 0.5*np.random.randn(100,))

X = pd.concat([X_first, X_second, X_third], axis=0).to_frame()
y = pd.concat([y_first, y_second, y_third], axis=0).to_frame()

scaler = MinMaxScaler()
gaussian_mixture = GaussianMixture(n_components=3)
demux = Demultiplexer()
lm_0 = LinearRegression()
lm_1 = LinearRegression()
lm_2 = LinearRegression()
mux = Multiplexer()

steps = [('scaler', scaler),
('classifier', gaussian_mixture),
('demux', demux),
('lm_0', lm_0),
('lm_1', lm_1),
('lm_2', lm_2),
('mux', mux), ]


Instead of using inject as in the previous example, in this one we pass a dictionary describing the connections to the PipeGraphRegressor constructor:

In [95]:
connections = { 'scaler': {'X': 'X'},
'classifier': {'X': 'scaler'},
'demux': {'X': 'scaler',
'y': 'y',
'selection': 'classifier'},
'lm_0': {'X': ('demux', 'X_0'),
'y': ('demux', 'y_0')},
'lm_1': {'X': ('demux', 'X_1'),
'y': ('demux', 'y_1')},
'lm_2': {'X': ('demux', 'X_2'),
'y': ('demux', 'y_2')},
'mux': {'0': 'lm_0',
'1': 'lm_1',
'2': 'lm_2',
'selection': 'classifier'}}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)
pgraph.fit(X, y)

y_pred = pgraph.predict(X)
plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()
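
The dictionary above mixes three kinds of entries: the name of an external variable, the name of a step, and an explicit (step, variable) tuple. The sketch below shows one plausible way to expand the shorthand into explicit pairs. The function expand_connections and its resolution rule are assumptions made for illustration, based on the explicit form printed in Out[88]; pipegraph's own logic may differ.

```python
def expand_connections(connections, step_names):
    """Expand shorthand entries into explicit (source, variable) pairs."""
    expanded = {}
    for sink, inputs in connections.items():
        expanded[sink] = {}
        for sink_var, origin in inputs.items():
            if isinstance(origin, tuple):
                expanded[sink][sink_var] = origin                  # already explicit
            elif origin in step_names:
                expanded[sink][sink_var] = (origin, 'predict')     # a step's default output
            else:
                expanded[sink][sink_var] = ('_External', origin)   # variable passed to fit
    return expanded


shorthand = {'scaler': {'X': 'X'},
             'classifier': {'X': 'scaler'},
             'lm_0': {'X': ('demux', 'X_0'), 'y': ('demux', 'y_0')}}
explicit = expand_connections(shorthand,
                              step_names={'scaler', 'classifier', 'demux', 'lm_0'})
print(explicit)
```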


### Example: Encapsulating several blocks into a PipeGraph and reusing it¶

We consider the previous example in which we had the following pipegraph model:

We may be interested in using a fragment of the pipegraph, for example those blocks marked with the circle (the Demultiplexer, the linear model collection, and the Multiplexer), as a single block in another pipegraph:

We prepare the data and build a PipeGraph with these steps alone:

In [97]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression
from pipegraph.base import PipeGraph, PipeGraphRegressor, Demultiplexer, Multiplexer

# Prepare some artificial data

X_first = pd.Series(np.random.rand(100,))
y_first = pd.Series(4 * X_first + 0.5*np.random.randn(100,))
X_second = pd.Series(np.random.rand(100,) + 3)
y_second = pd.Series(-4 * X_second + 0.5*np.random.randn(100,))
X_third = pd.Series(np.random.rand(100,) + 6)
y_third = pd.Series(2 * X_third + 0.5*np.random.randn(100,))

X = pd.concat([X_first, X_second, X_third], axis=0).to_frame()
y = pd.concat([y_first, y_second, y_third], axis=0).to_frame()

In [98]:
# Create a single complex block

demux = Demultiplexer()
lm_0 = LinearRegression()
lm_1 = LinearRegression()
lm_2 = LinearRegression()
mux = Multiplexer()

three_multiplexed_models_steps = [
('demux', demux),
('lm_0', lm_0),
('lm_1', lm_1),
('lm_2', lm_2),
('mux', mux), ]

three_multiplexed_models_connections = {
'demux': {'X': 'X',
'y': 'y',
'selection': 'selection'},
'lm_0': {'X': ('demux', 'X_0'),
'y': ('demux', 'y_0')},
'lm_1': {'X': ('demux', 'X_1'),
'y': ('demux', 'y_1')},
'lm_2': {'X': ('demux', 'X_2'),
'y': ('demux', 'y_2')},
'mux': {'0': 'lm_0',
'1': 'lm_1',
'2': 'lm_2',
'selection': 'selection'}}

three_multiplexed_models = PipeGraph(steps=three_multiplexed_models_steps,
fit_connections=three_multiplexed_models_connections )


Now we can treat this PipeGraph as a reusable component and use it as a unitary step in another PipeGraph:

In [100]:
scaler = MinMaxScaler()
gaussian_mixture = GaussianMixture(n_components=3)
models = three_multiplexed_models

steps = [('scaler', scaler),
('classifier', gaussian_mixture),
('models', three_multiplexed_models), ]

connections = {'scaler': {'X': 'X'},
'classifier': {'X': 'scaler'},
'models': {'X': 'scaler',
'y': 'y',
'selection': 'classifier'},
}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)
pgraph.fit(X, y)
y_pred = pgraph.predict(X)
plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()


### Example: Dynamically built component using initialization parameters¶

The last section showed how the user can choose to encapsulate several blocks into a PipeGraph and use it as a single unit in another PipeGraph. Now we will see how these components can be dynamically built at runtime depending on initialization parameters.

We can think of programmatically changing the number of regression models inside the component we isolated in the previous example. First we do it by using initialization parameters in a PipeGraph subclass called pipegraph.base.RegressorsWithParametrizedNumberOfReplicas:

In [103]:
import inspect
from pipegraph.base import RegressorsWithParametrizedNumberOfReplicas

print(inspect.getsource(RegressorsWithParametrizedNumberOfReplicas))

class RegressorsWithParametrizedNumberOfReplicas(PipeGraph, RegressorMixin):
    def __init__(self, number_of_replicas=1, model_prototype=LinearRegression(), model_parameters={}):
        self.number_of_replicas = number_of_replicas
        self.model_class = model_prototype
        self.model_parameters = model_parameters

        steps = ([('demux', Demultiplexer())] +
                 [('model_' + str(i), model_prototype.__class__(**model_parameters)) for i in range(number_of_replicas)] +
                 [('mux', Multiplexer())]
                 )

        connections = dict(demux={'X': 'X',
                                  'y': 'y',
                                  'selection': 'selection'})

        for i in range(number_of_replicas):
            connections['model_' + str(i)] = {'X': ('demux', 'X_' + str(i)),
                                              'y': ('demux', 'y_' + str(i))}

        connections['mux'] = {str(i): ('model_' + str(i)) for i in range(number_of_replicas)}
        connections['mux']['selection'] = 'selection'
        super().__init__(steps=steps, fit_connections=connections)



As can be seen from the source code, in this example we are basically interested in using a PipeGraph object whose __init__ has different parameters from the usual ones. Thus, we subclass PipeGraph and reimplement the __init__ method. In doing so, we are able to work out the structure of the steps and connections before calling the super().__init__ method that provides the regular PipeGraph object.
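
To inspect what the constructor builds without instantiating any estimator, the connection logic can be lifted into a standalone function. The sketch below mirrors the loop shown in the source above; the function name build_replica_connections is ours and is not part of pipegraph.

```python
def build_replica_connections(number_of_replicas):
    """Build the connections dictionary for demux -> n models -> mux."""
    connections = {'demux': {'X': 'X', 'y': 'y', 'selection': 'selection'}}
    for i in range(number_of_replicas):
        connections['model_' + str(i)] = {'X': ('demux', 'X_' + str(i)),
                                          'y': ('demux', 'y_' + str(i))}
    connections['mux'] = {str(i): 'model_' + str(i) for i in range(number_of_replicas)}
    connections['mux']['selection'] = 'selection'
    return connections


connections = build_replica_connections(2)
for sink, inputs in connections.items():
    print(sink, inputs)
```

With two replicas the result has the same shape as the hand-written dictionary of the previous example, only with fewer models.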

Using this new component we can build a PipeGraph with as many multiplexed models as given by the number_of_replicas parameter:

In [105]:
scaler = MinMaxScaler()
gaussian_mixture = GaussianMixture(n_components=3)
models = RegressorsWithParametrizedNumberOfReplicas(number_of_replicas=3,
model_prototype=LinearRegression(),
model_parameters={})

steps = [('scaler', scaler),
('classifier', gaussian_mixture),
('models', models), ]

connections = {'scaler': {'X': 'X'},
'classifier': {'X': 'scaler'},
'models': {'X': 'scaler',
'y': 'y',
'selection': 'classifier'},
}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)
pgraph.fit(X, y)
y_pred = pgraph.predict(X)
plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()


### Example: Dynamically built component using input signal values during the fit stage¶

The last example showed how to grow a PipeGraph object programmatically at runtime using the __init__ method. In this example, we show how to change the internal structure of a PipeGraph object, not during initialization but during fit. Specifically, we show how the multiplexed models can be dynamically added at runtime depending on input signal values during fit.

Now we consider the possibility of using the classifier's output to automatically adjust the number of replicas. This can be seen as PipeGraph changing its inner topology to adapt its connections and steps to the context provided by other components. This morphing capability opens up interesting possibilities to explore.

In [106]:
import inspect
from sklearn.base import RegressorMixin
from pipegraph.base import build_graph  # assumed location of pipegraph's graph-building helper

class RegressorsWithDataDependentNumberOfReplicas(PipeGraph, RegressorMixin):
    def __init__(self, model_prototype=LinearRegression(), model_parameters={}):
        self.model_prototype = model_prototype
        self.model_parameters = model_parameters
        self._fit_data = {}
        self._predict_data = {}
        self.steps = []

    def fit(self, *pargs, **kwargs):
        # One replica per distinct label found in the selection signal
        number_of_replicas = len(set(kwargs['selection']))
        self.steps = [('models', RegressorsWithParametrizedNumberOfReplicas(number_of_replicas=number_of_replicas,
                                                                            model_prototype=self.model_prototype,  # forward the prototype so it is actually used
                                                                            model_parameters=self.model_parameters))]

        self.fit_connections = dict(models={'X': 'X',
                                            'y': 'y',
                                            'selection': 'selection'})
        self.predict_connections = self.fit_connections
        self._fit_graph = build_graph(self.fit_connections)
        self._predict_graph = build_graph(self.predict_connections)
        super().fit(*pargs, **kwargs)
        return self



Again we subclass the parent PipeGraph class and implement a different __init__. In this example we do not use a number_of_replicas parameter, as it is inferred from the data during fit; it is therefore enough to pass only those parameters that allow us to change the regressor models. As can be seen from the code, the __init__ method just stores the values provided by the user, and the fit method is the one in charge of growing the inner structure of the pipegraph.
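
The pattern of deciding the structure inside fit can be illustrated without pipegraph at all. In the following sketch the class PerLabelMeanSketch is hypothetical: it creates one trivial sub-model (a mean) per label found in the selection vector, in the same spirit as the len(set(kwargs['selection'])) line above.

```python
class PerLabelMeanSketch:
    """Hypothetical model that creates one sub-model (a mean) per label seen in fit."""

    def __init__(self):
        self.models_ = {}

    def fit(self, y, selection):
        labels = set(selection)              # the structure is decided by the data
        self.number_of_replicas_ = len(labels)
        self.models_ = {}
        for label in labels:
            members = [yi for yi, si in zip(y, selection) if si == label]
            self.models_[label] = sum(members) / len(members)
        return self

    def predict(self, selection):
        return [self.models_[label] for label in selection]


model = PerLabelMeanSketch().fit(y=[1.0, 3.0, 10.0, 12.0], selection=[0, 0, 1, 1])
print(model.number_of_replicas_)
print(model.predict([1, 0]))
```

Nothing about the number of sub-models is fixed at construction time; it is only known once the data has been seen.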

Using this new component we can build a simplified PipeGraph:

In [108]:
scaler = MinMaxScaler()
gaussian_mixture = GaussianMixture(n_components=3)
models = RegressorsWithDataDependentNumberOfReplicas()  # the component defined above

steps = [('scaler', scaler),
('classifier', gaussian_mixture),
('models', models), ]

connections = {'scaler': {'X': 'X'},
'classifier': {'X': 'scaler'},
'models': {'X': 'scaler',
'y': 'y',
'selection': 'classifier'},
}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)
pgraph.fit(X, y)
y_pred = pgraph.predict(X)
plt.scatter(X, y)
plt.scatter(X, y_pred)
plt.show()


### Example: GridSearch on dynamically built component using input signal values¶

The previous example showed how a PipeGraph object can be dynamically built at runtime depending on input signal values during fit. In this example we show how to use GridSearchCV to explore the best combination of hyperparameters.

In [110]:
from sklearn.model_selection import train_test_split
from pipegraph.base import NeutralRegressor

# We prepare some data

X_first = pd.Series(np.random.rand(100,))
y_first = pd.Series(4 * X_first + 0.5*np.random.randn(100,))
X_second = pd.Series(np.random.rand(100,) + 3)
y_second = pd.Series(-4 * X_second + 0.5*np.random.randn(100,))
X_third = pd.Series(np.random.rand(100,) + 6)
y_third = pd.Series(2 * X_third + 0.5*np.random.randn(100,))

X = pd.concat([X_first, X_second, X_third], axis=0).to_frame()
y = pd.concat([y_first, y_second, y_third], axis=0).to_frame()

X_train, X_test, y_train, y_test = train_test_split(X, y)


To ease the calculation of the score for the GridSearchCV we add a neutral regressor as a last step, capable of calculating the score using a default scoring function. This is much more convenient than worrying about programming a custom scoring function for a block with an arbitrary number of inputs.

In [111]:
scaler = MinMaxScaler()
gaussian_mixture = GaussianMixture(n_components=3)
neutral_regressor = NeutralRegressor()

steps = [('scaler', scaler),
('classifier', gaussian_mixture),
('models', models),
('neutral', neutral_regressor)]

connections = {'scaler': {'X': 'X'},
'classifier': {'X': 'scaler'},
'models': {'X': 'scaler',
'y': 'y',
'selection': 'classifier'},
'neutral': {'X': 'models'}
}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)


Finally, we use GridSearchCV to find the best number of clusters and the best regressors:

In [112]:
from sklearn.model_selection import GridSearchCV

param_grid = {'classifier__n_components': range(2,10)}
gs = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
gs.fit(X_train, y_train)
y_pred = gs.predict(X_train)
plt.scatter(X_train, y_train)
plt.scatter(X_train, y_pred)
print("Score:" , gs.score(X_test, y_test))
print("classifier__n_components:", gs.best_estimator_.get_params()['classifier__n_components'])

Score: 0.99806795623
classifier__n_components: 3


### Example: Alternative solution¶

Now we consider an alternative solution to the previous example. The solution already shown displayed the potential of being able to morph the graph during fitting. A simpler approach is considered in this example by reusing components and combining the classifier with the demultiplexed models.

In [113]:
from pipegraph.base import ClassifierAndRegressorsBundle

print(inspect.getsource(ClassifierAndRegressorsBundle))

class ClassifierAndRegressorsBundle(PipeGraph, RegressorMixin):
    def __init__(self,
                 number_of_replicas=1,
                 classifier_prototype=GaussianMixture(),
                 classifier_parameters={},
                 model_prototype=LinearRegression(),
                 model_parameters={}):

        self.number_of_replicas = number_of_replicas
        self.classifier_prototype = classifier_prototype
        self.classifier_parameters = classifier_parameters
        self.model_prototype = model_prototype
        self.model_parameters = model_parameters

        steps = [('classifier', self.classifier_prototype.__class__(n_components=number_of_replicas, **classifier_parameters)) ,
                 ('models', RegressorsWithParametrizedNumberOfReplicas(number_of_replicas=number_of_replicas,
                                                                       model_prototype=model_prototype,
                                                                       model_parameters=model_parameters))]
        connections = dict(classifier={'X': 'X'},
                           models= {'X': 'X',
                                    'y': 'y',
                                    'selection': 'classifier'})
        super().__init__(steps=steps, fit_connections=connections)



As before, we built a custom block by subclassing PipeGraph and then modifying the __init__ method to provide the parameters specifically needed for our purposes. Then we chain, in the same PipeGraph, the classifier and the already known block that creates multiplexed models from parameters provided during __init__. Note that the classifier and the models share the same number of clusters and models: the number_of_replicas value provided by the user.

Using this new component we can build a simplified PipeGraph:

In [115]:
scaler = MinMaxScaler()
classifier_and_models = ClassifierAndRegressorsBundle(number_of_replicas=6)
neutral_regressor = NeutralRegressor()

steps = [('scaler', scaler),
('bundle', classifier_and_models),
('neutral', neutral_regressor)]

connections = {'scaler': {'X': 'X'},
'bundle': {'X': 'scaler', 'y': 'y'},
'neutral': {'X': 'bundle'}}

pgraph = PipeGraphRegressor(steps=steps, fit_connections=connections)


As before, we use GridSearchCV to find the best number of clusters and the best regressors:

In [116]:
from sklearn.model_selection import GridSearchCV

param_grid = {'bundle__number_of_replicas': range(3,10)}
gs = GridSearchCV(estimator=pgraph, param_grid=param_grid, refit=True)
gs.fit(X_train, y_train)
y_pred = gs.predict(X_train)
plt.scatter(X_train, y_train)
plt.scatter(X_train, y_pred)
print("Score:" , gs.score(X_test, y_test))
print("bundle__number_of_replicas:", gs.best_estimator_.get_params()['bundle__number_of_replicas'])

Score: 0.998049223614
bundle__number_of_replicas: 6
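
The name bundle__number_of_replicas follows scikit-learn's convention of separating the step name from the parameter name with a double underscore. The sketch below illustrates how such a name can be routed to a nested component, using plain dictionaries as stand-ins for estimators. The function set_nested is hypothetical and is not scikit-learn's or pipegraph's code.

```python
def set_nested(params, name, value):
    """Store value under a 'step__param' style name inside nested dictionaries."""
    step, delimiter, remainder = name.partition('__')
    if delimiter:
        # The part before the first '__' names the component to descend into.
        set_nested(params.setdefault(step, {}), remainder, value)
    else:
        params[name] = value
    return params


settings = set_nested({}, 'bundle__number_of_replicas', 6)
print(settings)
```

This is why a single entry in param_grid is enough to change both the classifier and the regressors of the bundle: the value reaches the bundle's __init__ parameter, which in turn configures both inner steps.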


Coming soon!
