Research plan

  1. Feature and data explanation
  2. Primary data analysis
  3. Primary visual data analysis
  4. Insights and found dependencies
  5. Metrics selection
  6. Model selection
  7. Data preprocessing
  8. Cross-validation and adjustment of model hyperparameters
  9. Creation of new features and description of this process
  10. Plotting training and validation curves
  11. Prediction for test or hold-out samples
  12. Conclusions

Part 1. Feature and data explanation

We are given hourly rental data spanning two years. For this competition, the training set comprises the first 16 days of each month, while the test set covers the 17th-19th days of each month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period. That is, predict "count" without using "count" or its components "casual" and "registered".

Data Fields

  • datetime - hourly date + timestamp
  • season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
  • holiday - whether the day is considered a holiday
  • workingday - whether the day is neither a weekend nor holiday
  • weather -
    1. Clear, Few clouds, Partly cloudy
    2. Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3. Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    4. Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp - temperature in Celsius
  • atemp - "feels like" temperature in Celsius
  • humidity - relative humidity
  • windspeed - wind speed
  • casual - number of non-registered user rentals initiated
  • registered - number of registered user rentals initiated
  • count - number of total rentals

Part 2. Primary data analysis

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import gc

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from catboost import CatBoostRegressor, Pool, cv
from sklearn.metrics import mean_squared_error

import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

# Fixing random seed
np.random.seed(17)

print(os.listdir("../input"))
['tutorial_code.R', 'train_luc.csv', 'tutorial_code.ipynb', 'sample_prediction.csv', 'test_luc.csv']
In [2]:
# Fixing random seed
np.random.seed(17)
# Read data
data_df = pd.read_csv('../input/train_luc.csv')

# Convert to datetime
data_df['datetime'] = pd.to_datetime(data_df['datetime'])

# Sort by datetime
data_df = data_df.sort_values(by='datetime')

# Look at the first rows of the training set
data_df.head()
Out[2]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 3 13 16
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 8 32 40
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 5 27 32
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 3 10 13
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 0 1 1
In [3]:
data_df.shape
Out[3]:
(9174, 12)

The training data contains 3 target columns: 'casual', 'registered', 'count'. The 'count' column is the sum of 'casual' and 'registered'. Let's check it:

In [4]:
(data_df['casual'] + data_df['registered'] - data_df['count']).value_counts()
Out[4]:
0    9174
dtype: int64
In [5]:
# General info on the data
data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9174 entries, 0 to 9173
Data columns (total 12 columns):
datetime      9174 non-null datetime64[ns]
season        9174 non-null int64
holiday       9174 non-null int64
workingday    9174 non-null int64
weather       9174 non-null int64
temp          9174 non-null float64
atemp         9174 non-null float64
humidity      9174 non-null int64
windspeed     9174 non-null float64
casual        9174 non-null int64
registered    9174 non-null int64
count         9174 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(8)
memory usage: 860.1 KB

Excellent! There are no NaN values.

In [6]:
# Summary statistics of the data
data_df.describe()
Out[6]:
season holiday workingday weather temp atemp humidity windspeed casual registered count
count 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000 9174.000000
mean 2.505559 0.031284 0.678875 1.414868 20.130401 23.578433 61.715064 12.737931 35.713647 154.868106 190.581753
std 1.116618 0.174094 0.466934 0.635363 7.940504 8.617957 19.401829 8.199027 49.667738 150.981155 181.011530
min 1.000000 0.000000 0.000000 1.000000 0.820000 0.760000 0.000000 0.000000 0.000000 0.000000 1.000000
25% 2.000000 0.000000 0.000000 1.000000 13.940000 16.665000 46.000000 7.001500 4.000000 35.000000 41.000000
50% 3.000000 0.000000 1.000000 1.000000 20.500000 24.240000 61.000000 11.001400 16.000000 117.000000 144.000000
75% 4.000000 0.000000 1.000000 2.000000 27.060000 31.060000 78.000000 16.997900 48.000000 222.000000 282.000000
max 4.000000 1.000000 1.000000 4.000000 41.000000 45.455000 100.000000 56.996900 362.000000 886.000000 977.000000

Part 3. Primary visual data analysis

First, I split the data into training and hold-out samples.

In [7]:
train_df, test_df, y_train, y_test = train_test_split(data_df.drop(['casual', 'registered', 'count'], axis=1), data_df[['casual', 'registered', 'count']], 
                                                      test_size=0.3, random_state=17, shuffle=True)
In [8]:
def draw_train_test_distribution(column):
    _, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,3))
    sns.distplot(train_df[column], ax = axes[0], label='train')
    sns.distplot(test_df[column], ax = axes[1], label='test');
In [9]:
# Distributions of the categorical features on train and test
draw_train_test_distribution('season')
draw_train_test_distribution('holiday')
draw_train_test_distribution('workingday')
draw_train_test_distribution('weather');

The distributions of the categorical features ('season', 'holiday', 'workingday', 'weather') coincide on the train and test splits. The only exception is 'weather' = 4, which is present in the test split but absent from the train split.

In [10]:
test_df[test_df['weather'] == 4]['weather'].count()
Out[10]:
1

Only one example has 'weather' = 4, so this can safely be neglected.

In [11]:
# The distribution of the numerical features on the train and test
draw_train_test_distribution('temp')
draw_train_test_distribution('atemp')
draw_train_test_distribution('humidity')
draw_train_test_distribution('windspeed')

The distributions of the numerical features ('temp', 'atemp', 'humidity', 'windspeed') also coincide on the train and test splits.

In [12]:
def transformation(columnName, func = np.log1p):
    temp_train = pd.DataFrame(index=train_df.index)
    temp_train[columnName] = train_df[columnName].apply(func)

    temp_test = pd.DataFrame(index=test_df.index)
    temp_test[columnName] = test_df[columnName].apply(func)
    
    _, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,3))
    sns.distplot(temp_train, ax = axes[0])
    sns.distplot(temp_test, ax = axes[1]);
In [13]:
transformation('temp')
transformation('atemp')
transformation('humidity')

After the log transformation, the distributions on train_df and test_df resemble each other more closely.

In [14]:
train_df['temp_tr'] = train_df['temp'].apply(np.log1p)
test_df['temp_tr'] = test_df['temp'].apply(np.log1p)
train_df['atemp_tr'] = train_df['atemp'].apply(np.log1p)
test_df['atemp_tr'] = test_df['atemp'].apply(np.log1p)
train_df['humidity_tr'] = train_df['humidity'].apply(np.log1p)
test_df['humidity_tr'] = test_df['humidity'].apply(np.log1p)

Part 4. Insights and found dependencies

In [15]:
corr = train_df.join(y_train).corr('spearman')
plt.figure(figsize = ( 12 , 10 ))
sns.heatmap(corr,annot=True,fmt='.2f',cmap="YlGnBu");
We can see strong correlations between the target columns, as expected. 'count' correlates more strongly with 'registered' than with 'casual'; casual users appear to be more sensitive to the weather.
The correlations of temp/temp_tr, atemp/atemp_tr and humidity/humidity_tr are identical (Spearman correlation is rank-based, so a monotone log transform does not change it). I'll use the *_tr features, because their train/test distributions are more similar.
**Idea: build an ensemble of 3 models, one per target (a sketch follows below)!**
Other observations:

  • 'holiday' has a low correlation with the targets, so I will not use it.
  • 'workingday' has a lower correlation with 'registered' and 'count' than with 'casual'.
  • 'windspeed' affects both registered and casual users.
  • The effects of 'temp' and 'atemp' are the same.
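
To make the ensemble idea concrete, here is a minimal sketch, not the tuned models from Part 8: train one regressor per component target and sum the component predictions to obtain 'count'. The feature list and hyperparameters below are placeholders.

# Sketch of the "one model per target" ensemble idea; the feature list and
# hyperparameters are placeholders, not the tuned setup from Part 8.
sketch_features = ['season', 'workingday', 'weather', 'windspeed',
                   'temp_tr', 'atemp_tr', 'humidity_tr']
component_preds = {}
for target in ['casual', 'registered']:
    m = CatBoostRegressor(iterations=100, learning_rate=0.1,
                          random_seed=17, verbose=False)
    m.fit(train_df[sketch_features], y_train[target], cat_features=[0, 1, 2])
    component_preds[target] = m.predict(test_df[sketch_features])
count_pred_sketch = component_preds['casual'] + component_preds['registered']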

Part 5. Metrics selection

According to the terms of the competition, I will use Root Mean Squared Error (RMSE).
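
For reference, RMSE = sqrt(mean((y_true - y_pred)^2)). A minimal sketch with placeholder arrays, using the same sklearn helper as in the tuning code below:

# Minimal RMSE sketch; y_true_demo / y_pred_demo are placeholder arrays.
y_true_demo = np.array([16, 40, 32, 13])
y_pred_demo = np.array([20, 35, 30, 10])
print(np.sqrt(mean_squared_error(y_true_demo, y_pred_demo)))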

Part 6. Model selection

The course recommends gradient boosting as one of the most powerful model families. I want to use the CatBoost library, because I want to get to know it better. From the catboost library I'll use CatBoostRegressor, since the task at hand is regression.
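
For orientation, a minimal CatBoostRegressor call on a couple of raw columns of data_df could look like the sketch below ('season' is marked as categorical by index); this is only an API illustration, not the tuned model built in Part 8.

# Sketch only: fit CatBoostRegressor on two columns of data_df to show the API.
sketch_cbr = CatBoostRegressor(iterations=50, eval_metric='RMSE',
                               random_seed=17, verbose=False)
sketch_cbr.fit(data_df[['season', 'temp']], data_df['count'], cat_features=[0])
sketch_cbr.predict(data_df[['season', 'temp']])[:5]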

Part 7. Data preprocessing

  • The 'temp', 'atemp' and 'humidity' features were log-transformed in Part 3.
  • There are no NaN values, so no imputation is needed.
  • I will not use one-hot encoding, because CatBoostRegressor accepts a list of categorical features as a parameter.
  • No manual feature scaling is needed for CatBoost.
In [16]:
X_train = train_df.drop(['holiday', 'datetime', 'temp', 'atemp', 'humidity'], axis=1)
X_test = test_df.drop(['holiday', 'datetime', 'temp', 'atemp', 'humidity'], axis=1)

Part 8. Cross-validation and adjustment of model hyperparameters

In [17]:
cat_features = [0, 1, 2]

X_train_cbr, X_test_cbr, y_train_cbr, y_test_cbr = train_test_split(X_train, y_train, test_size=0.3, random_state=17, shuffle=True)
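
As a quick sanity check (just a sketch), the cat_features indices [0, 1, 2] should point at the 'season', 'workingday' and 'weather' columns of X_train:

# Sketch: verify which columns the cat_features indices refer to.
print(list(X_train.columns[:3]))  # expected: ['season', 'workingday', 'weather']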
In [18]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
import colorama

# the number of random sets of hyperparameters
N_HYPEROPT_PROBES = 100

# hyperparameter sampling algorithm
HYPEROPT_ALGO = tpe.suggest

space ={
        'depth': hp.quniform("depth", 4, 10, 1),
        'learning_rate': hp.loguniform('learning_rate', -3.0, -0.7),
        'l2_leaf_reg': hp.uniform('l2_leaf_reg', 1, 10),
       }
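
# Note: hp.loguniform('learning_rate', -3.0, -0.7) samples exp(U(-3.0, -0.7)),
# i.e. learning rates roughly between 0.05 and 0.50 on a log scale;
# hp.quniform('depth', 4, 10, 1) samples integer-valued depths from 4 to 10.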

def get_catboost_params(space):
    params = dict()
    params['learning_rate'] = space['learning_rate']
    params['depth'] = int(space['depth'])
    params['l2_leaf_reg'] = space['l2_leaf_reg']
    return params

def objective(space, target_column='count'):
    global obj_call_count, cur_best_rmse

    obj_call_count += 1

    print('\nCatBoost objective call #{} cur_best_acc={:7.5f}'.format(obj_call_count, cur_best_rmse) )

    params = get_catboost_params(space)

    sorted_params = sorted(space.items(), key=lambda z: z[0])
    params_str = str.join(' ', ['{}={}'.format(k, v) for k, v in sorted_params])
    print('Params: {}'.format(params_str) )

    model = CatBoostRegressor(iterations=2000,
                              cat_features = cat_features,
                            learning_rate=params['learning_rate'],
                            depth=int(params['depth']),
                            use_best_model=True,
                            eval_metric='RMSE',
                            l2_leaf_reg=params['l2_leaf_reg'],
                            early_stopping_rounds=50,
                            random_seed=17,
                            verbose=False
                            )
    model.fit(X_train_cbr, y_train_cbr[target_column], 
              eval_set=(X_test_cbr, y_test_cbr[target_column]), 
              verbose=False)
    nb_trees = model.get_best_iteration()

    print('nb_trees={}'.format(nb_trees))

    y_pred = model.predict(X_test_cbr)

    rmse = np.sqrt(mean_squared_error(y_test_cbr[target_column], y_pred))

    print('rmse={}, Params:{}, nb_trees={}\n'.format(rmse, params_str, nb_trees ))

    if rmse<cur_best_rmse:
        cur_best_rmse = rmse
        print(colorama.Fore.GREEN + 'NEW BEST RMSE={}'.format(cur_best_rmse) + colorama.Fore.RESET)


    return{'loss':rmse, 'status': STATUS_OK }
In [19]:
%%time
obj_call_count = 0
cur_best_rmse = np.inf

trials = Trials()
best = fmin(fn=objective,
                     space=space,
                     algo=HYPEROPT_ALGO,
                     max_evals=N_HYPEROPT_PROBES,
                     trials=trials,
                     verbose=1)
CatBoost objective call #1 cur_best_acc=    inf
Params: depth=6.0 l2_leaf_reg=9.162241059966508 learning_rate=0.11029754673317103
nb_trees=480
rmse=145.01128477405405, Params:depth=6.0 l2_leaf_reg=9.162241059966508 learning_rate=0.11029754673317103, nb_trees=480

NEW BEST RMSE=145.01128477405405

CatBoost objective call #2 cur_best_acc=145.01128
Params: depth=10.0 l2_leaf_reg=5.724509161849184 learning_rate=0.06224506493131391
nb_trees=398
rmse=144.71057146309488, Params:depth=10.0 l2_leaf_reg=5.724509161849184 learning_rate=0.06224506493131391, nb_trees=398

NEW BEST RMSE=144.71057146309488

CatBoost objective call #3 cur_best_acc=144.71057
Params: depth=7.0 l2_leaf_reg=6.860648877422373 learning_rate=0.23900279053690465
nb_trees=146
rmse=144.8364098482692, Params:depth=7.0 l2_leaf_reg=6.860648877422373 learning_rate=0.23900279053690465, nb_trees=146


CatBoost objective call #4 cur_best_acc=144.71057
Params: depth=6.0 l2_leaf_reg=5.0723531000176605 learning_rate=0.055663490360050125
nb_trees=527
rmse=145.3758517307871, Params:depth=6.0 l2_leaf_reg=5.0723531000176605 learning_rate=0.055663490360050125, nb_trees=527


CatBoost objective call #5 cur_best_acc=144.71057
Params: depth=8.0 l2_leaf_reg=4.467188887609437 learning_rate=0.07798321149313608
nb_trees=305
rmse=145.0052065774969, Params:depth=8.0 l2_leaf_reg=4.467188887609437 learning_rate=0.07798321149313608, nb_trees=305


CatBoost objective call #6 cur_best_acc=144.71057
Params: depth=6.0 l2_leaf_reg=6.515807358719084 learning_rate=0.08579546221856192
nb_trees=181
rmse=145.70183120954505, Params:depth=6.0 l2_leaf_reg=6.515807358719084 learning_rate=0.08579546221856192, nb_trees=181


CatBoost objective call #7 cur_best_acc=144.71057
Params: depth=6.0 l2_leaf_reg=7.005132229388859 learning_rate=0.17370100443016293
nb_trees=126
rmse=145.3425941671122, Params:depth=6.0 l2_leaf_reg=7.005132229388859 learning_rate=0.17370100443016293, nb_trees=126


CatBoost objective call #8 cur_best_acc=144.71057
Params: depth=9.0 l2_leaf_reg=1.6041320188994677 learning_rate=0.08816907045650027
nb_trees=365
rmse=145.30166819500258, Params:depth=9.0 l2_leaf_reg=1.6041320188994677 learning_rate=0.08816907045650027, nb_trees=365


CatBoost objective call #9 cur_best_acc=144.71057
Params: depth=7.0 l2_leaf_reg=8.78889905902629 learning_rate=0.30443280946477697
nb_trees=75
rmse=145.8938398093139, Params:depth=7.0 l2_leaf_reg=8.78889905902629 learning_rate=0.30443280946477697, nb_trees=75


CatBoost objective call #10 cur_best_acc=144.71057
Params: depth=10.0 l2_leaf_reg=6.55876641928497 learning_rate=0.06974394979383598
nb_trees=442
rmse=144.67508200118556, Params:depth=10.0 l2_leaf_reg=6.55876641928497 learning_rate=0.06974394979383598, nb_trees=442

NEW BEST RMSE=144.67508200118556

CatBoost objective call #11 cur_best_acc=144.67508
Params: depth=8.0 l2_leaf_reg=1.8947532362990391 learning_rate=0.09355106790501669
nb_trees=341
rmse=145.0326169629634, Params:depth=8.0 l2_leaf_reg=1.8947532362990391 learning_rate=0.09355106790501669, nb_trees=341


CatBoost objective call #12 cur_best_acc=144.67508
Params: depth=5.0 l2_leaf_reg=8.55311925599833 learning_rate=0.057554699291469044
nb_trees=652
rmse=145.08534197409543, Params:depth=5.0 l2_leaf_reg=8.55311925599833 learning_rate=0.057554699291469044, nb_trees=652


CatBoost objective call #13 cur_best_acc=144.67508
Params: depth=5.0 l2_leaf_reg=5.776979190279688 learning_rate=0.4605658225059673
nb_trees=76
rmse=146.6008249438924, Params:depth=5.0 l2_leaf_reg=5.776979190279688 learning_rate=0.4605658225059673, nb_trees=76


CatBoost objective call #14 cur_best_acc=144.67508
Params: depth=7.0 l2_leaf_reg=1.0115586588979553 learning_rate=0.4907604772074178
nb_trees=49
rmse=146.50302659652544, Params:depth=7.0 l2_leaf_reg=1.0115586588979553 learning_rate=0.4907604772074178, nb_trees=49


CatBoost objective call #15 cur_best_acc=144.67508
Params: depth=9.0 l2_leaf_reg=9.978710471927538 learning_rate=0.49085848864108644
nb_trees=30
rmse=145.16016867472996, Params:depth=9.0 l2_leaf_reg=9.978710471927538 learning_rate=0.49085848864108644, nb_trees=30


CatBoost objective call #16 cur_best_acc=144.67508
Params: depth=8.0 l2_leaf_reg=9.96633046875779 learning_rate=0.13939540136365464
nb_trees=129
rmse=145.09579597177483, Params:depth=8.0 l2_leaf_reg=9.96633046875779 learning_rate=0.13939540136365464, nb_trees=129


CatBoost objective call #17 cur_best_acc=144.67508
Params: depth=9.0 l2_leaf_reg=7.49917755542786 learning_rate=0.4233613239892547
nb_trees=84
rmse=145.26369981478982, Params:depth=9.0 l2_leaf_reg=7.49917755542786 learning_rate=0.4233613239892547, nb_trees=84


CatBoost objective call #18 cur_best_acc=144.67508
Params: depth=6.0 l2_leaf_reg=9.567300937478123 learning_rate=0.2033701538892362
nb_trees=181
rmse=145.40093558262453, Params:depth=6.0 l2_leaf_reg=9.567300937478123 learning_rate=0.2033701538892362, nb_trees=181


CatBoost objective call #19 cur_best_acc=144.67508
Params: depth=9.0 l2_leaf_reg=6.698248860743019 learning_rate=0.45339826865672467
nb_trees=37
rmse=145.91615573554319, Params:depth=9.0 l2_leaf_reg=6.698248860743019 learning_rate=0.45339826865672467, nb_trees=37


CatBoost objective call #20 cur_best_acc=144.67508
Params: depth=5.0 l2_leaf_reg=1.757778283159109 learning_rate=0.30119053622744735
nb_trees=80
rmse=146.07299751564153, Params:depth=5.0 l2_leaf_reg=1.757778283159109 learning_rate=0.30119053622744735, nb_trees=80


CatBoost objective call #21 cur_best_acc=144.67508
Params: depth=10.0 l2_leaf_reg=3.696915302131909 learning_rate=0.06730685644868378
nb_trees=377
rmse=144.8886024842629, Params:depth=10.0 l2_leaf_reg=3.696915302131909 learning_rate=0.06730685644868378, nb_trees=377


CatBoost objective call #22 cur_best_acc=144.67508
Params: depth=10.0 l2_leaf_reg=3.336138733775994 learning_rate=0.05027596115341423
nb_trees=543
rmse=144.77713975933787, Params:depth=10.0 l2_leaf_reg=3.336138733775994 learning_rate=0.05027596115341423, nb_trees=543


CatBoost objective call #23 cur_best_acc=144.67508
Params: depth=10.0 l2_leaf_reg=7.861698086581923 learning_rate=0.11304163767452166
nb_trees=226
rmse=144.56451226933365, Params:depth=10.0 l2_leaf_reg=7.861698086581923 learning_rate=0.11304163767452166, nb_trees=226

NEW BEST RMSE=144.56451226933365

CatBoost objective call #24 cur_best_acc=144.56451
Params: depth=10.0 l2_leaf_reg=8.112398470117075 learning_rate=0.12572192467830726
nb_trees=223
rmse=144.69997938619883, Params:depth=10.0 l2_leaf_reg=8.112398470117075 learning_rate=0.12572192467830726, nb_trees=223


CatBoost objective call #25 cur_best_acc=144.56451
Params: depth=9.0 l2_leaf_reg=7.981523003541508 learning_rate=0.11884944866341553
nb_trees=315
rmse=145.0479057875793, Params:depth=9.0 l2_leaf_reg=7.981523003541508 learning_rate=0.11884944866341553, nb_trees=315


CatBoost objective call #26 cur_best_acc=144.56451
Params: depth=10.0 l2_leaf_reg=6.145231429511139 learning_rate=0.15576927880405184
nb_trees=113
rmse=145.55810364192772, Params:depth=10.0 l2_leaf_reg=6.145231429511139 learning_rate=0.15576927880405184, nb_trees=113


CatBoost objective call #27 cur_best_acc=144.56451
Params: depth=8.0 l2_leaf_reg=4.870671960912379 learning_rate=0.07173185269889736
nb_trees=330
rmse=144.8749401345617, Params:depth=8.0 l2_leaf_reg=4.870671960912379 learning_rate=0.07173185269889736, nb_trees=330


CatBoost objective call #28 cur_best_acc=144.56451
Params: depth=10.0 l2_leaf_reg=7.674549514980487 learning_rate=0.10112749794192154
nb_trees=269
rmse=144.43028786084875, Params:depth=10.0 l2_leaf_reg=7.674549514980487 learning_rate=0.10112749794192154, nb_trees=269

NEW BEST RMSE=144.43028786084875

CatBoost objective call #29 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=7.627852095181636 learning_rate=0.10226168247190975
nb_trees=264
rmse=145.13382293710296, Params:depth=9.0 l2_leaf_reg=7.627852095181636 learning_rate=0.10226168247190975, nb_trees=264


CatBoost objective call #30 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=8.961360997664006 learning_rate=0.10731212355208468
nb_trees=214
rmse=145.0225116487687, Params:depth=8.0 l2_leaf_reg=8.961360997664006 learning_rate=0.10731212355208468, nb_trees=214


CatBoost objective call #31 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=9.3848178488215 learning_rate=0.1787208387888453
nb_trees=131
rmse=145.08833065723064, Params:depth=10.0 l2_leaf_reg=9.3848178488215 learning_rate=0.1787208387888453, nb_trees=131


CatBoost objective call #32 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=8.289169560998735 learning_rate=0.13774620011307132
nb_trees=256
rmse=145.02798301553528, Params:depth=9.0 l2_leaf_reg=8.289169560998735 learning_rate=0.13774620011307132, nb_trees=256


CatBoost objective call #33 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=7.420709322375874 learning_rate=0.2240876451214591
nb_trees=274
rmse=145.11576838465385, Params:depth=10.0 l2_leaf_reg=7.420709322375874 learning_rate=0.2240876451214591, nb_trees=274


CatBoost objective call #34 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=5.684481350674522 learning_rate=0.050632720084829726
nb_trees=437
rmse=145.3315581956136, Params:depth=7.0 l2_leaf_reg=5.684481350674522 learning_rate=0.050632720084829726, nb_trees=437


CatBoost objective call #35 cur_best_acc=144.43029
Params: depth=4.0 l2_leaf_reg=4.325556306894089 learning_rate=0.09805157705276114
nb_trees=319
rmse=146.11709736245982, Params:depth=4.0 l2_leaf_reg=4.325556306894089 learning_rate=0.09805157705276114, nb_trees=319


CatBoost objective call #36 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=7.02905454527499 learning_rate=0.07841900268795064
nb_trees=549
rmse=144.89010008113658, Params:depth=8.0 l2_leaf_reg=7.02905454527499 learning_rate=0.07841900268795064, nb_trees=549


CatBoost objective call #37 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=5.241061252471868 learning_rate=0.16073929392903044
nb_trees=104
rmse=145.41241581395354, Params:depth=10.0 l2_leaf_reg=5.241061252471868 learning_rate=0.16073929392903044, nb_trees=104


CatBoost objective call #38 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=6.02935860366229 learning_rate=0.12277319173098235
nb_trees=423
rmse=144.99412080647423, Params:depth=9.0 l2_leaf_reg=6.02935860366229 learning_rate=0.12277319173098235, nb_trees=423


CatBoost objective call #39 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=9.22083956041321 learning_rate=0.2689405226143646
nb_trees=151
rmse=145.5563504610345, Params:depth=8.0 l2_leaf_reg=9.22083956041321 learning_rate=0.2689405226143646, nb_trees=151


CatBoost objective call #40 cur_best_acc=144.43029
Params: depth=4.0 l2_leaf_reg=7.129913259372494 learning_rate=0.08177240257866596
nb_trees=512
rmse=145.99065501012774, Params:depth=4.0 l2_leaf_reg=7.129913259372494 learning_rate=0.08177240257866596, nb_trees=512


CatBoost objective call #41 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=8.461239548519664 learning_rate=0.063063546202746
nb_trees=477
rmse=144.61570546012183, Params:depth=10.0 l2_leaf_reg=8.461239548519664 learning_rate=0.063063546202746, nb_trees=477


CatBoost objective call #42 cur_best_acc=144.43029
Params: depth=6.0 l2_leaf_reg=7.810749966229368 learning_rate=0.19523027284975858
nb_trees=153
rmse=145.2222335280432, Params:depth=6.0 l2_leaf_reg=7.810749966229368 learning_rate=0.19523027284975858, nb_trees=153


CatBoost objective call #43 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=2.6325559061745567 learning_rate=0.09057650254322848
nb_trees=320
rmse=144.8805078052367, Params:depth=9.0 l2_leaf_reg=2.6325559061745567 learning_rate=0.09057650254322848, nb_trees=320


CatBoost objective call #44 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=9.69057459561548 learning_rate=0.1434613898096927
nb_trees=130
rmse=145.5136788347969, Params:depth=10.0 l2_leaf_reg=9.69057459561548 learning_rate=0.1434613898096927, nb_trees=130


CatBoost objective call #45 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=6.375567443980094 learning_rate=0.37052125247157625
nb_trees=76
rmse=145.55561563511412, Params:depth=7.0 l2_leaf_reg=6.375567443980094 learning_rate=0.37052125247157625, nb_trees=76


CatBoost objective call #46 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=8.801693266416986 learning_rate=0.05961174220757597
nb_trees=454
rmse=144.9372009852885, Params:depth=8.0 l2_leaf_reg=8.801693266416986 learning_rate=0.05961174220757597, nb_trees=454


CatBoost objective call #47 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=4.36681017210926 learning_rate=0.11736381594968161
nb_trees=224
rmse=145.23592047480844, Params:depth=9.0 l2_leaf_reg=4.36681017210926 learning_rate=0.11736381594968161, nb_trees=224


CatBoost objective call #48 cur_best_acc=144.43029
Params: depth=5.0 l2_leaf_reg=5.353991072221241 learning_rate=0.07320288919885691
nb_trees=278
rmse=145.59487872453843, Params:depth=5.0 l2_leaf_reg=5.353991072221241 learning_rate=0.07320288919885691, nb_trees=278


CatBoost objective call #49 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=7.350345398122761 learning_rate=0.24214868890467023
nb_trees=120
rmse=145.1058506849039, Params:depth=7.0 l2_leaf_reg=7.350345398122761 learning_rate=0.24214868890467023, nb_trees=120


CatBoost objective call #50 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=4.8739344341795325 learning_rate=0.08682365816326555
nb_trees=270
rmse=144.58627414734383, Params:depth=9.0 l2_leaf_reg=4.8739344341795325 learning_rate=0.08682365816326555, nb_trees=270


CatBoost objective call #51 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=6.811090798923736 learning_rate=0.11122189068716369
nb_trees=150
rmse=145.3693547588407, Params:depth=8.0 l2_leaf_reg=6.811090798923736 learning_rate=0.11122189068716369, nb_trees=150


CatBoost objective call #52 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=9.920676805982344 learning_rate=0.17238419804132254
nb_trees=138
rmse=145.5915270405547, Params:depth=10.0 l2_leaf_reg=9.920676805982344 learning_rate=0.17238419804132254, nb_trees=138


CatBoost objective call #53 cur_best_acc=144.43029
Params: depth=6.0 l2_leaf_reg=8.53956377157829 learning_rate=0.09531546524389174
nb_trees=345
rmse=145.12082279639336, Params:depth=6.0 l2_leaf_reg=8.53956377157829 learning_rate=0.09531546524389174, nb_trees=345


CatBoost objective call #54 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=3.9659392427527878 learning_rate=0.05611618457928404
nb_trees=393
rmse=145.2539252160496, Params:depth=7.0 l2_leaf_reg=3.9659392427527878 learning_rate=0.05611618457928404, nb_trees=393


CatBoost objective call #55 cur_best_acc=144.43029
Params: depth=5.0 l2_leaf_reg=6.383801437908959 learning_rate=0.13189915559131352
nb_trees=523
rmse=145.6382697278889, Params:depth=5.0 l2_leaf_reg=6.383801437908959 learning_rate=0.13189915559131352, nb_trees=523


CatBoost objective call #56 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=9.008432897658436 learning_rate=0.19728386888282304
nb_trees=157
rmse=145.26940656560825, Params:depth=10.0 l2_leaf_reg=9.008432897658436 learning_rate=0.19728386888282304, nb_trees=157


CatBoost objective call #57 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=7.9482765691668815 learning_rate=0.10781408965282444
nb_trees=212
rmse=145.166989602102, Params:depth=9.0 l2_leaf_reg=7.9482765691668815 learning_rate=0.10781408965282444, nb_trees=212


CatBoost objective call #58 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=5.9933135733047065 learning_rate=0.06511643027368347
nb_trees=292
rmse=145.23076407562357, Params:depth=9.0 l2_leaf_reg=5.9933135733047065 learning_rate=0.06511643027368347, nb_trees=292


CatBoost objective call #59 cur_best_acc=144.43029
Params: depth=4.0 l2_leaf_reg=2.9132675688111815 learning_rate=0.15404186871430417
nb_trees=369
rmse=146.0985709138703, Params:depth=4.0 l2_leaf_reg=2.9132675688111815 learning_rate=0.15404186871430417, nb_trees=369


CatBoost objective call #60 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=8.264121693640837 learning_rate=0.07429129404097984
nb_trees=251
rmse=144.818844456387, Params:depth=10.0 l2_leaf_reg=8.264121693640837 learning_rate=0.07429129404097984, nb_trees=251


CatBoost objective call #61 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=1.2587848395447159 learning_rate=0.051424789900755094
nb_trees=344
rmse=145.02138136378932, Params:depth=8.0 l2_leaf_reg=1.2587848395447159 learning_rate=0.051424789900755094, nb_trees=344


CatBoost objective call #62 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=7.729832872855775 learning_rate=0.3073316136478665
nb_trees=128
rmse=145.61247589788286, Params:depth=9.0 l2_leaf_reg=7.729832872855775 learning_rate=0.3073316136478665, nb_trees=128


CatBoost objective call #63 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=9.518917442874896 learning_rate=0.08053455657649686
nb_trees=443
rmse=145.1077341169676, Params:depth=10.0 l2_leaf_reg=9.518917442874896 learning_rate=0.08053455657649686, nb_trees=443


CatBoost objective call #64 cur_best_acc=144.43029
Params: depth=6.0 l2_leaf_reg=7.245156735198671 learning_rate=0.09994191725476097
nb_trees=198
rmse=145.54818986689392, Params:depth=6.0 l2_leaf_reg=7.245156735198671 learning_rate=0.09994191725476097, nb_trees=198


CatBoost objective call #65 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=6.742887363930684 learning_rate=0.18058751596119568
nb_trees=239
rmse=144.84433609439233, Params:depth=8.0 l2_leaf_reg=6.742887363930684 learning_rate=0.18058751596119568, nb_trees=239


CatBoost objective call #66 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=2.4151544957654294 learning_rate=0.08422482233351516
nb_trees=232
rmse=144.83624887212093, Params:depth=10.0 l2_leaf_reg=2.4151544957654294 learning_rate=0.08422482233351516, nb_trees=232


CatBoost objective call #67 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=3.7289052491774317 learning_rate=0.11335370562936921
nb_trees=353
rmse=144.77397143664092, Params:depth=9.0 l2_leaf_reg=3.7289052491774317 learning_rate=0.11335370562936921, nb_trees=353


CatBoost objective call #68 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=4.558400970828368 learning_rate=0.0886442020097057
nb_trees=200
rmse=144.91864253252004, Params:depth=10.0 l2_leaf_reg=4.558400970828368 learning_rate=0.0886442020097057, nb_trees=200


CatBoost objective call #69 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=5.091829704622422 learning_rate=0.12780493219378197
nb_trees=227
rmse=145.00740799093455, Params:depth=9.0 l2_leaf_reg=5.091829704622422 learning_rate=0.12780493219378197, nb_trees=227


CatBoost objective call #70 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=5.562890276388927 learning_rate=0.14396972326591337
nb_trees=123
rmse=144.86899198876887, Params:depth=10.0 l2_leaf_reg=5.562890276388927 learning_rate=0.14396972326591337, nb_trees=123


CatBoost objective call #71 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=3.991298501216731 learning_rate=0.1039053079035399
nb_trees=234
rmse=145.26090262458308, Params:depth=9.0 l2_leaf_reg=3.991298501216731 learning_rate=0.1039053079035399, nb_trees=234


CatBoost objective call #72 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=4.963522107014578 learning_rate=0.07573956163291833
nb_trees=296
rmse=145.45022518181665, Params:depth=10.0 l2_leaf_reg=4.963522107014578 learning_rate=0.07573956163291833, nb_trees=296


CatBoost objective call #73 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=3.2346612702067787 learning_rate=0.0678404250137506
nb_trees=476
rmse=144.98922146071564, Params:depth=9.0 l2_leaf_reg=3.2346612702067787 learning_rate=0.0678404250137506, nb_trees=476


CatBoost objective call #74 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=4.7032942176588515 learning_rate=0.08614467296451382
nb_trees=377
rmse=144.85887602861752, Params:depth=8.0 l2_leaf_reg=4.7032942176588515 learning_rate=0.08614467296451382, nb_trees=377


CatBoost objective call #75 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=5.933308882175034 learning_rate=0.053178003688827795
nb_trees=505
rmse=144.9545087904257, Params:depth=10.0 l2_leaf_reg=5.933308882175034 learning_rate=0.053178003688827795, nb_trees=505


CatBoost objective call #76 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=6.500850060176093 learning_rate=0.05947611551809318
nb_trees=370
rmse=145.18274711437684, Params:depth=9.0 l2_leaf_reg=6.500850060176093 learning_rate=0.05947611551809318, nb_trees=370


CatBoost objective call #77 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=6.925265350646983 learning_rate=0.09355644930318156
nb_trees=231
rmse=145.08334082932276, Params:depth=10.0 l2_leaf_reg=6.925265350646983 learning_rate=0.09355644930318156, nb_trees=231


CatBoost objective call #78 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=6.2062789386536155 learning_rate=0.1511349400236397
nb_trees=97
rmse=146.0275864794284, Params:depth=10.0 l2_leaf_reg=6.2062789386536155 learning_rate=0.1511349400236397, nb_trees=97


CatBoost objective call #79 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=5.752458962863234 learning_rate=0.16515366653505703
nb_trees=94
rmse=146.11304091549442, Params:depth=9.0 l2_leaf_reg=5.752458962863234 learning_rate=0.16515366653505703, nb_trees=94


CatBoost objective call #80 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=5.41187581379391 learning_rate=0.21704731473219457
nb_trees=90
rmse=145.14250491259637, Params:depth=8.0 l2_leaf_reg=5.41187581379391 learning_rate=0.21704731473219457, nb_trees=90


CatBoost objective call #81 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=7.5683314785178615 learning_rate=0.13027356101503806
nb_trees=278
rmse=145.22099147716543, Params:depth=10.0 l2_leaf_reg=7.5683314785178615 learning_rate=0.13027356101503806, nb_trees=278


CatBoost objective call #82 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=8.736897160788804 learning_rate=0.11527585652761213
nb_trees=260
rmse=145.16147687067954, Params:depth=7.0 l2_leaf_reg=8.736897160788804 learning_rate=0.11527585652761213, nb_trees=260


CatBoost objective call #83 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=9.27413994495512 learning_rate=0.13716232837143075
nb_trees=250
rmse=145.0096837873529, Params:depth=9.0 l2_leaf_reg=9.27413994495512 learning_rate=0.13716232837143075, nb_trees=250


CatBoost objective call #84 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=8.13363638410705 learning_rate=0.0695623944893985
nb_trees=286
rmse=145.00362959537748, Params:depth=10.0 l2_leaf_reg=8.13363638410705 learning_rate=0.0695623944893985, nb_trees=286


CatBoost objective call #85 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=4.147942360935656 learning_rate=0.12111920819215083
nb_trees=174
rmse=145.4300584156868, Params:depth=8.0 l2_leaf_reg=4.147942360935656 learning_rate=0.12111920819215083, nb_trees=174


CatBoost objective call #86 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=4.746053915612373 learning_rate=0.06273193394969004
nb_trees=369
rmse=144.69031924111266, Params:depth=10.0 l2_leaf_reg=4.746053915612373 learning_rate=0.06273193394969004, nb_trees=369


CatBoost objective call #87 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=3.5238314505552477 learning_rate=0.10560454546076624
nb_trees=284
rmse=144.92111443705087, Params:depth=9.0 l2_leaf_reg=3.5238314505552477 learning_rate=0.10560454546076624, nb_trees=284


CatBoost objective call #88 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=2.027640098413945 learning_rate=0.07971436484150025
nb_trees=281
rmse=145.49811991656497, Params:depth=9.0 l2_leaf_reg=2.027640098413945 learning_rate=0.07971436484150025, nb_trees=281


CatBoost objective call #89 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=5.187447957229791 learning_rate=0.18772205690443886
nb_trees=113
rmse=145.3466222451089, Params:depth=10.0 l2_leaf_reg=5.187447957229791 learning_rate=0.18772205690443886, nb_trees=113


CatBoost objective call #90 cur_best_acc=144.43029
Params: depth=9.0 l2_leaf_reg=9.027299387654422 learning_rate=0.09761171391501958
nb_trees=359
rmse=144.75925619743248, Params:depth=9.0 l2_leaf_reg=9.027299387654422 learning_rate=0.09761171391501958, nb_trees=359


CatBoost objective call #91 cur_best_acc=144.43029
Params: depth=8.0 l2_leaf_reg=9.758862487092959 learning_rate=0.09207418660125213
nb_trees=428
rmse=145.30906211232843, Params:depth=8.0 l2_leaf_reg=9.758862487092959 learning_rate=0.09207418660125213, nb_trees=428


CatBoost objective call #92 cur_best_acc=144.43029
Params: depth=6.0 l2_leaf_reg=8.33748205323296 learning_rate=0.24357483717499437
nb_trees=165
rmse=145.25677257303747, Params:depth=6.0 l2_leaf_reg=8.33748205323296 learning_rate=0.24357483717499437, nb_trees=165


CatBoost objective call #93 cur_best_acc=144.43029
Params: depth=7.0 l2_leaf_reg=7.128266630592688 learning_rate=0.12397060121704302
nb_trees=309
rmse=145.1521968062212, Params:depth=7.0 l2_leaf_reg=7.128266630592688 learning_rate=0.12397060121704302, nb_trees=309


CatBoost objective call #94 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=7.8632368086984945 learning_rate=0.1469632640056585
nb_trees=122
rmse=144.90379702332206, Params:depth=10.0 l2_leaf_reg=7.8632368086984945 learning_rate=0.1469632640056585, nb_trees=122


CatBoost objective call #95 cur_best_acc=144.43029
Params: depth=10.0 l2_leaf_reg=8.697116850767104 learning_rate=0.13544159624736146
nb_trees=245
rmse=144.37519868411036, Params:depth=10.0 l2_leaf_reg=8.697116850767104 learning_rate=0.13544159624736146, nb_trees=245

NEW BEST RMSE=144.37519868411036

CatBoost objective call #96 cur_best_acc=144.37520
Params: depth=10.0 l2_leaf_reg=8.728970322345344 learning_rate=0.16998424453808686
nb_trees=165
rmse=145.73722098683078, Params:depth=10.0 l2_leaf_reg=8.728970322345344 learning_rate=0.16998424453808686, nb_trees=165


CatBoost objective call #97 cur_best_acc=144.37520
Params: depth=5.0 l2_leaf_reg=9.423346345947845 learning_rate=0.21720662461938978
nb_trees=104
rmse=146.65092709020996, Params:depth=5.0 l2_leaf_reg=9.423346345947845 learning_rate=0.21720662461938978, nb_trees=104


CatBoost objective call #98 cur_best_acc=144.37520
Params: depth=10.0 l2_leaf_reg=8.081514620182077 learning_rate=0.1388520800446899
nb_trees=158
rmse=144.50469439993932, Params:depth=10.0 l2_leaf_reg=8.081514620182077 learning_rate=0.1388520800446899, nb_trees=158


CatBoost objective call #99 cur_best_acc=144.37520
Params: depth=10.0 l2_leaf_reg=8.076558345583846 learning_rate=0.2880422071470888
nb_trees=169
rmse=145.7859169555674, Params:depth=10.0 l2_leaf_reg=8.076558345583846 learning_rate=0.2880422071470888, nb_trees=169


CatBoost objective call #100 cur_best_acc=144.37520
Params: depth=6.0 l2_leaf_reg=9.912696008978289 learning_rate=0.2078671865557629
nb_trees=311
rmse=145.07572758395168, Params:depth=6.0 l2_leaf_reg=9.912696008978289 learning_rate=0.2078671865557629, nb_trees=311

CPU times: user 20min 20s, sys: 2min 17s, total: 22min 38s
Wall time: 6min 12s
In [20]:
print('The best params:')
print( best )
The best params:
{'depth': 10.0, 'l2_leaf_reg': 8.697116850767104, 'learning_rate': 0.13544159624736146}
In [21]:
cbr = CatBoostRegressor(random_seed=17, 
                        eval_metric='RMSE', 
                        iterations=2000, 
                        max_depth=int(best['depth']),
                        early_stopping_rounds=50, 
                        learning_rate=best['learning_rate'],
                        l2_leaf_reg=best['l2_leaf_reg'],
                       use_best_model=True)
cbr.fit(X_train_cbr, y_train_cbr['count'], 
       eval_set=(X_test_cbr, y_test_cbr['count']), 
       cat_features=cat_features,
       silent=True,
       plot=True);
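
As a quick sanity check (a sketch; the prediction for the hold-out sample comes later, in Part 11), the refit model can be scored on the inner validation split:

# Sketch: RMSE of the refit model on the inner validation split. Note that this
# split was also used for early stopping, so the score is somewhat optimistic.
print(np.sqrt(mean_squared_error(y_test_cbr['count'], cbr.predict(X_test_cbr))))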