Notebook

Scikit-learn's batch logistic regression doesn't support a binomial response -- where your data is (successes, trials) rather than (success or failure). This can be a drag when you have a lot of trials with the exact same features, as your design matrix will be repetitive and much larger than necessary.

It isn't too bad to do it yourself though, which I'll show here. But first, my notation:

$n$ - number of trials
$k$ - number of successes
$X$ - design matrix
$\lambda$ - regularization parameter; I use lambd in the code because lambda is reserved in python

In the code, I treat the intercept separately and don't regularize it, though it's pretty straightforward to modify things and include it as a normal feature.

In [1]:

from scipy.optimize import fmin_l_bfgs_b
from numpy import log, exp, hstack

def logistic(x):
    return 1/(1+exp(-x))
 
def log_loss(y_orig, n, k, eps=1e-15):
    y = y_orig.copy()
    y[y < eps] = eps
    y[y > 1 - eps] = 1 - eps
    x = -k * log(y) - (n - k) * log(1 - y)
    return x.sum()/n.sum()
 
def lbfgs_func(x, X, n, k, lambd=1.0):
    intercept, theta = x[0], x[1:]
    y = logistic(X.dot(theta) + intercept)
    penalty = lambd/2*(theta*theta).sum()/n.sum()
    obj = log_loss(y, n, k) + penalty
 
    r = n*logistic(X.dot(theta)+intercept) - k
    grad_intercept = r.sum()/n.sum()
    grad_coefs = (X.T.dot(r) + lambd*theta)/n.sum()
    grad = hstack([grad_intercept, grad_coefs])
 
    return obj, grad

To use the code, you need your design matrix $X$ and a vector each of trials and successes, $n$ and $k$. It'll look something like this:

x0 = np.zeros(X.shape[1]+1)
x_optimal, objective, info = fmin_l_bfgs_b(lbfgs_func, x0, args=(X, n, k))
intercept, theta = x_optimal[0], x_optimal[1:]

Now intercept and theta will have your optimal parameters. To make predictions on new data, do something like this:

y_pred = logistic(X_new.dot(theta) + intercept)

Note that for large problems you might want to play with the defaults for fmin_l_bfgs_b; in particular I've found changing the factr keyword argument to be helpful.

Example¶

Two features with four training samples each. The first feature got 3/4 successes, the second only one out of four. You can see that scikit-learn's answer agrees with mine.

In [2]:

import numpy as np
from sklearn import linear_model

In [3]:

X = np.matrix('1 0; 1 0; 1 0; 1 0; 0 1; 0 1; 0 1; 0 1').A
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])

C = 10
clf = linear_model.LogisticRegression(C=C)
clf.fit(X, y)
clf.intercept_, clf.coef_

Out[3]:

(array([ 0.]), array([[ 0.97281217, -0.97281217]]))

In [4]:

X = np.array([[1, 0], [0, 1]])
n = np.array([4, 4])
k = np.array([3, 1])

lambd = 1/C
x0 = np.zeros(X.shape[1]+1)
x_optimal, *_ = fmin_l_bfgs_b(lbfgs_func, x0, args=(X, n, k, lambd))
intercept, theta = x_optimal[0], x_optimal[1:]
intercept, theta

Out[4]:

(-5.1209353873023446e-16, array([ 0.97280393, -0.97280393]))