#!/usr/bin/env python # coding: utf-8 # # Exercise 10- CostSensitive Churn # # [paper](http://download.springer.com/static/pdf/125/art%253A10.1186%252Fs40165-015-0014-6.pdf?originUrl=http%3A%2F%2Fdecisionanalyticsjournal.springeropen.com%2Farticle%2F10.1186%2Fs40165-015-0014-6&token2=exp=1462974790~acl=%2Fstatic%2Fpdf%2F125%2Fart%25253A10.1186%25252Fs40165-015-0014-6.pdf*~hmac=05041d990b7e5a5e70d6efc1fbb29c2a380465c6edc84be1feda1b6d49588a1a) # [slides](http://www.slideshare.net/albahnsen/maximizing-a-churn-campaigns-profitability-with-cost-sensitive-predictive-analytics) # # Customer churn predictive modeling deals with predicting the probability of a customer defecting # using historical, behavioral and socio-economical information. This tool is of great benefit to # subscription based companies allowing them to maximize the results of retention campaigns. The # problem of churn predictive modeling has been widely studied by the data mining and machine learning # communities. It is usually tackled by using classification algorithms in order to learn the # different patterns of both the churners and non-churners. Nevertheless, current state-of-the-art # classification algorithms are not well aligned with commercial goals, in the sense that, the models # miss to include the real financial costs and benefits during the training and evaluation phases. In # the case of churn, evaluating a model based on a traditional measure such as accuracy or predictive # power, does not yield to the best results when measured by the actual financial cost, i.e., # investment per subscriber on a loyalty campaign and the financial impact of failing to detect a # real churner versus wrongly predicting a non-churner as a churner. # # The two main objectives of subscription-based companies are to acquire new subscribers and # retain those they already have, mainly because profits are directly linked with the number of # subscribers. In order to maximize the profit, companies must increase the customer base by # incrementing sales while decreasing the number of churners. Furthermore, it is common knowledge # that retaining a customer is about five times less expensive than acquiring a new one , this creates pressure to have better and more effective churn campaigns. # # A typical churn campaign consists in identifying from the current customer base which ones are # more likely to leave the company, and make an offer in order to avoid that behavior. # With this in mind the companies use intelligence to create and improve retention and collection # strategies. In the first case, this usually implies an offer that can be either a discount or a # free upgrade during certain span of time. In both cases the company has to assume a cost for that # offer, therefore, accurate prediction of the churners becomes important. The logic of this flow is # shown in the following figure. # ![fig1](../notebooks/images/ch5_fig1.png) # The churn campaign process starts with the sales that every month increase the customer # base, however, monthly there is a group of customers that decide to leave the company for many # reasons. Then the objective of a churn model is to identify those customers before they take the # decision of defecting. # # Using a churn model, those customers more likely to leave are predicted as churners and # an offer is made in order to retain them. However, it is known that not all customers will accept # the offer, in the case when a customer is planning to defect, it is possible that the offer is not # good enough to retain him or that the reason for defecting can not be influenced by an offer. # Using historical information, it is estimated that a customer will accept the offer with # probability $\gamma$. # On the other hand, there is the case in which the churn model misclassified a non-churner as # churner, also known as false positives, in that case the customer will always accept the offer that # means and additional cost to the company since those misclassified customers do not have the # intentions of leaving. # # In the case were the churn model predicts customers as non-churners, there is also the possibility # of a misclassification, in this case an actual churner is predicted as non-churner, since # these customers do not receive an offer and they will leave the company, these cases are known as # false negatives. Lastly, there is the case were the customers are actually non-churners, then # there is no need to make a retention offer to these customers since they will continue to be part # of the customer base. # # It can be seen that a churn campaign (or churn model) has three main points. First, avoid false # positives since there is a financial cost of making an offer where it is not needed. Second, find # the right offer to give to those customers identified as churners. And lastly, to decrease # the number of false negatives. # In the following figure, the financial impact of a churn model is shown. # ![fig1](../notebooks/images/ch5_fig2.png) # Note than we take # into account the costs and not the profit in each case. # When a customer is predicted to be a churner, an offer is made with the objective of avoiding # the customer defecting. However, if a customer is actually a churner, he may or not accept the # offer with a probability $\gamma_i$. If the customer accepts the offer, the financial impact is # equal to the cost of the offer ($C_{o_i}$) plus the administrative cost of contacting the # customer ($C_a$). On the other hand, if the customer declines the offer, the cost is the # expected income that the clients would otherwise generate, also called customer lifetime value # ($CLV_i$), plus $C_a$. Lastly, if the customer is not actually a churner, he will be happy to # accept the offer and the cost will be $C_{o_i}$ plus $C_a$. # # In the case that the customer is predicted as non-churner, there are two possible outcomes. # Either the customer is not a churner, then the cost is zero, or the customer is a churner and the # cost is $CLV_i$. # # # | | Actual Positive ($y_i=1$) | Actual Negative ($y_i=0$)| # |--- |:-: |:-: | # | Predicted Positive ($c_i=1$) | $C_{TP_i}=\gamma_iC_{o_i}+(1-\gamma_i)(CLV_i+C_a)$ | $C_{FP_i}=C_{o_i}+C_a$ | # | Predicted Negative ($c_i=0$) | $C_{FN_i}=CLV_i$ | $C_{TN_i}=0$ | # In[1]: import pandas as pd import numpy as np # In[2]: import zipfile with zipfile.ZipFile('../datasets/cost_sensitive_classification_churn.csv.zip', 'r') as z: f = z.open('cost_sensitive_classification_churn.csv') data = pd.read_csv(f, index_col=0) # In[3]: data.head() # In[4]: data.target.value_counts(normalize=True) # In[5]: X =data[['x'+str(i) for i in range(1, 47)]] y = data.target cost_mat = data[['C_FP','C_FN','C_TP','C_TN']].values # In[6]: from sklearn.cross_validation import train_test_split temp = train_test_split(X, y, cost_mat, test_size=0.33, random_state=42) X_train, X_test, y_train, y_test, cost_mat_train, cost_mat_test = temp # # Exercice 10.1 # # # * Train 4 different models to predict target (Churn) using x1-x46 as features # 1. Logistic Regression # 2. Ensemble # 3. Under-sampling LR # 4. Under-sampling Ensemble # In[ ]: # # Exercice 10.2 # # * Calculate the savings of the different models # * Compare F1Score and Savings # In[ ]: # # Exercice 10.3 # # Using the probabilities of each model estimate a BMR classifier # # Compare the savings and F1Score # In[ ]: # # Exercice 10.4 # # Estimate a CostSensitiveDecisionTreeClassifier # In[ ]: