From Wikipedia:

'Conjoint analysis' is a survey-based statistical technique used in market research that helps determine how people value the different attributes (features, functions, benefits) that make up an individual product or service.

The objective of conjoint analysis is to determine which combination of a limited number of attributes is most influential on respondent choice or decision making. A controlled set of potential products or services is shown to survey respondents, and by analyzing how they express preferences among these products, the implicit valuations of the individual elements making up the product or service can be determined. These implicit valuations (utilities or part-worths) can be used to build market models that estimate market share, revenue and even the profitability of new designs.
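To make that last point concrete, here is a minimal sketch of a first-choice market simulation. The part-worths and products below are invented for illustration only (they come from no real study); each simulated buyer is assumed to pick the product with the highest total utility:

```python
# Hypothetical part-worths for two toy attributes (illustration only).
part_worths = {
    'brand': {'A': 0.4, 'B': -0.4},
    'price': {'$100': 0.8, '$200': -0.8},
}

def total_utility(profile):
    # A product's utility is the sum of the part-worths of its levels.
    return sum(part_worths[attr][level] for attr, level in profile.items())

# Two hypothetical product designs.
products = {
    'P1': {'brand': 'A', 'price': '$200'},
    'P2': {'brand': 'B', 'price': '$100'},
}

utilities = {name: total_utility(p) for name, p in products.items()}
# First-choice rule: every buyer picks the highest-utility product.
winner = max(utilities, key=utilities.get)
print(utilities)  # {'P1': -0.4, 'P2': 0.4}
print(winner)     # P2
```

In a real simulation one would aggregate over respondent-level utilities instead of a single shared set, but the additive logic is the same.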
Here is a very brief description of a few conjoint analysis methods:

Full-profile conjoint analysis displays a large number of complete product descriptions to the respondent, yielding a large amount of data for each respondent. The different product descriptions are presented for acceptability or preference evaluation.

Adaptive conjoint analysis varies the choice sets presented based on the respondents' earlier answers. As a consequence, the features and levels shown become increasingly competitive, which makes the collected data more informative.

Choice-based conjoint analysis (CBC, also called discrete-choice conjoint analysis) is the most common type. It requires respondents to repeatedly select their preferred concept from sets of around 3 to 5 full-profile concepts. The idea is to simulate an actual buying scenario and mimic shopping behavior as closely as possible. From the trade-offs made when the respondent chooses one, or none, of the available options, the importance of the attributes and their levels can be statistically derived. Based on the results, one can estimate the value of each level as well as the optimal combinations that make up products.

A further variant estimates utilities directly from respondents' choice data. It is particularly useful when a respondent cannot provide preference evaluations for all attribute levels because the data-collection task would be too large.

Best-worst scaling (MaxDiff) presents respondents with an assortment of packages from which they must select the best/most-preferred and the worst/least-preferred options.
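To make the CBC setup concrete, the sketch below (with hypothetical attributes and levels, not taken from the data used later) builds all full profiles from the attribute levels and samples one choice set of four to show a respondent:

```python
import itertools
import random

random.seed(0)  # reproducible sampling

# Hypothetical attributes and levels (illustration only).
attributes = {
    'brand': ['AT&T', 'Verizon', 'T-Mobile'],
    'monthly': ['$100', '$200', '$300'],
}

# A full profile is one combination of levels, one per attribute.
profiles = [dict(zip(attributes, combo))
            for combo in itertools.product(*attributes.values())]

# One CBC task: show a handful of profiles; the respondent picks one (or none).
choice_set = random.sample(profiles, 4)
for p in choice_set:
    print(p)
```

A real CBC design would use many such tasks per respondent and balance how often each level appears, but the structure of a single task is just this.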
We will focus on Choice-Based Conjoint Analysis in what follows.

The basic assumption of Conjoint Analysis is that the total utility of a product is the sum of the part-worths of its attribute levels.
The linear regression model with conjoint preference data has the form:
$$R_i = u_0 + \sum_{k}\sum_{j} u_{j}^{k} X_{ij}^{k}$$where $R_i$ is the ranking/rating assigned to product $i$,
$$X_{ij}^k = \begin{cases} 1 & {\text{if product }}i{\text{ has level }}j{\text{ on attribute }}k\\ 0 & {\text{otherwise}} \end{cases}$$and $u_{j}^{k}$ is the utility (part-worth) coefficient for level $j$ on attribute $k$.
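For concreteness, the indicator variables $X_{ij}^k$ are exactly what `pandas.get_dummies` produces. A small sketch with three example products and a single attribute (the full analysis below additionally passes `drop_first=True` to avoid perfect collinearity with the intercept):

```python
import pandas as pd

# Three example products, described here by a single attribute 'brand'.
brands = pd.Series(['AT&T', 'Verizon', 'T-Mobile'], name='brand')

# Column 'brand_<j>' holds X_ij: 1 if product i has level j, else 0.
X = pd.get_dummies(brands, prefix='brand').astype(int)
print(X)
```

Each row contains exactly one 1, since every product has exactly one level of each attribute.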
import pandas as pd
filename = 'data/mobile_services_ranking.csv'
pd.read_csv(filename)
 | brand | startup | monthly | service | retail | apple | samsung | google | ranking |
---|---|---|---|---|---|---|---|---|---
0 | "AT&T" | "$100" | "$100" | "4G NO" | "Retail NO" | "Apple NO" | "Samsung NO" | "Nexus NO" | 11 |
1 | "Verizon" | "$300" | "$100" | "4G NO" | "Retail YES" | "Apple YES" | "Samsung YES" | "Nexus NO" | 12 |
2 | "US Cellular" | "$400" | "$200" | "4G NO" | "Retail NO" | "Apple NO" | "Samsung YES" | "Nexus NO" | 9 |
3 | "Verizon" | "$400" | "$400" | "4G YES" | "Retail YES" | "Apple NO" | "Samsung NO" | "Nexus NO" | 2 |
4 | "Verizon" | "$200" | "$300" | "4G NO" | "Retail NO" | "Apple NO" | "Samsung YES" | "Nexus YES" | 8 |
5 | "Verizon" | "$100" | "$200" | "4G YES" | "Retail NO" | "Apple YES" | "Samsung NO" | "Nexus YES" | 13 |
6 | "US Cellular" | "$300" | "$300" | "4G YES" | "Retail NO" | "Apple YES" | "Samsung NO" | "Nexus NO" | 7 |
7 | "AT&T" | "$400" | "$300" | "4G NO" | "Retail YES" | "Apple YES" | "Samsung NO" | "Nexus YES" | 4 |
8 | "AT&T" | "$200" | "$400" | "4G YES" | "Retail NO" | "Apple YES" | "Samsung YES" | "Nexus NO" | 5 |
9 | "T-Mobile" | "$400" | "$100" | "4G YES" | "Retail NO" | "Apple YES" | "Samsung YES" | "Nexus YES" | 16 |
10 | "US Cellular" | "$100" | "$400" | "4G NO" | "Retail YES" | "Apple YES" | "Samsung YES" | "Nexus YES" | 3 |
11 | "T-Mobile" | "$200" | "$200" | "4G NO" | "Retail YES" | "Apple YES" | "Samsung NO" | "Nexus NO" | 6 |
12 | "T-Mobile" | "$100" | "$300" | "4G YES" | "Retail YES" | "Apple NO" | "Samsung YES" | "Nexus NO" | 10 |
13 | "US Cellular" | "$200" | "$100" | "4G YES" | "Retail YES" | "Apple NO" | "Samsung NO" | "Nexus YES" | 15 |
14 | "T-Mobile" | "$300" | "$400" | "4G NO" | "Retail NO" | "Apple NO" | "Samsung NO" | "Nexus YES" | 1 |
15 | "AT&T" | "$300" | "$200" | "4G YES" | "Retail YES" | "Apple NO" | "Samsung YES" | "Nexus YES" | 14 |
We will now compute $X_{ij}^k$ from the definition above, where we recall
$$X_{ij}^k = \begin{cases} 1 & {\text{if product }}i{\text{ has level }}j{\text{ on attribute }}k\\ 0 & {\text{otherwise}} \end{cases}$$For example, product 0 has level "AT&T" on attribute brand, so the corresponding dummy variable equals 1 for that product and 0 for the other brand levels.
The cell below performs the following steps:

The for loop iterates over the attributes; suppose the first iteration fixes the attribute to brand.

The line below counts the number of levels in the attribute brand:

num_levels = len(list(np.unique(df['brand'])))

The corresponding array of level names,

array(['"AT&T"', '"T-Mobile"', '"US Cellular"', '"Verizon"'], dtype=object)

is appended to an initially empty list of level names, which becomes:

levels = [['"AT&T"', '"T-Mobile"', '"US Cellular"', '"Verizon"']]

Next the slice boundaries a and b are computed and a list pw_new is created, containing the part-worths associated with the levels of brand. Notice that the list has only three elements at this point, since the last one will be obtained by imposing a zero-sum constraint:

a = 1
b = 1 + 4 - 1 = 4
pw_new = [round(i, 3) for i in list(lr.params[1:4])] = [-0.25, -0.0, 0.25]

The slice lr.params[1:4] grabs the fitted parameters while skipping the intercept.

The next line appends the remaining part-worth, chosen so that the part-worths of the attribute sum to zero (they are zero-centered).

The range (maximum minus minimum) of the part-worths of the attribute brand is then appended to the pw_range list.

After the for loop is finished, we have a list of lists, each sub-list containing the part-worths of one attribute.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def lr_params(filename):
    df = pd.read_csv(filename)
    cols = df.columns.tolist()
    # Dummy-code every attribute column (the last column holds the ranking);
    # drop_first avoids the dummy-variable trap.
    dummies = pd.concat([pd.get_dummies(df[col], drop_first=True, prefix=col)
                         for col in cols[0:-1]], axis=1)
    dummies.columns = [c.replace('"', '').replace(' ', '_').lower()
                       for c in dummies.columns.tolist()]
    dummies = dummies.astype(int)  # ensure numeric dtype (newer pandas returns booleans)
    X, y = dummies, df[cols[-1]]
    X = sm.add_constant(X)
    lr = sm.OLS(y, X).fit()
    # Pair each dummy column with its fitted coefficient, skipping the intercept.
    betas = lr.params.round(3)[1:]
    v = dummies.columns.tolist()
    res = pd.DataFrame(list(zip(v, betas)), columns=['attribute', 'beta'])
    attributes = ['brand', 'startup', 'monthly', 'service',
                  'retail', 'apple', 'samsung', 'google']
    levels, pw, pw_range = [], [], []
    b = 1
    for att in attributes:
        num_levels = len(list(np.unique(df[att])))
        levels.append(list(np.unique(df[att])))
        a = b
        b = a + num_levels - 1
        pw_new = [round(i, 3) for i in list(lr.params[a:b])]
        pw_new.append((-1) * sum(pw_new))  # impose the zero-sum constraint
        pw.append(pw_new)
        pw_range.append(max(pw_new) - min(pw_new))
    # Relative importance of each attribute from its part-worth range.
    importance = []
    for item in pw_range:
        importance.append(round(100 * (item / sum(pw_range)), 2))
    name_dict = {'brand': 'Provider',
                 'startup': 'Start-up Cost', 'monthly': 'Monthly Cost',
                 'service': '4G Service', 'retail': 'Nearby retail store',
                 'apple': 'Apple products sold', 'samsung': 'Samsung products sold',
                 'google': 'Google/Nexus products sold'}
    lst = []
    idx = 0
    for att in attributes:
        print('\nAttribute and Importance:', name_dict[att], 'and', importance[idx])
        print(' Level Part-Worths')
        for level in range(len(levels[idx])):
            print(' ', levels[idx][level], '-->', pw[idx][level])
            lst.append([levels[idx][level], pw[idx][level]])
        idx = idx + 1
    dfnew = pd.DataFrame(list(zip(name_dict.values(), importance)),
                         columns=['attribute', 'importance']).sort_values('importance', ascending=False)
    lst_new = [[lst[i][0].replace('"', ''), lst[i][1]] for i in range(len(lst))]
    print(lst_new)
    return (lr.summary(), res, dfnew, lst_new)

tup = lr_params(filename)
Attribute and Importance: Provider and 0.96
 Level Part-Worths
  "AT&T" --> -0.25
  "T-Mobile" --> -0.0
  "US Cellular" --> 0.25
  "Verizon" --> -0.0

Attribute and Importance: Start-up Cost and 8.61
 Level Part-Worths
  "$100" --> -0.75
  "$200" --> -0.75
  "$300" --> -1.5
  "$400" --> 3.0

Attribute and Importance: Monthly Cost and 58.85
 Level Part-Worths
  "$100" --> -3.0
  "$200" --> -6.25
  "$300" --> -10.75
  "$400" --> 20.0

Attribute and Importance: 4G Service and 13.4
 Level Part-Worths
  "4G NO" --> 3.5
  "4G YES" --> -3.5

Attribute and Importance: Nearby retail store and 1.91
 Level Part-Worths
  "Retail NO" --> -0.5
  "Retail YES" --> 0.5

Attribute and Importance: Apple products sold and 1.91
 Level Part-Worths
  "Apple NO" --> -0.5
  "Apple YES" --> 0.5

Attribute and Importance: Samsung products sold and 8.61
 Level Part-Worths
  "Samsung NO" --> 2.25
  "Samsung YES" --> -2.25

Attribute and Importance: Google/Nexus products sold and 5.74
 Level Part-Worths
  "Nexus NO" --> 1.5
  "Nexus YES" --> -1.5

[['AT&T', -0.25], ['T-Mobile', -0.0], ['US Cellular', 0.25], ['Verizon', -0.0], ['$100', -0.75], ['$200', -0.75], ['$300', -1.5], ['$400', 3.0], ['$100', -3.0], ['$200', -6.25], ['$300', -10.75], ['$400', 20.0], ['4G NO', 3.5], ['4G YES', -3.5], ['Retail NO', -0.5], ['Retail YES', 0.5], ['Apple NO', -0.5], ['Apple YES', 0.5], ['Samsung NO', 2.25], ['Samsung YES', -2.25], ['Nexus NO', 1.5], ['Nexus YES', -1.5]]
/Users/marcotavora/miniconda3/lib/python3.6/site-packages/scipy/stats/stats.py:1394: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=16 "anyway, n=%i" % int(n))
print('Summary of statistics:')
tup[0]
print('Utilities:')
tup[1]
print('Importances:')
tup[2]
Summary of statistics:
Dep. Variable: | ranking | R-squared: | 0.999 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.989 |
Method: | Least Squares | F-statistic: | 97.07 |
Date: | Sun, 15 Jul 2018 | Prob (F-statistic): | 0.0794 |
Time: | 00:45:34 | Log-Likelihood: | 10.568 |
No. Observations: | 16 | AIC: | 8.864 |
Df Residuals: | 1 | BIC: | 20.45 |
Df Model: | 14 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | 11.1250 | 0.484 | 22.980 | 0.028 | 4.974 | 17.276 |
brand_t-mobile | -0.2500 | 0.354 | -0.707 | 0.608 | -4.742 | 4.242 |
brand_us_cellular | -1.865e-14 | 0.354 | -5.28e-14 | 1.000 | -4.492 | 4.492 |
brand_verizon | 0.2500 | 0.354 | 0.707 | 0.608 | -4.242 | 4.742 |
startup_$200 | -0.7500 | 0.354 | -2.121 | 0.280 | -5.242 | 3.742 |
startup_$300 | -0.7500 | 0.354 | -2.121 | 0.280 | -5.242 | 3.742 |
startup_$400 | -1.5000 | 0.354 | -4.243 | 0.147 | -5.992 | 2.992 |
monthly_$200 | -3.0000 | 0.354 | -8.485 | 0.075 | -7.492 | 1.492 |
monthly_$300 | -6.2500 | 0.354 | -17.678 | 0.036 | -10.742 | -1.758 |
monthly_$400 | -10.7500 | 0.354 | -30.406 | 0.021 | -15.242 | -6.258 |
service_4g_yes | 3.5000 | 0.250 | 14.000 | 0.045 | 0.323 | 6.677 |
retail_retail_yes | -0.5000 | 0.250 | -2.000 | 0.295 | -3.677 | 2.677 |
apple_apple_yes | -0.5000 | 0.250 | -2.000 | 0.295 | -3.677 | 2.677 |
samsung_samsung_yes | 2.2500 | 0.250 | 9.000 | 0.070 | -0.927 | 5.427 |
google_nexus_yes | 1.5000 | 0.250 | 6.000 | 0.105 | -1.677 | 4.677 |
Omnibus: | 29.718 | Durbin-Watson: | 2.000 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2.667 |
Skew: | -0.000 | Prob(JB): | 0.264 |
Kurtosis: | 1.000 | Cond. No. | 9.06 |
Utilities:
attribute | beta | |
---|---|---|
0 | brand_t-mobile | -0.250 |
1 | brand_us_cellular | -0.000 |
2 | brand_verizon | 0.250 |
3 | startup_$200 | -0.750 |
4 | startup_$300 | -0.750 |
5 | startup_$400 | -1.500 |
6 | monthly_$200 | -3.000 |
7 | monthly_$300 | -6.250 |
8 | monthly_$400 | -10.750 |
9 | service_4g_yes | 3.500 |
10 | retail_retail_yes | -0.500 |
11 | apple_apple_yes | -0.500 |
12 | samsung_samsung_yes | 2.250 |
13 | google_nexus_yes | 1.500 |
Importances:
attribute | importance | |
---|---|---|
2 | Monthly Cost | 58.85 |
3 | 4G Service | 13.40 |
1 | Start-up Cost | 8.61 |
6 | Samsung products sold | 8.61 |
7 | Google/Nexus products sold | 5.74 |
4 | Nearby retail store | 1.91 |
5 | Apple products sold | 1.91 |
0 | Provider | 0.96 |
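The importances in this table follow directly from the part-worth ranges: each attribute's importance is its part-worth range expressed as a percentage of the sum of all ranges. As a quick check, the sketch below recomputes them from the part-worths printed earlier:

```python
# Recompute the attribute importances from the part-worths printed above:
# importance_k = 100 * range_k / sum of all ranges.
pw = {
    'Provider': [-0.25, -0.0, 0.25, -0.0],
    'Start-up Cost': [-0.75, -0.75, -1.5, 3.0],
    'Monthly Cost': [-3.0, -6.25, -10.75, 20.0],
    '4G Service': [3.5, -3.5],
    'Nearby retail store': [-0.5, 0.5],
    'Apple products sold': [-0.5, 0.5],
    'Samsung products sold': [2.25, -2.25],
    'Google/Nexus products sold': [1.5, -1.5],
}
ranges = {k: max(v) - min(v) for k, v in pw.items()}
total = sum(ranges.values())  # 52.25
importance = {k: round(100 * r / total, 2) for k, r in ranges.items()}
print(importance['Monthly Cost'])  # 58.85
print(importance['Provider'])      # 0.96
```

This reproduces every importance in the table, e.g. 13.4 for 4G Service and 8.61 for both Start-up Cost and Samsung products sold (they share the same range, 4.5).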