Python for CogSci Research

IPython and IPython Notebook

http://ipython.org/notebook.html

In [1]:
3 + 5
Out[1]:
8
In [2]:
import os
In [3]:
x = 5
y = 3
x + y
Out[3]:
8
In [4]:
import json
In [5]:
json.dumps({"x": x, "y": y})
Out[5]:
'{"x": 5, "y": 3}'

Project structure

config.json
analysis/
    analyses/
    results/
    figures/
bin/
data/
    human/
    model
experiment/
lib/
stimuli/
  • config.json: global config file that specifies where resources are, model parameters, etc.
  • analysis: all scripts and results for analyzing the data
  • bin: helper scripts for running the experiment, running simulations, viewing stimuli, etc.
  • data: collected or simulated data
  • experiment: psiTurk experiment files
  • lib: importable python files that are used in analysis and model simulations
  • stimuli: one file per stimulus, usually JSON

Running model simulations

NumPy

In [6]:
import numpy as np
In [7]:
np.array([7, 3, 8, 2])
Out[7]:
array([7, 3, 8, 2])
In [8]:
[7, 3, 8, 2]
Out[8]:
[7, 3, 8, 2]
In [9]:
np.arange(10)
Out[9]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]:
np.arange(10) + 1
Out[10]:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
In [11]:
range(10) + 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-795b8db94926> in <module>()
----> 1 range(10) + 1

TypeError: unsupported operand type(s) for +: 'range' and 'int'
In [12]:
[x + 1 for x in range(10)]
Out[12]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
In [13]:
np.arange(10) + np.arange(1, 11)
Out[13]:
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

SciPy

In [14]:
import scipy.stats
In [15]:
rv = scipy.stats.norm(0, 1)
rv
Out[15]:
<scipy.stats._distn_infrastructure.rv_frozen at 0x10b0cea20>
In [16]:
samples = rv.rvs(10000)
samples
Out[16]:
array([-1.0612782 , -0.17584912, -0.92949248, ...,  2.07068902,
        0.67094535,  0.16437123])
In [17]:
x = np.linspace(-3, 3)
y = rv.pdf(x)
y
Out[17]:
array([ 0.00443185,  0.00635135,  0.00896675,  0.01247075,  0.01708592,
        0.02306069,  0.03066159,  0.04016108,  0.05182083,  0.0658706 ,
        0.08248352,  0.10174921,  0.12364689,  0.1480211 ,  0.17456307,
        0.20280069,  0.2320998 ,  0.26167871,  0.29063661,  0.31799518,
        0.34275126,  0.36393672,  0.38068082,  0.39226937,  0.39819528,
        0.39819528,  0.39226937,  0.38068082,  0.36393672,  0.34275126,
        0.31799518,  0.29063661,  0.26167871,  0.2320998 ,  0.20280069,
        0.17456307,  0.1480211 ,  0.12364689,  0.10174921,  0.08248352,
        0.0658706 ,  0.05182083,  0.04016108,  0.03066159,  0.02306069,
        0.01708592,  0.01247075,  0.00896675,  0.00635135,  0.00443185])

Matplotlib

In [18]:
%matplotlib inline
import matplotlib.pyplot as plt
In [19]:
plt.hist(samples, normed=True, color='k', bins=100)
plt.plot(x, y, 'r-', lw=3)
Out[19]:
[<matplotlib.lines.Line2D at 0x10bcd60f0>]

Analyzing data

Pandas

In [20]:
import pandas as pd
In [21]:
data = pd.read_csv("experiment.csv")
data.head()
Out[21]:
pid trial correct goesIn layout numBounces response responseTime stim isControl timestamp HoleClass HoleWidth
0 04bd6d 1 True False FM_100 1 False 2343 B_1_6 False 2014-11-17 15:15:49.501000 FM 100
1 04bd6d 2 True True FI_200 1 True 1257 B_1_9 False 2014-11-17 15:15:55.471000 FI 200
2 04bd6d 3 False False CM_100 0 True 2977 B_0_4 False 2014-11-17 15:16:02.895000 CM 100
3 04bd6d 4 True True CI_100 2 True 2176 B_2_2 False 2014-11-17 15:16:08.507000 CI 100
4 04bd6d 5 True True FI_200 0 True 562 B_0_9 False 2014-11-17 15:16:12.666000 FI 200
In [22]:
data.set_index(["pid", "trial"]).sortlevel().head()
Out[22]:
correct goesIn layout numBounces response responseTime stim isControl timestamp HoleClass HoleWidth
pid trial
01f4fb 1 True False FM_100 2 False 1630 B_2_1 False 2014-11-17 15:42:17.860000 FM 100
2 True False CM_100 2 False 1784 B_2_7 False 2014-11-17 15:42:24.314000 CM 100
3 True False FM_100 0 False 1286 B_0_1 False 2014-11-17 15:42:30.528000 FM 100
4 False True FI_100 2 False 908 B_2_11 False 2014-11-17 15:42:37.152000 FI 100
5 False False CM_100 2 True 1126 B_2_15 False 2014-11-17 15:42:43.202000 CM 100
In [23]:
rt = data.groupby(["HoleClass", "HoleWidth"])["responseTime"].median()
rt
Out[23]:
HoleClass  HoleWidth
CI         100          1011.5
           200          1002.5
CM         100          1097.5
           200          1088.0
CtrFI      300           482.0
           350           510.0
CtrFM      100           738.0
FI         100          1020.0
           200           813.5
FM         100           919.0
           200           924.0
Name: responseTime, dtype: float64

Matplotlib

In [24]:
plt.bar(np.arange(len(rt)), rt, align='center')
plt.xticks(np.arange(len(rt)), rt.index)
plt.xlim(-0.75, len(rt) - 0.25)
fig = plt.gcf()
fig.set_figwidth(16)
fig.set_figheight(6)

Seaborn

In [25]:
import seaborn as sns
In [26]:
plt.bar(np.arange(len(rt)), rt, align='center')
plt.xticks(np.arange(len(rt)), rt.index)
plt.xlim(-0.75, len(rt) - 0.25)
fig = plt.gcf()
fig.set_figwidth(16)
fig.set_figheight(6)
In [27]:
sns.factorplot("HoleClass", "responseTime", "HoleWidth", data, estimator=np.median)
fig = plt.gcf()
fig.set_figwidth(8)
fig.set_figheight(6)
/Users/jhamrick/.virtualenvs/ipython3/lib/python3.4/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)

Demo: responses vs. response times

In [28]:
exp = data.groupby('isControl').get_group(False)
exp.head()
Out[28]:
pid trial correct goesIn layout numBounces response responseTime stim isControl timestamp HoleClass HoleWidth
0 04bd6d 1 True False FM_100 1 False 2343 B_1_6 False 2014-11-17 15:15:49.501000 FM 100
1 04bd6d 2 True True FI_200 1 True 1257 B_1_9 False 2014-11-17 15:15:55.471000 FI 200
2 04bd6d 3 False False CM_100 0 True 2977 B_0_4 False 2014-11-17 15:16:02.895000 CM 100
3 04bd6d 4 True True CI_100 2 True 2176 B_2_2 False 2014-11-17 15:16:08.507000 CI 100
4 04bd6d 5 True True FI_200 0 True 562 B_0_9 False 2014-11-17 15:16:12.666000 FI 200
In [29]:
rt_per_stim = exp.groupby(["stim", "HoleClass", "HoleWidth"])["responseTime"].median()
rt_per_stim
Out[29]:
stim   HoleClass  HoleWidth
B_0_0  CI         100           672.5
                  200           781.0
       CM         100          1002.0
                  200          1213.0
       FI         100           499.0
                  200           483.5
       FM         100           749.5
                  200           840.0
B_0_1  CI         100           774.0
                  200           588.5
       CM         100           956.5
                  200           687.5
       FI         100           878.0
                  200           479.5
       FM         100           808.5
...
B_2_8  CI         200          1464.5
       CM         100          1155.5
                  200          1470.0
       FI         100          1574.0
                  200          1464.5
       FM         100          1242.5
                  200          1381.0
B_2_9  CI         100          1307.5
                  200          1108.0
       CM         100          1131.5
                  200           906.5
       FI         100          1207.0
                  200           909.0
       FM         100          1070.0
                  200          1182.0
Name: responseTime, Length: 384, dtype: float64
In [30]:
resp_per_stim = exp.groupby(["stim", "HoleClass", "HoleWidth"])["response"].mean()
resp_per_stim
Out[30]:
stim   HoleClass  HoleWidth
B_0_0  CI         100          0.925000
                  200          0.846154
       CM         100          0.150000
                  200          0.150000
       FI         100          1.000000
                  200          1.000000
       FM         100          0.000000
                  200          0.000000
B_0_1  CI         100          0.948718
                  200          0.973684
       CM         100          0.275000
                  200          0.650000
       FI         100          0.769231
                  200          1.000000
       FM         100          0.075000
...
B_2_8  CI         200          0.650000
       CM         100          0.500000
                  200          0.589744
       FI         100          0.461538
                  200          0.825000
       FM         100          0.175000
                  200          0.230769
B_2_9  CI         100          0.475000
                  200          0.600000
       CM         100          0.875000
                  200          0.500000
       FI         100          0.692308
                  200          0.850000
       FM         100          0.179487
                  200          0.325000
Name: response, Length: 384, dtype: float64
In [31]:
plt.plot(resp_per_stim, rt_per_stim, 'o')
plt.xlabel("Mean response to 'will it go in?'")
plt.ylabel("Median response time")
plt.title("Responses vs. response time")
Out[31]:
<matplotlib.text.Text at 0x10d59a9b0>

Demo: R magic

In [32]:
import rpy2
In [33]:
%load_ext rpy2.ipython
In [34]:
%%R -i exp

library(lme4)

# check for an interaction between the hole class and hole width
full.lmer = lmer(responseTime ~ HoleClass*HoleWidth + (1|stim), exp)
noint.lmer = lmer(responseTime ~ HoleClass + HoleWidth + (1|stim), exp)
anova(noint.lmer, full.lmer)
Loading required package: Matrix
Loading required package: Rcpp
refitting model(s) with ML (instead of REML)
Data: exp
Models:
noint.lmer: responseTime ~ HoleClass + HoleWidth + (1 | stim)
full.lmer: responseTime ~ HoleClass * HoleWidth + (1 | stim)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
noint.lmer  7 261851 261905 -130919   261837                             
full.lmer  10 261830 261906 -130905   261810 27.146      3  5.486e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Demo: Interactive Widgets

In [35]:
from IPython.html.widgets import interact
:0: FutureWarning: IPython widgets are experimental and may change in the future.
In [36]:
@interact
def plot_histogram(num_samples=(0, 10000), bins=(10, 100), normed=True):
    rv = scipy.stats.norm(0, 1)
    samples = rv.rvs(num_samples)
    x = np.linspace(-3, 3)
    y = rv.pdf(x)
    plt.hist(samples, normed=normed, color='k', bins=bins)
    plt.plot(x, y, 'r-', lw=3)

Other tools

  • scikit-learn
  • sympy
  • statsmodels