Structural Equation Modelling (SEM) with Jupyter and Anaconda

This notebook is intended as an example of how to "do SEM" using different tools that are accessible in a Jupyter IPython Notebook installed with Anaconda. The notebook uses a Python3 kernel and interfaces with several R packages. Alternatives would be to run an R kernel in the notebook directly, or to use an RStudio markdown notebook and the reticulate package to access Python functionality from R.

Stefan Rank (strank at strank.info)

To Do for future revisions:

For local use, I recommend JupyterLab as the interface, usually accessible at http://localhost:8888/lab

Installing Required (and Optional) Components

To run the code in this notebook, a local (or hosted) installation of Anaconda is required. The following installs R components available through Anaconda. We use the --yes option to avoid prompts for confirmation (impossible in the notebook) but that also means you cannot check what will be installed before it proceeds. To avoid this, you could run the installations from an Anaconda prompt or using the Anaconda Navigator.

R Studio is not strictly required, but it is a useful alternative way of using R. Also note that this assumes an Anaconda installation/environment that you have permission to change. On Windows, that means choosing the recommended "for this user only" option on install of Anaconda. Otherwise, these installations require Administrator / root rights.
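
One way to preview an installation from within the notebook is conda's --dry-run flag, which reports what would be installed without changing anything. The following is only a minimal sketch: the helper names conda_dry_run_command and run_dry_run are hypothetical, and it assumes that a conda executable can be found on the PATH of the environment the kernel runs in.

```python
import shutil
import subprocess

def conda_dry_run_command(*packages):
    '''Build a conda command that only reports what would be installed.'''
    return ["conda", "install", "--dry-run", *packages]

def run_dry_run(*packages):
    '''Run the dry run if conda is available and return its text output, else None.'''
    conda_path = shutil.which("conda")  # assumption: conda is on the PATH
    if conda_path is None:
        return None
    command = [conda_path, *conda_dry_run_command(*packages)[1:]]
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout

# show the command that would be run for the packages used in this notebook
print(" ".join(conda_dry_run_command("r-essentials", "r-lavaan")))
```

Calling run_dry_run("r-essentials", "r-lavaan") in a cell would then show the planned changes, and the actual installation with --yes can follow once the list looks reasonable.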

In [2]:
!conda install --yes r-essentials
Solving environment: ...working... done

# All requested packages already installed.

In [3]:
# !conda install --yes rstudio
Solving environment: ...working... done

# All requested packages already installed.

In [4]:
!conda install --yes r-lavaan
Solving environment: ...working... done

# All requested packages already installed.

Installing and Enabling the Python-R bridge

Now we need rpy2, which provides the bridge between this Python notebook and R. (We also install tzlocal because of a current dependency bug in rpy2.)

In [3]:
!conda install --yes tzlocal rpy2
Solving environment: ...working... done

# All requested packages already installed.
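
Before loading the extension, it can be useful to confirm that the bridge packages are visible to the kernel. This is a small sketch using only the standard library; the helper name missing_modules is hypothetical.

```python
import importlib.util

def missing_modules(names):
    '''Return the module names that cannot be found in this environment.'''
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(["rpy2", "tzlocal"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("rpy2 and tzlocal are available.")
```

If anything is reported as missing, the conda install cell above most likely ran against a different environment than the one the notebook kernel uses.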

Let's enable the rpy2 extension, so that we can then execute R code with the %%R magic command at the top of a cell.

In [1]:
%load_ext rpy2.ipython

If you are on Windows and the text output from %%R cells shows up in the console window where Jupyter is running instead of in the notebook, you are seeing a bug in rpy2 on Windows. A workaround is to capture stdout by running the following cells; see https://github.com/vitorcurtis/RWinOut

In [21]:
%%R
install.packages(c("R.utils"))
In [23]:
!curl -O "https://raw.githubusercontent.com/vitorcurtis/RWinOut/master/RWinOut.py"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1259  100  1259    0     0  11550      0 --:--:-- --:--:-- --:--:-- 11550
In [2]:
%load_ext RWinOut
The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

Installing Interface Conveniences

And finally we want to see plots directly in the notebook. The simplest way is to request inline plots.

In [3]:
%matplotlib inline

But assuming you are running this in a JupyterLab interface, you might want the ipympl library to get interactive widget plots.

In [41]:
# We install ipympl with pip, as it is not yet readily available with conda:
!pip install ipympl
In [41]:
# nodejs is needed for the interactive features if using JupyterLab,
# the corresponding package for normal notebooks, widgetsnbextension, should already be installed.
!conda install --yes nodejs
In [41]:
# install the jupyterlab extensions:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager
!jupyter labextension install jupyter-matplotlib
Requirement already satisfied: ipympl in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (0.2.1)
Requirement already satisfied: ipykernel>=4.7 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (5.1.0)
Requirement already satisfied: ipywidgets>=7.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (7.4.2)
Requirement already satisfied: six in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (1.11.0)
Requirement already satisfied: matplotlib>=2.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (2.2.2)
Requirement already satisfied: ipython>=5.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (7.1.1)
Requirement already satisfied: traitlets>=4.1.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (4.3.2)
Requirement already satisfied: jupyter-client in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (5.2.3)
Requirement already satisfied: tornado>=4.2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (5.1.1)
Requirement already satisfied: widgetsnbextension~=3.4.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipywidgets>=7.0.0->ipympl) (3.4.2)
Requirement already satisfied: nbformat>=4.2.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipywidgets>=7.0.0->ipympl) (4.4.0)
Requirement already satisfied: numpy>=1.7.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (1.15.3)
Requirement already satisfied: cycler>=0.10 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2.2.2)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2.7.5)
Requirement already satisfied: pytz in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2018.7)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (1.0.1)
Requirement already satisfied: colorama; sys_platform == "win32" in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.4.0)
Requirement already satisfied: pygments in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (2.2.0)
Requirement already satisfied: setuptools>=18.5 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (40.5.0)
Requirement already satisfied: decorator in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (4.3.0)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (2.0.7)
Requirement already satisfied: jedi>=0.10 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.13.1)
Requirement already satisfied: pickleshare in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.7.5)
Requirement already satisfied: backcall in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.1.0)
Requirement already satisfied: ipython-genutils in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from traitlets>=4.1.0->ipykernel>=4.7->ipympl) (0.2.0)
Requirement already satisfied: jupyter-core in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jupyter-client->ipykernel>=4.7->ipympl) (4.4.0)
Requirement already satisfied: pyzmq>=13 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jupyter-client->ipykernel>=4.7->ipympl) (17.1.2)
Requirement already satisfied: notebook>=4.4.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (5.7.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbformat>=4.2.0->ipywidgets>=7.0.0->ipympl) (2.6.0)
Requirement already satisfied: wcwidth in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.1.7)
Requirement already satisfied: parso>=0.3.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jedi>=0.10->ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.3.1)
Requirement already satisfied: Send2Trash in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.5.0)
Requirement already satisfied: prometheus-client in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.4.2)
Requirement already satisfied: jinja2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (2.10)
Requirement already satisfied: nbconvert in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (5.3.1)
Requirement already satisfied: terminado>=0.8.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.8.1)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jinja2->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.0)
Requirement already satisfied: pandocfilters>=1.4.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.4.2)
Requirement already satisfied: bleach in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (3.0.2)
Requirement already satisfied: testpath in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.4.2)
Requirement already satisfied: entrypoints>=0.2.2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.2.3)
Requirement already satisfied: mistune>=0.7.4 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.8.4)
Requirement already satisfied: webencodings in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.5.1)
Solving environment: ...working... done

# All requested packages already installed.

jupyter-widgets-jupyterlab-manager-0.38.1.tgz
yarn install v1.9.4
info No lockfile found.
[1/5] Validating package.json...
[2/5] Resolving packages...
warning css-loader > cssnano > autoprefixer > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
warning css-loader > cssnano > postcss-merge-rules > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
warning css-loader > cssnano > postcss-merge-rules > caniuse-api > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
[3/5] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[4/5] Linking dependencies...
warning "@jupyterlab/vdom-extension > @nteract/[email protected]" has incorrect peer dependency "[email protected]^15.6.1".
[5/5] Building fresh packages...
success Saved lockfile.
Done in 75.75s.
yarn run v1.9.4
$ webpack
Hash: f1eac10162dd2d44bf4d
Version: webpack 4.12.2
Time: 39111ms
Built at: 2018-11-02 20:57:26
                                                             Asset       Size                                  Chunks             Chunk Names
                                         1.055322dcf6c2bb19185f.js    888 KiB                                       1  [emitted]  
                              674f50d287a8c48dc19ba404d20fe713.eot    162 KiB                                          [emitted]  
                              912ec66d7572ff821749319396470bde.svg    434 KiB                                          [emitted]  
                             fee66e712a8a08eef5805a46892932ad.woff   95.7 KiB                                          [emitted]  
                              b06871f281fee6b241d60582ae9369b9.ttf    162 KiB                                          [emitted]  
                                      main.5a992934528990838951.js   54.6 KiB                                    main  [emitted]  main
                                         0.cc98107762fcc28532b3.js    4.5 KiB                                       0  [emitted]  
                                      vega.91b98e783d16fd1b9e23.js  519 bytes                                    vega  [emitted]  vega
    [email protected]/controls~vega.3dd933b62461edbc58d8.js   22.6 KiB  [email protected]/controls~vega  [emitted]  [email protected]/controls~vega
                              vendors~main.44f7c1af2649541ea6c6.js   9.41 MiB                            vendors~main  [emitted]  vendors~main
                              vendors~vega.1e302f886dbe300cf0ef.js   2.76 MiB                            vendors~vega  [emitted]  vendors~vega
                            af7ae505a9eed503f8b8e6982036873e.woff2   75.4 KiB                                          [emitted]  
         [email protected]/controls.741a7524652a40694e8a.js    266 KiB       vendors~[email protected]/controls  [emitted]  [email protected]/controls
                                  main.5a992934528990838951.js.map   63.5 KiB                                    main  [emitted]  main
                                     0.cc98107762fcc28532b3.js.map   5.73 KiB                                       0  [emitted]  
                                  vega.91b98e783d16fd1b9e23.js.map  251 bytes                                    vega  [emitted]  vega
[email protected]/controls~vega.3dd933b62461edbc58d8.js.map   15.6 KiB  [email protected]/controls~vega  [emitted]  [email protected]/controls~vega
                          vendors~main.44f7c1af2649541ea6c6.js.map   10.9 MiB                            vendors~main  [emitted]  vendors~main
                          vendors~vega.1e302f886dbe300cf0ef.js.map   2.19 MiB                            vendors~vega  [emitted]  vendors~vega
                                     1.055322dcf6c2bb19185f.js.map   1.04 MiB                                       1  [emitted]  
     [email protected]/controls.741a7524652a40694e8a.js.map    303 KiB       [email protected]/controls  [emitted]  [email protected]/controls
                                                        index.html   1.53 KiB                                          [emitted]  
Entrypoint main = vendors~main.44f7c1af2649541ea6c6.js vendors~main.44f7c1af2649541ea6c6.js.map main.5a992934528990838951.js main.5a992934528990838951.js.map
 [0] multi whatwg-fetch ./build/index.out.js 40 bytes {main} [built]
 [1] vertx (ignored) 15 bytes {main} [optional] [built]
 [4] buffer (ignored) 15 bytes {main} [optional] [built]
 [5] crypto (ignored) 15 bytes {main} [optional] [built]
 [6] readable-stream (ignored) 15 bytes {main} [built]
 [7] supports-color (ignored) 15 bytes {main} [built]
 [8] chalk (ignored) 15 bytes {main} [built]
 [9] fs (ignored) 15 bytes {main} [built]
[10] node-fetch (ignored) 15 bytes {vega} [built]
[11] fs (ignored) 15 bytes {vega} [built]
 [ANye] ./build/index.out.js 35.9 KiB {main} [built]
 [RnhZ] ./node_modules/moment/locale sync ^\.\/.*$ 2.88 KiB {main} [optional] [built]
 [YuTi] (webpack)/buildin/module.js 497 bytes {vendors~main} [built]
 [eTbV] ./node_modules/codemirror/mode sync ^\.\/.*\.js$ 2.78 KiB {0} [built]
 [yLpj] (webpack)/buildin/global.js 489 bytes {vendors~main} [built]
    + 2390 hidden modules

WARNING in jquery
  Multiple versions of jquery found:
    2.2.4 ./~/jupyter-matplotlib/~/jquery from ./~/jupyter-matplotlib\src\mpl_widget.js
    3.3.1 ./~/jquery from ./~/@jupyter-widgets\base\lib\widget.js


WARNING in vega-lite
  Multiple versions of vega-lite found:
    2.5.1 ./~/vega-lite\build\src from ./~/vega-lite\build\src\compile\selection\selection.js
    2.6.0 ./~/vega-lite\build from ./~/vega-lite\build\src\index.js

Check how you can resolve duplicate packages: 
https://github.com/darrenscerri/duplicate-package-checker-webpack-plugin#resolving-duplicate-packages-in-your-bundle

Child html-webpack-plugin for "index.html":
     1 asset
    Entrypoint undefined = index.html
    [KTNU] ./node_modules/html-loader!./templates/partial.html 567 bytes {0} [built]
    [YuTi] (webpack)/buildin/module.js 497 bytes {0} [built]
    [aS2v] ./node_modules/html-webpack-plugin/lib/loader.js!./templates/template.html 1.22 KiB {0} [built]
    [yLpj] (webpack)/buildin/global.js 489 bytes {0} [built]
        + 1 hidden module
Done in 48.33s.
Node v8.9.3

> C:\Users\sr876\AppData\Local\Continuum\anaconda3\npm.CMD pack @jupyter-widgets/jupyterlab-manager
Node v8.9.3

> node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js install
> node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js run build
jupyter-matplotlib-0.3.0.tgz
yarn install v1.9.4
info No lockfile found.
[1/5] Validating package.json...
[2/5] Resolving packages...
warning css-loader > cssnano > autoprefixer > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
warning css-loader > cssnano > postcss-merge-rules > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
warning css-loader > cssnano > postcss-merge-rules > caniuse-api > [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
[3/5] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[4/5] Linking dependencies...
warning "@jupyterlab/vdom-extension > @nteract/[email protected]" has incorrect peer dependency "[email protected]^15.6.1".
[5/5] Building fresh packages...
success Saved lockfile.
Done in 107.61s.
yarn run v1.9.4
$ webpack
Hash: f1eac10162dd2d44bf4d
Version: webpack 4.12.2
Time: 36409ms
Built at: 2018-11-02 21:00:13
                                                             Asset       Size                                  Chunks             Chunk Names
                                         1.055322dcf6c2bb19185f.js    888 KiB                                       1  [emitted]  
                              674f50d287a8c48dc19ba404d20fe713.eot    162 KiB                                          [emitted]  
                              912ec66d7572ff821749319396470bde.svg    434 KiB                                          [emitted]  
                             fee66e712a8a08eef5805a46892932ad.woff   95.7 KiB                                          [emitted]  
                              b06871f281fee6b241d60582ae9369b9.ttf    162 KiB                                          [emitted]  
                                      main.5a992934528990838951.js   54.6 KiB                                    main  [emitted]  main
                                         0.cc98107762fcc28532b3.js    4.5 KiB                                       0  [emitted]  
                                      vega.91b98e783d16fd1b9e23.js  519 bytes                                    vega  [emitted]  vega
    [email protected]/controls~vega.3dd933b62461edbc58d8.js   22.6 KiB  [email protected]/controls~vega  [emitted]  [email protected]/controls~vega
                              vendors~main.44f7c1af2649541ea6c6.js   9.41 MiB                            vendors~main  [emitted]  vendors~main
                              vendors~vega.1e302f886dbe300cf0ef.js   2.76 MiB                            vendors~vega  [emitted]  vendors~vega
                            af7ae505a9eed503f8b8e6982036873e.woff2   75.4 KiB                                          [emitted]  
         [email protected]/controls.741a7524652a40694e8a.js    266 KiB       [email protected]/controls  [emitted]  [email protected]/controls
                                  main.5a992934528990838951.js.map   63.5 KiB                                    main  [emitted]  main
                                     0.cc98107762fcc28532b3.js.map   5.73 KiB                                       0  [emitted]  
                                  vega.91b98e783d16fd1b9e23.js.map  251 bytes                                    vega  [emitted]  vega
[email protected]/controls~vega.3dd933b62461edbc58d8.js.map   15.6 KiB  [email protected]/controls~vega  [emitted]  [email protected]/controls~vega
                          vendors~main.44f7c1af2649541ea6c6.js.map   10.9 MiB                            vendors~main  [emitted]  vendors~main
                          vendors~vega.1e302f886dbe300cf0ef.js.map   2.19 MiB                            vendors~vega  [emitted]  vendors~vega
                                     1.055322dcf6c2bb19185f.js.map   1.04 MiB                                       1  [emitted]  
     [email protected]/controls.741a7524652a40694e8a.js.map    303 KiB       [email protected]/controls  [emitted]  [email protected]/controls
                                                        index.html   1.53 KiB                                          [emitted]  
Entrypoint main = vendors~main.44f7c1af2649541ea6c6.js vendors~main.44f7c1af2649541ea6c6.js.map main.5a992934528990838951.js main.5a992934528990838951.js.map
 [0] multi whatwg-fetch ./build/index.out.js 40 bytes {main} [built]
 [1] vertx (ignored) 15 bytes {main} [optional] [built]
 [4] buffer (ignored) 15 bytes {main} [optional] [built]
 [5] crypto (ignored) 15 bytes {main} [optional] [built]
 [6] readable-stream (ignored) 15 bytes {main} [built]
 [7] supports-color (ignored) 15 bytes {main} [built]
 [8] chalk (ignored) 15 bytes {main} [built]
 [9] fs (ignored) 15 bytes {main} [built]
[10] node-fetch (ignored) 15 bytes {vega} [built]
[11] fs (ignored) 15 bytes {vega} [built]
 [ANye] ./build/index.out.js 35.9 KiB {main} [built]
 [RnhZ] ./node_modules/moment/locale sync ^\.\/.*$ 2.88 KiB {main} [optional] [built]
 [YuTi] (webpack)/buildin/module.js 497 bytes {vendors~main} [built]
 [eTbV] ./node_modules/codemirror/mode sync ^\.\/.*\.js$ 2.78 KiB {0} [built]
 [yLpj] (webpack)/buildin/global.js 489 bytes {vendors~main} [built]
    + 2390 hidden modules

WARNING in jquery
  Multiple versions of jquery found:
    2.2.4 ./~/jupyter-matplotlib/~/jquery from ./~/jupyter-matplotlib\src\mpl_widget.js
    3.3.1 ./~/jquery from ./~/@jupyter-widgets\base\lib\widget.js


WARNING in vega-lite
  Multiple versions of vega-lite found:
    2.5.1 ./~/vega-lite\build\src from ./~/vega-lite\build\src\compile\selection\selection.js
    2.6.0 ./~/vega-lite\build from ./~/vega-lite\build\src\index.js

Check how you can resolve duplicate packages: 
https://github.com/darrenscerri/duplicate-package-checker-webpack-plugin#resolving-duplicate-packages-in-your-bundle

Child html-webpack-plugin for "index.html":
     1 asset
    Entrypoint undefined = index.html
    [KTNU] ./node_modules/html-loader!./templates/partial.html 567 bytes {0} [built]
    [YuTi] (webpack)/buildin/module.js 497 bytes {0} [built]
    [aS2v] ./node_modules/html-webpack-plugin/lib/loader.js!./templates/template.html 1.22 KiB {0} [built]
    [yLpj] (webpack)/buildin/global.js 489 bytes {0} [built]
        + 1 hidden module
Done in 46.88s.
Node v8.9.3

> C:\Users\sr876\AppData\Local\Continuum\anaconda3\npm.CMD pack jupyter-matplotlib
Node v8.9.3

> node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js install
> node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js run build

Now enable widget-based plots:

In [1]:
%matplotlib widget

Finally, since this is a long document, the following extension adds a table-of-contents sidebar to the JupyterLab interface:

In [ ]:
!jupyter labextension install @jupyterlab/toc

Installing More R Packages

Now for some R packages that are not readily available in an Anaconda default install. They might be available through the conda-forge "channel" - however, at the time of writing, I cannot recommend this, as the performance of conda install is abysmal when using R packages from that repository.

When using a server-hosted notebook, some or all of these packages might already be installed.

In [16]:
%%R
install.packages(c("semPlot", "OpenMx", "semTools", "sem", "gpairs", "GGally"))

A note to avoid possible confusion: lavaan provides a function cfa as a convenience for confirmatory factor analysis. There is also an R package called cfa - however, that one is not related to SEM.

Importing and Configuring Python & R Packages

First, import all the basic packages in Python, such as pandas, numpy, matplotlib.

Also import seaborn for simple high-level plots with decent looks. It complements the default plotting provided by matplotlib. For a brief and useful overview of exploring data with some of seaborn's plot types, see: https://elitedatascience.com/python-seaborn-tutorial

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Next, some of the libraries that are statistics-oriented.

In [5]:
import statsmodels.api as sm
import statsmodels.formula.api as smf # the R-like interface for statsmodels
import statsmodels.graphics as smg
import sklearn

Now load the R packages we will be using.

In [6]:
%%R
library(lavaan)
library(semPlot)
library(OpenMx)
library(semTools)

In R, there are also several packages providing convenient high-level plots, such as generalized pairs plots.

In [7]:
%%R
library(GGally)
library(gpairs)

Configure Plotting

For the purpose of getting nice visual output, we will also set some defaults for plotting libraries.

In [8]:
defaultfigwidth, defaultfigheight = 10, 9

# set a slightly larger default size for plots. default dpi is 100
plt.rcParams['figure.figsize'] = [defaultfigwidth, defaultfigheight]

# enable seaborn's defaults for nicer plots overall:
sns.set(color_codes=True)

There is currently no built-in way to set default dimensions for plots in R. The %%R magic command from the rpy2 library accepts width, height, units, and resolution parameters, like this: %%R -w 10 -h 9 -u in -r 100, but it would be nice to set defaults once instead of repeating them in every cell.

Since this is Python, there is a way around that using monkey-patching, that is, replacing a function of a library at runtime. Note that this is usually a Bad Idea(TM) and should be avoided if possible. It is also purely cosmetic for the purposes of this notebook, so it can be safely ignored. :)

In [9]:
# these are the defaults we want to set:
default_units = 'in' # inch, to make it more easily comparable to matplotlib
default_res = 100 # dpi, same as default in matplotlib
default_width = 10
default_height = 9
# try monkey-patching a function in rpy2, so we effectively get these
# default settings for the width, height, and units arguments of the %R magic command
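import rpy2.ipython.rmagic # import the submodule explicitly, so the lookup below does not rely on %load_ext having run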
old_setup_graphics = rpy2.ipython.rmagic.RMagics.setup_graphics

def new_setup_graphics(self, args):
    if getattr(args, 'units') is not None:
        if args.units != default_units: # a different units argument was passed, do not apply defaults
            return old_setup_graphics(self, args)
    args.units = default_units
    if getattr(args, 'res') is None:
        args.res = default_res
    if getattr(args, 'width') is None:
        args.width = default_width
    if getattr(args, 'height') is None:
        args.height = default_height        
    return old_setup_graphics(self, args)

rpy2.ipython.rmagic.RMagics.setup_graphics = new_setup_graphics
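
The same pattern can be tried out in isolation, which also shows how to undo a patch. The sketch below does not touch rpy2: Renderer is a hypothetical stand-in for a library class, and the default values are only illustrative.

```python
from argparse import Namespace

class Renderer:
    '''Stand-in for a library class whose method we want to wrap.'''
    def setup(self, args):
        return (args.width, args.height)

original_setup = Renderer.setup  # keep a reference so the patch can be undone

def setup_with_defaults(self, args):
    # fill in defaults only where the caller passed nothing
    if args.width is None:
        args.width = 10
    if args.height is None:
        args.height = 9
    return original_setup(self, args)

Renderer.setup = setup_with_defaults  # apply the patch
patched = Renderer().setup(Namespace(width=None, height=4))

Renderer.setup = original_setup  # restore the original behaviour
unpatched = Renderer().setup(Namespace(width=None, height=4))

print(patched, unpatched)
```

With the patch applied, the missing width is filled in while the explicitly passed height is kept; after restoring the original method, the missing width stays None.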

Loading Data

We are borrowing example data from this excellent course offered at Harvard, S090A1: https://canvas.harvard.edu/courses/8737/pages/data

The actual data is from the Zambian Early Childhood Development Project. The full sample has more than 1600 Zambian six-year-olds, from a study led by Günther Fink and Stephanie Zuilkowski.

The data is in the proprietary Stata format, so we first need to convert it and import it. We will use pandas.read_stata, but this could also be accomplished in R with the foreign package. First, we write a helper function to download the data file if it is not in the current directory. Defining a function will allow us to re-use it later for other datasets.

In [10]:
import os.path
import urllib.request

def downloadIfMissing(filenameData, remoteLocation):
    '''Check if the file exists. If not, try downloading from remoteLocation.'''
    if not os.path.isfile(filenameData):
        with urllib.request.urlopen(remoteLocation) as response:
            with open(filenameData, 'xb') as destinationFile:
                destinationFile.write(response.read())
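
To try out this kind of helper without network access, note that urllib.request.urlopen also accepts file:// URLs. The following is a sketch of a variant, not a replacement for the function above: download_if_missing is a hypothetical name, it streams the response to disk with shutil.copyfileobj instead of reading it into memory at once, and it reports whether a download took place.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download_if_missing(filename, remote_location):
    '''Stream remote_location to filename unless the file already exists.

    Returns True if a download happened, False otherwise.
    '''
    target = Path(filename)
    if target.is_file():
        return False
    with urllib.request.urlopen(remote_location) as response:
        with open(target, 'xb') as destination:
            shutil.copyfileobj(response, destination)
    return True

# demonstration with a local file:// URL, so no network access is needed
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "source.dta"
    source.write_bytes(b"example bytes")
    target = Path(folder) / "copy.dta"
    first = download_if_missing(target, source.as_uri())   # copies the file
    second = download_if_missing(target, source.as_uri())  # file exists, nothing to do
    copied = target.read_bytes()

print(first, second, copied)
```

Streaming matters little for the small files used here, but it keeps memory use flat for larger datasets.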
In [11]:
# make sure we have the small data file available in the current directory, if not, try to download it:
filenameSmallZambiaData = "S090_InClass_Zambia.dta"
downloadIfMissing(filenameSmallZambiaData, "https://canvas.harvard.edu/courses/8737/files/1839865/download")
In [12]:
# read the data into a pandas dataframe
smallZambiaDF = pd.read_stata(filenameSmallZambiaData)
len(smallZambiaDF) # should return 1613
Out[12]:
1613
In [13]:
# make sure we have the full measurement data file available in the current directory, if not, try to download it:
filenameMeasureZambiaData = "S090_InClass_Zambia_Measurement.dta"
downloadIfMissing(filenameMeasureZambiaData, "https://canvas.harvard.edu/courses/8737/files/1994882/download")
In [14]:
# read the data into a pandas dataframe
measureZambiaDF = pd.read_stata(filenameMeasureZambiaData)
len(measureZambiaDF) # should return 1623
Out[14]:
1623
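
By default, pandas.read_stata converts Stata variables that carry value labels into pandas categoricals (its convert_categoricals parameter defaults to True). The sketch below illustrates this with a round trip through a temporary file; the three rows of toy data are made up for illustration and are not taken from the Zambia files.

```python
import os
import tempfile
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    "childid": [101, 102, 103],
    "ece": pd.Categorical(["ECE", "No ECE", "No ECE"]),
    "vocab": [18.0, 19.0, 19.0],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.dta")
    toy.to_stata(path, write_index=False)  # categories are written as Stata value labels
    roundtrip = pd.read_stata(path)        # value labels come back as a categorical column

print(roundtrip.dtypes)
```

Passing convert_categoricals=False to read_stata would return the underlying integer codes instead of the labels.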

Data Overview

We now have the data in a dataframe, so let's get an overview of what kind of data we are dealing with.

In [15]:
smallZambiaDF.head() # equivalent to [:5] i.e. first five entries
Out[15]:
   childid    male  urban     ece  reasoning  socemo  vocab  vocabsq  wealth             books
0      101    Male  Urban     ECE        3.0    1.05   18.0    324.0     3.0  No books in home
1      102    Male  Urban  No ECE        4.0    1.00   19.0    361.0     2.0  No books in home
2      103  Female  Urban  No ECE        4.0    1.80   19.0    361.0     3.0  No books in home
3      104    Male  Urban  No ECE        5.0    2.35   12.0    144.0     3.0  No books in home
4      105    Male  Urban  No ECE        5.0    1.50   25.0    625.0     2.0     Books in home
In [10]:
smallZambiaDF.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1613 entries, 0 to 1612
Data columns (total 10 columns):
childid      1613 non-null int16
male         1613 non-null category
urban        1613 non-null category
ece          1613 non-null category
reasoning    1613 non-null float32
socemo       1613 non-null float32
vocab        1613 non-null float32
vocabsq      1613 non-null float32
wealth       1613 non-null float32
books        1613 non-null category
dtypes: category(4), float32(5), int16(1)
memory usage: 53.9 KB
In [11]:
smallZambiaDF.describe(include='all', percentiles=[]) # describe categorical and numerical columns, don't bother with percentiles
Out[11]:
            childid    male  urban     ece    reasoning       socemo        vocab      vocabsq       wealth             books
count   1613.000000    1613   1613    1613  1613.000000  1613.000000  1613.000000  1613.000000  1613.000000              1613
unique          NaN       2      2       2          NaN          NaN          NaN          NaN          NaN                 2
top             NaN  Female  Urban  No ECE          NaN          NaN          NaN          NaN          NaN  No books in home
freq            NaN     810    814    1110          NaN          NaN          NaN          NaN          NaN              1171
mean    4158.768754     NaN    NaN     NaN     4.451333     1.647789    21.475512   489.346558     2.957222               NaN
std     2376.382896     NaN    NaN     NaN     2.521065     0.450903     5.307204   216.253510     1.430792               NaN
min      101.000000     NaN    NaN     NaN     0.000000     0.444444     0.000000     0.000000     1.000000               NaN
50%     4312.000000     NaN    NaN     NaN     4.000000     1.611111    22.000000   484.000000     3.000000               NaN
max     8125.000000     NaN    NaN     NaN    10.000000     3.000000    30.000000   900.000000     5.000000               NaN
In [12]:
smallZambiaDF.male.describe()
Out[12]:
count       1613
unique         2
top       Female
freq         810
Name: male, dtype: object
In [13]:
smallZambiaDF.male.value_counts()
Out[13]:
Female    810
Male      803
Name: male, dtype: int64
In [14]:
smallZambiaDF.wealth.value_counts()
Out[14]:
1.0    356
4.0    340
2.0    311
5.0    307
3.0    299
Name: wealth, dtype: int64
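
The default ordering of value_counts is by frequency, which makes it hard to compare the wealth levels in their natural order. As a small sketch, the counts printed above can be re-ordered by level and turned into proportions; the numbers are copied from that output rather than recomputed from the data file.

```python
import pandas as pd

# counts copied from the value_counts() output above
wealth_counts = pd.Series({1.0: 356, 4.0: 340, 2.0: 311, 5.0: 307, 3.0: 299}, name="wealth")

by_level = wealth_counts.sort_index()  # order by wealth level instead of by frequency
proportions = (by_level / by_level.sum()).round(3)

print(by_level.sum())  # total number of children
print(proportions)
```

On the dataframe itself, smallZambiaDF.wealth.value_counts(normalize=True).sort_index() should give the same proportions in one step.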
In [16]:
# visually check relations between numeric variables
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='scatter')
# we have to use an explicit height per facet-figure here, since a grid of figures doesn't follow the matplotlib default size
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
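
The divisor 6 in the height argument corresponds to the six numeric columns of the dataframe (childid, reasoning, socemo, vocab, vocabsq, wealth), since pairplot draws one row and one column of facets per numeric variable by default. A sketch of deriving the facet height from the data instead of hard-coding it follows; facet_height is a hypothetical helper and the toy dataframe is made up for illustration.

```python
import pandas as pd

def facet_height(df, total_height):
    '''Divide total_height evenly among the numeric columns of df.'''
    n_numeric = df.select_dtypes(include="number").shape[1]
    return total_height / max(n_numeric, 1)  # avoid dividing by zero

# toy data for illustration only: three numeric columns and one categorical
toy = pd.DataFrame({
    "reasoning": [3.0, 4.0, 5.0],
    "socemo": [1.05, 1.00, 2.35],
    "vocab": [18.0, 19.0, 12.0],
    "ece": pd.Categorical(["ECE", "No ECE", "No ECE"]),
})

print(facet_height(toy, 9))
```

In the notebook, this would be used as height=facet_height(smallZambiaDF, defaultfigheight) in the call to sns.pairplot.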
In [17]:
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='reg') # linear regressions on top of scatter
In [18]:
pd.plotting.scatter_matrix(smallZambiaDF) # pd.scatter_matrix is deprecated in favour of pandas.plotting.scatter_matrix
plt.show()
In [19]:
plt.figure()
sns.swarmplot(x="male", y="socemo", data=smallZambiaDF)
plt.show()
In [17]:
%%R -i smallZambiaDF
pairs(smallZambiaDF)
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  res = PandasDataFrame.from_items(items)
In [18]:
%%R -i smallZambiaDF
ggpairs(smallZambiaDF) # from library GGally