Michaël Defferrard, PhD student, EPFL LTS2
Conda is a package and environment manager. It allows you to create environments, ideally one per project, and install packages into them. It is available for Windows, macOS and Linux.
Anaconda is a commercial distribution that comes with many of the packages used by data scientists. Miniconda is a lighter open distribution. Both install conda, from which you'll be able to install many packages.
conda-forge is a community-driven collection of recipes to build conda packages. It contains many more packages than the official defaults channel.
Get basic information from your conda installation:
!conda info
     active environment : ntds_2018
    active env location : /home/michael/.conda/envs/ntds_2018
            shell level : 1
       user config file : /home/michael/.condarc
 populated config files : /home/michael/.condarc
          conda version : 4.5.11
    conda-build version : not installed
         python version : 3.7.0.final.0
       base environment : /usr  (read only)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/linux-64
                          https://repo.anaconda.com/pkgs/pro/noarch
          package cache : /home/michael/.conda/pkgs
       envs directories : /home/michael/.conda/envs
                          /usr/envs
               platform : linux-64
             user-agent : conda/4.5.11 requests/2.19.1 CPython/3.7.0 Linux/4.18.8-arch1-1-ARCH arch/rolling glibc/2.28
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
List your environments:
!conda env list
# conda environments:
#
eeg_denoising            /home/michael/.conda/envs/eeg_denoising
ntds_2018             *  /home/michael/.conda/envs/ntds_2018
numpy_mkl                /home/michael/.conda/envs/numpy_mkl
numpy_openblas           /home/michael/.conda/envs/numpy_openblas
python2                  /home/michael/.conda/envs/python2
scnn                     /home/michael/.conda/envs/scnn
test                     /home/michael/.conda/envs/test
base                     /usr
List the packages in an environment:
!conda list -n ntds_2018
# packages in environment at /home/michael/.conda/envs/ntds_2018:
#
# Name Version Build Channel
backcall 0.1.0 py_0 conda-forge
blas 1.1 openblas conda-forge
bleach 2.1.4 py_1 conda-forge
bzip2 1.0.6 h470a237_2 conda-forge
ca-certificates 2018.8.24 ha4d7672_0 conda-forge
certifi 2018.8.24 py37_1001 conda-forge
curl 7.61.0 h93b3f91_2 conda-forge
cycler 0.10.0 py_1 conda-forge
dbus 1.13.0 h3a4f0e9_0 conda-forge
decorator 4.3.0 py_0 conda-forge
entrypoints 0.2.3 py37_2 conda-forge
expat 2.2.5 hfc679d8_2 conda-forge
fontconfig 2.13.1 h65d0f4c_0 conda-forge
freetype 2.9.1 h6debe1e_4 conda-forge
gettext 0.19.8.1 h5e8e0c9_1 conda-forge
git 2.19.0 pl526hbb17d3c_0 conda-forge
glib 2.55.0 h464dc38_2 conda-forge
gmp 6.1.2 hfc679d8_0 conda-forge
gst-plugins-base 1.12.5 hde13a9d_0 conda-forge
gstreamer 1.12.5 h61a6719_0 conda-forge
html5lib 1.0.1 py_0 conda-forge
icu 58.2 hfc679d8_0 conda-forge
ipykernel 4.8.2 py37_0 conda-forge
ipython 6.5.0 py37_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.4.2 py_0 conda-forge
jedi 0.12.1 py37_0 conda-forge
jinja2 2.10 py_1 conda-forge
jpeg 9c h470a237_1 conda-forge
jsonschema 2.6.0 py37_2 conda-forge
jupyter 1.0.0 py_1 conda-forge
jupyter_client 5.2.3 py_1 conda-forge
jupyter_console 5.2.0 py37_1 conda-forge
jupyter_core 4.4.0 py_0 conda-forge
jupyterlab 0.34.9 py37_0
jupyterlab_launcher 0.13.1 py_2 conda-forge
kiwisolver 1.0.1 py37h2d50403_2 conda-forge
krb5 1.14.6 0 conda-forge
libffi 3.2.1 hfc679d8_5 conda-forge
libgcc-ng 7.2.0 hdf63c60_3 conda-forge
libgfortran 3.0.0 1 conda-forge
libiconv 1.15 h470a237_3 conda-forge
libpng 1.6.35 ha92aebf_2 conda-forge
libsodium 1.0.16 h470a237_1 conda-forge
libssh2 1.8.0 h5b517e9_2 conda-forge
libstdcxx-ng 7.2.0 hdf63c60_3 conda-forge
libuuid 2.32.1 h470a237_2 conda-forge
libxcb 1.13 h470a237_2 conda-forge
libxml2 2.9.8 h422b904_5 conda-forge
markupsafe 1.0 py37h470a237_1 conda-forge
matplotlib 3.0.0 py37h0b34cb6_1 conda-forge
mistune 0.8.3 py_0 conda-forge
nbconvert 5.3.1 py_1 conda-forge
nbformat 4.4.0 py_1 conda-forge
ncurses 6.1 hfc679d8_1 conda-forge
networkx 2.2 py_1 conda-forge
nomkl 3.0 0
notebook 5.6.0 py37_1 conda-forge
numpy 1.15.1 py37_blas_openblashd3ea46f_1 [blas_openblas] conda-forge
openblas 0.2.20 8 conda-forge
openssl 1.0.2p h470a237_0 conda-forge
pandas 0.23.4 py37hf8a1672_0 conda-forge
pandoc 2.2.2 1 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.3.1 py_0 conda-forge
pcre 8.41 hfc679d8_3 conda-forge
perl 5.26.2 h470a237_0 conda-forge
pexpect 4.6.0 py37_0 conda-forge
pickleshare 0.7.4 py37_0 conda-forge
pip 18.0 py37_1 conda-forge
prometheus_client 0.3.1 py_1 conda-forge
prompt_toolkit 1.0.15 py_1 conda-forge
pthread-stubs 0.4 h470a237_1 conda-forge
ptyprocess 0.6.0 py37_0 conda-forge
pygments 2.2.0 py_1 conda-forge
pygsp 0.5.1 py_0 conda-forge
pyparsing 2.2.1 py_0 conda-forge
pyqt 5.6.0 py37h8210e8a_7 conda-forge
python 3.7.0 h5001a0f_2 conda-forge
python-dateutil 2.7.3 py_0 conda-forge
pytz 2018.5 py_0 conda-forge
pyzmq 17.1.2 py37hae99301_0 conda-forge
qt 5.6.2 hf70d934_9 conda-forge
qtconsole 4.4.1 py37_1 conda-forge
readline 7.0 haf1bffa_1 conda-forge
scipy 1.1.0 py37_blas_openblash7943236_201 [blas_openblas] conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 40.4.0 py37_1000 conda-forge
simplegeneric 0.8.1 py_1 conda-forge
sip 4.18.1 py37hfc679d8_0 conda-forge
six 1.11.0 py37_1 conda-forge
sqlite 3.25.1 hb1c47c0_0 conda-forge
terminado 0.8.1 py37_1 conda-forge
testpath 0.3.1 py37_1 conda-forge
tk 8.6.8 ha92aebf_0 conda-forge
tornado 5.1 py37h470a237_1 conda-forge
traitlets 4.3.2 py37_0 conda-forge
wcwidth 0.1.7 py_1 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.31.1 py37_1001 conda-forge
widgetsnbextension 3.4.1 py37_0 conda-forge
xorg-libxau 1.0.8 h470a237_6 conda-forge
xorg-libxdmcp 1.1.2 h470a237_7 conda-forge
xz 5.2.4 h470a237_1 conda-forge
zeromq 4.2.5 hfc679d8_5 conda-forge
zlib 1.2.11 h470a237_3 conda-forge
Install a package into an environment. If no environment name is given (with -n), the package is installed into the currently activated environment.
!conda install -n ntds_2018 git
Solving environment: done

# All requested packages already installed.
Want to know more? Look at the conda user guide.
Python is one of the main programming languages used by data scientists, alongside R and Julia. As an open and general-purpose language, it is replacing MATLAB in many scientific and engineering fields. Python is the dominant language for machine learning.
Below are very basic examples of Python code. Want to learn more? Look at the Python Tutorial.
if 1 == 1:
    print('hello')
hello
for i in range(5):
    print(i)
0
1
2
3
4
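range() also accepts a start and a step, and enumerate() is the usual way to get an index alongside each item without keeping a manual counter. A small sketch; the list of words is made up for illustration:

```python
# range(start, stop, step) yields integers from start up to, but excluding, stop
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# enumerate() pairs each item with its position in the sequence
for i, word in enumerate(['graph', 'node', 'edge']):
    print(i, word)
```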
a = 4
while a > 2:
    print(a)
    a -= 1
4
3
Lists are mutable, i.e. we can change the objects they store.
a = [1, 2, 'hello', 3.2]
print(a)
a[2] = 'world'
print(a)
[1, 2, 'hello', 3.2]
[1, 2, 'world', 3.2]
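Because lists are mutable, assigning a list to a second name does not copy it: both names refer to the same object, so a change made through one is visible through the other. A short sketch of this aliasing behaviour, and of making an explicit (shallow) copy with list():

```python
a = [1, 2, 'hello', 3.2]
b = a          # b is a second name for the same list object, not a copy
b.append(5)    # mutating through b...
print(a)       # ...is visible through a: [1, 2, 'hello', 3.2, 5]

c = list(a)    # list() builds a new, shallow copy
c[0] = 100     # changing the copy leaves the original untouched
print(a[0], c[0])  # 1 100
```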
Tuples are not mutable.
(1, 2, 'hello')
(1, 2, 'hello')
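Trying to assign to an element of a tuple raises a TypeError. To get a modified version, one builds a new tuple instead, for example by slicing and concatenating. A minimal sketch (the name t is arbitrary):

```python
t = (1, 2, 'hello')
try:
    t[2] = 'world'   # item assignment is not supported on tuples
except TypeError as err:
    print('TypeError:', err)

# "Changing" a tuple means building a new one from pieces of the old one.
t2 = t[:2] + ('world',)
print(t2)  # (1, 2, 'world')
```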
Sets contain unique values.
a = {1, 2, 3, 3, 4}
print(a)
print(a.intersection({2, 4, 6}))
{1, 2, 3, 4}
{2, 4}
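Besides intersection, sets support union, difference, and fast membership tests, and converting a list to a set is a common way to drop duplicates. A sketch with made-up data; sorted() is used only to print the results in a predictable order, since sets are unordered:

```python
words = ['node', 'edge', 'node', 'graph', 'edge']
unique = set(words)       # duplicates are dropped
print(len(unique))        # 3

a = {1, 2, 3, 4}
b = {2, 4, 6}
print(sorted(a | b))      # union: [1, 2, 3, 4, 6]
print(sorted(a - b))      # difference: [1, 3]
print(3 in a)             # membership test: True
```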
Dictionaries map keys to values.
a = {'one': 1, 'two': 2, 'three': 3}
a['two']
2
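Dictionaries can also be extended in place, queried with a fallback value, and iterated over as (key, value) pairs. Since Python 3.7 (the version used in this environment), dictionaries preserve insertion order. A short sketch; the key 'four' and the default 0 are arbitrary choices:

```python
a = {'one': 1, 'two': 2, 'three': 3}
a['four'] = 4                 # add a new entry (or overwrite an existing one)
print(a.get('five', 0))       # .get() returns the default instead of raising KeyError: 0
for key, value in a.items():  # iterate over (key, value) pairs in insertion order
    print(key, value)
```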
def add(a, b):
    return a + b
add(1, 4)
5
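Function parameters can have default values and can be passed by keyword, which keeps calls readable when a function has several options. A sketch with a hypothetical scale() function, not part of the course material:

```python
def scale(x, factor=2):
    """Multiply x by factor, which defaults to 2 when not given."""
    return x * factor

print(scale(3))             # uses the default factor: 6
print(scale(3, factor=10))  # keyword argument: 30
```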
class A:
    d = 10
    def add(self, c):
        return self.d + c
a = A()
a.add(20)
30
class B(A):
    def sub(self, c):
        return self.d - c
b = B()
print(b.add(20))
print(b.sub(20))
30
-10
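A subclass can also override inherited attributes and methods, and call the parent's version through super(). In this sketch, class C is a hypothetical example that redefines the class attribute d and extends A.add():

```python
class A:
    d = 10
    def add(self, c):
        return self.d + c

class C(A):
    d = 100                          # override the class attribute inherited from A
    def add(self, c):
        # reuse the parent's implementation instead of rewriting it
        return 2 * super().add(c)

c = C()
print(c.add(20))         # 2 * (100 + 20) = 240
print(isinstance(c, A))  # a C instance is also an A: True
```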
x = 1
x = 'abc'
x
'abc'
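Python is dynamically typed: the type belongs to the value, not to the variable name, so the same name can be rebound to values of different types. type() and isinstance() let you inspect this at run time:

```python
x = 1
print(type(x))             # <class 'int'>
x = 'abc'                  # rebinding the name to a value of another type is allowed
print(type(x))             # <class 'str'>
print(isinstance(x, str))  # True
```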
add('hel', 'lo')
'hello'
add([1, 2], [3, 4, 5])
[1, 2, 3, 4, 5]
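add() works on any pair of values that support +, but Python only checks this when the operation is actually executed. Mixing int and float is fine, while int + str raises a TypeError at call time:

```python
def add(a, b):
    return a + b

print(add(1.5, 2))   # int and float mix: 3.5
try:
    add(1, 'a')      # int + str is not defined
except TypeError as err:
    print('TypeError:', err)
```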
print(int('120') + 10)
print(str(120) + ' items')
130
120 items
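Conversions fail loudly when the input is not valid: int() raises a ValueError for a string that is not an integer literal. For building strings, f-strings (Python 3.6 and later) are often more convenient than calling str() explicitly. A sketch; '12o' is a deliberately malformed example:

```python
print(float('3.5') * 2)  # 7.0
n = 120
print(f'{n} items')      # f-string formatting, no explicit str() needed: 120 items
try:
    int('12o')           # letter 'o' instead of zero: not a valid integer literal
except ValueError as err:
    print('ValueError:', err)
```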
Jupyter notebooks allow you to mix text, math, code, and results (numerical or figures) in a single document. They are intended for interactive computing and are very useful to explore data, teach concepts, and create reports. Code can be written in many programming languages, including Python, Julia, R, MATLAB, and C++.
A list:
Text in a paragraph. Text can be italic, bold, verbatim. We can define hyperlinks.
A numbered list:
Some inline math: $x = \frac12$
Some display math: $$ f(x) = \frac{e^{-x}}{4} $$
20 / 100 * 30
6.0
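The result above is a float because / always performs true division in Python 3, even between integers. Python also has dedicated operators for floor division, remainder, and exponentiation:

```python
print(7 / 2)    # true division always returns a float: 3.5
print(7 // 2)   # floor division: 3
print(-7 // 2)  # floor rounds toward negative infinity: -4
print(7 % 2)    # remainder: 1
print(2 ** 10)  # exponentiation: 1024
```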
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
y = np.random.uniform(size=100)
plt.plot(y);
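NumPy arrays support vectorized operations that apply to every element without an explicit Python loop. This sketch reuses the uniform samples from the cell above, adding np.random.seed() so the draws are reproducible (the seed value 0 is arbitrary). It sticks to the legacy np.random functions because the environment listed earlier has NumPy 1.15, which predates the newer Generator API:

```python
import numpy as np

np.random.seed(0)                  # fix the seed so the draws are reproducible
y = np.random.uniform(size=100)    # 100 samples from the uniform distribution on [0, 1)

z = 2 * y - 1                      # vectorized rescaling to [-1, 1), no loop needed
print(y.mean(), y.min(), y.max())  # summary statistics computed in compiled code
print((z > 0).sum())               # count of positive entries via a boolean mask
```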
Want to learn more? Look at the documentation.
git is an open-source distributed version control system. It allows users to collaborate on projects (not only software!), and to synchronize and track their changes. It is most often used with a hosting service such as GitHub or GitLab. These services add many tools to facilitate issue tracking, code review, continuous integration, etc.
conda install git
Two kinds of users:
git pull
before every lab. Do not modify the content of the folder: it is like your inbox, you only copy files from there and modify them outside.
git branch milestone1_solution
git checkout milestone1_solution
git checkout master
and get new stuff from the TAs with git pull. Again, you should never modify master (you could do it locally, but only the TAs have write access to the GitHub repo).
Those who want to back up or share their work on GitHub:
git remote add my_repo git@github.com:username/ntds_2018.git
git push -u my_repo milestone1_solution
Same as before, except you can now make a pull request for your changes to be integrated into master and made available to all of us.
All the code for your projects will have to be handled as a repository on GitHub. While you don't have to collaborate with git (i.e., you can create a single commit at the end with all of your code), we highly recommend using it. It is a very good way to manage your project, as it allows you to come back to previous states, synchronize your changes without getting lost among versions, track who did what, discuss issues and code, etc. We therefore recommend using git from the start to learn the basics. Once you feel ready, create a repository for your project and start working on a milestone there.
Below are the basic packages used for scientific computing and data science. We'll introduce them as needed during the following tutorials.
Want to learn more? Look at the Scipy Lecture Notes.
Finally, the packages below will be useful to work with networks and graphs.