Conda is a package and environment manager. It allows you to create environments, ideally one per project, and install packages into them. It is available for Windows, macOS and Linux.
Anaconda is a commercial distribution that comes with many of the packages used by data scientists. Miniconda is a lighter, open-source distribution. Both install conda, from which you'll be able to install many packages.
conda-forge is a community-driven collection of recipes to build conda packages. It contains many more packages than the official defaults channel.
Get basic information from your conda installation:
!conda info
     active environment : ntds_2019
    active env location : /home/michael/.conda/envs/ntds_2019
            shell level : 1
       user config file : /home/michael/.condarc
 populated config files : /home/michael/.condarc
          conda version : 4.7.2
    conda-build version : not installed
         python version : 3.7.4.final.0
       virtual packages :
       base environment : /usr  (read only)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/michael/.conda/pkgs
       envs directories : /home/michael/.conda/envs
                          /usr/envs
               platform : linux-64
             user-agent : conda/4.7.2 requests/2.22.0 CPython/3.7.4 Linux/5.2.6-arch1-1-ARCH arch/ glibc/2.29
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
List your environments:
!conda env list
# conda environments:
#
complexes                /home/michael/.conda/envs/complexes
eeg_denoising            /home/michael/.conda/envs/eeg_denoising
ntds_2019             *  /home/michael/.conda/envs/ntds_2019
osmnx                    /home/michael/.conda/envs/osmnx
python2                  /home/michael/.conda/envs/python2
scnn                     /home/michael/.conda/envs/scnn
snn                      /home/michael/.conda/envs/snn
base                     /usr
List the packages in an environment:
!conda list -n ntds_2019
# packages in environment at /home/michael/.conda/envs/ntds_2019:
#
# Name              Version       Build                  Channel
_libgcc_mutex       0.1           main
attrs               19.1.0        py_0                   conda-forge
backcall            0.1.0         py_0                   conda-forge
bleach              3.1.0         py_0                   conda-forge
bzip2               1.0.8         h516909a_1             conda-forge
ca-certificates     2019.9.11     hecc5488_0             conda-forge
certifi             2019.9.11     py37_0                 conda-forge
cffi                1.12.3        py37h8022711_0         conda-forge
cpuonly             1.0           0                      pytorch
curl                7.65.3        hf8cf82a_0             conda-forge
cycler              0.10.0        py_1                   conda-forge
decorator           4.4.0         py_0                   conda-forge
defusedxml          0.5.0         py_1                   conda-forge
dgl                 0.3.1         py37_0                 dglteam
entrypoints         0.3           py37_1000              conda-forge
expat               2.2.5         he1b5a44_1003          conda-forge
freetype            2.10.0        he983fc9_1             conda-forge
gettext             0.19.8.1      hc5be6a0_1002          conda-forge
git                 2.23.0        pl526hce37bd2_2        conda-forge
icu                 64.2          he1b5a44_1             conda-forge
intel-openmp        2019.4        243
ipykernel           5.1.2         py37h5ca1d4c_0         conda-forge
ipython             7.8.0         py37h5ca1d4c_0         conda-forge
ipython_genutils    0.2.0         py_1                   conda-forge
jedi                0.15.1        py37_0                 conda-forge
jinja2              2.10.1        py_0                   conda-forge
joblib              0.13.2        py_0                   conda-forge
json5               0.8.5         py_0                   conda-forge
jsonschema          3.0.2         py37_0                 conda-forge
jupyter_client      5.3.1         py_0                   conda-forge
jupyter_core        4.4.0         py_0                   conda-forge
jupyterlab          1.1.3         py_0                   conda-forge
jupyterlab_server   1.0.6         py_0                   conda-forge
kiwisolver          1.1.0         py37hc9558a2_0         conda-forge
krb5                1.16.3        h05b26f9_1001          conda-forge
libblas             3.8.0         12_openblas            conda-forge
libcblas            3.8.0         12_openblas            conda-forge
libcurl             7.65.3        hda55be3_0             conda-forge
libedit             3.1.20170329  hf8c457e_1001          conda-forge
libffi              3.2.1         he1b5a44_1006          conda-forge
libgcc-ng           9.1.0         hdf63c60_0
libgfortran-ng      7.3.0         hdf63c60_0
libiconv            1.15          h516909a_1005          conda-forge
liblapack           3.8.0         12_openblas            conda-forge
libopenblas         0.3.7         h6e990d7_1             conda-forge
libpng              1.6.37        hed695b0_0             conda-forge
libsodium           1.0.17        h516909a_0             conda-forge
libssh2             1.8.2         h22169c7_2             conda-forge
libstdcxx-ng        9.1.0         hdf63c60_0
markupsafe          1.1.1         py37h14c3975_0         conda-forge
matplotlib-base     3.1.1         py37he7580a8_1         conda-forge
mistune             0.8.4         py37h14c3975_1000      conda-forge
mkl                 2019.4        243
nbconvert           5.6.0         py37_1                 conda-forge
nbformat            4.4.0         py_1                   conda-forge
ncurses             6.1           hf484d3e_1002          conda-forge
networkx            2.3           py_0                   conda-forge
ninja               1.9.0         h6bb024c_0             conda-forge
notebook            6.0.1         py37_0                 conda-forge
numpy               1.15.4        py37h8b7e671_1002      conda-forge
openssl             1.1.1c        h516909a_0             conda-forge
pandas              0.25.1        py37hb3f55d8_0         conda-forge
pandoc              2.7.3         0                      conda-forge
pandocfilters       1.4.2         py_1                   conda-forge
parso               0.5.1         py_0                   conda-forge
pcre                8.41          hf484d3e_1003          conda-forge
perl                5.26.2        h516909a_1006          conda-forge
pexpect             4.7.0         py37_0                 conda-forge
pickleshare         0.7.5         py37_1000              conda-forge
pip                 19.2.3        py37_0                 conda-forge
prometheus_client   0.7.1         py_0                   conda-forge
prompt_toolkit      2.0.9         py_0                   conda-forge
ptyprocess          0.6.0         py_1001                conda-forge
pycparser           2.19          py37_1                 conda-forge
pygments            2.4.2         py_0                   conda-forge
pygsp               0.5.1         py_0                   conda-forge
pyparsing           2.4.2         py_0                   conda-forge
pyrsistent          0.15.4        py37h516909a_0         conda-forge
python              3.7.3         h33d41f4_1             conda-forge
python-dateutil     2.8.0         py_0                   conda-forge
pytorch             1.2.0         py3.7_cpu_0 [cpuonly]  pytorch
pytz                2019.2        py_0                   conda-forge
pyzmq               18.1.0        py37h1768529_0         conda-forge
readline            8.0           hf8c457e_0             conda-forge
scikit-learn        0.21.3        py37hcdab131_0         conda-forge
scipy               1.3.1         py37h921218d_2         conda-forge
send2trash          1.5.0         py_0                   conda-forge
setuptools          41.2.0        py37_0                 conda-forge
six                 1.12.0        py37_1000              conda-forge
sqlite              3.29.0        hcee41ef_1             conda-forge
terminado           0.8.2         py37_0                 conda-forge
testpath            0.4.2         py_1001                conda-forge
tk                  8.6.9         hed695b0_1003          conda-forge
tornado             6.0.3         py37h516909a_0         conda-forge
traitlets           4.3.2         py37_1000              conda-forge
wcwidth             0.1.7         py_1                   conda-forge
webencodings        0.5.1         py_1                   conda-forge
wheel               0.33.6        py37_0                 conda-forge
xz                  5.2.4         h14c3975_1001          conda-forge
zeromq              4.3.2         he1b5a44_2             conda-forge
zlib                1.2.11        h516909a_1006          conda-forge
Install packages into an environment. If no environment name is given, the package is installed into the activated environment.
!conda install -n ntds_2019 git
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.
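From inside a notebook, it can help to check which environment the kernel itself runs in, to make sure packages end up where the notebook can import them. A minimal sketch, assuming the kernel was started from an activated conda environment: `sys.prefix` is the root directory of the running interpreter's environment, while the `CONDA_DEFAULT_ENV` variable is set by conda on activation and may be missing otherwise. The helper name `environment_info` is hypothetical.

```python
import os
import sys


def environment_info():
    """Hypothetical helper: return (interpreter prefix, activated conda env name or None)."""
    # Root directory of the environment the running interpreter belongs to,
    # e.g. /home/michael/.conda/envs/ntds_2019 for the ntds_2019 environment.
    prefix = sys.prefix
    # Set by conda when an environment is activated; may be unset otherwise.
    name = os.environ.get('CONDA_DEFAULT_ENV')
    return prefix, name


prefix, name = environment_info()
print('interpreter prefix:', prefix)
print('activated conda environment:', name)
```

If the prefix does not point to the environment you installed into, the kernel is probably running from another environment.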
Want to know more? Look at the conda user guide.
Python is one of the main programming languages used by data scientists, alongside R and Julia. As an open-source, general-purpose language, it is replacing MATLAB in many scientific and engineering fields. Python is the most popular language for machine learning.
Below are very basic examples of Python code. Want to learn more? Look at the Python Tutorial.
if 1 == 1:
    print('hello')
hello
for i in range(5):
    print(i)
0
1
2
3
4
a = 4
while a > 2:
    print(a)
    a -= 1
4
3
Lists are mutable, i.e., we can change the objects they store.
a = [1, 2, 'hello', 3.2]
print(a)
a[2] = 'world'
print(a)
[1, 2, 'hello', 3.2]
[1, 2, 'world', 3.2]
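One consequence of mutability worth knowing: assigning a list to another name does not copy it, so a change made through one name is visible through the other. The sketch below illustrates this and the usual fix, an explicit shallow copy with `list(...)` (the names `b` and `c` are just for illustration).

```python
a = [1, 2, 'hello', 3.2]

# b is a second name for the same list object, not a copy.
b = a
b[2] = 'world'
print(a)  # [1, 2, 'world', 3.2]: modified through b

# list(a) creates a new (shallow) copy; modifying it leaves a untouched.
c = list(a)
c[0] = 100
print(a)  # [1, 2, 'world', 3.2]
print(c)  # [100, 2, 'world', 3.2]
```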
Tuples are not mutable.
(1, 2, 'hello')
(1, 2, 'hello')
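Trying to modify a tuple in place raises a `TypeError`. A short sketch of what happens, and of how to obtain a "modified" tuple by building a new one instead:

```python
t = (1, 2, 'hello')

# Item assignment fails because tuples cannot be changed after creation.
try:
    t[2] = 'world'
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# To "change" a tuple, build a new one, here by slicing and concatenation.
t2 = t[:2] + ('world',)
print(t, t2)  # (1, 2, 'hello') (1, 2, 'world')
```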
Sets contain unique values.
a = {1, 2, 3, 3, 4}
print(a)
print(a.intersection({2, 4, 6}))
{1, 2, 3, 4}
{2, 4}
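Besides intersections, sets support the other usual set operations, fast membership tests, and a common idiom for dropping duplicates from a list. A small sketch; note that sets are unordered, so the order in which their elements are printed should not be relied upon.

```python
a = {1, 2, 3, 3, 4}
b = {2, 4, 6}

union = a | b        # elements in a or b, same as a.union(b)
difference = a - b   # elements in a but not in b, same as a.difference(b)
print(union, difference)
print(3 in a)        # True: membership tests are a typical use of sets

# Dropping duplicates from a list; the original order is not preserved.
unique = set([1, 1, 2, 2, 2, 3])
print(unique)
```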
Dictionaries map keys to values.
a = {'one': 1, 'two': 2, 'three': 3}
a['two']
2
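A few other common dictionary operations: adding a key, looking up a key that may be missing without raising a `KeyError` (via `dict.get` with a default), testing membership (which checks keys), and iterating over key/value pairs. Since Python 3.7, dictionaries keep insertion order.

```python
a = {'one': 1, 'two': 2, 'three': 3}

a['four'] = 4            # add a new key (or overwrite an existing one)
print(a.get('five', 0))  # 0: default returned for a missing key
print('two' in a)        # True: membership tests look at the keys

# Iterate over (key, value) pairs, in insertion order.
for key, value in a.items():
    print(key, value)
```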
def add(a, b):
    return a + b
add(1, 4)
5
class A:
    d = 10
    def add(self, c):
        return self.d + c
a = A()
a.add(20)
30
class B(A):
    def sub(self, c):
        return self.d - c
b = B()
print(b.add(20))
print(b.sub(20))
30
-10
x = 1
x = 'abc'
x
'abc'
add('hel', 'lo')
'hello'
add([1, 2], [3, 4, 5])
[1, 2, 3, 4, 5]
print(int('120') + 10)
print(str(120) + ' items')
130
120 items
Jupyter notebooks let you mix text, math, code, and results (numerical or figures) in a single document. They are intended for interactive computing and are very useful for exploring data, teaching concepts, and creating reports. Code can be written in many programming languages, including Python, Julia, R, MATLAB, and C++.
A list:
Text in a paragraph. Text can be italic, bold, verbatim. We can define hyperlinks.
A numbered list:
Some inline math: $x = \frac12$
Some display math: $$ f(x) = \frac{e^{-x}}{4} $$
20 / 100 * 30
6.0
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
y = np.random.uniform(size=100)
plt.plot(y);
Want to learn more? Look at the documentation.
git is an open-source distributed version control system. It allows users to collaborate on projects (not only software!), synchronize and track their changes. It is most often used with a hosting service such as GitHub or GitLab. Those services add many tools to facilitate issue tracking, code review, continuous integration, etc.
Install it with
conda install git
Two kinds of users:
1. Those who keep their work local: run
git pull
before every lab. Do not modify the content of the folder. It is like your inbox: you only copy files from there and modify them outside. Alternatively, keep your work in your own branch:
git branch assignment1_solution
git checkout assignment1_solution
Switch back to master with
git checkout master
and get new material from the TAs with git pull. Again, you should never modify master (you could do it locally, but only the TAs have write access to the GitHub repo).
2. Those who want to back up or share their work on GitHub: add your own repository as a remote and push your branch to it:
git remote add my_repo git@github.com:username/ntds_2019.git
git push -u my_repo milestone1_solution
Same as before, except that you can now make a pull request for your changes to be integrated into master and be available to all of us.
All the code for your projects will have to be handled as a repository on GitHub. While you don't have to collaborate with git (i.e., you can create a single commit at the end with all of your code), we highly recommend using it. It is a very good way to manage your project: it lets you go back to previous states, synchronize your changes without getting lost among versions, track who did what, discuss issues and code, etc. We therefore recommend that you start using git now to learn the basics. Once you feel ready, create a repository for your project and start working on an assignment there.
Below are the basic packages used for scientific computing and data science.
Want to learn more? Look at the Scipy Lecture Notes.
Finally, the packages below will be useful for working with networks and graphs.
We provide a non-exhaustive list of tools and concepts that can help you improve your Python coding skills. You by no means need to master them immediately.
NumPy and PyTorch indexing and broadcasting rules. They take some time to understand, but are essential: they help you avoid writing explicit Python loops, which are considerably slower and sometimes memory-inefficient.
Some common Python built-in functions.
SciPy functions. pdist and cdist are considerably faster than loops at computing pairwise distances between objects (e.g., to build a nearest-neighbor graph).
Object-oriented programming.
Google Python style guide. Not essential, but following these rules makes things easier when you work in a group.
Unit tests.
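The points above on broadcasting and on SciPy's pdist and cdist can be illustrated together. The sketch below computes all pairwise Euclidean distances between random points (made-up data, for illustration only) in three ways and checks that they agree. In a notebook, you can compare their speed with %timeit; the numbers depend on your machine and on the number of points.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

np.random.seed(0)
X = np.random.uniform(size=(50, 3))  # 50 random points in 3 dimensions
n = len(X)

# 1. Explicit Python loops: simple, but slow when n grows.
D_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        D_loop[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))

# 2. Broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d), then sum over the last axis.
#    No Python loop, but it materializes an n x n x d array in memory.
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
D_broadcast = np.sqrt((diff ** 2).sum(axis=-1))

# 3. SciPy: cdist computes distances between two sets of points; pdist returns
#    the condensed (upper-triangular) distances within one set, which
#    squareform expands to a full symmetric matrix.
D_cdist = cdist(X, X, metric='euclidean')
D_pdist = squareform(pdist(X, metric='euclidean'))

print(np.allclose(D_loop, D_broadcast),
      np.allclose(D_loop, D_cdist),
      np.allclose(D_loop, D_pdist))
```

For a single set of points, pdist only computes each pair once, which is why it is often preferred over cdist(X, X) when building a graph from distances.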