Conda is a cross-platform package management system widely used in the scientific and data science Python communities. Conda can be used to package and distribute software written in any language, but it has first-class support for Python packages. This talk will briefly cover how to use conda to install and manage data science packages, as well as how conda can be used to create isolated computing environments. The main focus of the talk will be an in-depth look at how to easily and reproducibly create conda packages for your own Python software, and the options for sharing these packages with others. Finally, the talk will explore how to combine a collection of conda packages into custom, cross-platform, installable conda-based Python distributions.
A number of powerful Python data science libraries exist. Installing and managing these libraries can be difficult and time-consuming.
Conda is a cross-platform package manager which installs binary conda packages.
# Before talk
conda create -n depy python=3.5
source activate depy
# In terminal during talk
python
>>> import pandas # No module named pandas...what!
conda list # not much installed
conda install pandas
conda install matplotlib ipython
ipython
>>> import pandas
>>> pandas.__version__ # 0.18.1
conda list
conda list pandas
conda remove pandas
conda install pandas=0.16
python -c "import pandas; print(pandas.__version__)" # 0.16.2
conda update pandas
python -c "import pandas; print(pandas.__version__)" # 0.18.1
# lots of other commands
conda --help
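The install commands above use version specs such as pandas=0.16, which ended up installing 0.16.2. A minimal sketch of that behaviour, assuming a deliberately simplified model in which name=X is a fuzzy match and name==X is an exact match. The helper names matches and pick_latest and the AVAILABLE list are hypothetical and only for illustration; this is not conda's actual solver or spec parser.

```python
# Illustrative list of versions a channel might offer (an assumption).
AVAILABLE = ["0.16.0", "0.16.1", "0.16.2", "0.17.1", "0.18.0", "0.18.1"]


def matches(spec, version):
    """Return True if `version` satisfies a simplified conda-style spec.

    Only three forms are modelled (a simplification, not conda's parser):
      name       -> any version
      name=1.2   -> fuzzy match: 1.2 itself or any 1.2.x
      name==1.2  -> exact match
    """
    if "==" in spec:
        wanted = spec.split("==", 1)[1]
        return version == wanted
    if "=" in spec:
        wanted = spec.split("=", 1)[1]
        return version == wanted or version.startswith(wanted + ".")
    return True


def version_key(version):
    """Sort key for purely numeric dotted versions such as '0.18.1'."""
    return tuple(int(part) for part in version.split("."))


def pick_latest(spec, available):
    """Pick the newest version in `available` that satisfies `spec`."""
    candidates = [v for v in available if matches(spec, v)]
    return max(candidates, key=version_key, default=None)


if __name__ == "__main__":
    print(pick_latest("pandas=0.16", AVAILABLE))  # 0.16.2
    print(pick_latest("pandas", AVAILABLE))       # 0.18.1
```

The real resolver also weighs build numbers, channels and the dependencies of every other installed package, which this sketch ignores.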
Conda can create isolated environments that have their own set of installed and managed packages.
source deactivate
# say we want to switch more easily between pandas 0.16 and 0.18: make separate environments
conda create -n depy_pandas16 pandas=0.16 python=2.7 # also want to use Python 2.7
source activate depy_pandas16
python # notice that we have python 2.7
>>> print "Hi DePy"
>>> import pandas
>>> pandas.__version__
0.16.2
source deactivate
conda create -n depy_pandas18 pandas=0.18 python # defaults to latest Python, 3.5
source activate depy_pandas18
python
>>> import pandas
>>> pandas.__version__
source deactivate
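When switching between environments like the two above, it can help to confirm from inside Python which one is active. A small sketch, assuming the environment was activated through conda's activate script, which sets the CONDA_DEFAULT_ENV variable; the function name describe_environment is hypothetical. The code is written to run under both Python 2.7 and 3.x.

```python
import os
import sys


def describe_environment():
    """Collect a few facts that identify the active Python environment."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        # Set by conda's activate script; None outside an activated environment.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```

Running this in each environment should report a different prefix and Python version for each.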
Packages are hard-linked into the environment to save disk space.
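To see why hard links save space, the sketch below creates one file and a hard link to it, then checks that both names point at the same data on disk. The file names are hypothetical stand-ins for a file in the package cache and the same file inside an environment. It assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def hard_link_demo():
    """Create a file plus a hard link to it and report what the OS sees."""
    workdir = tempfile.mkdtemp()
    try:
        original = os.path.join(workdir, "pkg_cache_file.txt")
        linked = os.path.join(workdir, "env_file.txt")
        with open(original, "w") as handle:
            handle.write("package contents\n")
        # Second directory entry that refers to the same data on disk.
        os.link(original, linked)
        stat_original = os.stat(original)
        stat_linked = os.stat(linked)
        return {
            "same_inode": stat_original.st_ino == stat_linked.st_ino,
            "link_count": stat_original.st_nlink,
        }
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(hard_link_demo())
```

Both names share one inode, so the contents are stored once no matter how many environments link to the file.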
Conda can search for packages in the repository provided by Continuum as well as for packages created by users and shared on Anaconda Cloud.
conda search scikit-learn
conda search tensorflow
anaconda search tensorflow
anaconda show jjhelmus/tensorflow
# Search on Anaconda.org
Conda packages can be built from a recipe and shared on Anaconda.org.
cd recipe/nmrglue
cat meta.yaml
cat build.sh
cat bld.bat
cd ..
conda build nmrglue
<lots of text>
anaconda upload ...
cd from_skeleton
conda skeleton pypi nmrglue
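The heart of a recipe like the ones above is its meta.yaml file. The sketch below renders a minimal one for a hypothetical pure-Python package called examplepkg; it is not the nmrglue recipe shown in the talk. The function name, package name, version and license value are placeholders. The section layout (package, source, build, requirements, about) follows conda-build's recipe format; newer conda-build versions also accept a host section under requirements.

```python
RECIPE_TEMPLATE = """\
package:
  name: {name}
  version: "{version}"

source:
  path: {source_path}

build:
  script: python setup.py install

requirements:
  build:
    - python
    - setuptools
  run:
    - python

about:
  license: {license_name}
"""


def render_recipe(name, version, source_path="..", license_name="BSD"):
    """Return the text of a minimal meta.yaml for a pure-Python package."""
    return RECIPE_TEMPLATE.format(
        name=name,
        version=version,
        source_path=source_path,
        license_name=license_name,
    )


if __name__ == "__main__":
    # Print the recipe; redirect to recipe/examplepkg/meta.yaml to use it.
    print(render_recipe("examplepkg", "0.1"))
```

With the output saved as meta.yaml in its own directory, that directory can be passed to conda build in the same way as the nmrglue recipe.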
conda-forge is a community-led collection of recipes, build infrastructure and distributions for the conda package manager.
# Show conda-forge webpage
# Show staged-recipes repo
cd ..
cd staged-recipes
cd recipes
ls
cat meta.yaml
git checkout -b nmrglue
git add .
git commit -m "Add nmrglue recipe"
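git pull-request # assumes GitHub's hub tool is aliased to git; otherwise push the branch and open the pull request on GitHub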
# Show staged-recipes repo
# Show Py-ART repo
constructor is a tool for creating an installer from a collection of conda packages.
conda install constructor
construct.yaml file
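As a rough idea of what a construct.yaml file contains, the sketch below renders a minimal one with the name, version, channels and specs keys. The installer name ExampleDist, the version, the channel URL and the package list are all placeholder assumptions, and render_construct_yaml is a hypothetical helper, not part of constructor itself.

```python
def render_construct_yaml(name, version, specs, channels):
    """Return the text of a minimal construct.yaml for constructor."""
    lines = ["name: %s" % name, 'version: "%s"' % version, "", "channels:"]
    lines += ["  - %s" % channel for channel in channels]
    lines += ["", "specs:"]
    lines += ["  - %s" % spec for spec in specs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_construct_yaml(
        name="ExampleDist",                                 # placeholder name
        version="1.0",
        specs=["python 3.5*", "conda", "pandas"],           # packages to bundle
        channels=["https://repo.anaconda.com/pkgs/main/"],  # assumed channel
    ))
```

Saving the output as construct.yaml in an otherwise empty directory and running constructor on that directory should build an installer for the current platform.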