by Landung Setiawan (landungs@uw.edu)
We will start this tutorial by looking at a picture of the perfect python environment.
Definition of python virtual environment: a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
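To make the definition concrete, here is a minimal sketch for checking which environment the running interpreter belongs to. It assumes an environment created with venv or a recent virtualenv (where sys.prefix differs from sys.base_prefix), and it relies on conda keeping a conda-meta directory at the root of each conda environment. The function name describe_environment is made up for this example.

```python
import os
import sys


def describe_environment():
    """Return a small dict describing the environment of the running interpreter."""
    prefix = sys.prefix
    # base_prefix points at the "parent" installation when inside a venv.
    base_prefix = getattr(sys, "base_prefix", prefix)
    return {
        "prefix": prefix,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "in_venv": prefix != base_prefix,
        # Assumption: a conda environment has a conda-meta directory at its root.
        "in_conda_env": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside and outside an activated environment should show a different prefix each time, which is the "self-contained directory tree" from the definition.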
What are the benefits of having a well defined python environment?
Brainstorming: https://etherpad.wikimedia.org/p/geohack2019-conda
1. pip as a Python package manager, along with virtualenv as the Python environment manager.
2. Next, the modern classic: pipenv as a Python package and environment manager.
3. What we'll learn: conda, used by the scientific software community.
Miniconda:
A lightweight distribution of conda; it contains only conda, Python, and the packages they depend on.
Anaconda:
A data science platform distribution of conda; it comes with a lot of scientific Python packages.
Bob is a post-doc. He has been programming in Python for a few years now, and he is very comfortable managing his own Python environment, previously using pip and virtualenv, but now he's with the "cool" kids using pipenv.
Recently, his studies have been shifting toward a geospatial focus, and he will need Python libraries such as gdal, fiona, and netcdf. Let's see what happens.
%%html
<script
class="scenario-vid"
id="asciicast-EhWPMkBn6I8jVHEvJrkwfVMAg"
src="https://asciinema.org/a/EhWPMkBn6I8jVHEvJrkwfVMAg.js"
data-speed=5 async></script>
Bob got an error ...
Bob looked at https://pypi.org/project/GDAL/, but it's still really confusing to set this up... HELP!
On the other side of the world, we meet Sandy. She is an advanced undergrad who has attended one of the hackweeks at UW eScience. She just started to really program in Python. Her senior thesis project requires her to analyze a geospatial entity. Like Bob, she knows that she will need to use gdal, fiona, and netcdf. Having learned about conda at the hackweek, she started following the conda workflow to create a new project. Let's see what happens.
%%html
<script id="asciicast-2Kz7LgkK7BJ1gg7gi1wtJZorn"
src="https://asciinema.org/a/2Kz7LgkK7BJ1gg7gi1wtJZorn.js"
data-speed=5 async></script>
Sandy succeeded!
pipenv is basically just a nice wrapper that uses pip and virtualenv under the hood.
pip is simply a Python package manager.
pip does not handle non-Python library dependencies as well as it handles the Python packages themselves.
pip wheels can solve some of the lower-level dependency problems that we ran into in Bob's case, but the GDAL developers did not include these dependencies within the wheels, so users have to set them up themselves!
NOTE: Conda can manage pip packages, but pip cannot manage conda packages.
# Check conda version to make sure it's installed.
conda info
To see the full documentation for any command, type the command followed by --help. For example, to learn about the conda update command:
conda update --help
What is a conda environment?
# List out available environments
conda env list # The starred * environment is the currently active environment
# Create conda environment from command line (Not Best Practice)
conda create --name myenv --channel conda-forge python=3.6
# Activate conda environment
conda activate myenv
# Deactivate conda environment
conda deactivate
# Create conda environment from environment file (Recommended Best Practice)
conda env create --file environment.yml
# Removing conda environments
conda env remove --yes --name myenv
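If you want to drive the same commands from a script, the sketch below asks conda for its environments in machine-readable form. It assumes a conda executable is on your PATH and that conda env list --json prints a JSON object with an "envs" list of environment paths; the helper name list_conda_envs is made up for this example.

```python
import json
import shutil
import subprocess


def list_conda_envs():
    """Return a list of conda environment paths, or None if conda is unavailable."""
    conda = shutil.which("conda")
    if conda is None:
        return None  # conda is not on PATH
    try:
        result = subprocess.run(
            [conda, "env", "list", "--json"],
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
        return json.loads(result.stdout).get("envs", [])
    except (OSError, subprocess.SubprocessError, ValueError):
        # The command failed, timed out, or printed something that is not JSON.
        return None


if __name__ == "__main__":
    envs = list_conda_envs()
    if envs is None:
        print("conda was not found or could not be queried")
    else:
        for path in envs:
            print(path)
```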
Sample of environment.yml
name: tutorial-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - matplotlib
  - pandas
  - bokeh
  - rise
  - nb_conda_kernels
  - ipykernel
If you follow these guidelines, you should be able to give your environment file to anyone, and they will be able to install your packages with no problem.
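Before installing from an environment file that someone hands you, it can help to see what it will pull in. The sketch below reads only the flat subset of environment.yml used in the sample (key: value lines and - item lists). It is not a general YAML parser and does not understand nested entries such as a pip: sub-list; for real files a YAML library is the safer choice. The function name read_simple_environment is made up for this example.

```python
SAMPLE = """\
name: tutorial-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - matplotlib
  - pandas
  - bokeh
  - rise
  - nb_conda_kernels
  - ipykernel
"""


def read_simple_environment(text):
    """Parse a flat environment.yml into a dict of strings and lists."""
    env = {}
    current = None  # key whose list we are currently filling
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"list item without a key: {raw!r}")
            env[current].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                env[key] = value
                current = None
            else:
                env[key] = []
                current = key
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return env


if __name__ == "__main__":
    env = read_simple_environment(SAMPLE)
    print("name:", env["name"])
    print("channels:", ", ".join(env["channels"]))
    print("dependencies:", ", ".join(env["dependencies"]))
```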
What is a conda channel?
# List out your channels and priorities
conda config --get channels
# If you have a few trusted channels that you prefer to use, you can pre-configure them so that
# you won't need to explicitly declare the channel every time you create an environment.
conda config --add channels conda-forge
# strict priority and conda-forge at the top will ensure
# that all of your packages will be from conda-forge unless they only exist on defaults
conda config --set channel_priority strict
conda config --set show_channel_urls True
NOTE: The highest priority channel is where your packages will be installed from no matter if another channel has a higher version!
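The note above can be illustrated with a toy model of strict channel priority. This is not conda's actual solver, only a sketch of the rule: walk the channels from highest to lowest priority and take the newest version from the first channel that has the package. The package names, versions, and function names are all made up for this example.

```python
def version_key(version):
    """Turn '2.4.1' into (2, 4, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def pick_version(package, channels, available):
    """Pick (channel, version) under a toy model of strict channel priority.

    channels: channel names, highest priority first.
    available: {channel: {package: [versions]}} with hypothetical data.
    """
    for channel in channels:
        versions = available.get(channel, {}).get(package)
        if versions:
            # The first channel that has the package wins, even if a
            # lower-priority channel offers a newer version.
            return channel, max(versions, key=version_key)
    return None


# Hypothetical package index, for illustration only.
available = {
    "conda-forge": {"examplepkg": ["2.4.1", "2.4.2"]},
    "defaults": {"examplepkg": ["2.3.3", "3.0.0"], "onlyondefaults": ["1.0"]},
}
channels = ["conda-forge", "defaults"]

print(pick_version("examplepkg", channels, available))      # ('conda-forge', '2.4.2')
print(pick_version("onlyondefaults", channels, available))  # ('defaults', '1.0')
```

In this toy index, examplepkg comes from conda-forge at 2.4.2 even though defaults has 3.0.0, while onlyondefaults falls through to defaults because conda-forge does not have it.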
conda-forge is a community-led collection of recipes, build infrastructure, and distributions for the conda package manager.
Watch the PyCon talk by Filipe, one of the conda-forge lead developers, at https://www.youtube.com/watch?v=qJFkIuzD6tI for more info about how to put your packages into the conda-forge channel!
What is a conda package?
You can search for conda packages at https://anaconda.org/ or from the terminal, as shown below.
# Look at the packages you have installed
conda list
# Let's search for the gdal conda package
conda search gdal
# Install a single conda package
conda install -c conda-forge gdal
# Or install multiple packages
conda install -c conda-forge gdal fiona
# Removing a conda package
conda remove -n myenv gdal
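For a look at where this package information lives on disk, here is a minimal sketch. It assumes that conda records each installed package as a JSON file with "name" and "version" fields under the environment's conda-meta directory; the function name installed_conda_packages is made up for this example.

```python
import json
import sys
from pathlib import Path


def installed_conda_packages(prefix=sys.prefix):
    """Return {package name: version} read from <prefix>/conda-meta/*.json.

    Returns an empty dict when the prefix is not a conda environment.
    """
    meta_dir = Path(prefix) / "conda-meta"
    packages = {}
    if not meta_dir.is_dir():
        return packages
    for record in sorted(meta_dir.glob("*.json")):
        with record.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Fall back to the file name if a record lacks a "name" field.
        packages[data.get("name", record.stem)] = data.get("version")
    return packages


if __name__ == "__main__":
    for name, version in installed_conda_packages().items():
        print(name, version)
```

conda list remains the supported way to inspect an environment; the sketch only shows that an environment is a plain directory you can look into.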
Instructions on how to build the conda package, along with its metadata:
package:
  name: pandas
  version:
source:
  url: https://github.com/pydata/pandas/archive/v.tar.gz
  sha256: d9f67bb17f334ad395e01b2339c3756f3e0d0240cb94c094ef711bbfc5c56c80
build:
  number: 0
  script: python setup.py install --single-version-externally-managed --record=record.txt
about:
  home: http://pandas.pydata.org
  license: BSD 3-clause
  summary: 'High-performance, easy-to-use data structures and data analysis tools.'
extra:
  recipe-maintainers:
    - jreback
    - jorisvandenbossche
    - TomAugspurger
For the official walkthrough, go to https://bit.ly/tryconda
For a conda cheat sheet, go to https://tinyurl.com/y49fjnoj