Saeed Amen - Copyright Cuemacro 2021 - https://www.cuemacro.com - saeed@cuemacro.com
In this notebook, we discuss how to install the Anaconda distribution of Python and the other applications you need to do development in Python. This setup will be useful for doing financial analysis in Python with libraries including the Cuemacro libraries finmarketpy, findatapy and chartpy.
You can also download a lot of this material from https://github.com/cuemacro/teaching - in particular the scripts for installing the conda environments. We'll also be installing a lot of other useful libraries for machine learning, natural language processing etc.
We'll also make other suggestions so you can do financial data analysis, such as making sure that your firewalls allow access to sites like Quandl.
This is an important one! Make sure that your firewall lets you install Python and download data via Python libraries like Quandl.
Once you've done your Python installation and created the py38class environment detailed below (and have signed up for a free Quandl account and API key), try running the code at the end of this guide to check that the correct packages are installed.
Your firewall should also allow the downloading of packages via conda and pip. If you are behind a proxy, you can tell conda and pip about it as follows:
conda config --set proxy_servers.http http://id:pw@proxyserver:port
conda config --set proxy_servers.https https://id:pw@proxyserver:port
pip install --proxy=https://[id:pwd@]proxyserver:port somepackage
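If your proxy username or password contains characters such as @ or :, they generally need to be percent-encoded before they go into the proxy URL. Below is a minimal sketch of how this could be done in Python. The helper name build_proxy_url and the example credentials are hypothetical, so substitute your own details.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL, percent-encoding the credentials if given."""
    credentials = ""
    if user is not None:
        credentials = quote(user, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


if __name__ == "__main__":
    # Placeholder values only; substitute your own proxy details
    url = build_proxy_url("proxyserver", 8080, user="id", password="p@ss:word")
    print("conda config --set proxy_servers.http " + url)
    print("pip install --proxy=" + url + " somepackage")
```

The printed commands can then be pasted into your prompt.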
If opening up firewalls is too much of an issue, or you don't have admin rights to install Python locally, then an easier way to get started with Python for learning purposes is to use something like Google Colab (see below).
This will, however, need at the very least the website https://colab.research.google.com/ to be accessible (some firms may block some Google websites), or the website https://repl.it/languages/python3, which offers online Python execution.
You can download and install the Anaconda distribution for Python from https://www.anaconda.com/distribution
There are versions of Anaconda for Windows, Linux and Mac operating systems. Make sure you install the 64 bit version of Anaconda. The 32 bit version is not compatible with some of the Python libraries we'll be using, and furthermore, it'll likely run out of memory easily.
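If you already have a Python installed and aren't sure whether it is the 64 bit version, one possible check (a sketch using only the standard library) is to look at the pointer size of the running interpreter. The function name python_bitness is just for illustration.

```python
import platform
import struct


def python_bitness():
    """Return 64 or 32 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", platform.python_version(), "-", python_bitness(), "bit")
    if python_bitness() != 64:
        print("This looks like a 32 bit Python; consider installing the 64 bit Anaconda.")
```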
By default, Anaconda will be installed in one of the following folders, which will depend on your username. Make sure you keep a note of where you installed Anaconda, in particular for later steps where you need to specify the location of the distribution!
Windows: C:\Users\<your-username>\Anaconda3\
Mac: /Users/<your-username>/anaconda3
Linux: /home/<your-username>/anaconda3
Anaconda comes with many data science focused Python libraries. However, we'll still need to install quite a few extra ones. Also, in some instances, we'll install different versions of certain libraries (including an earlier one for Pandas).
On Windows, it is also recommended to add some Anaconda folders (which should look similar to the below) to your Windows PATH. The Anaconda installer usually has a setting you can tick for this, but if that doesn't work, do it in your environment variables in Control Panel. If this isn't set, you can have issues when running certain libraries like xlwings.
C:\Users\<your-username>\Anaconda3
C:\Users\<your-username>\Anaconda3\Scripts
C:\Users\<your-username>\Anaconda3\Library\bin
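To check whether these folders are actually on your PATH, something like the sketch below could be run from Python. The helper missing_from_path is hypothetical, and the default install location in the example is an assumption, so adjust it to wherever you installed Anaconda.

```python
import os


def missing_from_path(required_folders, path_value=None, sep=os.pathsep):
    """Return the folders from required_folders that are not on the PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalise(folder):
        # Compare ignoring case and any trailing slash
        return folder.strip().rstrip("\\/").lower()

    on_path = {normalise(p) for p in path_value.split(sep) if p.strip()}
    return [f for f in required_folders if normalise(f) not in on_path]


if __name__ == "__main__":
    # Assumed default install location; adjust to where you installed Anaconda
    base = os.path.join(os.path.expanduser("~"), "Anaconda3")
    required = [base, os.path.join(base, "Scripts"), os.path.join(base, "Library", "bin")]
    for folder in missing_from_path(required):
        print("Not on PATH:", folder)
```

If nothing is printed, all three folders were found on the PATH.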
You might need to allow Win32 long paths, otherwise Windows will limit the length of file paths (to around 260 characters), and these can end up being very long with Python installations! See https://stackoverflow.com/questions/26155135/node-npm-windows-file-paths-are-too-long-to-install-packages/37528731#37528731 on how to set this.
We can download environment_windows.yml from our browser or via curl in the command line (or it will be alongside the teaching notes folder if you attend a Cuemacro course). We can use that YML file to create our conda environment with all the necessary packages. This also installs the exact same library versions that I have (and reduces the likelihood of version conflicts between libraries). The conda environment file is periodically updated for new versions of libraries. Sometimes conda might hang for a very long time. If this is the case, try the "slower" method below.
Open up the Anaconda Prompt, which should be in the Start Menu, usually labelled Anaconda Prompt (Anaconda3). Try to avoid using Anaconda Powershell Prompt (Anaconda3), because it can sometimes have issues with curl.
In this prompt, your Anaconda folder will be on the path (ie. it will recognise where conda is installed etc.)
To exit the current conda environment, type the below and press Enter:
conda activate
Then, to remove any existing environments called py38class, type the below and press Enter:
conda remove -n py38class --all --yes
If you don't already have the environment_windows.yml file, you can download it from the command line using curl, by typing the below and pressing Enter:
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_windows.yml > environment_windows.yml
To create the environment, type the below and press Enter:
conda env create -f environment_windows.yml
If you get an error like ruamel_yaml.reader.ReaderError: unacceptable character #x0000: special characters are not allowed in "<unicode string>", position 3 (there is a GitHub issue about this), try running the below instead and press Enter in the Anaconda Prompt:
conda env create -f .\environment_windows.yml
Make sure you run this from the folder containing the environment_windows.yml file. You can simply cd to that folder and then run the above conda command, eg. by typing the below and then pressing Enter:
cd \Users\<your-username>\Downloads
The environment_windows.yml file (or similar name) basically has all the instructions required to recreate a conda environment.
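If curl is unavailable or misbehaves in your prompt, the same file could instead be fetched with Python's standard library. The sketch below assumes you have internet access to raw.githubusercontent.com. The function names are hypothetical, and the file is written exactly as the bytes are received.

```python
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/"


def environment_file_url(filename):
    """Build the raw GitHub URL for one of the installation files."""
    return RAW_BASE + filename


def download_environment_file(filename, timeout=30):
    """Download filename from the teaching repo into the current folder."""
    with urllib.request.urlopen(environment_file_url(filename), timeout=timeout) as response:
        data = response.read()
    # Write in binary mode so the file is saved exactly as downloaded
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

Calling download_environment_file("environment_windows.yml") from the folder where you want the file should then leave it ready for conda env create.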
To create your own environment.yml file (for backup purposes, or if you'd like to distribute your conda environment), run the below command in your Anaconda Prompt:
conda env export > environment_windows.yml
A conda environment is a separate installation of Python, where you can install your own set of Python libraries. Creating it with the script below is a slower way to do it, but it makes it easier to change the versions of the libraries (note that they might not be 100% the same versions as my libraries). Also try this method if the above one takes too long.
Note that underneath, create_conda_env_windows.bat uses mamba, which is a faster implementation of conda, to install some libraries. If you have any difficulties with mamba, you might need to edit create_conda_env_windows.bat and change the references where mamba is installing libraries to conda. If this method still doesn't work or hangs, skip to the section "Installing a conda environment if all else fails".
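To download and run create_conda_env_windows.bat:
Download create_conda_env_windows.bat from GitHub (the curl command below is an alternative).
Create a folder C:\pythoncourse (or if you create it elsewhere, keep a note of that).
Copy create_conda_env_windows.bat into the C:\pythoncourse folder.
Open the Anaconda Prompt, where your Anaconda folder will be on the path (ie. it will recognise where conda is installed etc.)
Go to the root of the C:\ drive by typing the below and pressing Enter:
cd\
Change to the pythoncourse folder by typing the below and pressing Enter:
cd\pythoncourse
If you haven't already downloaded create_conda_env_windows.bat, you can do so using the below command (typed all on one line in the Anaconda Prompt, then press Enter):
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_windows.bat > create_conda_env_windows.bat
Run the script by typing the below and pressing Enter:
create_conda_env_windows.bat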
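Once the installation has finished, you may want to confirm that the py38class environment really exists. One possible way, sketched below, is to ask conda for its list of environments in JSON form and look for the name. This assumes conda is on your PATH, and the helper names are hypothetical.

```python
import json
import os
import shutil
import subprocess


def env_names(conda_json_text):
    """Extract environment names from the output of `conda env list --json`."""
    paths = json.loads(conda_json_text).get("envs", [])
    # The last path component is the name; the base install shows up as its folder name
    return [os.path.basename(p.replace("\\", "/").rstrip("/")) for p in paths]


def conda_env_exists(name):
    """Ask conda (assumed to be on the PATH) whether an environment exists."""
    conda = shutil.which("conda")
    if conda is None:
        raise FileNotFoundError("conda was not found on the PATH")
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in env_names(result.stdout)
```

Calling conda_env_exists("py38class") should return True if the environment was created.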
Both of the above should install the xlwings Python library. You also need to add the xlwings add-in to Excel. Instructions for this can be found at https://docs.xlwings.org/en/stable/addin.html (xlwings is also supported on Excel for Mac, although the functionality may differ; it won't work on Linux, given there's no Linux version of Excel). Usually this will involve closing Excel, then running the following commands in your Anaconda Prompt, before restarting Excel:
conda activate py38class
xlwings addin install
We can download environment_linux.yml if we're installing on Linux, or environment_mac.yml for Mac, from our browser or via curl in the command line. We can use that YML file to create our conda environment with all the necessary packages. This should be a lot faster to run. This also installs the exact same library versions that I have (and reduces the likelihood of version conflicts between libraries). The conda environment file is periodically updated for new versions of libraries. Sometimes conda might hang for a very long time. If this is the case, try the "slower" method below.
Open a Terminal window (usually a black window icon on both Linux and Mac), making sure that Anaconda is on your path.
In this prompt, your Anaconda folder will be on the path (ie. it will recognise where conda is installed etc.)
To exit the current conda environment, type the below and press Enter:
conda activate
Then, to remove any existing environments called py38class, type the below and press Enter:
conda remove -n py38class --all --yes
If you don't already have the environment_linux.yml or environment_mac.yml file, you can download it from the command line using curl, by typing the below and pressing Enter on Linux:
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_linux.yml > environment_linux.yml
or the below on Mac:
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_mac.yml > environment_mac.yml
To create the environment, type the below and press Enter on Linux:
conda env create -f environment_linux.yml
or the below on Mac:
conda env create -f environment_mac.yml
Make sure you run this from the folder containing the environment_linux.yml (or environment_mac.yml) file. You can simply cd to that folder and then run the above conda command, eg. by typing the below and then pressing Enter:
cd ~/Downloads
The environment_linux.yml file (or environment_mac.yml or similar name) basically has all the instructions required to recreate a conda environment.
To create your own environment.yml file (for backup purposes, or if you'd like to distribute your conda environment), run the below command in your Terminal on Linux:
conda env export > environment_linux.yml
or the below on Mac:
conda env export > environment_mac.yml
A conda environment is a separate installation of Python, where you can install your own set of Python libraries. For Linux and Mac, we'll install more libraries, which you might need to use later (some of which aren't fully supported by Windows). Note that underneath, create_conda_env_linux.sh (or the Mac equivalent) uses mamba, which is a faster implementation of conda, to install some libraries. If you have any difficulties with mamba, you might need to edit create_conda_env_linux.sh (or the Mac equivalent) and change the references where mamba is installing libraries to conda. If this method still doesn't work or hangs, skip to the section "Installing a conda environment if all else fails".
To download and run create_conda_env_mac.sh or create_conda_env_linux.sh:
Download create_conda_env_mac.sh or create_conda_env_linux.sh from GitHub (the curl commands below are an alternative).
Create a folder /Users/<your-username>/pythoncourse for Mac or /home/<your-username>/pythoncourse for Linux.
Copy create_conda_env_mac.sh or create_conda_env_linux.sh into the /Users/<your-username>/pythoncourse folder for Mac or the /home/<your-username>/pythoncourse folder for Linux.
Alternatively, download create_conda_env_mac.sh or create_conda_env_linux.sh from the Terminal window. First cd to /Users/<your-username>/pythoncourse for Mac or /home/<your-username>/pythoncourse for Linux, then run the below on Mac:
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_mac.sh > create_conda_env_mac.sh
or the below on Linux:
curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_linux.sh > create_conda_env_linux.sh
Check that conda is on your path by typing the below and pressing Enter:
conda activate
If conda is not recognised, go to your Anaconda bin folder (on some installations this might instead be /Users/<your-username>/opt/anaconda3/bin) by typing the below and pressing Enter, for Mac:
cd /Users/<your-username>/anaconda3/bin
or for Linux:
cd /home/<your-username>/anaconda3/bin
Then add that folder to your path, for Mac:
export PATH=/Users/<your-username>/anaconda3/bin:$PATH
or for Linux:
export PATH=/home/<your-username>/anaconda3/bin:$PATH
To make this permanent, you can add the export line to .bashrc, which is in your home folder.
Make the script executable, for Mac:
chmod +x /Users/<your-username>/pythoncourse/create_conda_env_mac.sh
or for Linux:
chmod +x /home/<your-username>/pythoncourse/create_conda_env_linux.sh
Finally, run the script, for Mac:
/Users/<your-username>/pythoncourse/create_conda_env_mac.sh
or for Linux:
/home/<your-username>/pythoncourse/create_conda_env_linux.sh
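If you'd rather set the execute permission from Python (for example from a notebook where running chmod directly is awkward), a rough equivalent of chmod a+x can be sketched with the standard library. The function name make_executable is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Report whether the current user can now execute the file
    return os.access(path, os.X_OK)
```

For instance, make_executable("create_conda_env_linux.sh") could be called from the pythoncourse folder before running the script.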
If both the faster and slower methods for installing a conda environment fail, you can try to create something similar to the py38class environment in a more manual way. You will then likely need to install libraries as you need them through the course.
In your Anaconda Prompt on Windows or Terminal on Linux/Mac, run the following commands.
Switch to the base environment:
conda activate
Remove any existing py38class environment:
conda remove -n py38class --all --yes
Create a barebones py38class environment (if this hangs, we'll just have to use our base environment):
conda create -n py38class python=3.8
Activate the new py38class environment (if it managed to install; otherwise we'll just install everything in our base environment, which isn't ideal, but will at least get us the libraries we need). Then install the anaconda libraries, which include things like Pandas, NumPy etc. (if you find this hangs, skip the conda install anaconda line), install the minimal set of Cuemacro finance libraries to get you started, and do pip install as needed for additional libraries:
conda activate py38class
conda install anaconda
pip install pandas==1.2.3 finmarketpy chartpy findatapy cufflinks kaleido dash plotly
pip install xlwings
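After a manual installation like this, it can be useful to see which of the expected packages ended up installed, and at which version. The sketch below uses importlib.metadata, which is in the standard library from Python 3.8. The helper names and the choice of packages to check are illustrative assumptions.

```python
from importlib import metadata  # available from Python 3.8


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each unsatisfied package to a (wanted, found) tuple."""
    problems = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        if found is None or (wanted is not None and found != wanted):
            problems[package] = (wanted, found)
    return problems


if __name__ == "__main__":
    # pandas is pinned as in the pip command above; None means any version
    pins = {"pandas": "1.2.3", "finmarketpy": None, "chartpy": None,
            "findatapy": None, "xlwings": None}
    for package, (wanted, found) in check_pins(pins).items():
        print(f"{package}: wanted {wanted or 'any'}, found {found or 'not installed'}")
```

If nothing is printed, every package listed was found with the expected version.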
When running a Jupyter notebook, you might get a kernel error displayed, and in the command line an error like Kernel Error - FileNotFoundError: [WinError 2] The system cannot find the file specified. Basically, this means that Jupyter can't find the correct Python under your conda environment (eg. py38class).
One fix is to search for kernel.json under your Anaconda folder, and check it has the correct path for the Python under your conda environment. Eg. for py38class we would likely find it in <<Whatever your Anaconda Folder Is>>/envs/py38class/share/jupyter/kernels/python3/kernel.json and below is a sample file.
{
  "argv": [
    "c:/Anaconda3/envs/py38class\\python.exe",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "Python 3 (ipykernel)",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
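To check a kernel.json file without reading it by eye, a small script along the following lines could report which interpreter it points to and whether that file exists. This is only a sketch and the function names are hypothetical. Note that some kernel files simply contain "python" rather than a full path, in which case the existence check will report False even though the kernel may work.

```python
import json
import os


def kernel_python(kernel_json_text):
    """Return the interpreter path (first argv entry) from kernel.json text."""
    spec = json.loads(kernel_json_text)
    return spec["argv"][0]


def check_kernel_file(path):
    """Return (interpreter, exists) for the kernel.json file at path."""
    with open(path, encoding="utf-8") as f:
        interpreter = kernel_python(f.read())
    return interpreter, os.path.isfile(interpreter)
```

Passing the full path of your kernel.json to check_kernel_file should show quickly whether the interpreter path needs editing.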
Once you've made these changes, you'll need to restart Jupyter.
Alternatively, you might need to install the ipykernel from the command line. To do this, type the following in the Anaconda Prompt:
conda activate py38class
python -m ipykernel install --user --name=py38class
More details can be found on https://github.com/jupyter/notebook/issues/4079 and https://towardsdatascience.com/get-your-conda-environment-to-show-in-jupyter-notebooks-the-easy-way-17010b76e874
By default, the py38class environment will be in one of the following folders:
Windows: C:\Users\<your-username>\Anaconda3\envs\py38class
Mac: /Users/<your-username>/anaconda3/envs/py38class
Linux: /home/<your-username>/anaconda3/envs/py38class
Sometimes you might find that running conda activate in the shell (eg. terminal) doesn't seem to return you to the base environment, or that running conda activate py38class doesn't change your environment.
To fix this, you can try running conda init in your shell, and then closing and reopening it.
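If you're unsure which environment is actually active, one way to check from inside Python is to look at the CONDA_DEFAULT_ENV environment variable, which conda sets on activation, alongside sys.prefix. The sketch below assumes Python was started from the shell where you ran conda activate. The function name is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Python prefix:", sys.prefix)
```

If the prefix doesn't end in envs/py38class after activating, the activation probably hasn't taken effect.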
If you do not want to install Anaconda on your own machine (or if you don't have the correct permissions to do so), you can instead use Google Colab, which gives you a Jupyter notebook in the cloud. This can be a good solution for those wanting to learn Python. You'll need a Google account, otherwise you can't save down your notebooks. You can access it at https://colab.research.google.com/
If you find there are libraries which aren't available, you can install these in Google Colab (as with any Jupyter notebook) using !pip inside the notebook. Below, we have a number of useful libraries to get you started. You might need to install more than these. Note that you might need to re-run pip on Google Colab every so often, because the server gets restarted and the library installations are lost.
There are several other alternatives to Google Colab, such as https://cocalc.com/
!pip install \
redis pathos pyarrow==4.0.0 pandas==1.2.3 quandl \
finmarketpy chartpy findatapy \
cufflinks==0.17.3 kaleido \
plotly==4.14.3
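Outside Colab, !pip in a notebook can sometimes install into a different Python from the one the kernel is running. A possible way around this, sketched below, is to invoke pip through the interpreter that is running the notebook. The helper names are hypothetical.

```python
import subprocess
import sys


def pip_install_command(requirements):
    """Build a pip install command that targets the running interpreter."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(requirements):
    """Run pip for the current interpreter and raise if it fails."""
    subprocess.run(pip_install_command(requirements), check=True)
```

For example, pip_install(["pandas==1.2.3", "quandl"]) would install those two requirements for the current kernel.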
Git is version control software, which may be useful for installing some Python libraries we'll use (in practice you can install these without Git, but they might not be the latest versions). It's also worth understanding how to use version control if you want to code later! You can download and install Git for Windows, Linux and Mac operating systems from https://git-scm.com/downloads
Note that for Linux, you can install Git from the command line, but the syntax depends on your Linux distribution; https://git-scm.com/download/linux discusses this in some detail.
If you want to use your GPU for certain operations, in particular for PyTorch and TensorFlow, you may need to update your NVIDIA graphics driver. First check if you have a graphics card which supports CUDA (most newer NVIDIA graphics cards do). If so, you can install GPU accelerated versions of machine learning libraries like TensorFlow and PyTorch.
To do this, you need to manually install various CUDA libraries. For full details on how to install these see https://www.tensorflow.org/install/gpu (both for Windows and Linux). Once you've done that you can edit the scripts below where indicated, so the GPU enabled Python versions of PyTorch and TensorFlow are installed (rather than the CPU version). Note, the CPU versions work fine, but will be slower and by default the environments below install CPU versions for maximum compatibility in case you don't have an NVIDIA card. If you are running on the cloud, you need to check that the cloud machine you are using has a GPU. Typically free instances are CPU only.
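Once everything is installed, a quick check like the sketch below can report whether PyTorch and TensorFlow can see a GPU. It assumes reasonably recent versions of both libraries (tf.config.list_physical_devices needs TensorFlow 2.1 or later) and copes with either library being absent. The function name gpu_status is hypothetical.

```python
import importlib.util


def gpu_status():
    """Report whether PyTorch and TensorFlow can see a GPU.

    Each value is True/False, or None if the library is not installed.
    """
    status = {"torch": None, "tensorflow": None}
    if importlib.util.find_spec("torch") is not None:
        import torch
        status["torch"] = torch.cuda.is_available()
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        status["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    return status


if __name__ == "__main__":
    for library, has_gpu in gpu_status().items():
        print(library, "->", "not installed" if has_gpu is None else has_gpu)
```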
It is helpful to install Bloomberg's blpapi Python library if you can (and have a Bloomberg account and have a Bloomberg Terminal with Windows).
The py38class environment for Windows already contains blpapi, so you don't need to do this step!
However, if you want to install it without conda, you can follow the instructions at https://github.com/cuemacro/findatapy/blob/master/BLOOMBERG.md
If you'd like to use tabula-py (extracting tables from PDF) and pytesseract (for doing optical character recognition), as well as installing the Python libraries with pip, you'll also need to do some further steps:
tabula-py uses the Java runtime underneath, hence it needs the Java runtime installed on your path.
pytesseract is a wrapper for Tesseract, which needs to be installed first
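To see whether the Java runtime and Tesseract are visible on your path, a short check with shutil.which can help. This is a sketch which assumes the executables have their usual names, java and tesseract. The helper name is hypothetical.

```python
import shutil


def find_tools(names=("java", "tesseract")):
    """Map each executable name to its full path, or None if not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(name, "->", location or "not found on PATH")
```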
By default, Arctic is installed by py38class, which makes it easy to store Pandas DataFrames. However, if you want to use it with Pandas 1.2.3, you might need to install the latest version from the GitHub repo for Arctic (https://github.com/man-group/arctic), by running the following commands:
conda activate py38class
pip uninstall arctic
pip install git+https://github.com/man-group/arctic.git
(Note this requires Git to be installed as discussed above)
Also, to use S3 to store Pandas DataFrames using findatapy, you'll need to install s3fs:
conda activate py38class
pip install s3fs
More broadly, to set up AWS services for use via Python, you'll need to install the AWS CLI and set your credentials. This Jupyter notebook explains how to set up the AWS CLI and also how to use S3 with findatapy to store DataFrames as Parquet files.
You can test your Python installation by starting the Anaconda Prompt (switching to the right conda environment) and then starting Jupyter (in Google Colab, you'll start a Jupyter notebook via its own interface) using the below commands.
Note, that you will likely have to change the default notebook-dir parameter (or you can just omit it, in which case Jupyter will use the current working directory).
conda activate py38class
jupyter notebook --notebook-dir='e:/cuemacro/pythoncourse/pythoncourse/notebooks'
Then try running the below Python code in a Jupyter notebook or the Python interpreter to see if some of the libraries we've installed work. This is not an exhaustive test, as it only covers a few libraries which we'll use a lot.
import chartpy
import quandl
import finmarketpy
import findatapy
import pandas
import numpy
import plotly
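With plain import statements, the first failure stops the test. A slightly more forgiving variant, sketched below, tries each library in turn and reports which ones could not be imported. The list of names mirrors the imports above, and the function name try_imports is hypothetical.

```python
import importlib

LIBRARIES = ("chartpy", "quandl", "finmarketpy", "findatapy", "pandas", "numpy", "plotly")


def try_imports(names=LIBRARIES):
    """Import each library, returning its version, "ok", or the error text."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            results[name] = getattr(module, "__version__", "ok")
        except Exception as exc:  # report the problem and carry on
            results[name] = f"FAILED: {exc!r}"
    return results


if __name__ == "__main__":
    for name, outcome in try_imports().items():
        print(f"{name}: {outcome}")
```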