#!/usr/bin/env python
# coding: utf-8

# # Course set-up

# In[1]:


__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"


# This notebook covers the steps you'll need to take to get set up for [CS224u](http://web.stanford.edu/class/cs224u/).

# ## Contents
#
# 1. [Anaconda](#Anaconda)
# 1. [The course Github repository](#The-course-Github-repository)
# 1. [Main data distribution](#Main-data-distribution)
# 1. [Additional installations](#Additional-installations)
#     1. [Installing the package requirements](#Installing-the-package-requirements)
#     1. [PyTorch](#PyTorch)
#     1. [TensorFlow](#TensorFlow)
#     1. [NLTK data](#NLTK-data)
# 1. [Jupyter notebooks](#Jupyter-notebooks)

# ## Anaconda
#
# We recommend installing [the free Anaconda Python distribution](https://www.anaconda.com/distribution/), which includes IPython, Numpy, Scipy, matplotlib, scikit-learn, NLTK, and many other useful packages. This is not required, but it's an easy way to get all these packages installed. Unless you're very comfortable with Python package management and like installing things, this is the option for you!
#
# Please be sure that you download the __Python 3__ version, which currently installs Python 3.7. Although our code is largely compatible with Python 2, __we're not supporting Python 2__.
#
# Once you have Anaconda installed, it makes sense to create a virtual environment for the course. In a terminal, run
#
# ```conda create -n nlu python=3.7 anaconda```
#
# to create an environment called `nlu`.
#
# Then, to enter the environment, run
#
# ```conda activate nlu```
#
# To leave it, you can just close the window, or run
#
# ```conda deactivate```
#
# If your version of Anaconda is older than version 4.4 (see `conda --version`), then replace `conda` with `source` in the `activate` and `deactivate` commands above (and consider upgrading your Anaconda!).
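#
# To confirm that an interpreter or notebook is really running inside the new environment, a minimal sketch like the following may help. The function name `describe_environment` is our own invention for illustration, and the check assumes that conda exposes the active environment's name in the `CONDA_DEFAULT_ENV` environment variable; this holds for the conda releases we know of, but it is worth verifying on your own installation.

```python
import os
import sys


def describe_environment(version_info=sys.version_info, environ=os.environ):
    """Summarize the running interpreter and the active conda environment.

    Both arguments default to the live interpreter and process environment;
    they are parameters only so that the function is easy to test.
    """
    return {
        # Major.minor version of the interpreter, e.g. "3.7".
        "python": "{}.{}".format(version_info[0], version_info[1]),
        # The course does not support Python 2.
        "is_python3": version_info[0] >= 3,
        # Assumption: conda sets CONDA_DEFAULT_ENV to the active
        # environment's name; the value is None if it is not set.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
    }
```

# Calling `describe_environment()` with no arguments after `conda activate nlu` should report `nlu` under `conda_env` if the assumption above holds. A value of `None` suggests that no conda environment is active in that session.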
#
# [This page](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) has more detailed instructions on managing virtual environments with Anaconda.

# ## The course Github repository
#
# The core materials for the course are on Github:
#
# https://github.com/cgpotts/cs224u
#
# We'll be working in this repository a lot, and it will receive updates throughout the quarter, as we add new materials and correct bugs.
#
# If you're new to git and Github, we recommend using [Github's Desktop Apps](https://desktop.github.com). Then you just have to clone our repository and sync your local copy with the official one when there are updates.
#
# If you are comfortable with git in the command line, you can type the following command to clone the course's Github repo:
#
# ```git clone https://github.com/cgpotts/cs224u```

# ## Main data distribution
#
# The datasets needed to run the course notebooks and complete the assignments are in the following gzipped tar archive:
#
# http://web.stanford.edu/class/cs224u/data/data.tgz
#
# We recommend that you download it, unpack it, and place it in the same directory as your local copy of this Github repository. If you decide to put it somewhere else, you'll need to adjust the paths given in the "Set-up" sections of essentially all the notebooks.

# ## Additional installations
#
# Be sure to do these additional installations from __inside your virtual environment__ for the course!

# ### Installing the package requirements
#
# Just run
#
# ```pip install -r requirements.txt```
#
# from inside the course directory to install the core additional packages.
#
# People who aren't using Anaconda should edit `requirements.txt` so that it installs all the prerequisites that come with Anaconda. For Anaconda users, there's no need to edit it or even open it.

# ### PyTorch
#
# The PyTorch library has special installation instructions depending on your computing environment.
# For Anaconda users, we recommend
#
# ```conda install pytorch torchvision -c pytorch```
#
# For non-Anaconda users, or if you have a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus), we recommend following the instructions posted here:
#
# https://pytorch.org/get-started/locally/
#
# For this course, you should be running at least version `1.3.0`:

# In[2]:


import torch
torch.__version__


# ### TensorFlow
#
# We won't work too much with TensorFlow in this course, but the `mittens` package, which implements `GloVe`, will be much faster if TensorFlow is available. It should work under both TensorFlow v1 and v2.
#
# To install TensorFlow, with Anaconda or another environment:
#
# ```pip install tensorflow```
#
# If you have a CUDA-enabled GPU, you should instead do
#
# ```pip install tensorflow-gpu```
#
# If you're using an older version of Anaconda, you might be better off with
#
# ```conda install -c conda-forge tensorflow```
#
# For additional instructions:
#
# https://www.tensorflow.org/install/
#
# For this course, you should be running at least version 1.15.0.

# ### NLTK data
#
# Anaconda comes with NLTK but not with its data distribution. To install that, open a Python interpreter and run `import nltk; nltk.download()`. If you decide to download the data to a different directory than the default, then you'll have to set `NLTK_DATA` in your shell profile. (If that doesn't make sense to you, then we recommend choosing the default download directory!)

# ## Jupyter notebooks
#
# The majority of the materials for this course are Jupyter notebooks, which allow you to work in a browser, mixing code and description. It's a powerful form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming), and increasingly a standard for open science.
#
# To start a notebook server, navigate to the directory where you want to work and run
#
# ```jupyter notebook --port 5656```
#
# The port specification is optional.
#
# This should launch a browser that takes you to a view of the directory you're in. You can then open existing notebooks or create new ones.
#
# A major advantage of working with Anaconda is that you can switch virtual environments from inside a notebook, via the __Kernel__ menu. If your environment doesn't appear in that menu, then run this command while inside your virtual environment:
#
# ```python -m ipykernel install --user --name nlu --display-name "nlu"```
#
# (If you named your environment something other than `nlu`, then change the `--name` and `--display-name` values.)
#
# [Additional discussion of Jupyter and kernels.](https://stackoverflow.com/questions/39604271/conda-environments-not-showing-up-in-jupyter-notebook)
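
# Once everything is installed, the version minimums stated in this notebook (PyTorch `1.3.0`, TensorFlow `1.15.0`) can be checked in one pass from a notebook cell. The following is a rough sketch and not part of the course code: the helper names are hypothetical, and it assumes that each package exposes a `__version__` string, as `torch` and `tensorflow` do. Pre-release and build suffixes such as `rc1` or `+cu117` are simply ignored in the comparison.

```python
import importlib
import re

# Minimum versions stated in this notebook.
MINIMUM_VERSIONS = {"torch": "1.3.0", "tensorflow": "1.15.0"}


def version_tuple(version):
    """Turn a string like '1.13.1+cu117' into (1, 13, 1).

    Only the leading digits of the first three dot-separated pieces are
    used; missing pieces are padded with zeros.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.3.0'."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the package's __version__ string, or None if it can't be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Map each package name to a short status string."""
    report = {}
    for package, minimum in minimums.items():
        found = installed_version(package)
        if found is None:
            report[package] = "not installed"
        elif meets_minimum(found, minimum):
            report[package] = "ok ({})".format(found)
        else:
            report[package] = "too old ({} < {})".format(found, minimum)
    return report
```

# Running `check_requirements()` from inside the course environment returns a dictionary with one status string per package. The numeric comparison matters here: a plain string comparison would wrongly rank `1.10.0` below `1.3.0`.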