Before you begin

About the content

The content for Learn Data Science beta is written in IPython Notebook, a technology that runs executable code from a browser and displays the results right in the browser. This edition of the content uses Python for all executable code, although with IPython Notebook the executable code could also be R or even MATLAB.

IPython Notebook and Python

  • If you are new to Python see http://python.org - it will be best to spend some time learning basic Python syntax before attempting the interactive parts of these lessons.

  • If you are new to IPython Notebook see http://ipython.org/notebook - prepare to be impressed.

  • If you are new to numpy/scipy/matplotlib/pandas see http://www.scipy.org/ - you do not need to master all of these to get value out of these lessons. However, the better you know these technologies, the faster you will be able to apply the techniques.

  • In any case you can always read the Overview lesson for each technique and learn the material in parallel.
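As a quick sanity check before opening a notebook, something like the following sketch can report which of the libraries mentioned above are importable. The helper name check_libraries is hypothetical and not part of the lesson content; it relies only on the standard importlib module.

```python
import importlib


def check_libraries(names):
    """Return a dict mapping each library name to its version string,
    or to None when the library cannot be imported.

    check_libraries is a hypothetical helper, not part of the lessons.
    """
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            # Not every module exposes __version__, so fall back to a label.
            status[name] = str(getattr(module, "__version__", "unknown version"))
    return status


report = check_libraries(["numpy", "scipy", "matplotlib", "pandas"])
for name, version in report.items():
    if version is None:
        print(name + ": MISSING")
    else:
        print(name + ": OK (" + version + ")")
```

Any library reported as MISSING would need to be installed before the interactive parts of the lessons can run.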

Important

Please note this very important point before continuing.

When you execute code somewhere in the middle of a notebook, that code may depend on imports, intermediate results held in variables, and other initializations performed in preceding cells. If you are new to IPython Notebook, it is better to execute code cells sequentially, starting at the top, so that you are not confused by errors that come only from missing dependencies.

If you would rather dive in without executing each preceding code cell by hand, use the menu item 'Cell->Run All'. This executes every code cell once, in order, so that from that point on all dependencies on prior code cells are satisfied. While the code is running you'll see a flashing message that says 'Kernel Busy' towards the top right corner of the notebook, in the grey menubar. Expand the browser to full screen to make sure you can see the message. When the message goes away and stays away for 10 seconds or so, execution is complete and you may dive right in.
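The dependency between cells can be illustrated outside the notebook with a small simulation. This is only a sketch: the three cell strings and the run_cell helper are hypothetical, and a plain dictionary stands in for the kernel's shared namespace.

```python
# Each string stands in for one notebook code cell (hypothetical contents).
cells = [
    "import math",
    "radius = 2.0",
    "area = math.pi * radius ** 2",
]


def run_cell(source, namespace):
    """Execute one cell's source in a shared namespace, roughly as a kernel does."""
    exec(source, namespace)


# Running only the last cell in a fresh namespace fails, because the
# import and the variable from the earlier cells do not exist yet.
fresh = {}
try:
    run_cell(cells[2], fresh)
    out_of_order_error = None
except NameError as exc:
    out_of_order_error = exc
print("Out of order:", out_of_order_error)

# The equivalent of 'Run All': execute every cell once, top to bottom.
kernel = {}
for source in cells:
    run_cell(source, kernel)
print("In order, area =", kernel["area"])
```

Once every cell has run in order, any single cell can be re-executed on its own because its dependencies are already in the namespace.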

If you see a pink-colored warning message during 'Run All', it can be ignored. It appears to be a Python or IPython issue that does not affect our content. To get rid of it for aesthetic reasons, click on the cell above it and hit shift-enter: executing the code that generated the message a second time should make the message disappear.
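One plausible explanation for the message disappearing on the second run, offered here as an assumption and not something confirmed for these notebooks, is that it is an ordinary Python warning. Under the "default" filter of the standard warnings module, a warning raised from a given code location is reported only the first time. The load_data function below is a hypothetical stand-in for whatever library call emits the message.

```python
import warnings


def load_data():
    """Hypothetical helper that emits a warning, standing in for a library call."""
    warnings.warn("this call is deprecated", UserWarning)
    return [1, 2, 3]


with warnings.catch_warnings(record=True) as caught:
    # "default" reports each warning once per location where it is raised.
    warnings.simplefilter("default")
    load_data()  # first run: the warning is reported
    load_data()  # second run: same location, so the warning is suppressed

shown = [str(w.message) for w in caught]
print("warnings reported:", len(shown))
```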

Executing code etc.

  • To execute the code in a particular cell, click on the cell and hit shift-enter.
  • Before you execute the code in an arbitrary cell, run all the code once so that all imports and variables are initialized.
  • To execute all code in a notebook, click Cell->Run All. You should do this when you start reading each new notebook.
  • If you see a pink-colored warning message, it can be ignored. To get rid of it for aesthetic reasons, click on the cell above it and hit shift-enter. It should disappear.
  • If, on the other hand, you get a full-blown exception, something went wrong. Typically it means you didn't run Cell->Run All and something did not get initialized. The error message itself should tell you more.

  • You will occasionally see a code cell that says "TRY THIS". Do try what it asks; the notebook is meant to be interactive.

  • The content gets progressively more challenging, and the intent is that a community support system will develop around it. In the meantime there's Stack Overflow.

  • Make sure you have all the supporting images and datasets in the right locations or you'll see lots of exceptions. The dataset and image directories are included and should work without error unless they have been moved around.

  • See the installation instructions at http://opendst.com for a fresh install from the GitHub repo or from a zip file.
  • If you're having problems that are hard to fix, a fresh installation, following the instructions at opendst.com, is recommended.
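To check that the supporting dataset and image directories are where the notebooks expect them, a sketch along these lines may help. The directory names "datasets" and "images" and the helper missing_directories are assumptions made for illustration; substitute the names actually used in your installation.

```python
from pathlib import Path


def missing_directories(base_dir, required=("datasets", "images")):
    """Return the names in `required` that are not directories under base_dir.

    The default names are hypothetical; adjust them to match your install.
    """
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_dir()]


# Check relative to the directory the notebook server was started from.
missing = missing_directories(".")
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("All supporting directories found.")
```

If anything is reported missing, moving the directories back to their original locations or doing a fresh install should resolve the resulting exceptions.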