Introduction

This tutorial contains all the steps you need to take in order to start this course. Keep in mind that learning to program is similar to learning a new language. There will be times of frustration when you don't understand what you are saying or doing wrong, but perseverance is key to success when learning how to program. Another piece of advice for your programming journey is to focus on the structure, logic and concepts of programming.

This tutorial is structured in two parts:

  • The first part is the simple version, which is meant to get you set up in 15 minutes.
  • The second part is for those interested in a more comprehensive, in-depth tutorial (this extended version assumes that the basic installations have already been made).

If you are new to programming, we suggest you keep reading from here (the simple version). If you already have the basic programming setup, jump to the second part, named Extended Version. We begin by introducing the elements you need and then walk you through setting each one up, with an explanation of why you need it. Python, Jupyter Notebook and GitHub are names that will be familiar to you after this tutorial.

Introduction to the Command Line

Terminal

The command line is an essential user interface which is navigated by text commands (typed at a prompt) rather than with a mouse. It lets you carry out operations that often cannot be accessed through your normal GUI (Graphical User Interface). On Windows machines it is known as the Command Prompt, and on macOS and Linux it is known as the Terminal. When beginning to code, it is essential to understand some basic commands in order to use the command line. Below is a table of essential commands with a brief explanation of how to use them.

Commands              | macOS | Windows | Linux
List Files            | ls    | dir     | ls
Find Current Location | pwd   | chdir   | pwd
Change Directory      | cd    | cd      | cd
  • Listing Files: When the ls command (dir on Windows) is entered, the command line will list all the items within the current directory. When you first open the command line, this will be all the main components of your computer as seen in the figure above.
  • Finding Your Current Location: This command returns exactly which directory you are in, and the path taken to get there. For example, if the command line returns /Users/Jordan/Desktop/Papers/PQA, this means you are currently in the folder labeled "PQA", located in the folder "Papers", which is on the desktop of the computer's user "Jordan."
  • Changing Directories: Now that you know how to find where you are located, you can begin to navigate in the command line. When cd alone is entered on macOS or Linux, the command line returns you to your home directory (on Windows, cd alone simply prints the current directory). It can also be followed by the name of a directory to move to that directory. E.g. cd Desktop will take you to the desktop, and cd Desktop/Papers will take you through the desktop to the folder "Papers."
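
Once Python is installed, the same three ideas (where am I, what is in here, move somewhere else) can also be tried from Python itself. The following is only a rough sketch: it works inside a temporary scratch folder, and the folder names Desktop and Papers are created purely for illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as scratch:
    start = os.getcwd()                            # like pwd (chdir on Windows)
    os.makedirs(os.path.join(scratch, "Desktop", "Papers"))

    os.chdir(scratch)                              # like cd <folder>
    listing = sorted(os.listdir("."))              # like ls (dir on Windows)
    print(listing)

    os.chdir(os.path.join("Desktop", "Papers"))    # like cd Desktop/Papers
    here = os.getcwd()
    print(here)

    os.chdir(start)                                # go back before the folder is removed
```

The printed path differs from machine to machine, but it should end in Desktop/Papers (or Desktop\Papers on Windows).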

For more commands, follow these links for Linux, Windows, and macOS.

Introduction to Python

Python is a programming language that was designed to be readable and easily understood. To better understand what a programming language is, imagine people as computers: in order to get people to do something, you have to communicate with them. Programming languages are generally much simpler than natural languages, but they are different, which is why many people struggle with them at first. Just imagine how you would feel if you had to live in a village in India where only Hindi was spoken; that would be much more difficult. Many beginners find coding with Python highly satisfying, as they are able to construct prototypes and tools quickly and with ease. But the benefits don't stop there: Python is also free and has a large community to help you if you get stuck. It is arguably the most beginner-friendly language, which is why we recommend it in this tutorial to get you started with coding. So let's get started!

Installing Python with Anaconda

Open your browser and follow this link to the Anaconda download page, and download the version recommended for your system (Linux, macOS, Windows). Run the downloaded installer; once it has finished, Anaconda is ready to use. NB: Always choose the latest version, which is the one with the highest number.

Jupyter Notebook

After installing Python, you will need an Integrated Development Environment (IDE) to begin coding. Try to imagine an IDE as Word or Pages: it helps you build beautiful documents by letting you write text in your language, insert illustrations and format everything, while turning it into something you can use. If you didn't have Word or Pages, how would you make your CV, assignment, or application? An IDE lets you write in a programming language and turns what you write into beautiful applications that you can use. Similar to Word and Pages, IDEs support you in this process with helper functions, like the debugging tool, which helps you correct errors just as Word helps you with spelling and sentence structure errors. They also highlight elements of code (e.g. variables, strings, numbers) for better understanding and offer automatic formatting, which makes sure parentheses are closed and lines are indented. Now that you understand the meaning and importance behind the abstract IDE acronym, let's continue our progress towards getting fully set up for your coding journey.

In our opinion, the best IDE to get started with programming is Jupyter Notebook. Jupyter is also included in the Anaconda Distribution, so if you have followed our instructions to the letter, chances are that you already have Jupyter Notebook on your computer. If not, then start following the instructions to the letter. I hear some of you complain "but I'm not gonna use Python, I'll use JavaScript"; to that my response is firstly, don't, and secondly, you have to have Python installed on your computer to use Jupyter Notebook, so why make life difficult? Just install the Anaconda Distribution and you're set.

How to launch Python with Jupyter Notebook

Once you have Python and an IDE (Jupyter Notebook) installed, you are ready to begin coding! To do so, you need to launch Python and the IDE first. Luckily, you are using Jupyter, which is capable of running the code within the IDE itself. If your IDE were not capable of this, you would need to run your code on the command line or terminal. To launch Jupyter, do the following.

Launching Jupyter Notebooks

  1. Open your Terminal/Command Line
  2. Launch Jupyter Notebook with the command jupyter notebook; the Jupyter Notebook UI will then open in your browser
  3. Open a new file in Jupyter's UI and begin coding!

Once you write your first lines of code in the IDE, and feel ready to try out your program, you can run it in your terminal if you are using an IDE other than Jupyter Notebook, or run it in Jupyter Notebook itself as follows.

Running Your Code

  1. Select the cell with your code
  2. Next, press either the run button or use the shortcut shift + enter. You should find that the latest version of Python you have installed has been started.
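
As a possible first cell to try these two steps on, the following short sketch prints a greeting and reports which Python version is executing the cell. The variable names are only illustrative.

```python
import sys

message = "Hello from my first notebook cell!"
print(message)

# sys.version_info holds the version of the Python interpreter running this cell
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
```

The second printed line depends on your installation; it should match the Python version you installed with Anaconda.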

Now that you have everything set up, you are ready to start experimenting and building stuff. Python will give you the tools necessary to build various applications, while Markdown will help you edit the text that you want to show in your application.

Introduction to Markdown

Markdown serves as the main text formatting language for Jupyter Notebooks. Markdown is sensitive to spacing: for instance, a heading is generally only recognised when the # symbol is followed by a space, and indentation changes how a list is displayed. Do not confuse this with the case sensitivity of the code itself: when a user types mycode and myCode in a code cell, Python recognises them as two different variables, and my code (with a space) is not a valid variable name at all. Markdown is closely related to HTML, as it is designed to be easily converted to HTML. The following is a list of 15 useful Markdown commands and how they appear when the Markdown code is run.

[Figure: some Markdown commands]
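
To give a feel for the syntax, here are a few widely supported Markdown constructs. This is not the tutorial's original list, only an illustrative selection, and the text is held in a Python string purely so that the example stays in one language. In a notebook you would type the lines between the triple quotes directly into a Markdown cell.

```python
# Markdown source kept in an ordinary Python string, purely for illustration.
markdown_source = """# A first-level heading
## A second-level heading

Plain text with *italic* and **bold** words.

- a bullet point
- another bullet point

1. a numbered item
2. another numbered item

[link text](https://example.com) and `inline code`
"""

print(markdown_source)
```

Note the space after each # and after each list marker; that spacing is part of the syntax.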

What is Git and Github?

Git is a distributed version control system which tracks changes made to project files. In layman's terms, it is a system which allows you to track all the changes made to a project. This is useful because it makes it easier to collaborate on projects, since the system tracks the changes made by you and others. The online platform we will use to access this system is called GitHub.

To use Git, users "clone" a copy of the online repository from GitHub onto their own hard drive and work on the files independently. After finalising the changes to the code, they upload their edited version back online. Git is primarily used for source-code management in software development, but it can be used to keep track of changes in any set of files.

Key Terms Used

Key Term | Explanation
Version Control System | A system that tracks changes in files over time and maintains a library of all past versions of those files. These previous versions may be recalled at a later time. A more detailed explanation is provided in Chapter 4.2.
Repository | A folder containing all tracked files as well as the version control history. It can be saved in a local folder on your computer or stored on an online platform (i.e. a remote repository). GitHub is an example of a remote repository.
Snapshot | Changes made while developing a program which may later be committed.
Commit | A snapshot of changes made to the staged files.
Stage | The staging area holds the files to be included in the next commit.
Track | A tracked file is one that is recognized by the Git repository from previous snapshots.
Branching | Having multiple versions of the code simultaneously in a repository, where each branch has its own commit history and current version.
Local | The version of a repository that is stored on your personal computer.
Remote | The version of a repository that is stored on a remote (i.e. online) server.
Clone | Create a local copy of a remote repository on your personal computer.
Fork | Make a copy of another user’s repository on GitHub to your own account.
Merge | To update files by incorporating the changes introduced in new commits.
Pull | To retrieve commits from a remote repository and merge them into a local repository.
Push | To send commits from a local repository to a remote repository.
Pull request | A request sent by one GitHub user to merge the commits in their remote repository into another user’s remote repository.

Step-by-Step Github

Do you have a Github account?

  • YES: then sign in with your login details.
  • NO: then sign up by entering your email address and creating a password.

Github

Now that you have downloaded the desktop version of GitHub, you have a choice between using the desktop interface to work collaboratively on projects and using the terminal directly. Below you will find an introduction to each of these two options.

  • I want to use GitHub Desktop to access GitHub. Then click on GitHub Desktop and watch a video that will explain it much better than we could ever do in text format.
  • I want to use the terminal/command line to access GitHub. The command line is the only place where you can run all Git commands, so you should know how to open the Terminal on Mac or Linux, or the Command Prompt on Windows. The following table lists the most commonly used commands:
Commands | Explanation
git init | Initializes a new Git repository and begins tracking an existing directory. It adds a hidden subfolder within the existing directory that houses the internal data structure required for version control.
git clone | Creates a local copy of a project that already exists remotely. The clone includes all the project’s files, history, and branches.
git add | Stages a change. Git tracks changes to a developer’s codebase, but it’s necessary to stage and take a snapshot of the changes to include them in the project’s history. This command performs staging, the first part of that two-step process. Any changes that are staged will become a part of the next snapshot and a part of the project’s history. Staging and committing separately gives developers complete control over the history of their project without changing how they code and work.
git commit | Saves the snapshot to the project history and completes the change-tracking process. Anything that’s been staged with git add will become a part of the snapshot with git commit.
git status | Shows the status of changes as untracked, modified, or staged.
git branch | Shows the branches being worked on locally.
git checkout | git checkout followed by the name of a branch switches you to that branch.
git merge | Merges lines of development together. This command is typically used to combine changes made on two distinct branches.
git pull | Updates the local line of development with updates from its remote counterpart. Developers use this command if a teammate has made commits to a branch on a remote, and they would like to reflect those changes in their local environment.
git push | Updates the remote repository with any commits made locally to a branch.
git log | Shows the commit history.
git help | Shows help for Git and its commands.

And much more.
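
To see how a few of these commands fit together, the sketch below drives git init, git add, git commit and git log from Python in a temporary folder. You would normally type these commands directly in the terminal; the helper function git, the file name notes.txt and the throwaway identity are assumptions made for this illustration, and the sketch does nothing if Git is not installed.

```python
import os
import shutil
import subprocess
import tempfile


def git(*args, cwd):
    """Run one git command inside the folder cwd and return what it printed."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


log_output = None
if shutil.which("git"):                        # only run when Git is installed
    with tempfile.TemporaryDirectory() as repo:
        git("init", cwd=repo)                  # create an empty repository
        with open(os.path.join(repo, "notes.txt"), "w") as handle:
            handle.write("first draft\n")
        git("add", "notes.txt", cwd=repo)      # stage the new file
        # -c sets a throwaway author identity for this one command only
        git("-c", "user.name=Example", "-c", "user.email=example@example.com",
            "commit", "-m", "Add notes", cwd=repo)
        log_output = git("log", "--oneline", cwd=repo)
        print(log_output)
```

The printed line consists of a short commit identifier, which is different on every run, followed by the commit message.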

Other Resources

When you begin coding, you will inevitably run into challenges. Thankfully, there are many online communities and platforms where coders come together to help each other. Some of these platforms include Stack Overflow or subreddits like r/programming and r/python on Reddit. Additionally, there are several free online resources to help you enhance your coding skills, such as MIT OpenCourseWare, SoloLearn, and Codecademy.


Extended Version:

Introduction to Python

What is Python?

Python is a general-purpose programming language, which means that it can be used for nearly everything. Unlike compiled languages, Python is an interpreted language, which means that the written code is not translated into a computer-readable format ahead of time; instead, the interpreter translates it at runtime. This type of language is also referred to as a "scripting language" because it was initially meant for developing simple projects.

Python is also an object-oriented, high-level programming language with dynamic semantics, which makes it highly attractive for Rapid Application Development, as well as for use as a tool to connect existing components together. Python can also be used to process text, display numbers or images, solve scientific equations, and save data. In essence, it is used behind the scenes to process many elements.

Python was designed for its users to learn syntax easily, hence its emphasis on readability. This reduces the cost of program maintenance as it enables teams to collaborate effectively without significant language and experience barriers. Furthermore, Python supports the use of modules and packages, encouraging program modularity, and code reuse across a diversity of projects. Once a module or package has been developed, it may be scaled for use in other projects. The Python interpreter and the extensive standard library are available in source or binary form, free of charge for all major platforms and may be distributed with ease.

Since its inception, the concept of Python being a "scripting language" has changed considerably. Python is now used to write large, commercial-style applications and not only trivial ones. This reliance on Python has expanded even more with the Internet gaining popularity. Today, many web applications and platforms rely on Python, including Google's search engine, Instagram, and the web-oriented transaction system of the New York Stock Exchange (NYSE). Even NASA utilises Python to program their equipment and space machinery.

Advantages and how they help you

  • Python is designed with simplicity in mind: It is not a coincidence that there are so many beginner-level programming courses using Python. Its syntax is very readable, even if you are not a machine. It uses human language elements such as not, and, or to signify exactly what you would expect. Its use of indentation instead of brackets also makes it less prone to frustrating errors that simply come up because you used the wrong type of bracket in the wrong place.
  • It is an interpreted, high-level language: Python is in the category of interpreted languages, which means you do not need to compile (i.e. "translate") your code before running it; the interpreter will take care of that on the fly. This means if you only want to write a few lines of code to get a job done, Python is the language for you. As your code will be translated (interpreted) line by line, finding and fixing errors in your code is far easier than with compiled languages. At the same time, Python is high-level, meaning fairly far away from machine code (ones and zeros), in order to be easier for humans to understand.
  • It comes with a large number of libraries: Libraries are one of the central parts of Python that will allow you to build a vast array of applications quickly. They are essentially bundles of code that someone else has written in the past and has decided to share with other programmers. For you, this means you do not have to code everything from scratch but you can leverage existing solutions to get you where you want to be, quickly. To provide an example: If you would like to develop an automated trading strategy using Python, there are libraries you can use free of charge to easily backtest it, such as the Backtesting library. By integrating this library into your code, you will be able to use code that someone else wrote to see how your investment strategy would have performed in the past, a standard way to check the validity of investment strategies. This way, instead of painstakingly coding everything yourself, you might be making money already!
  • It is object-oriented and functional: These are two of the main programming paradigms that dictate how you write code. In a nutshell, object orientation means data structures in your code will be modeled to resemble real-world concepts. If you are writing a timetable app that shows which members of your team have appointments in a day or week, it might make sense to create a so-called class for team members which contains the same type of information for every member. For each member, an instance of this class would be created to save their name, email address and appointments. All of these instances are objects.
    Functional programming is a different paradigm that focuses on creating, applying and interlinking functions. Similar to a mathematical function, a function in coding can take inputs and return outputs. In functional programming, functions can return other functions and act as data structures.
    These concepts might sound abstract to you now, so the most important point to take away at this time is that you most likely will not be restricted by programming paradigms when using Python.
  • It is portable (including mobile applications): Portability means that you can write code on any operating system and it will work exactly as well on any other operating system (with some minor limitations, of course).
  • It is an open-source language: As you might already know, Python is free to use even if you plan to sell programs you have written in it. So if you think you have a brilliant solution you would like to sell, you are free to do so without paying licensing fees.
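
To make the ideas of readable syntax, classes and functions a little more concrete, here is a minimal sketch built around the timetable example above. The class name TeamMember, the function make_greeter and all the data are made up for illustration; a real timetable app would of course look different.

```python
class TeamMember:
    """One team member in a hypothetical timetable app."""

    def __init__(self, name, email, appointments=None):
        self.name = name
        self.email = email
        self.appointments = appointments or []

    def is_free(self):
        # 'not' reads the way it would in an English sentence
        return not self.appointments


def make_greeter(greeting):
    """Return a new function: functions can produce other functions."""
    def greet(member):
        return f"{greeting}, {member.name}!"
    return greet


# Two instances (objects) of the same class, each with its own data
anna = TeamMember("Anna", "anna@example.com", ["Mon 10:00 stand-up"])
ben = TeamMember("Ben", "ben@example.com")

say_hello = make_greeter("Hello")
free_members = [m.name for m in (anna, ben) if m.is_free()]

print(say_hello(anna))
print(free_members)
```

Note that the blocks are marked by indentation alone, and that the object-oriented part (the class) and the functional part (a function returning a function) sit side by side in the same program.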

Disadvantages

  • It is limited in execution speed: Simplicity comes at a price. While we mentioned above that the high-level, interpreted nature of Python helps users understand it more intuitively, the same cannot be said for machines. Compared to other languages, Python rides on more layers of "translation" which slows it down. However, do not fret as this will most likely not impact you (unless you are really into things like calculating millions of digits of pi).
  • It has some design restrictions: Again, it is the simplicity of Python that forces us to make some trade-offs. The use of so-called dynamic typing is an example of that. Dynamic typing means you do not have to declare what type a variable has (e.g. whether it is text, called a string, like "hello world", or a number, such as the integer 42); Python will figure it out by the time your code runs. While this makes your life much easier by removing another thing to think about, it can lead to errors in the code which you will only discover, and have to deal with, when you run it.
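
A small sketch of what dynamic typing looks like in practice, including the kind of error that only shows up when the code runs. The variable and function names are chosen just for this illustration.

```python
value = 42                          # no type declaration needed
first_type = type(value).__name__
value = "hello world"               # the same name may later hold a string
second_type = type(value).__name__
print(first_type, second_type)


def double(x):
    """Works for any type that supports multiplication by 2."""
    return x * 2


print(double(21))      # a number is doubled
print(double("ab"))    # a string is repeated

# Mixing types by accident is only noticed once the line is executed.
try:
    result = "Total: " + 42
except TypeError as error:
    caught = type(error).__name__
    print("Only found at run time:", caught)
```

In a language with declared types, the last mistake would typically be reported before the program starts; in Python it surfaces as a TypeError while the program is running.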

Comparison to other languages

It is important to consider that Python may not be the ideal language to use in all situations. Although it is one of the most versatile languages, other languages offer features that address certain types of problems better.

Java

Python generally runs slower than Java. However, thanks to the language's built-in high-level data types and dynamic typing, Python programs take much less time to develop. Typically, Python programs are 3-5 times shorter than equivalent Java programs.

JavaScript

JavaScript and Python are similar in terms of their "object-based" subset. Also, both languages support a programming style that uses simple functions and variables without engaging in class definitions.

Beyond that, the two languages diverge. Python has a larger capacity to support writing much larger programs and offers superior code reuse via its true object-oriented programming style, where inheritance and classes play a vital role.

C++

C++ originated from the C language and is a compiled language. It is similar to Java in terms of running speed. However, C++ code tends to be 5-10 times longer than the equivalent Python code.

How to install Python

When installing Python, you can download the latest version of Python independently or as part of a distribution like Anaconda. The benefits of installing a Python distribution like Anaconda include a reduced risk of messing up the required system libraries and access to a wide variety of pre-installed open-source packages. Both options are listed below.

Installing Anaconda

Open your browser and follow this link to the Anaconda download page and download the version recommended for your system (Linux, macOS, Windows). Run the downloaded installer; once it has finished, Anaconda is ready to use.

Installing Python

Python can be obtained from the Python Software Foundation website at python.org. You will need to download the relevant installer for your operating system and run it on your machine.

Windows

  1. Go to the Windows section of the Python Software Foundation website here and on the left side of the page under "Stable Releases" go to the latest version (as of now 3.9.2) and click on either "Windows installer (32-bit)" or "Windows installer (64-bit)" depending on whether you have a 32-bit or 64-bit system. If you do not know which architecture your system has, right-click the Windows symbol in your task bar and go to "System". In the section "Device specifications" check the entry next to "System type".
  2. Once the installer has been downloaded, open it by double clicking. Click on "Install now" in the window that will appear.

[Figure: Python installer image Windows]

MacOS

While older versions of macOS come with Python 2.7 pre-installed and you could just use that, we recommend you install the newest version of Python as follows:

  1. Go to the Mac OS X (sic!) section of the Python Software Foundation website here and on the left side of the page under "Stable Releases" go to the latest version (as of now 3.9.2) and click on either "macOS 64-bit Intel installer" or "macOS 64-bit universal2 installer" depending on whether your machine has an Intel CPU or an Apple Silicon CPU. If you do not know which of the two your system has, click the Apple symbol in the top-left corner of your screen and go to "About this Mac". In the section "Processor" it will tell you what is inside.
  2. Once the installer has been downloaded, open it by double clicking. Click through the installer by hitting "Continue", accept the license and hit "Install".

[Figure: Python installer image macOS]

How to launch Python

Once you have Python and an IDE installed, you are ready to begin coding! To do so, you would need to launch Python and the IDE first. As mentioned before, some IDEs are capable of running the code within the IDE itself. If your IDE is not capable of this, you would need to run your code on the command line or terminal.

Windows

  1. Open Command Prompt
  2. Execute the following command: py filename.py

You should find that the latest version of Python you have installed has been started.

MacOS

  1. Open a Terminal window
  2. Execute the following command: python3 filename.py (with an Anaconda installation, python filename.py works as well)

You should find that the latest version of Python you have installed has been started.
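
If you do not have a script at hand yet, the following minimal sketch could serve as the contents of filename.py. It reports which Python installation is actually running the file, which is a handy check when several versions are installed. The function name describe_interpreter is only an illustration.

```python
import platform
import sys


def describe_interpreter() -> str:
    """Return a one-line description of the Python that is running this file."""
    return f"Python {platform.python_version()} at {sys.executable}"


if __name__ == "__main__":
    # This part only runs when the file is started as a script,
    # for example from the Command Prompt or the Terminal.
    print(describe_interpreter())
```

The printed version number and path depend on your machine.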

How and where to look for help?

As you start writing your first lines of code, you will inevitably make mistakes. This, however, is totally normal and happens to everyone all the time, even to the most experienced programmers. Even if those cryptic-looking, frightening red error messages seem annoying and unsatisfying, it is of the utmost importance to know how to deal with them. Therefore, try to look at your mistakes as the best way to improve your skills and your Python knowledge, and most importantly: don’t panic! That being said, we will now focus on how to interpret an error message and where to look for help efficiently.

How to read an error message

To illustrate how to read and deal with an error message, let's look at an example. Consider the following simple code:

In [1]:
alpha = 10
beta = 20
gamma = alpha + beta

print(Gamma)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-5f651ca246b5> in <module>
      3 gamma = alpha + beta
      4 
----> 5 print(Gamma)

NameError: name 'Gamma' is not defined

In this example, we assigned the integer 10 to a variable called alpha and the integer 20 to another variable called beta. Then we assigned the result of the sum of alpha and beta to a variable called gamma. To show the result of the simple calculation, we used the built-in print function: print(Gamma).

As you can see, an error occurs. Let us show step by step how to deal with this specific error message.

[Figure: Name Error]

We recommend that you start analyzing the error message from bottom to top, as the black arrow suggests. This is due to the following reason (inspired by https://realpython.com/python-traceback/):

What appears first is the name of the error, highlighted yellow in the upper left corner. This gives you a first impression of what went wrong: in this particular case we are obviously dealing with a NameError. However, the second yellow box at the bottom of the code contains exactly the same information, but in more detail. It further tells us that we forgot to define a variable called Gamma. As can be seen, the latter contains additional information, which suggests that we read this one first.

Next we consider the green area, which the arrow points to. This way, the traceback automatically locates the error in the code. In this case, the NameError appeared in the fifth line.

The rest of the error message gives further information on the file name, module name etc. It simply specifies where to find the code, but this part of the error message is rather negligible for our purpose.

With the given information, we can conclude that the error occurred because we tried to print out a variable called Gamma, which isn't recognised by Python (because Python is a case-sensitive programming language, i.e., it clearly differentiates between lowercase and uppercase letters, as you will learn later on in another tutorial).
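
One way to convince yourself of this diagnosis is the following sketch, which first reproduces the mistake in a controlled way (so that the program keeps running) and then prints the variable under its correct, lowercase name. The try/except construct used here is just one possible way to look at the message and will be new to you at this point.

```python
alpha = 10
beta = 20
gamma = alpha + beta

# Reproduce the mistake in a controlled way to look at the message itself.
try:
    print(Gamma)          # capital G: this name was never defined
except NameError as error:
    message = str(error)
    print("Caught:", message)

print(gamma)              # lowercase g matches the assignment above
```

The fix itself is the last line: the name in print must be spelled exactly like the name that was assigned.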

A more general way to analyze an error, detached from a specific example

Since the above example was rather easy, you might ask yourself how to interpret and use such a traceback in another setting, i.e., for another, more complex code. This clearly motivates the following outcome-oriented approach:

  1. Make sure you understand which error occurred (e.g. a NameError)
  2. Follow the traceback message from bottom to top to find out in which line the error occurred (e.g. in line 5)
  3. Ask yourself what the intended code should do
  4. Try to understand exactly what the code did instead
  5. Come up with solutions to fix the problem
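
The first two steps can even be done programmatically. The sketch below uses the traceback module from Python's standard library to extract the error name and the place where it happened; the function average and its deliberate bug are made up for this illustration.

```python
import traceback


def average(values):
    return sum(values) / len(values)   # fails when values is empty


try:
    average([])
except Exception as error:
    # Step 1: which error occurred?
    error_name = type(error).__name__
    # Step 2: the last traceback entry is the place where the error was raised
    last_frame = traceback.extract_tb(error.__traceback__)[-1]
    print(error_name, "-", error)
    print("in function", last_frame.name, "at line", last_frame.lineno)
```

Steps 3 to 5 remain a task for you: the code was meant to compute an average, it divided by zero instead because the list was empty, and a possible fix would be to check for an empty list before dividing.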

The first four steps became clear with the NameError example above. However, the fifth and maybe most important step, fixing your code to make it work eventually, may cause various difficulties. That's why knowing where to look for help is an absolutely crucial part of programming. In this respect, we may consider ourselves lucky to be programming in Python, since it has a huge community willing to help you. Most problems which you will be confronted with will very likely already have been asked and answered by another programmer.

Arguably the most popular forum is "Stack Overflow", which is a great place to look for answers. Just try googling "How to do XYZ in Python" and you will most likely be directed to a Stack Overflow post. One of the many useful features of Google Colab is that you can search for an answer to your own error message in the blink of an eye: just click on the button "Search Stack Overflow" at the end of a traceback (check the screenshot above).

It is essential that you try not to ignore error messages, since this is the only way to really improve in coding and to get a broader understanding of how the computer reacts to certain inputs. Additionally, even though looking for help may seem to slow you down in the beginning, it is definitely the most time-efficient way to deal with a problem while coding: as a beginner, you probably won't find your mistake without some external help.

Integrated Development Environments

The next important step in preparing your machine to begin coding is to download an integrated development environment (IDE). IDEs serve as text editors, dedicated to creating an easier environment to design, write, organize and share your code. When choosing which IDE is right for you it is important to consider the following features:

  • Language Compatibility: Some IDEs are better suited for certain languages. However, most IDEs are capable of understanding numerous languages.
  • Debugging Tools: Good editors have debugging tools to help find and correct errors in your code.
  • Syntax Highlighting: IDEs generally highlight keywords, symbols and variables, which facilitates understanding of your code.
  • Automatic Formatting: The majority of IDEs are capable of correctly indenting lines and automatically closing parentheses based on the chosen language.
  • Ability to Run Code from within the IDE
  • Ability to Share Code on GitHub: Many IDEs include a simplified function for sharing code on GitHub and provide easy access to other repositories.

Spyder

Spyder is an open-source IDE included in the Anaconda Python Distribution. Spyder primarily targets data scientists and is specifically designed for Python use.

[Figure: Spyder IDE]

Visual Studio Code

Visual Studio Code is a text editor built on Electron. It is a lightweight IDE which can be configured to work on almost any task and is compatible with almost every language. It is also highly integrated with Git and GitHub.

[Figure: VS Code IDE]

Atom

Similar to Visual Studio Code, Atom is a sleek editor built on Electron which was originally intended for application design. The IDE was created by GitHub and has a large community around it. It supports JavaScript, HTML, and CSS. However, Python must be run through an extension. One major advantage of Atom is its ability to incorporate packages from GitHub such as Hydrogen. Hydrogen enables its users to run selected snippets of code within the editor itself, as is seen in the GIF below.

[Figure: Atom IDE]

Repl.it

Unlike the aforementioned IDEs, Repl.it is an online IDE where Python and JavaScript are the primary languages used. It is not as capable as most IDEs, but it still allows for basic coding on multiple devices, as it enables users to log into their account anywhere.

[Figure: Repl.it]

PyCharm

PyCharm is one of the most popular and powerful IDEs out there. It includes rich features such as code analysis, machine learning enhanced code completion, Git integration, web development tools (not included in the free version), and more. If you are planning on developing large scale projects with Python, PyCharm might be the IDE for you.

Jupyter Notebook

Jupyter is also included in the Anaconda Distribution and is a versatile tool for programming. Unlike the others, Jupyter is not a conventional IDE, as users are able to document their work in it. However, Jupyter is not recommended as the only tool for writing complicated or extensive code. Jupyter Notebook is covered in more detail in the subsequent chapters.

Jupyter Notebook IDE

Introduction to Jupyter

Jupyter

Jupyter is a platform which enables its users to present their work as readable text while sharing the original code alongside it. In recent years, Jupyter Notebook has become increasingly popular in the scientific community due to its efficacy in combining scientific results with interactive code in a plethora of programming languages. Jupyter can also be used independently as an IDE to develop, create, and run your code, making it an incredibly useful and versatile tool.

Jupyter Intro

Installing Jupyter

Jupyter comes preinstalled with the Anaconda Distribution used for Python. However, if you have already downloaded Python independently of Anaconda, there are alternative ways to install Jupyter. You first have to check that you have the latest version of pip installed. You may do so by executing the command pip install --upgrade pip. If you find that the latest version of pip is not installed on your device, please follow the instructions on this link. Next, use the command pip install jupyter to install Jupyter Notebook on your device.
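
Before installing anything, it can help to check what is already available. The snippet below is a minimal sketch rather than part of the official instructions: the helper name is_installed is hypothetical, and it assumes that the notebook package is what pip install jupyter brings in. It also builds, without running them, the two pip commands mentioned above, using python -m pip so that the pip belonging to the current interpreter is used.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can find the given module."""
    return importlib.util.find_spec(module_name) is not None


# The commands from the text as argument lists (built here, not executed).
upgrade_pip = [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]
install_jupyter = [sys.executable, "-m", "pip", "install", "jupyter"]

for name in ("pip", "notebook"):
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")

# To actually run one of the commands, one option is:
#   import subprocess
#   subprocess.run(install_jupyter, check=True)
```

If notebook is reported as not found, running the install command in the terminal as described above is the next step.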

Launching Jupyter

For Windows users, launching the Jupyter Notebook can be done easily through the Anaconda application, which may be accessed via the start menu. OS X and Linux users first have to open the terminal window or command line. Navigate to the folder containing the files you want to open in the Jupyter Notebook (using the commands from the introduction), then enter the command: jupyter notebook. This will open the selected folder in Jupyter Notebook's web application in your browser. From this point you can either open your existing .ipynb files or create a new notebook, which will look similar to the figure below.
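
To see which notebooks a folder contains before launching Jupyter from it, a short Python sketch such as the following can be used. The function name find_notebooks is hypothetical, and the sketch only looks at files directly inside the given folder, not in subfolders.

```python
from pathlib import Path


def find_notebooks(folder):
    """Return the sorted names of all .ipynb files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.ipynb"))


# "." stands for the current working directory, i.e. the folder you navigated to.
print(find_notebooks("."))
```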

Jupyter Menu

Once you have created a new notebook (or opened an existing one), you can begin by finding the Help tab in the menu bar and selecting the user interface tour, which takes you through an overview of the features of Jupyter's user interface (UI). One important feature is the cell. Cells are containers for text or code to be displayed or executed by the notebook's kernel.

Cell creation example

Markdown cells are used for writing text, creating tables and inserting images. These cells are written in Markdown. A brief introduction to Markdown is given in the subsequent section.

Code cells are similar to an IDE, in that you can use them to create and run your code in the notebook. To do so, write the code in a cell like the example below. To run it, select the desired cell and either use the shortcut shift + enter or hit the run button in the cell tab at the top of the page. Try running the Python code in the cell below!

In [1]:
print("Hello World!")
Hello World!
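
Under the hood, a notebook file (.ipynb) is a JSON document that lists its cells. The following is a minimal sketch of that structure with one Markdown cell and one code cell; the cell contents are made up for illustration, and real notebooks usually carry more metadata than shown here.

```python
import json

# A minimal notebook in the version 4 format: a list of cells plus metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "Some *formatted* text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("Hello World!")'],
        },
    ],
}

# Serialise to the text that would be stored in a .ipynb file.
notebook_text = json.dumps(notebook, indent=1)

# Read it back and show the type and content of each cell.
for cell in json.loads(notebook_text)["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```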

For a more extensive tutorial of the Jupyter Notebook, follow this link.

Git and Github Extended

Github

GitHub is a collaborative code hosting site built on top of the git distributed version control system (DVCS) (refer to Chapter 4.2 for a more detailed explanation of DVCS). GitHub relies on a “fork & pull” model in which developers create their own copy of a repository, make their changes there, and then submit those changes via a pull request. With the pull request, developers ask the project maintainer to pull their changes into the main branch.

Git GitHub Diagram

In addition to code hosting, collaborative code reviewing, and integrated issue tracking, GitHub has integrated social features as well. Users are able to subscribe to information by “watching” projects and “following” users. Users can also award stars to repositories belonging to other users, which essentially has the same effect as "liking" a post on Facebook. Users also have profiles that can be populated with their personal information and that show their recent activity on the site. With over 57 million repositories hosted, GitHub is currently the largest code hosting site in the world.

GitHub Scroll

Version Control Systems

People might need to collaborate with developers on other systems. Version control systems are one way to do it. Version control systems record changes to a file over time so that it is possible to recall specific versions later. In other words, version control systems allow one to revert selected files or even the entire project back to a previous state. Version control systems also allow one to compare changes over time and to see who last modified something, who introduced an issue and when. We typically make a distinction between centralised and distributed version control systems.
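
The core idea of recording versions, reverting, and seeing who changed what can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how any real version control system is implemented; the class names Version and VersionedFile and the author names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    author: str
    message: str
    content: str


@dataclass
class VersionedFile:
    """Toy model of a single file under version control."""
    history: list = field(default_factory=list)

    def commit(self, author, message, content):
        """Record a new version of the file."""
        self.history.append(Version(author, message, content))

    def current(self):
        return self.history[-1].content

    def revert_to(self, index, author):
        """Restore an earlier version by recording it as a new version."""
        old = self.history[index]
        self.commit(author, f"Revert to version {index}", old.content)

    def last_modified_by(self):
        return self.history[-1].author


doc = VersionedFile()
doc.commit("alice", "First draft", "Hello")
doc.commit("bob", "Add greeting target", "Hello World")
doc.revert_to(0, author="alice")

print(doc.current())            # content of version 0 again
print(doc.last_modified_by())   # who made the most recent change
for number, version in enumerate(doc.history):
    print(number, version.author, version.message)
```

Note that reverting does not erase anything: the earlier state is simply recorded again as the newest version, so the full history stays available.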

Centralised Version Control System (CVCS)

In centralized version control systems, each user typically gets his or her own working copy, but there is only one central repository, often located on a remote server. As soon as one developer commits, the other developers can update their working copies and see the changes. To check who made a change and what the change was, users need to update from the central server after a commit has been made. The central server contains all the versioned files, and a number of developers check out files from that central place.

CVCS

However, there are downsides to this. Firstly, the most obvious problem is the single point of failure that the centralized server represents. If the server goes down, nobody can collaborate or save versioned changes until it is back up. Secondly, another major issue is the hard disk. If the entire history is saved in one single place, one risks losing everything if that system fails or crashes.

Distributed Version Control Systems (DVCS)

DVCS

In distributed version control systems, users get their own repository and working copy. Unlike a centralized version control system, where working on a single server presents a major risk for the project, a distributed version control system stores the full history of the files in each user's local repository. Thus, if any server fails, any of the local repositories can be copied back to the server to restore it.

To share a modification with others, a few steps are needed. First, you make a commit in your local repository. At this stage, others still have no access to your changes; they become available only once you push them to the central repository. Likewise, you do not receive others' changes until you have pulled those changes into your own repository. Since the system is distributed, i.e. each developer gets their own local repository, nearly every operation can be done offline and very quickly. This means that you can do commits, branches, merges, file annotation, etc. entirely offline and generally instantly.
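
The commit, push and pull steps can be illustrated with a toy model. This is only a sketch: the Repository class and the names alice, bob and central are hypothetical, commits are reduced to plain messages, and conflicts are ignored entirely.

```python
class Repository:
    """Toy repository: just an ordered list of commit messages."""

    def __init__(self, name):
        self.name = name
        self.commits = []

    def commit(self, message):
        # Purely local: no other repository is contacted.
        self.commits.append(message)

    def push(self, remote):
        # Send the commits that the remote does not have yet.
        for c in self.commits:
            if c not in remote.commits:
                remote.commits.append(c)

    def pull(self, remote):
        # Fetch the commits that this repository does not have yet.
        for c in remote.commits:
            if c not in self.commits:
                self.commits.append(c)


central = Repository("central")
alice = Repository("alice")
bob = Repository("bob")

alice.commit("Add README")   # commit locally (works offline)
print(bob.commits)           # [] because Bob cannot see the change yet
alice.push(central)          # publish to the shared repository
bob.pull(central)            # Bob fetches the new commits
print(bob.commits)           # ['Add README']
```

Every repository in this model holds the complete list of commits it knows about, which is why any one of them could be used to restore the central one.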

Advantages and Disadvantages of CVCS and DVCS

Feature | DVCS | CVCS
Users can work productively when not connected to a network | Yes | No
Common operations such as commits, reverting changes, etc. are faster | Yes | No
Users can keep local changes that they do not want to publish | Yes | No
Initial checkout of a repository is slower (since all branches have to be copied in each local repository) | Yes | No
Additional storage required for every user to have a complete copy of the codebase history | Yes | No
Working copies are effectively remote backups | Yes | No
Various development models can be used | Yes | No

Installing Git

Installing Git is not complicated. Go to the Git homepage and look for the "Download" section. You then have to select the operating system you are working on.

Git Basics

We previously introduced Git as a distributed version control system. This means that it allows users to efficiently collaborate on a project. It is also able to perform most actions extremely fast, as Git usually only needs to access the local hard drive. With all this information, you may still wonder what goals the Git developers had in mind when creating this tool. We list 5 important ones:

  • Speed.
  • Simple design.
  • Strong support for non-linear development.
  • Fully distributed.
  • Able to handle large projects efficiently.

Speed

Having a tool that can quickly take account of modifications to files makes collaboration easier. Compared to other systems, Git is often praised for its speed. The major difference between Git and most other VCSs is the way Git thinks about data. Most systems store data as a set of files and the changes made to each file over time. Git, however, thinks about its data as a series of snapshots. When a file has not changed between two commits, Git does not store the file again but simply refers to the identical file it has already stored. To see what has changed, Git compares the old version of a file with the updated one on your own computer; the difference between the two is the change. Because of this, Git does not have to ask a remote server to do the work, which drastically increases its speed.
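
The snapshot idea can be sketched as a small content-addressed store. This is a toy model, not Git's actual storage format: the class name SnapshotStore and the file names are hypothetical. Each commit records which content every file has, and content that is already stored is referenced again instead of being stored a second time.

```python
import hashlib


class SnapshotStore:
    def __init__(self):
        self.blobs = {}       # content hash -> file content
        self.snapshots = []   # each snapshot: {file name: content hash}

    def commit(self, files):
        """Record a snapshot of all files; store only content not seen before."""
        snapshot = {}
        for name, content in files.items():
            key = hashlib.sha1(content.encode("utf-8")).hexdigest()
            self.blobs.setdefault(key, content)
            snapshot[name] = key
        self.snapshots.append(snapshot)

    def read(self, snapshot_index, name):
        """Return the content a file had in the given snapshot."""
        return self.blobs[self.snapshots[snapshot_index][name]]


store = SnapshotStore()
store.commit({"a.txt": "alpha", "b.txt": "beta"})
store.commit({"a.txt": "alpha", "b.txt": "beta v2"})   # only b.txt changed

print(len(store.snapshots), "snapshots,", len(store.blobs), "stored contents")
print(store.read(0, "b.txt"), "->", store.read(1, "b.txt"))
```

Two snapshots of two files would naively need four stored contents; here only three are kept, because a.txt is identical in both snapshots.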

Simple design

For many beginners, Git is difficult to grasp. And in fact it is, especially if you are a Windows user, since Git provides its best support for Linux, then Mac. You will have to learn and understand a lot of new notions and definitions, and a basic knowledge of Git functions is required. The command syntax is also complex and sometimes uses unusual names. However, once you have mastered it, you should realise that Git is quite a user-friendly program that allows you to structure your project efficiently.

Strong support for non-linear development (branches)

A central feature of Git is branching. In Git, you can create a new local branch for everything you work on. The new local branch is a minor branch that is connected to the mainline, aka the master branch. For each feature, idea or bugfix, you can easily create a new branch, do a few commits on that branch and then merge it into your master branch or throw it away. You don’t have to mess up the master branch just to save or test your experimental ideas.
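
The branching workflow described here can be mimicked with a toy model. This is a sketch under strong simplifications: the class BranchingRepo is hypothetical, a branch is just a list of commit messages, and merging simply appends missing commits without handling conflicts.

```python
class BranchingRepo:
    def __init__(self):
        self.branches = {"master": []}
        self.current = "master"

    def create_branch(self, name):
        # A new branch starts as a copy of the current branch's history.
        self.branches[name] = list(self.branches[self.current])

    def checkout(self, name):
        self.current = name

    def commit(self, message):
        self.branches[self.current].append(message)

    def merge(self, source):
        # Simplified merge: append the commits the current branch lacks.
        target = self.branches[self.current]
        for c in self.branches[source]:
            if c not in target:
                target.append(c)

    def delete_branch(self, name):
        del self.branches[name]


repo = BranchingRepo()
repo.commit("Initial commit")

repo.create_branch("bugfix")
repo.checkout("bugfix")
repo.commit("Fix typo")
print(repo.branches["master"])   # ['Initial commit'], master is untouched

repo.checkout("master")
repo.merge("bugfix")
print(repo.branches["master"])   # ['Initial commit', 'Fix typo']

repo.create_branch("experiment")
repo.checkout("experiment")
repo.commit("Try a risky idea")
repo.checkout("master")
repo.delete_branch("experiment")  # throw the experiment away
print(sorted(repo.branches))      # ['bugfix', 'master']
```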

Fully distributed

In this context, fully distributed means that every developer has their own repository containing the entire commit history of the project. This is a central property of distributed version control systems.

Able to handle large projects efficiently

Git has some extensive functions to deal with large repositories with a very long history. Two solutions to deal with large repositories are presented by the Atlassian blog:

  • The git shallow clone: Instead of loading the whole history of the repository, we decide to pull down the latest n commits of the history.
  • The git filter: The command allows you to reduce branch complexity by deleting or modifying some branches in the tree structure.

Git Characteristics

Git Has Integrity

It’s impossible to change the contents of any file or directory without Git detecting it. If a file is lost, if information in a file is lost, if a file gets corrupted or if any other change has happened, Git is able to detect it. This is due to the fact that every piece of information in Git has a corresponding hash value.
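
How a hash value reveals changes can be shown with Python's hashlib. The sketch below computes the identifier Git gives to file content stored as a "blob" object, assuming Git's default SHA-1 object format; the function name git_blob_hash and the sample contents are made up for illustration.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """SHA-1 that Git assigns to file content stored as a blob object."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


original = b"Hello World!\n"
tampered = b"Hello World?\n"

print(git_blob_hash(original))
print(git_blob_hash(tampered))

# Even a one-character change yields a different hash, so the change is detected.
print(git_blob_hash(original) == git_blob_hash(tampered))   # False
```

Because the hash is computed from the content itself, Git can recompute it at any time and compare it with the stored value to notice corruption.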

Git Generally Only Adds Data

When you do actions in Git, nearly all of them only add data to the Git database. After you commit a snapshot into Git, it is very difficult to lose the information.

The Three States

You will occasionally hear about "the three states" in Git. This simply refers to the possible states of a file, i.e. committed, modified or staged.
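
One way to picture the three states is as a small state machine. The sketch below is a simplification: the names FileState, TRANSITIONS and next_state are hypothetical, "add" and "commit" stand for the git add and git commit commands, and special cases (such as editing a file that is already staged) are left out.

```python
from enum import Enum


class FileState(Enum):
    MODIFIED = "modified"     # changed, but not yet marked for the next commit
    STAGED = "staged"         # marked to go into the next commit
    COMMITTED = "committed"   # safely stored in the local repository


# Allowed transitions and the action that typically triggers each one.
TRANSITIONS = {
    (FileState.COMMITTED, "edit"): FileState.MODIFIED,
    (FileState.MODIFIED, "add"): FileState.STAGED,
    (FileState.STAGED, "commit"): FileState.COMMITTED,
}


def next_state(state, action):
    """Return the state a file moves to, or raise if the action does not apply."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a {state.value} file") from None


state = FileState.COMMITTED
for action in ("edit", "add", "commit"):
    state = next_state(state, action)
    print(action, "->", state.value)
```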

Step-by-Step Github

Create a Repository

A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.

Your hello world repository can be a place where you store ideas, resources, or even share and discuss things with others.

To create a new repository:

  1. In the upper right corner, next to your avatar or identicon, click + and then select New repository.
  2. Name your repository hello-world.
  3. Write a short description.
  4. Select Initialize this repository with a README.
  5. Click Create repository.

New Repository

Create a Branch/fork

Branching is the way to work on different versions of a repository at one time.

By default your repository has one branch named master which is considered to be the definitive branch. We use branches to experiment and make edits before committing them to master.

When you create a branch off the master branch, you’re making a copy, or snapshot, of master as it was at that point in time. If someone else made changes to the master branch while you were working on your branch, you could pull in those updates.

This diagram shows:

  • The master branch
  • A new branch called feature (because we’re doing ‘feature work’ on this branch)
  • The journey that feature takes before it’s merged into master

Branch

Have you ever saved different versions of a file? For instance:

  • story.txt
  • story-joe-edit.txt
  • story-joe-edit-reviewed.txt

Branches accomplish similar goals in GitHub repositories.

During a programming course like this one, you can use branches to keep bug fixes (improving/repairing code) and feature work (building new functions for an application) separate from your master branch (production, which contains all accepted side branches/forks). When a change is ready, you merge your branch into master.

To create a new branch

  1. Go to your new repository hello-world.
  2. Click the drop down at the top of the file list that says branch: master.
  3. Type a branch name, readme-edits, into the new branch text box.
  4. Select the blue Create branch box or hit “Enter” on your keyboard.

Create Branch

Now you have two branches, master and readme-edits. They look exactly the same, but not for long! Next we’ll add our changes to the new branch.

Make and commit changes

Bravo! Now, you’re on the code view for your readme-edits branch, which is a copy of master. Let’s make some edits.

On GitHub, saved changes are called commits. Each commit has an associated commit message, which is a description explaining why a particular change was made. Commit messages capture the history of your changes, so other contributors can understand what you’ve done and why.

Make and commit changes

  1. Click the README.md file.
  2. Click the pencil icon in the upper right corner of the file view to edit.
  3. In the editor, write a bit about yourself.
  4. Write a commit message that describes your changes.
  5. Click the Commit changes button.

Make and commit

These changes will be made to just the README file on your readme-edits branch, so now this branch contains content that’s different from master.

Open a Pull Request

Nice edits! Now that you have changes in a branch off of master, you can open a pull request.

Pull Requests are the heart of collaboration on GitHub. When you open a pull request, you are requesting that the original author review your proposed changes, pull in your contribution and merge it into their branch. Pull requests show the differences between the content of both branches. The changes, additions, and subtractions are shown in green and red. As soon as you make a commit, you can open a pull request and start a discussion, even before the code is finished.
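
The kind of comparison a pull request displays can be reproduced locally with Python's difflib module. This is only an illustration with made-up README contents; GitHub's own diff view is computed by GitHub, not by this code. Lines starting with + are additions (shown in green on GitHub) and lines starting with - are subtractions (shown in red).

```python
import difflib

# Hypothetical README contents on the two branches.
master_readme = ["# hello-world", "My first repository."]
branch_readme = ["# hello-world", "My first repository on GitHub.", "I like coffee."]

diff = list(difflib.unified_diff(
    master_readme, branch_readme,
    fromfile="master", tofile="readme-edits", lineterm="",
))
for line in diff:
    print(line)

# Count changed lines, skipping the "+++" and "---" file header lines.
additions = [l for l in diff if l.startswith("+") and not l.startswith("+++")]
deletions = [l for l in diff if l.startswith("-") and not l.startswith("---")]
print(len(additions), "additions,", len(deletions), "deletions")
```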

By using GitHub’s @mention system in your pull request message, you can ask for feedback from specific people or teams, whether they’re down the hall or 10 time zones away.

You can even open pull requests in your own repository and merge them yourself. It’s a great way to learn the GitHub flow before working on larger projects.

Open a Pull Request for changes to the README

Create Pull