#!/usr/bin/env python
# coding: utf-8

# # Revision control software

# J.R. Johansson (jrjohansson at gmail.com)
# 
# The latest version of this [IPython notebook](http://ipython.org/notebook.html) lecture is available at [http://github.com/jrjohansson/scientific-python-lectures](http://github.com/jrjohansson/scientific-python-lectures).
# 
# The other notebooks in this lecture series are indexed at [http://jrjohansson.github.io](http://jrjohansson.github.io).

# In[13]:

from IPython.display import Image

# In any software development, one of the most important tools is revision control software (RCS).
# 
# RCS is used in virtually all software development, in all environments, by everyone and everywhere (no kidding!)
# 
# RCS can be used on almost any digital content, so it is not restricted to software development: it is also very useful for manuscript files, figures, data and notebooks!
# 
# 
# ## There are two main purposes of RCS systems:
# 
# 1. Keep track of changes in the source code.
#     * Allow reverting back to an older revision if something goes wrong.
#     * Work on several "branches" of the software concurrently.
#     * Tag revisions to keep track of which version of the software was used for what (for example, "release-1.0", "paper-A-final", ...)
# 2. Make it possible for several people to collaboratively work on the same code base simultaneously.
#     * Allow many authors to make changes to the code.
#     * Clearly communicate and visualize changes in the code base to everyone involved.

# ## Basic principles and terminology for RCS systems

# In an RCS, the source code or digital content is stored in a **repository**.
# 
# * The repository contains not only the latest version of all files, but the complete history of all changes to the files since they were added to the repository.
# 
# * A user can **check out** the repository and obtain a local working copy of the files.
#   All changes are made to the files in the local working directory, where files can be added, removed and updated.
# 
# * When a task has been completed, the changes to the local files are **committed** (saved to the repository).
# 
# * If someone else has been making changes to the same files, a **conflict** can occur. In many cases conflicts can be **resolved** automatically by the system, but in some cases we have to **merge** the different changes together manually.
# 
# * It is often useful to create a new **branch** in a repository, or a **fork** or **clone** of an entire repository, when we do larger experimental development. The main branch in a repository is often called **master** or **trunk**. When work on a branch or fork is completed, it can be merged into the master branch/repository.
# 
# * With distributed RCSs such as Git or Mercurial, we can **pull** and **push** changesets between different repositories, for example between a local copy of the repository and a central online repository (such as one on a community repository hosting site like github.com).

# ### Some good RCS software
# 
# 1. Git (`git`) : http://git-scm.com/
# 2. Mercurial (`hg`) : http://mercurial.selenic.com/
# 
# In the rest of this lecture we will look at `git`, although `hg` is just as good and works in almost exactly the same way.
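# The terminology above says that many conflicts can be resolved automatically, while others must be merged by hand. The toy line-by-line three-way merge below makes that idea concrete. It is a minimal sketch under a simplifying assumption: both edited versions have the same number of lines as their common ancestor. It is not how Git actually merges. The function name `merge3` and the `ours`/`theirs` labels are hypothetical, although the markers imitate the style Git writes into conflicted files.

```python
def merge3(base, ours, theirs):
    """Toy three-way merge of three equal-length lists of lines.

    base is the common ancestor; ours and theirs are two edited versions.
    Returns (merged_lines, conflicting_line_indices).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles versions with the same number of lines")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:    # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:  # only "theirs" changed this line: take their version
            merged.append(t)
        elif t == b:  # only "ours" changed this line: keep our version
            merged.append(o)
        else:         # both changed the same line differently: a conflict
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


base = ["a", "b", "c"]
print(merge3(base, ["A", "b", "c"], ["a", "b", "C"]))  # edits on different lines
print(merge3(base, ["A", "b", "c"], ["X", "b", "c"]))  # edits on the same line
```

# The first call merges cleanly because the two edits touch different lines. The second call reports a conflict at line index 0. That is the situation where a person has to decide how to **merge** the changes.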
# ## Installing git

# On Linux:
# 
#     $ sudo apt-get install git
# 
# On Mac (with MacPorts):
# 
#     $ sudo port install git
# 
# The first time you start to use git, you'll need to configure your author information:
# 
#     $ git config --global user.name 'Robert Johansson'
#     $ git config --global user.email robert@riken.jp

# ## Creating and cloning a repository

# To create a brand new empty repository, we can use the command `git init repository-name`:

# In[4]:

# create a new git repository called gitdemo:
get_ipython().system('git init gitdemo')

# If we want to fork or clone an existing repository, we can use the command `git clone repository`:

# In[5]:

get_ipython().system('git clone https://github.com/qutip/qutip')

# `git clone` can take a URL to a public repository, as above, or a path to a local directory:

# In[6]:

get_ipython().system('git clone gitdemo gitdemo2')

# We can also clone private repositories over secure protocols such as SSH:
# 
#     $ git clone ssh://myserver.com/myrepository

# ## Status

# Using the command `git status` we get a summary of the current status of the working directory. It shows whether we have modified, added or removed files.

# In[34]:

get_ipython().system('git status')

# In this case, only the current IPython notebook has been created in the working directory. It is listed as an untracked file, and is therefore not in the repository yet.

# ## Adding files and committing changes

# To add a new file to the repository, we first create the file and then use the `git add filename` command:

# In[35]:

get_ipython().run_cell_magic('file', 'README', '\nA file with information about the gitdemo repository.\n')

# In[36]:

get_ipython().system('git status')

# After creating the file `README`, the command `git status` lists it as an *untracked* file.

# In[37]:

get_ipython().system('git add README')

# In[38]:

get_ipython().system('git status')

# Now that it has been added, it is listed as a *new file* that has not yet been committed to the repository.
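# The states reported by `git status` can be pictured with a simplified model built from three snapshots:
# 
# * the last commit (`HEAD`);
# * the staging area (the *index*, which `git add` updates);
# * the working directory.
# 
# The sketch below is a hypothetical toy, and `toy_status` is not a Git API. It ignores details such as a file that is staged and then modified again. Its only purpose is to show how *untracked*, *new file* and *modified* follow from comparing those snapshots.

```python
def toy_status(head, index, workdir):
    """Classify each file by comparing three snapshots (dicts: filename -> contents).

    head: the last commit, index: the staging area, workdir: the working directory.
    """
    report = {}
    for name in sorted(set(head) | set(index) | set(workdir)):
        if name not in index:
            # Not staged: either never added, or removed with `git rm`.
            report[name] = "deleted" if name in head else "untracked"
        elif name not in head:
            report[name] = "new file"           # staged but never committed
        elif workdir.get(name) != index[name]:
            report[name] = "modified"           # changed (or deleted) but not staged
        elif index[name] != head[name]:
            report[name] = "modified (staged)"  # staged but not yet committed
        else:
            report[name] = "unmodified"
    return report


head, index, work = {}, {}, {"README": "v1"}
print(toy_status(head, index, work))  # after creating README
index["README"] = work["README"]      # like `git add README`
print(toy_status(head, index, work))
head = dict(index)                    # like `git commit`
work["README"] = "v2"                 # edit the file again
print(toy_status(head, index, work))
```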
# In[39]:

get_ipython().system('git commit -m "Added a README file" README')

# In[40]:

get_ipython().system('git add Lecture-7-Revision-Control-Software.ipynb')

# In[41]:

get_ipython().system('git commit -m "added notebook file" Lecture-7-Revision-Control-Software.ipynb')

# In[42]:

get_ipython().system('git status')

# After *committing* the changes from the local working directory to the repository, `git status` reports that the working directory is clean.

# ## Committing changes

# When files that are tracked by Git are changed, they are listed as *modified* by `git status`:

# In[43]:

get_ipython().run_cell_magic('file', 'README', '\nA file with information about the gitdemo repository.\n\nA new line.\n')

# In[44]:

get_ipython().system('git status')

# Again, we can commit such changes to the repository using the `git commit -m "message"` command.

# In[45]:

get_ipython().system('git commit -m "added one more line in README" README')

# In[46]:

get_ipython().system('git status')

# ## Removing files

# To remove a file that has been added to the repository, use `git rm filename`, which works similarly to `git add filename`:

# In[47]:

get_ipython().run_cell_magic('file', 'tmpfile', '\nA short-lived file.\n')

# Add it:

# In[48]:

get_ipython().system('git add tmpfile')

# In[49]:

get_ipython().system('git commit -m "adding file tmpfile" tmpfile')

# Remove it again:

# In[51]:

get_ipython().system('git rm tmpfile')

# In[52]:

get_ipython().system('git commit -m "remove file tmpfile" tmpfile')

# ## Commit logs

# The message given to the commit command is supposed to be a short (often one-line) description of the changes/additions/deletions in the commit. If the `-m "message"` option is omitted when invoking `git commit`, an editor is opened for you to type a commit message (useful, for example, when a longer commit message is required).
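# Git identifies each commit, and each version of each file, by a hash of its content (SHA-1 in the default object format). The sketch below shows how this works for file contents. Git hashes a short header made of the word `blob`, a space, the size in bytes and a NUL byte, followed by the raw file bytes. The helper name `git_blob_hash` is our own. Its result should match what `git hash-object` prints for the same bytes. The commit hashes shown by `git log` are computed in the same way, but over a commit object (tree, parents, author, committer and message) rather than over a single file.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the object ID Git assigns to a file with these contents (SHA-1 repos)."""
    header = b"blob %d\x00" % len(data)  # "blob", a space, the size in bytes, a NUL byte
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_hash(b"test content\n"))
print(git_blob_hash(b""))  # the empty file
```

# The ID depends only on the contents. Identical files therefore get identical IDs, and any change to a file produces a different hash.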
# 
# We can look at the revision log by using the command `git log`:

# In[53]:

get_ipython().system('git log')

# In the commit log, each revision is shown with a unique hash, author information, a timestamp and the commit message.

# ## Diffs

# Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use `git diff` to see what has changed in a file that has not yet been committed:

# In[54]:

get_ipython().run_cell_magic('file', 'README', '\nA file with information about the gitdemo repository.\n\nREADME files usually contains installation instructions, and information about how to get started using the software (for example).\n')

# In[55]:

get_ipython().system('git diff README')

# That looks quite cryptic, but it is a standard format for describing changes in files. We can use other tools, like graphical user interfaces or web-based systems, to get a more easily understandable diff.
# 
# On GitHub (a web-based Git repository hosting service) it can look like this:

# In[24]:

Image(filename='images/github-diff.png')

# ## Discard changes in the working directory

# To discard a change (revert to the latest version in the repository) we can use the `checkout` command like this:

# In[58]:

get_ipython().system('git checkout -- README')

# In[59]:

get_ipython().system('git status')

# ## Checking out old revisions

# If we want to get the code for a specific revision, we can use `git checkout` and give it the hash of the revision we are interested in as an argument:

# In[60]:

get_ipython().system('git log')

# In[61]:

get_ipython().system('git checkout 1f26ad648a791e266fbb951ef5c49b8d990e6461')

# Now the content of all the files is as it was in the revision with the hash listed above (the first revision):

# In[62]:

get_ipython().system('cat README')

# We can move back to "the latest" revision (master) with the command:

# In[63]:

get_ipython().system('git checkout master')

# In[64]:

get_ipython().system('cat README')

# In[65]:

get_ipython().system('git status')
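# The output of `git diff` uses the *unified diff* format. Python's standard `difflib.unified_diff` produces the same hunk format. As a rough sketch, we can use it to reproduce the change made to `README` in the *Diffs* section. Git additionally prints `diff --git` and `index` header lines, which `difflib` does not.

```python
import difflib

# README before and after the edit in the "Diffs" section (one string per line).
old = [
    "\n",
    "A file with information about the gitdemo repository.\n",
    "\n",
    "A new line.\n",
]
new = [
    "\n",
    "A file with information about the gitdemo repository.\n",
    "\n",
    "README files usually contains installation instructions, and information "
    "about how to get started using the software (for example).\n",
]

diff = list(difflib.unified_diff(old, new, fromfile="a/README", tofile="b/README"))
print("".join(diff), end="")
```

# Lines starting with `-` were removed, lines starting with `+` were added, and unchanged context lines start with a space. The `@@ -1,4 +1,4 @@` line says that the hunk covers lines 1 to 4 in both the old and the new version.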
# ## Tagging and branching

# ### Tags

# Tags are named revisions. They are useful for marking particular revisions for later reference. For example, we can tag our code with the tag "paper-1-final" when the simulations for "paper-1" are finished and the paper has been submitted. Then we can always retrieve exactly the code used for that paper, even if we continue to work on and develop the code for future projects and papers.

# In[66]:

get_ipython().system('git log')

# In[67]:

get_ipython().system('git tag -a demotag1 -m "Code used for this and that purpose"')

# In[68]:

get_ipython().system('git tag -l')

# In[69]:

get_ipython().system('git show demotag1')

# To retrieve the code in the state corresponding to a particular tag, we can use the `git checkout tagname` command:
# 
#     $ git checkout demotag1

# ### Branches

# With branches we can create diverging code bases in the same repository. They are useful, for example, for experimental development that requires many code changes that could break the functionality in the master branch. Once the development on a branch has reached a stable state, it can be merged back into the trunk. Branching, developing and merging is a good strategy when several people are working on the same code base. But even in single-author repositories it is often useful to always keep the master branch in a working state, to branch/fork before implementing a new feature, and to merge it back into the main trunk later.
# 
# In Git, we can create a new branch like this:

# In[70]:

get_ipython().system('git branch expr1')

# We can list the existing branches like this:

# In[71]:

get_ipython().system('git branch')

# And we can switch between branches using `checkout`:

# In[81]:

get_ipython().system('git checkout expr1')

# Make a change in the new branch.
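# Before making that change, it may help to picture what `git branch`, `git checkout`, `git tag` and the later `git merge` do. In Git, a branch is essentially a movable name that points to a commit and advances with every new commit on that branch. A tag, by contrast, stays fixed. The class below is a hypothetical toy model of that idea, not Git's implementation. Commit IDs are plain strings instead of hashes. Only *fast-forward* merges are supported, where the target branch has no commits of its own since the branch point.

```python
class ToyRefs:
    """Toy model of commits, branches and tags (not Git's real implementation)."""

    def __init__(self):
        self.parents = {}     # commit id -> parent commit id (None for the first commit)
        self.branches = {}    # branch name -> commit id the branch points to
        self.tags = {}        # tag name -> commit id (never moves)
        self.head = "master"  # currently checked-out branch

    def commit(self, commit_id):
        self.parents[commit_id] = self.branches.get(self.head)
        self.branches[self.head] = commit_id  # the current branch moves forward

    def branch(self, name):
        self.branches[name] = self.branches[self.head]  # new name, same commit

    def checkout(self, name):
        self.head = name

    def tag(self, name):
        self.tags[name] = self.branches[self.head]

    def is_ancestor(self, ancestor, descendant):
        # Follow parent links back from `descendant` until we hit `ancestor` or the root.
        while descendant is not None:
            if descendant == ancestor:
                return True
            descendant = self.parents[descendant]
        return False

    def merge(self, other):
        """Fast-forward the current branch to `other` if possible."""
        target, source = self.branches[self.head], self.branches[other]
        if not self.is_ancestor(target, source):
            raise ValueError("not a fast-forward: a real merge commit would be needed")
        self.branches[self.head] = source


repo = ToyRefs()
repo.commit("c1")
repo.tag("demotag1")        # like `git tag -a demotag1 ...`
repo.branch("expr1")        # like `git branch expr1`
repo.checkout("expr1")      # like `git checkout expr1`
repo.commit("c2")           # like committing the README change on expr1
repo.checkout("master")
repo.merge("expr1")         # like `git merge expr1`
del repo.branches["expr1"]  # like `git branch -d expr1`
print(repo.branches, repo.tags)
```

# In the demo, `master` gets no new commits after `expr1` is created. The merge in the following cells should therefore be a fast-forward: Git can simply move the `master` pointer forward.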
# In[74]:

get_ipython().run_cell_magic('file', 'README', '\nA file with information about the gitdemo repository.\n\nREADME files usually contains installation instructions, and information about how to get started using the software (for example).\n\nExperimental addition.\n')

# In[76]:

get_ipython().system('git commit -m "added a line in expr1 branch" README')

# In[77]:

get_ipython().system('git branch')

# In[78]:

get_ipython().system('git checkout master')

# In[79]:

get_ipython().system('git branch')

# We can merge an existing branch and all its changesets into another branch (for example the master branch) like this:
# 
# First switch to the target branch:

# In[82]:

get_ipython().system('git checkout master')

# In[83]:

get_ipython().system('git merge expr1')

# In[84]:

get_ipython().system('git branch')

# We can delete the branch `expr1` now that it has been merged into master:

# In[85]:

get_ipython().system('git branch -d expr1')

# In[86]:

get_ipython().system('git branch')

# In[88]:

get_ipython().system('cat README')

# ## Pulling and pushing changesets between repositories

# If the repository has been cloned from another repository, for example on github.com, it automatically remembers the address of the parent repository (called origin):

# In[5]:

get_ipython().system('git remote')

# In[4]:

get_ipython().system('git remote show origin')

# ### pull

# We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:

# In[6]:

get_ipython().system('git pull origin')

# We can register the addresses of many different repositories and pull in changesets from different sources. The default source, however, is the origin from which the repository was first cloned, so the word `origin` could have been omitted from the line above.

# ### push

# After making changes to our local repository, we can push changes to a remote repository using `git push`.
# Again, the default target repository is `origin`, so we can do:

# In[7]:

get_ipython().system('git status')

# In[8]:

get_ipython().system('git add Lecture-7-Revision-Control-Software.ipynb')

# In[9]:

get_ipython().system('git commit -m "added lecture notebook about RCS" Lecture-7-Revision-Control-Software.ipynb')

# In[11]:

get_ipython().system('git push')

# ## Hosted repositories

# GitHub.com is a Git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).
# 
# With a hosted repository it is easy to collaborate with colleagues on the same code base. You also get a graphical user interface where you can browse the code, look at commit logs, track issues, etc.
# 
# Some good repository hosting sites are:
# 
# * GitHub: http://www.github.com
# * Bitbucket: http://www.bitbucket.org

# In[14]:

Image(filename='images/github-project-page.png')

# ## Graphical user interfaces

# There are also a number of graphical user interfaces for Git. The available options vary a little from platform to platform:
# 
# http://git-scm.com/downloads/guis

# In[15]:

Image(filename='images/gitk.png')

# ## Further reading

# * http://git-scm.com/book
# * http://www.vogella.com/articles/Git/article.html
# * http://cheat.errtheblog.com/s/git