How Long Does It Take To Review an IPython Pull Request?¶

Fernando Perez wrote:

IPython: 8 minutes from bug on my box to reviewed/merged PR. We're doing something right... https://github.com/ipython/ipython/pull/4294

This got me thinking: how long does it usually take to do a review on the IPython project?

In [1]:

import pandas as pd
import json
import itertools
import collections
import re
import arrow
rcParams['figure.figsize'] = (15, 3)
pd.set_option('display.mpl_style', 'default')

I recently released a python package called git2json for parsing git logs. It isn't well written, but it does what it says on the tin.

In [2]:

!git2json --git-dir=/home/tavish/code/ipython/.git > ipython-log.json

We can parse the resulting JSON file and take a peek at the data structure.

In [3]:

log = json.load(open('ipython-log.json'))
print log[0]

{u'committer': {u'date': 1380325308, u'timezone': u'-0700', u'name': u'Fernando Perez', u'email': u'fernando.perez@berkeley.edu'}, u'author': {u'date': 1380325308, u'timezone': u'-0700', u'name': u'Fernando Perez', u'email': u'fernando.perez@berkeley.edu'}, u'tree': u'ea7fbb2694175423d776f649104ba42520db7cea', u'parents': [u'795c9e30ee8ad2e1a4e780c439fc50b565670e26', u'796a90bdeef377041c353019fdae19aaa72a76d3'], u'commit': u'b8295e9af6745b9c466d11bb31a6eef221e231c1', u'message': u"Merge pull request #4294 from minrk/tornado-2don't require tornado 3 in `--post serve`", u'changes': []}

In [4]:

print "Number of commits = ", len(log)

Number of commits =  12382

Before we answer the question posed at the beginning of this article, I'll answer some easier questions. First off: which files in IPython have been committed to the most? This is a simplified version of "code churn" which is reasonably effective for predicting bugs. (More complicated models include lines modified or take semantic differences into account). Since this is Friday night and I'm not trying to get a paper into ICSE, I'll just take the number of commits for each .py file.

In [5]:

file_changes = lambda: itertools.chain.from_iterable(
    [change[2] for change in commit['changes'] if re.match(r'^.*\.py$', change[2])]
    for commit in log
)

In [6]:

rcParams['figure.figsize'] = (15, 6)
fchanges = file_changes()
fchange_count = collections.Counter(fchanges)
a = average(fchange_count.values())
most_common = fchange_count.most_common(20)
df = pd.DataFrame(most_common)
df.head()
df.index = df[0]
df = df[[1]]
df.head()
p = df.plot(kind='bar', legend=False)
p.set_title('Most-changed Files in IPython')
p.set_ylabel('Commits')
# Draw a red line at the average
hlines(a, 0, len(df), colors='r')

Out[6]:

<matplotlib.collections.LineCollection at 0x831ed90>

/usr/local/lib/python2.7/dist-packages/matplotlib/font_manager.py:1214: UserWarning: findfont: Font family ['monospace'] not found. Falling back to Bitstream Vera Sans
  (prop.get_family(), self.defaultFamily[fontext]))

Someone more familiar with IPython development will know how obvious the above graph is.

Next, I'll make a simple plot showing weekly commit counts over time, similar to the plots GitHub would show you. I'll create a data frame from a list in the format [(date_rounded_down_to_week, commit_id)] and then groupby() the date.

In [7]:

def weekly_date_resolution(ts):
    ar = arrow.arrow(ts)
    day_of_month = ar.timetuple().tm_mday
    week = int(day_of_month) / 7
    new_day = (week*7)+1
    assert new_day > 0
    assert new_day < 30
    try:
        day_adjusted = ar.replace(day=new_day)
    except ValueError:
        new_day = day_of_month # just keep the original
        day_adjusted = ar.replace(day=new_day)
    return day_adjusted.date()

commit_times = lambda: (
    (weekly_date_resolution(commit['committer']['date']), commit['commit'])
    for commit in log
)
dfct = pd.DataFrame(list(commit_times()), columns=['date', 'id'])
dfct = dfct.groupby('date').aggregate(len)
dfct.head()

Out[7]:

	id
date
2005-07-01	1
2005-07-15	6
2005-08-08	2
2005-08-15	4
2005-08-22	1

A few more lines gives us a basic plot.

In [8]:

p = dfct.plot(legend=False)
p.set_title('Weekly commits on IPython')
p.set_ylabel('Commits')

Out[8]:

<matplotlib.text.Text at 0x8303250>

Let's get back to the original question: how long does it usually take to do a review?

I'll make a distinction between two different kinds of commits:

Merge commits: The commit that merged the pull request branch into the main branch.
PR commits: The HEAD (or most recent) commit on the pull request's source branch.

In [9]:

def is_pull_request_merge(message):
    '''Match commit messages that start with "Merge pull request #"'''
    return re.match(r'^Merge pull request #', message)


def pull_request_number(message):
    '''Extract the PR # from the commit message.'''
    return int(re.match(r'^Merge pull request #(\d+)', message).groups()[0])


# Create a table for the merge commits.
merge_commits = [
    [
        commit['parents'][1], # the second parent seems to be the source branch
        commit['committer']['date'],
        pull_request_number(commit['message']),
    ]
    for commit in log if is_pull_request_merge(commit['message'])
]
# Also create a set of merge commits for use in the next cell.
merge_commit_parent_ids = [mc[0] for mc in merge_commits]
merge_df = pd.DataFrame(
    merge_commits,
    columns=['merge_commit', 'merge_timestamp', 'pr'],
    index=merge_commit_parent_ids,
)
merge_df.head()

Out[9]:

	merge_commit	merge_timestamp	pr
796a90bdeef377041c353019fdae19aaa72a76d3	796a90bdeef377041c353019fdae19aaa72a76d3	1380325308	4294
b0d6287174b62de9b89190b868317d12b5bc4ae4	b0d6287174b62de9b89190b868317d12b5bc4ae4	1380323422	4270
07b4962bd787068896f0e2b5dfb977441c7b391a	07b4962bd787068896f0e2b5dfb977441c7b391a	1380301596	4278
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380271407	4282
68b0f9d2d5f29c540f7873141cb09026c492d0ef	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380215023	4279

In [10]:

# Create a table for pull request commits.
pr_commits = [
    [
        commit['commit'],
        commit['committer']['date'],
        len(commit['changes']), # let's sneak in another basic churn metric
    ]
    for commit in log
    # Do we have a merge commit for this commit?
    if commit['commit'] in merge_commit_parent_ids
]
commit_ids = [prc[0] for prc in pr_commits]
pr_df = pd.DataFrame(pr_commits, columns=['commit', 'commit_timestamp', 'churn'], index=commit_ids)
pr_df.head()

Out[10]:

	commit	commit_timestamp	churn
796a90bdeef377041c353019fdae19aaa72a76d3	796a90bdeef377041c353019fdae19aaa72a76d3	1380325245	1
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380162595	1
68b0f9d2d5f29c540f7873141cb09026c492d0ef	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380158115	3
07b4962bd787068896f0e2b5dfb977441c7b391a	07b4962bd787068896f0e2b5dfb977441c7b391a	1380153904	1
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	1380099263	1

Now that we have these two data frames, we can merge them together.

In [11]:

both_df = pr_df.join(merge_df)
both_df.head()

Out[11]:

	commit	commit_timestamp	churn	merge_commit	merge_timestamp	pr
796a90bdeef377041c353019fdae19aaa72a76d3	796a90bdeef377041c353019fdae19aaa72a76d3	1380325245	1	796a90bdeef377041c353019fdae19aaa72a76d3	1380325308	4294
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380162595	1	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380271407	4282
68b0f9d2d5f29c540f7873141cb09026c492d0ef	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380158115	3	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380215023	4279
07b4962bd787068896f0e2b5dfb977441c7b391a	07b4962bd787068896f0e2b5dfb977441c7b391a	1380153904	1	07b4962bd787068896f0e2b5dfb977441c7b391a	1380301596	4278
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	1380099263	1	d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	1380149814	4131

We can take the delta by subtracting the merge timestamp from the commit timestamp. This is the difference between the commit-under-review's timestamp and the merge timestamp, in seconds. We can do some basic math to get the difference on other scales. The HTML version of the table shows the results.

In [12]:

rcParams['figure.figsize'] = (5, 10)
both_df['delta'] = both_df['merge_timestamp'] - both_df['commit_timestamp']
delta = both_df['delta']
both_df['delta_weeks'] = delta/60.0/60.0/24.0/7.0
both_df['delta_days'] = delta/60.0/60.0/24.0
both_df['delta_hours'] = delta/60.0/60.0
both_df['delta_mins'] = delta/60.0
both_df['delta_secs'] = delta
both_df.head()

Out[12]:

	commit	commit_timestamp	churn	merge_commit	merge_timestamp	pr	delta	delta_weeks	delta_days	delta_hours	delta_mins	delta_secs
796a90bdeef377041c353019fdae19aaa72a76d3	796a90bdeef377041c353019fdae19aaa72a76d3	1380325245	1	796a90bdeef377041c353019fdae19aaa72a76d3	1380325308	4294	63	0.000104	0.000729	0.017500	1.050000	63
e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380162595	1	e51d4d021c57c846ff1ad439d20b3fcc6b1ebc77	1380271407	4282	108812	0.179914	1.259398	30.225556	1813.533333	108812
68b0f9d2d5f29c540f7873141cb09026c492d0ef	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380158115	3	68b0f9d2d5f29c540f7873141cb09026c492d0ef	1380215023	4279	56908	0.094094	0.658657	15.807778	948.466667	56908
07b4962bd787068896f0e2b5dfb977441c7b391a	07b4962bd787068896f0e2b5dfb977441c7b391a	1380153904	1	07b4962bd787068896f0e2b5dfb977441c7b391a	1380301596	4278	147692	0.244200	1.709398	41.025556	2461.533333	147692
d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	1380099263	1	d4d6bc63d6c6dd34c5fe1270a90caec2e0208bfa	1380149814	4131	50551	0.083583	0.585081	14.041944	842.516667	50551

The first thing that stands out is (as of this writing) commit 796a90bdeef3770 has a delta of one minute. Fernando reported 8 minutes, which is much more plausible. This makes me wonder if some people have out-of-date clocks. Update: Fernando says that the review time was 1 minute; 8 minutes was from bug to pull request to commit.

Anyways, let's make a box plot. Box plots are fun.

In [13]:

rcParams['figure.figsize'] = (15, 3)
both_df[['delta_days']].boxplot(vert=False)

Out[13]:

{'boxes': [<matplotlib.lines.Line2D at 0x866bbd0>],
 'caps': [<matplotlib.lines.Line2D at 0x866b210>,
  <matplotlib.lines.Line2D at 0x866b710>],
 'fliers': [<matplotlib.lines.Line2D at 0x866d610>,
  <matplotlib.lines.Line2D at 0x866dad0>],
 'medians': [<matplotlib.lines.Line2D at 0x866d110>],
 'whiskers': [<matplotlib.lines.Line2D at 0x8665a50>,
  <matplotlib.lines.Line2D at 0x8665c90>]}

Based on this plot, we see that the IPython folks are pretty fast when it comes to reviewing and merging pull requests — the ones that get committed, anyways. There are some outliers, though: some take up to half a year before getting committed.

We can filter out all of these ranges that are larger than a month and ask: if a review takes less than a month, how long is it likely to take?

In [14]:

rcParams['figure.figsize'] = (15, 3)
both_df[both_df['delta_days'] < 30][['delta_days']].boxplot(vert=False)

Out[14]:

{'boxes': [<matplotlib.lines.Line2D at 0x8633550>],
 'caps': [<matplotlib.lines.Line2D at 0x862db50>,
  <matplotlib.lines.Line2D at 0x8633090>],
 'fliers': [<matplotlib.lines.Line2D at 0x8633f50>,
  <matplotlib.lines.Line2D at 0x8637450>],
 'medians': [<matplotlib.lines.Line2D at 0x8633a50>],
 'whiskers': [<matplotlib.lines.Line2D at 0x862d390>,
  <matplotlib.lines.Line2D at 0x862d5d0>]}

Based on this, it looks like most reviews happen within the first five days. The median is (I'm eyeballing this) around 1 day. Not bad at all.

If we look closely at the pull requests outside this range, we might see any number of things: pull requests that are large and hard to understand or pull requests that are blocked by some other task. I don't think pandas would help us much: we'd have to use our brains and look for patterns.

OK, one more question. Can we see any relationship between the review time and the number of files modified by the last pull request commit? This is a sad way to do it — since I'm not taking into account all commits on a pull request, I'm missing a lot of useful information — but it might be fun.

In [15]:

# The data is skewed, so i'll use the spearman correlation
both_df[['churn', 'delta']].corr(method='spearman')

Out[15]:

	churn	delta
churn	1.000000	0.048028
delta	0.048028	1.000000

There's almost no correlation, and the correlation that is there is probably just noise. Should we conclude that the number of files that you change in a pull request doesn't affect how long it will take to be reviewed? I don't think so, for two reasons:

I'm completely ignoring pull requests that get ignored or closed. My professor has looked at how long it takes patches to get accepted on large open source projects like linux and he found that on certain projects like Linux, large patches were ignored or rejected. No one had time to understand them and review them. This might be the case for IPython. (Does anybody from the IPython project think this is true?)
I'm only processing the tip of the pull request branch. A pull request can have many commits, and I should incorporate those into my churn metric.

I didn't really make any claims here, but let's pretend this is a real paper.

Threats to Validity¶

It is 1am on a Friday night.
The churn metric only represents the tip (ha) of the pull request iceberg.
This analysis doesn't look at how review time changed over time. Can we get a less skewed picture of pull request length by looking at only the period in which the project was funded?
This analysis doesn't look at actual review time — the pull request could have been made much later than when the pull request commit was made.

Conclusions¶

The IPython folks are very prompt. Send them a decent pull request and they'll probably merge it within a day.
The IPython Notebook is pretty cool.

Updates¶

A previous version of this notebook was incorrectly adding timezone differences to timestamps. After some experimentation, I've discovered that the timestamps provided by git2json don't need to be adjusted for timezone unless you want to display it in a human-readable format. Fixing this bug solved the "time travel" issue.