Calculating how many emails pass through the FSL mailing list

I searched the FSL mailing list archives, found the summary page for April 2014, and copied and pasted its table of contents into a text file. That file is saved on GitHub, so we can read it directly with Python's urllib2 library.

In [1]:
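import urllib2

# Open the table-of-contents text file saved on GitHub
archive_file = urllib2.urlopen('')
archive_lines = archive_file.readlines()
# Strip the trailing newline from every line
archive_lines = [ line.rstrip() for line in archive_lines ]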
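`urllib2` only exists in Python 2. Under Python 3 the same download goes through `urllib.request.urlopen`, and the response yields lines as bytes rather than strings. Here's a minimal sketch, assuming the file is UTF-8 encoded; the helper names `read_archive_lines` and `fetch_archive_lines` are hypothetical. The parsing helper accepts any binary file-like object, so it can be tried on an in-memory sample without the GitHub URL.

```python
import io
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen


def read_archive_lines(binary_file, encoding="utf-8"):
    """Decode each line of a binary file-like object and strip trailing whitespace.

    Hypothetical helper; UTF-8 is an assumption about the archive file's encoding.
    """
    return [line.decode(encoding).rstrip() for line in binary_file]


def fetch_archive_lines(url):
    """Hypothetical wrapper that downloads the file and returns its stripped lines."""
    with urlopen(url) as response:  # network call, not exercised below
        return read_archive_lines(response)


# Try the parsing step on an in-memory sample instead of the real file
sample = io.BytesIO(b'3D viewer gif? (6 messages)\nASL Perfusion Values (2 messages)\n')
print(read_archive_lines(sample))
```

Keeping the decoding separate from the download means the parsing can be checked offline; only `fetch_archive_lines` touches the network.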

The first few lines of this file are:

In [2]:
for line in archive_lines[:10]:
    print line
"radiating" normalised functional images (5 messages)
.vmr and .vmp BESA output files (1 message)
1 PostDoc + 2 PhD positions in Computational Cognitive NeuroImaging; University of Birmingham, UK (1 message)
2x2 ANCOVA results visualization (6 messages)
3D viewer gif? (6 messages)
a problem with read_avw (2 messages)
About probtrackx2 rseed flag (2 messages)
Advanced Clinical Neuroimaging Course, Brussels, May 30-31, 2014 (1 message)
Analysis steps for resting state fMRI (12 messages)
ASL Perfusion Values (2 messages)

So the first thing we can find out is how many distinct email subjects were used in the month of April:

In [3]:
print len(archive_lines)

But, to be honest, we don't really care about the subject lines; we want to know how many emails were sent in total. That requires a little parsing of this text file. Specifically, we'll split each line at the last '(' character, keep only the part to its right, and strip the trailing ' messages)' text from what's left.

In [4]:
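# Keep the text after the last '(' and strip the trailing " messages)".
# rstrip removes any trailing characters from that set, so " message)" works too.
archive_messages_n = [ x.rsplit('(',1)[1].rstrip(' messages)') for x in archive_lines ]
print archive_messages_n[:10]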
['5', '1', '1', '6', '6', '2', '2', '1', '12', '2']
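A regular expression is an alternative that spells out the expected "(N message)" / "(N messages)" suffix explicitly and raises an error on any line that doesn't fit, instead of silently returning odd text. This is a minimal sketch, assuming every line ends in one of those two forms as in the sample above; `parse_count` and `COUNT_PATTERN` are hypothetical names.

```python
import re

# Matches a trailing "(N message)" or "(N messages)" and captures N
COUNT_PATTERN = re.compile(r'\((\d+) messages?\)$')


def parse_count(line):
    """Return the message count at the end of a table-of-contents line."""
    match = COUNT_PATTERN.search(line)
    if match is None:
        raise ValueError('no message count found in: %r' % line)
    return int(match.group(1))


lines = [
    '"radiating" normalised functional images (5 messages)',
    '.vmr and .vmp BESA output files (1 message)',
    'Analysis steps for resting state fMRI (12 messages)',
]
print([parse_count(line) for line in lines])  # [5, 1, 12]
```

Because the pattern is anchored with `$`, any parentheses earlier in a subject line are ignored, and the counts come back as integers, so no separate conversion step is needed.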

We can convert those values to integers and store them in a numpy array so we can get a total value!

In [5]:
import numpy as np
# Convert the count strings to integers and add them all up
array = np.array(archive_messages_n, dtype=int)
print array.sum()

871 emails in one month! Crazy days!
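For a single total, NumPy isn't strictly needed: Python's built-in `sum` over a generator of `int` conversions gives the same result. A minimal sketch, using only the ten counts printed earlier (so it adds up to 38, not the full month's figure):

```python
# The first ten message counts printed earlier, still as strings
archive_messages_n = ['5', '1', '1', '6', '6', '2', '2', '1', '12', '2']

# Convert each string to an int and add them up without NumPy
total = sum(int(n) for n in archive_messages_n)
print(total)  # 38
```

NumPy pays off once you want more than one statistic (means, maxima, histograms) over the same counts; for one sum, the built-in is enough.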

Thank you FMRIB ;)