NIPS Experiment: Duplicate Papers

Neil D. Lawrence 9th June 2014

For NIPS 2014 we chose to run an experiment in which 10% of papers were re-reviewed. To do this we needed to split the program committee so that no committee member would see the same paper twice.

This notebook randomly selects 10% of papers for duplication. It then reads the allocated split of area chairs into two groups, group1 and group2 (from AC.groups.csv). Finally, it loads the reviewers from the reviewer database and randomly splits them into two groups of (almost) equal size.

In [10]:
import cmtutils
import sqlite3
import os
import pandas as pd
import numpy as np
import csv
np.random.seed(seed=130000) # set the seed so that the split is repeatable.

First load in the papers from the paper database.

In [11]:
filename = '2014-06-06_paper_list.xls'
cmt = cmtutils.cmt_papers_read(filename=filename)

Now permute the paper list and select 170 (approximately 10%) for duplication. Sort the resulting list for convenience.

In [12]:
duplicate_list = np.random.permutation(cmt.papers.index)[:170]
duplicate_list.sort()
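
The selection step can be tried out on a small synthetic paper list. The sketch below is illustrative only: the `papers` frame is a hypothetical stand-in for `cmt.papers`, and the number of duplicates is computed from the 10% fraction rather than fixed at 170 as in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cmt.papers: 20 papers indexed by paper ID.
papers = pd.DataFrame({'Title': ['paper %d' % i for i in range(1, 21)]},
                      index=np.arange(1, 21))

np.random.seed(seed=0)  # fixed seed so that the selection is repeatable
n_duplicates = int(round(0.1 * len(papers)))  # 10% of 20 papers is 2

# Shuffle the paper IDs, keep the first n_duplicates, then sort them.
toy_duplicates = np.random.permutation(papers.index)[:n_duplicates]
toy_duplicates.sort()
```

Because the permutation contains every ID exactly once, taking a prefix of it samples without replacement, so no paper can be selected twice.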

Save the duplicate list to a .csv file to export it to CMT.

In [13]:
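# Binary mode ('wb') is what the csv module expects under Python 2.
with open(os.path.join(cmtutils.nips_data_directory, "duplicate_papers.csv"), 'wb') as f:
    wr = csv.writer(f)
    wr.writerow(duplicate_list)  # all paper IDs go on a single row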
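
As a rough illustration of the file this produces, the sketch below writes a few made-up paper IDs to an in-memory buffer and reads them back. It assumes Python 3 (where the csv module works on text rather than bytes), and the IDs are hypothetical.

```python
import csv
import io

toy_duplicates = [3, 17, 42]  # hypothetical paper IDs

buf = io.StringIO()  # in-memory stand-in for duplicate_papers.csv
csv.writer(buf).writerow(toy_duplicates)
written = buf.getvalue()  # a single line of comma-separated IDs

# Reading the row back yields strings, so convert each entry to int.
row = next(csv.reader(io.StringIO(written)))
recovered = [int(x) for x in row]
```

The file is a single comma-separated row, so anything that consumes it has to treat the row as the list of IDs rather than expect one ID per line.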

Connect to the reviewer database and load all reviewers, indexed by email.

In [14]:
con = sqlite3.connect(os.path.join(cmtutils.nips_data_directory, "reviewers.db"))
all_reviewers = pd.read_sql("SELECT * from Reviewers WHERE IsReviewer=1", con, index_col='Email')

Read the list allocating area chairs to groups from AC.groups.csv.

In [15]:
area_chairs = pd.read_csv(os.path.join(cmtutils.nips_data_directory, "AC.groups.csv"), index_col='Email')

Now find the index (Email) of every reviewer who isn't an area chair and randomly permute the list.

In [16]:
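# Left join on Email: reviewers absent from the area chair list get NaN.
reviewer_groups = all_reviewers.join(area_chairs)
meta_reviewer_series = reviewer_groups.loc[:, 'IsMetaReviewer']
# Keep only the rows where IsMetaReviewer is missing, then shuffle their index.
reviewer_index = np.random.permutation(meta_reviewer_series[pd.isnull(meta_reviewer_series)].index)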
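
The null test works because `DataFrame.join` is a left join on the index by default. The toy example below sketches this. The names and emails are made up, and it assumes that the `IsMetaReviewer` and `Group` columns come from the area chair file.

```python
import pandas as pd

# Hypothetical reviewers, indexed by email as in the notebook.
toy_reviewers = pd.DataFrame(
    {'LastName': ['Ada', 'Bob', 'Cy', 'Di']},
    index=pd.Index(['a@x.org', 'b@x.org', 'c@x.org', 'd@x.org'], name='Email'))

# Hypothetical area chair table: only b@x.org is an area chair.
toy_area_chairs = pd.DataFrame(
    {'IsMetaReviewer': [1], 'Group': ['group1']},
    index=pd.Index(['b@x.org'], name='Email'))

# Reviewers missing from the area chair table get NaN in the joined columns.
toy_groups = toy_reviewers.join(toy_area_chairs)
not_area_chair = pd.isnull(toy_groups.loc[:, 'IsMetaReviewer'])
toy_reviewer_index = toy_groups[not_area_chair].index
print(list(toy_reviewer_index))  # ['a@x.org', 'c@x.org', 'd@x.org']
```

The area chair keeps the group assigned in the file, while the remaining rows have an empty `Group` that the random split fills in.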

Now split that list into two and allocate reviewers to groups.

In [17]:
half = len(reviewer_index) // 2  # integer division keeps the slice bounds integral
group1_list = reviewer_index[:half]
group2_list = reviewer_index[half:]
# Label each half of the shuffled reviewers; area chairs keep their group from AC.groups.csv.
reviewer_groups.loc[group1_list, 'Group'] = 'group1'
reviewer_groups.loc[group2_list, 'Group'] = 'group2'
reviewer_groups.describe()
ndf = reviewer_groups.groupby('Group').size() # check group sizes are roughly equal.
print(ndf)
Group
group1    742
group2    741
dtype: int64
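
Besides the group sizes, it is worth checking that the two halves do not overlap and together cover everyone. The following is a minimal sketch on seven hypothetical reviewer emails, not part of the original notebook.

```python
import numpy as np

np.random.seed(seed=0)  # fixed seed so that the shuffle is repeatable
toy_index = np.random.permutation(['r%d@x.org' % i for i in range(7)])

half = len(toy_index) // 2  # 7 // 2 == 3
toy_group1 = toy_index[:half]
toy_group2 = toy_index[half:]

# No reviewer is in both groups, and every reviewer is in one of them.
assert set(toy_group1).isdisjoint(toy_group2)
assert set(toy_group1) | set(toy_group2) == set(toy_index)
# With an odd count the second group is larger by exactly one.
assert len(toy_group2) - len(toy_group1) == 1
```

Slicing one permutation at its midpoint guarantees both properties by construction, so the group sizes can differ by at most one among the non-area-chair reviewers.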

Write the reviewer groups to file.

In [18]:
reviewer_groups.loc[:, 'Group'].to_csv(os.path.join(cmtutils.nips_data_directory, "reviewer_groups.csv"))