For NIPS 2014 we ran an experiment in which 10% of papers were reviewed twice. To do this we needed to split the program committee into two groups so that no committee member would see the same paper twice.
This notebook selects 10% of papers randomly for duplication. It then reads the allocated split of area chairs into two groups (AC.groups.csv), loads the reviewers from the reviewer database, and splits them randomly into two equally sized groups.
import cmtutils
import sqlite3
import os
import pandas as pd
import numpy as np
import csv

np.random.seed(seed=130000)  # set the seed so that the split is repeatable.
First load the papers from the paper database.
filename = '2014-06-06_paper_list.xls'
cmt = cmtutils.cmt_papers_read(filename=filename)
Now permute the paper list and select 170 (approximately 10%) for duplication. Sort the resulting list for convenience.
duplicate_list = np.random.permutation(cmt.papers.index)[:170]
duplicate_list.sort()
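The selection step can be tried in isolation on made-up data. The following is a minimal sketch, not the notebook's own code: the helper `select_duplicates`, the toy paper IDs, and the use of a local `np.random.default_rng` generator (rather than the global seed set above) are all assumptions made for illustration.

```python
import numpy as np

def select_duplicates(paper_ids, fraction=0.1, seed=0):
    """Return a sorted random subset holding about `fraction` of paper_ids.

    Hypothetical helper for illustration; the notebook does this inline.
    """
    rng = np.random.default_rng(seed)  # local generator, an assumption of this sketch
    n_select = int(round(fraction * len(paper_ids)))
    # Permute a copy of the IDs and keep the first n_select entries.
    chosen = rng.permutation(np.asarray(paper_ids))[:n_select]
    chosen.sort()  # sort for convenience, as in the notebook
    return chosen

toy_ids = np.arange(1, 51)  # 50 made-up paper IDs
duplicates = select_duplicates(toy_ids, fraction=0.1, seed=130000)
```

Because the subset is the head of a permutation, every ID appears at most once, which is what the duplication experiment requires.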
Save the duplicate list to a .csv file to export it to CMT.
# 'wb' is the Python 2 convention for csv files; on Python 3 use mode 'w' with newline=''.
with open(os.path.join(cmtutils.nips_data_directory, "duplicate_papers.csv"), 'wb') as f:
    wr = csv.writer(f)
    wr.writerow(duplicate_list)
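The cell above opens the file in binary mode, which is what the Python 2 `csv` module expects. Under Python 3 the same step would need text mode with `newline=''`. The sketch below shows that variant on made-up IDs written to a temporary directory; the IDs and the output location are assumptions, not the notebook's data.

```python
import csv
import os
import tempfile

paper_ids = [3, 17, 42, 108]  # made-up IDs standing in for duplicate_list

# Write to a temporary directory rather than the real NIPS data directory.
out_path = os.path.join(tempfile.mkdtemp(), "duplicate_papers.csv")

# Python 3: text mode with newline='' lets the csv module control line endings.
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerow(paper_ids)

# Read the single row back; csv yields strings, so convert to int.
with open(out_path, newline="") as f:
    recovered = [int(x) for x in next(csv.reader(f))]
```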
Connect to the reviewer database and load all reviewers, indexed by email.
con = sqlite3.connect(os.path.join(cmtutils.nips_data_directory, "reviewers.db"))
all_reviewers = pd.read_sql("SELECT * from Reviewers WHERE IsReviewer=1", con, index_col='Email')
Read the list allocating area chairs to groups from AC.groups.csv.
area_chairs = pd.read_csv(os.path.join(cmtutils.nips_data_directory, "AC.groups.csv"), index_col='Email')
Now find the index (Email) of every reviewer who isn't an area chair and randomly permute the list.
reviewer_groups = all_reviewers.join(area_chairs)
meta_reviewer_series = reviewer_groups.loc[:, 'IsMetaReviewer']
# rows with a null IsMetaReviewer entry are treated as non-area-chair reviewers
reviewer_index = np.random.permutation(meta_reviewer_series[pd.isnull(meta_reviewer_series)].index)
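To see why the null check picks out the ordinary reviewers, here is a small sketch on invented data. It assumes that `IsMetaReviewer` and `Group` come from the area chair table, so the index-based left join leaves them null for everyone else. The names, emails, and column layout are hypothetical.

```python
import numpy as np
import pandas as pd

# Five made-up reviewers, indexed by email.
all_reviewers = pd.DataFrame(
    {"LastName": ["Ames", "Baker", "Chen", "Diaz", "Evans"]},
    index=pd.Index(["a@x.org", "b@x.org", "c@x.org", "d@x.org", "e@x.org"], name="Email"),
)
# Two of them are area chairs with a pre-assigned group.
area_chairs = pd.DataFrame(
    {"IsMetaReviewer": [1, 1], "Group": ["group1", "group2"]},
    index=pd.Index(["b@x.org", "d@x.org"], name="Email"),
)

# DataFrame.join is a left join on the index by default, so reviewers
# missing from area_chairs get NaN in the area chair columns.
joined = all_reviewers.join(area_chairs)
not_area_chair = joined["IsMetaReviewer"].isnull()

rng = np.random.default_rng(0)  # local generator, an assumption of this sketch
reviewer_index = rng.permutation(joined[not_area_chair].index.to_numpy())
```

Here `reviewer_index` holds the three non-area-chair emails in random order, ready to be split.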
Now split that list into two and allocate reviewers to groups.
l = len(reviewer_index)
group1_list = reviewer_index[:l//2]  # integer division so the slice bound is an int
group2_list = reviewer_index[l//2:]
for ind in group1_list:
    reviewer_groups.loc[ind, 'Group'] = 'group1'
for ind in group2_list:
    reviewer_groups.loc[ind, 'Group'] = 'group2'
reviewer_groups.describe()
ndf = reviewer_groups.groupby('Group').size()  # check group sizes are roughly equal.
print(ndf)
Group
group1    742
group2    741
dtype: int64
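The per-row loops can also be written as two label-based assignments. This is a sketch of that alternative on seven invented reviewers, not a replacement for the cell above; the emails and the local random generator are assumptions.

```python
import numpy as np
import pandas as pd

# Seven made-up reviewers with no group assigned yet.
emails = ["r%d@x.org" % i for i in range(7)]
reviewer_groups = pd.DataFrame(
    {"Group": [None] * len(emails)},
    index=pd.Index(emails, name="Email"),
)

rng = np.random.default_rng(0)  # local generator, an assumption of this sketch
shuffled = rng.permutation(np.array(emails))
half = len(shuffled) // 2  # integer division; the second half takes any odd one out

# Assign each half in a single .loc call instead of looping over rows.
reviewer_groups.loc[list(shuffled[:half]), "Group"] = "group1"
reviewer_groups.loc[list(shuffled[half:]), "Group"] = "group2"

sizes = reviewer_groups.groupby("Group").size()
```

With an odd number of reviewers the two groups differ in size by exactly one, so the split stays as even as possible.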
Write the reviewer groups to file.
reviewer_groups.loc[:, 'Group'].to_csv(os.path.join(cmtutils.nips_data_directory, "reviewer_groups.csv"))