# Generate Bidding Lists for Reviewers¶

### 11th June 2014 Neil D. Lawrence¶

This notebook loads in the TPMS scores and keyword similarities and allocates the highest similarity score matches to reviewers for bidding on.

In [ ]:
import cmtutils
import os
import pandas as pd
import re
import numpy as np


First things first, we need to get all the current information out of CMT: the external matching scores, the conflict information and the keyword overlap. We do this from Assignments & Conflicts > *** > Automatic Assignment Wizard, where *** is either reviewers or meta reviewers. Here's the link for meta-reviewers. Proceed through the wizard, putting in some values, then at the end click on Export Data for Custom Assignment. For setting things up for bidding you will need to select: Subject Areas: Paper and Meta-Reviewer, Toronto Paper Matching System and Conflicts. For setting things up for the final allocation you also need the bids.

### TPMS Reviewer Matching Scores¶

https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ManageAssignmentsExport.aspx?data=externalmatching&view=cs&format=tab&excludemetareviewer=1&serviceid=1

In [ ]:
# First we load in the external matching scores.

filename = '2014-06-19_externalMatchingScores.tsv'
filename = os.path.join(cmtutils.cmt_data_directory, filename)
affinity = pd.read_csv(filename, delimiter='\t', index_col='PaperID', na_values=['N/A']).fillna(0)
# Scale affinities to lie between 0 and 1.
affinity -= affinity.values.min()
affinity /= affinity.values.max()


### Paper Subject Areas¶

Now load in paper subject areas and group them by the Paper ID. This file is downloaded from:

https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ManageAssignmentsExport.aspx?data=subjectareas&view=cs&format=excel

In [ ]:
# Now we load in paper subject areas
filename = '2014-06-13_paperSubjectAreas.xls'
data = cmtutils.xl_read(filename=os.path.join(cmtutils.cmt_data_directory, filename), index='Selected Subject Area', dataframe=True, worksheet_number=1)
paper_subject = data.items.groupby(by=['Paper ID'])


### Reviewer Subject Areas¶

Load in reviewer (or meta reviewer) subject areas and group them by email. This file is downloaded from:

https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ManageAssignmentsExport.aspx?data=subjectareas&view=cr&format=excel

In [ ]:
# Now we load in (meta-)reviewer subject areas
filename = '2014-06-13_reviewerSubjectAreas.xls'
data = cmtutils.xl_read(filename=os.path.join(cmtutils.cmt_data_directory, filename), index='Selected Subject Area', dataframe=True, worksheet_number=1)
reviewer_subject = data.items.groupby(by=['Email'])


### Possible Assignments and Conflicts¶

The possible assignments are derived from the conflicts: they list the people each paper could be assigned to. These files are downloaded from:

https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ManageAssignmentsExport.aspx?data=possibleassignments&view=cs&format=tab&excludemetareviewer=1

https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ManageAssignmentsExport.aspx?data=conflicts&view=cs&format=tab

In [ ]:
if True: # Read from the TSV format CMT provides.
    filename = 'Conflicts.txt'
    with open(os.path.join(cmtutils.cmt_data_directory, filename)) as fin:
        rows = (line.strip().split('\t') for line in fin)
        conflicts_groups = {row[0]: row[1:] for row in rows}
    papers = conflicts_groups.keys()
    conflicts_by_reviewer = {}

    for paper in papers:
        for reviewer in conflicts_groups[paper]:
            if reviewer in conflicts_by_reviewer:
                conflicts_by_reviewer[reviewer].append(paper)
            else:
                conflicts_by_reviewer[reviewer] = [paper]
    conflicts_file = True
else:
    # And finally we load in 'possible assignments'
    filename = '2014-06-13_possibleAssignmentsByPaper.xls'
    data = cmtutils.xl_read(filename=os.path.join(cmtutils.nips_data_directory, filename), index='Paper ID', dataframe=True)
    possible_assignments = data.items
    # Extract the emails listed between parentheses in the export.
    regex = re.compile(r'\(([^)]*)\)')
    papers = possible_assignments.index
    conflicts_file = False
    #conflicts = conflicts.set_index('Reviewer/Meta-Reviewer')
    #conflicts_groups = conflicts.groupby('PaperID').groups


Compute a simple similarity based on subject overlap. The similarity is the number of overlapping keywords divided by the square root of the number of reviewer keywords multiplied by the square root of the number of paper keywords (i.e. a cosine similarity between the keyword indicator vectors). The 'None of the above' entry is removed as a term if it is present.

This actually turns out not to be a very sensible way of doing it. I was only just getting used to pandas when I wrote this. There's a more sensible (much faster) way of getting these similarities out in the reviewer calibration notebook.

In [ ]:
subject_sim = pd.DataFrame(np.zeros((len(paper_subject.groups), len(reviewer_subject.groups))),
                           index=paper_subject.groups, columns=reviewer_subject.groups)
for paper in paper_subject.groups:
    set_paper = set(paper_subject.groups[paper]) - set(['None of the above'])
    for reviewer in reviewer_subject.groups:
        set_reviewer = set(reviewer_subject.groups[reviewer]) - set(['None of the above'])
        if len(set_paper) > 0 and len(set_reviewer) > 0:
            norm = np.sqrt(len(set_paper))*np.sqrt(len(set_reviewer))
        else:
            norm = 1.  # don't normalise if either vector is all zeros!
        subject_sim.loc[paper, reviewer] = len(set_reviewer & set_paper)/norm
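The reviewer calibration notebook computes these similarities much faster. As a rough sketch (not the calibration notebook's actual code), the same cosine similarity can come from a single matrix product on keyword indicator matrices, shown here with toy subject areas and made-up paper IDs and reviewer emails:

```python
import numpy as np
import pandas as pd

# Toy indicator matrices: rows are papers/reviewers, columns are subject areas.
subjects = ['graphical models', 'kernels', 'deep learning']
P = pd.DataFrame([[1, 1, 0], [0, 1, 1]], index=[1, 2], columns=subjects)
R = pd.DataFrame([[1, 0, 1], [0, 1, 0]], index=['a@x', 'b@y'], columns=subjects)

# All keyword overlaps at once via a single matrix product.
overlap = P.values @ R.values.T
# Cosine-style normalisation by sqrt of each side's keyword count.
norm = np.sqrt(P.sum(axis=1).values)[:, None] * np.sqrt(R.sum(axis=1).values)[None, :]
norm[norm == 0] = 1.  # don't normalise empty keyword sets
subject_sim_fast = pd.DataFrame(overlap / norm, index=P.index, columns=R.index)
```

This replaces the double loop with two vectorised operations, which matters when there are thousands of papers and reviewers.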


Weight an $\alpha$ portion of the affinities and a $1-\alpha$ portion of the keyword similarities.

### Allocate to Top 40 High Scoring Reviewers¶

A little bit of background is needed here. At the time the code was written Corinna and I were struggling to get CMT to perform an allocation. It was across the weekend, so there was no support, and it turned out the scale of NIPS 2014 had broken a few different things. This caused me to start writing paper allocation code, within the space of a few days, without having much knowledge of the literature. This first piece of code simply allocates each paper to the top 40 highest scoring reviewers. It is superseded by the code that follows, which ranks the entire score matrix and starts by allocating the highest score in the matrix.

In [ ]:
alpha = 0.5
assignment = {}
all_reviewers = affinity.columns
for reviewer in all_reviewers:
    assignment[reviewer] = []
assignment_paper = {}

all_scores = (alpha*affinity + (1-alpha)*subject_sim)
min_vals = all_scores.min()
max_vals = all_scores.max()
normalise_scores = True

for paper_str in papers:
    paper = int(paper_str)
    if conflicts_file:
        reviewers = list(set(all_reviewers) - set(conflicts_groups[paper_str]))
    else:
        reviewers = regex.findall(possible_assignments['Assigned Meta-Reviewers'][paper])
        assert(len(reviewers)==int(possible_assignments['Number of Meta-Reviewers'][paper]))
    scores = (1-alpha)*subject_sim.loc[paper][reviewers]
    if paper in affinity.index:
        scores += alpha*affinity.loc[paper][reviewers]
    if normalise_scores:
        scores -= min_vals[reviewers]
        scores /= (max_vals-min_vals)[reviewers]
    scores = scores.sort_values(ascending=False)
    assignment_paper[paper] = scores[:40].index
    for reviewer in assignment_paper[paper]:
        assignment[reviewer].append(paper)


Use this code if you loaded in the conflicts file.

In [ ]:
all_scores = (alpha*affinity + (1-alpha)*subject_sim)
min_vals = all_scores.min()
max_vals = all_scores.max()
normalise_scores = True

In [ ]:
all_scores.index

In [ ]:
# Identify conflicts by setting their scores to -1
rank_scores = all_scores.copy()
for paper in conflicts_groups:
    rank_scores.loc[int(paper), conflicts_groups[paper]] = -1.

In [ ]:
paper


### Ranking All Scores¶

After some thought, this next piece of code was preferred. All the scores are ranked together, and papers are then allocated starting from the most similar paper-reviewer pair and working downwards.

In [ ]:
score_vec = rank_scores.reset_index()
score_vec = pd.melt(score_vec, id_vars=['index'])
#score_vec = score_vec[score_vec.value != -1.]
score_vec = score_vec[score_vec.value > 0.1]
score_vec = score_vec[pd.notnull(score_vec.value)]
score_vec.columns = ['PaperID', 'Email', 'Score']
score_vec = score_vec.sort_values(by='Score', ascending=False)
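The reset_index/melt step is easier to see on a toy example. Here a made-up 2x2 score matrix (hypothetical paper IDs and reviewer emails) goes through the same pipeline, using the modern `sort_values` in place of the 2014-era `sort_index(by=...)`:

```python
import pandas as pd

# Hypothetical score matrix: rows are paper IDs, columns reviewer emails.
rank_scores = pd.DataFrame([[0.9, 0.2], [0.05, 0.7]],
                           index=[101, 102], columns=['a@x', 'b@y'])

score_vec = rank_scores.reset_index()
score_vec = pd.melt(score_vec, id_vars=['index'])  # one row per (paper, reviewer)
score_vec.columns = ['PaperID', 'Email', 'Score']
score_vec = score_vec[score_vec.Score > 0.1]       # drop weak or conflicted pairs
score_vec = score_vec.sort_values(by='Score', ascending=False)
```

The result is a ranked list of candidate pairs, (101, a@x, 0.9), (102, b@y, 0.7), (101, b@y, 0.2), with the pair below the 0.1 threshold dropped.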

In [ ]:
paper_number_assigned = {}
reviewer_number_assigned = {}
max_number_paper = 17
max_number_reviewer = 25
assignment_paper = {}
assignment_reviewer = {}

for idx in score_vec.index:
    paper = str(score_vec['PaperID'][idx])
    if paper in paper_number_assigned:
        if paper_number_assigned[paper] >= max_number_paper:
            continue
    else:
        paper_number_assigned[paper] = 0

    reviewer = str(score_vec['Email'][idx])
    if reviewer in reviewer_number_assigned:
        if reviewer_number_assigned[reviewer] >= max_number_reviewer:
            continue
    else:
        reviewer_number_assigned[reviewer] = 0

    if paper in assignment_paper:
        assignment_paper[paper].append(reviewer)
    else:
        assignment_paper[paper] = [reviewer]

    if reviewer in assignment_reviewer:
        assignment_reviewer[reviewer].append(paper)
    else:
        assignment_reviewer[reviewer] = [paper]
    paper_number_assigned[paper] += 1
    reviewer_number_assigned[reviewer] += 1
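The loop above is a greedy matching under per-paper and per-reviewer caps. Stripped to its essentials (hypothetical IDs, `dict.get`/`setdefault` replacing the Python 2 `has_key` idiom) it amounts to:

```python
def greedy_assign(ranked_pairs, max_per_paper, max_per_reviewer):
    """Walk (paper, reviewer) pairs in descending score order,
    assigning each pair unless either side has hit its cap."""
    paper_count, reviewer_count = {}, {}
    assignment_paper = {}
    for paper, reviewer in ranked_pairs:
        if paper_count.get(paper, 0) >= max_per_paper:
            continue
        if reviewer_count.get(reviewer, 0) >= max_per_reviewer:
            continue
        assignment_paper.setdefault(paper, []).append(reviewer)
        paper_count[paper] = paper_count.get(paper, 0) + 1
        reviewer_count[reviewer] = reviewer_count.get(reviewer, 0) + 1
    return assignment_paper

# Toy list, already sorted by score. With a cap of one paper per
# reviewer, paper '2' never sees a free reviewer.
pairs = [('1', 'a@x'), ('1', 'b@y'), ('2', 'a@x'), ('1', 'c@z'), ('2', 'b@y')]
print(greedy_assign(pairs, max_per_paper=2, max_per_reviewer=1))
# → {'1': ['a@x', 'b@y']}
```

Greediness offers no global guarantee: once the high-scoring reviewers are used up, some papers get nothing, which is why the top-up pass in the next section is needed.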



### Find Reviewers with Fewer than 25 Papers¶

Now reviewers who haven't got a full allocation of 25 papers to rank are allocated a top-up of papers. In later runs of the allocation algorithm, papers were allocated to reviewers in batches (each reviewer allocated up to 5, then up to 10, then up to 20, etc.) to balance things a little more, but at this early stage, to get the bidding going, the allocation was done in this 'top up' style. Because the problems with CMT and the allocation steps were unforeseen, we felt quite a lot of time pressure at this point.

In [ ]:
all_papers = affinity.index
min_papers = 25
for reviewer in affinity.columns:
    if reviewer in reviewer_number_assigned:
        num_papers = reviewer_number_assigned[reviewer]
        if num_papers < min_papers:
            required_papers = min_papers - num_papers
        elif reviewer == '[email protected]':
            required_papers = 25
        else:
            continue
    else:
        required_papers = min_papers

    papers = list(set(all_papers) - set(conflicts_by_reviewer.get(reviewer, [])))
    scores = alpha*affinity.loc[papers][reviewer]
    if reviewer in subject_sim.columns:
        scores += (1-alpha)*subject_sim.loc[papers][reviewer]
    scores = scores.sort_values(ascending=False)
    # Note: the original cell is truncated here; the top required_papers
    # entries of scores were then added to the reviewer's allocation.


This bit of code writes the allocation to a file for sharing with Corinna, just for hand-checking that something sensible is going on.

In [ ]:
with open(os.path.join(cmtutils.nips_data_directory, 'reviewer_bidding_allocation.txt'), 'w') as f:
    for reviewer in assignment_reviewer:
        f.write('Reviewer ' + reviewer + '\n')
        f.write('\n')
        for paper in assignment_reviewer[reviewer]:
            f.write(str(paper) + " " + "https://cmt.research.microsoft.com/NIPS2014/Protected/Chair/ViewSubmissionDetails.aspx?paperId=" + str(paper) + '\n')
        f.write('\n')



This code writes the export file for CMT to load in the bidding allocation.

In [ ]:
with open(os.path.join(cmtutils.nips_data_directory, 'reviewer_bidding_allocation.tsv'), 'w') as f:
    # Note: despite the .tsv extension the fields here are ', '-separated.
    for reviewer in assignment_reviewer:
        for paper in assignment_reviewer[reviewer]:
            f.write(', '.join([reviewer, str(paper)]) + '\n')



This code is similar, but uses the CMT XML format, which they find easier to load in.

In [ ]:
with open(os.path.join(cmtutils.nips_data_directory, 'reviewer_assignments.xml'), 'w') as f:
    f.write('<assignments>\n')
    for paper in assignment_paper:
        f.write('  <submission submissionId="' + paper + '">\n')
        for reviewer in assignment_paper[paper]:
            f.write('    <reviewer email="' + reviewer + '"/>\n')
        f.write('  </submission>\n')
    f.write('</assignments>\n')
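Hand-assembling the XML works here because the paper IDs and reviewer emails are plain ASCII, but an attribute value containing &, < or a quote would make the file invalid. A sketch of the same export (same element names, built as a string rather than written to disk) using the standard library's `quoteattr` to escape attribute values:

```python
from xml.sax.saxutils import quoteattr

def assignments_xml(assignment_paper):
    # Build the same <assignments> document with properly escaped attributes.
    lines = ['<assignments>']
    for paper, reviewers in assignment_paper.items():
        lines.append('  <submission submissionId=%s>' % quoteattr(str(paper)))
        for reviewer in reviewers:
            lines.append('    <reviewer email=%s/>' % quoteattr(reviewer))
        lines.append('  </submission>')
    lines.append('</assignments>')
    return '\n'.join(lines)

# An ampersand in an address is escaped rather than corrupting the file.
print(assignments_xml({'1': ['a&b@x']}))
```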

In [ ]:
with open(os.path.join(cmtutils.nips_data_directory, 'additional_reviewer_bidding_allocation.tsv'), 'w') as f:
    # Note: this cell appears truncated; the loop over the additional
    # assignments is missing, so only the (reviewer, paper) pair left
    # over from earlier cells is written.
    f.write(', '.join([reviewer, str(paper)]) + '\n')

In [ ]:
str_val = ''
if len(str_val) > 0:
    str_val += ';' + reviewer
else:
    str_val = reviewer
print(str_val)

In [ ]:
len(set(additional_reviewers) | set(assignment_reviewer))


### Finding Extra Reviewers¶

Some reviewers complained that they weren't seeing enough papers in their area. Most of these reviewers had many secondary subject areas. The similarity measure used above (chosen mainly for speed) originally didn't weight the primary subject area differently from the secondary areas, which meant reviewers with many secondary areas were getting a lot of papers outside their core area. In this next section of code we add additional papers to reviewers for bidding.
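One possible fix, not what was done at the time, is to weight the primary subject area more heavily than the secondary ones in the overlap. A sketch with a made-up weight of 2 for the primary area and toy keyword sets (the function name and weighting are illustrative assumptions, not cmtutils code):

```python
import numpy as np

def weighted_similarity(paper_subjects, reviewer_primary, reviewer_secondary,
                        primary_weight=2.0):
    """Overlap similarity where the reviewer's primary subject area counts
    primary_weight times as much as each secondary area."""
    weights = {reviewer_primary: primary_weight}
    for subject in reviewer_secondary:
        weights.setdefault(subject, 1.0)
    overlap = sum(weights.get(subject, 0.0) for subject in paper_subjects)
    norm = np.sqrt(len(paper_subjects)) * np.sqrt(sum(weights.values()))
    return overlap / norm if norm > 0 else 0.0

# A paper matching the reviewer's primary area now scores higher than
# one matching only a secondary area.
hi = weighted_similarity({'kernels'}, 'kernels', {'deep learning'})
lo = weighted_similarity({'deep learning'}, 'kernels', {'deep learning'})
```

With equal weights the two papers above would score identically; the extra weight pushes papers in the reviewer's core area up the ranking.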

In [ ]:
alpha = 0.5
all_papers = affinity.index
min_papers = 25

for reviewer in reviewers:
    required_papers = 25
    if reviewer in conflicts_by_reviewer:
        papers = list(set(all_papers) - set(conflicts_by_reviewer[reviewer]))
    else:
        papers = list(all_papers)
    scores = (1-alpha)*subject_sim.loc[papers][reviewer]
    if reviewer in affinity.columns:
        scores += alpha*affinity.loc[papers][reviewer]
    scores = scores.sort_values(ascending=False)

f = open(os.path.join(cmtutils.nips_data_directory, 'further_additional_reviewer_bidding_allocation.tsv'), 'w')