Paper Allocation

July 2014 Neil D. Lawrence

This notebook performs allocation of papers to reviewers and area chairs. This is done by computing similarities between reviewers and papers and then allocating according to similarity. Similarities are computed based on TPMS scores and primary/secondary subject areas.

In [9]:
import cmtutils
import os
import pandas as pd
import re
import numpy as np
import sqlite3

First things first, we need to get all the current information out of CMT. That includes: external matching scores, conflict information, keyword overlap. We do this from Assignments & Conflicts > *** > Automatic Assignment Wizard where *** is either reviewers or meta reviewers. Here's the link for meta-reviewers. Proceed through the wizard putting in some values. Then at the end click on Export Data for Custom Assignment. You will need to select: Subject Areas: Paper and Meta-Reviewer, Toronto Paper Matching System and Conflicts for setting things up for bidding. For setting things up for final allocation you also need bids and the reviewer quotas will be needed from Reviewer Quotas. Download them all to the CMT data directory.

In [10]:
submissions = cmtutils.papers()
reviewers = cmtutils.reviewers()
Loaded Papers.
Loaded Paper Subjects.
Loaded Users.
Loaded Reviewer Subjects.

Computing the Similarities

Now we have loaded in the paper and reviewer information, we can create a similarities structure from cmtutils that takes in the information from the submissions and the reviewers and computes the similarities.

In [11]:
similarities = cmtutils.similarities(submissions, reviewers) # Forms a new similarities structure
Loaded TPMS scores
Loaded bids

Computing the Assignment

Now we have the similarities computed from the bids, subject similarities and TPMS scores we can form an assignment. The cmtutils.assignment object loads in Shotgun clusters, conflicts and quotas in preparation for the assignment. These form the constraints which the assignment must satisfy. Note that the indexing is all done on emails, so the emails in the conflicts and quota files must match the user emails loaded in above.

The constraints are that 'shotgun clusters' must be allocated to the same area chair. Then 'conflicts' cannot be allocated and the quota gives the maximum to be allocated to any reviewer.

Now we set the group of users we want to assign to. First we set perform the meta-reviewers assignment. We can select the meta reviewers by the update_group method with those users for whom IsMetaReviewer is set to Yes. The a.score_quantile sets the percentile of similarity scores that are retained for ranking. Scores below that value are discared when ranking to keep weak matches all similarly bad.

In [54]:
a.clear_assignment()
a.prep_assignment()
a.update_group(group=similarities.reviewers.users['IsMetaReviewer']=='Yes')
a.score_quantile=0.7
a.rank_similarity_scores(similarities=similarities)
Loaded Quota.
Loaded shotgun clusters.
Loaded Conflicts
Allocating to 93 users.
Retaining scores greater than 70.0 percentile which is -inf

Staggered Allocation

Now stagger the paper allocation. It's difficult to normalise similarities across reviewers, so to avoid a greedy affect where a reviewer or area chair has very high similarities and gets all the papers first, we stagger the allocation with an increasing quota. Here we first allocate 5 then 9, then 13, then 17 and then a maximum of 20 papers to each area chair. This keeps the allocation more balanced.

Single Pass Allocation

The allocate method only goes through the vector of scores (which is formed from the similarity matrix) once. It doesn't run until the allocation process has converged. That means that after the loops it is not necessarily the case that all papers have been allocated. We will check that below.

In [55]:
for max_papers in [5, 9, 13, 17, 20]:
    a.max_papers = max_papers
    a.max_reviewers = 1
    a.allocate(reviewer_type='metareviewer')
In [56]:
print "We have", a.unassigned_reviewers(reviewers, reviewer_type='metareviewer').sum(), "unassigned area chairs."
print "We have", a.unassigned_papers(submissions, reviewer_type='metareviewer').sum(), "unassigned papers."
We have 93 unassigned area chairs.
We have 1773 unassigned papers.

If all papers are allocated (0 unassigned papers) then we can write out the assignment for import into CMT.

Reviewer Assignment

Now we've completed the area chair assignment we can turn to the reviewers. One thing that authors worry about is the level of expertise of their reviewing set. To try and ensure the expertise was evenly distributed across the papers we made use of a two stage reviewer allocation. Chris Hiestand provided us with information about how many previously published NIPS papers each reviewer has had. We used this to split the reviewers into two sets, those with more than one paper since NIPS 2007, and those with one or less papers.

Experienced Reviewers

Those reviewers that have more than one NIPS paper since 2007 were allocated to papers first. The aim was to ensure that all papers had at least one experienced reviewer.

We set the max_papers to be allocated to each reviewer to 4 and we set max_reviewers to be allocated to 1 per paper (we want at least 1 expert per paper at this point).

Before allocating these reviewers, the first thing to do is to remove the Shotgun cluster constraint, because we can't fulfill the constraints that Shotgun clusters impose (that they should share the same reviewers) at the reviewer level. Only at the meta-reviewer level.

In [58]:
a.shotgun_clusters = [] # clear shotguns for reviewer allocation.
a.update_group(group=(reviewers.users['PapersSince2007']>1) 
               & (reviewers.users['IsReviewer']=='Yes') 
               & (reviewers.users['IsMetaReviewer']=='No'))
a.score_quantile=0.5
a.rank_similarity_scores(similarities)
a.max_papers = 4
a.max_reviewers = 1
a.allocate(reviewer_type='reviewer')
Allocating to 495 users.
Retaining scores greater than 50.0 percentile which is 2.66975355153

Now we have a preliminary allocation to the expert reviewers (area chairs are able to modify this allocation before we distribute it to the reviewers). We now need to top up this allocation using the other reviewers (who don't have more than one NIPS paper since 2007). The first step is to update the group for allocation to contain all reviewers but no meta reviewers. Then we can allocate to papers. We've updated the maximum number of reviewers to allocate per paper to 3.

In [60]:
# Add allocations of all reviewers up to three per paper
a.update_group(group=(reviewers.users['IsReviewer']=='Yes') 
               & (reviewers.users['IsMetaReviewer']=='No'))
a.rank_similarity_scores(similarities)
a.max_reviewers = 3
a.allocate(reviewer_type='reviewer')
Allocating to 1444 users.
Retaining scores greater than 50.0 percentile which is 2.76079001358

Now we need to check what we have left in the system. The allocation process doesn't guarantee that any minimum constraints on reviewer or paper allocation are fulfilled. It could be run again, to increase assignment, but there is a balance. If a paper is difficult to allocate, it would seem sensible to have a human pair of eyes look at it. For this reason we may want to leave the paper with some missing reviewers. But to monitor the situtation we can print how many reviewers we have unassigned. We can also cross check that we don't have unassigned papers (the latter would happen if there aren't enough reviewers in the system to match the papers, the former happens if there are too more reviewers than are needed to review the papers).

In [23]:
print "We have", a.unassigned_reviewers(reviewers).sum(), "unassigned reviewers."
print "We have", a.unassigned_papers(submissions).sum(), "unassigned papers."
We have 201 unassigned reviewers.
We have 123 unassigned papers.

If we are happy with the numbers we can write the assignment for import to CMT.

We can check the quality of an allocation by computing the product between the allocation and the similarities.

In [26]:
a.prod(similarities)
Out[26]:
2964.5881687163892

The allocation changes over time (from area chairs changing it, and from extra reviewers being required). In this notebook we show how that can be monitored so that reviewers can be notified.

In [24]:
a.write(reviewer_type='reviewer')