The Condorcet-Schulze method¶

The Schulze method is a refinemnet of the original Condorcet methods that are used for deciding the winning candidate of an election in which all the voters can express a number of different preferences.

To read all about the method we have been using, have a look at Wikipedia.

How to use the condorcet module¶

Create a dataframe with the ordered poreferences for each voter. The first column shall be the first preference for the i-th voter.

The entries of each cell are the names of the candidates.

Each voter can vote for all or part of the candidates. The non voted candidates are considered to be less preferred than the voted ones.

In [1]:

import condorcet
import numpy as np
import pandas as pd

Create or import your dataframe¶

Make sure that the structure of the dataframe is a in the example below: each row represents the preferences of each voter (there are six voters in the exaple below). The names of the candidates are a, b, c, d.

In [2]:

# create dataframe
table = np.asarray([
    ["b","c","a"],
    ["c","a","b"],
    ["c","a","b"],
    ["b","a","c"],
    ["d","a","c"],
    ["d","a","c"]
    ])
df = pd.DataFrame(table, columns = ["first_preference","second_preference","third_preference"])
df

Out[2]:

	first_preference	second_preference	third_preference
0	b	c	a
1	c	a	b
2	c	a	b
3	b	a	c
4	d	a	c
5	d	a	c

Run on the dataframe¶

Just run the condorcet.compute_ranks method on the dataframe directly and the result will appear as a list of lists.

In the example below, the output is [['a', 'c'], ['b'], ['d']], which means that there are two tie candidates in the first place, a and c, one candidate in the second place b and the least preferred candidate is d.

In [3]:

# run the classification
condorcet.compute_ranks(df)

Out[3]:

[['a', 'c'], ['b'], ['d']]

Dive deeper¶

You can extract the d[V,W] and p[V,W] matrices with the methods:

condorcet._compute_d(weighted_ranks, candidate_names)
condorcet._compute_p(dmat, candidate_names)

To extract the candidate names and the weighted ranks to be input to the above methods, please use:

weighted_ranks = condorcet.weighted_ranks_from_df(df)
candidate_names = condorcet.candidate_names_from_df(df)

Test on Wikipedia example¶

We test the method on wikipedia example to make sure that the algorithm works properly.

In [4]:

# create dataframe
df = pd.DataFrame()

# populate with wikipedia preferences
df["first_column"] = 5*['A']+5*['A']+8*['B']+3*['C']+7*['C']+2*['C']+7*['D']+8*['E']
df["second_column"] = 5*['C']+5*['D']+8*['E']+3*['A']+7*['A']+2*['B']+7*['C']+8*['B']
df["third_column"] = 5*['B']+5*['E']+8*['D']+3*['B']+7*['E']+2*['A']+7*['E']+8*['A']
df["fourth_column"] = 5*['E']+5*['C']+8*['A']+3*['E']+7*['B']+2*['D']+7*['B']+8*['D']

# we are not putting explicitly the last column, as it will be automatically inferred by the algorithm
df

Out[4]:

	first_column	second_column	third_column	fourth_column
0	A	C	B	E
1	A	C	B	E
2	A	C	B	E
3	A	C	B	E
4	A	C	B	E
5	A	D	E	C
6	A	D	E	C
7	A	D	E	C
8	A	D	E	C
9	A	D	E	C
10	B	E	D	A
11	B	E	D	A
12	B	E	D	A
13	B	E	D	A
14	B	E	D	A
15	B	E	D	A
16	B	E	D	A
17	B	E	D	A
18	C	A	B	E
19	C	A	B	E
20	C	A	B	E
21	C	A	E	B
22	C	A	E	B
23	C	A	E	B
24	C	A	E	B
25	C	A	E	B
26	C	A	E	B
27	C	A	E	B
28	C	B	A	D
29	C	B	A	D
30	D	C	E	B
31	D	C	E	B
32	D	C	E	B
33	D	C	E	B
34	D	C	E	B
35	D	C	E	B
36	D	C	E	B
37	E	B	A	D
38	E	B	A	D
39	E	B	A	D
40	E	B	A	D
41	E	B	A	D
42	E	B	A	D
43	E	B	A	D
44	E	B	A	D

In [5]:

# run the classification
condorcet.compute_ranks(df)

Out[5]:

[['E'], ['A'], ['C'], ['B'], ['D']]

In [6]:

# checking the d and p matrix
candidate_names = condorcet.candidate_names_from_df(df)
weighted_ranks = condorcet.weighted_ranks_from_df(df)
d = condorcet._compute_d(weighted_ranks, candidate_names)
p = condorcet._compute_p(d, candidate_names)
print("d matrix: ", d)
print("p matrix: ", p)

d matrix:  defaultdict(<class 'int'>, {('A', 'C'): 26, ('A', 'B'): 20, ('A', 'E'): 22, ('A', 'D'): 30, ('C', 'B'): 29, ('C', 'E'): 24, ('C', 'D'): 17, ('B', 'E'): 18, ('B', 'D'): 33, ('E', 'D'): 31, ('D', 'E'): 14, ('D', 'C'): 28, ('D', 'B'): 12, ('E', 'C'): 21, ('E', 'B'): 27, ('B', 'A'): 25, ('B', 'C'): 16, ('E', 'A'): 23, ('D', 'A'): 15, ('C', 'A'): 19})
p matrix:  {('A', 'B'): 28, ('A', 'C'): 28, ('A', 'D'): 30, ('A', 'E'): 24, ('B', 'A'): 25, ('B', 'C'): 28, ('B', 'D'): 33, ('B', 'E'): 24, ('C', 'A'): 25, ('C', 'B'): 29, ('C', 'D'): 29, ('C', 'E'): 24, ('D', 'A'): 25, ('D', 'B'): 28, ('D', 'C'): 28, ('D', 'E'): 24, ('E', 'A'): 25, ('E', 'B'): 28, ('E', 'C'): 28, ('E', 'D'): 31}

Test passed!¶

The algorithmis sound and we were able to reproduce exaclty Wikipedia's example.

Application to the gtda-challenge-2020¶

We first import the csv file and then we run the method

In [7]:

# import file as dataframe

df_temp = pd.read_csv("data/Evaluation of Jupyter Notebooks.csv")
df = df_temp[df_temp.columns[-3:]]
df

Out[7]:

	Who would you choose as winner?	Who would you choose for second place?	Who would you choose for third place?
0	marabernardi	rorondre	lordgrilo
1	Antonio-Leitao	lordgrilo	marabernardi
2	marabernardi	rorondre	kazuyamagiwa
3	lordgrilo	Antonio-Leitao	marabernardi
4	kazuyamagiwa	filco306	rorondre
5	filco306	rorondre	marabernardi
6	rorondre	filco306	marabernardi
7	Antonio-Leitao	filco306	rorondre
8	marabernardi	filco306	lordgrilo
9	rorondre	marabernardi	lordgrilo
10	marabernardi	kazuyamagiwa	rorondre
11	marabernardi	kazuyamagiwa	Antonio-Leitao
12	filco306	lordgrilo	rorondre

In [8]:

# run the final classification

condorcet.compute_ranks(df)

Out[8]:

[['marabernardi'],
 ['filco306', 'rorondre'],
 ['lordgrilo'],
 ['kazuyamagiwa'],
 ['Antonio-Leitao']]

Congratulations!!¶

Congratulations to all the candidates that submitted a notebook to the challenge! All of the notebooks were original, well structured and clearly explained. Thank you for your efforts!

I truly hope you enjoyed the challenge and do not hesitate to share your notebooks with your friends and colleagues!

The final classification is:¶

1st place, marabernardi
2nd place, rorondre
2nd place ex aequo, filco306
4th place, lordgrilo
5th place, kazuyamagiwa
6th place, Antonio-Leitao

Bravo!! ;)

In [ ]:

	first_column	second_column	third_column	fourth_column
0	A	C	B	E
1	A	C	B	E
2	A	C	B	E
3	A	C	B	E
4	A	C	B	E
5	A	D	E	C
6	A	D	E	C
7	A	D	E	C
8	A	D	E	C
9	A	D	E	C
10	B	E	D	A
11	B	E	D	A
12	B	E	D	A
13	B	E	D	A
14	B	E	D	A
15	B	E	D	A
16	B	E	D	A
17	B	E	D	A
18	C	A	B	E
19	C	A	B	E
20	C	A	B	E
21	C	A	E	B
22	C	A	E	B
23	C	A	E	B
24	C	A	E	B
25	C	A	E	B
26	C	A	E	B
27	C	A	E	B
28	C	B	A	D
29	C	B	A	D
30	D	C	E	B
31	D	C	E	B
32	D	C	E	B
33	D	C	E	B
34	D	C	E	B
35	D	C	E	B
36	D	C	E	B
37	E	B	A	D
38	E	B	A	D
39	E	B	A	D
40	E	B	A	D
41	E	B	A	D
42	E	B	A	D
43	E	B	A	D
44	E	B	A	D

	first_column	second_column	third_column	fourth_column
0	A	C	B	E
1	A	C	B	E
2	A	C	B	E
3	A	C	B	E
4	A	C	B	E
5	A	D	E	C
6	A	D	E	C
7	A	D	E	C
8	A	D	E	C
9	A	D	E	C
10	B	E	D	A
11	B	E	D	A
12	B	E	D	A
13	B	E	D	A
14	B	E	D	A
15	B	E	D	A
16	B	E	D	A
17	B	E	D	A
18	C	A	B	E
19	C	A	B	E
20	C	A	B	E
21	C	A	E	B
22	C	A	E	B
23	C	A	E	B
24	C	A	E	B
25	C	A	E	B
26	C	A	E	B
27	C	A	E	B
28	C	B	A	D
29	C	B	A	D
30	D	C	E	B
31	D	C	E	B
32	D	C	E	B
33	D	C	E	B
34	D	C	E	B
35	D	C	E	B
36	D	C	E	B
37	E	B	A	D
38	E	B	A	D
39	E	B	A	D
40	E	B	A	D
41	E	B	A	D
42	E	B	A	D
43	E	B	A	D
44	E	B	A	D

	first_column	second_column	third_column	fourth_column
0	A	C	B	E
1	A	C	B	E
2	A	C	B	E
3	A	C	B	E
4	A	C	B	E
5	A	D	E	C
6	A	D	E	C
7	A	D	E	C
8	A	D	E	C
9	A	D	E	C
10	B	E	D	A
11	B	E	D	A
12	B	E	D	A
13	B	E	D	A
14	B	E	D	A
15	B	E	D	A
16	B	E	D	A
17	B	E	D	A
18	C	A	B	E
19	C	A	B	E
20	C	A	B	E
21	C	A	E	B
22	C	A	E	B
23	C	A	E	B
24	C	A	E	B
25	C	A	E	B
26	C	A	E	B
27	C	A	E	B
28	C	B	A	D
29	C	B	A	D
30	D	C	E	B
31	D	C	E	B
32	D	C	E	B
33	D	C	E	B
34	D	C	E	B
35	D	C	E	B
36	D	C	E	B
37	E	B	A	D
38	E	B	A	D
39	E	B	A	D
40	E	B	A	D
41	E	B	A	D
42	E	B	A	D
43	E	B	A	D
44	E	B	A	D