In this article, I try to visualize the last four semesters in my degree as a directed graph with courses as nodes and prerequisite relations as edges. Will my plan work, or am I missing prerequisites? NetworkX and yEd are used for creating and displaying the graph.

The problem¶

I want to graduate in at least two years which means I need to pass a certain set of courses in a specific sequence.

There are a lot of complications:

Not all courses are available every semester; certain courses can only be taken in the Fall term, others only in the Winter.
Many courses have prerequisites, often the prerequisites have prerequisites.
There are electives and a large library of possibilities.

Definitions¶

According to uOttawa's sheet on course enrolment terminology, the academic year is divided into three "terms":

Fall (September to December)
Winter (January to April)
Spring-Summer (May to August)

Courses are encoded in the following way:

Each course is designated by a seven-character code made up of three letters and four digits. The three letters stand for the subject. For example, CRM stands for criminology. The first digit is for the year of study (1000-level courses — those that start with 1 — are first year courses)...

Collecting the data¶

The University of Ottawa does not make this easy. Course information is spread out over at least three separate databases:

The course catalogue. Contains most of the information I want, except for scheduling and availability. No search engine, so I had to look up a course code page and CTRL-F.
The course timetable. Has a search engine (a clumsy one). Provides schedules for courses in the next two terms. There is no way to know if a course will be held in a term later than the next two, so I have to hope that the pattern doesn't change much (i.e. since SEG 2105 is only held in the Fall term this year, it will only be held in the Fall next year). Does not provide any prerequisite information! I feel bad for any one who tries to use this as their sole information source...
The enrollment system. I can't link to this as it can only be accessed through "uoZone" (uOttawa's internal system). As far as I know, this is the only place where course availability and number of available seats are given. This software is atrocious. Instead of following the universally agreed-upon course code system, courses are assigned a seemingly random sequence of numbers and letters. I have 1000 other complaints, but I'll spare you the rant.

After hours pouring over the program requirements and the course catalogue, I had a list of candidates and a possible schedule. But will it work? Have I missed a prerequisite or planned to take a course in the Fall that is only available in the Winter? This stuff is too complicated...I need a visual.

Importing the data¶

I created a spreadsheet of all the courses I'm planning on taking and furnished it with various information about them. The most confusing aspect is the prerequisite chain, so I'd like to visualize this as a directed graph with courses as nodes and prerequisite links as edges.

In [1]:

import pandas as pd

In [89]:

df = pd.read_csv('course_schedule.csv')
df

Out[89]:

	subject_code	digit_code	course_code	course_name	prerequisites	expected_term	expected_year	required	term_availability
0	CSI	2110	CSI2110	Data Structures and Algorithms	ITI 1121, MAT 1348	F	2017	True	F, W
1	SEG	2105	SEG2105	Introduction to Software Engineering	ITI 1121	F	2017	True	F
2	PSY	4130	PSY4130	History and Systems of Psychology	NaN	F	2017	False	F, W
3	MAT	1320	MAT1320	Calculus I	NaN	F	2017	False	S, F, W
4	CSI	2101	CSI2101	Discrete Structures	MAT 1348	W	2018	True	W
5	CSI	2911	CSI2911	Professional Practice in Computing	NaN	W	2018	True	W
6	CSI	2132	CSI2132	Databases I	CSI 2110	W	2018	True	W
7	MAT	1322	MAT1322	Calculus II	MAT 1320	W	2018	False	S, F, W
8	CSI	2120	CSI2120	Programming Paradigms	CSI 2110	W	2018	True	W
9	CSI	3105	CSI3105	Design and Analysis of Algorithms	CSI 2110, CSI 2101	F	2018	True	F
10	CSI	3120	CSI3120	Programming Language Concepts	CSI 2101, CSI 2120	F	2018	False	F
11	SEG	3125	SEG3125	Analysis and Design of User Interfaces	SEG 2105	W	2019	False	S, W
12	CSI	4130	CSI4130	Computer Graphics	CSI 3105, MAT 1341	W	2019	False	W
13	CSI	3140	CSI3140	WWW Structures, Techniques and Standards	CSI 2110, CSI 2132	W	2019	False	S, W
14	CSI	3130	CSI3130	Databases II	CSI 2132	F	2018	False	F

Create list of nodes¶

Each course_code will be a node in the directed graph.

In [90]:

df.columns

Out[90]:

Index(['subject_code', 'digit_code', 'course_code', 'course_name',
       'prerequisites', 'expected_term', 'expected_year', 'required',
       'term_availability'],
      dtype='object')

In [91]:

courses = df['course_code']
courses

Out[91]:

0     CSI2110
1     SEG2105
2     PSY4130
3     MAT1320
4     CSI2101
5     CSI2911
6     CSI2132
7     MAT1322
8     CSI2120
9     CSI3105
10    CSI3120
11    SEG3125
12    CSI4130
13    CSI3140
14    CSI3130
Name: course_code, dtype: object

Create list of edges¶

Nodes are linked by prerequisite relations. For example, MAT1320 -> MAT1322 (because taking Calculus II requires having already passed Calculus I).

In [92]:

prerequisites = df['prerequisites']
prerequisites

Out[92]:

0      ITI 1121, MAT 1348
1                ITI 1121
2                     NaN
3                     NaN
4                MAT 1348
5                     NaN
6                CSI 2110
7                MAT 1320
8                CSI 2110
9     CSI 2110, CSI 2101 
10     CSI 2101, CSI 2120
11               SEG 2105
12     CSI 3105, MAT 1341
13     CSI 2110, CSI 2132
14               CSI 2132
Name: prerequisites, dtype: object

Weird Unicode thing¶

I got the prerequisites by copy/pasting from the course catalogue, but some of them have extra characters.

In [121]:

# Copy and pasted straight from my spreadsheet
weird_string = " ITI 1121, MAT 1348"
repr(weird_string)

Out[121]:

"'\\xa0ITI\\xa01121,\\xa0MAT\\xa01348'"

These characters are invisible when printed:

In [122]:

print(weird_string)

 ITI 1121, MAT 1348

According to samwize on StackOverflow, \xa0 is "actually non-breaking space in Latin1 (ISO 8859-1), also chr(160)". I don't really understand what that means, but I'm guessing \xa0 is some kind of substitute for a space.

In any case, keeping these characters made for ugly output so I chose to remove them. StackOverflow suggests unicodedata.normalize(). The documentation is a bit vague on what the different "forms" mean ("NFC", "NFKC", "NFD", and "NFKD") and I don't feel like researching this right now so...TRIAL AND ERROR IT IS!

In [124]:

import unicodedata

In [128]:

unicode_forms = ["NFC", "NFKC", "NFD", "NFKD"]

for form in unicode_forms:
    rep = repr(unicodedata.normalize(form, weird_string))
    print(f"{form}: {rep}")

NFC: '\xa0ITI\xa01121,\xa0MAT\xa01348'
NFKC: ' ITI 1121, MAT 1348'
NFD: '\xa0ITI\xa01121,\xa0MAT\xa01348'
NFKD: ' ITI 1121, MAT 1348'

NFKC and NFKD look good. I'll pick the former for no good reason.

In [129]:

def course_formatter2(course):
    '''(str) -> str
    Takes a course code and removes unicode (\xa0) character as well as commas
    and spaces.'''
    return unicodedata.normalize("NFKC", course).replace(',', '').replace(' ', '')

Why course_formatter2()? Because I used course_formatter() before looking into the problem more closely, it worked OK.

In [132]:

def course_formatter(course):
    '''(str) -> str
    Takes a course code and removes unicode (\xa0) character as well as commas
    and spaces.'''
    return course.replace(u'\xa0', '').replace(',', '').replace(' ', '')

In [131]:

# list of tuples
edges = []

for i, relation in enumerate(prerequisites):
    # Ignore courses with no prerequisites
    if not pd.isnull(relation):
        rs = relation.split(',')
        tmp = []
        for r in rs:
            tmp.append((courses[i], course_formatter2(r)))
        edges.append(tmp)

edges

Out[131]:

[[('CSI2110', 'ITI1121'), ('CSI2110', 'MAT1348')],
 [('SEG2105', 'ITI1121')],
 [('CSI2101', 'MAT1348')],
 [('CSI2132', 'CSI2110')],
 [('MAT1322', 'MAT1320')],
 [('CSI2120', 'CSI2110')],
 [('CSI3105', 'CSI2110'), ('CSI3105', 'CSI2101')],
 [('CSI3120', 'CSI2101'), ('CSI3120', 'CSI2120')],
 [('SEG3125', 'SEG2105')],
 [('CSI4130', 'CSI3105'), ('CSI4130', 'MAT1341')],
 [('CSI3140', 'CSI2110'), ('CSI3140', 'CSI2132')],
 [('CSI3130', 'CSI2132')]]

This is a bit confusing because it looks like CSI2110 -> ITI1121 (when it is the other way around).

In [133]:

# The same, but swapping the order
edges = []

for i, relation in enumerate(prerequisites):
    # Ignore courses with no prerequisites
    if not pd.isnull(relation):
        rs = relation.split(',')
        tmp = []
        for r in rs:
            tmp.append((course_formatter(r), courses[i]))
        edges.append(tmp)

edges

Out[133]:

[[('ITI1121', 'CSI2110'), ('MAT1348', 'CSI2110')],
 [('ITI1121', 'SEG2105')],
 [('MAT1348', 'CSI2101')],
 [('CSI2110', 'CSI2132')],
 [('MAT1320', 'MAT1322')],
 [('CSI2110', 'CSI2120')],
 [('CSI2110', 'CSI3105'), ('CSI2101', 'CSI3105')],
 [('CSI2101', 'CSI3120'), ('CSI2120', 'CSI3120')],
 [('SEG2105', 'SEG3125')],
 [('CSI3105', 'CSI4130'), ('MAT1341', 'CSI4130')],
 [('CSI2110', 'CSI3140'), ('CSI2132', 'CSI3140')],
 [('CSI2132', 'CSI3130')]]

Create DiGraph¶

With the NetworkX library.

In [134]:

import networkx as nx

In [135]:

G = nx.DiGraph()

Adding nodes and edges¶

In [136]:

G.add_nodes_from(courses)

In [137]:

for edge in edges:
    G.add_edges_from(edge)

Drawing the graph?¶

NetworkX provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. In the future, graph visualization functionality may be removed from NetworkX or only available as an add-on package. - Documentation for NetworkX

I'm going to try it anyway.

In [100]:

import matplotlib.pyplot as pl

In [101]:

%matplotlib inline

In [110]:

# Hide deprecation warnings
import warnings
warnings.filterwarnings('ignore')

In [111]:

nx.draw(G)

In [112]:

nx.draw_circular(G)

In [104]:

nx.draw_networkx(G)

Kinda cool, but I think yEd might be better for producing a finished graphic.

Export to GraphML¶

NetworkX can export to a lot of different formats, but I chose GraphML because yEd also supports it.

In [105]:

nx.write_graphml(G, "course_prerequisites.graphml")

This exported the nodes and edges, but the labels are missing! Google led me to this blog which had some good advice.

In [116]:

# Give nodes labels
for node in G.nodes():
    G.node[node]['label'] = node

In [107]:

# Export again
nx.write_graphml(G, "course_prerequisites.graphml")

The were exported, but yEd still required some tweaking to get them to display. The blog post I linked to earlier explains it (with screenshots!). Here is the critical part:

"We need to map the "Label" property imported from the GraphML file onto the internal property that yEd uses for labels. To do this, click Edit, Properties Mapper. Click the little plus sign under Configurations, and then 'New Configuration for Nodes.' Now click the plus sign next to 'Mappings'. If you're lucky, yEd should figure out what you're trying to do, and automatically select the right mapping." - Berzan

Finished graph¶

Notes:

I used yEd's "hierarchy" layout.
Courses I've already taken are color coded green.
There are four groups, representing the courses I intend to take for a given term.
A few arrows were repositioned (by hand) to make the group labels easier to read.

Interpretation¶

Taking the courses in this order should ensure prerequisites are always satisfied.

There are a few other interesting takeaways:

CSI 2110 is a prerequisite for four courses. Failing to acquire it in the Fall of 2017 will severely screw up my schedule.
PSY 4130 and CSI 2911 are not prerequisites for any course, so they can be acquired at any time.
MAT 1320 and MAT 1322 are a closed system. I'm a little surprised MAT 1322 is not a prerequisite for anything.

Also, I should think about how to best distribute my remaining courses. There is a chance of graduating in three terms instead of four if I can find a replacement for CSI4130 and take more courses in Fall 2018 (or over the summer). Reducing to three semesters may require taking less desirable courses but it would save me a lot of time and money.

If I do end up taking four semesters, I could shift CSI 2911 to a later semester to even the course load. Or I could keep it as it is and plan to work part-time. Scholarships and government grants usually require students to be "full-time", so I can't drop below three courses (the minimum for "full-time" status at uOttawa).

Weaknesses¶

The biggest weakness is the lack of definite course/semester availability after Winter 2018. I am just guessing that these courses will be available during these semesters. There are other issues, but this is the main one.

Data collection was disappointingly manual. The University of Ottawa simply does not provide good access to course information. Scraping is an option, but a frustrating one; the course timetable server is so old that Python's Request library cannot communicate with it [1] and uOttawa's HTML is not easy to parse. Manual collection is feasible, but doesn't scale and is obviously not convenient.

Footnotes¶

[1] - IRC, the issue I encountered was SSL handshake failure. The newest cipher protocol supported by the server had already been deprecated by Requests. While it was possible to use an older version of Requests, the connection was unreliable. A similar issue (similar problem, different website).