In this article, I try to visualize the last four semesters in my degree as a directed graph with courses as nodes and prerequisite relations as edges. Will my plan work, or am I missing prerequisites? NetworkX and yEd are used for creating and displaying the graph.
I want to graduate in at least two years which means I need to pass a certain set of courses in a specific sequence.
There are a lot of complications:
Not all courses are available every semester; certain courses can only be taken in the Fall term, others only in the Winter.
Many courses have prerequisites, often the prerequisites have prerequisites.
There are electives and a large library of possibilities.
According to uOttawa's sheet on course enrolment terminology, the academic year is divided into three "terms":
Courses are encoded in the following way:
Each course is designated by a seven-character code made up of three letters and four digits. The three letters stand for the subject. For example, CRM stands for criminology. The first digit is for the year of study (1000-level courses — those that start with 1 — are first year courses)...
The University of Ottawa does not make this easy. Course information is spread out over at least three separate databases:
The course catalogue. Contains most of the information I want, except for scheduling and availability. No search engine, so I had to look up a course code page and CTRL-F
.
The course timetable. Has a search engine (a clumsy one). Provides schedules for courses in the next two terms. There is no way to know if a course will be held in a term later than the next two, so I have to hope that the pattern doesn't change much (i.e. since SEG 2105 is only held in the Fall term this year, it will only be held in the Fall next year). Does not provide any prerequisite information! I feel bad for any one who tries to use this as their sole information source...
The enrollment system. I can't link to this as it can only be accessed through "uoZone" (uOttawa's internal system). As far as I know, this is the only place where course availability and number of available seats are given. This software is atrocious. Instead of following the universally agreed-upon course code system, courses are assigned a seemingly random sequence of numbers and letters. I have 1000 other complaints, but I'll spare you the rant.
After hours pouring over the program requirements and the course catalogue, I had a list of candidates and a possible schedule. But will it work? Have I missed a prerequisite or planned to take a course in the Fall that is only available in the Winter? This stuff is too complicated...I need a visual.
I created a spreadsheet of all the courses I'm planning on taking and furnished it with various information about them. The most confusing aspect is the prerequisite chain, so I'd like to visualize this as a directed graph with courses as nodes and prerequisite links as edges.
import pandas as pd
df = pd.read_csv('course_schedule.csv')
df
subject_code | digit_code | course_code | course_name | prerequisites | expected_term | expected_year | required | term_availability | |
---|---|---|---|---|---|---|---|---|---|
0 | CSI | 2110 | CSI2110 | Data Structures and Algorithms | ITI 1121, MAT 1348 | F | 2017 | True | F, W |
1 | SEG | 2105 | SEG2105 | Introduction to Software Engineering | ITI 1121 | F | 2017 | True | F |
2 | PSY | 4130 | PSY4130 | History and Systems of Psychology | NaN | F | 2017 | False | F, W |
3 | MAT | 1320 | MAT1320 | Calculus I | NaN | F | 2017 | False | S, F, W |
4 | CSI | 2101 | CSI2101 | Discrete Structures | MAT 1348 | W | 2018 | True | W |
5 | CSI | 2911 | CSI2911 | Professional Practice in Computing | NaN | W | 2018 | True | W |
6 | CSI | 2132 | CSI2132 | Databases I | CSI 2110 | W | 2018 | True | W |
7 | MAT | 1322 | MAT1322 | Calculus II | MAT 1320 | W | 2018 | False | S, F, W |
8 | CSI | 2120 | CSI2120 | Programming Paradigms | CSI 2110 | W | 2018 | True | W |
9 | CSI | 3105 | CSI3105 | Design and Analysis of Algorithms | CSI 2110, CSI 2101 | F | 2018 | True | F |
10 | CSI | 3120 | CSI3120 | Programming Language Concepts | CSI 2101, CSI 2120 | F | 2018 | False | F |
11 | SEG | 3125 | SEG3125 | Analysis and Design of User Interfaces | SEG 2105 | W | 2019 | False | S, W |
12 | CSI | 4130 | CSI4130 | Computer Graphics | CSI 3105, MAT 1341 | W | 2019 | False | W |
13 | CSI | 3140 | CSI3140 | WWW Structures, Techniques and Standards | CSI 2110, CSI 2132 | W | 2019 | False | S, W |
14 | CSI | 3130 | CSI3130 | Databases II | CSI 2132 | F | 2018 | False | F |
Each course_code will be a node in the directed graph.
df.columns
Index(['subject_code', 'digit_code', 'course_code', 'course_name', 'prerequisites', 'expected_term', 'expected_year', 'required', 'term_availability'], dtype='object')
courses = df['course_code']
courses
0 CSI2110 1 SEG2105 2 PSY4130 3 MAT1320 4 CSI2101 5 CSI2911 6 CSI2132 7 MAT1322 8 CSI2120 9 CSI3105 10 CSI3120 11 SEG3125 12 CSI4130 13 CSI3140 14 CSI3130 Name: course_code, dtype: object
Nodes are linked by prerequisite relations. For example, MAT1320 -> MAT1322 (because taking Calculus II requires having already passed Calculus I).
prerequisites = df['prerequisites']
prerequisites
0 ITI 1121, MAT 1348 1 ITI 1121 2 NaN 3 NaN 4 MAT 1348 5 NaN 6 CSI 2110 7 MAT 1320 8 CSI 2110 9 CSI 2110, CSI 2101 10 CSI 2101, CSI 2120 11 SEG 2105 12 CSI 3105, MAT 1341 13 CSI 2110, CSI 2132 14 CSI 2132 Name: prerequisites, dtype: object
I got the prerequisites by copy/pasting from the course catalogue, but some of them have extra characters.
# Copy and pasted straight from my spreadsheet
weird_string = " ITI 1121, MAT 1348"
repr(weird_string)
"'\\xa0ITI\\xa01121,\\xa0MAT\\xa01348'"
These characters are invisible when printed:
print(weird_string)
ITI 1121, MAT 1348
According to samwize on StackOverflow, \xa0
is "actually non-breaking space in Latin1 (ISO 8859-1), also chr(160)". I don't really understand what that means, but I'm guessing \xa0
is some kind of substitute for a space.
In any case, keeping these characters made for ugly output so I chose to remove them. StackOverflow suggests unicodedata.normalize()
. The documentation is a bit vague on what the different "forms" mean ("NFC", "NFKC", "NFD", and "NFKD") and I don't feel like researching this right now so...TRIAL AND ERROR IT IS!
import unicodedata
unicode_forms = ["NFC", "NFKC", "NFD", "NFKD"]
for form in unicode_forms:
rep = repr(unicodedata.normalize(form, weird_string))
print(f"{form}: {rep}")
NFC: '\xa0ITI\xa01121,\xa0MAT\xa01348' NFKC: ' ITI 1121, MAT 1348' NFD: '\xa0ITI\xa01121,\xa0MAT\xa01348' NFKD: ' ITI 1121, MAT 1348'
NFKC and NFKD look good. I'll pick the former for no good reason.
def course_formatter2(course):
'''(str) -> str
Takes a course code and removes unicode (\xa0) character as well as commas
and spaces.'''
return unicodedata.normalize("NFKC", course).replace(',', '').replace(' ', '')
Why course_formatter2()
? Because I used course_formatter()
before looking into the problem more closely, it worked OK.
def course_formatter(course):
'''(str) -> str
Takes a course code and removes unicode (\xa0) character as well as commas
and spaces.'''
return course.replace(u'\xa0', '').replace(',', '').replace(' ', '')
# list of tuples
edges = []
for i, relation in enumerate(prerequisites):
# Ignore courses with no prerequisites
if not pd.isnull(relation):
rs = relation.split(',')
tmp = []
for r in rs:
tmp.append((courses[i], course_formatter2(r)))
edges.append(tmp)
edges
[[('CSI2110', 'ITI1121'), ('CSI2110', 'MAT1348')], [('SEG2105', 'ITI1121')], [('CSI2101', 'MAT1348')], [('CSI2132', 'CSI2110')], [('MAT1322', 'MAT1320')], [('CSI2120', 'CSI2110')], [('CSI3105', 'CSI2110'), ('CSI3105', 'CSI2101')], [('CSI3120', 'CSI2101'), ('CSI3120', 'CSI2120')], [('SEG3125', 'SEG2105')], [('CSI4130', 'CSI3105'), ('CSI4130', 'MAT1341')], [('CSI3140', 'CSI2110'), ('CSI3140', 'CSI2132')], [('CSI3130', 'CSI2132')]]
This is a bit confusing because it looks like CSI2110 -> ITI1121 (when it is the other way around).
# The same, but swapping the order
edges = []
for i, relation in enumerate(prerequisites):
# Ignore courses with no prerequisites
if not pd.isnull(relation):
rs = relation.split(',')
tmp = []
for r in rs:
tmp.append((course_formatter(r), courses[i]))
edges.append(tmp)
edges
[[('ITI1121', 'CSI2110'), ('MAT1348', 'CSI2110')], [('ITI1121', 'SEG2105')], [('MAT1348', 'CSI2101')], [('CSI2110', 'CSI2132')], [('MAT1320', 'MAT1322')], [('CSI2110', 'CSI2120')], [('CSI2110', 'CSI3105'), ('CSI2101', 'CSI3105')], [('CSI2101', 'CSI3120'), ('CSI2120', 'CSI3120')], [('SEG2105', 'SEG3125')], [('CSI3105', 'CSI4130'), ('MAT1341', 'CSI4130')], [('CSI2110', 'CSI3140'), ('CSI2132', 'CSI3140')], [('CSI2132', 'CSI3130')]]
With the NetworkX library.
import networkx as nx
G = nx.DiGraph()
G.add_nodes_from(courses)
for edge in edges:
G.add_edges_from(edge)
NetworkX provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. In the future, graph visualization functionality may be removed from NetworkX or only available as an add-on package. - Documentation for NetworkX
I'm going to try it anyway.
import matplotlib.pyplot as pl
%matplotlib inline
# Hide deprecation warnings
import warnings
warnings.filterwarnings('ignore')
nx.draw(G)
nx.draw_circular(G)
nx.draw_networkx(G)
Kinda cool, but I think yEd might be better for producing a finished graphic.
NetworkX can export to a lot of different formats, but I chose GraphML because yEd also supports it.
nx.write_graphml(G, "course_prerequisites.graphml")
This exported the nodes and edges, but the labels are missing! Google led me to this blog which had some good advice.
# Give nodes labels
for node in G.nodes():
G.node[node]['label'] = node
# Export again
nx.write_graphml(G, "course_prerequisites.graphml")
The were exported, but yEd still required some tweaking to get them to display. The blog post I linked to earlier explains it (with screenshots!). Here is the critical part:
"We need to map the "Label" property imported from the GraphML file onto the internal property that yEd uses for labels. To do this, click Edit, Properties Mapper. Click the little plus sign under Configurations, and then 'New Configuration for Nodes.' Now click the plus sign next to 'Mappings'. If you're lucky, yEd should figure out what you're trying to do, and automatically select the right mapping." - Berzan
Notes:
I used yEd's "hierarchy" layout.
Courses I've already taken are color coded green.
There are four groups, representing the courses I intend to take for a given term.
A few arrows were repositioned (by hand) to make the group labels easier to read.
Taking the courses in this order should ensure prerequisites are always satisfied.
There are a few other interesting takeaways:
CSI 2110 is a prerequisite for four courses. Failing to acquire it in the Fall of 2017 will severely screw up my schedule.
PSY 4130 and CSI 2911 are not prerequisites for any course, so they can be acquired at any time.
MAT 1320 and MAT 1322 are a closed system. I'm a little surprised MAT 1322 is not a prerequisite for anything.
Also, I should think about how to best distribute my remaining courses. There is a chance of graduating in three terms instead of four if I can find a replacement for CSI4130 and take more courses in Fall 2018 (or over the summer). Reducing to three semesters may require taking less desirable courses but it would save me a lot of time and money.
If I do end up taking four semesters, I could shift CSI 2911 to a later semester to even the course load. Or I could keep it as it is and plan to work part-time. Scholarships and government grants usually require students to be "full-time", so I can't drop below three courses (the minimum for "full-time" status at uOttawa).
The biggest weakness is the lack of definite course/semester availability after Winter 2018. I am just guessing that these courses will be available during these semesters. There are other issues, but this is the main one.
Data collection was disappointingly manual. The University of Ottawa simply does not provide good access to course information. Scraping is an option, but a frustrating one; the course timetable server is so old that Python's Request library cannot communicate with it [1] and uOttawa's HTML is not easy to parse. Manual collection is feasible, but doesn't scale and is obviously not convenient.
[1] - IRC, the issue I encountered was SSL handshake failure. The newest cipher protocol supported by the server had already been deprecated by Requests. While it was possible to use an older version of Requests, the connection was unreliable. A similar issue (similar problem, different website).