The following is a good list of resources to read for more information on this week's lesson.
Your project for this week is a continuation of last week's. There is more detail in the exercise notebook, but you're going to be creating documentation for a sample (read: fake) open-source project. You're going to need at least a README, a license, and a code of conduct.
This is a lot more self-guided than past weeks' exercises. So, have fun! If you're totally lost and don't know what to do, send me an email or come to office hours.
I can't stress enough the importance of reading and familiarizing yourself with READMEs before trying to write one; it's not a daunting or even a particularly difficult task, but it is one that will take some time. No number of how-tos is going to help you write a README if you've never read one before. So, take some time and skim a README or two to get a sense of tone, style, structure, and approach.
In the early days of computing, writing code was almost entirely an academic exercise. Computers were not connected to one another over the Internet (because the Internet didn't exist), so code was shared by publishing it in academic journals. If you wanted to run somebody's program, you got a grad student to type it in on your computer for you.
This all changed with the ARPANET, which was developed in the 1960s and expanded throughout the next few decades. With this, academic institutions were linked and, thus, sharing code from computer to computer became possible.
However, the concept of "open-source software" or "licensing" wasn't relevant; there were maybe a few hundred people in the entire world who you could share your code with.
As home and business computing became more prevalent, it was common for users to have access to the entire source code of all of their programs, and they were free to modify and share it as they wished. This was possible because "computing companies" writ large made their money from selling hardware, and "bundled" their own software with every hardware purchase.
For example: Imagine I'm a programmer for IBM, and I have an idea for a better version of some program, say, a database, that comes bundled with all IBM mainframes. So I quit IBM and start "My Systems, Inc.". I write better database software for IBM machines and start selling it to businesses with IBM mainframes. But, wait, if I give everyone who buys my software the entire source code, what's to prevent them from passing it along to their friends? Or renaming it and selling it themselves? I don't sell any hardware; the software is my entire business. So I need to protect myself. So, I copyright my code, and don't give people the source code. I just give them the already-made program, runnable but not human-readable.
The development of proprietary, third-party software led to software being written and sold under copyright rather than released into the public domain, with its source code kept private. In 1980, US copyright law was extended to computer programs, treating them like books, trade documents, movies, music, and other copyrighted materials.
A movement emerged to counter both corporations like IBM and companies copyrighting their software. That movement has a number of names, but you'll hear it referred to as open-source software or free software.
This movement developed out of a community of programmers and hackers who worked on improving the software that came with computers and was available at the time. Their response to the crisis of proprietary, closed-source software was to build software that was open-source and licensed permissively, meaning that whoever had a copy of the software had the right to do what they wanted with it.
This video is a good introduction to open source.
Like with "bash", "the shell", "the command line", etc., last week, "free software" and "open-source software" functionally mean the same thing. However, various people in these communities have incredibly strong opinions about this terminology, so tread carefully if you want to avoid making people angry. We're not going to get into any more detail than this, but if you're interested, there's an entire Wikipedia article about it.
In 1989, the GNU project released the GNU General Public License (the GPL), one of the first free-software licenses. It made sure that anyone who redistributes your GPL-licensed software, or a derivative of it, has to do so under the GPL.
Other licenses, like the MIT and Apache licenses, were more permissive: they basically say "do whatever you want to do, just don't sue me". At the extreme end of this spectrum are the Do What You Want Public License and the Unlicense.
We'll see that there are issues with all of these principles (meritocracy especially), but these are the general guiding lights of people who work on open-source software projects.
Right now, you're relying on at least three obvious open-source projects. Firstly, the supercomputer running JupyterLab is a Linux machine. Jupyter itself is an open-source project, and so is Python.
These projects are usually run by the people who wrote the programs initially, and they have control over which changes and improvements are accepted in the "official version" of the software. They're sometimes referred to as "benevolent dictators," because, while they have total control over the code, they also have the best interests of the software in mind, and tend to accept code that improves the software.
The nature of open-sourcing, though, means that not only can anyone contribute, but anyone can also copy the software and make their own version of it. This is called "forking". So, if you disagree with the leader of a software project, you're welcome to make your own project based on it, with you at the helm.
A license is, simply, a list of rules detailing who can use a piece of software and how they can use it:
There are a plethora of software licenses available for open-source and free software. Let's take a look at a comparison of some of them.
Creation always involves building upon something else. There is no art that doesn't reuse. And there will be less art if every reuse is taxed by the appropriator.
— Lawrence Lessig
The Creative Commons project, which made some of the licenses we looked at, was founded by Lawrence Lessig, Harvard professor, lawyer, and activist, to promote "free culture". This can be looked at as a companion/extension to "free software". CC licenses are used sometimes for software, but much more often for writing, art, photography, and general creative endeavour. Not all CC licenses are "free culture" licenses, as some of them ban people from reselling the work, using it commercially, or changing it in any way.
As you can see, many licenses for software exist to reserve some rights to those who make the software. You can be restricted from making money off of the software, from using it without telling people you're using it, or from modifying it at all.
It's worth pointing out that "free software" doesn't necessarily mean that the software doesn't cost money. It means that it's free in a legal sense, much like free speech.
Also, something being open-source, meaning you can see the code and are free to contribute to, use, and change it, doesn't mean that the software is in the public domain, meaning that it has no owners and is considered the collective property of all people.
You'll often see technical writing contrasted with creative writing, but, frankly, I think that's stupid. Technical writing, while it may not require a Fiction MFA, gives a programmer the opportunity to stretch their legs a little bit and have some fun talking about what their code does, and why.
All of you have done at least some technical writing:
Good technical writing is concise, focused, easy to understand, free of errors, and audience-based.
A good example of technical writing is the README. A README is a short document detailing what a piece of software does, how to use it, and how one could contribute to it if they wanted to. Let's break these three parts up a bit more.
This literally is just a few sentences explaining, in plain English, what your software does. The audience for this is pretty general, so try to avoid technical jargon if possible. Think of a tech-savvy CEO browsing articles for what software to tell her CTO to look into. Don't bore her or make her feel stupid.
You're going to want to spell out two things here:
Installation is important for software that's not an `.exe` file, in other words, a program that doesn't automatically install and run itself. Describe what files they need to move where, anything they need to configure on their system, and how to make sure everything is installed correctly.
A lot of programs have a "test" you can run on the command line that'll print out something like this:
You're running foobar v2.3. The current directory is /home/foobar. The database is MySQL. If you're reading this, everything is running correctly.
If your program is complicated, consider including a test like this.
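If your program is written in Python, a check like this can be very small. Here's a minimal sketch, not a definitive implementation: the program name, version, and database are hypothetical constants borrowed from the sample output above, and the function name `self_check_report` is made up for illustration. A real project would read these values from its own metadata and configuration.

```python
import os
import platform

# Hypothetical values for illustration only. A real project would pull these
# from its package metadata and configuration instead of hard-coding them.
PROGRAM_NAME = "foobar"
VERSION = "2.3"
DATABASE = "MySQL"


def self_check_report(name=PROGRAM_NAME, version=VERSION, database=DATABASE):
    """Return the lines of a short "is everything installed?" report."""
    return [
        f"You're running {name} v{version}.",
        f"Python version: {platform.python_version()}.",
        f"The current directory is {os.getcwd()}.",
        f"The database is {database}.",
        "If you're reading this, everything is running correctly.",
    ]


if __name__ == "__main__":
    # Print one fact per line so a user can scan the report quickly.
    print("\n".join(self_check_report()))
```

The point isn't the specific facts it prints; it's that a new user can run one command and know whether the installation worked.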
READMEs are, by definition, short pieces of writing, so don't write an exhaustive list of every possible thing your software can be used for. That's what more formal documentation is for. All you need to do is write a short description of how to use your software for what it's supposed to do, generally. Think of it as typing `--help` after a shell command. When you run `cat --help`, you get this:
It's not a complete description of every possible use case of any user on any system known to man. It's just a description of the basic usage: how most people will approach it. That's what you should do for this section of your README.
Your audience here is pretty tech-savvy: network administrators, high-level IT folks, advanced amateurs. Don't be afraid to use technical language, but again, don't intentionally obfuscate. Be as simple as possible, but don't leave anything important out.
This is the only part of a README that's specific to open-source projects. Other software you'll write will have documentation of how it works for laypeople, and descriptions of how to install and use it for the tech-savvy people who are using it on their systems. But only for open-source projects do you tell people how to add features or fix bugs themselves.
Your tone is extremely important here. Consider your audience: you're talking to developers and hackers who have a significant amount of tech-savviness, but you're also asking them for a favor. Open-source projects generally don't have the budget to pay more than one or two full-time developers, if that, so the vast majority of the work is volunteered.
So, obviously, you want to make it easy to contribute to the project. Describe to them how to fork your code and what they need to do to run your program and test their changes. If you have a format for submitting changes (pull requests on GitHub), tell them what it is. And, most importantly, be nice. You are asking these people to volunteer on your project, after all.
There are some important elements to keep in your README-writing toolbox, so to speak.
When you're describing how to use your program, it might be helpful to include some short examples. If your software is something like a Python package, it can be as simple as showing how to import your package and use a function. These serve more as illustrations of your software's format than as practical uses.
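Here's a sketch of the kind of example a README might carry. The standard library's `textwrap` module is only a stand-in for "your package" (an assumption made so the snippet runs anywhere); in your own README you'd import your own package and call one of its functions instead.

```python
# textwrap is only a stand-in for "your package" here.
import textwrap

long_line = "READMEs are short, so keep the examples short too."

# Wrap the text to at most 20 characters per line; wrap() returns a list of lines.
lines = textwrap.wrap(long_line, width=20)

for line in lines:
    print(line)
```

Notice how little is going on: one import, one call, one visible result. That's about the right size for a README example.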
For installation, it can be helpful to provide an example of the shell commands necessary for installation and testing. This can be done in much the same format.
You can insert code snippets in Markdown like this:
That snippet looks like this when rendered:
    import numpy as np
    a = np.arange(15).reshape(3, 5)
You can also use single backticks for shorter snippets, like this: `cat setup.py`, which looks like this:
GIFs aren't just fun, they can also be illustrative of how your program works. If you want to demonstrate a user interface or show what something should look like, a quick screen recording with a program like Gyazo displayed as a GIF can be worth, literally, a thousand words.
Let's get a little meta and review the README for JupyterLab, which happens to be absolutely fantastic.
Open-source projects can seem to be "all about the code", and indeed many people say they are. But that's a pretty myopic view of something that, in addition to just a nameless, faceless codebase, contains hundreds and sometimes thousands of dedicated, opinionated humans who care deeply about structure and process. Everyone who's acting in good faith on a project wants the same thing: for the project to flourish and be good. But, inevitably, people are going to write bad code. People are going to get angry at each other. People with pretty abhorrent views are going to bring that along with them. That, combined with the insular and sometimes monocultural nature of programming, reveals that open-source software projects aren't just software, they're projects.
Meritocracy, or the idea that how much one's voice is valued should be proportional to the quality of their contributions, seems to be a north-star principle in a lot of open-source projects. "Judge the code, not the people" is an oft-heard refrain. But this creates a false dichotomy: you can't separate people from the code they write, and how people communicate is just as important as the code they write, if not more so.
Women, people of color, and queer/trans people are already woefully underrepresented in computer science spaces and, in large part, in Internet-based communities writ large. Just as intentionally valuing diversity and inclusion can create an environment where marginalized programmers feel comfortable, people acting badly on projects can create an environment that makes marginalized people feel unwelcome, and thus less likely to contribute.
So, enforced meritocracy can actually backfire: the good code that newcomers and marginalized groups would have written is sacrificed for the sake of contributors who make the project a hostile place for them.
Codes of conduct are not a new invention. Many academic and software conferences have required people attending to state that they won't be discriminatory or engage in harassment while at the conference. This is a good tool to intentionally create a welcoming space for diverse groups.
Codes of conduct haven't been as widespread in open-source software; often, there will be a few vague words about civility or a quote from a movie: "Be excellent to each other" was quoted in the Linux kernel's "code of conflict" for a long time, and Wil Wheaton's eponymous "Wheaton's Law"—simply, "Don't be a dick."—occurs frequently as well.
There are a number of codes of conduct, but the most widespread by far is the Contributor Covenant, whose author, Coraline Ada Ehmke, you've read already.
Developed in 2014, it has since gone through a number of versions and been translated into 30 languages. It's been adopted by more than 40,000 open-source projects, including Firefox and Linux, as well as projects from Google and Microsoft.
Many projects, including Jupyter, have developed their own codes of conduct. But to be effective, CoCs have to spell out, in some detail, what behaviour is unacceptable and what the steps are for recourse. "Be excellent to each other" isn't good enough.
Similar to how a license is a contract between software owners and software users, a code of conduct is a contract between project maintainers and project contributors. As we've said before, project maintainers are "benevolent dictators", solely responsible for whose changes are accepted and who contributes to a project. Whether or not a project has a code of conduct, maintainers can kick someone off of a project. Codes of conduct formalize that relationship and make it easier for all parties involved to recognize what is acceptable and what happens when someone says or does something harmful to the project.
And now, onto your project. Excited?