Lab 6 Lecture

Creating and Editing Plaintext

Topics

  • open-source
  • licenses
  • basic technical writing
  • READMEs
  • codes of conduct

Readings

The following is a good list of resources to read for more information on this week's lesson.

Project

Your project for this week is a continuation of last week's. There is more detail in the exercise notebook, but you're going to be creating documentation for a sample (read: fake) open-source project. You're going to need at least a README, a license, and a code of conduct.

This is a lot more self-guided than past weeks' exercises. So, have fun! If you're totally lost and don't know what to do, send me an email or come to office hours.

A note about preparation

I can't stress enough the importance of reading and familiarizing yourself with READMEs before trying to write one; it's not a daunting or even a particularly difficult task, but it is one that will take some time. No number of how-tos is going to help you write a README if you've never read one before. So, take some time and skim a README or two to get a sense of tone, style, structure, and approach.

Free and open-source software

In the early days of computing, writing code was almost entirely an academic exercise. Computers were not connected to one another over the Internet (because the Internet didn't exist), so code was shared by publishing it in academic journals. If you wanted to run somebody's program, you got a grad student to type it in on your computer for you.

This all changed with the ARPANET, which was developed in the 1960s and expanded throughout the next few decades. With this, academic institutions were linked and, thus, sharing code from computer to computer became possible.

However, the concept of "open-source software" or "licensing" wasn't relevant; there were maybe a few hundred people in the entire world who you could share your code with.

Proprietary software

As home and business computing became more prevalent, it was common for users to have access to the entire source code of all of their programs, and they were free to modify and share it as they wished. This was possible because "computing companies" writ large made their money from selling hardware, and "bundled" their own software in every hardware purchase.

For example: Imagine I'm a programmer for IBM, and I have an idea for a better version of some program, say, a database, that comes bundled with all IBM mainframes. So I quit IBM and start "My Systems, Inc.". I write a better database program for IBM machines and start selling it to businesses with IBM mainframes. But, wait, if I give everyone who buys my software the entire source code, what's to prevent them from emailing it to their friends? Or renaming it and selling it themselves? I don't sell any hardware; the software is my entire business. So I need to protect myself. So, I copyright my code, and don't give people the source code. I just give them the already-made program, runnable but not human-readable.

The development of proprietary, third-party software led to software being written and sold outside the public domain, with its source code kept private. In 1980, US copyright law was extended to computer programs, treating them like books, trade documents, movies, and music.

Open-source as a response

A movement emerged to counter both corporations like IBM and companies copyrighting their software. That movement has a number of names, but you'll hear it referred to as open-source software or free software.

This movement developed out of a community of programmers and hackers who worked on improving the software that came with computers and was available at the time. Their response to the crisis of proprietary, closed-source software was to build software that was open-source and licensed permissively, meaning that anyone who had a copy of the software had the right to do what they wanted with it.

This video is a good introduction to open source.

A note on terminology

Like with "bash", "the shell", "the command line", etc., last week, "free software" and "open-source software" functionally mean the same thing. However, various people in these communities have incredibly strong opinions about this terminology, so tread carefully if you want to avoid making people angry. We're not going to get into any more detail than this, but if you're interested, there's an entire Wikipedia article about it.

Licenses

In 1989, the GNU project released the GNU General Public License (the GPL), the first free-software license. It makes sure that anyone who redistributes your GPL-licensed software, or a derivative of it, has to do so under the GPL.

Other licenses, like the MIT and Apache licenses, were more permissive: they basically say "do whatever you want to do, just don't sue me". At the extreme end of this spectrum are the Do What You Want Public License and the Unlicense.

The principles of open source

  • Open exchange
  • Participation
  • Rapid prototyping
  • Meritocracy
  • Community

We'll see that there are issues with all of these principles (meritocracy especially), but these are the general guiding lights of people who work on open-source software projects.

Open-source projects

Right now, you're relying on at least three obvious open-source projects. Firstly, the supercomputer running JupyterLab is a Linux machine. Jupyter itself is an open-source project, and so is Python.

These projects are usually run by the people who wrote the programs initially, and they have control over which changes and improvements are accepted in the "official version" of the software. They're sometimes referred to as "benevolent dictators," because, while they have total control over the code, they also have the best interests of the software in mind, and tend to accept code that improves the software.

The nature of open-sourcing, though, means that, not only can anyone contribute, anyone can copy the software and make their own version of it. This can be referred to as "forking". So, if you disagree with the leader of a software project, you're welcome to make your own project based on it, with you at the helm.

Licenses

A license is, simply, a list of rules detailing who can use a piece of software and how they can use it:

  • Can they install it on any computer they want?
  • Can they give it to someone else?
  • Can they change the software and give that to others?
  • Can they sell it?

There is a plethora of software licenses available for open-source and free software. Let's take a look at a comparison of some of them.

Creative Commons

Creation always involves building upon something else. There is no art that doesn't reuse. And there will be less art if every reuse is taxed by the appropriator.

                 — Lawrence Lessig

The Creative Commons project, which made some of the licenses we looked at, was founded by Lawrence Lessig, Harvard professor, lawyer, and activist, to promote "free culture". This can be looked at as a companion/extension to "free software". CC licenses are used sometimes for software, but much more often for writing, art, photography, and general creative endeavour. Not all CC licenses are "free culture" licenses, as some of them ban people from reselling the work, using it commercially, or changing it in any way.

Open-source vs. public domain

As you can see, many licenses for software exist to reserve some rights to those who make the software. You can be restricted from making money off of the software, from using it without telling people you're using it, or from modifying it at all.

It's worth pointing out that "free software" doesn't necessarily mean that the software doesn't cost money. It means that it's free in a legal sense, much like free speech.

Also, something being open-source, meaning you can see the code and are free to contribute to, use, and change it, doesn't mean that the software is in the public domain, meaning that it has no owners and is considered the collective property of all people.

Technical writing

You'll often see technical writing contrasted with creative writing, but, frankly, I think that's stupid. Technical writing, while it may not require a Fiction MFA, gives a programmer the opportunity to stretch their legs a little bit and have some fun talking about what their code does, and why.

All of you have done at least some technical writing:

  1. Commenting your code. Line-by-line descriptions of what your code does, and longer descriptions of how functions, um, function are technical writing. In fact, they're some of the most important technical writing you'll do as a developer. Commenting your code is essential (especially in open-source environments), because it lets other people understand what's going on.
  2. Your lab exercises. Some of the problems in your lab exercises involved writing what you thought was going to happen, running code, and then, if you were wrong, explaining why. That thought process is essential for good, clear technical writing; chances are, if you were confused by something, someone doing the same thing is going to be confused, too, at least some of the time. Working through your thought process allows you to explain how to get to the correct answer in a way that's easy to follow.
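
To make the first item concrete, here's a minimal sketch of a commented Python function. The function name word_count and what it does are made up for this example; the point is the split between the docstring, which describes how the function functions, and the inline comments, which explain individual lines.

```python
def word_count(text):
    """Count how many times each word appears in a string.

    Words are split on whitespace and compared case-insensitively,
    so "The" and "the" count as the same word.

    Returns a dictionary mapping each word to its count.
    """
    counts = {}
    # Lowercase first so that capitalization doesn't create duplicates.
    for word in text.lower().split():
        # dict.get returns 0 the first time we see a word.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(word_count("the cat saw The dog"))
# prints {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```

Notice that the comments explain why a line is there, not just what it says. That's usually the part a reader can't figure out on their own.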

Technical writing in one sentence

Good technical writing is concise, focused, easy to understand, free of errors, and audience-based.

  • concise: Be short. Don't say in a paragraph of florid prose what you can fit into a sentence. You want people who care to not be alienated by the task of reading what you have to say.
  • focused: Stay on topic. This doesn't mean "be boring": you can be fun, you can joke around, you can use GIFs and images, but don't soliloquize about things tangentially related to your topic. That's what personal blogs are for, not technical documents.
  • easy to understand: Don't use jargon. It's okay to be technical, and in most cases, you're going to be writing for people who have at least some computer knowledge, but try to use the simplest terms possible to explain what you're talking about without sacrificing necessary detail.
  • free of errors: Well, duh.
  • audience-based: This ties into "easy to understand"; the words that you use are going to change depending on who you're talking to. How you'd describe a database you built to your grandmother at Thanksgiving is different from how you'd describe it to a programmer who's trying to build an iPhone app using your software. It's important to recognize that both are examples of technical writing! But the audience is vastly different. Modulate how you speak depending on to whom you are speaking.

README.md

A good example of technical writing is the README. A README is a short document detailing what a piece of software does, how to use it, and how one could contribute to it if they wanted to. Let's break these three parts up a bit more.

What the software does

This literally is just a few sentences explaining, in plain English, what your software does. The audience for this is pretty general, so try to avoid technical jargon if possible. Think of a tech-savvy CEO browsing articles for what software to tell her CTO to look into. Don't bore her or make her feel stupid.

How to use it

You're going to want to spell out two things here:

  1. how to install your software, and
  2. how to actually use your software for what it does.

Installation documentation

Installation documentation is important for software that's not a .app or .exe file, in other words, a program that doesn't automatically install and run itself. Describe what files users need to move where, anything they need to configure on their system, and how to make sure everything is installed correctly.

A lot of programs have a "test" you can run on the command line that'll print out something like this:

You're running foobar v2.3. The current directory is /home/foobar. The database is MySQL. If you're reading this, everything is running correctly.

If your program is complicated, consider including a test like this.
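
Here's a rough sketch of what such a test could look like in Python. Everything in it is an assumption for the sake of illustration: the program name foobar, the version number, and the choice of what to report (I've swapped the database line for the Python version, since this toy program has no database). A real program would report whatever its users are most likely to get wrong.

```python
import os
import sys

# Hypothetical program name and version, used only for illustration.
PROGRAM_NAME = "foobar"
VERSION = "2.3"


def self_test_report():
    """Build a short message confirming the program can start and see its environment."""
    lines = [
        f"You're running {PROGRAM_NAME} v{VERSION}.",
        f"The current directory is {os.getcwd()}.",
        f"Python version: {sys.version_info.major}.{sys.version_info.minor}.",
        "If you're reading this, everything is running correctly.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(self_test_report())
```

In your installation instructions, you'd then tell users which command runs this check and what the output should look like.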

Usage documentation

READMEs are, by definition, short pieces of writing, so don't write an exhaustive list of every possible thing your software can be used for. That's what more formal documentation is for. All you need to do is write a short description of how to use your software for what it's supposed to do, generally. Think of it as typing --help after a shell command. When you run cat --help, you get this:

Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s), or standard input, to standard output.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help     display this help and exit
      --version  output version information and exit

With no FILE, or when FILE is -, read standard input.

Examples:
  cat f - g  Output f's contents, then standard input, then g's contents.
  cat        Copy standard input to standard output.

GNU coreutils online help:
For complete documentation, run: info coreutils 'cat invocation'

It's not a complete description of every possible use case of any user on any system known to man. It's just a description of the basic usage: how most people will approach it. That's what you should do for this section of your README.

Your audience here is pretty tech-savvy: network administrators, high-level IT folks, advanced amateurs. Don't be afraid to use technical language, but again, don't intentionally obfuscate. Be as simple as possible, but don't leave anything important out.
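
If your own program is written in Python, the standard library's argparse module will generate this kind of --help text for you. The sketch below is only an illustration: the foobar tool, its FILE argument, and its -n flag are all invented, and the program doesn't actually print any files. It just shows how the short descriptions you write end up in the help output.

```python
import argparse


def build_parser():
    """Create the command-line parser for a hypothetical 'foobar' tool."""
    parser = argparse.ArgumentParser(
        prog="foobar",
        description="Print the contents of FILE with optional line numbers.",
    )
    parser.add_argument("file", metavar="FILE", help="the file to print")
    parser.add_argument(
        "-n", "--number", action="store_true", help="number all output lines"
    )
    return parser


if __name__ == "__main__":
    # format_help() returns the same text that "foobar --help" would print.
    print(build_parser().format_help())
```

The help strings here are one line each, just like the ones in cat --help. If you can't describe an option in one line, that's a hint that it belongs in the longer documentation rather than in the README.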

How to contribute

This is the only part of a README that's specific to open-source projects. Other software you'll write will have documentation of how it works for laypeople, and descriptions of how to install and use it for the tech-savvy people who are using it on their systems. But only for open-source projects do you tell people how to add features or fix bugs themselves.

Your tone is extremely important here. Consider your audience: you're talking to developers and hackers who have a significant amount of tech-savviness, but you're also asking them for a favor. Open-source projects generally don't have the budget to pay more than one or two full-time developers, if that, so nearly all of the work is volunteered.

So, obviously, you want to make it easy to contribute to the project. Describe to them how to fork your code and what they need to do to run your program and test their changes. If you have a format for submitting changes (for example, pull requests on GitHub), tell them what it is. And, most importantly, be nice. You are asking these people to volunteer on your project, after all.
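
The three parts above can be sketched as a skeleton. This is just one possible layout, not a standard: the heading names (Installation, Usage, Contributing) and the helper function readme_skeleton are my own choices for this example, and the placeholder sentences are meant to be replaced with your own writing.

```python
def readme_skeleton(name, summary):
    """Return Markdown text for a README with the three parts described above."""
    sections = [
        f"# {name}",
        summary,
        "## Installation",
        "Describe how to install the software and check that it works.",
        "## Usage",
        "Describe the basic, most common way to use the software.",
        "## Contributing",
        "Describe how to fork the code, test changes, and submit them.",
    ]
    # A blank line between blocks keeps the Markdown headings and paragraphs separate.
    return "\n\n".join(sections) + "\n"


print(readme_skeleton("foobar", "foobar is a small example project."))
```

The summary goes right under the title, with no heading of its own, because it's the first (and sometimes only) thing people read.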

Details of writing a README

There are some important elements to keep in your README-writing toolbox, so to speak.

Snippets and example code

When you're describing how to use your program, it might be helpful to include some short examples. If your software is something like a Python package, it can be as simple as showing how to import your package and use a function. These serve more as illustrations of your software's format rather than practical uses.

For installation, it can be helpful to provide an example of the shell commands necessary for installation and testing. This can be done in much the same format.

You can insert code snippets in Markdown like this:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
```

That snippet looks like this when rendered:

import numpy as np

a = np.arange(15).reshape(3, 5)

You can also use single backticks for shorter snippets, like this: `cat setup.py`, which looks like this: cat setup.py.

GIFs!

GIFs aren't just fun, they can also be illustrative of how your program works. If you want to demonstrate a user interface or show what something should look like, a quick screen recording with a program like Gyazo displayed as a GIF can be worth, literally, a thousand words.

An example

Let's get a little meta and review the README for JupyterLab, which happens to be absolutely fantastic.

Codes of conduct and the politics of software

Open-source projects can seem to be, and indeed many people say they are, "all about the code". But that's a pretty myopic view of something that, in addition to a nameless, faceless codebase, contains hundreds and sometimes thousands of dedicated, opinionated humans who care deeply about structure and process. Everyone who's acting in good faith on a project wants the same thing: for the project to flourish and be good. But, inevitably, people are going to write bad code. People are going to get angry at each other. People with pretty abhorrent views are going to bring those views along with them. That, combined with the insular and sometimes monocultural nature of programming, reveals that open-source software projects aren't just software, they're projects.

Diversity and meritocracy

Meritocracy, or the idea that how much one's voice is valued should be proportional to the quality of their contributions, seems to be a north-star principle in a lot of open-source projects. "Judge the code, not the people" is an oft-heard refrain. But this creates a false dichotomy: you can't separate people from the code they write, and how people communicate is just as important as the code they write, if not more so.

Women, people of color, and queer/trans people are already woefully underrepresented in computer science spaces and, in large part, in Internet-based communities writ large. Just as intentionally valuing diversity and inclusion can create an environment where marginalized programmers feel comfortable, people acting badly on projects can create an environment that makes marginalized people feel unwelcome, and thus less likely to contribute.

So, enforced meritocracy can actually sacrifice good code: the project keeps contributors who create a hostile environment, and loses the code that newcomers and marginalized groups would otherwise have contributed.

Enter the CoC

Codes of conduct are not a new invention. Many academic and software conferences have required people attending to state that they won't be discriminatory or engage in harassment while at the conference. This is a good tool to intentionally create a welcoming space for diverse groups.

Codes of conduct haven't been as widespread in open-source software; often, there will be a few vague words about civility or a quote from a movie: "Be excellent to each other" was quoted in the Linux kernel's "code of conflict" for a long time, and Wil Wheaton's eponymous "Wheaton's Law"—simply, "Don't be a dick."—occurs frequently as well.

There are a number of codes of conduct, but the most widespread by far is the Contributor Covenant, whose author, Coraline Ada Ehmke, you've read already.

Developed in 2014, it's since gone through a number of versions and changes, including translation into 30 languages. It's been adopted by more than 40,000 open-source projects, including Firefox, Linux, and projects from Google and Microsoft.

Many projects, including Jupyter, have developed their own codes of conduct. But CoCs, to be effective, have to spell out, in some detail, what behavior is unacceptable and what the steps are for recourse. "Be excellent to each other" isn't good enough.

Parts of a code of conduct

  • Good codes of conduct spell out acceptable and unacceptable behavior. Is your project trying to target young children, getting them to write for an open-source project from a young age? Then you'll probably want to specify that contributors should keep cursing (normally acceptable on the internet) to a minimum. Writing an app to provide support resources for victims of sexual assault and harassment? The community should be sensitive to the needs of those with PTSD, and you'll want to spell that out.
  • They should also lay out what happens when something goes wrong. You'll usually want to start by simply allowing the offending party to apologize, sincerely, to those they wronged. If the offense is particularly egregious, a suspension or even permanent expulsion from the project can be warranted.
  • Implicit in this is a statement of the values of the project community. How a code of conduct is written and even its existence can indicate what a community values: welcoming marginalized people, writing good code, being nice to newcomers.

A code of conduct is a contract

Similar to how a license is a contract between software owners and software users, a code of conduct is a contract between project maintainers and project contributors. As we've said before, project maintainers are "benevolent dictators", solely responsible for whose changes are accepted and who contributes to a project. Whether or not a project has a code of conduct, maintainers can kick someone off of a project. Codes of conduct formalize that relationship and make it easier for all parties involved to recognize what is acceptable and what happens when someone says or does something harmful to the project.

In summary

  • Open-source and free software has been around for a while and runs on basically every computer ever.
  • Licenses help you tell your users how they can use your software, and restrict people from misusing it.
  • Technical writing is best when short and sweet.
  • READMEs are the most important thing you'll ever write in an open-source project, because they're the only thing that everyone is guaranteed to read.
  • Codes of conduct help you run a good open-source project with contributors who are all on the same page.

And now, onto your project. Excited?

yes, obviously