
Lab Lesson¶

Introduction to the Command Line¶

Topics¶

file directories and abstraction
the command line
navigating filesystems
command line autocomplete
viewing and creating files
running programs from the command line
manipulating files and folders
creating files on the command line
man pages
piping, angle brackets, and redirection

Readings¶

This lecture is based on your readings for the week: Tracy Osborn's Really Friendly Command Line Intro; Software Carpentry's The Unix Shell; and the optional essay by Neal Stephenson, In the Beginning was the Command Line.

Terminology¶

Here's a short description of the different terms people use for the command line and its different versions. If you're confused by some of the terminology or would like a bit more depth on the different types of command lines people use, check it out here. If you're still a bit shaky on the basics, it might be information overload, so don't feel obligated to read it.

Exercises¶

There are a few in-class short exercises that deal with what you'll be learning today, to be completed in small groups of two or three. There is not, however, a lab exercise notebook for you to complete over the course of the week, so you have a bit of a break from lab work this week. Instead, make a GitHub account, get the URL to your profile, and submit it to Canvas.

Filesystems, abstraction, and the folder metaphor¶

Look to the left of this notebook. If you haven't minimized it, it's going to be a list of files and folders. They're all things that you've created or uploaded to your personal folder on the supercomputer.

On your personal computer, you've got a similar thing going: files organized into folders. For me on Mac, it looks a little something like this:

my home folder

If you use Windows, it'll probably look like this:

an example Windows directory

Operating systems come with programs like Finder and Windows Explorer to help us navigate, organize, and view all of our files, folders, and programs. But, this is just a simplification of the truth. In reality, your files are, quite literally, an array of ones and zeros on your hard drive. Calling them "files" and organizing them into "folders" is just a metaphor.

Abstraction¶

The legendary computer scientist Donald Knuth said, in an interview, the following:

The psychological profiling [of a computer scientist] is mostly the ability to shift levels of abstraction, from low level to high level. To see something in the small and to see something in the large.

Abstraction is the key word there; the idea that almost everything we think is simple actually has a lot more going on underneath the surface is of vital importance to computer science. Abstraction is basically the idea that sometimes it's beneficial to ignore a bunch of details when describing some part of how a computer or program works.

Take this little Python program as an example:

In [ ]:

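def add_two(num1, num2):
    # input() gives back text (strings), so convert both to integers first
    num1 = int(num1)
    num2 = int(num2)
    total = num1 + num2  # "total" avoids shadowing Python's built-in sum()
    return total

# Ask the user for two numbers (these arrive as strings)
num1 = input("Enter one number: ")
num2 = input("Enter a second: ")

print("Your sum is", add_two(num1, num2))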

How would you describe what your program does to someone who's never programmed?

Would you tell them about variables? What a function is? Converting from a string to an integer? Of course not.

What you're doing is abstraction: glossing over details to serve a functional purpose.

Computational Thinking¶

Abstraction is just one part of computational thinking, which we discussed during lecture. As a reminder, the four ideas are:

Algorithms
Decomposition
Abstraction
Pattern Recognition

How do the other ideas of computational thinking play into what we've learned so far about Python and files?

How much of the computational thinking does the programmer have to do versus how much the computer has to do?

Back to files and folders¶

"Men are not disturbed by things, but the view they take of things."

— Epictetus (55-135 A.D.)

"What about things like bullets?"

— Behavioralist Herb Kimmel, upon hearing the above quote in 1981 (source)

As mentioned earlier, files and folders are just a metaphor your computer uses to talk about data. Through that metaphor, your computer abstracts away the details of what files actually are. When you look at your files with a file browser like Finder, Windows Explorer, or the file viewer on JupyterHub, what you're seeing is an abstraction your computer presents to make organizing your documents easier.

What file browsers are actually looking at is something called the file system. As the name suggests, a file system (sometimes spelled as one word, filesystem) is a system for managing files. It's basically how your computer thinks about the data on your disk: it organizes that data into structures that make your files easy to find, quick to access, and simple to change.

These layers of abstraction basically look like this:

file diagram

The command line: another view¶

Now, open up a terminal in Jupyter and put it side by side with this notebook. What do you see?

It's pretty austere, but you can do a lot with it.

Before we dive into the command line, open up the Jupyter file browser and take a look at your home folder. Keep the filenames and folders in mind for this next part. Now, go back to the terminal window, type in ls, and hit Enter. (Don't worry about what ls stands for right now, we'll get to that very soon.)

Notice how the things that the terminal printed out are the same as what's in your file browser? That's because the command line is just another way of looking at the files on a computer. File browsers and the command line are different ways of viewing your filesystem.

another file graphic, but this one's better

A note on Unix¶

In the Really Friendly Introduction to the Command Line you read, you saw the command line being referred to as the Unix shell. Unix describes computer operating systems like Linux and Mac, but not Windows. Because Pitt's supercomputer is a Unix computer, the shell we're using is a Unix shell. You can read more about this terminology in the addendum for today's lecture here.

Unix shell prompts usually follow a similar format: they list your username and the computer's name, and end with a $, the conventional sign that the shell is ready for your next command. Terminal on Mac, which also runs a Unix shell, looks like this:

my shell (very cool)

Viewing files¶

Let's take a break from the command line for just a minute to do some journaling. (No, seriously.)

Open up a blank text file in Jupyter and write what you're thinking right now. Just a few sentences, and then rename the file to be whatever name you want. Make sure you save it in your main directory!

Now, let's try ls again. In case you haven't guessed, ls lists the files in whatever directory you're in.

Well, what directory are we in?¶

Good question! If you ever need to know the full path of the directory you're working in on the command line, use the pwd command. pwd stands for "print working directory". It'll tell you what folder you're currently looking at! (Directory is another word for folder on computers.)

So, when you run pwd on the SCI JupyterHub, you'll get something that looks like this:

[abc123@jupyterhub ~]$ pwd
/home/jupyter-abc123

That's your working directory on the supercomputer. home is the name of the folder that contains all the users and jupyter-abc123 is the name of your personal folder!

On your own computer, you might see something like this (this example comes from Git Bash on Windows; on a Mac, the path would start with /Users instead):

/c/Users/username/Desktop/CMPINF0010/lab-4

That's your working directory on your own computer. c is the name of the drive the files are located on, username is whatever username you use on that computer, and so on.

When you click on the little home icon in the file browser on Jupyter, it'll take you to that personal main folder. That's where you should save your journal, if you didn't already.

The `cat` command¶

my cat, Luna

No, not that kind of cat. (Sad!) The shell command cat, which is short for concatenate, will output the contents of any file to the terminal window.

You're going to use cat to read the journal you wrote.

First, do ls. Do you see the name of the text file you wrote your journal in? If you don't, make sure you saved your file in your main folder.

Now, it's time for cat. Type cat filename.txt, where filename is whatever you named your file. If you included spaces in your filename, put the name in quotation marks. Also, make sure the capitalization matches exactly.

Now, hit Enter. You should see your journal entry, printed out for the world to see. (Or, at least, for you.)
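If you'd like to try the quoting rule somewhere safe first, here's a minimal sketch. It works in a throwaway folder made by mktemp -d. The first echo line uses the > redirection covered at the very end of this lesson and just creates a sample file. The filename "my journal.txt" is made up for the demo.

```shell
# Work in a throwaway folder so none of your real files are touched.
cd "$(mktemp -d)"
scratch="$(pwd)"

# Setup: create a sample journal whose (made-up) name contains a space.
echo "Today I learned about the command line." > "my journal.txt"

# Quotes keep "my journal.txt" together as ONE filename.
cat "my journal.txt"

# Without quotes, cat looks for two files, "my" and "journal.txt", and complains.
cat my journal.txt || echo "(that failed, as expected)"
```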

Navigating around: fun with directories¶

The cd command allows you to change what directory you're currently looking at. cd, as you may have guessed, stands for "change directory".

The last folder in the path that pwd printed, the one we're currently in, is named after your Pitt username.

A note on autocomplete¶

Before we go navigating around willy-nilly, let's introduce the concept of autocompletion to you. If you don't have any folders in your main folder on JupyterHub, take the time now to create one.

The mkdir command, which is short for "make directory", lets you, um, make a directory. Try doing mkdir "hello world" (remember, the command line cares about spaces).

Then, type in cd h and press the Tab key.

Whoa, what happened? Autocomplete. The command line can guess what you're going to type, which can save you a lot of typing. Pressing Tab completes the names of files and folders after almost any command (and even the names of commands themselves), so it's worth getting into the habit of using it.

Okay, let's navigate¶

Let's cd into the folder we just created. Do pwd once you're in there, just to see what it looks like, and try ls. (There's nothing in the folder. Big surprise.)

Okay, so how do we get out? Type cd ..: just like that, cd and two periods, nothing else. Press Enter. Now, pwd to find out where the heck you are.
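Here's the same round trip as a short script you could paste into the terminal. It's a sketch that uses a throwaway folder from mktemp -d as a stand-in for your home folder, so it won't clutter anything real.

```shell
# A throwaway folder standing in for your home folder.
cd "$(mktemp -d)"
scratch="$(pwd)"      # remember where we started

mkdir "hello world"   # quotes: ONE folder whose name contains a space
cd "hello world"      # go into it (Tab-completing "cd h" gets you here too)
pwd                   # the path now ends in /hello world
ls                    # prints nothing: the folder is empty

cd ..                 # ".." means "one folder up"
pwd                   # back where we started
```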

Directory shortcuts¶

There are a few shortcuts that the command line provides for you that help navigating and managing your files and directories. They are as follows:

shortcut	description
`.`	the current directory
`..`	one directory "above" the current one
`~`	your main folder

You'll use those a lot for navigating and moving files.
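Here's a quick sketch of all three shortcuts in action. It only moves you around and lists things, so it's safe to run anywhere.

```shell
cd ~      # "~" is your home folder, no matter where you start from
pwd
ls .      # "." is the current folder, so this lists the same thing as plain ls
cd ..     # ".." is the folder one level above the current one
pwd       # on JupyterHub this would be /home
cd ~      # and back home again
pwd
```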

Does it just keep going?¶

Try cd .. once more. Now where are you?

Python, but cooler¶

Use cd ~ to come back home, and then navigate to this week's repo. I'm sure you noticed, in this repo, a file with the extension .py. .py files are Python programs, like a code cell in a Jupyter notebook.

You can run Python programs on the command line using the python command. Go ahead and run the file whoareyou.py in this week's repo by typing python whoareyou.py.

Native `bash` commands vs. installed commands¶

If you've installed Anaconda on your Mac (whose Terminal uses a command line very much like the one we're using now, called bash) and you run a Python file from the command line, it should (theoretically) work, because Anaconda installs python for you. Before you installed it, though, there's a decent chance it wouldn't have worked, and you would have gotten something like this:

user@computer $ python my_cool_program.py
-bash: python: command not found

That's because the python command isn't installed automatically on all computers that have the bash command line; it's not native to bash. You're able to run python because python is already installed on the SCI JupyterHub, or you took the time to install Anaconda on your Mac.

But you never installed programs called ls, cat, or cd. That's because those come with the system: cd is built into bash itself, and ls and cat are standard programs included on every Unix system. They're there automatically.

There are literally hundreds of native bash commands, for everything from displaying a calendar (cal, unsurprisingly) to printing out the last 10 lines of a file (tail).

There's a list of the default bash commands here: https://ss64.com/bash/
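If you're curious whether a command is built in, preinstalled, or missing, the shell's own command -v will tell you. This is a small sketch. Exact paths vary from computer to computer, and in your own terminal ls may show up as an alias instead of a path.

```shell
# command -v prints how the shell would run a command (or nothing if it can't find it)
command -v cd    # prints "cd": it's built into the shell itself
command -v ls    # prints a path such as /usr/bin/ls: a standard program on disk

# python only shows up if someone installed it (Anaconda, or the JupyterHub admins)
command -v python || echo "python: not installed here"
```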

Group Assignment: what's wrong with this picture¶

Below is a list of shell commands that we've seen so far, along with a description of what we're hoping to accomplish with each one. The issue is that the person who wrote them is not very good at the command line, so they've made a fatal mistake in each one. Your task is to fix the mistake in each command. Good luck!

I want to print out a file whose name is "tax documents.txt".

cat tax documents.txt

I want to figure out what directory I'm in.

print currdir

I want to create two directories, named "foo" and "bar". (Note: these commands work, but they're too long. You can combine them into one command!)

mkdir foo
mkdir bar

I want to change my current directory to my home folder.

cd ..

I want to view the calendar for this month.

cat

Copying, moving, and recursion¶

From the main repo, you should see the directory testfiles. cd to it, and look around a little bit.

Let's copy the testfiles directory, with all of the files in it, into your home folder. How are we going to do that?

The mv command allows you to move files and folders around. It uses the format mv source destination, so, if you wanted to move a file called test.txt to a folder called myfiles, you would type mv test.txt myfiles. But, mv's not what we want to do here.
Instead, we want to copy the directory. The cp command has the exact same format as mv, but instead of moving, it'll make a copy wherever you specify.
So, now we know how to copy things. How do we get things into your home folder? (Remember that table of shortcuts from a while back?)
One more thing: by default, cp only copies individual files; point it at a folder and it'll refuse. To copy a folder along with everything in it, you have to use the flag -r. That stands for "recursive": cp copies the folder, then everything inside it, then everything inside any folders in there, and so on until it reaches the end of the "tree". (mv doesn't need -r; it moves a folder and its contents in one go.)
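Here's a minimal sketch of mv, cp, and cp -r side by side, run in a throwaway folder. The names myfiles, test.txt, and backup are made up for the demo (the first two are borrowed from the mv example above). They aren't the lab's testfiles, so the exercise is still yours to solve.

```shell
cd "$(mktemp -d)"     # throwaway folder for the demo
scratch="$(pwd)"

mkdir myfiles backup
echo "hello" > test.txt          # setup: a small sample file

mv test.txt myfiles              # move: test.txt now lives only in myfiles/
cp myfiles/test.txt copy.txt     # copy: same contents, and the original stays put

cp myfiles backup || echo "(cp refuses to copy a folder without -r)"
cp -r myfiles backup             # copies myfiles/ and everything in it into backup/
ls backup/myfiles                # prints test.txt
```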

File manipulation and wildcards¶

So, after you've copied the test files to your home directory, open and refresh the file browser on Jupyter. See the "testfiles" directory there? You're welcome to poke around in it to make sure all the files are still there.

Now, what you need to do is delete every file that's from 2002. How would you do that using a normal, GUI file browser?

Doing these kinds of repetitive tasks is much, much easier on the command line, thanks to the magic of wildcards. Wildcards are special characters that let you match many filenames with a single pattern. There are two main wildcards:

* matches any text of any length. For example, if you wanted to match every Python file, the wildcard for that would be *.py, etc.
? matches any single character. You can string them together as much as you want. For example, text files with five-character names which begin with "a" would be matched with a????.txt. If you wanted to search for files with the years in the 1900s, you could use the 19?? wildcard.
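Here's a sketch you can run to see each pattern match, using a few made-up, empty files. It uses touch, which creates an empty file and isn't covered in this lesson; treat that line as setup.

```shell
cd "$(mktemp -d)"     # throwaway folder for the demo
# Setup: made-up, empty files
touch apple.txt anna.txt notes.py report1999.txt report2010.txt

ls *.py        # any name ending in .py           -> notes.py
ls a????.txt   # "a" + exactly 4 more characters  -> apple.txt (not anna.txt)
ls *19??*      # "19" followed by any 2 characters -> report1999.txt
```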

Question: how would you match a text file that begins with a person's name and ends with a number between 100 and 199?

You can delete files with the rm command. This can also delete directories if you use the same -r flag you did when you copied the test files over.
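Here's a minimal sketch of rm with a wildcard and of rm -r, again with made-up files in a throwaway folder. Be careful with the real thing: rm doesn't use a trash can, so deleted files are gone for good.

```shell
cd "$(mktemp -d)"     # throwaway folder: safe to delete things in
# Setup: made-up files, plus a folder with a file inside it
touch draft1.txt draft2.txt keep.txt
mkdir old
touch old/notes.txt

rm draft?.txt   # deletes draft1.txt and draft2.txt (no undo!)
rm -r old       # -r deletes the folder AND everything inside it
ls              # only keep.txt is left
```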

So, now remove all of the files in the testfiles directory whose names contain "2002". How do you think we're going to do that?

The command line is fiddly¶

You probably have some questions about what we've done so far, like:

Why do we have to put filenames with spaces in quotes?
Why are filenames case-sensitive?
Hell, why is any of this case-sensitive?
What's with the spaces and hyphens and stuff?
...et cetera

Well, here's your answer:

fiddly (adjective, British)

complicated or detailed and awkward to do or use.

You've experienced some fiddliness thus far in this class, and those of you that have done software development in other classes or on your own have certainly dealt with it, but the command line takes that to another level.

Fiddliness is a byproduct of the Unix aesthetic which, at its core, is a dislike of extra stuff: cutesy descriptions, unnecessary information, using an entire word when an abbreviation will do.

But it's worth getting through the fiddliness for one reason: the command line is incredibly powerful.

Group Assignment: out of order¶

Below are two things: a plain-English description of a task we want to accomplish, and a list of the relevant shell commands. The shell commands, however, are out of order, and running them as they are would wreak havoc on any system. Your task is to put the commands in the right order so they do what we actually want. Good luck!

We want to:

switch to home directory
make two folders called "apples" and "oranges"
move all text files in my current directory into the "apples" folder
switch to the "oranges" folder
copy the "apples" folder and all of its files into the "oranges" folder

The commands are as follows, out of order:

cp -r ~/apples .
cd ~
mv *.txt apples
cd oranges
mkdir apples oranges

Some more journaling¶

Up to this point, you've only ever looked at files that either you created in Jupyter or ones that were provided for you. You can do text editing directly from the command line, without ever opening another window. A lot of software developers prefer using the command line to write code, for its power, customizability, and the ease of running and testing programs as you write them.

There are three main text editors for the command line: nano, vim, and emacs.

nano: A clone of an older editor called pico, nano is the simplest and easiest to use text editor. It's a lot like Notepad, but without the ability to use your mouse. This is definitely the one that's best for teaching basic text editing.
vim: One of the two main text editors that programmers like to use. vim is noted for its minimalism, which can make it feel almost impossible to use at first. vim lets the user switch between "modes", which allow you to manipulate, select, and insert text.
emacs: The other major programming text editor, emacs is notorious for its customizability and its incredible number of features. There is a Twitter client for emacs. emacs lets the user create multiple "windows" inside the editor for editing things side by side and running code while editing it, and it even has its own programming language (Emacs Lisp) for creating new features.
There are some other, weirder ones, like ed, which all have their strengths and uses. We won't really get into those.

Those bullet points can basically be summed up with this cartoon from xkcd:

Real programmers set the universal constants at the start such that the universe evolves to contain the disk with the data they want.

Let's learn `nano`!¶

Most major text editors allow you to put the name of a file as the argument for the shell command. For example, to open the file "test.txt" in nano, you'd simply type nano test.txt. This works just as well for files that don't exist (yet). You can also simply type nano to open up a blank file.

Making sure you're in your main folder, let's write another journal, like you did earlier.

First, run either nano or nano filename.txt, where "filename.txt" is whatever you want to name your journal. If you don't enter a filename now, you'll put one in later.
When you first open nano, you will immediately be able to type in or edit text. If you opened a new or blank file, your screen will be pretty much empty. If your file wasn't empty, you'll see its contents!
Now, go forth and journal! Write a few sentences on what you're thinking at this exact moment. No essays necessary, just a line or two.
You use keyboard shortcuts in nano to save your files and quit. The shortcuts you can use are listed at the bottom of the screen. Most shortcuts that you will use start with the Control key, denoted by the caret sign ^. To save your file, hit Control and the letter O at the same time (^O). nano will then ask you what you want the filename to be: if you typed one in earlier, it will show up here and you can just hit enter to confirm; if you haven't typed one in yet, you'll have to type one in now before hitting enter.
To exit nano, you use the "quit" command, which is ^X. You're done! That was painless.

So, what happened to your journal? Go back to the Jupyter file browser and refresh it. When you see your second journal file, open it up in Jupyter. There's your journal, safe and sound! Congratulations, you've just successfully written text on the command line. Welcome to the club; we've got some famous members.

George R. R. Martin, no, really

A diversion¶

So, the only files we've looked at, really, have been Python code and text files, which are both pretty boring, if we're being honest. You can use cat to see what any file looks like "under the hood", so to speak. You can check out source code and see how things are arranged and stored in other files.

With that in mind, let's answer a question that must be burning in everyone's mind: what actually are Jupyter notebooks?

Navigate to the folder where this lecture notebook is. Then, using cat (and autocomplete, if you're lazy), print the file "Lab-4-Lesson.ipynb" to the command line.

Weird, isn't it?

The most useful thing you'll learn about the command line¶

This isn't even a command line program, it's more of a metaprogram.

Let's say you encounter a weird command that you don't recognize, like cal. (Pretend I didn't show you cal earlier on.) You want to know everything about how cal works, what it does, and how to use it. And you don't have Google.

Go to your command line and type man cal. See what pops up?

You can look at the man page for any shell command that's installed on your computer. It'll give you a basic overview of the command, its options, and how to use it. (Press q to get back to the command line.) With that in mind...

Group Assignment: man pages¶

Okay, so, you've seen a bunch of bash commands, and you know how to figure out what an arbitrary command does with man pages, so here's your assignment:

Pick a command from the list here. Don't pick something we've talked about already. (NOTE: Not all commands listed there are in every bash instance, and some of them may not work on the supercomputer. If you get something that you don't understand or that doesn't work, pick something else.)
Use man to figure out what it does and what options and flags it has.
If you want to, Google the command. Most bash commands have Wikipedia pages that are quite informative.
Fill out the blanks below for your command. Give its name, what it's short for (if anything), describe briefly what it does, and give a few examples of how you would use the command.

As an example, here's what I'd fill out for cat.

Command: cat

Short for: concatenate

What does it do? prints out the whole contents of the file to the command line

Examples of usage:

cat test.txt - will print out test.txt

cat -b example.txt - will print out example.txt with the non-blank lines numbered (cat -n numbers every line)
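To see the difference between -b and -n for yourself, here's a quick sketch with a made-up three-line file whose middle line is blank. printf is only used to create the file.

```shell
cd "$(mktemp -d)"     # throwaway folder for the demo
printf 'first line\n\nthird line\n' > example.txt   # setup: line 2 is blank

cat -n example.txt   # numbers every line: 1, 2, 3
cat -b example.txt   # numbers only the non-blank lines: 1, (blank), 2
```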

Now, you're on your own! The space for you to write is in the Markdown cell below. And remember, you can make something appear like code by surrounding it with backticks (`).

Command:

Short for:

What does it do?

Examples of usage:

One last thing: redirection¶

In addition to allowing you to automate commands with scripting, the command line lets you plug commands and files into one another basically ad infinitum. What does that mean? Well, let's say you didn't know how to use wildcards, and you wanted to only print out the filenames of text files in a directory.

For the sake of the example, you can't use ls *.txt. So, what if you could tell another program, like grep, which searches through files for text, to search through what ls produces?

You could copy the output of ls, paste it into a file, and then grep through the file. But that's messy and inelegant, and the command line is nothing if not elegant.

You can pipe programs into each other. Using the | character (which is called a pipe, and is typed with Shift-\ on most US keyboards), you hook the output of one program up to the input of another. So, let's search for text files in the output of ls, shall we?

The command to run is ls | grep ".txt". Don't worry too much about grep there, just trust that it'll only print a line if it contains whatever's in quotes.

What's being done here is, instead of printing the output of ls, it's being given to grep to chew through. It's like you literally hooked up a physical pipe for ls to put its output into, and then hooked that pipe up to grep's input.
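Here's a sketch of that pipeline in a throwaway folder with made-up files. When ls sends its output into a pipe instead of the screen, it prints one name per line, which is exactly what grep, working line by line, wants.

One fiddly detail: to grep, "." means "any character". That means ".txt" also catches names like my_txt_notes.py. The stricter pattern '\.txt$' only matches names that really end in .txt; the $ means "end of the line".

```shell
cd "$(mktemp -d)"     # throwaway folder for the demo
# Setup: made-up, empty files
touch essay.txt notes.txt script.py my_txt_notes.py

ls | grep ".txt"      # any character + "txt": also catches my_txt_notes.py!
ls | grep '\.txt$'    # stricter: only names that really end in .txt
```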

The same, but different¶

You can do the same thing with files! The echo command just prints whatever text you want onto the command line. Pretty boring, huh. But when combined with redirection, you can create files with whatever text you want!

You redirect a program's output to a file using >. So, to echo text to a file, you'd type echo "some text" > filename.txt. This can be very useful for creating a lot of files at once, say, if you're trying to generate a bunch of files for every year and letter of the alphabet? Hmmm...
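Here's a minimal sketch of redirection in a throwaway folder. One thing worth knowing before you try it on real files: if the file already exists, > replaces everything that was in it.

```shell
cd "$(mktemp -d)"     # throwaway folder for the demo

echo "some text" > filename.txt   # creates filename.txt containing: some text
cat filename.txt                  # prints: some text

echo "new text" > filename.txt    # > overwrites: the old contents are gone
cat filename.txt                  # prints: new text
```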

Communication¶

We discussed communication, its different forms, and infrastructure systems in lecture last week. Let's relate communication to the command line!

How would you classify communication with the computer with the command line: unicast, multicast, or broadcast? Could it fall into multiple categories under different circumstances?

What benefits does communicating with the computer through the command line offer over the graphical user interface? What about the opposite?

Summary¶

That's it for today! Hopefully, you were convinced that the command line is awesome. And also the worst thing ever.

By way of review:

What you think of as "files" are actually lies your computer tells you.
The command line is a different way of looking at that.
You can move around the command line with cd, see what you're looking at with ls, and find out where you are with pwd.
You can view any file with cat, move anything with mv, copy anything with cp, create directories with mkdir, and edit text with nano.
Find a command you don't recognize? You can just man it.
You can run programs from the command line! You just need python.
Asterisks and question marks are the best things ever.
No, wait, pipes and angle brackets are the best things ever.