Understanding lists and manipulating lines

By Allison Parrish

In this tutorial, I explain how the list data structure works in Python. After going over the basics, I'll show you how to use list comprehensions as a powerful and succinct method to poetically manipulate lines and words from a text.

Lists: the basics

A list is a type of value in Python that represents a sequence of values. The list is a very common and versatile data structure in Python and is used frequently to represent (among other things) tabular data and words in a text. Here's how you write one out in Python:

In [1]:
[5, 10, 15, 20, 25, 30]
Out[1]:
[5, 10, 15, 20, 25, 30]

That is: a left square bracket, followed by a series of comma-separated expressions, followed by a right square bracket. Items in a list don't have to be literal values that you type in directly; they can be more complex expressions as well. Python will evaluate those expressions and put the results in the list.

In [2]:
[5, 2*5, 3*5, 4*5, 5*5, 6*5]
Out[2]:
[5, 10, 15, 20, 25, 30]

Lists can have an arbitrary number of values. Here's a list with only one value in it:

In [3]:
[5]
Out[3]:
[5]

And here's a list with no values in it:

In [4]:
[]
Out[4]:
[]

Here's what happens when we ask Python what type of value a list is:

In [5]:
type([1, 2, 3])
Out[5]:
list

It's a value of type list.

Like any other kind of Python value, you can assign a list to a variable:

In [6]:
my_numbers = [5, 10, 15, 20, 25, 30]

Getting values out of lists

Once we have a list, we might want to get values out of the list. You can write a Python expression that evaluates to a particular value in a list using square brackets to the right of your list, with a number representing which value you want, numbered from the beginning (the left-hand side) of the list. Here's an example:

In [7]:
[5, 10, 15, 20][2]
Out[7]:
15

If we were to say this expression out loud, it might read, "I have a list of four things: 5, 10, 15, 20. Give me back the second item in the list." Python evaluates that expression to 15, the second item in the list.

Here's what it looks like to use this indexing notation on a list stored in a variable:

In [8]:
my_numbers[2]
Out[8]:
15

The second item? Am I seeing things? 15 is clearly the third item in the list.

You're right---good catch. But for reasons too complicated to go into here, Python (along with many other programming languages!) starts list indexes at 0, instead of 1. So what looks like the third element of the list to human eyes is actually the second element to Python. The first element of the list is accessed using index 0, like so:

In [9]:
[5, 10, 15, 20][0]
Out[9]:
5

The way I like to conceptualize this is to think of list indexes not as specifying the number of the item you want, but instead specifying how "far away" from the beginning of the list to look for that value.
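Here's a small sketch of that "how far from the beginning" idea. The variable names (numbers, first, third, last) are made up for illustration:

```python
numbers = [5, 10, 15, 20]

# Index 0 means "zero steps away from the start": the first item.
first = numbers[0]

# Index 2 means "two steps away from the start": the third item.
third = numbers[2]

# A list with four items has indexes 0 through 3, so 3 is the last one.
last = numbers[3]

print(first, third, last)
```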

If you attempt to use a value for the index of a list that is beyond the end of the list (i.e., the value you use is higher than the last index in the list), Python gives you an error:

In [10]:
my_numbers[47]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-72aecccf47fe> in <module>()
----> 1 my_numbers[47]

IndexError: list index out of range

Note that while the type of a list is list, the type of an expression using index brackets to get an item out of the list is the type of whatever was in the list to begin with. To illustrate:

In [11]:
type(my_numbers)
Out[11]:
list
In [12]:
type(my_numbers[0])
Out[12]:
int

Indexes can be expressions too

The thing that goes inside of the index brackets doesn't have to be a number that you've just typed in there. Any Python expression that evaluates to an integer can go in there.

In [13]:
my_numbers[2 * 2]
Out[13]:
25
In [14]:
x = 3
[5, 10, 15, 20][x]
Out[14]:
20

Other operations on lists

Because lists are so central to Python programming, Python includes a number of built-in functions that allow us to write expressions that evaluate to interesting facts about lists. For example, try putting a list between the parentheses of the len() function. It will evaluate to the number of items in the list:

In [15]:
len(my_numbers)
Out[15]:
6
In [16]:
len([20])
Out[16]:
1
In [17]:
len([])
Out[17]:
0

The in operator checks to see if the value on the left-hand side is in the list on the right-hand side.

In [18]:
3 in my_numbers
Out[18]:
False
In [19]:
15 in my_numbers
Out[19]:
True

The max() function will evaluate to the highest value in the list:

In [20]:
readings = [9, 8, 42, 3, -17, 2]
max(readings)
Out[20]:
42

... and the min() function will evaluate to the lowest value in the list:

In [21]:
min(readings)
Out[21]:
-17

The sum() function evaluates to the sum of all values in the list.

In [22]:
sum([2, 4, 6, 8, 80])
Out[22]:
100

Finally, the sorted() function evaluates to a copy of the list, sorted from smallest value to largest value:

In [23]:
sorted(readings)
Out[23]:
[-17, 2, 3, 8, 9, 42]
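The word "copy" matters here. As a quick check (the name in_order is just something I picked), you can look at the original list after calling sorted() and see that it hasn't changed:

```python
readings = [9, 8, 42, 3, -17, 2]

# sorted() builds a new list; it doesn't rearrange the one you give it.
in_order = sorted(readings)

print(in_order)   # the sorted copy
print(readings)   # the original, still in its original order
```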

Negative indexes

If you use -1 as the value inside of the brackets, something interesting happens:

In [24]:
fib = [1, 1, 2, 3, 5]
fib[-1]
Out[24]:
5

The expression evaluates to the last item in the list. This is essentially the same thing as the following code:

In [25]:
fib[len(fib) - 1]
Out[25]:
5

... except easier to write. In fact, you can use any negative integer in the index brackets, and Python will count that many items from the end of the list, and evaluate the expression to that item.

In [26]:
fib[-3]
Out[26]:
2
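To make the "counting from the end" rule concrete, here's a sketch comparing a negative index with the longer len()-based spelling. The variable names are hypothetical:

```python
fib = [1, 1, 2, 3, 5]

# fib[-2] picks the same item as fib[len(fib) - 2].
second_to_last = fib[-2]
same_item = fib[len(fib) - 2]

# The "most negative" index that still works is -len(fib),
# which lands on the very first item.
first = fib[-len(fib)]

print(second_to_last, same_item, first)
```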

If the value in the brackets would "go past" the beginning of the list, Python will raise an error:

In [27]:
fib[-14]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-27-afdbe64c1786> in <module>()
----> 1 fib[-14]

IndexError: list index out of range

Generating lists with range()

The expression list(range(n)) evaluates to a list of the integers from 0 up to (but not including) n. This is helpful when you just want numbers in a sequence:

In [28]:
list(range(10))
Out[28]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can specify where the list should start and end by supplying two parameters to the call to range:

In [29]:
list(range(-10, 10))
Out[29]:
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

List slices

The index bracket syntax explained above allows you to write an expression that evaluates to a particular item in a list, based on its position in the list. Python also has a powerful way for you to write expressions that return a section of a list, starting from a particular index and ending with another index. In Python parlance we'll call this section a slice.

Writing an expression to get a slice of a list looks a lot like writing an expression to get a single value. The difference is that instead of putting one number between square brackets, we put two numbers, separated by a colon. The first number tells Python where to begin the slice, and the second number tells Python where to end it.

In [30]:
[4, 5, 6, 10, 12, 15][1:4]
Out[30]:
[5, 6, 10]

Note that the value after the colon specifies at which index the slice should end, but the slice does not include the value at that index. (You can tell how long the slice will be by subtracting the value before the colon from the value after it.)
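Here's a quick way to convince yourself of both points. This is only a sketch; the list of letters and the names start, end and piece are made up for the example:

```python
letters = ["a", "b", "c", "d", "e", "f"]
start = 1
end = 4

piece = letters[start:end]

# The slice begins with the item at index 1 ("b") and stops just
# before the item at index 4 ("e").
print(piece)

# Its length is the end value minus the start value.
print(len(piece) == end - start)
```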

Also note that---as always!---any expression that evaluates to an integer can be used for either value in the brackets. For example:

In [31]:
x = 3
[4, 5, 6, 10, 12, 15][x:x+2]
Out[31]:
[10, 12]

Finally, note that the type of a slice is list:

In [32]:
type(my_numbers)
Out[32]:
list
In [33]:
type(my_numbers[1:4])
Out[33]:
list

Omitting slice values

Because it's so common to use the slice syntax to get a list that is either a slice starting at the beginning of the list or a slice ending at the end of the list, Python has a special shortcut. Instead of writing:

In [39]:
my_numbers[0:3]
Out[39]:
[5, 10, 15]

You can leave out the 0 and write this instead:

In [40]:
my_numbers[:3]
Out[40]:
[5, 10, 15]

Likewise, if you wanted a slice that starts at index 4 and goes to the end of the list, you might write:

In [41]:
my_numbers[4:]
Out[41]:
[25, 30]

Getting the first two items in my_numbers:

In [42]:
my_numbers[:2]
Out[42]:
[5, 10]
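One way to see how the two shortcuts fit together: if you use the same number on either side of the colon, the two slices divide the list into a front part and a back part with nothing left over. In the sketch below, n, front, back and whole_copy are names I invented. Leaving out both numbers is also allowed, and evaluates to a copy of the whole list.

```python
my_numbers = [5, 10, 15, 20, 25, 30]
n = 2

front = my_numbers[:n]      # everything before index 2
back = my_numbers[n:]       # everything from index 2 onward
whole_copy = my_numbers[:]  # no start, no end: a copy of the entire list

print(front)
print(back)
print(whole_copy)
```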

Negative index values in slices

Now for some tricky stuff: You can use negative index values in slice brackets as well! For example, to get a slice of a list from the fourth-to-last element of the list up to (but not including) the second-to-last element of the list:

In [43]:
my_numbers[-4:-2]
Out[43]:
[15, 20]

To get everything from the beginning of the list up to (but not including) the third-to-last element:

In [44]:
my_numbers[:-3]
Out[44]:
[5, 10, 15]

All items from my_numbers, starting with the third item from the end of the list and continuing up to the end of the list:

In [45]:
my_numbers[-3:]
Out[45]:
[20, 25, 30]
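It's easy to mix up [:-3] and [-3:], so here they are side by side, along with the len()-based spelling that [-3:] is shorthand for. The variable names are made up for this sketch:

```python
my_numbers = [5, 10, 15, 20, 25, 30]

# Colon AFTER the number: start three from the end, run to the end.
last_three = my_numbers[-3:]

# Colon BEFORE the number: start at the beginning, stop three from the end.
all_but_last_three = my_numbers[:-3]

# The long way of writing the first one, using len().
same_last_three = my_numbers[len(my_numbers) - 3:]

print(last_three)
print(all_but_last_three)
```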

Strings and lists

Strings and lists share a lot of similarities! The same square bracket slice and index syntax works on strings the same way it works on lists:

In [75]:
message = "importantly"
In [76]:
message[1]
Out[76]:
'm'
In [77]:
message[-2]
Out[77]:
'l'
In [78]:
message[-5:-2]
Out[78]:
'ant'

Weirdly, max() and min() also work on strings... they just evaluate to the letter that comes latest and earliest in alphabetical order (respectively):

In [79]:
max(message)
Out[79]:
'y'
In [80]:
min(message)
Out[80]:
'a'

You can turn a string into a list of its component characters by passing it to list():

In [81]:
list(message)
Out[81]:
['i', 'm', 'p', 'o', 'r', 't', 'a', 'n', 't', 'l', 'y']
In [82]:
list("我爱猫!😻")
Out[82]:
['我', '爱', '猫', '!', '😻']

The letters in a string in alphabetical order:

In [83]:
sorted(list(message))
Out[83]:
['a', 'i', 'l', 'm', 'n', 'o', 'p', 'r', 't', 't', 'y']
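Sorting letters turns out to be good for more than alphabetizing. As a sketch (the example words are mine, and it relies on the == operator, which evaluates to True when two lists hold the same items in the same order), you can check whether two words are anagrams by comparing their sorted letters:

```python
first_word = "listen"
second_word = "silent"

# Anagrams contain exactly the same letters, so their sorted
# letter lists come out identical.
are_anagrams = sorted(list(first_word)) == sorted(list(second_word))

# These two words share some letters but not all of them.
not_anagrams = sorted(list("listen")) == sorted(list("lists"))

print(are_anagrams, not_anagrams)
```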

List comprehensions: Applying transformations to lists

A very common task in both data analysis and computer programming is applying some operation to every item in a list (e.g., scaling the numbers in a list by a fixed factor), or to create a copy of a list with only those items that match a particular criterion (e.g., eliminating values that fall below a certain threshold). Python has a succinct syntax, called a list comprehension, which allows you to easily write expressions that transform and filter lists.

A list comprehension has a few parts:

  • a source list, or the list whose values will be transformed or filtered;
  • a predicate expression, to be evaluated for every item in the list;
  • (optionally) a membership expression that determines whether or not an item in the source list will be included in the result of evaluating the list comprehension, based on whether the expression evaluates to True or False; and
  • a temporary variable name by which each value from the source list will be known in the predicate expression and membership expression.

These parts are arranged like so:

[ predicate expression for temporary variable name in source list if membership expression ]

The words for, in, and if are a part of the syntax of the expression. They don't mean anything in particular (and in fact, they do completely different things in other parts of the Python language). You just have to spell them right and put them in the right place in order for the list comprehension to work.

Here's an example, returning the squares of the integers from zero up to (but not including) ten. First, we'll create a list to_ten that contains the range of numbers:

In [52]:
to_ten = list(range(10))
to_ten
Out[52]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [53]:
[x * x for x in to_ten]
Out[53]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In the example above, x*x is the predicate expression; x is the temporary variable name; and [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is the source list. There's no membership expression in this example, so we omit it (and the word if).
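If you've run into for loops and the .append() method elsewhere (neither is covered in this tutorial), it may help to see the same thing written the long way. This is only a rough equivalent for comparison, and the names squares and same_squares are made up:

```python
to_ten = list(range(10))

# The long way: start with an empty list, then add one squared
# value for each item in the source list.
squares = []
for x in to_ten:
    squares.append(x * x)

# The list comprehension packs those three lines into one expression.
same_squares = [x * x for x in to_ten]

print(squares == same_squares)
```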

There's nothing special about the variable x; it's just a name that we chose. We could easily choose any other temporary variable name, as long as we use it in the predicate expression as well. Below, I use the name of one of my cats as the temporary variable name, and the expression evaluates the same way it did with x:

In [55]:
[shumai * shumai for shumai in to_ten]
Out[55]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Notice that the type of the value that a list comprehension evaluates to is itself type list:

In [56]:
type([x * x for x in to_ten])
Out[56]:
list

The source doesn't have to be a variable that contains a list. It can be any expression that evaluates to a list, or in fact any expression that evaluates to an iterable. For example:

In [58]:
[x * x for x in [0, -1, 1, -2, 2, -3, 3]]
Out[58]:
[0, 1, 1, 4, 4, 9, 9]

... or:

In [79]:
[x * x for x in range(10)]
Out[79]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We've used the expression x * x as the predicate expression in the examples above, but you can use any expression you want. For example, to scale the values of a list by 0.5:

In [59]:
[x * 0.5 for x in range(10)]
Out[59]:
[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

In fact, the expression in the list comprehension can just be the temporary variable itself, in which case the list comprehension will simply evaluate to a copy of the original list:

In [60]:
[x for x in range(10)]
Out[60]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You don't technically even need to use the temporary variable in the predicate expression:

In [61]:
[42 for x in range(10)]
Out[61]:
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42]

Bonus exercise: Write a list comprehension for the list range(5) that evaluates to a list where every value has been multiplied by two (i.e., the expression would evaluate to [0, 2, 4, 6, 8]).

The membership expression

As indicated above, you can include an expression at the end of the list comprehension to determine whether or not the item in the source list will be evaluated and included in the resulting list. One way, for example, of including only those values from the source list that are greater than or equal to five:

In [62]:
[x*x for x in range(10) if x >= 5]
Out[62]:
[25, 36, 49, 64, 81]
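Here's another sketch along the same lines, this time throwing out readings that fall below a threshold. The threshold value and the variable names are assumptions for the sake of the example. Notice that the membership expression looks at the original item, before the predicate expression has done anything to it:

```python
readings = [9, 8, 42, 3, -17, 2]
threshold = 5

# Filter only: keep the items that pass the test, unchanged.
kept = [r for r in readings if r >= threshold]

# Filter and transform: the test is applied to r itself, and only
# the items that pass get multiplied by 0.5.
halved = [r * 0.5 for r in readings if r >= threshold]

print(kept)
print(halved)
```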

Splitting strings into lists

The split() method is a funny thing you can do with a string to transform it into a list. If you have an expression that evaluates to a string, you can put .split() right after it, and Python will evaluate the whole expression to mean "take this string, and 'split' it on white space, giving me a list of strings with the remaining parts." For example:

In [129]:
"this is a test".split()
Out[129]:
['this', 'is', 'a', 'test']

Notably, while the type of a string is str, the type of the result of split() is list:

In [130]:
type("this is a test".split())
Out[130]:
list

If the string in question has some delimiter in it other than whitespace that we want to use to separate the fields in the resulting list, we can put a string with that delimiter inside the parentheses of the split() method. Maybe you can tell where I'm going with this at this point!
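A few quick illustrations of splitting on a delimiter. The strings here are made up for the example; the delimiter can be more than one character long, and it doesn't show up in the resulting list:

```python
# Split on a comma.
elements = "hydrogen,helium,lithium".split(",")

# Split on a hyphen.
date_parts = "2014-03-03".split("-")

# The delimiter can be longer than a single character.
chunks = "one--two--three".split("--")

print(elements)
print(date_parts)
print(chunks)
```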

From string to list of numbers: an example

For example, I happen to have here a string that represents the total points scored by LeBron James in each of his NBA games in the 2013-2014 regular season.

17,25,26,25,35,18,25,33,39,30,13,21,22,35,28,27,26,23,21,21,24,17,25,30,24,18,38,19,33,26,26,15,30,32,32,36,25,21,34,30,29,27,18,34,30,24,31,13,37,36,42,33,31,20,61,22,19,17,23,19,21,24,43,15,25,32,38,17,13,32,17,34,38,29,37,36,27

You can either cut-and-paste this string from the notes, or see a file on GitHub with these values.

Now if I just cut-and-pasted this string into a variable and tried to call list functions on it, I wouldn't get very helpful responses:

In [131]:
raw_str = "17,25,26,25,35,18,25,33,39,30,13,21,22,35,28,27,26,23,21,21,24,17,25,30,24,18,38,19,33,26,26,15,30,32,32,36,25,21,34,30,29,27,18,34,30,24,31,13,37,36,42,33,31,20,61,22,19,17,23,19,21,24,43,15,25,32,38,17,13,32,17,34,38,29,37,36,27"
max(raw_str)
Out[131]:
'9'

This is wrong—we know that LeBron James scored more than nine points in his highest scoring game. The max() function clearly does strange things when we give it a string instead of a list. The reason for this is that all Python knows about a string is that it's a series of characters. It's easy for a human to look at this string and think, "Hey, that's a list of numbers!" But Python doesn't know that. We have to explicitly "translate" that string into the kind of data we want Python to treat it as.

Bonus advanced exercise: Take a guess as to why, specifically, Python evaluates max(raw_str) to 9. Hint: what's the result of type(max(raw_str))?

What we want to do, then, is find some way to convert this string that represents integer values into an actual Python list of integer values. We'll start by splitting this string into a list, using the split() method, passing "," as a parameter so it splits on commas instead of on whitespace:

In [132]:
str_list = raw_str.split(",")
str_list
Out[132]:
['17',
 '25',
 '26',
 '25',
 '35',
 '18',
 '25',
 '33',
 '39',
 '30',
 '13',
 '21',
 '22',
 '35',
 '28',
 '27',
 '26',
 '23',
 '21',
 '21',
 '24',
 '17',
 '25',
 '30',
 '24',
 '18',
 '38',
 '19',
 '33',
 '26',
 '26',
 '15',
 '30',
 '32',
 '32',
 '36',
 '25',
 '21',
 '34',
 '30',
 '29',
 '27',
 '18',
 '34',
 '30',
 '24',
 '31',
 '13',
 '37',
 '36',
 '42',
 '33',
 '31',
 '20',
 '61',
 '22',
 '19',
 '17',
 '23',
 '19',
 '21',
 '24',
 '43',
 '15',
 '25',
 '32',
 '38',
 '17',
 '13',
 '32',
 '17',
 '34',
 '38',
 '29',
 '37',
 '36',
 '27']

Looks good so far. What does max() have to say about it?

In [133]:
max(str_list)
Out[133]:
'61'

This... works. (But only by accident; see below.) But what if we wanted to find the total number of points scored by LBJ? We should be able to do something like this:

In [134]:
sum(str_list)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-134-537ec6b789f1> in <module>()
----> 1 sum(str_list)

TypeError: unsupported operand type(s) for +: 'int' and 'str'

... but we get an error. Why this error? The reason lies in what kind of data is in our list. We can check the data type of an element of the list with the type() function:

In [135]:
type(str_list[0])
Out[135]:
str

A-ha! The type is str. So the error message we got before (unsupported operand type(s) for +: 'int' and 'str') is Python's way of telling us, "You gave me a list of strings and then asked me to add them all together. I'm not sure what I can do for you."

So there's one step left in our process of "converting" our "raw" string, consisting of comma-separated numbers, into a list of numbers. What we have is a list of strings; what we want is a list of numbers. Fortunately, we know how to write an expression to transform one list into another list, applying an expression to each member of the list along the way—it's called a list comprehension. Equally fortunately, we know how to write an expression that converts a string representing an integer into an actual integer (int()). Here's how to write that expression:

In [136]:
[int(x) for x in str_list]
Out[136]:
[17,
 25,
 26,
 25,
 35,
 18,
 25,
 33,
 39,
 30,
 13,
 21,
 22,
 35,
 28,
 27,
 26,
 23,
 21,
 21,
 24,
 17,
 25,
 30,
 24,
 18,
 38,
 19,
 33,
 26,
 26,
 15,
 30,
 32,
 32,
 36,
 25,
 21,
 34,
 30,
 29,
 27,
 18,
 34,
 30,
 24,
 31,
 13,
 37,
 36,
 42,
 33,
 31,
 20,
 61,
 22,
 19,
 17,
 23,
 19,
 21,
 24,
 43,
 15,
 25,
 32,
 38,
 17,
 13,
 32,
 17,
 34,
 38,
 29,
 37,
 36,
 27]

Let's double-check that the values in this list are, in fact, integers, by spot-checking the first item in the list:

In [137]:
type([int(x) for x in str_list][0])
Out[137]:
int

Hey, voila! Now we'll assign that list to a variable, for the sake of convenience, and then check to see if sum() works how we expect it to.

In [139]:
int_list = [int(x) for x in str_list]
sum(int_list)
Out[139]:
2089

Wow! 2089 points in one season! Good work, King James.
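About that max(str_list) result from earlier, the one that was right only by accident: Python compares strings character by character, not by numeric value. Every score in LeBron's list has exactly two digits, so the two orderings happen to agree. The sketch below uses a made-up list of scores with different numbers of digits to show how it can go wrong:

```python
# Hypothetical scores, chosen to have different numbers of digits.
as_strings = ["9", "61", "100"]

# Compared as strings, "9" wins: its first character, '9', comes
# after '6' and after '1'.
biggest_string = max(as_strings)

# Converted to integers, the comparison is numeric.
as_ints = [int(s) for s in as_strings]
biggest_int = max(as_ints)

print(biggest_string, biggest_int)
```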

Join: Making strings from lists

Once we've created a list of words, it's a common task to want to take that list and "glue" it back together, so it's a single string again, instead of a list. So, for example:

In [105]:
element_list = ["hydrogen", "helium", "lithium", "beryllium", "boron"]
glue = ", and "
glue.join(element_list)
Out[105]:
'hydrogen, and helium, and lithium, and beryllium, and boron'

The .join() method needs a "glue" string to the left of it---this is the string that will be placed in between the list elements. In the parentheses to the right, you need to put an expression that evaluates to a list. Very frequently with .join(), programmers don't bother to assign the "glue" string to a variable first, so you end up with code that looks like this:

In [106]:
words = ["this", "is", "a", "test"]
" ".join(words)
Out[106]:
'this is a test'
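One thing to watch for: .join() only works when every item in the list is a string. If you have a list of numbers, you can convert each one with str() (the opposite of int(), and not otherwise covered in this tutorial) in a list comprehension first. A minimal sketch, with made-up variable names:

```python
scores = [17, 25, 26]

# Convert each integer to a string so .join() can glue them together.
as_text = [str(n) for n in scores]
line = ", ".join(as_text)

# An empty glue string joins the items with nothing in between.
letters_in_order = "".join(sorted(list("importantly")))

print(line)
print(letters_in_order)
```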

When we're working with .split() and .join(), our workflow usually looks something like this:

  1. Split a string to get a list of units (usually words).
  2. Use some of the list operations discussed above to modify or slice the list.
  3. Join that list back together into a string.
  4. Do something with that string (e.g., print it out).

With this in mind, here's a program that splits a string into words, randomizes the order of the words, then prints out the results:

In [107]:
import random  # needed for random.shuffle() below

text = "it was a dark and stormy night"
words = text.split()
random.shuffle(words)
' '.join(words)
Out[107]:
'a it was night dark stormy and'

Lists and randomness

Python's random library provides several helpful functions for performing chance operations on lists. The first is shuffle, which takes a list and randomly shuffles its contents:

In [140]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.shuffle(ingredients)
ingredients
Out[140]:
['eggs', 'sugar', 'flour', 'milk']

The second is choice, which returns a single random element from the list.

In [141]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.choice(ingredients)
Out[141]:
'milk'

Finally, the sample function returns a list of values, selected at random, from a list. The sample function takes two parameters: the first is a list, and the second is how many items should be in the resulting list of randomly selected values:

In [142]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.sample(ingredients, 2)
Out[142]:
['flour', 'milk']
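There's a difference between these functions that's easy to miss: shuffle changes the list you give it and doesn't evaluate to anything useful (you get None back), while sample leaves the original list alone. So if you want a shuffled copy and want to keep the original order around, one option is to ask sample for as many items as the list has. A sketch, with hypothetical variable names:

```python
import random

ingredients = ["flour", "milk", "eggs", "sugar"]

# shuffle() rearranges the list in place and returns None.
result = random.shuffle(ingredients)

original = ["flour", "milk", "eggs", "sugar"]

# sample() builds a new list; asking for len(original) items gives
# a shuffled copy and leaves the original untouched.
shuffled_copy = random.sample(original, len(original))

print(result)
print(original)
print(shuffled_copy)
```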

Text files and lists of lines

The open() function allows you to read text from a file. When used as the source in a list comprehension, the predicate expression will be evaluated for each line of text in the file. For example:

In [143]:
[line for line in open("sea_rose.txt")]
Out[143]:
['Rose, harsh rose, \n',
 'marred and with stint of petals, \n',
 'meagre flower, thin, \n',
 'spare of leaf,\n',
 '\n',
 'more precious \n',
 'than a wet rose \n',
 'single on a stem -- \n',
 'you are caught in the drift.\n',
 '\n',
 'Stunted, with small leaf, \n',
 'you are flung on the sand, \n',
 'you are lifted \n',
 'in the crisp sand \n',
 'that drives in the wind.\n',
 '\n',
 'Can the spice-rose \n',
 'drip such acrid fragrance \n',
 'hardened in a leaf?\n']

What we have here is a list of strings. Each element in the list corresponds to a single line from the text file.

Notice that you see the \n character at the end of each string; this is Python letting you know that there's a character in the file that indicates where the linebreaks should occur. By default, Python doesn't strip this character out. To make our data a little bit cleaner, let's use the .strip() method in the predicate expression to get rid of the newline character (and any other accompanying whitespace characters):

In [194]:
[line.strip() for line in open("sea_rose.txt")]
Out[194]:
['Rose, harsh rose,',
 'marred and with stint of petals,',
 'meagre flower, thin,',
 'spare of leaf,',
 '',
 'more precious',
 'than a wet rose',
 'single on a stem --',
 'you are caught in the drift.',
 '',
 'Stunted, with small leaf,',
 'you are flung on the sand,',
 'you are lifted',
 'in the crisp sand',
 'that drives in the wind.',
 '',
 'Can the spice-rose',
 'drip such acrid fragrance',
 'hardened in a leaf?']
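Those empty strings are the blank lines between stanzas. If you'd rather leave them out, a membership expression can do it. In the sketch below, raw_lines is a stand-in for a few lines read from the file (so the example runs without sea_rose.txt), and the other names are made up:

```python
# A few lines as they would come out of the file, newlines and all.
raw_lines = ['spare of leaf,\n', '\n', 'more precious \n', 'than a wet rose \n']

# Stripping alone leaves an empty string where the blank line was.
stripped = [line.strip() for line in raw_lines]

# Adding a membership expression keeps only the lines that still
# have something in them after stripping.
non_blank = [line.strip() for line in raw_lines if line.strip() != ""]

print(stripped)
print(non_blank)
```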

Assigning this to a variable, we can do things like get the lines of the poem from index 3 up to (but not including) index 9:

In [195]:
poem = [line.strip() for line in open("sea_rose.txt")]
In [196]:
poem[3:9]
Out[196]:
['spare of leaf,',
 '',
 'more precious',
 'than a wet rose',
 'single on a stem --',
 'you are caught in the drift.']

Or we can get a few lines at random:

In [197]:
random.sample(poem, 3)
Out[197]:
['more precious', 'you are caught in the drift.', 'Can the spice-rose']

Or sort the poem in alphabetical order:

In [198]:
sorted(poem)
Out[198]:
['',
 '',
 '',
 'Can the spice-rose',
 'Rose, harsh rose,',
 'Stunted, with small leaf,',
 'drip such acrid fragrance',
 'hardened in a leaf?',
 'in the crisp sand',
 'marred and with stint of petals,',
 'meagre flower, thin,',
 'more precious',
 'single on a stem --',
 'spare of leaf,',
 'than a wet rose',
 'that drives in the wind.',
 'you are caught in the drift.',
 'you are flung on the sand,',
 'you are lifted']

The resulting list variable can itself be used as the source expression in other list comprehensions:

In [148]:
[line[:5] for line in poem]
Out[148]:
['Rose,',
 'marre',
 'meagr',
 'spare',
 '',
 'more ',
 'than ',
 'singl',
 'you a',
 '',
 'Stunt',
 'you a',
 'you a',
 'in th',
 'that ',
 '',
 'Can t',
 'drip ',
 'harde']
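The predicate expression can combine several of the things covered so far. As a sketch, here's one way to get the first and last word of each line using .split() and an index. The poem list below is just the first few lines typed in by hand, so the example runs on its own. The membership expression is there because an empty line splits into an empty list, and asking for item 0 of an empty list is an IndexError.

```python
# The first six lines of the poem, standing in for the full list.
poem = ['Rose, harsh rose,',
        'marred and with stint of petals,',
        'meagre flower, thin,',
        'spare of leaf,',
        '',
        'more precious']

# Split each non-empty line into words and take the first word.
first_words = [line.split()[0] for line in poem if line != ""]

# Same idea, using a negative index to take the last word.
last_words = [line.split()[-1] for line in poem if line != ""]

print(first_words)
print(last_words)
```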

Transforming lines of text

Wait, how did I do that thing with that poem, where the letters are all weird? It's like this weird pomo l=a=n=g=u=a=g=e poetry now. Completely unpublishable, I'll get kicked right out of the Iowa Writers' Workshop. Very cool.

It turns out you can make changes to the predicate expression in order to make changes to the way the text looks in the output. We're modifying the text by transforming the strings. There are a handful of really easy things you can do to strings of characters in Python to make the text do weird and interesting things. I'm going to show you a few.

First, I'm going to make a variable called poem and assign to it the result of reading in that Robert Frost poem. The road one where he decides to take a road, but not another road, and it is very momentous.

In [201]:
poem = [line.strip() for line in open("frost.txt")]

What we want to do now is write a list comprehension that transforms the lines in the poem somehow. The key to this is to change the predicate expression in a list comprehension. The simplest possible text transformation is nothing at all: just make a new list that looks like the old list in every way.

In [155]:
[line for line in poem]
Out[155]:
['Two roads diverged in a yellow wood,',
 'And sorry I could not travel both',
 'And be one traveler, long I stood',
 'And looked down one as far as I could',
 'To where it bent in the undergrowth;',
 '',
 'Then took the other, as just as fair,',
 'And having perhaps the better claim,',
 'Because it was grassy and wanted wear;',
 'Though as for that the passing there',
 'Had worn them really about the same,',
 '',
 'And both that morning equally lay',
 'In leaves no step had trodden black.',
 'Oh, I kept the first for another day!',
 'Yet knowing how way leads on to way,',
 'I doubted if I should ever come back.',
 '',
 'I shall be telling this with a sigh',
 'Somewhere ages and ages hence:',
 'Two roads diverged in a wood, and I—',
 'I took the one less travelled by,',
 'And that has made all the difference.']

That list comprehension basically translates to the following: "Hey python, hey yes you, python! Take a look at the list of strings in the variable poem that I defined earlier. I want you to make a new list, and here's how that new list should look: for every item of that list—let's call the item line—put an item with whatever that line is into the new list." Python: "So, uh, make a copy of the list?" You: "Yeah I guess basically."

Another simple transformation is to not do anything with the data in the line at all, and have Python put another string altogether into the new list:

In [156]:
["I'm Robert Frost, howdy howdy howdy" for line in poem]
Out[156]:
["I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy",
 "I'm Robert Frost, howdy howdy howdy"]

Neither of these is interesting from anything other than a theoretical perspective, which means if you're a humanities scholar or something you can just stop here and start typing up your monograph on conceptions of identity and iteration in algorithmic media. But as for me and my students, we are ARTISTS and ENGINEERS and it's important that we CHANGE THE WORLD and SHOW RESULTS.

String methods in the predicate expression

Recall from the tutorial on strings that string expressions in Python have a number of methods that can be called on them that return a copy of the string with some transformation applied, like .lower() (converts the string to lower case) or .replace() (replaces matching substrings with some other string). We can use these in the predicate expression to effect that transformation on every item in the list. To make every string upper case, for example, call the .upper() method on the temporary variable line. This makes Frost look really mad:

In [166]:
[line.upper() for line in poem]
Out[166]:
['TWO ROADS DIVERGED IN A YELLOW WOOD,',
 'AND SORRY I COULD NOT TRAVEL BOTH',
 'AND BE ONE TRAVELER, LONG I STOOD',
 'AND LOOKED DOWN ONE AS FAR AS I COULD',
 'TO WHERE IT BENT IN THE UNDERGROWTH;',
 '',
 'THEN TOOK THE OTHER, AS JUST AS FAIR,',
 'AND HAVING PERHAPS THE BETTER CLAIM,',
 'BECAUSE IT WAS GRASSY AND WANTED WEAR;',
 'THOUGH AS FOR THAT THE PASSING THERE',
 'HAD WORN THEM REALLY ABOUT THE SAME,',
 '',
 'AND BOTH THAT MORNING EQUALLY LAY',
 'IN LEAVES NO STEP HAD TRODDEN BLACK.',
 'OH, I KEPT THE FIRST FOR ANOTHER DAY!',
 'YET KNOWING HOW WAY LEADS ON TO WAY,',
 'I DOUBTED IF I SHOULD EVER COME BACK.',
 '',
 'I SHALL BE TELLING THIS WITH A SIGH',
 'SOMEWHERE AGES AND AGES HENCE:',
 'TWO ROADS DIVERGED IN A WOOD, AND I—',
 'I TOOK THE ONE LESS TRAVELLED BY,',
 'AND THAT HAS MADE ALL THE DIFFERENCE.']

And .replace() is a fun one. You need to put two comma-separated strings between the parentheses. Python will replace every occurrence of the first string with the second string. To make our poem more colloquial, for example:

In [161]:
[line.replace("I", "my dude") for line in poem]
Out[161]:
['Two roads diverged in a yellow wood,',
 'And sorry my dude could not travel both',
 'And be one traveler, long my dude stood',
 'And looked down one as far as my dude could',
 'To where it bent in the undergrowth;',
 '',
 'Then took the other, as just as fair,',
 'And having perhaps the better claim,',
 'Because it was grassy and wanted wear;',
 'Though as for that the passing there',
 'Had worn them really about the same,',
 '',
 'And both that morning equally lay',
 'my duden leaves no step had trodden black.',
 'Oh, my dude kept the first for another day!',
 'Yet knowing how way leads on to way,',
 'my dude doubted if my dude should ever come back.',
 '',
 'my dude shall be telling this with a sigh',
 'Somewhere ages and ages hence:',
 'Two roads diverged in a wood, and my dude—',
 'my dude took the one less travelled by,',
 'And that has made all the difference.']
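
Look closely at the line that now begins my duden leaves. That happens because .replace() matches the substring wherever it occurs, including inside longer words like In, and because it distinguishes upper case from lower case. Here's a small sketch of that behavior on a single line (the variable names are just for illustration):

```python
# .replace() matches substrings anywhere, not just whole words,
# and it distinguishes upper case from lower case.
line = "In leaves no step had trodden black."

replaced_upper = line.replace("I", "my dude")  # the "I" inside "In" matches
replaced_lower = line.replace("i", "my dude")  # this line has no lower-case "i"

print(replaced_upper)
print(replaced_lower)
```

The second result is identical to the original line, since there was nothing to replace.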

If you have ever wondered, "What would this roady poem sound like if you tickled Robert Frost while he read it aloud," then you're in luck because Python has answered that question.

In [170]:
[line.replace("a", "aheHEEhaHAha") for line in poem]
Out[170]:
['Two roaheHEEhaHAhads diverged in aheHEEhaHAha yellow wood,',
 'And sorry I could not traheHEEhaHAhavel both',
 'And be one traheHEEhaHAhaveler, long I stood',
 'And looked down one aheHEEhaHAhas faheHEEhaHAhar aheHEEhaHAhas I could',
 'To where it bent in the undergrowth;',
 '',
 'Then took the other, aheHEEhaHAhas just aheHEEhaHAhas faheHEEhaHAhair,',
 'And haheHEEhaHAhaving perhaheHEEhaHAhaps the better claheHEEhaHAhaim,',
 'BecaheHEEhaHAhause it waheHEEhaHAhas graheHEEhaHAhassy aheHEEhaHAhand waheHEEhaHAhanted weaheHEEhaHAhar;',
 'Though aheHEEhaHAhas for thaheHEEhaHAhat the paheHEEhaHAhassing there',
 'HaheHEEhaHAhad worn them reaheHEEhaHAhally aheHEEhaHAhabout the saheHEEhaHAhame,',
 '',
 'And both thaheHEEhaHAhat morning equaheHEEhaHAhally laheHEEhaHAhay',
 'In leaheHEEhaHAhaves no step haheHEEhaHAhad trodden blaheHEEhaHAhack.',
 'Oh, I kept the first for aheHEEhaHAhanother daheHEEhaHAhay!',
 'Yet knowing how waheHEEhaHAhay leaheHEEhaHAhads on to waheHEEhaHAhay,',
 'I doubted if I should ever come baheHEEhaHAhack.',
 '',
 'I shaheHEEhaHAhall be telling this with aheHEEhaHAha sigh',
 'Somewhere aheHEEhaHAhages aheHEEhaHAhand aheHEEhaHAhages hence:',
 'Two roaheHEEhaHAhads diverged in aheHEEhaHAha wood, aheHEEhaHAhand I—',
 'I took the one less traheHEEhaHAhavelled by,',
 'And thaheHEEhaHAhat haheHEEhaHAhas maheHEEhaHAhade aheHEEhaHAhall the difference.']

The .strip() method is helpful! We used it above to strip off whitespace, but if you give it a string as a parameter (inside the parentheses), it will remove all of the characters inside that string from the beginning and end of every line. This is a convenient way to, e.g., remove punctuation from the ends of lines:

In [172]:
[line.strip(",;.!:—") for line in poem]
Out[172]:
['Two roads diverged in a yellow wood',
 'And sorry I could not travel both',
 'And be one traveler, long I stood',
 'And looked down one as far as I could',
 'To where it bent in the undergrowth',
 '',
 'Then took the other, as just as fair',
 'And having perhaps the better claim',
 'Because it was grassy and wanted wear',
 'Though as for that the passing there',
 'Had worn them really about the same',
 '',
 'And both that morning equally lay',
 'In leaves no step had trodden black',
 'Oh, I kept the first for another day',
 'Yet knowing how way leads on to way',
 'I doubted if I should ever come back',
 '',
 'I shall be telling this with a sigh',
 'Somewhere ages and ages hence',
 'Two roads diverged in a wood, and I',
 'I took the one less travelled by',
 'And that has made all the difference']
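
Notice that the commas in the middle of lines survived. As far as I can tell from the output above, .strip() treats its parameter as a set of individual characters and only trims them from the two ends of the string. A minimal sketch, using a made-up line with some extra punctuation tacked on the end:

```python
# .strip(chars) removes any of the given characters from both ends
# of the string, and leaves the middle of the string alone.
line = "And be one traveler, long I stood;!"

trimmed = line.strip(",;.!:—")

print(trimmed)  # the comma after "traveler" is in the middle, so it stays
```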

You can use the + operator to build up strings from each line as well:

In [174]:
["☛ " + line + " ☚" for line in poem]
Out[174]:
['☛ Two roads diverged in a yellow wood, ☚',
 '☛ And sorry I could not travel both ☚',
 '☛ And be one traveler, long I stood ☚',
 '☛ And looked down one as far as I could ☚',
 '☛ To where it bent in the undergrowth; ☚',
 '☛  ☚',
 '☛ Then took the other, as just as fair, ☚',
 '☛ And having perhaps the better claim, ☚',
 '☛ Because it was grassy and wanted wear; ☚',
 '☛ Though as for that the passing there ☚',
 '☛ Had worn them really about the same, ☚',
 '☛  ☚',
 '☛ And both that morning equally lay ☚',
 '☛ In leaves no step had trodden black. ☚',
 '☛ Oh, I kept the first for another day! ☚',
 '☛ Yet knowing how way leads on to way, ☚',
 '☛ I doubted if I should ever come back. ☚',
 '☛  ☚',
 '☛ I shall be telling this with a sigh ☚',
 '☛ Somewhere ages and ages hence: ☚',
 '☛ Two roads diverged in a wood, and I— ☚',
 '☛ I took the one less travelled by, ☚',
 '☛ And that has made all the difference. ☚']

Using string slices, we can create some abstract poetry from parts of each line. Here we smoosh the first five characters of each line up against the last five characters:

In [191]:
[line[:5] + line[-5:] for line in poem]
Out[191]:
['Two rwood,',
 'And s both',
 'And bstood',
 'And lcould',
 'To whowth;',
 '',
 'Then fair,',
 'And hlaim,',
 'Becauwear;',
 'Thougthere',
 'Had wsame,',
 '',
 'And by lay',
 'In lelack.',
 'Oh, I day!',
 'Yet k way,',
 'I douback.',
 '',
 'I sha sigh',
 'Somewence:',
 'Two rnd I—',
 'I tood by,',
 'And tence.']
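
The empty strings in that output hint at a handy property of slices: they don't raise an error when the string is shorter than the slice asks for. Here's a quick sketch with a few short, made-up lines to show what the same expression does in that case:

```python
# Slices quietly return whatever is available when a string is short,
# so the first-five and last-five slices can overlap or come back empty.
short_lines = ["", "Oh", "Two roads"]

smooshed_short = [line[:5] + line[-5:] for line in short_lines]

print(smooshed_short)
```

For strings shorter than ten characters, some characters show up twice, because the two slices overlap.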

You may discover a desire deep inside of you to use more than one of these transformations in the predicate expression. "Impossible," says a nearby moustachioed man, monocle popping from his orbital socket. But it can be done! In two ways. First, you can perform the transformation by assigning the result of one list comprehension to a variable, and then using that result in a second list comprehension. For example, to turn this poem into a telegram, we'll first convert it to upper case:

In [176]:
upper_frost = [line.upper() for line in poem]

And then we'll get rid of punctuation at the end of the line:

In [177]:
upper_frost_no_punct = [line.strip(",;.!:—") for line in upper_frost]

And then append the string STOP to the end of each line:

In [180]:
[line + " STOP" for line in upper_frost_no_punct]
Out[180]:
['TWO ROADS DIVERGED IN A YELLOW WOOD STOP',
 'AND SORRY I COULD NOT TRAVEL BOTH STOP',
 'AND BE ONE TRAVELER, LONG I STOOD STOP',
 'AND LOOKED DOWN ONE AS FAR AS I COULD STOP',
 'TO WHERE IT BENT IN THE UNDERGROWTH STOP',
 ' STOP',
 'THEN TOOK THE OTHER, AS JUST AS FAIR STOP',
 'AND HAVING PERHAPS THE BETTER CLAIM STOP',
 'BECAUSE IT WAS GRASSY AND WANTED WEAR STOP',
 'THOUGH AS FOR THAT THE PASSING THERE STOP',
 'HAD WORN THEM REALLY ABOUT THE SAME STOP',
 ' STOP',
 'AND BOTH THAT MORNING EQUALLY LAY STOP',
 'IN LEAVES NO STEP HAD TRODDEN BLACK STOP',
 'OH, I KEPT THE FIRST FOR ANOTHER DAY STOP',
 'YET KNOWING HOW WAY LEADS ON TO WAY STOP',
 'I DOUBTED IF I SHOULD EVER COME BACK STOP',
 ' STOP',
 'I SHALL BE TELLING THIS WITH A SIGH STOP',
 'SOMEWHERE AGES AND AGES HENCE STOP',
 'TWO ROADS DIVERGED IN A WOOD, AND I STOP',
 'I TOOK THE ONE LESS TRAVELLED BY STOP',
 'AND THAT HAS MADE ALL THE DIFFERENCE STOP']

Not bad, but sort of inconvenient! You can actually write that whole thing using one expression. Any of those weird methods (.lower(), .upper(), etc.) mentioned above can be chained: you can attach them not just to line but to any other expression you made with line. Likewise, the + operator can be used with line but also any expression that results from performing a transformation on line. For example, you can rewrite the three list comprehensions above using one list comprehension with chained operators:

In [182]:
[line.upper().strip(",;.!:—") + " STOP" for line in poem]
Out[182]:
['TWO ROADS DIVERGED IN A YELLOW WOOD STOP',
 'AND SORRY I COULD NOT TRAVEL BOTH STOP',
 'AND BE ONE TRAVELER, LONG I STOOD STOP',
 'AND LOOKED DOWN ONE AS FAR AS I COULD STOP',
 'TO WHERE IT BENT IN THE UNDERGROWTH STOP',
 ' STOP',
 'THEN TOOK THE OTHER, AS JUST AS FAIR STOP',
 'AND HAVING PERHAPS THE BETTER CLAIM STOP',
 'BECAUSE IT WAS GRASSY AND WANTED WEAR STOP',
 'THOUGH AS FOR THAT THE PASSING THERE STOP',
 'HAD WORN THEM REALLY ABOUT THE SAME STOP',
 ' STOP',
 'AND BOTH THAT MORNING EQUALLY LAY STOP',
 'IN LEAVES NO STEP HAD TRODDEN BLACK STOP',
 'OH, I KEPT THE FIRST FOR ANOTHER DAY STOP',
 'YET KNOWING HOW WAY LEADS ON TO WAY STOP',
 'I DOUBTED IF I SHOULD EVER COME BACK STOP',
 ' STOP',
 'I SHALL BE TELLING THIS WITH A SIGH STOP',
 'SOMEWHERE AGES AND AGES HENCE STOP',
 'TWO ROADS DIVERGED IN A WOOD, AND I STOP',
 'I TOOK THE ONE LESS TRAVELLED BY STOP',
 'AND THAT HAS MADE ALL THE DIFFERENCE STOP']

This is especially useful for multiple replacements. Here's the Swedish Chef version:

In [186]:
[line.replace("i", "ö").replace("o", "ö").replace("a", "ö").replace("e", "ur") for line in poem]
Out[186]:
['Twö rööds dövurrgurd ön ö yurllöw wööd,',
 'And sörry I cöuld nöt trövurl böth',
 'And bur önur trövurlurr, löng I stööd',
 'And löökurd döwn önur ös för ös I cöuld',
 'Tö whurrur öt burnt ön thur undurrgröwth;',
 '',
 'Thurn töök thur öthurr, ös just ös föör,',
 'And hövöng purrhöps thur burtturr clööm,',
 'Burcöusur öt wös grössy önd wönturd wurör;',
 'Thöugh ös för thöt thur pössöng thurrur',
 'Höd wörn thurm rurölly öböut thur sömur,',
 '',
 'And böth thöt mörnöng urquölly löy',
 'In lurövurs nö sturp höd tröddurn blöck.',
 'Oh, I kurpt thur först för önöthurr döy!',
 'Yurt knöwöng höw wöy luröds ön tö wöy,',
 'I döubturd öf I shöuld urvurr cömur böck.',
 '',
 'I shöll bur turllöng thös wöth ö sögh',
 'Sömurwhurrur ögurs önd ögurs hurncur:',
 'Twö rööds dövurrgurd ön ö wööd, önd I—',
 'I töök thur önur lurss trövurllurd by,',
 'And thöt hös mödur öll thur döffurrurncur.']
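
One thing to keep in mind with chained replacements: each .replace() works on the result of the one before it, so the order can matter. Here's a small sketch with a single made-up word, where the second replacement also catches letters that the first one introduced:

```python
word = "later"

# "a" becomes "e" first, and then every "e" (old and new) becomes "ur".
a_then_e = word.replace("a", "e").replace("e", "ur")

# Reversing the order of the two calls gives a different result.
e_then_a = word.replace("e", "ur").replace("a", "e")

print(a_then_e)
print(e_then_a)
```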

Filtering lines

Using the membership expression of a list comprehension, we can make a list of only the lines from the text file that match particular criteria. Any of the various expressions that answer questions about strings can be used in this spot. For example, to find all of the lines of a particular length:

In [205]:
[line for line in poem if len(line) == 33]
Out[205]:
['And sorry I could not travel both',
 'And be one traveler, long I stood',
 'And both that morning equally lay',
 'I took the one less travelled by,']

Lines that have the string travel:

In [206]:
[line for line in poem if "travel" in line]
Out[206]:
['And sorry I could not travel both',
 'And be one traveler, long I stood',
 'I took the one less travelled by,']

Lines that start with And:

In [213]:
[line for line in poem if line.startswith("And")]
Out[213]:
['And sorry I could not travel both',
 'And be one traveler, long I stood',
 'And looked down one as far as I could',
 'And having perhaps the better claim,',
 'And both that morning equally lay',
 'And that has made all the difference.']
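
Filtering and transforming can happen in the same list comprehension, and the membership expression can combine more than one test. The sketch below assumes the and operator for joining two conditions, and uses a short hand-typed list called poem_sample so that it runs on its own:

```python
poem_sample = [
    "Two roads diverged in a yellow wood,",
    "And sorry I could not travel both",
    "And be one traveler, long I stood",
    "",
    "I took the one less travelled by,",
]

# Keep only the lines that start with "And" and also contain "travel",
# then convert the surviving lines to upper case.
shouted = [line.upper() for line in poem_sample
           if line.startswith("And") and "travel" in line]

print(shouted)
```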

Text files and lists of words

Lines are an interesting unit to work with, especially with poetic source texts, as they give us an easy handle on large (but not too large) syntactic units. A more traditional unit of analysis is the word. Fortunately for us, getting words from a text file is (relatively) easy.

Calling open(filename).read() will read the file filename into a Python string. We can then use the .split() method to split that string into a list of words. Without any parameters, the .split() method just breaks the string up into units delimited by any kind of whitespace (whether that's a space character, a tab, a newline, etc.). So, for example, to get all of the words from our Frost poem:

In [214]:
frost_txt = open("frost.txt").read() # evaluates to a string
In [220]:
words = frost_txt.split() # split evaluates to a list of strings
In [218]:
words
Out[218]:
['Two',
 'roads',
 'diverged',
 'in',
 'a',
 'yellow',
 'wood,',
 'And',
 'sorry',
 'I',
 'could',
 'not',
 'travel',
 'both',
 'And',
 'be',
 'one',
 'traveler,',
 'long',
 'I',
 'stood',
 'And',
 'looked',
 'down',
 'one',
 'as',
 'far',
 'as',
 'I',
 'could',
 'To',
 'where',
 'it',
 'bent',
 'in',
 'the',
 'undergrowth;',
 'Then',
 'took',
 'the',
 'other,',
 'as',
 'just',
 'as',
 'fair,',
 'And',
 'having',
 'perhaps',
 'the',
 'better',
 'claim,',
 'Because',
 'it',
 'was',
 'grassy',
 'and',
 'wanted',
 'wear;',
 'Though',
 'as',
 'for',
 'that',
 'the',
 'passing',
 'there',
 'Had',
 'worn',
 'them',
 'really',
 'about',
 'the',
 'same,',
 'And',
 'both',
 'that',
 'morning',
 'equally',
 'lay',
 'In',
 'leaves',
 'no',
 'step',
 'had',
 'trodden',
 'black.',
 'Oh,',
 'I',
 'kept',
 'the',
 'first',
 'for',
 'another',
 'day!',
 'Yet',
 'knowing',
 'how',
 'way',
 'leads',
 'on',
 'to',
 'way,',
 'I',
 'doubted',
 'if',
 'I',
 'should',
 'ever',
 'come',
 'back.',
 'I',
 'shall',
 'be',
 'telling',
 'this',
 'with',
 'a',
 'sigh',
 'Somewhere',
 'ages',
 'and',
 'ages',
 'hence:',
 'Two',
 'roads',
 'diverged',
 'in',
 'a',
 'wood,',
 'and',
 'I—',
 'I',
 'took',
 'the',
 'one',
 'less',
 'travelled',
 'by,',
 'And',
 'that',
 'has',
 'made',
 'all',
 'the',
 'difference.']

Or, more succinctly:

In [221]:
words = open("frost.txt").read().split()
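
If you'd like Python to close the file for you once the words have been read, one common alternative is a with block. This is only a sketch: it writes a tiny stand-in file (sample.txt, a made-up name) to a temporary folder so that it runs on its own, and the variable names are arbitrary.

```python
import os
import tempfile

# Write a tiny stand-in file so this sketch runs on its own;
# in the notebook you would open "frost.txt" instead.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as handle:
    handle.write("Two roads diverged\nin a yellow wood,\n")

# The with block closes the file as soon as the indented code finishes.
with open(path) as handle:
    sample_words = handle.read().split()

print(sample_words)
```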

Now we can ask simple questions about this poem, like how many words does it have?

In [223]:
len(words)
Out[223]:
144

We can create a new weird poem by randomly sampling words from the original:

In [226]:
random.sample(words, 20)
Out[226]:
['In',
 'to',
 'it',
 'all',
 'where',
 'Yet',
 'wood,',
 'day!',
 'I',
 'morning',
 'To',
 'one',
 'far',
 'the',
 'one',
 'sigh',
 'it',
 'has',
 'I',
 'with']

Or sort the words in alphabetical order:

In [228]:
sorted(words)
Out[228]:
['And',
 'And',
 'And',
 'And',
 'And',
 'And',
 'Because',
 'Had',
 'I',
 'I',
 'I',
 'I',
 'I',
 'I',
 'I',
 'I',
 'In',
 'I—',
 'Oh,',
 'Somewhere',
 'Then',
 'Though',
 'To',
 'Two',
 'Two',
 'Yet',
 'a',
 'a',
 'a',
 'about',
 'ages',
 'ages',
 'all',
 'and',
 'and',
 'and',
 'another',
 'as',
 'as',
 'as',
 'as',
 'as',
 'back.',
 'be',
 'be',
 'bent',
 'better',
 'black.',
 'both',
 'both',
 'by,',
 'claim,',
 'come',
 'could',
 'could',
 'day!',
 'difference.',
 'diverged',
 'diverged',
 'doubted',
 'down',
 'equally',
 'ever',
 'fair,',
 'far',
 'first',
 'for',
 'for',
 'grassy',
 'had',
 'has',
 'having',
 'hence:',
 'how',
 'if',
 'in',
 'in',
 'in',
 'it',
 'it',
 'just',
 'kept',
 'knowing',
 'lay',
 'leads',
 'leaves',
 'less',
 'long',
 'looked',
 'made',
 'morning',
 'no',
 'not',
 'on',
 'one',
 'one',
 'one',
 'other,',
 'passing',
 'perhaps',
 'really',
 'roads',
 'roads',
 'same,',
 'shall',
 'should',
 'sigh',
 'sorry',
 'step',
 'stood',
 'telling',
 'that',
 'that',
 'that',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'them',
 'there',
 'this',
 'to',
 'took',
 'took',
 'travel',
 'traveler,',
 'travelled',
 'trodden',
 'undergrowth;',
 'wanted',
 'was',
 'way',
 'way,',
 'wear;',
 'where',
 'with',
 'wood,',
 'wood,',
 'worn',
 'yellow']
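
You might have noticed that every capitalized word landed ahead of every lower-case word. If that's not what you want, sorted() accepts an optional key parameter: a function that it applies to each item before comparing. Here's a sketch using str.lower as the key, with a short made-up list of words:

```python
sample_words = ["Two", "roads", "And", "a", "yellow", "I"]

# By default, upper-case letters sort before lower-case letters.
default_order = sorted(sample_words)

# With key=str.lower, items are compared by their lower-cased form,
# but the resulting list still contains the original strings.
caseless_order = sorted(sample_words, key=str.lower)

print(default_order)
print(caseless_order)
```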

Using a list comprehension, we can get all of the words that meet particular criteria, like the words that have more than seven characters:

In [231]:
[item for item in words if len(item) > 7]
Out[231]:
['diverged',
 'traveler,',
 'undergrowth;',
 'Somewhere',
 'diverged',
 'travelled',
 'difference.']

Or all of the words that start with the letter a:

In [239]:
[item for item in words if item.startswith("a")]
Out[239]:
['a',
 'as',
 'as',
 'as',
 'as',
 'and',
 'as',
 'about',
 'another',
 'a',
 'ages',
 'and',
 'ages',
 'a',
 'and',
 'all']
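
Two details in the output above are worth a second look: the word And is missing from the list of a-words because .startswith() is case-sensitive, and punctuation like the comma in traveler, counts toward a word's length. Here's a sketch of one way to account for both, using a small made-up list of words:

```python
sample_words = ["And", "as", "claim,", "about", "wood,", "all", "black."]

# Lower-case each word before testing, so that "And" matches too.
a_words = [item for item in sample_words if item.lower().startswith("a")]

# Strip punctuation from the ends before measuring the length.
long_words = [item.strip(",;.!:—") for item in sample_words
              if len(item.strip(",;.!:—")) > 4]

print(a_words)
print(long_words)
```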

Formatting lists

You've doubtlessly noticed with the previous examples that whenever we evaluate a list, Jupyter Notebook displays the actual syntax you'd need to reproduce the list, i.e., with the square brackets, quotes, commas and everything. That's the default way Python shows values when you evaluate them, which is usually helpful, because it means you can just take the text in the cell output and paste it into another notebook and you've got exactly the same value, without needing to re-type it. However, when we're creating poetic output, the extra Python punctuation is undesirable.

To make the output prettier, we need to do the following steps:

  • Create a string from the list
  • Print the string (don't just evaluate it)

To create a string from the list, we use the .join() method, as outlined above! For example, here's Smooshed Frost from earlier in the notebook, assigned to a variable:

In [240]:
smooshed = [line[:5] + line[-5:] for line in poem]
In [241]:
type(smooshed)
Out[241]:
list

This is a list of strings. To create a version of this where all of the strings are joined together, we'll use .join(). In this case, let's say that we want each string in the list to be a single line of text in the output, so we'll use \n (the newline character) as the "glue."

In [242]:
"\n".join(smooshed)
Out[242]:
'Two rwood,\nAnd s both\nAnd bstood\nAnd lcould\nTo whowth;\n\nThen fair,\nAnd hlaim,\nBecauwear;\nThougthere\nHad wsame,\n\nAnd by lay\nIn lelack.\nOh, I day!\nYet k way,\nI douback.\n\nI sha sigh\nSomewence:\nTwo rnd I—\nI tood by,\nAnd tence.'

Ugh, that's still not right though! It's one string now, but when we evaluate the expression Python is showing us the escape characters instead of interpreting them. To get Python to interpret them, we have to send the whole thing to the print function, like so:

In [244]:
print("\n".join(smooshed))
Two rwood,
And s both
And bstood
And lcould
To whowth;

Then fair,
And hlaim,
Becauwear;
Thougthere
Had wsame,

And by lay
In lelack.
Oh, I day!
Yet k way,
I douback.

I sha sigh
Somewence:
Two rnd I—
I tood by,
And tence.

There we go! Ready to submit to our favorite poetry journal. Of course, you don't have to join with a newline character. Let's say you want to make a prose poem (i.e., no line breaks) by randomly sampling fifty words in the Frost poem. We'll use the space character as the glue:

In [247]:
print(" ".join(random.sample(words, 50)))
there knowing diverged another one And claim, roads where about black. undergrowth; In ever in both less ages Because as I wood, long ages for as leads and by, that has trodden sorry be took stood a just diverged And I day! I— hence: if how Two the I travel

Nice!
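
One caveat: .join() only knows how to glue strings together. If your list has other kinds of values in it (numbers, say), you'll need to convert them to strings first. A minimal sketch, with a made-up list of integers and the built-in str() function doing the conversion:

```python
syllable_counts = [9, 9, 9, 10, 9]

# " ".join(syllable_counts) would raise a TypeError, because the items
# are integers. Converting each one with str() first makes it work.
as_text = " ".join([str(count) for count in syllable_counts])

print(as_text)
```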

Making changes to lists

Often we'll want to make changes to a list after we've created it: for example, we might want to append elements to the list, remove elements from the list, or change the order of elements in the list. Python has a number of methods for facilitating these operations.

The first method we'll talk about is .append(), which adds an item on to the end of an existing list.

In [208]:
ingredients = ["flour", "milk", "eggs"]
ingredients.append("sugar")
ingredients
Out[208]:
['flour', 'milk', 'eggs', 'sugar']

Notice that invoking the .append() method doesn't itself evaluate to anything! (Technically, it evaluates to the special value None, whose type is NoneType.) Unlike many of the methods and syntactic constructions we've looked at so far, the .append() method changes the underlying value. It doesn't return a new value that is a copy with changes applied.
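
This leads to a common slip: assigning the result of .append() to a variable and expecting to get the longer list back. Here's a short sketch of what actually ends up in that variable (the name result is arbitrary):

```python
ingredients = ["flour", "milk", "eggs"]

# .append() changes the list in place and evaluates to None,
# so `result` does not hold the new list.
result = ingredients.append("sugar")

print(result)
print(ingredients)
```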

There are two methods to facilitate removing values from a list: .pop() and .remove(). The .remove() method removes from the list the first value that matches the value in the parentheses:

In [209]:
ingredients = ["flour", "milk", "eggs", "sugar"]
ingredients.remove("flour")
ingredients
Out[209]:
['milk', 'eggs', 'sugar']

(Note that .remove(), like .append(), doesn't evaluate to anything: it changes the list itself.)

The .pop() method works slightly differently: give it an expression that evaluates to an integer, and it evaluates to the item at the index named by that integer. But it also has a side effect: it removes that item from the list:

In [210]:
ingredients = ["flour", "milk", "eggs", "sugar"]
ingredients.pop(1)
ingredients
Out[210]:
['flour', 'eggs', 'sugar']
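
The cell above only shows the list after the fact. To see the value that .pop() itself evaluates to, you can assign it to a variable. A small sketch (the name removed is just for illustration):

```python
ingredients = ["flour", "milk", "eggs", "sugar"]

# .pop(1) evaluates to the item at index 1 and also removes it from the list.
removed = ingredients.pop(1)

print(removed)
print(ingredients)
```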

EXERCISE: What happens when you try to .pop() a value from a list at an index that doesn't exist in the list? What happens when you try to .remove() an item from a list if that item isn't in that list to begin with?

ANOTHER EXERCISE: Write an expression that .pop()s the second-to-last item from a list. (SPOILER: Did you guess that you could use negative indexing with .pop()?)

The .sort() and .reverse() methods do the same jobs as their function counterparts sorted() and reversed(). The difference is that the methods don't evaluate to anything, instead changing the list in place.

In [211]:
ingredients = ["flour", "milk", "eggs", "sugar"]
ingredients.sort()
ingredients
Out[211]:
['eggs', 'flour', 'milk', 'sugar']
In [212]:
ingredients = ["flour", "milk", "eggs", "sugar"]
ingredients.reverse()
ingredients
Out[212]:
['sugar', 'eggs', 'milk', 'flour']
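
To make the difference concrete, here's a sketch that runs the function version and the method version side by side on the same list. The variable names are arbitrary.

```python
ingredients = ["flour", "milk", "eggs", "sugar"]

# sorted() evaluates to a new list and leaves the original alone.
alphabetical = sorted(ingredients)
unchanged = (ingredients == ["flour", "milk", "eggs", "sugar"])
print(alphabetical)
print(unchanged)

# .sort() rearranges the original list and evaluates to None.
outcome = ingredients.sort()
print(outcome)
print(ingredients)
```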

EXERCISE: Write a Python command-line program that prints out the lines of a text file in random order.

Iterating over lists with for

The list comprehension syntax discussed earlier is very powerful: it allows you to succinctly transform one list into another list by thinking in terms of filtering and modification. But sometimes your primary goal isn't to make a new list, but simply to perform a set of operations on an existing list.

Let's say that you want to print every string in a list. Here's a short text:

In [94]:
text = "it was the best of times, it was the worst of times"

We can make a list of all the words in the text by splitting on whitespace:

In [95]:
words = text.split()

Of course, we can see what's in the list simply by evaluating the variable:

In [96]:
words
Out[96]:
['it',
 'was',
 'the',
 'best',
 'of',
 'times,',
 'it',
 'was',
 'the',
 'worst',
 'of',
 'times']

But let's say that we want to print out each word on a separate line, without any of Python's weird punctuation. In other words, we want the output to look like:

it
was
the
best
of
times,
it
was
the
worst
of
times

But how can this be accomplished? We know that the print() function can display an individual string in this manner:

In [97]:
print("hello")
hello

So what we need, clearly, is a way to call the print() function with every item of the list. We could do this by writing a series of print() statements, one for every item in the list:

In [98]:
print(words[0])
print(words[1])
print(words[2])
print(words[3])
print(words[4])
print(words[5])
print(words[6])
print(words[7])
print(words[8])
print(words[9])
print(words[10])
print(words[11])
it
was
the
best
of
times,
it
was
the
worst
of
times

Nice, but there are some problems with this approach:

  1. It's kind of verbose: we're doing exactly the same thing multiple times, only with slightly different expressions. Surely there's an easier way to tell the computer to do this?
  2. It doesn't scale. What if we wrote a program that needs to produce hundreds or thousands of lines? Would we really need to write a print statement for each of those expressions?
  3. It requires us to know how many items are going to end up in the list to begin with.

Things are looking grim! But there's hope. Performing the same operation on all items of a list is an extremely common task in computer programming. So common, in fact, that Python has some built-in syntax to make the task easy: the for loop.

Here's how a for loop looks:

for tempvar in sourcelist:
    statements

The words for and in just have to be there; that's how Python knows it's a for loop. Here's what each of those parts means.

  • tempvar: A name for a variable. Inside of the for loop, this variable will contain the current item of the list.
  • sourcelist: This can be any Python expression that evaluates to a list: a variable that contains a list, or a list slice, or even a list literal that you just type right in!
  • statements: One or more Python statements. Everything tabbed over underneath the for will be executed once for each item in the list. The statements tabbed over underneath the for line are called the body of the loop.

Here's what the for loop for printing out every item in a list might look like:

In [99]:
for item in words:
    print(item)
it
was
the
best
of
times,
it
was
the
worst
of
times

The variable name item is arbitrary. You can pick whatever variable name you like, as long as you're consistent about using the same variable name in the body of the loop. If you wrote out this loop in a long-hand fashion, it might look like this:

In [100]:
item = words[0]
print(item)
item = words[1]
print(item)
item = words[2]
print(item)
item = words[3]
print(item)
# etc.
it
was
the
best

Of course, the body of the loop can have more than one statement, and you can assign values to variables inside the loop:

In [101]:
for item in words:
    yelling = item.upper()
    print(yelling)
IT
WAS
THE
BEST
OF
TIMES,
IT
WAS
THE
WORST
OF
TIMES

You can also include other kinds of nested statements inside the for loop, like if/else:

In [102]:
for item in words:
    if len(item) == 2:
        print(item.upper())
    elif len(item) == 3:
        print("   " + item)
    else:
        print(item)
IT
   was
   the
best
OF
times,
IT
   was
   the
worst
OF
times

This structure is called a "loop" because when Python reaches the end of the statements in the body, it "loops" back to the beginning of the body, and executes the same statements again (this time with the next item in the list).

Python programmers tend to use for loops most often when the problem would otherwise be too tricky or complicated to solve using a list comprehension. It's easy to paraphrase any list comprehension in for loop syntax. For example, this list comprehension, which evaluates to a list of the squares of even integers from 1 to 25:

In [103]:
[x * x for x in range(1, 26) if x % 2 == 0]
Out[103]:
[4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576]

You can rewrite this list comprehension as a for loop by starting out with an empty list, then appending an item to the list inside the loop. The source list remains the same:

In [104]:
result = []
for x in range(1, 26):
    if x % 2 == 0:
        result.append(x * x)
result
Out[104]:
[4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576]
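
The same paraphrase works for the string transformations from earlier. As a sketch, here's the telegram version of the poem written both ways, using a short hand-typed list called poem_sample so that it runs on its own:

```python
poem_sample = [
    "Two roads diverged in a yellow wood,",
    "And sorry I could not travel both",
    "",
]

# The for loop version: start with an empty list, then append
# one transformed line each time through the loop.
telegram = []
for line in poem_sample:
    shouted = line.upper()
    trimmed = shouted.strip(",;.!:—")
    telegram.append(trimmed + " STOP")

# The list comprehension version of the same transformation.
as_comprehension = [line.upper().strip(",;.!:—") + " STOP" for line in poem_sample]

print(telegram)
print(telegram == as_comprehension)
```

The loop takes more lines, but it gives you room to name the intermediate steps.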

Conclusion

We've put down the foundation today for you to become fluent in Python's very powerful and super-convenient syntax for lists. We've also done a bit of data parsing and analysis! Pretty good for day one.

Further resources: