Python basics: Part 2

files needed = none

Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work --- we do not need to be Python Jedi.

Lists, tuples, and dicts are very powerful. We could spend weeks going through all the things we can do with them. Instead, we will cover some basics here, and add to our knowledge as needed.

In this notebook we will cover (the terms ordered and mutable will make sense by the time we are done here)

  1. Lists (ordered, mutable)
  2. Tuples (ordered, immutable)
  3. Dictionaries (unordered, mutable)
  4. More on types

Remember: Ask questions as we go.

Data structures

The types float, int, and str are scalar types. Think of them as the individual data types. An important, and very powerful, part of Python is its data structures, which collect scalar types (and other data structures) together. These data structures include list, tuple, and dict. We will use lists and dicts (dictionaries) extensively.

Lists

A list is an ordered and modifiable (in Pythonese: mutable) collection of objects. We will use lists a lot. Let's try some out.

In [1]:
# You define a list using square brackets

number_list = [2, 3, 5, 8]   #a list of numeric data (integers)

string_list = ['university', 'of', 'wisconsin', 'madison']  # a list of strings

# Notice that the print function understands types. We have passed it floats, ints, strs, and now lists, and it 'knows'
# how to print them out.


print('The string list:')
print(string_list)
print('\n')                           # '\n' is just a string. \n is the special character that creates a new line

print('The number list:')
print(number_list)     
print("number_list's type", type(number_list))
The string list:
['university', 'of', 'wisconsin', 'madison']


The number list:
[2, 3, 5, 8]
number_list's type <class 'list'>

We can make lists of mixed types. This would not work in many languages.

In [2]:
# Some numbers and some strings
mixed_list = [1, 25, 'biochemistry', 3, 'foo' ]    # 'foo' is a programmer's favorite generic placeholder

print('The mixed list:')
print(mixed_list)
The mixed list:
[1, 25, 'biochemistry', 3, 'foo']

We can access an element of a list using square brackets, like this:

In [3]:
print(mixed_list[0], mixed_list[2], '\n') # print out the first and third elements of the list

print(type(mixed_list[0]), type(mixed_list[2])) # print out the types of the first and third elements
1 biochemistry 

<class 'int'> <class 'str'>

Key concept: Lists are ordered, like an array in many other languages.

Important: The list index starts with 0. (In some languages, the list index starts with 1.)
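
Here is a small sketch of what zero-based indexing means in practice. The list name campus is made up for this illustration, and the try/except statement is something we have not covered yet; it is only there so the cell keeps running after the error.

```python
# A made-up list used only for this illustration
campus = ['university', 'of', 'wisconsin', 'madison']

first = campus[0]                 # index 0 is the first element
last = campus[len(campus) - 1]    # the last index is the length minus one

print(first, last)

# A four-element list has indexes 0, 1, 2, 3. Asking for index 4 raises an IndexError.
try:
    campus[4]
except IndexError as err:
    print('IndexError:', err)
```

The last valid index is always one less than the number of elements, which is an easy thing to trip over if you are used to counting from 1.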

In [4]:
temp_list = ['Dane', 'County', 3]

long_mixed_list_1 = mixed_list + temp_list     # This concatenates temp_list and mixed_list

# The next line of code does two things. What are they?
long_mixed_list_2 = mixed_list + ['Dane', 'County', 3]


print('long_mixed_list_1:', long_mixed_list_1, '\n')
print('long_mixed_list_2:', long_mixed_list_2, '\n')
long_mixed_list_1: [1, 25, 'biochemistry', 3, 'foo', 'Dane', 'County', 3] 

long_mixed_list_2: [1, 25, 'biochemistry', 3, 'foo', 'Dane', 'County', 3] 

The code above shows us two Python features.

  1. We can concatenate lists using the + operator
  2. We can create a list on the same line we assign it

The '+' operator works like the print() function: it 'knows' what kinds of objects it is working with (lists, ints, strings) and takes the appropriate action. Everything, however, has its limits.

In [6]:
# What does this do?
long_mixed_list_3 = mixed_list + 'Bucky'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-f2c2e6ef456b> in <module>
      1 # What does this do?
----> 2 long_mixed_list_3 = mixed_list + 'Bucky'

TypeError: can only concatenate list (not "str") to list

The '+' operator is not set up to concatenate a list to a string. We can see this in the TypeError message.
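
One way around the error, sketched below, is to put the string inside its own list first. Then both sides of the + are lists. The cell redefines mixed_list so that it runs on its own.

```python
mixed_list = [1, 25, 'biochemistry', 3, 'foo']

# Wrapping the string in square brackets makes a one-element list,
# and list + list is something the + operator knows how to handle
long_mixed_list_3 = mixed_list + ['Bucky']

print(long_mixed_list_3)
```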

Because lists are mutable, we can assign new values to them.

In [7]:
print('Before I change the first element:', long_mixed_list_2, '\n')

long_mixed_list_2[0] = 50      #Change the first element from 1 to 50
print('After I changed the first element:', long_mixed_list_2)
Before I change the first element: [1, 25, 'biochemistry', 3, 'foo', 'Dane', 'County', 3] 

After I changed the first element: [50, 25, 'biochemistry', 3, 'foo', 'Dane', 'County', 3]

Lists are not limited to scalar types. What type is each element in the following list?

In [8]:
xzibit_list = [1, 'oak', [3.2, 5, 'elm']]
print(xzibit_list)
print('element 1', type(xzibit_list[0]))
print('element 2', type(xzibit_list[1]))
print('element 3', type(xzibit_list[2]))
[1, 'oak', [3.2, 5, 'elm']]
element 1 <class 'int'>
element 2 <class 'str'>
element 3 <class 'list'>

We can have a list be an element within a list!

How would you print (or otherwise access) the [3.2,5,'elm'] list?

How would you print 'elm'?

In [9]:
print('The sublist: ', xzibit_list[2])
print('Is this elm?', xzibit_list[2][2])
The sublist:  [3.2, 5, 'elm']
Is this elm? elm

Practice: Lists

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.

Insert a code cell and try these out

  1. Create a list containing the integers 1, 2, 3. Name it my_int_list.
  2. Create a list containing 1, 2, 3 where each number is a string, not an int. Name it my_string_list.
  3. What are the types of my_int_list and my_string_list? Print out the types.

Insert another code cell and

  1. Concatenate my_int_list and my_string_list. Name the new list my_super_list.
  2. Print out your super list.
  3. In your super list, change the integer 2 to your favorite number.
  4. In your super list, change the string '3' to your least favorite number.
  5. Print your super list. If you made a mistake, go back and fix it.
In [10]:
my_int_list = [1,2,3]
my_string_list = ['1','2','3']

print(type(my_int_list))
print(type(my_string_list))
<class 'list'>
<class 'list'>
In [11]:
my_super_list = my_int_list + my_string_list
print(my_super_list)
[1, 2, 3, '1', '2', '3']
In [12]:
my_super_list[1] = 30
my_super_list[5] = 3
print(my_super_list)
[1, 30, 3, '1', '2', 3]

Tuples

Tuples are collections of objects, like lists, but they are immutable: once they are created, they cannot be changed. We will not use tuples that often, but they will pop up, so we should be ready.

We create a tuple with round brackets.

In [13]:
# You define a tuple using round brackets

number_tuple = (2, 3, 5, 8)   #a tuple of numeric data

string_tuple = ('university', 'of', 'wisconsin', 'madison')  # a tuple of strings

print('number_tuple type:', type(number_tuple))
print('number_tuple:', number_tuple, '\n')

print('number_list type:', type(number_list))
print('number_list:',number_list)
number_tuple type: <class 'tuple'>
number_tuple: (2, 3, 5, 8) 

number_list type: <class 'list'>
number_list: [2, 3, 5, 8]

Notice the printed output: round vs. square brackets. Thanks print()!

Now, let's see how immutability works.

In [14]:
# Change the second element of the list to 1000
number_list[1] = 1000    
print(number_list)
[2, 1000, 5, 8]
In [15]:
# Now try that with a tuple
number_tuple[1] = 1000
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-40193bc06616> in <module>
      1 # Now try that with a tuple
----> 2 number_tuple[1] = 1000

TypeError: 'tuple' object does not support item assignment

This property of tuples is useful if you have data that you want to protect from being accidentally changed.
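
If we really do need a changed version of a tuple, one possible approach is to build a new tuple and leave the original alone. The sketch below uses the list() and tuple() conversion functions, which are covered later in this notebook. The names temp and new_tuple are made up for the example.

```python
number_tuple = (2, 3, 5, 8)

# We cannot change number_tuple in place, but we can build a new tuple from it:
# convert to a list, change the list, then convert back to a tuple
temp = list(number_tuple)
temp[1] = 1000
new_tuple = tuple(temp)

print('original:', number_tuple)   # the original tuple is untouched
print('new:     ', new_tuple)
```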

Dictionaries (dicts)

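Dicts are collections of key-value pairs. We call them unordered because we look up an element by its key, not by its position. (Since Python 3.7, a dict does remember the order in which its entries were added, but we still cannot ask for 'the second element' by position.) Each element of a dict is made of a key and its associated value. The keys must be unique, but the values do not need to be. We create dicts with curly brackets. It's easiest to understand with some examples.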

In [16]:
# This is a dict with five elements

grades = {'A':4.0, 'B':3.0, 'C':2.0, 'D':1.0, 'F':0.0}   #I am associating the key A with the value 4.0
print(type(grades))
print(grades)
<class 'dict'>
{'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.0}

We use the keys to refer to the values.

In [17]:
print(grades['B'], grades['D'])
3.0 1.0
In [18]:
# What happens here? Will this return 'B'?
print(grades[3.0])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-36ce319687df> in <module>
      1 # What happens here? Will this return 'B'?
----> 2 print(grades[3.0])

KeyError: 3.0

You get an error here because a dictionary can only be referenced by its keys --- never by its values. The error message says KeyError because it is looking for a key named 3.0, and that key doesn't exist.

In [19]:
# let's try looking for the numeric grade associated with withdrawing from class
print(grades['W'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-19-611b044bc08c> in <module>
      1 # let's try looking for the numeric grade associated with withdrawing from class
----> 2 print(grades['W'])

KeyError: 'W'

We get the same KeyError, again, because the key W does not exist.
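
If we are not sure whether a key exists, we can check before we look it up. The sketch below shows two standard tools for this: the in operator and the dict's get() method. The cell redefines grades so that it runs on its own.

```python
grades = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.0}

print('B' in grades)     # True: 'B' is a key
print('W' in grades)     # False: 'W' is not a key
print(3.0 in grades)     # False: 'in' checks the keys, not the values

# get() returns None (or a default that we supply) instead of raising a KeyError
print(grades.get('W'))
print(grades.get('W', 'no such grade'))
```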

We can add to our dictionary (it is mutable). We can also change the value of an existing entry.

In [20]:
# Let's add a withdrawal score
grades['W'] = 0.0
print(grades, '\n')

# Our grading is generous!
grades['F'] = 0.5
print(grades)
{'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.0, 'W': 0.0} 

{'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.5, 'W': 0.0}

Practice: Dicts

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.

Insert a code cell and try these out

  1. Create a dict with keys: 'coke', 'pepsi', 'root beer', and 'fanta' and give each key a value that corresponds to your rating on a 1 to 10 scale with 1 being the worst and 10 the best. For example, I would rate root beer a 9.
  2. Print your dict.
  3. Can you give more than one soda a rating of 2?

Insert a second cell and

  1. The Coca-Cola corp is hiring you as a celebrity endorser. Change your rating for coke to a 10.
  2. Change your rating of pepsi to a 1.
  3. Print your dict.

Here is a challenging but important example from my old colleagues at NYU Stern:

Consider the dictionary

data = {'Year': [1990, 2000, 2010], 'GDP':  [8.95, 12.56, 14.78]}

What are the keys here? The values? What do you think this dictionary represents?

In [33]:
ratings = {'coke':8, 'pepsi':5, 'root beer':9, 'fanta':2}
print(ratings)

# Sure, you can rate more than one soda a 2. You cannot have more than one entry for each soda, though.
{'coke': 8, 'pepsi': 5, 'root beer': 9, 'fanta': 2}
In [34]:
ratings['coke'] = 10
ratings['pepsi'] = 1
print(ratings)
{'coke': 10, 'pepsi': 1, 'root beer': 9, 'fanta': 2}

The keys are 'Year' and 'GDP'. The values are the two lists of numbers.

The data dictionary is a data set! The first entry is a 'column' of dates and the second entry is a 'column' of GDP values.
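
To make the 'column' idea concrete, here is a short sketch that pulls out one column and then one observation. We combine the two kinds of square brackets we have seen: the key picks the column, and the index picks the position within it. The names years and gdp are made up for the example.

```python
data = {'Year': [1990, 2000, 2010], 'GDP': [8.95, 12.56, 14.78]}

years = data['Year']     # the value for the key 'Year' is a list
gdp = data['GDP']        # the value for the key 'GDP' is a list, too

# Position 1 in each list refers to the same observation
print(years[1], gdp[1])

# We can also do the lookup in one step: key first, then index
print(data['GDP'][1])
```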

More on types

We have seen several types now: int, float, str, tuple, dict, and list. There are a few more to come this semester and many more that we will not address.

Types are great because many of the functions [like print()] and operators [like +] automatically know how to handle objects of different types. We don't need functions like print_string(), print_int(), print_list()...

Types are also great because they keep us from doing dumb things, like trying to add an int and a str. (Insert a code cell and try it. What kind of error do you get?) There are languages that will not stop you from adding a string and an integer, even though the result will be garbage.

In [21]:
x_int = 10
x_string = '10'
x = x_int + x_string
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-86a570296cbd> in <module>
      1 x_int = 10
      2 x_string = '10'
----> 3 x = x_int + x_string

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Changing types

Here is something that comes up a lot in the 'wild.' You have a file with some data in it. When you read the file into your program, the numeric data are strings. Not good. (What is the difference between y = 3 + 2 and y = '3' + '2' ?)
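
A quick sketch of that difference. The variable names y_num and y_str are made up here so that we can keep both results around.

```python
y_num = 3 + 2        # + adds two ints
y_str = '3' + '2'    # + concatenates two strings

print(y_num, type(y_num))
print(y_str, type(y_str))
```

The second result is the string '32', not the number 5. That is the kind of quiet mistake we risk when numeric data arrive as text.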

Python gives us an easy way to change a variable's type.

In [22]:
golden_ratio_s = '1.6180339'               # What type is this variable? How can you check?
print(type(golden_ratio_s))
<class 'str'>
In [23]:
# Now turn the string into a float

golden_ratio_f = float(golden_ratio_s)     # What type is this variable? How can you check?

print('golden_ratio_f is of type:', type(golden_ratio_f))
golden_ratio_f is of type: <class 'float'>

We just 'cast' the string variable to a float variable. Can we do the reverse, and cast the float to a string?

In [24]:
# You can 'cast' the float back to a string with str()

golden_ratio_s_2 = str(golden_ratio_f)
print('golden_ratio_s_2 is of type:', type(golden_ratio_s_2))
golden_ratio_s_2 is of type: <class 'str'>

I am feeling pretty powerful right now.

Can we turn the string into an int?

We used float() to cast to a float, str() to cast to a string. We use the int() function to cast things to ints.

In [25]:
golden_ratio_i = int(golden_ratio_s)    #Let's try from the string version
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-be1af6138bb9> in <module>
----> 1 golden_ratio_i = int(golden_ratio_s)    #Let's try from the string version

ValueError: invalid literal for int() with base 10: '1.6180339'

Nope! The int() function doesn't know how to convert a str that contains a decimal point to an int.

What if we tried to convert to an int from a float?

In [26]:
golden_ratio_i = int(golden_ratio_f)    #Let's try from the float version
print(golden_ratio_i)
1

What just happened? It did something, but it isn't obvious what int() should do to a float: Should it round it up? Round it down? Truncate it? There is no obvious way to turn a float into an int.

If we look at the documentation, we will see that int() truncates floats towards zero.
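
Here is a short sketch of what 'truncates towards zero' means, next to the built-in round() function for comparison. The last line chains float() and int(), which is one way to get an int out of a string that contains a decimal point. The variable names are made up for the example.

```python
pos = int(1.9)       # 1: the decimal part is dropped
neg = int(-1.9)      # -1: towards zero, not down to -2
rounded = round(1.9) # 2: round() goes to the nearest integer instead

print(pos, neg, rounded)

# int() will not accept the string '1.6180339' directly,
# but it will accept the float that float() makes from it
from_string = int(float('1.6180339'))
print(from_string)
```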

We can convert types with list() and tuple(), too.

In [27]:
x = [0,1,2,3]          # what type is this?
x_tup = tuple(x)
print("Can you tell x_tup's type from looking at the printout?", x_tup)
Can you tell x_tup's type from looking at the printout? (0, 1, 2, 3)
In [28]:
# Another conversion that is often useful

y = list('on wisconsin')
print(y)
['o', 'n', ' ', 'w', 'i', 's', 'c', 'o', 'n', 's', 'i', 'n']

Practice: Types

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.

  1. We have 5 integer observations in our dataset: 1, 3, 8, 3, 9. Unfortunately, the data file ran all the observations together and we are left with the variable raw_data in the cell below.
  2. What type is raw_data?
  3. Turn raw_data into a list. Print it.
In [29]:
raw_data = '13839'
print(type(raw_data))

list_data = list(raw_data)
print(list_data)
<class 'str'>
['1', '3', '8', '3', '9']

Is your data ready to be analyzed? Why not?

  1. In the cell below, convert your list to a list of integers. You might try repeating statements like list_data[0] = int(list_data[0])
  2. Print out your list of integers.
In [30]:
list_data[0] = int(list_data[0])
list_data[1] = int(list_data[1])
list_data[2] = int(list_data[2])
list_data[3] = int(list_data[3])
list_data[4] = int(list_data[4])

print(list_data)
[1, 3, 8, 3, 9]

That worked for our small list, but imagine having a list of several thousand elements. Converting one element at a time does not scale. Still, it introduced us to a common problem with data in the wild: numbers stored as text.

We will soon learn that Python has very powerful and simple ways to repeatedly apply operations to lists.
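
As a preview only, here is one such way, called a list comprehension. This is a sketch of where we are headed, not necessarily the first tool we will learn for the job. The names int_data and ch are made up for the example.

```python
raw_data = '13839'

# Read as: "make a list of int(ch) for each character ch in raw_data"
int_data = [int(ch) for ch in raw_data]

print(int_data)
```

The same one line works whether raw_data has five characters or five thousand.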