Other built-in types

Python has many built-in object types. Here we take a quick look at the ones you are most likely to use when starting out.

In this notebook, we'll see:

  • lists
  • tuples
  • dictionaries
  • copy vs. pointers
  • some interesting methods of these builtin types (including strings) as exercises.

Lists

Lists look like arrays but are not arrays! Lists can contain elements of different types and are always one-dimensional.

A list can have any object as an element: other lists, strings, numbers, arrays, functions etc. A list simply allows you to create a collection of objects. It doesn't tell you in any way how these objects are related to each other (or whether they are related at all).

In [1]:
li = [] # Empty list
li = ['a',1]  # List of 2 elements
print("First list length: ",len(li))
li = [['a','b'],3]  # A list can have another list as an element. But li is still a 2 elements list.
print("I'm still 2 elements long: ", len(li))
li.append(5)
print("Easy to add elements to lists: ", li)
First list length:  2
I'm still 2 elements long:  2
Easy to add elements to lists:  [['a', 'b'], 3, 5]

Lists are indexable. Python also says "subscriptable".
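
As a small sketch of what indexing looks like in practice (the list `seasons` is made up for this illustration and is not used elsewhere in the notebook):

```python
# Hypothetical list, used only for this illustration
seasons = ["DJF", "MAM", "JJA", "SON"]

first = seasons[0]      # indices start at 0
last = seasons[-1]      # negative indices count from the end
middle = seasons[1:3]   # a slice returns a new list; the end index is excluded

print(first, last, middle)
```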

Lists are iterable. They can be used to build loops on.

In [2]:
for n in li:
    print(n)
['a', 'b']
3
5

List elements can be changed, i.e. lists are mutable objects:

In [23]:
li = [1, 2, 3]  # li is reset here so that the output below can be reproduced
print(li[0])
li[0]='new'
print(li[0])
print(li)
1
new
['new', 2, 3]

If you have a subscriptable object as a list element, how do you access this element's subscripts? For example, you have the word "new" in the first element of a list and you want to know the last character:

In [4]:
li[0][-1]
Out[4]:
'w'
In [5]:
# Returns an error with non-subscriptable objects
li[1][-1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a48fcdf0b0ca> in <module>
      1 # Returns an error with non-subscriptable objects
----> 2 li[1][-1]

TypeError: 'int' object is not subscriptable

Convert to list

You can convert an iterable object to a list with the builtin function: list(). Each element of the initial object will become an element in the list:

In [6]:
a = "Claire"
b = list(a)
print(b)
# To keep the string together as 1 element of the list:
c = [a]
print(c)
['C', 'l', 'a', 'i', 'r', 'e']
['Claire']

Tuples

Tuples are like lists but are immutable. That means elements of a tuple can't be changed, added or removed. This can be useful for a collection of objects you don't want to inadvertently change in your code, for example a collection of names of experiments, paths, models etc. In practice, though, many people simply use lists instead.

In [7]:
t = ()  # Empty tuple
t = (1,2,3)
t = 4,5,6  # Parentheses are optional!
t = 1,  # Don't forget the comma to define a 1-element tuple!
t
Out[7]:
(1,)
In [8]:
# As said earlier, they are immutable
t=1,2,3
t[0]=3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-7b1c8be80904> in <module>
      1 # As said earlier, they are immutable
      2 t=1,2,3
----> 3 t[0]=3

TypeError: 'tuple' object does not support item assignment

Convert to tuple

As with lists, you can convert iterable objects to tuples with the builtin function: tuple().

In [9]:
a = [1,3,4]
b = tuple(a)
print(b)
# Here again if you want to keep the list as an element of the tuple:
c = (a,)
print(c)
(1, 3, 4)
([1, 3, 4],)

Copy or not?

There is another consequence to all this talk about mutable or immutable. It has to do with how Python manages memory. Assigning an object to a new variable does not copy it: to reduce the memory footprint of a program, Python makes the new name point to the same object in memory. That means that if 2 variables refer to the same mutable object, both will show the change when an element is changed through either of them. If one of the variables is then assigned to something else, only that variable changes. Let's see what this means with an example for a list:

In [10]:
# If we change the first element of one list, the other one changes too.
li = [1,2,3]
li2 = li
li2[0]=2
print(li, li2)
[2, 2, 3] [2, 2, 3]
In [11]:
# If we reassign li to a different list (even if it's the same as initially), only li changes.
li = [1,2,3]
print(li2,li)
[2, 2, 3] [1, 2, 3]

The easiest way to make a copy of a list or a dictionary is with the list() and dict() functions (see below for dict()). These only make a shallow copy of the object: the container is new, but its elements are still shared with the original. That is usually enough. There are ways to make deep copies of mutable objects, but that is rarely needed; we'll see it at the end, depending on time.
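
A minimal sketch of the difference between a shallow and a deep copy. The variable names are made up for this illustration; `copy.deepcopy()` comes from the standard library `copy` module and is shown as one possible way to make a deep copy:

```python
import copy

# A shallow copy made with list() is a new list object
li = [1, 2, 3]
li_copy = list(li)
li_copy[0] = 99
print(li, li_copy)  # li is unchanged

# With nested lists, a shallow copy still shares the inner list
nested = [[1, 2], 3]
shallow = list(nested)
deep = copy.deepcopy(nested)  # copies the inner list as well

nested[0][0] = "changed"
print(shallow[0][0])  # the shallow copy sees the change to the inner list
print(deep[0][0])     # the deep copy does not
```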

Since it is impossible to change only 1 element of an immutable object, any "change" actually creates a new object. The change is therefore not reflected in other variables initially pointing to the same memory.
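
As a quick illustration with a tuple (the names `t` and `t2` are only for this sketch):

```python
t = (1, 2, 3)
t2 = t         # both names refer to the same tuple
t = t + (4,)   # builds a new tuple and reassigns t; the original tuple is untouched
print(t, t2)
```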


Dictionaries

Dictionaries allow you to label values. For example, if you'd like to keep track of the description of a rectilinear grid, you might need a grid name, the first lon/lat, the resolution and the last lon/lat. A dictionary allows you to keep all those values together in one object and to refer to each of them by name instead of by position.

In [12]:
d={
    "name":"my_grid",
    "first_lon":-180.,
    "first_lat":-90.,
    "last_lon":180.,
    "last_lat":90.,
    "res":0.5
}
d
Out[12]:
{'name': 'my_grid',
 'first_lon': -180.0,
 'first_lat': -90.0,
 'last_lon': 180.0,
 'last_lat': 90.0,
 'res': 0.5}
In [13]:
d["name"]
Out[13]:
'my_grid'

The values associated with each key can be arbitrarily complex objects. For example, we could have only "first_point" and "last_point" keys, each pointing to a dictionary with lat and lon as keys:

In [14]:
point0={"lon":-180., "lat":-90.}
point1=dict(point0)
point1["lon"]=180.
point1["lat"]=90.
print(point0, point1)
{'lon': -180.0, 'lat': -90.0} {'lon': 180.0, 'lat': 90.0}
In [15]:
d={'name':"my_grid", "first_point":point0, "last_point":point1, "res":0.5}
d["first_point"]
Out[15]:
{'lon': -180.0, 'lat': -90.0}
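
To reach a value inside a nested dictionary, the subscripts can be chained, just as with nested lists. A short sketch that rebuilds the same dictionary so it can run on its own (the names `first_lon` and `lon_extent` are made up here):

```python
point0 = {"lon": -180.0, "lat": -90.0}
point1 = {"lon": 180.0, "lat": 90.0}
d = {"name": "my_grid", "first_point": point0, "last_point": point1, "res": 0.5}

# First subscript selects the inner dictionary, second one selects its value
first_lon = d["first_point"]["lon"]
lon_extent = d["last_point"]["lon"] - d["first_point"]["lon"]
print(first_lon, lon_extent)
```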

Get the keys and values

In [16]:
# Get the keys in a dictionary:
for k in d.keys():
    print(k)
print("End keys \n")

# Get the values:
for v in d.values():
    print(v)
print("End values \n")

# Get the pairs of keys and values:
for k,v in d.items():
    print(k,v)
print("End pairs \n")

# As you can see here, loops in Python can have several loop variables!
# If we give only 1 loop variable:
for k in d.items():
    print(k)
name
first_point
last_point
res
End keys 

my_grid
{'lon': -180.0, 'lat': -90.0}
{'lon': 180.0, 'lat': 90.0}
0.5
End values 

name my_grid
first_point {'lon': -180.0, 'lat': -90.0}
last_point {'lon': 180.0, 'lat': 90.0}
res 0.5
End pairs 

('name', 'my_grid')
('first_point', {'lon': -180.0, 'lat': -90.0})
('last_point', {'lon': 180.0, 'lat': 90.0})
('res', 0.5)

Add a (key,value) pair

In [17]:
d['projection'] = "cartesian"
d
Out[17]:
{'name': 'my_grid',
 'first_point': {'lon': -180.0, 'lat': -90.0},
 'last_point': {'lon': 180.0, 'lat': 90.0},
 'res': 0.5,
 'projection': 'cartesian'}

If you have several grids you want to keep track of, each key could have a list (or tuple) of values. Alternatively, you could create 1 dictionary per grid and keep them all together in a list (or tuple). The second option might be better, as adding a new grid is then simply a matter of appending its definition dictionary to the list.
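
The list-of-dictionaries option could look like the sketch below. The grid names and resolutions are invented for the illustration:

```python
# One dictionary per grid, all kept together in a list (hypothetical values)
grids = [
    {"name": "coarse_grid", "res": 2.0},
    {"name": "fine_grid", "res": 0.5},
]

# Adding a new grid only requires appending its dictionary
grids.append({"name": "regional_grid", "res": 0.1})

# Loop over the grids and collect their names
names = []
for grid in grids:
    names.append(grid["name"])
print(names)
```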


Exercises

Below are a few exercises or examples for you to go through.

Don't forget to use either the inline help (?var or tab) or Google and Stack Overflow.

String formatting with dictionaries

We saw earlier that f-strings are the most readable way to format a string.

With dictionaries, str.format() can be quite powerful and useful:

In [18]:
# Let's define a dictionary:
point0=[-180., -90.]
point1=[180.,90.]
d={'name':"my_grid", "first_point":point0, "last_point":point1, "res":0.5}
print(d)
# Let's print all this information nicely with f-string and then `str.format()`
print(f"My grid {d['name']} has {d['res']} degrees resolution, starts at {d['first_point']}, ends at {d['last_point']}")
print("My grid {name} has {res} degrees resolution, starts at {first_point}, ends at {last_point}".format(**d))
{'name': 'my_grid', 'first_point': [-180.0, -90.0], 'last_point': [180.0, 90.0], 'res': 0.5}
My grid my_grid has 0.5 degrees resolution, starts at [-180.0, -90.0], ends at [180.0, 90.0]
My grid my_grid has 0.5 degrees resolution, starts at [-180.0, -90.0], ends at [180.0, 90.0]

The ** operator allows us to take a dictionary of key-value pairs and unpack it into keyword arguments in a function call. It can be quite useful with other functions too, for example to define plot characteristics instead of writing a long line of arguments, or to easily pass the same arguments to different functions.
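
A minimal sketch of ** with a user-defined function. `describe_grid` is a hypothetical helper written only for this example; the keys of the dictionary must match the function's argument names:

```python
def describe_grid(name, res):
    """Hypothetical helper, defined only for this illustration."""
    return f"{name} at {res} degrees"

grid_args = {"name": "my_grid", "res": 0.5}

# These two calls are equivalent
unpacked = describe_grid(**grid_args)
explicit = describe_grid(name="my_grid", res=0.5)

print(unpacked)
print(unpacked == explicit)
```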

There is also a * operator. See this page for details: https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/ That page uses a few things we haven't seen, but it's very thorough.


Split and join strings

When reading tabulated data or a filename with fields separated by a separator ('_', space, etc.), you will often want to split the different fields into separate strings or values. Conversely, you might want to create filenames that have fields separated by '_'. You can use + to concatenate the different fields, but it gets long pretty quickly. Python provides a better way.

In [24]:
# Split the following string along the "_" separator. Hint: check the string's methods. What is the type of the result?
a="tas_day_MRI-ESM1_historical_r1i1p1_18510101-18601231.nc"
b=a.split(sep="_")
b
Out[24]:
['tas', 'day', 'MRI-ESM1', 'historical', 'r1i1p1', '18510101-18601231.nc']
In [33]:
# Join together the elements of the following list to form a filename with the same format as above.
# Again check the string's methods. Be careful how you specify the separator and the strings to join!
l = ["ta", "day", "ACCESS", "historical","r1i1p1", "18510101", "19001231","nc"]
c='.'.join(l[-2:])
d = '-'.join([l[-3],c])
f = l[:-3]
f.append(d)
"_".join(f)
Out[33]:
'ta_day_ACCESS_historical_r1i1p1_18510101-19001231.nc'

Zip function

It allows you to zip together 2 iterables, so that the first elements of the 2 original iterables together become the 1st element of the result, and so on. The result stops at the end of the shorter iterable.

It's much easier to see with examples.

In [21]:
a="Claire"
b="Carouge"
c=list(zip(a,b))
c
Out[21]:
[('C', 'C'), ('l', 'a'), ('a', 'r'), ('i', 'o'), ('r', 'u'), ('e', 'g')]
In [22]:
# Given the 2 lists below, create a dictionary where the keys come from the list "keys" and values from "values"
keys=['name', 'age', 'place']
values=['John', 32, 'Paris']
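
One possible solution, sketched here with zip() and dict() (the name `person` is chosen only for this example):

```python
keys = ['name', 'age', 'place']
values = ['John', 32, 'Paris']

# zip() pairs each key with its value; dict() turns the pairs into a dictionary
person = dict(zip(keys, values))
print(person)
```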

These are all the concepts we wanted to introduce. As a summary we've seen:

  • strings, lists, tuples, dictionaries
  • how to format strings with str.format() and f-strings
  • Builtin functions:
    • print()
    • len()
    • any()
    • all()
    • list()
    • tuple()
    • dict()
  • For loops
  • If constructs
  • A few useful methods and other builtin functions in the exercises. For additional builtin functions: https://docs.python.org/3/library/functions.html
In [35]:
l=[1,2]
a=['a','b']
l.append(a)  # append() adds the whole list a as a single element
l
Out[35]:
[1, 2, ['a', 'b']]
In [36]:
l=[1,2]
a=['a','b']
l.extend(a)  # extend() adds each element of a separately
l
Out[36]:
[1, 2, 'a', 'b']
In [38]:
l = ["ta", "day", "ACCESS", "historical","r1i1p1", "18510101", "19001231","nc"]
"_".join([*l[:-3],d])
Out[38]:
'ta_day_ACCESS_historical_r1i1p1_18510101-19001231.nc'
In [44]:
print(l[:-3])
print(*l[:-3])
print(l[0],l[1])
['ta', 'day', 'ACCESS', 'historical', 'r1i1p1']
ta day ACCESS historical r1i1p1
ta day