Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 6: Text & Bytes

In this chapter, we continue the study of the built-in data types. The next layer on top of numbers consists of textual data that are modeled primarily with the str type in Python. str objects are more complex than the numeric objects in Chapter 5 as they consist of an arbitrary and possibly large number of individual characters that may be chosen from any alphabet in the history of humankind. Luckily, Python abstracts away most of this complexity from us. However, after looking at the str type in great detail, we briefly introduce the bytes type at the end of this chapter to understand how characters are modeled in memory.

The `str` Type

To create a str object, we use the literal notation and type the text between enclosing double quotes ".

In [1]:

text = "Lorem ipsum dolor sit amet."

Like everything in Python, text is an object with an identity, a type, and a value.

In [2]:

id(text)

Out[2]:

140667764715968

In [3]:

type(text)

Out[3]:

str

As seen before, a str object evaluates to itself in a literal notation with enclosing single quotes '.

In Chapter 1, we specify the double quotes " convention this book follows. Yet, single quotes ' and double quotes " are perfect substitutes. We could use the reverse convention, as well. As this discussion shows, many programmers have strong opinions about such conventions. Consequently, the discussion was "closed as not constructive" by the moderators.

In [4]:

text

Out[4]:

'Lorem ipsum dolor sit amet.'

As the single quote ' is often used in the English language as a shortener, we could make an argument in favor of using the double quotes ": There are possibly fewer situations like the two code cells below, where we must escape the kind of quote used as the str object's delimiter with a backslash "\" inside the text (cf., also the "Unicode & (Special) Characters" section further below). However, double quotes " are often used as well, for example, to indicate a quote like the one by Albert Einstein below. So, such arguments are not convincing.

Many proponents of the single quote ' usage claim that double quotes " cause more visual noise on the screen. However, this argument is also not convincing as, for example, one could claim that two single quotes '' look so similar to one double quote " that a reader may confuse an empty str object with a missing closing quote ". With the double quotes " convention we at least avoid such confusion (i.e., empty str objects are written as "").

This discussion is an excellent example of a flame war in the programming world: Everyone has an opinion and the discussion leads to no result.

In [5]:

"Einstein said, \"If you can't explain it, you don't understand it.\""

Out[5]:

'Einstein said, "If you can\'t explain it, you don\'t understand it."'

In [6]:

'Einstein said, "If you can\'t explain it, you don\'t understand it."'

Out[6]:

'Einstein said, "If you can\'t explain it, you don\'t understand it."'

An important fact to know is that enclosing quotes of either kind are not part of the str object's value! They are merely syntax indicating the literal notation.

So, printing out the sentence with the built-in print() function does the same in both cases.

In [7]:

print("Einstein said, \"If you can't explain it, you don't understand it.\"")

Einstein said, "If you can't explain it, you don't understand it."

In [8]:

print('Einstein said, "If you can\'t explain it, you don\'t understand it."')

Einstein said, "If you can't explain it, you don't understand it."
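To make the point about delimiters concrete, the following minimal sketch compares two literals that differ only in the kind of quotes used. The variable names are chosen here for illustration and are not part of the chapter's running example.

```python
# Two literals that differ only in their delimiters.
double_quoted = "Einstein"
single_quoted = 'Einstein'

# The quotes are syntax, not data: both objects hold the same eight characters.
same_value = double_quoted == single_quoted
number_of_characters = len(double_quoted)

print(same_value)            # True
print(number_of_characters)  # 8
```

If the quotes were part of the value, the length would be 10 instead of 8.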

As an alternative to the literal notation, we may use the built-in str() constructor to cast non-str objects as str ones. As Chapter 11 reveals, basically any object in Python has a text representation. Because of that, we may also pass list objects, the booleans True and False, or None to str().

In [9]:

str(42)

Out[9]:

'42'

In [10]:

str(42.87)

Out[10]:

'42.87'

In [11]:

str([1, 2, 3])

Out[11]:

'[1, 2, 3]'

In [12]:

str(True)

Out[12]:

'True'

In [13]:

str(False)

Out[13]:

'False'

In [14]:

str(None)

Out[14]:

'None'

User Input

As shown in the "Guessing a Coin Toss" example in Chapter 4, the built-in input() function displays a prompt to the user and returns whatever is entered as a str object. input() is particularly valuable when writing command-line tools.

In [15]:

user_input = input("Whatever you enter is put in a new string: ")

In [16]:

type(user_input)

Out[16]:

str

In [17]:

user_input

Out[17]:

'123'
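Because input() always returns a str object, even digits like '123' above arrive as text. A minimal sketch of how such raw input might be converted into a number follows. parse_whole_number() is a hypothetical helper introduced only for this illustration, and the raw texts are hard-coded so that the cell runs without a prompt.

```python
def parse_whole_number(raw):
    """Convert raw user input into an int, or return None if that fails.

    This is a hypothetical helper for illustration, not a built-in.
    """
    try:
        return int(raw.strip())  # strip surrounding whitespace, then cast
    except ValueError:
        return None  # the text does not represent a whole number


print(parse_whole_number("123"))      # 123
print(parse_whole_number("  42 \n"))  # 42
print(parse_whole_number("abc"))      # None
```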

Reading Files

A more common situation where we obtain str objects is when reading the contents of a file with the open() built-in. In its simplest usage form, to open a text file, we pass in its path (i.e., "filename") as a str object.

In [18]:

file = open("lorem_ipsum.txt")

open() returns a proxy object of type TextIOWrapper that allows us to interact with the file on disk. In its text representation further below, mode='r' shows that we opened the file in read-only mode, and encoding='UTF-8' is explained in detail in the "The bytes Type" section at the end of this chapter.

In [19]:

type(file)

Out[19]:

_io.TextIOWrapper

In [20]:

file

Out[20]:

<_io.TextIOWrapper name='lorem_ipsum.txt' mode='r' encoding='UTF-8'>

TextIOWrapper objects come with plenty of type-specific methods and attributes.

In [21]:

file.readable()

Out[21]:

True

In [22]:

file.writable()

Out[22]:

False

In [23]:

file.name

Out[23]:

'lorem_ipsum.txt'

In [24]:

file.encoding

Out[24]:

'UTF-8'

So far, we have not yet read anything from the file (i.e., from disk)! That is intentional as, for example, the file could contain more data than could fit into our computer's memory. Therefore, we have to explicitly instruct the file object to read some of or all the data in the file.

One way to do that is to simply loop over the file object with the for statement as shown next: In each iteration, line is assigned the next line in the file. Because we may loop over TextIOWrapper objects, they are iterables.

In [25]:

for line in file:
    print(line)

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s

when an unknown printer took a galley of type and scrambled it to make a type

specimen book. It has survived not only five centuries but also the leap into

electronic typesetting, remaining essentially unchanged. It was popularised in

the 1960s with the release of Letraset sheets.

Once we have looped over the file object, it is exhausted: We cannot loop over it a second time. So, the built-in print() function is never called in the code cell below!

In [26]:

for line in file:
    print(line)

After the for-loop, the line variable is still set and references the last line in the file. We verify that it is indeed a str object.

In [27]:

line

Out[27]:

'the 1960s with the release of Letraset sheets.\n'

In [28]:

type(line)

Out[28]:

str

An important observation is that the file object is still associated with an open file descriptor. Without going into any technical details, we note that an operating system can only handle a limited number of "open files" at the same time, and, therefore, we should always close the file once we are done processing it.

TextIOWrapper objects have a closed attribute on them that indicates if the associated file descriptor is still open or has been closed. We can "manually" close any TextIOWrapper object with the close() method.

In [29]:

file.closed

Out[29]:

False

In [30]:

file.close()

In [31]:

file.closed

Out[31]:

True

The more Pythonic way is to use open() within the compound with statement (cf., reference): In the example below, the indented code block is said to be executed within the context of the file object that now plays the role of a context manager. Many different kinds of context managers exist in Python with different applications and purposes. Context managers returned from open() mainly ensure that file descriptors get automatically closed after the last line in the code block is executed.

In [32]:

with open("lorem_ipsum.txt") as file:
    for line in file:
        print(line)

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s

when an unknown printer took a galley of type and scrambled it to make a type

specimen book. It has survived not only five centuries but also the leap into

electronic typesetting, remaining essentially unchanged. It was popularised in

the 1960s with the release of Letraset sheets.

In [33]:

file.closed

Out[33]:

True

Using syntax familiar from Chapter 3 to explain what the with open(...) as file: does above, we provide an alternative formulation with a try statement below: The finally-branch is always executed, even if an exception is raised inside the for-loop. Therefore, file is sure to be closed too. However, this formulation is somewhat less expressive.

In [34]:

file = open("lorem_ipsum.txt")
try:
    for line in file:
        print(line)
finally:
    file.close()

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s

when an unknown printer took a galley of type and scrambled it to make a type

specimen book. It has survived not only five centuries but also the leap into

electronic typesetting, remaining essentially unchanged. It was popularised in

the 1960s with the release of Letraset sheets.

In [35]:

file.closed

Out[35]:

True
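The claim that the file gets closed even if an exception is raised can be checked with a small experiment. The sketch below is self-contained: It writes a throwaway file with the tempfile module from the standard library instead of relying on lorem_ipsum.txt, and it raises a RuntimeError on purpose while looping over the file. The file's content and the error message are made up for this illustration.

```python
import os
import tempfile

# Create a small throwaway file so that the sketch does not depend on other files.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as temporary:
    temporary.write("first line\nsecond line\n")
    path = temporary.name

try:
    with open(path) as file:
        for line in file:
            # Simulate a problem that occurs while processing the file.
            raise RuntimeError("something went wrong while processing")
except RuntimeError:
    pass  # we only care about what happened to the file object

closed_after_error = file.closed
print(closed_after_error)  # True

os.remove(path)  # clean up the throwaway file
```

The with statement closed the file before the exception reached the except-branch, which is the behavior the try ... finally formulation spells out by hand.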

As an alternative to reading the contents of a file by looping over a TextIOWrapper object, we may also call one of the methods they come with.

For example, the read() method takes a single size argument of type int and returns a str object with the specified number of characters.

In [36]:

file = open("lorem_ipsum.txt")

In [37]:

file.read(11)

Out[37]:

'Lorem Ipsum'

When we call read() again, the returned str object begins where the previous one left off. This is because TextIOWrapper objects like file simply store a position at which the associated file on disk is being read. In other words, file is like a cursor pointing into a file.

In [38]:

file.read(11)

Out[38]:

' is simply '

In contrast, the readline() method keeps reading until it hits a newline character. These are shown in str objects as "\n".

In [39]:

file.readline()

Out[39]:

'dummy text of the printing and typesetting industry.\n'

When we call readline() again, we obtain the next line.

In [40]:

file.readline()

Out[40]:

"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n"

Lastly, the readlines() method returns a list object that holds all lines in the file from the current position to the end of the file. The latter position is often abbreviated as EOF in the documentation. Let's always remember that readlines() has the potential to crash a computer with a MemoryError.

In [41]:

file.readlines()

Out[41]:

['when an unknown printer took a galley of type and scrambled it to make a type\n',
 'specimen book. It has survived not only five centuries but also the leap into\n',
 'electronic typesetting, remaining essentially unchanged. It was popularised in\n',
 'the 1960s with the release of Letraset sheets.\n']

Calling readlines() a second time is as pointless as looping over file a second time.

In [42]:

file.readlines()

Out[42]:

[]

In [43]:

file.close()
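The cursor analogy can be explored without touching the disk. The sketch below uses io.StringIO from the standard library, which mimics an opened text file but keeps its content in memory, and the seek() method to move the cursor back to the beginning. The sample text and the variable names are assumptions made for this illustration.

```python
import io

# io.StringIO mimics an opened text file but keeps its content in memory.
in_memory_file = io.StringIO("Lorem Ipsum is simply dummy text.\nSecond line.\n")

first = in_memory_file.read(5)         # the cursor moves forward by 5 characters
second = in_memory_file.read(6)        # continues where the first call stopped
rest = in_memory_file.readlines()      # reads everything up to the end
nothing = in_memory_file.readlines()   # the cursor is at the end already

in_memory_file.seek(0)                 # move the cursor back to the beginning
again = in_memory_file.read(5)

in_memory_file.close()

print(repr(first))   # 'Lorem'
print(repr(second))  # ' Ipsum'
print(rest)          # [' is simply dummy text.\n', 'Second line.\n']
print(nothing)       # []
print(repr(again))   # 'Lorem'
```

So, an exhausted file object is not useless: After seek(0), it can be read from the start again.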

Because every line obtained by looping over a file or by calling readline() or readlines() ends with a "\n" (with the possible exception of the very last line in a file), we see empty lines printed between the lines in the for-loops above. To print the entire text without empty lines in between, we pass an end="" argument to the print() function.

In [44]:

with open("lorem_ipsum.txt") as file:
    for line in file:
        print(line, end="")

Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s
when an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries but also the leap into
electronic typesetting, remaining essentially unchanged. It was popularised in
the 1960s with the release of Letraset sheets.
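Instead of changing how print() ends its output, we could also remove the trailing "\n" from each line before processing it further. The following sketch does that with the .rstrip() method on an in-memory file created with io.StringIO. The sample content is made up and deliberately ends without a newline character.

```python
import io

# An in-memory stand-in for a text file; the last line has no trailing "\n".
in_memory_file = io.StringIO("first line\nsecond line\nlast line without newline")

stripped_lines = []
for line in in_memory_file:
    # Remove only newline characters from the right end of the line.
    stripped_lines.append(line.rstrip("\n"))

print(stripped_lines)
# ['first line', 'second line', 'last line without newline']
```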

A String of Characters

A sequence is yet another abstract concept (cf., the "Containers vs. Iterables" section in Chapter 4).

It unifies four orthogonal (i.e., "independent") concepts into one bigger idea: Any data type, such as str, is considered a sequence if it

contains
a finite number of other objects that
can be iterated over
in a predictable order.

Chapter 7 formalizes these concepts in great detail. Here, we keep our focus on the str type that historically received its name as it models a string of characters. String is simply another term for sequence in the computer science literature.

Another example of a sequence is the list type. Because of that, str objects may be treated like list objects in many situations.
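As a hedged preview, the standard library's collections.abc module provides abstract classes that correspond roughly to the concepts listed above. The mapping used in the sketch below (e.g., Sized for "finite" and Reversible for "predictable order") is a simplification chosen for this illustration.

```python
from collections.abc import Container, Iterable, Reversible, Sequence, Sized

text = "Lorem ipsum dolor sit amet."

# Check the str object against each of the abstract concepts.
checks = {
    "container": isinstance(text, Container),
    "finite (has a length)": isinstance(text, Sized),
    "iterable": isinstance(text, Iterable),
    "ordered (reversible)": isinstance(text, Reversible),
    "sequence": isinstance(text, Sequence),
}

for concept, result in checks.items():
    print(concept, result)  # every line ends with True

# list objects pass the same test.
list_is_sequence = isinstance([1, 2, 3], Sequence)
print(list_is_sequence)  # True
```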

Below, the built-in len() function tells us how many characters make up text. len() would not work with an "infinite" object. As anything modeled in a program must fit into a computer's finite memory, there cannot exist truly infinite objects; however, Chapter 8 introduces specialized iterable data types that can be used to model an infinite series of "things" and that, consequently, have no concept of "length."

In [45]:

text

Out[45]:

'Lorem ipsum dolor sit amet.'

In [46]:

len(text)

Out[46]:

27

Because text is iterable, we may loop over it and do something with the individual characters, for example, print them out with extra space in between them. If it were not for the appropriately chosen name of the text variable, we could not tell what concrete type of object the for statement is looping over.

In [47]:

for character in text:
    print(character, end="  ")

L  o  r  e  m     i  p  s  u  m     d  o  l  o  r     s  i  t     a  m  e  t  .

With the reversed() built-in, we may loop over text in reversed order. Reversing text only works as it has a forward order to begin with.

In [48]:

for character in reversed(text):
    print(character, end="  ")

.  t  e  m  a     t  i  s     r  o  l  o  d     m  u  s  p  i     m  e  r  o  L

Because text is a container, we may check if a given str object is contained in it with the in operator, which has two distinct usages: First, it checks if a single character is contained in a str object. Second, it may also check if a shorter str object, then called a substring, is contained in a longer one.

In [49]:

"L" in text

Out[49]:

True

In [50]:

"ipsum" in text

Out[50]:

True

In [51]:

"veni, vidi, vici" in text

Out[51]:

False

Indexing

As str objects are ordered and finite, we may index into them to obtain individual characters with the indexing operator []. This is analogous to how we obtained individual elements of a list object in Chapter 1.

In [52]:

text[0]

Out[52]:

'L'

In [53]:

text[1]

Out[53]:

'o'

The index must be of type int; otherwise, we get a TypeError.

In [54]:

text[1.0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[54], line 1
----> 1 text[1.0]

TypeError: string indices must be integers, not 'float'

The last index is one less than the above "length" of the str object as we start counting at 0.

In [55]:

text[26]  # == text[len(text) - 1]

Out[55]:

'.'

An IndexError is raised whenever the index is out of range.

In [56]:

text[27]  # == text[len(text)]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[56], line 1
----> 1 text[27]  # == text[len(text)]

IndexError: string index out of range

We may use negative indexes to start counting from the end of the str object, as shown in the figure below. Note how this only works because sequences are finite.

Index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26
Reverse	-27	-26	-25	-24	-23	-22	-21	-20	-19	-18	-17	-16	-15	-14	-13	-12	-11	-10	-9	-8	-7	-6	-5	-4	-3	-2	-1
Character	`L`	`o`	`r`	`e`	`m`		`i`	`p`	`s`	`u`	`m`		`d`	`o`	`l`	`o`	`r`		`s`	`i`	`t`		`a`	`m`	`e`	`t`	`.`

In [57]:

text[-1]

Out[57]:

'.'

In [58]:

text[-27]  # == text[-len(text)]

Out[58]:

'L'

One reason why programmers like to start counting at 0 is that a positive index and the absolute value of its corresponding negative index always add up to the length of the sequence. Here, 6 and 21 add up to 27.

In [59]:

text[6]

Out[59]:

'i'

In [60]:

text[-21]

Out[60]:

'i'
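The relationship between the two kinds of indexes can be verified for every position at once. The sketch below redefines text with the same value as above so that it runs on its own. The names length and all_match are chosen for this illustration.

```python
text = "Lorem ipsum dolor sit amet."
length = len(text)

# For each positive index, subtracting the length gives the negative index
# that refers to the same character.
all_match = all(text[index] == text[index - length] for index in range(length))

print(all_match)   # True
print(6 - length)  # -21
```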

Slicing

A slice is a substring of a str object.

The slicing operator is a generalization of the indexing operator: We put one, two, or three integers within the brackets [], separated by colons :. The three integers are then referred to as the start, stop, and step values.

Let's start with two integers, start and stop. Whereas the character at the start position is included in the returned str object, the one at the stop position is not. If both start and stop are non-negative and within range, the difference "stop minus start" tells us how many characters the resulting slice has. Below, 5 - 0 == 5 implies that "Lorem" consists of 5 characters. Colloquially speaking, text[0:5] means "taking the first 5 - 0 == 5 characters of text."

In [61]:

text[0:5]

Out[61]:

'Lorem'

In [62]:

text[12:len(text)]

Out[62]:

'dolor sit amet.'

If left out, start defaults to 0 and stop to the length of the str object (i.e., the end).

In [63]:

text[:5]

Out[63]:

'Lorem'

In [64]:

text[12:]

Out[64]:

'dolor sit amet.'

Not including the character at the stop position makes working with individual slices easier as they add up to the original str object again (cf., the "String Operations" section below regarding the overloaded + operator).

In [65]:

text[:5] + text[5:]

Out[65]:

'Lorem ipsum dolor sit amet.'

Slicing and indexing make it easy to obtain shorter versions of the original str object. A common application would be to parse out meaningful substrings from raw text data.

In [66]:

text[:11] + text[-10:]

Out[66]:

'Lorem ipsum sit amet.'

By combining a positive start with a negative stop index, we specify both ends of the slice relative to the ends of the entire str object. So, colloquially speaking, [6:-10] below means "drop the first six and last ten characters." The length of the resulting slice can then not be calculated from the indexes alone; it also depends on the length of the original str object!

In [67]:

text[6:-10]

Out[67]:

'ipsum dolor'

For convenience, the indexes do not need to lie within the range from 0 to len(text) when slicing. So, no IndexError is raised here.

In [68]:

text[-999:999]

Out[68]:

'Lorem ipsum dolor sit amet.'
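How Python resolves negative and out-of-range values when slicing can be made visible with the built-in slice type, whose .indices() method translates start, stop, and step into concrete non-negative positions for a sequence of a given length. The sketch below is only an illustration of that translation, and the variable names are assumptions.

```python
text = "Lorem ipsum dolor sit amet."

# Translate [6:-10] into concrete positions for a sequence of length 27.
start, stop, step = slice(6, -10).indices(len(text))
print(start, stop, step)  # 6 17 1
print(stop - start)       # 11, the length of 'ipsum dolor'

# Out-of-range values are clipped to the ends instead of raising an IndexError.
clipped = slice(-999, 999).indices(len(text))
print(clipped)            # (0, 27, 1)
```

With the resolved positions, the "stop minus start" rule for the length of a slice works again.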

By leaving out both start and stop, we take a "full" slice that is essentially a copy of the original str object.

In [69]:

text[:]

Out[69]:

'Lorem ipsum dolor sit amet.'

A step value of i can be used to obtain only every ith character.

In [70]:

text[::2]

Out[70]:

'Lrmismdlrstae.'

A negative step size of -1 reverses the order of the characters.

In [71]:

text[::-1]

Out[71]:

'.tema tis rolod muspi meroL'

Immutability

Whereas elements of a list object may be re-assigned, as briefly hinted at in Chapter 1, this is not allowed for the individual characters of str objects. Once created, they cannot be changed. Formally, we say that str objects are immutable. In that regard, they are like the numeric types in Chapter 5.

In contrast, objects that may be changed after creation are called mutable. We already saw in Chapter 1 how mutable objects are more difficult to reason about for a beginner, in particular, if more than one variable references them. Yet, mutability does have its place in a programmer's toolbox, and we revisit this idea in the next chapters.

The TypeError indicates that str objects are immutable: Assignment to an index or a slice is not supported.

In [72]:

text[0] = "X"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[72], line 1
----> 1 text[0] = "X"

TypeError: 'str' object does not support item assignment

In [73]:

text[:5] = "random"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[73], line 1
----> 1 text[:5] = "random"

TypeError: 'str' object does not support item assignment
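To obtain a "changed" version of a str object, we create a new one instead, for example, by combining slices of the original with the overloaded + operator. The sketch below redefines text with the same value as above to be self-contained, and the names changed and replaced are chosen for this illustration.

```python
text = "Lorem ipsum dolor sit amet."

# Build new str objects from slices instead of assigning to an index or a slice.
changed = "X" + text[1:]
replaced = "random" + text[5:]

print(changed)   # Xorem ipsum dolor sit amet.
print(replaced)  # random ipsum dolor sit amet.
print(text)      # Lorem ipsum dolor sit amet.
```

The original text object is left untouched.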

String Methods

Objects of type str come with many methods bound on them (cf., the documentation for a full list). As seen before, they work like normal functions and are accessed via the dot operator .. Calling a method is also referred to as method invocation.

The .find() method returns the index of the first occurrence of a character or a substring. If no match is found, it returns -1. A mirrored version searching from the right called .rfind() exists as well. The .index() and .rindex() methods work in the same way but raise a ValueError if no match is found. So, we can control if a search fails silently or loudly.

In [74]:

text

Out[74]:

'Lorem ipsum dolor sit amet.'

In [75]:

text.find("a")

Out[75]:

22

In [76]:

text.find("b")

Out[76]:

-1

In [77]:

text.find("dolor")

Out[77]:

12

.find() takes optional start and end arguments that allow us to find occurrences other than the first one.

In [78]:

text.find("o")

Out[78]:

1

In [79]:

text.find("o", 2)

Out[79]:

13

In [80]:

text.find("o", 2, 12)

Out[80]:

-1
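The optional start argument lends itself to collecting all occurrences of a substring in a loop. find_all() below is a hypothetical helper written for this illustration and not a method of the str type. The sketch redefines text with the same value as above so that it runs on its own.

```python
def find_all(haystack, needle):
    """Return the indexes of all (possibly overlapping) occurrences of needle.

    This is a hypothetical helper built on top of the .find() method.
    """
    positions = []
    index = haystack.find(needle)
    while index != -1:  # -1 signals that there are no further matches
        positions.append(index)
        # Continue the search one character after the previous match.
        index = haystack.find(needle, index + 1)
    return positions


text = "Lorem ipsum dolor sit amet."

print(find_all(text, "o"))  # [1, 13, 15]
print(find_all(text, "m"))  # [4, 10, 23]
print(find_all(text, "b"))  # []
```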

The .count() method returns the number of non-overlapping occurrences of a character or a substring.

In [81]:

text

Out[81]:

'Lorem ipsum dolor sit amet.'

In [82]:

text.count("l")

Out[82]:

1

As .count() is case-sensitive, we must chain it with the .lower() method to get the count of all "L"s and "l"s.

In [83]:

text.lower().count("l")

Out[83]:

2

Alternatively, we can use the .upper() method and search for "L"s.

In [84]:

text.upper().count("L")

Out[84]:

2

Because str objects are immutable, .upper() and .lower() return new str objects, even if they do not change the value of the original str object.

In [85]:

example = "random"

In [86]:

id(example)

Out[86]:

140667840152112

In [87]:

lower = example.lower()

In [88]:

id(lower)

Out[88]:

140667764453680

example and lower are different objects with the same value.

In [89]:

example is lower

Out[89]:

False

In [90]:

example == lower

Out[90]:

True

Besides .upper() and .lower() there exist also .title() and .swapcase() methods.

In [91]:

text.lower()

Out[91]:

'lorem ipsum dolor sit amet.'

In [92]:

text.upper()

Out[92]:

'LOREM IPSUM DOLOR SIT AMET.'

In [93]:

text.title()

Out[93]:

'Lorem Ipsum Dolor Sit Amet.'

In [94]:

text.swapcase()

Out[94]:

'lOREM IPSUM DOLOR SIT AMET.'

Another popular string method is .split(): It separates a longer str object into smaller ones collected in a list object. By default, groups of contiguous whitespace characters are used as the separator.

As an example, we use .split() to print out the individual words in text with more whitespace in between them.

In [95]:

text.split()

Out[95]:

['Lorem', 'ipsum', 'dolor', 'sit', 'amet.']

In [96]:

for word in text.split():
    print(word, end="   ")

Lorem   ipsum   dolor   sit   amet.

The opposite of splitting is done with the .join() method. It is typically invoked on a str object that represents a separator (e.g., " " or ", ") and connects the elements provided by an iterable argument (e.g., words below) into one new str object.

In [97]:

words = ["This", "will", "become", "a", "sentence."]

In [98]:

sentence = " ".join(words)

In [99]:

sentence

Out[99]:

'This will become a sentence.'

As the str object "abcde" below is an iterable itself, its characters (!) are joined together with a space " " in between.

In [100]:

" ".join("abcde")

Out[100]:

'a b c d e'
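Two details of .join() are worth a quick experiment: It only accepts str objects as the elements to be joined, and it undoes a .split() if the words were separated by single spaces. The sketch below uses made-up example data, and the variable names are assumptions.

```python
numbers = [1, 2, 3]

# Joining non-str elements fails with a TypeError ...
try:
    ", ".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# ... so we cast the elements with str() first.
numbers_as_text = []
for number in numbers:
    numbers_as_text.append(str(number))
joined = ", ".join(numbers_as_text)

print(join_failed)  # True
print(joined)       # 1, 2, 3

# Splitting on whitespace and joining with single spaces restores the text.
text = "Lorem ipsum dolor sit amet."
round_trip = " ".join(text.split())
print(round_trip == text)  # True
```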

The .replace() method creates a new str object with parts of the original str object potentially replaced.

In [101]:

sentence.replace("will become", "is")

Out[101]:

'This is a sentence.'

Note how sentence itself remains unchanged. Bound to an immutable object, .replace() must create new objects.

In [102]:

sentence

Out[102]:

'This will become a sentence.'

As seen previously, the .strip() method is often helpful in removing unnecessary leading and trailing whitespace from text data that come from unreliable sources like user input. The .lstrip() and .rstrip() methods are specialized versions of it.

In [103]:

"  text with whitespace  ".strip()

Out[103]:

'text with whitespace'

In [104]:

"  text with whitespace  ".lstrip()

Out[104]:

'text with whitespace  '

In [105]:

"  text with whitespace  ".rstrip()

Out[105]:

'  text with whitespace'

When justifying a str object for output, the .ljust() and .rjust() methods may be helpful.

In [106]:

sentence.ljust(40)

Out[106]:

'This will become a sentence.            '

In [107]:

sentence.rjust(40)

Out[107]:

'            This will become a sentence.'

Similarly, the .zfill() method can be used to pad a str representation of a number with leading 0s for justified output.

In [108]:

"42.87".zfill(10)

Out[108]:

'0000042.87'

In [109]:

"-42.87".zfill(10)

Out[109]:

'-000042.87'
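As a small application, the justification methods can be combined to line up columns in plain-text output. The data and the column widths of 10 and 5 in the sketch below are made up for this illustration.

```python
# Hypothetical example data: names of fruits and their quantities.
rows = [("apples", 3), ("kiwis", 12), ("bananas", 107)]

lines = []
for name, amount in rows:
    # Left-justify the text column and right-justify the number column.
    lines.append(name.ljust(10) + str(amount).rjust(5))

for line in lines:
    print(line)

# Every line has the same width of 10 + 5 == 15 characters.
widths = [len(line) for line in lines]
print(widths)  # [15, 15, 15]
```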

String Operations

As mentioned in Chapter 1, the + and * operators are overloaded: With str objects, + performs string concatenation and * repeats a str object a given number of times. Both always create new str objects. That has nothing to do with the str type's immutability, but is the default behavior of operators.

In [110]:

"Hello " + text[:4]

Out[110]:

'Hello Lore'

In [111]:

5 * text[:12] + "..."

Out[111]:

'Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum ...'

String Comparison

The relational operators also work with str objects, another example of operator overloading. Comparison is done one character at a time in a pairwise fashion until the first pair differs or one operand ends. However, str objects are sorted in a "weird" way. For example, all upper case characters come before all lower case characters. The reason for that is given in the "Characters are Numbers with a Convention" sub-section in the second part of this chapter.

In [112]:

"Apple" < "Banana"

Out[112]:

True

In [113]:

"apple" < "Banana"

Out[113]:

False

In [114]:

"apple" < "Banana".lower()

Out[114]:

True

Below is an example with typical German last names that shows how characters other than the first decide the ordering.

In [115]:

"Mai" < "Maier" < "Mayer" < "Meier" < "Meyer"

Out[115]:

True
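The "weird" ordering also shows up when sorting. The sketch below uses the built-in sorted() function with made-up example data: Passing key=str.lower compares the elements as if they were written in lower case, which is one common way to obtain a case-insensitive order. It also illustrates that a str object that is a prefix of another one is the smaller of the two.

```python
fruits = ["apple", "Banana", "cherry"]

# By default, upper case characters come before lower case ones.
default_order = sorted(fruits)
print(default_order)      # ['Banana', 'apple', 'cherry']

# Compare lower-cased versions of the elements instead.
ignore_case_order = sorted(fruits, key=str.lower)
print(ignore_case_order)  # ['apple', 'Banana', 'cherry']

# If one operand ends first, the shorter str object is the smaller one.
prefix_is_smaller = "Mai" < "Maier"
print(prefix_is_smaller)  # True
```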

String Interpolation

Often, we want to use str objects as drafts in the source code that are filled in with concrete text only at runtime. This approach is called string interpolation. There are three ways to do that in Python.

f-strings

Formatted string literals, or f-strings for short, are the most recently added (cf., PEP 498 in 2016) and most readable way: We simply prepend a str in its literal notation with an f, and put variables, or more generally, expressions, within curly braces {}. These are then filled in when the string literal is evaluated.

In [116]:

name = "Alexander"
time_of_day = "morning"

In [117]:

f"Hello {name}! Good {time_of_day}."

Out[117]:

'Hello Alexander! Good morning.'

Separated by a colon :, various formatting options are available. In the beginning, the ability to round numbers for output may be particularly useful: This can be achieved by adding :.2f to the variable name inside the curly braces, which formats the number in fixed-point notation and rounds it to two decimal places. The :.2f is a so-called format specifier, and there exists a whole format specification mini-language to govern how specifiers work.

In [118]:

pi = 3.141592653

In [119]:

f"Pi is {pi:.2f}"

Out[119]:

'Pi is 3.14'
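A few more format specifiers from the mini-language are sketched below. The selection and the example values are assumptions made for this illustration. The specifiers control the number of decimals, the total width, a thousands separator, percentages, and the alignment of text.

```python
pi = 3.141592653

four_decimals = f"{pi:.4f}"    # round to four decimal places
padded = f"{pi:10.2f}"         # two decimals within a total width of 10
thousands = f"{1234567:,}"     # use a comma as the thousands separator
percent = f"{0.256:.1%}"       # multiply by 100 and append a percent sign
left = f"{'left':<8}"          # align text to the left within a width of 8
right = f"{'right':>8}"        # align text to the right within a width of 8

print(repr(four_decimals))  # '3.1416'
print(repr(padded))         # '      3.14'
print(repr(thousands))      # '1,234,567'
print(repr(percent))        # '25.6%'
print(repr(left))           # 'left    '
print(repr(right))          # '   right'
```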

`format()` Method

str objects also provide a .format() method that accepts an arbitrary number of positional arguments that are inserted into the str object in the same order replacing empty curly brackets {}. String interpolation with the .format() method is a more traditional and probably the most common way as of today. While f-strings are the recommended way going forward, usage of the .format() method is likely not declining any time soon.

In [120]:

"Hello {}! Good {}.".format(name, time_of_day)

Out[120]:

'Hello Alexander! Good morning.'

We may use index numbers inside the curly braces if the order is different in the str object.

In [121]:

"Good {1}, {0}".format(name, time_of_day)

Out[121]:

'Good morning, Alexander'

The .format() method may alternatively be used with keyword arguments as well. Then, we must put the keywords' names within the curly brackets.

In [122]:

"Hello {name}! Good {time}.".format(name=name, time=time_of_day)

Out[122]:

'Hello Alexander! Good morning.'

Format specifiers work as in the f-string case.

In [123]:

"Pi is {:.2f}".format(pi)

Out[123]:

'Pi is 3.14'

`%` Operator

The % operator that we saw in the context of modulo division in Chapter 1 is overloaded with string interpolation when its first operand is a str object. The second operand consists of all expressions to be filled in. Format specifiers work with a % instead of curly braces and according to a different set of rules referred to as printf-style string formatting. So, {:.2f} becomes %.2f.

This way of string interpolation is the oldest and originates from the C language. It is still widespread, but we should use one of the other two ways instead. We show it here mainly for completeness' sake.

In [124]:

"Pi is %.2f" % pi

Out[124]:

'Pi is 3.14'

To insert more than one expression, we must list them in order and between parentheses ( and ). As Chapter 7 reveals, this literal syntax creates an object of type tuple. Also, to format an expression as text, we use the format specifier %s.

In [125]:

"Hello %s! Good %s." % (name, time_of_day)

Out[125]:

'Hello Alexander! Good morning.'
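To summarize, the three ways of string interpolation are put side by side in the sketch below. It redefines name and time_of_day with the same values as above so that it runs on its own. The names of the three result variables are chosen for this illustration.

```python
name = "Alexander"
time_of_day = "morning"

# The same text created in three different ways.
with_f_string = f"Hello {name}! Good {time_of_day}."
with_format = "Hello {}! Good {}.".format(name, time_of_day)
with_percent = "Hello %s! Good %s." % (name, time_of_day)

print(with_f_string)                                 # Hello Alexander! Good morning.
print(with_f_string == with_format == with_percent)  # True
```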