I/O: `input`, files, filesystem¶

Yoav Ram¶

User prompt¶

The input function prompts the user and returns what they typed as a string. It works in the notebook, as well as when running scripts in the console.

In [1]:

name = input("What's your name?\n")

print("Hi", name)

What's your name?
Yoav
Hi Yoav

In [2]:

n_icecreams = input("How many icecreams would you like?")
price = input("How much does an icecream cost?")
print("That would be", price * n_icecreams)

How many icecreams would you like?3
How much does an icecream cost?1.5

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-3e01a711260d> in <module>()
      1 n_icecreams = input("How many icecreams would you like?")
      2 price = input("How much does an icecream cost?")
----> 3 print("That would be", price * n_icecreams)

TypeError: can't multiply sequence by non-int of type 'str'

The multiplication fails because input always returns a string. For security reasons, input does not evaluate what the user typed; it is the program's responsibility to convert the string to the desired type:

In [3]:

n_icecreams = int(input("How many icecreams would you like?"))
price = float(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like?3
How much does an icecream cost?1.5
That would be 4.5
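If the user types something that is not a number, int and float raise a ValueError. Below is a minimal sketch of handling that case. The helper name parse_order is hypothetical (it is not part of the notebook), and it receives the two raw strings that input would return, so it can be tried without typing anything:

```python
def parse_order(n_text, price_text):
    """Convert raw input strings to numbers and return the total price.

    Hypothetical helper for illustration; returns None if conversion fails.
    """
    try:
        n_icecreams = int(n_text)      # raises ValueError for '3.5' or 'three'
        price = float(price_text)      # raises ValueError for 'cheap'
    except ValueError:
        return None
    return price * n_icecreams


print(parse_order("3", "1.5"))       # valid input
print(parse_order("three", "1.5"))   # invalid input
```

Returning None is just one possible design; a program could also report the problem and ask again.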

You can use eval to evaluate the input string as a Python expression, but don't do it if you don't trust the user: eval runs whatever code the string contains, which can lead to strange behaviour and side effects.

Let's see what happens when we give valid input (2 and 1.5) and when we give invalid input (2 and [1,2,3]). Try it with eval and with the above code (int and float).

In [4]:

n_icecreams = eval(input("How many icecreams would you like?"))
price = eval(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like?3
How much does an icecream cost?1.5
That would be 4.5
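When only plain values (numbers, strings, lists and so on) are expected, the standard library function ast.literal_eval may be a safer option than eval, because it accepts Python literals and refuses anything else. A minimal sketch, in which the helper name parse_literal is hypothetical:

```python
import ast


def parse_literal(text):
    """Hypothetical helper: evaluate text only if it is a Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None                   # not a literal, or not valid Python


print(parse_literal("1.5"))                         # a float
print(parse_literal("[1, 2, 3]"))                   # a list
print(parse_literal("__import__('os').getcwd()"))   # a function call, rejected
```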

Exercise¶

Ask the user for a number between 1 and 10; if the number is not within that range, let the user know and ask again.

In [ ]:

Files¶

We'll start with simple text files and proceed to more complex formats.
Let's read the list of crop plants located in data/crops.txt (you can also download it from GitHub).

Reading files¶

Whenever we want to work with a file, we first need to open it using the open function.
This function returns an IO object which we can then use for reading or writing.

In [3]:

f = open('../data/crops.txt', 'rt') # rt = read text
print(type(f))

<class '_io.TextIOWrapper'>

In [4]:

crops = f.read()
f.close()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia

Here we pass the open function two arguments:

the path to the file you want to open.
the mode of opening, a combination of letters: r for reading, w for writing, a for appending, t for text, b for binary. The default mode is 'rt'.

read returns all the text from the file as a string.

close then closes the file handle.
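The effect of the t and b mode letters can be seen in a small sketch. It assumes a hypothetical scratch file named example.txt inside a temporary folder, so that it does not depend on crops.txt:

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary folder
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')

    f = open(path, 'wt')             # w = write, t = text
    f.write('Musa textilis\n')
    f.close()

    f = open(path, 'rt')             # text mode: read returns a str
    text = f.read()
    f.close()

    f = open(path, 'rb')             # binary mode: read returns bytes
    raw = f.read()
    f.close()

print(type(text), type(raw))
```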

A more idiomatic way to do this, in which Python takes care of closing the file, is using a context manager:

In [1]:

with open('../data/crops.txt','r') as f:
    crops = f.read()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia

This idiom uses a context manager, and the file handle f is closed when the context manager block ends, even if it ends due to an error.
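To check that the file really is closed after an error, we can look at the closed attribute of the file object. This is a minimal sketch with a simulated error and a hypothetical scratch file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')   # hypothetical scratch file
    try:
        with open(path, 'w') as f:
            f.write('first line\n')
            raise RuntimeError('something went wrong')   # simulated error
    except RuntimeError:
        pass
    closed_after_error = f.closed    # f still names the file object, now closed

print(closed_after_error)
```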

Iterating over files¶

Using a `for` loop¶

We can simply use a for loop to go over all lines in a text file. This is the best practice, and also very simple to use:

In [7]:

with open('../data/crops.txt','r') as f:
    for line in f:
        if line.startswith('Musa'):   # check if line starts with a given string
            print(line.strip())       # strip removes surrounding whitespace, including the newline at the end of the line

Musa balbisiana
Musa spp.
Musa textilis

Reading line by line with `readline`¶

The readline() method allows us to read a single line each time. It works well when combined with a while loop, giving us control of the program flow.

In [1]:

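with open('../data/crops.txt','r') as f:
    line = f.readline()            # read first line
    print(line.strip())
    while line:                    # readline returns '' only at the end of the file
        line = f.readline()        # don't strip here: a blank line would end the loop early
        if line.startswith('Triticum'):
            print(line.strip())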

Abelmoschus caillei
Triticum aestivum
Triticum dicoccum
Triticum durum
Triticum monococcum
Triticum spelta
Triticum turanicum

There are other methods you can use to read files. For example, the readlines() method returns all the lines as a list of strings.
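A short sketch of readlines, using a hypothetical three-line scratch file instead of crops.txt. Note that each string in the list keeps its newline character:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')   # hypothetical scratch file
    with open(path, 'w') as f:
        f.write('Acacia senegal\nAcacia seyal\nMusa spp.\n')

    with open(path, 'r') as f:
        lines = f.readlines()        # list of strings, newline characters kept

print(len(lines))
print(repr(lines[0]))
stripped = [line.strip() for line in lines]   # same lines without the newlines
```

Because readlines loads the whole file into memory, the for loop shown earlier is usually preferable for large files.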

Exercise¶

Print the last line in the file.
Find out how many Garcinia species are in the file (use the startswith() string method).

In [ ]:

Writing to a file¶

To write to a file, we first have to open it for writing. This is done using one of two modes: 'w' or 'a'.

'w', for write, will let you write into the file. If it doesn't exist, it'll be automatically created. If it exists and already has some content, the content will be overwritten.

'a', for append, is very similar, only it will not overwrite, but append your text to the end of an existing file.

Writing is done using print() by adding the argument file = <file object>.

In [11]:

with open(r'tmp.txt','w') as f:
    print('This is the first line', file=f)
    line = 'Another line'
    print(line, file=f)
    msg1 = 'Hello '
    msg2 = 'World!'
    print(msg1 + msg2, file=f)

In [12]:

%less tmp.txt
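The difference between 'w' and 'a' can be checked with a small sketch. It uses a hypothetical scratch file, and it also shows the write method of the file object, which, unlike print, does not add a newline by itself:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes.txt')     # hypothetical scratch file

    with open(path, 'w') as f:
        f.write('first\n')           # write() does not add a newline by itself
    with open(path, 'a') as f:
        print('second', file=f)      # print() adds the newline
    with open(path, 'r') as f:
        after_append = f.read()

    with open(path, 'w') as f:       # opening with 'w' again discards the old content
        print('third', file=f)
    with open(path, 'r') as f:
        after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```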

Temporary files¶

Temporary files can be created using the tempfile module:

In [1]:

import tempfile

In [2]:

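fd, fname = tempfile.mkstemp()   # returns an open OS-level file descriptor and the file path
print("Writing to temp file", fname)
with open(fd, 'w') as f:         # open accepts the descriptor and closes it when the block ends
    print("This is a temporary file", file=f)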

Writing to temp file /var/folders/qn/3hj7mcx56k19b_09n6dymw8h0000gn/T/tmp5mvzb1dr

In [3]:

%less $fname

See other methods in tempfile on how to create temporary directories, named temporary files, etc.
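For instance, tempfile.TemporaryDirectory can be used as a context manager that creates a folder and deletes it, together with its contents, when the block ends. A minimal sketch, in which the file name notes.txt is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:    # folder is a path string
    path = os.path.join(folder, 'notes.txt')     # hypothetical file name
    with open(path, 'w') as f:
        print("This file lives in a temporary folder", file=f)
    existed_inside = os.path.exists(path)

exists_after = os.path.exists(folder)   # the folder and its contents were removed
print(existed_inside, exists_after)
```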

Exercise¶

In the last example we wrote to a temporary file. In this exercise we will copy that file's contents to a new temporary file that has the extension .txt (use the suffix keyword when creating the temporary file). Copy the contents by reading from the existing file and writing to the new file (this is not the most efficient way to do it, but it's just an exercise!). Don't forget to close the files, and print the new temporary filename so that you can check that the writing was successful.

In [11]:

C:\Users\yoavram\AppData\Local\Temp\tmpk6qauup4.txt

Filesystem¶

Python offers plenty of ways to interact with the filesystem through the os and os.path modules.

Let's import os:

In [16]:

import os

Here are some of the capabilities of os:

In [2]:

files = os.listdir()
for fname in files:
    if os.path.isdir(fname):
        print(fname, "is a folder")
    elif os.path.isfile(fname):
        size = os.path.getsize(fname)
        print(fname, "is a file with size", size, "bytes")

.ipynb_checkpoints is a folder
async.ipynb is a file with size 5560 bytes
calculus.ipynb is a file with size 101282 bytes
conda-env.ipynb is a file with size 12529 bytes
csv.ipynb is a file with size 5547 bytes
curve-fitting.ipynb is a file with size 513763 bytes
dictionaries.ipynb is a file with size 17682 bytes
differential-equations.ipynb is a file with size 280735 bytes
DSP.ipynb is a file with size 1894434 bytes
exceptions.ipynb is a file with size 30828 bytes
functions.ipynb is a file with size 17818 bytes
gui.ipynb is a file with size 5995 bytes
idioms.ipynb is a file with size 40620 bytes
if-while.ipynb is a file with size 10170 bytes
image-processing.ipynb is a file with size 2681392 bytes
img is a folder
io.ipynb is a file with size 29051 bytes
iteration.ipynb is a file with size 40219 bytes
linear-algebra.ipynb is a file with size 90527 bytes
matplotlib-aesthetics.ipynb is a file with size 454243 bytes
matplotlib.ipynb is a file with size 155736 bytes
memory-model.ipynb is a file with size 13114 bytes
ML.ipynb is a file with size 148262 bytes
modules.ipynb is a file with size 23036 bytes
notebook-display.ipynb is a file with size 272487 bytes
notebook-magic.ipynb is a file with size 24575 bytes
numpy.ipynb is a file with size 59320 bytes
oop.ipynb is a file with size 77898 bytes
optimization.ipynb is a file with size 550354 bytes
pandas-seaborn.ipynb is a file with size 425639 bytes
probability.ipynb is a file with size 304972 bytes
regexp.ipynb is a file with size 32066 bytes
requests.ipynb is a file with size 100499 bytes
statistics.ipynb is a file with size 666372 bytes
strings-lists-loops.ipynb is a file with size 44695 bytes
types-operators.ipynb is a file with size 28509 bytes

Here's a combination of functions to get the current directory (os.getcwd), change the directory (os.chdir), check if a file exists (os.path.exists), and split a filename from its extension:

In [18]:

curdir = os.getcwd()
os.chdir('../data')
fname = 'crops.txt'
print(fname, 'exists?', os.path.exists(fname))
fname = os.path.splitext('crops.txt')[0] + '.csv'
print(fname, 'exists?', os.path.exists(fname))
os.chdir(curdir)

crops.txt exists? True
crops.csv exists? False

See the os and os.path modules for more functions.
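A few more os.path functions deal with building and taking apart paths. The sketch below only manipulates strings, so the path does not have to exist:

```python
import os

path = os.path.join('..', 'data', 'crops.txt')   # uses the separator of the current OS
folder, fname = os.path.split(path)              # split into folder and file name
root, ext = os.path.splitext(fname)              # split into name and extension

print(folder)
print(fname, root, ext)
```

Building paths with os.path.join rather than by concatenating strings keeps the code portable between Windows and other systems.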

Serializing objects¶

The json module encodes Python objects to text and decodes them back again. It implements JSON (JavaScript Object Notation), a lightweight data interchange format inspired by JavaScript object literal syntax, which is interoperable and widely used outside of the Python ecosystem. The format is also human-readable, so a developer can inspect the data in a file without deserializing it.

We start by importing the module and creating an example data dictionary:

In [1]:

import json

data = { 
    'a_string': 'Hello JSON', 
    'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6), 
    'some_number': 5768.4454,
    'list_as_well': [True, False, 'This', 'That']
} 
data

Out[1]:

{'a_string': 'Hello JSON',
 'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6),
 'list_as_well': [True, False, 'This', 'That'],
 'some_number': 5768.4454}

We dump the dictionary into a string:

In [2]:

data_string = json.dumps(data)
data_string

Out[2]:

'{"a_string": "Hello JSON", "ints_in_a_tuple": [5, 6, 7, 2, 3, 5, 6], "some_number": 5768.4454, "list_as_well": [true, false, "This", "That"]}'
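Decoding goes the other way with json.loads. The sketch below uses a smaller dictionary than the one above (the key flag is hypothetical) and shows that the round trip is not always exact: JSON has no tuple type, so a tuple comes back as a list.

```python
import json

data = {'ints_in_a_tuple': (5, 6, 7), 'some_number': 5768.4454, 'flag': True}
data_string = json.dumps(data)
decoded = json.loads(data_string)    # decode the text back to Python objects

print(decoded)
print(decoded == data)               # False: the tuple came back as a list
```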

If we want to save this to a file, we can either write the string to a file or dump directly to a file:

In [3]:

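fd, fname = tempfile.mkstemp(suffix='.json')   # mktemp is deprecated and unsafe; mkstemp creates the file

with open(fd, 'w') as f:                       # open accepts the descriptor and closes it when done
    json.dump(data, f)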

In [4]:

%less $fname

We can make the file more readable with some configuration:

In [5]:

fd, fname = tempfile.mkstemp(suffix='.json')

with open(fd, 'w') as f:   # open accepts the descriptor and closes it when done
    json.dump(data, f, sort_keys=True, indent=4, separators=(',', ': '))

In [6]:

%less $fname
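To see the effect of these options without opening the file, the same arguments can be passed to json.dumps. A small sketch with a two-key dictionary:

```python
import json

data = {'some_number': 5768.4454, 'a_string': 'Hello JSON'}
pretty = json.dumps(data, sort_keys=True, indent=4, separators=(',', ': '))
print(pretty)
# {
#     "a_string": "Hello JSON",
#     "some_number": 5768.4454
# }
```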

Not everything is supported by json, for example, complex numbers:

In [7]:

json.dumps([1 + 2j, 4 + 5j])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-e209f240b279> in <module>()
----> 1 json.dumps([1 + 2j, 4 + 5j])

/Users/yoavram/miniconda3/envs/Py4Eng/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

/Users/yoavram/miniconda3/envs/Py4Eng/lib/python3.6/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/Users/yoavram/miniconda3/envs/Py4Eng/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/Users/yoavram/miniconda3/envs/Py4Eng/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181 
    182     def encode(self, o):

TypeError: Object of type 'complex' is not JSON serializable

To serialize an unsupported type, we can pass a function as the default argument of json.dumps. It is called for every object that json cannot serialize by itself, and should return a serializable representation of that object:

In [8]:

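def encode_complex(obj):
    if isinstance(obj, complex):
        return {'real': obj.real, 'imag': obj.imag}
    # any other unsupported type should still fail, rather than silently become null
    raise TypeError("Object of type '%s' is not JSON serializable" % type(obj).__name__)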

In [17]:

data = [1 + 2j, 4 + 5j, 5]
dump = json.dumps(data, default=encode_complex)
dump

Out[17]:

'[{"real": 1.0, "imag": 2.0}, {"real": 4.0, "imag": 5.0}, 5]'

And to decode:

In [18]:

def decode_complex(o):
    if 'real' in o and 'imag' in o: # no need for isinstance(o, dict) as o is always dict, see docstring
        return complex(o['real'], o['imag'])
    return o

data2 = json.loads(dump, object_hook=decode_complex)
print(data2, data2 == data)

[(1+2j), (4+5j), 5] True

References¶

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. It can deal with (almost) any Python object, but produces binary rather than text files, and is Python specific. The pickle API is similar to that of json.
The io module provides facilities for dealing with various types of I/O.
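
As a brief illustration of the similarity between the pickle and json APIs, here is a minimal sketch that round-trips a list with complex numbers and a tuple, which json could not do without extra work:

```python
import pickle

data = [1 + 2j, 4 + 5j, (5, 6)]
blob = pickle.dumps(data)            # bytes, not text
restored = pickle.loads(blob)        # only unpickle data that you trust

print(type(blob))
print(restored == data)
```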

Colophon¶

This notebook was written by Yoav Ram and is part of the Python for Engineers course.

The notebook was written using Python 3.6.1. Dependencies listed in environment.yml, full versions in environment_full.yml.

This work is licensed under a CC BY-NC-SA 4.0 International License.
