#!/usr/bin/env python
# coding: utf-8

# # Reading and Writing Data Files with Python

# In order to plot or fit data with Python, you have to get the data into the program. If a program
# makes calculations using data, it can be useful to write the results to a file.

# ## 1. Reading Data Files

# In Python, it is often useful for data to be in arrays. Data can be entered directly into a
# program using the **`array`** function from the numpy library. For instance, the following lines
# assign arrays of numbers to `x`, `y`, and `yerr`.

# In[1]:


from numpy import *
x = array([0.0, 2.0, 4.0, 6.0, 8.0])
y = array([1.1, 1.9, 3.2, 4.0, 5.9])
yerr = array([0.1, 0.2, 0.1, 0.3, 0.3])


# However, this is not a good way to handle large data sets. It is better to store the data in a
# separate file and have the program read the data file. You could use a text editor (*Idle* works
# well, or you can create and edit a file in *CoCalc*) to enter the data above in the form shown
# below. The values of `x`, `y`, and `yerr` (the uncertainty in `y`) for a single data point are
# entered on the same line, separated by spaces or tabs.
#
    # 0.0 1.1 0.1
    # 2.0 1.9 0.2
    # 4.0 3.2 0.1
    # 6.0 4.0 0.3
    # 8.0 5.9 0.3
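#
# The same file can also be created from Python instead of a text editor. The lines below are a
# minimal sketch, assuming the five data points listed above and a file named `input.dat` (the
# name that the later reading examples use). The names `rows`, `x_val`, `y_val`, and `yerr_val`
# are illustrative only.

```python
# Sample data: one (x, y, yerr) tuple per data point.
rows = [
    (0.0, 1.1, 0.1),
    (2.0, 1.9, 0.2),
    (4.0, 3.2, 0.1),
    (6.0, 4.0, 0.3),
    (8.0, 5.9, 0.3),
]

# Write one data point per line, with the three values separated by spaces.
# Opening with mode "w" replaces any existing file of the same name.
with open("input.dat", "w") as f:
    for x_val, y_val, yerr_val in rows:
        f.write(f"{x_val} {y_val} {yerr_val}\n")
```

# Running this in the same directory as the program leaves a plain text file that can be opened
# in any editor to check its contents.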
#
# Suppose that the file is saved as plain text and given the name “`input.dat`”. The **`loadtxt`**
# function from the numpy library can be used to read data from the text file. The following
# example shows how to read the data into an array called `DataIn`.

# In[2]:


from numpy import *
DataIn = loadtxt('input.dat')
print(DataIn)


# Notice that `DataIn` is a single 2-dimensional array, rather than three 1-dimensional arrays.
#
# If you add a line that starts with a number sign (`#`) to the data file, it will be ignored as
# a comment when the file is read. (Blank lines are also ignored.) It is a good idea to put
# explanatory comments at the beginning of data files because you will quickly forget what the
# numbers mean. Giving files descriptive names and keeping good notes about them are also helpful.

# In[3]:


# This line is a comment, even in a data file


# In most cases (plotting, for example), each variable should be in a separate 1-dimensional
# array. Setting the **`unpack`** argument to **`True`** and providing a variable for each column
# accomplishes this.

# In[4]:


from numpy import *
x, y, yerr = loadtxt('input.dat', unpack=True)
print(x)
print(y)
print(yerr)


# If you want to read in only some columns, you can use the **`usecols`** argument to specify
# which ones. Indices in Python start from zero, not one. The line below will read only the
# first and second columns of data, so only two variable names are provided.

# In[5]:


x, y = loadtxt('input.dat', unpack=True, usecols=[0,1])


# Sometimes you will get a file with data separated by commas instead of spaces. For example,
# suppose that the file "`input2.dat`" contains the following time and voltage data from a
# pressure sensor.
#
    # 0.0, 1.1
    # 2.0, 1.9
    # 4.0, 4.2
    # 6.0, 4.0
    # 8.0, 5.9
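#
# To see why commas need special handling, the sketch below feeds the values above to `loadtxt`
# from an in-memory string instead of a file (`loadtxt` also accepts file-like objects). It is
# only an illustration: it uses `import numpy as np` rather than the star import so that it
# stands alone, and the names `text`, `failed`, `t_demo`, and `v_demo` are made up for this example.

```python
import io
import numpy as np

# The comma-separated sample data, held in a string instead of a file.
text = "0.0, 1.1\n2.0, 1.9\n4.0, 4.2\n6.0, 4.0\n8.0, 5.9\n"

# With the default whitespace separator, the first field is read as "0.0,"
# (comma included), which cannot be converted to a number.
try:
    np.loadtxt(io.StringIO(text))
    failed = False
except ValueError:
    failed = True
print("default separator failed:", failed)

# With delimiter=',', each line is split at the comma instead.
t_demo, v_demo = np.loadtxt(io.StringIO(text), delimiter=",", unpack=True)
print(t_demo)
print(v_demo)
```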
#
# The **`delimiter`** argument can be used to make the **`loadtxt`** function recognize commas as
# the separators.

# In[6]:


t, v = loadtxt('input2.dat', delimiter=',', unpack=True)
print(t)
print(v)


# ## 2. Writing Data Files

# The **`savetxt`** function from the numpy library can be used to write data to a text file.
# Suppose that you’ve read two columns of data into the arrays `t` for time and `v` for the
# voltage from a pressure sensor. Also, suppose that the manual for the sensor gives the
# following equation to find the pressure in atmospheres from the voltage reading.

# In[7]:


p = 0.15 + v/10.0


# Recall that this single Python command will calculate an array `p` with the same length as
# the array `v`. Once you’ve calculated the pressures, you might want to write the times and
# pressures to a text file for later use. The following command will write `t` and `p` to the file
# “`output.dat`”. The file will be saved in the same directory as the program. **If you give
# the name of an existing file, it will be overwritten, so be careful!**

# In[8]:


savetxt('output.dat', (t,p))


# Unfortunately, each of the arrays will appear in a different row, which is inconvenient for
# large data sets. The **`column_stack`** function can be used to put each array into a different
# column. Its argument should be a sequence of arrays (the inner pair of parentheses makes it a
# tuple) in the order that you want them to appear.

# In[9]:


savetxt('output.dat', column_stack((t,p)))


# The default is to write the data out separated by spaces, but you can use the optional
# **`delimiter`** argument to specify something else. For example, the following writes
# comma-separated data.

# In[10]:


savetxt('output.dat', column_stack((t,p)), delimiter=',')


# By default, the numbers will be written in scientific notation. The **`fmt`** argument can be
# used to specify the formatting. If one format is supplied, it will be used for all of the
# numbers.
# The form of the formatting string is “`%(width).(precision)(specifier)`”, where `width`
# specifies the minimum total number of characters (shorter numbers are padded with spaces),
# `precision` specifies the number of digits after the decimal point, and the possibilities for
# `specifier` are shown below. For integer formatting, omit the precision, since integers have
# no decimal places. The width is optional.
#
# |Specifier|Meaning|Example Format|Output for -34.5678|
# |-|-|-|-|
# |i|signed integer|%5i|-34|
# |e|scientific notation|%5.4e|-3.4568e+01|
# |f|floating point|%5.2f|-34.57|
#
# A format can also be provided for each column (two in this case) as follows.

# In[11]:


savetxt('output.dat', column_stack((t,p)), fmt=('%3i', '%4.3f'))


# It is a good idea to add comments at the top of data files that you create to remind you
# of what they contain. The optional **`header`** argument allows you to put comments at the top
# of the text file. The **`comments`** argument allows you to pick what precedes the header text.
# If you want the header to be considered a comment when it is read by the `loadtxt` function,
# it should start with a number sign (`#`), which is the default. An example is shown below.

# In[12]:


savetxt('output.dat', column_stack((t,p)), comments='# ', header='t (s) p (atm)')


# If you want a multiple-line header, you can include “`\n`” to force a newline.

# In[13]:


savetxt('output.dat', column_stack((t,p)), comments='# ', header='First line\nSecond line')


# **Remember to be very careful about overwriting existing files with the `savetxt` function!**

# # Additional Documentation

# More information is available at
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.column_stack.html
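
# As a closing check, the pieces above can be combined into a write-then-read round trip. This
# is a minimal sketch, not part of the original examples: the arrays repeat the sample times and
# voltages from `input2.dat`, the file name `roundtrip_demo.dat` is a hypothetical name chosen
# so that `output.dat` is not overwritten, and `import numpy as np` is used so that the block
# stands alone.

```python
import numpy as np

# Sample times (s) and voltages, as in the comma-separated example.
t = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
v = np.array([1.1, 1.9, 4.2, 4.0, 5.9])
p = 0.15 + v / 10.0  # pressure in atmospheres, from the sensor equation

# Write two columns with one format per column and a one-line header.
# The default comments='# ' is placed in front of the header text.
np.savetxt("roundtrip_demo.dat", np.column_stack((t, p)),
           fmt=("%4.1f", "%6.3f"), header="t (s)  p (atm)")

# Read the file back; the header line starts with '#', so loadtxt skips it.
t_back, p_back = np.loadtxt("roundtrip_demo.dat", unpack=True)

# The pressures were rounded to three decimal places when written,
# so compare with a matching tolerance.
print(np.allclose(t_back, t))
print(np.allclose(p_back, p, atol=5e-4))
```

# Choosing a format with fewer digits makes the file easier to read but discards precision, so
# the values read back agree with the originals only to the number of digits written.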