- Use a library function to get a list of filenames that match a simple wildcard pattern.
- Use a for loop to process multiple files.
We now have almost everything we need to process all our data files. The only thing that's missing is a library with a rather unpleasant name:
import glob
The `glob` library's main function, also called `glob`, finds files whose names match a pattern. We provide those patterns as strings: the character `*` matches zero or more characters, while `?` matches exactly one character.
We can use this to get the names of all the Jupyter notebooks in the current directory:
print(glob.glob("*.ipynb"))
['00-Index.ipynb', '01-numpy.ipynb', '02-loop.ipynb', '03-lists.ipynb', '04-files.ipynb', '05-cond.ipynb', '06-func.ipynb', '07-errors.ipynb', '08-defense.ipynb']
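To make the difference between the two wildcards concrete, here is a minimal sketch that is not part of the original lesson. It creates a few empty files with made-up names in a temporary directory (the names and the variables `star_matches` and `question_matches` are purely illustrative) and then matches them with each wildcard.

```python
import glob
import os
import tempfile

# Hypothetical file names, created in a throwaway directory for illustration only.
names = ["star_data_01.csv", "star_data_02.csv", "star_data_10.csv", "notes.txt"]

with tempfile.TemporaryDirectory() as tmpdir:
    for name in names:
        open(os.path.join(tmpdir, name), "w").close()

    # Escape the directory part so only our pattern is treated as a wildcard.
    base = glob.escape(tmpdir)

    # '*' matches zero or more characters: every CSV file matches.
    star_matches = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(base, "*.csv"))
    )

    # '?' matches exactly one character: star_data_10.csv does not match.
    question_matches = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(base, "star_data_0?.csv"))
    )

print(star_matches)
print(question_matches)
```

The results are sorted here because `glob.glob` makes no promise about the order in which it returns names.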
The result of `glob.glob` is a list of strings, which means we can loop over it to do something with each filename in turn. In our case, the "something" we want to do is generate a plot for each file in our star dataset.
Let's test it by analyzing the first four files in the list:
import numpy
import matplotlib.pyplot
%matplotlib inline
# glob returns names in arbitrary order, so sort before taking the first four
filenames = sorted(glob.glob('data/*.csv'))
filenames = filenames[0:4]
for f in filenames:
    print(f)
    data = numpy.loadtxt(fname=f, delimiter=',')  # load in the data
    # calculate the average brightness over all stars (rows)
    ave_brightness = data.mean(axis=0)
    # divide by the average brightness
    processed_data = data / ave_brightness
    # draw the normalized data as an image, then display the figure
    matplotlib.pyplot.imshow(processed_data)
    matplotlib.pyplot.show()
data/star_data_01.csv
data/star_data_02.csv
data/star_data_03.csv
data/star_data_04.csv
The first three datasets all look alike: no stars in these datasets seem to show intrinsic variability. The fourth dataset is different, however. It looks like one star is changing its brightness significantly, producing the odd colored stripe across the image.
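The visual impression can be cross-checked numerically. The sketch below is not part of the lesson's analysis: it uses a small made-up array, assumes that rows are stars and columns are observations (as the comments in the loop suggest), and introduces a hypothetical helper named `most_variable_star`. It applies the same normalization as the loop and then reports the row whose normalized brightness varies the most.

```python
import numpy


def most_variable_star(data):
    """Return (row index, per-row spread) after normalizing the data.

    Hypothetical helper: assumes rows are stars and columns are observations.
    """
    # Same normalization as in the loop: divide each column by its mean over stars.
    processed = data / data.mean(axis=0)
    # Standard deviation of each star's normalized brightness across observations.
    spread = processed.std(axis=1)
    return int(spread.argmax()), spread


# Made-up data: three steady stars and one (row 2) that brightens over time.
fake_data = numpy.array([
    [10.0, 10.0, 10.0, 10.0],
    [20.0, 20.0, 20.0, 20.0],
    [5.0, 10.0, 15.0, 20.0],
    [8.0, 8.0, 8.0, 8.0],
])

row, spread = most_variable_star(fake_data)
print("most variable row:", row)
```

In a sample this small, the variable star shifts the column averages, so the steady stars also pick up some spread after normalization. The variable row should still stand out clearly, which is the same effect as the stripe in the image.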