import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Below are 15 measurements of temperature and pressure. The temperature data is in kelvin, and the pressure data is given to us in units of bar. We are told the measurements are daily, starting on August 1st, 2020. In order to clean up this data, we should turn it into a pandas DataFrame.
# Below are our sample timeseries of data
pressures = [1.011, 1.009, 1.0085, 1.0089, 1.0099, 1.013, 1.014, 1.013, 1.014, 1.018, 1.017, 1.011, 1.006, 1.001, 1.009]
temperatures = [301.5, 301.1, 301.3, 301.3, 301.7, 302.2, 302.3, 302.3, 302.4, 303.1, 302.9, 302.3, 302.1, 301.2, 301.5]
# First, let's declare all our data as numpy arrays
temperature_data = np.array(temperatures)
pressure_data = np.array(pressures)
# Next, let's join our two numpy arrays together, so instead of two 15-element arrays we have one 2x15 array
t_p_arrays = np.array([temperature_data, pressure_data])
# Here we will create the Pandas DataFrame
# To declare a DataFrame we will pass the combined numpy array to pd.DataFrame()
t_p_dataframe = pd.DataFrame(t_p_arrays)
print(t_p_dataframe)
Because we passed the numpy array t_p_arrays to pd.DataFrame(), our current DataFrame has two rows: row 0 holds temperature and row 1 holds pressure. The 15 columns (labeled 0 through 14, since the index starts at zero) correspond to the daily readings. Let's change our DataFrame to one in which the columns are temperature and pressure and the rows are the daily readings!
To do this we have to pass the pd.DataFrame() function an array where the rows and columns are switched. An easy way to do this is to take the transpose of t_p_arrays before passing it to pd.DataFrame().
# Declare a second dataframe using the transpose of t_p_arrays
t_p_arrays_transpose = np.transpose(t_p_arrays) # This line swaps the rows and columns of t_p_arrays
t_p_dataframe_transpose = pd.DataFrame(t_p_arrays_transpose)
print(t_p_dataframe_transpose)
Nice! Now we have a DataFrame where the leftmost column of the printout is our index, column 0 is our temperature in kelvin, and column 1 is our pressure in bar! In our next section we will work on making this data more readable.
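As an aside, the transpose step can be avoided altogether. The sketch below is one possible alternative, not the approach this notebook follows: it builds the same table from a dictionary, where each key becomes a column name. The names t_p_from_dict and t_p_from_transpose, and the column labels 'Temp' and 'Pressure', are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample values as above
pressures = [1.011, 1.009, 1.0085, 1.0089, 1.0099, 1.013, 1.014, 1.013,
             1.014, 1.018, 1.017, 1.011, 1.006, 1.001, 1.009]
temperatures = [301.5, 301.1, 301.3, 301.3, 301.7, 302.2, 302.3, 302.3,
                302.4, 303.1, 302.9, 302.3, 302.1, 301.2, 301.5]

# Each dictionary key becomes a column name and each list becomes that column
t_p_from_dict = pd.DataFrame({"Temp": temperatures, "Pressure": pressures})

# DataFrame.T is a shortcut for swapping rows and columns after construction
t_p_from_transpose = pd.DataFrame(np.array([temperatures, pressures])).T

print(t_p_from_dict.shape)  # 15 rows (days) by 2 columns (variables)
```

The dictionary route also names the columns in the same step, which the array route needs a separate columns= argument for.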
Under this objective, we want to learn how to subset data from the DataFrame for easier interpretation. We will start by becoming familiar with how to select one row or column from a pandas DataFrame.
# Let's select just the column of our DataFrame that corresponds to temperature
temperatures = t_p_dataframe_transpose[0] # Temperature uses index zero because temps are in the first column
print(temperatures)
Perfect! Subsetting a column in pandas returns a pd.Series object, which is like the pandas equivalent of a one-dimensional numpy array, with an index attached to the values.
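To make the comparison with numpy arrays concrete, here is a small sketch using only the first three sample temperatures. The variable names temps, values, labels, and mean_temp are illustrative.

```python
import numpy as np
import pandas as pd

# First three sample temperatures (kelvin), wrapped in a named Series
temps = pd.Series([301.5, 301.1, 301.3], name="Temp")

values = temps.to_numpy()  # the underlying values as a plain numpy array
labels = temps.index       # the row labels that a numpy array does not carry
mean_temp = temps.mean()   # numpy-style reductions work directly on a Series

print(type(values))
print(list(labels))
```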
# If we just wanted pressure and temperature from the 10th day then we should locate the row indexed with a 9
tenth_day_temp_and_pressure = t_p_dataframe_transpose.iloc[9]
print(tenth_day_temp_and_pressure)
# Let's index each of the readings by day
dates = pd.date_range("20200801", periods=15) # The start date is 20200801 for August 1st 2020
temp_pres_df = pd.DataFrame(t_p_arrays_transpose, index=dates, columns=['Temp', 'Pressure'])
# Now change our temperatures to Celsius and the pressures to hPa
temp_pres_df['Temp'] = temp_pres_df['Temp'] - 273.15 # Convert kelvin to Celsius by subtracting 273.15
temp_pres_df['Pressure'] = temp_pres_df['Pressure']*1000 # 1 bar = 1000 hPa
display(temp_pres_df) # Use the display() function instead of print for a fancier output!
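With a date index in place, rows can be selected by label as well as by position. The sketch below rebuilds a small DataFrame from the sample data (kept in kelvin and bar for brevity) and compares .iloc with .loc. The names df, tenth_by_position, tenth_by_label, and first_week are illustrative.

```python
import pandas as pd

pressures = [1.011, 1.009, 1.0085, 1.0089, 1.0099, 1.013, 1.014, 1.013,
             1.014, 1.018, 1.017, 1.011, 1.006, 1.001, 1.009]
temperatures = [301.5, 301.1, 301.3, 301.3, 301.7, 302.2, 302.3, 302.3,
                302.4, 303.1, 302.9, 302.3, 302.1, 301.2, 301.5]
dates = pd.date_range("20200801", periods=15)
df = pd.DataFrame({"Temp": temperatures, "Pressure": pressures}, index=dates)

# Position-based: the tenth row sits at integer position 9
tenth_by_position = df.iloc[9]

# Label-based: the same row, looked up by its date
tenth_by_label = df.loc[pd.Timestamp("2020-08-10")]

# Label slices include both endpoints, so this covers seven days
first_week = df.loc["2020-08-01":"2020-08-07"]

print(tenth_by_label)
print(len(first_week))
```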
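Suppose we only want the days on which the pressure was above 1010 hPa. This can be done easily with our pandas DataFrame by indexing it with a boolean condition!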
temp_pres_above_1010_df = temp_pres_df[temp_pres_df['Pressure'] > 1010.0]
display(temp_pres_above_1010_df)
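The filter works because the comparison produces a Series of True/False values, one per row, and indexing with it keeps only the True rows. The sketch below shows this on the first five days of the converted data, and combines two conditions with &. The names sample, mask, high_pressure, and warm_and_high, and the thresholds 1009.0 and 28.5, are chosen for illustration.

```python
import pandas as pd

# First five days of the sample data, already in Celsius and hPa
dates = pd.date_range("20200801", periods=5)
sample = pd.DataFrame(
    {"Temp": [28.35, 27.95, 28.15, 28.15, 28.55],
     "Pressure": [1011.0, 1009.0, 1008.5, 1008.9, 1009.9]},
    index=dates,
)

# The comparison yields one boolean per row
mask = sample["Pressure"] > 1009.0

# Indexing with the mask keeps only the rows where it is True
high_pressure = sample[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
warm_and_high = sample[(sample["Pressure"] > 1009.0) & (sample["Temp"] > 28.5)]

print(mask.tolist())
print(warm_and_high)
```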
We could plot the temperature and pressure data by subsetting the columns as we did above and passing them to matplotlib. Instead, we will try the plot functions built into pandas!
# Use the pandas .plot() function on our DataFrame
temp_pres_df.plot()
Hurray! The .plot() function automatically uses the column names and date indices that we specified to create a plot of pressure and temperature. Unfortunately, both variables share one y-axis, and the pressures (around 1010 hPa) are so much larger than the temperatures (around 28 to 30 °C) that the day-to-day changes are flattened out. Let's give the temperature and the pressure their own plots.
temp_pres_df.plot(subplots=True, figsize=(12, 6))
Finally, let's add some labels and a title before finishing with our plot!
ax = temp_pres_df.plot(subplots=True, figsize=(12, 6))
plt.suptitle('Daily Temperature and Pressure from August 1st - August 15th 2020') # plt.suptitle means 'super title'
ax[0].set_ylabel(r'Temperature ($\degree$C)') # ax[0] is the first plot; the raw string (r'...') keeps the backslash intact
ax[1].set_xlabel('Days') # ax[1] is used because it's the second plot
ax[1].set_ylabel('Pressure ($hPa$)')
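Separate subplots are one way to handle the mismatched scales. Another option, sketched below as an alternative rather than as this notebook's method, is to keep a single plot and put pressure on a second y-axis with the secondary_y argument of .plot(). The sketch uses the first five days of the converted data, and the name sample is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five days of the sample data, already in Celsius and hPa
dates = pd.date_range("20200801", periods=5)
sample = pd.DataFrame(
    {"Temp": [28.35, 27.95, 28.15, 28.15, 28.55],
     "Pressure": [1011.0, 1009.0, 1008.5, 1008.9, 1009.9]},
    index=dates,
)

# Columns listed in secondary_y are drawn against a right-hand y-axis
ax = sample.plot(secondary_y=["Pressure"], figsize=(12, 4))
ax.set_ylabel("Temperature (°C)")          # left axis
ax.right_ax.set_ylabel("Pressure (hPa)")   # right axis added by pandas
ax.set_title("Temperature and pressure on separate y-axes")
```

This keeps both curves on one set of dates, at the cost of a plot that takes a little more care to read than two stacked subplots.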