Here's how I used pandas to rearrange and clean up the raw tide data from NOAA.

In [1]:
import pandas as pd


Read the data via pandas' flexible read_table function. This returns a DataFrame:

In [2]:
d = pd.read_table('BatteryParkTideData_Cleaned.txt', sep='\s+', parse_dates=[[1, 2]])

In [3]:
d.head()

Out[3]:
             Date_Time  ?Station  Pred6  Backup  Acoustc
0  2012-10-29 00:00:00   8518750   1.59    4.68     4.65
1  2012-10-29 00:06:00   8518750   1.50    4.55     4.54
2  2012-10-29 00:12:00   8518750   1.40    4.46     4.44
3  2012-10-29 00:18:00   8518750   1.31    4.36     4.33
4  2012-10-29 00:24:00   8518750   1.22    4.28     4.26

In the data I output I don't want dates, I just want the time since the first measurement, which is a simple, easy to work with floating point number. So next I add a new column to d that's the difference between each timestamp and the first timestamp:

In [6]:
d['TimeOffset'] = d['Date_Time'] - d['Date_Time'][0]

In [7]:
d.head()

Out[7]:
             Date_Time  ?Station  Pred6  Backup  Acoustc TimeOffset
0  2012-10-29 00:00:00   8518750   1.59    4.68     4.65    0:00:00
1  2012-10-29 00:06:00   8518750   1.50    4.55     4.54    0:06:00
2  2012-10-29 00:12:00   8518750   1.40    4.46     4.44    0:12:00
3  2012-10-29 00:18:00   8518750   1.31    4.36     4.33    0:18:00
4  2012-10-29 00:24:00   8518750   1.22    4.28     4.26    0:24:00

Now I have the time since the first measurement, but it's still in a datetime format, specifically a timedelta. To convert that to hours I use the total_seconds methods and divide by 3600. I add that to d as another new column:

In [14]:
d['TimeOffsetHours'] = pd.Series(to.total_seconds() / 3600. for to in d['TimeOffset'])


Finally I write a CSV using the to_csv method. I'm writing only the time since first measurement, predicted measurement, and measurement columns:

In [15]:
d.to_csv('BatteryParkTideData.csv', na_rep='NA', cols=['TimeOffsetHours', 'Pred6', 'Backup', 'Acoustc'], index=False)

In [ ]: