Times and Dates

Time is an essential component of nearly all geoscience data. Timescales span many orders of magnitude: microseconds for lightning, hours for a supercell thunderstorm, days for a global weather model, and millennia and beyond for the earth's climate. To properly analyze geoscience data, you must have a firm understanding of how to handle time in Python. In this notebook, we will examine the Python Standard Library for handling dates and times. We will also briefly make use of the pytz module to handle some thorny time zone issues in Python.

Time Versus Datetime Modules and Some Core Concepts

Python comes with time and datetime modules. Unfortunately, Python can be initially disorienting because of the heavily overlapping terminology concerning dates and times:

  • datetime module has a datetime class
  • datetime module has a time class
  • datetime module has a date class
  • time module has a time function which returns (almost always) Unix time
  • datetime class has a date method which returns a date object
  • datetime class has a time method which returns a time object

This confusion can be partially alleviated by aliasing our imported modules:

In [1]:
import datetime as dt
# we can now reference the datetime module (aliased to 'dt') and the
# datetime class unambiguously
pisecond = dt.datetime(2016, 3, 14, 15, 9, 26)
print(pisecond)
2016-03-14 15:09:26
In [2]:
import time as tm
now = tm.time()
print(now)
1467053657.6353862
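
To make the overlapping names concrete, here is a minimal sketch that reuses the pisecond value from above and calls the date() and time() methods of the datetime class. The variable names day_part and clock_part are illustrative only.

```python
import datetime as dt

pisecond = dt.datetime(2016, 3, 14, 15, 9, 26)

# The date() and time() methods split a datetime object into its two parts.
day_part = pisecond.date()    # an object of the dt.date class
clock_part = pisecond.time()  # an object of the dt.time class

print(type(day_part).__name__, day_part)      # date 2016-03-14
print(type(clock_part).__name__, clock_part)  # time 15:09:26
```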

time module

The time module is well-suited for measuring Unix time. For example, when you are calculating how long it takes a Python function to run (so-called "benchmarking"), you can employ the time() function from the time module to obtain Unix time before and after the function completes and take the difference of those two times.

In [3]:
import time as tm
start = tm.time()
tm.sleep(1)  # sleep() pauses the program for the given number of seconds
end = tm.time()
diff = end - start
print("The benchmark took {} seconds".format(diff))
The benchmark took 1.0026235580444336 seconds

(For more accurate benchmarking, see the timeit module.)
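
As a rough sketch of what that might look like, the timeit.timeit() function runs a callable many times and reports the total elapsed time. The work() function below is a hypothetical stand-in for whatever you want to benchmark, and the repetition count of 1000 is an arbitrary choice.

```python
import timeit


def work():
    """Hypothetical stand-in for the function being benchmarked."""
    return sum(range(1000))


# timeit.timeit() calls work() `number` times and returns the total
# elapsed time in seconds as a float.
runs = 1000
total = timeit.timeit(work, number=runs)
print("Average time per call: {:.2e} seconds".format(total / runs))
```

Averaging over many runs smooths out the noise that a single before-and-after measurement would include.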

datetime module

The datetime module handles time with the Gregorian calendar (the calendar we are all familiar with) and is independent of Unix time. The datetime module has an object-oriented approach with the date, time, datetime, timedelta, and tzinfo classes.

  • date class represents the day, month and year
  • time class represents the time of day
  • datetime class is a combination of the date and time classes
  • timedelta class represents a time duration
  • tzinfo (abstract) class represents time zones
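
The following short sketch shows how the first four classes fit together. The values are arbitrary and chosen only for illustration.

```python
import datetime as dt

day = dt.date(2016, 6, 1)                 # year, month, day
clock = dt.time(4, 38)                    # hour, minute
moment = dt.datetime.combine(day, clock)  # date + time -> datetime
step = dt.timedelta(hours=6)              # a duration

print(moment)         # 2016-06-01 04:38:00
print(moment + step)  # 2016-06-01 10:38:00
```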

The datetime module is effective for:

  • performing date and time arithmetic and calculating time duration
  • reading and writing date and time strings in a particular format
  • handling time zones (with the help of third-party libraries)

The time and datetime modules overlap in functionality, but in your geoscientific work, you will probably be using the datetime module more than the time module.

What is Unix Time?

Unix time is an example of system time, which is the computer's notion of passing time. It is measured in seconds from the start of the epoch, which is January 1, 1970 00:00 UTC. It is represented "under the hood" as a floating point number, which is how computers approximate real (ℝ) numbers.
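
Although the two modules are separate, the datetime class can convert to and from Unix time. The sketch below uses the standard library's dt.timezone.utc object so that the results do not depend on the computer's local time zone. The stamp value is the whole-second part of the Unix time printed earlier.

```python
import datetime as dt

# Unix time 0 is the start of the epoch.
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00

# The whole-second part of the Unix time printed by tm.time() earlier.
stamp = 1467053657
moment = dt.datetime.fromtimestamp(stamp, tz=dt.timezone.utc)
print(moment)  # 2016-06-27 18:54:17+00:00

# timestamp() goes the other way, from a datetime back to Unix time.
print(moment.timestamp())  # 1467053657.0
```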

The Thirty Second Introduction to Object-Oriented Programming

We have been talking about object-oriented (OO) programming by mentioning terms like "class", "object", and "method", but never really explaining what they mean. A class is a collection of related variables (similar to a struct in the C programming language or even a tuple in Python) coupled with functions, or "methods" in OO parlance, that can act on those variables. An object is a concrete instance of a class.

For example, if you have a Coord class that represents an earth location with latitude and longitude, you may have a method that returns the distance between two locations, distancekm() in this example.

In [4]:
import math
class Coord:
    """Earth location identified by (latitude, longitude) coordinates.
    distancekm  -- distance between two points in kilometers
    """

    def __init__(self, latitude=0.0, longitude=0.0):
        self.lat = latitude
        self.lon = longitude

    def distancekm(self, p):
        """Distance between two points in kilometers."""
        DEGREES_TO_RADIANS = math.pi / 180.0
        EARTH_RADIUS = 6373  # in KMs
        phi1 = (90.0 - self.lat) * DEGREES_TO_RADIANS
        phi2 = (90.0 - p.lat) * DEGREES_TO_RADIANS
        theta1 = self.lon * DEGREES_TO_RADIANS
        theta2 = p.lon * DEGREES_TO_RADIANS
        cos = (math.sin(phi1) * math.sin(phi2) *
               math.cos(theta1 - theta2) + math.cos(phi1) * math.cos(phi2))
        arc = math.acos(cos)
        return arc * EARTH_RADIUS

To create a concrete example of a class, also known as an object, initialize the object with data:

In [5]:
timbuktu = Coord(16.77, -3.00)  # west longitudes are negative

Here, timbuktu is an object of the class Coord initialized with a latitude of 16.77 and a longitude of -3.00 (that is, 3.00 degrees west). Next, we create two Coord objects: ny and paris. We will invoke the distancekm() method on the ny object and pass the paris object as an argument to determine the distance between New York and Paris in kilometers.

In [6]:
ny = Coord(40.71, -74.01)  # west longitudes are negative
paris = Coord(48.86, 2.35)
distance = ny.distancekm(paris)
print("The distance from New York to Paris is {:.1f} kilometers.".format(
    distance))
The distance from New York to Paris is 5839.2 kilometers.

The old joke about OO programming is that they simply moved the struct that the function takes as an argument and put it first because it is special. So instead of having distancekm(ny, paris), you have ny.distancekm(paris). We have not talked about inheritance or polymorphism but that is OO in a nutshell.
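
The joke can be demonstrated with the date class itself. In this small sketch, the same method is called in the usual OO style and then looked up on the class with the object passed explicitly as the first argument. The variable names are illustrative only.

```python
import datetime as dt

day = dt.date(2016, 3, 14)

# The usual OO spelling: the object comes first, then the method.
as_method = day.isoformat()

# The same method looked up on the class, with the object passed
# explicitly as the first argument.
as_function = dt.date.isoformat(day)

print(as_method, as_function, as_method == as_function)
```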

Reading and Writing Dates and Times

Parsing Lightning Data Timestamps with the datetime.strptime Method

Suppose you want to analyze US NLDN lightning data. Here is a sample row of data:

06/27/07 16:18:21.898 18.739 -88.184 0.0 kA 0 1.0 0.4 2.5 8 1.2 13 G

Part of the task involves parsing the 06/27/07 16:18:21.898 time string into a datetime object. (The data are described in full here.) To parse this string, or others that follow the same format, you will employ the strptime() method of the datetime class. This method takes two arguments: the first is the date time string you wish to parse, and the second is the format, which describes exactly how the date and time are arranged. The full range of format options is described in the Python documentation. In practice, the format will take some experimentation to get right. This is a situation where Python shines, as you can quickly try out different solutions in the IPython interpreter. Beyond the official documentation, Google and Stack Overflow are your friends in this process. Eventually, after some trial and error, you will find that the '%m/%d/%y %H:%M:%S.%f' format properly parses the date and time.

In [7]:
import datetime as dt
strike_time = dt.datetime.strptime('06/27/07 16:18:21.898',
                                   '%m/%d/%y %H:%M:%S.%f')
# print strike_time to see if we have properly parsed our time
print(strike_time)
2007-06-27 16:18:21.898000
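
Going one step further, a whole row could be handled along the following lines. This is only a sketch: it assumes that the fields are separated by whitespace and that the third and fourth fields are latitude and longitude in degrees, which you should confirm against the data description.

```python
import datetime as dt

row = "06/27/07 16:18:21.898 18.739 -88.184 0.0 kA 0 1.0 0.4 2.5 8 1.2 13 G"
fields = row.split()

# The first two whitespace-separated fields are the date and the time.
strike_time = dt.datetime.strptime(fields[0] + " " + fields[1],
                                   "%m/%d/%y %H:%M:%S.%f")

# Assumption: the next two fields are latitude and longitude in degrees.
latitude = float(fields[2])
longitude = float(fields[3])

print(strike_time, latitude, longitude)
```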

Retrieving METAR from the MesoWest API with Help from the datetime.strftime Method

Let's say you are interested in obtaining METAR data from the Aleutian Islands with the MesoWest API. In order to retrieve these data, you will have to assemble a URL that abides by the MesoWest API reference, and specifically create date time strings that the API understands (e.g., 201606010000 for the year, month, day, hour and minute). For example, typing the following URL in a web browser will return a human-readable nested data structure called a JSON object, which will contain the data along with additional "metadata" to help you interpret the data (e.g., units). Here, we are asking for air temperature information from the METAR station at Eareckson Air Force Base (ICAO identifier "PASY") in the Aleutians from June 1, 2016, 00:00 UTC to June 1, 2016, 06:00 UTC.

http://api.mesowest.net/v2/stations/timeseries?stid=pasy&start=201606010000&end=201606010600&vars=air_temp&token=demotoken

{
  "SUMMARY": {
    "FUNCTION_USED": "time_data_parser",
    "NUMBER_OF_OBJECTS": 1,
    "TOTAL_DATA_TIME": "5.50103187561 ms",
    "DATA_PARSING_TIME": "0.313997268677 ms",
    "METADATA_RESPONSE_TIME": "97.2690582275 ms",
    "RESPONSE_MESSAGE": "OK",
    "RESPONSE_CODE": 1,
    "DATA_QUERY_TIME": "5.18608093262 ms"
  },
  "STATION": [
    {
      "ID": "12638",
      "TIMEZONE": "America/Adak",
      "LATITUDE": "52.71667",
      "OBSERVATIONS": {
        "air_temp_set_1": [
          8.3,
          8.0,
          8.3,
          8.0,
          7.8,
          7.8,
          7.0,
          7.2,
          7.2
        ],
        "date_time": [
          "2016-06-01T00:56:00Z",
          "2016-06-01T01:26:00Z",
          "2016-06-01T01:56:00Z",
          "2016-06-01T02:40:00Z",
          "2016-06-01T02:56:00Z",
          "2016-06-01T03:56:00Z",
          "2016-06-01T04:45:00Z",
          "2016-06-01T04:56:00Z",
          "2016-06-01T05:56:00Z"
        ]
      },
      "STATE": "AK",
      "LONGITUDE": "174.11667",
      "SENSOR_VARIABLES": {
        "air_temp": {
          "air_temp_set_1": {
            "end": "",
            "start": ""
          }
        },
        "date_time": {
          "date_time": {}
        }
      },
      "STID": "PASY",
      "NAME": "Shemya, Eareckson AFB",
      "ELEVATION": "98",
      "PERIOD_OF_RECORD": {
        "end": "",
        "start": ""
      },
      "MNET_ID": "1",
      "STATUS": "ACTIVE"
    }
  ],
  "UNITS": {
    "air_temp": "Celsius"
  }
}
// GET http://api.mesowest.net/v2/stations/timeseries?stid=pasy&start=201606010000&end=201606010600&vars=air_temp&token=demotoken
// HTTP/1.1 200 OK
// Content-Type: application/json
// Date: Mon, 27 Jun 2016 18:17:08 GMT
// Server: nginx/1.4.6 (Ubuntu)
// Vary: Accept-Encoding
// Content-Length: 944
// Connection: keep-alive
// Request duration: 0.271790s

Continuing with this example, let's create a function that takes a station identifier, start and end time, a meteorological field and returns the JSON object as a Python dictionary data structure. We will draw upon our knowledge from the Basic Input and Output notebook to properly construct the URL. In addition, we will employ the urllib.request module for opening and reading URLs.

But first, we must figure out how to properly format our date with the datetime.strftime() method. This method takes a format string built from the same directives we employed for strptime(). After some trial and error from the IPython interpreter, we arrive at '%Y%m%d%H%M':

In [8]:
import datetime as dt
print(dt.datetime(2016, 6, 1, 0, 0).strftime('%Y%m%d%H%M'))
201606010000

Armed with this knowledge of how to format the date and time according to the MesoWest API reference, we can write our metar() function:

In [9]:
import urllib.request
import json  # json module to help us with the HTTP response
def metar(icao, starttime, endtime, var):
    """
    Retrieves METAR with the icao identifier, the starttime and endtime
    datetime objects and the var atmospheric field (e.g., "air_temp").
    Returns a dictionary data structure that mirrors the JSON object
    returned from the MesoWest API.
    """
    fmt = '%Y%m%d%H%M'
    st = starttime.strftime(fmt)
    et = endtime.strftime(fmt)
    url = "http://api.mesowest.net/v2/stations/timeseries?"\
        "stid={}&start={}&end={}&vars={}&token=demotoken"
    reply = urllib.request.urlopen(url.format(icao, st, et, var))
    return json.loads(reply.read().decode('utf8'))
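
To check the date formatting without contacting the server, the URL can be assembled on its own. The sketch below is one possible way to do that with urllib.parse.urlencode(). The metar_url() helper is hypothetical and is not part of the MesoWest API or of the function above.

```python
import datetime as dt
import urllib.parse


def metar_url(icao, starttime, endtime, var, token="demotoken"):
    """Build the request URL only (hypothetical helper, no network access)."""
    fmt = "%Y%m%d%H%M"
    query = urllib.parse.urlencode({
        "stid": icao,
        "start": starttime.strftime(fmt),
        "end": endtime.strftime(fmt),
        "vars": var,
        "token": token,
    })
    return "http://api.mesowest.net/v2/stations/timeseries?" + query


url = metar_url("pasy", dt.datetime(2016, 6, 1, 0, 0),
                dt.datetime(2016, 6, 1, 6, 0), "air_temp")
print(url)
```

Printing the URL before fetching it makes it easy to compare the start and end strings against the example URL shown earlier.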

We can now try out our new metar function to fetch some air temperature data.

In [10]:
import datetime as dt
pasy = metar("pasy", dt.datetime(2016, 6, 1, 0, 0),
             dt.datetime(2016, 6, 1, 6, 0), "air_temp")
print(pasy)
{'SUMMARY': {'DATA_QUERY_TIME': '5.33509254456 ms', 'METADATA_RESPONSE_TIME': '52.8740882874 ms', 'DATA_PARSING_TIME': '0.19907951355 ms', 'TOTAL_DATA_TIME': '5.53607940674 ms', 'NUMBER_OF_OBJECTS': 1, 'FUNCTION_USED': 'time_data_parser', 'RESPONSE_CODE': 1, 'RESPONSE_MESSAGE': 'OK'}, 'UNITS': {'air_temp': 'Celsius'}, 'STATION': [{'MNET_ID': '1', 'STID': 'PASY', 'STATUS': 'ACTIVE', 'ID': '12638', 'OBSERVATIONS': {'date_time': ['2016-06-01T00:56:00Z', '2016-06-01T01:26:00Z', '2016-06-01T01:56:00Z', '2016-06-01T02:40:00Z', '2016-06-01T02:56:00Z', '2016-06-01T03:56:00Z', '2016-06-01T04:45:00Z', '2016-06-01T04:56:00Z', '2016-06-01T05:56:00Z'], 'air_temp_set_1': [8.3, 8.0, 8.3, 8.0, 7.8, 7.8, 7.0, 7.2, 7.2]}, 'LATITUDE': '52.71667', 'ELEVATION': '98', 'SENSOR_VARIABLES': {'air_temp': {'air_temp_set_1': {'end': '', 'start': ''}}, 'date_time': {'date_time': {}}}, 'LONGITUDE': '174.11667', 'NAME': 'Shemya, Eareckson AFB', 'STATE': 'AK', 'PERIOD_OF_RECORD': {'end': '', 'start': ''}, 'TIMEZONE': 'America/Adak'}]}

The data are returned in a nested data structure composed of dictionaries and lists. We can pull that data structure apart to fetch our data. Also, observe that the times are returned in UTC according to the ISO 8601 international time standard.

In [11]:
print(pasy['STATION'][0]['OBSERVATIONS'])
{'date_time': ['2016-06-01T00:56:00Z', '2016-06-01T01:26:00Z', '2016-06-01T01:56:00Z', '2016-06-01T02:40:00Z', '2016-06-01T02:56:00Z', '2016-06-01T03:56:00Z', '2016-06-01T04:45:00Z', '2016-06-01T04:56:00Z', '2016-06-01T05:56:00Z'], 'air_temp_set_1': [8.3, 8.0, 8.3, 8.0, 7.8, 7.8, 7.0, 7.2, 7.2]}

We could continue with this exercise by parsing the returned date time strings into datetime objects, but we will leave that exercise to the reader.
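
As a starting point for that exercise, here is one possible sketch. It copies the first three date time strings from the output above rather than calling the API, matches the trailing "Z" literally, and then marks the result as UTC with the standard library's dt.timezone.utc object.

```python
import datetime as dt

date_time = ["2016-06-01T00:56:00Z", "2016-06-01T01:26:00Z",
             "2016-06-01T01:56:00Z"]

# The trailing "Z" is matched literally; it denotes UTC, so we attach
# the standard library's UTC tzinfo object after parsing.
obs_times = [
    dt.datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=dt.timezone.utc)
    for s in date_time
]

print(obs_times[0])                 # 2016-06-01 00:56:00+00:00
print(obs_times[1] - obs_times[0])  # 0:30:00
```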

Calculating Coastal Tides with the timedelta Class

Let's suppose we are looking at coastal tide and current data, perhaps in a tropical cyclone storm surge scenario. The lunar day is 24 hours, 50 minutes, with two low tides and two high tides in that time. If we know the time of the current high tide, we can easily calculate the occurrence of the next low and high tides with the timedelta class. (In reality, the exact time of tides is influenced by local coastal effects, in addition to the laws of celestial mechanics, but we will ignore that fact for this exercise.)

A timedelta object is initialized by supplying a time duration, usually with keyword arguments that clearly express the length of time. Significantly, you can use timedelta objects with arithmetic operators (i.e., +, -, *, /) to obtain new dates and times, as the next code sample illustrates. This convenient language feature is known as operator overloading and again illustrates Python's batteries-included philosophy of making life easier for the programmer. (In another language such as Java, you would have to call a method, significantly obfuscating the code.) Another great feature is that the difference of two times will yield a timedelta object. Let's examine all these features in the following code block.

In [12]:
import datetime as dt
high_tide = dt.datetime(2016, 6, 1, 4, 38, 0)
lunar_day = dt.timedelta(hours=24, minutes=50)
tide_duration = lunar_day / 4
next_low_tide = high_tide + tide_duration
next_high_tide = high_tide + (2 * tide_duration)
tide_length = next_high_tide - high_tide
print("The time between high and low tide is {}.".format(tide_duration))
print("The current high tide is {}.".format(high_tide))
print("The next low tide is {}.".format(next_low_tide))
print("The next high tide is {}.".format(next_high_tide))
print("The tide length is {}.".format(tide_length))
print("The type of the 'tide_length' variable is {}.".format(type(
    tide_length)))
The time between high and low tide is 6:12:30.
The current high tide is 2016-06-01 04:38:00.
The next low tide is 2016-06-01 10:50:30.
The next high tide is 2016-06-01 17:03:00.
The tide length is 12:25:00.
The type of the 'tide_length' variable is <class 'datetime.timedelta'>.

In the last print statement, we use the built-in type() function simply to illustrate that the difference between two times yields a timedelta object.
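
The same arithmetic extends naturally to a short tide table. The following sketch repeats the idealized assumption of evenly spaced tides and also shows the total_seconds() method, which turns a timedelta into a plain number. The names tides and hours are illustrative.

```python
import datetime as dt

high_tide = dt.datetime(2016, 6, 1, 4, 38, 0)
tide_duration = dt.timedelta(hours=24, minutes=50) / 4

# The next four tides, one tide_duration apart, alternating low and high.
tides = [high_tide + n * tide_duration for n in range(1, 5)]
for n, when in enumerate(tides, start=1):
    kind = "low" if n % 2 else "high"
    print("{:4s} tide at {}".format(kind, when))

# total_seconds() expresses a duration as a plain floating point number.
hours = tide_duration.total_seconds() / 3600
print("Each interval lasts {:.2f} hours.".format(hours))
```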

Dealing with Time Zones

Time zones can be a source of confusion and frustration in geoscientific data and in computer programming in general. Core date and time libraries in various programming languages inevitably have design flaws (Python is no different) leading to third-party libraries that attempt to fix the core library limitations. To avoid these issues, it is best to handle data in UTC, or at the very least operate in a consistent time zone, but that is not always possible. Users will expect their tornado alerts in local time.

What is UTC?

UTC is an abbreviation of Coordinated Universal Time and is, in practice, equivalent to Greenwich Mean Time (GMT). (Greenwich, at 0 degrees longitude, is a district of London, England.) In geoscientific data, times are often in UTC, though you should always verify that this assumption is actually true!

Time Zone Naive Versus Time Zone Aware datetime Objects

When you create datetime objects in Python, they are so-called "naive", which means they are time zone unaware. In many situations, you can happily go forward without this detail getting in the way of your work. As the Python documentation states: "Naive objects are easy to understand and to work with, at the cost of ignoring some aspects of reality". However, if you wish to convey time zone information, you will have to make your datetime objects time zone aware. To handle named time zones in this notebook, we use the third-party pytz module, whose classes build upon, or "inherit from" in OO terminology, the tzinfo class. Before Python 3.9, the Python Standard Library did not ship a time zone database, so you could not rely on it alone; Python 3.9 and later include the zoneinfo module for this purpose. Here, we create time zone naive and time zone aware datetime objects:

In [13]:
import datetime as dt
import pytz
naive = dt.datetime.now()
aware = dt.datetime.now(pytz.timezone('US/Mountain'))
print("I am time zone naive {}.".format(naive))
print("I am time zone aware {}.".format(aware))
I am time zone naive 2016-06-27 18:54:19.249738.
I am time zone aware 2016-06-27 12:54:19.264678-06:00.

The pytz.timezone() function takes a time zone string and returns a tzinfo object, which can be used to initialize the time zone. The -06:00 denotes that we are operating in a time zone six hours behind UTC.
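
The practical difference between the two kinds of objects can be checked directly. This sketch uses fixed times and the standard library's dt.timezone.utc object, rather than pytz, so that it runs without third-party packages. The mixed_ok flag is an illustrative name.

```python
import datetime as dt

naive = dt.datetime(2016, 6, 27, 18, 54)
aware = dt.datetime(2016, 6, 27, 18, 54, tzinfo=dt.timezone.utc)

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

# Python refuses to order a naive and an aware datetime.
try:
    naive < aware
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("Can compare naive with aware:", mixed_ok)
```

This is one reason to keep a data set consistently naive or consistently aware rather than mixing the two.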

If you have data that are in UTC and wish to convert them to another time zone, Mountain Time Zone for example, you will again make use of the pytz module. First, we will create a UTC time with the utcnow() method, which inexplicably returns a time zone naive object, so you must still specify the UTC time zone with the replace() method. We then create a "US/Mountain" tzinfo object as before, but this time we will use the astimezone() method to adjust the time to the specified time zone.

In [14]:
import datetime as dt
import pytz
utc = dt.datetime.utcnow().replace(tzinfo=pytz.utc)
print("The UTC time is {}.".format(utc.strftime('%B %d, %Y, %-I:%M%p')))
mountaintz = pytz.timezone("US/Mountain")
mountain = utc.astimezone(mountaintz)
print("The 'US/Mountain' time is {}.".format(mountain.strftime(
    '%B %d, %Y, %-I:%M%p')))
The UTC time is June 27, 2016, 6:54PM.
The 'US/Mountain' time is June 27, 2016, 12:54PM.

We also draw upon our earlier knowledge of the strftime() method to format a human-friendly date and time string.
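
For a reproducible variant of this conversion, the sketch below starts from a fixed UTC time and uses only the standard library. It assumes a fixed UTC-6 offset labeled "MDT", which, unlike a pytz time zone, does not follow daylight saving changes, so treat it as an illustration of astimezone() rather than a replacement for a real time zone database.

```python
import datetime as dt

utc = dt.datetime(2016, 6, 27, 18, 54, tzinfo=dt.timezone.utc)

# Assumption: a fixed UTC-6 offset (Mountain Daylight Time). Unlike a
# pytz time zone, it does not follow daylight saving changes.
mdt = dt.timezone(dt.timedelta(hours=-6), "MDT")
local = utc.astimezone(mdt)

print(local.strftime("%Y-%m-%d %H:%M %Z"))  # 2016-06-27 12:54 MDT
print(local == utc)                         # True, the same instant
```

Note that the two objects compare equal: astimezone() changes how the instant is displayed, not the instant itself.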