Introduction

This iPython notebook walks you through uploading photos saved in your local filesystem to Fluxtream. The photos you upload will appear in both the Fluxtream Calendar and BodyTrack apps as belonging to the FluxtreamCapture connector.

This notebook was developed using the example of uploading photos taken by an iPhone and transferred to a Mac using iPhoto. It should also work for other photos that contain EXIF data, but the process of calculating what timestamp to use may need to be modified to handle other ways of encoding timezone context in EXIF headers.

If you are new to iPython notebooks, the main website to learn about them is here.

You will need to install python and iPython notebook on your local system, run a local ipython kernel, and make a local copy of this notebook to be able to execute and modify the code below.

Install instructions for iPython Notebook are here. At least on OS X systems, after installation you can start the server by going to Terminal and calling 'ipython notebook'. This will start a local web server talking to an iPython kernel and open a root IP[y] page.

The directory where you start the iPython Notebook server will be treated as iPython's home directory. New and imported notebooks will be put in that directory, so it's good to remember where you started the server from and be consistent about it.

Once you have IP[y] generally working on your system, here's a brief intro on how to use it:

  • A green outline shows the currently selected cell.
  • Select a different cell by clicking on it.
  • Execute the currently selected cell by either clicking the play button on the icon bar at the top, selecting Cell/Run from the menu bar, or by using the keyboard shortcut Shift-Return.

When a given cell is executed, it may print output which appears below the cell, and the cursor will continue to the next cell. If the next cell is tall, you might need to scroll back up to see the previous cell's output.

Each cell in this notebook pertains to a particular step in the process, topic, or action requiring your input and contains comments at the top saying what it's about and what you should do.

Cells that require customization for your own setup start with "Modify". These include cells where you configure what range of photos in which directory to upload, and what default time zone to use. These require some thought about which sets of data you want to see in Fluxtream and how you want it to appear. See the info below about timestamps and timezones in considering how to approach time issues when uploading your photos.

Cells that define functions or do other things that don't require user input or modification generally just start with "Execute". These can just be executed without much consideration, though you may want to go back later to understand or modify them.

Please enjoy, tinker, modify, etc. Feel free to contact [email protected] if you have questions.

Timestamps and timezones

It's necessary to calculate the absolute time of a photo to have it combine properly with other data. The Fluxtream photo upload API uses epoch timestamps, aka unix time, which is the number of seconds since midnight 1/1/1970 in UTC (aka GMT). Converting local time to epoch time requires knowing what timezone to use for interpreting the local time. A good web site that helps convert date/time stamps back and forth between local time + timezone and epoch time is http://www.epochconverter.com/.
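As a rough illustration of the conversion described above, here is a minimal sketch that turns a local time plus a fixed UTC offset into epoch seconds using only the standard library. The helper name local_to_epoch and its offset-in-hours argument are made up for this example; the notebook's own code further down uses dateutil timezone objects instead.

```python
import calendar
import datetime

def local_to_epoch(local_dt, utc_offset_hours):
    """Convert a naive local datetime plus a UTC offset (in hours) to epoch seconds.

    Hypothetical helper for illustration only; not part of the notebook's code.
    """
    # timegm interprets the time tuple as if it were already UTC,
    # so the offset has to be subtracted afterwards to get true UTC.
    as_if_utc = calendar.timegm(local_dt.timetuple())
    return as_if_utc - int(utc_offset_hours * 3600)

# 3:46:24 PM on May 16, 2014, in a timezone 4 hours behind UTC
example_local = datetime.datetime(2014, 5, 16, 15, 46, 24)
example_epoch = local_to_epoch(example_local, -4)
```

The same local time with a different offset gives a different epoch value, which is why the timezone has to be pinned down before uploading.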

Sadly, EXIF timestamps in photos historically only recorded local time without timezone information. This dates back to the days when cameras were stand-alone devices that uniformly didn't have a concept of timezone. It was up to the individual human using the camera to set the clock, and remember to change it (or not) for travel and daylight saving time. Converting these timestamps to epoch time requires an additional source of information for deciding which timezone to use.

Some newer devices, such as iPhones, do have a model for what timezone they are in at any given point in time. However, they (or at least their firmware or photo capture apps) are not consistent about how or if they record the timezone they used to generate a given local timestamp in the EXIF. These inconsistencies happen between different devices, different versions of the same device, firmware, or app, and different environmental conditions for a given device/firmware/app instance.

Even worse, some methods of transferring photos that did, at some point, contain valid timezone information strip that info off and throw it away. For example, while a photo is in the camera roll on an iOS device it has a beautiful, high precision, unambiguous epoch timestamp. All the existing methods I can find for transferring photos from iOS devices lose those high precision timestamps. Many of the transfer methods also strip off info that could be used to recover the timezone. Both Flickr and Picasa suffer from these problems. Their APIs only report local time without time zone. Various forums online document problems people have with various combinations of devices, apps, uploaders, etc. corrupting the photo timestamps.

A while back, we added photo upload capability to the FluxtreamCapture iOS app to try to make a sustainable method for people with iOS devices to stream photos to Fluxtream with the timestamps properly preserved. Unfortunately there are still hurdles we haven't gotten over that block us from making it available on the app store, not everyone has the resources to build an iOS app from sources, and even with TestFlight the difficulty of dealing with Apple provisioning limits its impact. However, even if the FluxtreamCapture iOS app availability issues weren't a problem, that still doesn't help with photos captured by other methods, and for those sitting around in people's filesystems.

There's really no universally correct solution to this problem of timezone ambiguity of existing photos. However, a given individual facing a given directory of photos they took at some point in the past can potentially do the detective work to set reasonable defaults for particular ranges of photos. For example, I have old directories of photos taken during particular trips and remember, or can dig up, enough context to know what timezone they were taken in. I know that Christmas pictures are from CST (Texas in December), the honeymoon pictures are from PDT (Hawaii in April, but we decided to keep to PDT on that trip so we could sleep late and still see the sun rise), and when it was that I moved from Central to Eastern to Pacific and back to Eastern time zones. I also know my own aesthetics for the tradeoff between tedious perfection and broad, generally correct brush strokes.

The problem is how to combine this personal contextual debugging capability and aesthetics with particular batches of images I'd like to upload. It would be great to have an app with a great user experience that let me set unambiguous timezone info on my directories full of photos. However, if it exists I haven't seen it, and I certainly am not up to creating such a thing. The best solution I can come up with is to leverage the API FluxtreamCapture uses to upload photos with absolute timestamps, provide this iPython notebook, implement and describe a default strategy, and let you modify the behavior if the default is not what you want.

The function upload_photo_range(start_photo, end_photo) can upload a range of consecutive photos in increasing alphanumeric order from within a given directory. It takes as arguments absolute paths to the first and last photos, and requires that the directories match.
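To make the "increasing alphanumeric order" selection concrete, here is a small sketch of the filtering idea on a plain list of paths. The name photos_in_range and the /pics paths are invented for this example; the real upload_photo_range defined later builds its list with glob and uploads each match.

```python
def photos_in_range(paths, start_photo, end_photo):
    """Return the paths that sort between start_photo and end_photo, inclusive.

    Hypothetical helper for illustration; uses plain string comparison.
    """
    return [p for p in sorted(paths) if start_photo <= p <= end_photo]

# Made-up directory listing, deliberately out of order
listing = [
    "/pics/IMG_8557.JPG",
    "/pics/IMG_8554.JPG",
    "/pics/IMG_8556.JPG",
    "/pics/IMG_8555.JPG",
]
selected = photos_in_range(listing, "/pics/IMG_8555.JPG", "/pics/IMG_8556.JPG")
```

Because this is plain string comparison, a name like IMG_10.JPG sorts before IMG_9.JPG, so ranges behave best when the file numbers all have the same number of digits.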

For each photo, it calls the function image_creation_time(filename). If a given photo includes unambiguous timezone information that the function knows how to recognize, it will use that timezone and store that timezone in last_tz to use as the default for the next ambiguous one. If it doesn't recognize an unambiguous timezone for a given photo, it will use last_tz if it is set. Otherwise it will default to default_tz, which you set manually.
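The fallback order can be sketched as a small pure function. This is only an illustration of the decision logic: choose_timezone is a hypothetical name, and the timezones here are plain strings standing in for the dateutil objects the notebook actually uses.

```python
def choose_timezone(photo_tz, last_tz, default_tz):
    """Pick the timezone for one photo and return (tz_to_use, updated_last_tz).

    Hypothetical helper; photo_tz is None when the photo carries no
    recognizable timezone information.
    """
    if photo_tz is not None:
        # Unambiguous info in the photo wins and becomes the new last_tz
        return photo_tz, photo_tz
    if last_tz is not None:
        # Fall back to the most recent unambiguous timezone
        return last_tz, last_tz
    # Nothing seen yet in this batch: use the manually set default
    return default_tz, last_tz

# Walk through a made-up batch: no info, then GPS info, then no info again
last = None
used = []
for found in [None, "UTC-7", None]:
    tz_used, last = choose_timezone(found, last, "America/New_York")
    used.append(tz_used)
```

In this made-up batch the first photo gets the default, the second uses its own timezone, and the third inherits the second's.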

The only EXIF tag that the image_creation_time function currently recognizes to disambiguate the timezone is the 'GPS GPSTimeStamp' field, which is the one used by iOS for photos taken while the GPS is on. The iOS photos that I've processed so far which lack this tag are the ones I took in Airplane Mode. Those photos have no GPS EXIF tags, and I can't find any other tags on those photos to disambiguate the timezone. However, if, like me, you don't modify the timezone manually in flight and wait until you land and get the timezone from the cell tower, the last_tz heuristic should work pretty well. If you have photos from another source that uses other tags that could disambiguate the timezone, please modify the image_creation_time function and send the upgrade and, ideally, a sample image to [email protected].
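The way the GPS hour is used to deduce the offset can be sketched in isolation. The helper name utc_offset_hours is made up for this example, and like the notebook's code it assumes whole-hour offsets, so it would give wrong answers for timezones with half-hour offsets.

```python
def utc_offset_hours(local_hour, gps_utc_hour):
    """Estimate the UTC offset from the local hour and the GPS (UTC) hour.

    Hypothetical helper for illustration; assumes whole-hour offsets only.
    """
    offset = local_hour - gps_utc_hour
    # The two hours may fall on different calendar days, so wrap the
    # difference back into the range -11..+12
    if offset <= -12:
        offset += 24
    elif offset > 12:
        offset -= 24
    return offset

# Local EXIF time 15:46 with GPS time 19:46 UTC: 4 hours behind UTC
afternoon = utc_offset_hours(15, 19)
# Local time 01:10 with GPS time 23:10 UTC the previous day: 2 hours ahead
after_midnight = utc_offset_hours(1, 23)
```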

To test what would happen for a particular range of images without uploading, you can execute the cell that starts "Optionally use this block to disable upload", then execute the path setting and upload block. Once you're happy with what would happen, you can execute the cell that starts "Optionally use this block to re-enable upload", then execute the path setting and upload block and it will perform the upload. If you do neither, then by default it will upload and be less verbose about timezones.

What I'm doing is skimming each directory of images I want to upload and deciding if they're probably from one timezone or not. In the former case, I set default_tz appropriately, set last_tz=None, set start_photo and end_photo to the first and last photos in the directory, and let it go. In the latter case, I'd execute the "Optionally use this block to disable upload" cell, see if I agree where the timezone transitions happen (I mostly record travel that would cross timezones in my Google Calendar), and potentially break up the directory into multiple sets of start_photo and end_photo with appropriate default_tz values to make the right thing happen.

Note that as you process each block, if you want you can make multiple copies of the commands in the path setting and upload block, strip out the comments to make it more compact, annotate why you chose the timezone in each block, and leave the cells around for documentation and repeatability.

Note that if you upload a photo multiple times with identical time values, the subsequent uploads are safely ignored. However, if you upload the same image multiple times with different timezones you will end up with duplicates. There is no API or user interaction component in Fluxtream that allows the deletion of a previously-uploaded image. If you upload images and later regret it, please send the info about your situation, including your Fluxtream username, guest ID, and the details of what range of photos you want deleted to [email protected]. You can get your Guest ID by doing the step below to set up your Fluxtream credentials and looking at the value of fluxtream_guest_id. It's not obvious to me what the right move would be for adding API support for deletion. In the case of an oops, input about what you wish the API supported, and how you feel about how to handle the other possible oops of accidental deletion, would be appreciated.

Setup for uploading to Fluxtream

In [1]:
# Execute this cell to define the functions for calling the Fluxtream upload API for the 
# credentials entered below
import json, subprocess

# By default, the upload function will send data to the main server at fluxtream.org.  
# If you want to have this use a different fluxtream server, change it here
# and make sure the username and password entered below are valid on that server.
global fluxtream_server
fluxtream_server = "fluxtream.org"

def setup_fluxtream_credentials():
    # Call the Fluxtream guest API, documented at 
    #   https://fluxtream.atlassian.net/wiki/display/FLX/BodyTrack+server+APIs#BodyTrackserverAPIs-GettheIDfortheguest

    # Make sure it works and harvest the Guest ID for future use
    global fluxtream_server, fluxtream_username, fluxtream_password, fluxtream_guest_id

    # Make sure we have fluxtream credentials set properly
    if not('fluxtream_server' in globals() and 
           'fluxtream_username' in globals() and
           'fluxtream_password' in globals()):
        raise Exception("Need to enter Fluxtream credentials before uploading data.  See above.")

    cmd = ['curl', '-v']
    cmd += ['-u', '%s:%s' % (fluxtream_username, fluxtream_password)]
    cmd += ['https://%s/api/guest' % fluxtream_server]

    result_str = subprocess.check_output(cmd)
    #print '  Result=%s' % (result_str)

    try:
        response = json.loads(result_str)

        if 'id' in response:
            fluxtream_guest_id = int(response['id'])
        else:
            raise Exception('Received unexpected response %s while trying to check credentials for %s on %s' % (response, 
                                                                                                            fluxtream_username, 
                                                                                                            fluxtream_server))

        print 'Verified credentials for user %s on %s work. Guest ID=%d' % (fluxtream_username, fluxtream_server, fluxtream_guest_id)
    except:
        print "Attempt to check credentials of user %s failed" % (fluxtream_username)
        print "Server returned response of: %s" % (result_str)
        print "Check login to https://%s works and re-enter your Fluxtream credentials above" % (fluxtream_server)
        raise
In [9]:
# Execute and fill in the fields below to set your Fluxtream credentials.  

from IPython.html import widgets # Widget definitions
from IPython.display import display # Used to display widgets in the notebook

def set_fluxtream_password(this):
    global fluxtream_username, fluxtream_password
    fluxtream_username = fluxtream_username_widget.value
    fluxtream_password = fluxtream_password_widget.value
    fluxtream_password_widget.value = ''
    setup_fluxtream_credentials()

    print "To make persistent for future restarts, insert a cell, paste in:"
    print ""
    print "global fluxtream_username, fluxtream_password"
    print "fluxtream_username = \"%s\"" % (fluxtream_username)
    print "fluxtream_password = \"xxx\""
    print "setup_fluxtream_credentials()"
    print ""
    print "replace xxx with your password, and execute that cell instead."
    print "Only do this if you're keeping this copy of your iPython notebook private,"
    print "and remove that cell before sharing"    
    
display(widgets.HTMLWidget(value='Fluxtream Username'))
fluxtream_username_widget = widgets.TextWidget()
display(fluxtream_username_widget)
display(widgets.HTMLWidget(value='Fluxtream Password'))
fluxtream_password_widget = widgets.TextWidget()
display(fluxtream_password_widget)

set_fluxtream_login_button = widgets.ButtonWidget(description='Set Fluxtream credentials')
set_fluxtream_login_button.on_click(set_fluxtream_password)
display(set_fluxtream_login_button)

# Enter Fluxtream username and password and click "Set Fluxtream credentials" button.  
# Password field will blank afterwards, but variables will be set

Photo handling

In [4]:
# Execute this cell to define the functions for handling and uploading photos.  

# If no package exifread, try "pip install exifread" in Terminal
import datetime, exifread, glob, json, os, pprint, subprocess
from dateutil import tz

# Read image creation as a fully-specified time -- localtime, plus timezone
# Requires GPSTimeStamp to deduce offset between localtime and UTC
# TODO: consider letting user pass in the name of a timezone for photos where GPSTimeStamp is unavailable
def image_creation_time(filename):
    global verbose_tz
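    # Open the file in a with block so the handle is closed after the EXIF tags are read
    with open(filename, 'rb') as image_file:
        tags = exifread.process_file(image_file)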

    # print pprint.pformat(tags)

    # Get the local time that the image was acquired from the EXIF. EXIF DateTimeDigitized looks like the right 
    # EXIF field for iPhone photos.  The right field name for other cameras may vary.
    acq_local_timestr = str(tags['EXIF DateTimeDigitized'])
    local_creation_time = datetime.datetime.strptime(acq_local_timestr, '%Y:%m:%d %H:%M:%S')

    # Calculate the timezone to use for interpreting local_creation_time.
    # By default, the timezone will be calculated from GPSTimeStamp EXIF fields in each photo.  
    # In the case that a given photo doesn't have a GPSTimeStamp header, it will first try to use the timezone
    # from the last previously processed image that did have one.  
    # In the case where all previous photos (if any) are missing GPSTimeStamp headers, it will use
    # default_tz set in the next cell below.
    photo_tz = None
    global last_tz
    
    if 'GPS GPSTimeStamp' in tags:
        # Get the UTC time that the image was acquired from the EXIF. Using the hour from GPS GPSTimeStamp looks like the right 
        # EXIF field to use for iPhone photos, at least for integer TZ offsets (Iran and other places use 
        # non-integral-hour TZ offsets; this won't work there).  The right approach for other cameras may vary.
        gps_creation_time = tags['GPS GPSTimeStamp']

        # The format of the GPS GPSTimeStamp field is weird.  Used the _timestamp_in function in 
        # http://www.sethoscope.net/geophoto/geoexif.py to figure out how to extract the hour.
        gps_creation_hr = gps_creation_time.values[0].num

        # Find the hour of TZ offset from UTC.  This is + for times ahead of UTC, such as most of Europe/Asia, and - for 
        # times behind UTC, such as North and South America
        tz_hr_offset = local_creation_time.hour - gps_creation_hr
        if tz_hr_offset <= -12:
            tz_hr_offset += 24
        elif tz_hr_offset > 12:
            tz_hr_offset -= 24
    
        # print 'tz_hr_offset is %s' % tz_hr_offset
        photo_tz = tz.tzoffset(None, tz_hr_offset * 3600)
        
        if verbose_tz:
            print "{0}: using GPSTimeStamp, UTC {1:+d}, timestamp={2}".format(os.path.basename(filename), 
                                                                    tz_hr_offset,
                                                                    local_creation_time.replace(tzinfo=photo_tz).isoformat())
        # Hold onto this timezone to use by default for the next photo we process.
        # Note that this will persist between runs, so if you're starting on a 
        # new batch of photos you might want to reset last_tz manually
        last_tz = photo_tz
    elif 'last_tz' in globals() and last_tz:
        photo_tz = last_tz
        print "{0}: using last_tz={1}, timestamp={2}".format(os.path.basename(filename), last_tz,
                                                            local_creation_time.replace(tzinfo=photo_tz).isoformat())
    else:
        photo_tz = default_tz
        print "{0}: using default_tz={1}, timestamp={2}".format(os.path.basename(filename), default_tz,
                                                               local_creation_time.replace(tzinfo=photo_tz).isoformat())
        
    # Attach discovered timezone offset to local_creation_time
    local_creation_time = local_creation_time.replace(tzinfo=photo_tz)

    # print 'Local creation time with timezone is %s' % local_creation_time
    # print 'Creation time as UTC is %s' % local_creation_time.astimezone(tz.tzutc())
    return local_creation_time

def epoch_time(dt):
    epoch = datetime.datetime(1970, 1, 1, tzinfo=tz.tzutc())
    return (dt - epoch).total_seconds()    

# Assumes fluxtream_username and fluxtream_password variables are set
def upload_photo(photo_path):
    global verbose_tz, do_upload

    # Calculate the time stamp for this photo.  See Timestamps and timezones section at the top of the notebook for 
    # details.
    creation_time = image_creation_time(photo_path)

    # If upload is disabled, just print timezone info
    if do_upload:
        cmd = ['curl', '-v']
        cmd += ['-u', '%s:%s' % (fluxtream_username, fluxtream_password)]
        cmd += ['--form', 'metadata={"capture_time_secs_utc":%.3f}' % epoch_time(creation_time)]
        cmd += ['--form', 'photo=@%s' % photo_path]
        cmd += ['http://%s/api/bodytrack/photoUpload?connector_name=fluxtream_capture' % fluxtream_server]

        print 'Uploading %s, created %s (%d bytes) to %s' % (os.path.basename(photo_path), creation_time, os.stat(photo_path).st_size, fluxtream_server)

        result_str = subprocess.check_output(cmd)
        # print '  Result=%s' % (result_str)
        response = json.loads(result_str)

        if response['result'] != 'OK':
            raise Exception('Received non-OK response %s while trying to upload %s' % (response, photo_path))

        print '   Success'
    elif not verbose_tz:
        print 'Skipping upload of %s, created %s' % (os.path.basename(photo_path), creation_time)
    
# Upload a range of photos.  Assumes start and end photos are in the same directory
def upload_photo_range(start_photo,end_photo):
    for photo in sorted(glob.glob(os.path.dirname(start_photo) + '/*' + os.path.splitext(start_photo)[1])):
        if start_photo <= photo and photo <= end_photo:
            upload_photo(photo)
            
    print "Done!"

# Set default to do upload and not do verbose timezone reporting
global verbose_tz, do_upload

do_upload=True
verbose_tz=False
In [5]:
# Optionally use this block to disable upload if you want to test what time stamps would be used for a given batch of photos before
# committing to upload
global verbose_tz, do_upload

do_upload=False
verbose_tz=True
In [7]:
# Optionally use this block to re-enable upload if you tested previously and are happy to upload now
global verbose_tz, do_upload

do_upload=True
verbose_tz=False
In [8]:
# Modify the start and end photo paths below, set default_tz and run.  Start and end photos need to be in the same directory.
# See Timestamps and timezones section at the top of the notebook for details about time handling strategy.
    
# After setting the photo paths, and potentially setting the default_tz below, execute.  
# This should print a couple lines after each upload (each of which takes a while), then print Done!
start_photo = "/Users/anne/Pictures/iPhoto Library/Masters/2014/05/17/20140517-061939/IMG_8555.JPG"
end_photo = "/Users/anne/Pictures/iPhoto Library/Masters/2014/05/17/20140517-061939/IMG_8556.JPG"

# By default, the timezone will be calculated from GPSTimeStamp EXIF fields in each photo.  
# In the case that a given photo doesn't have a GPSTimeStamp header, it will first try to use the timezone
# from the last previously processed image that did have one (last_tz).  
# In the case where all previous photos (if any) are missing GPSTimeStamp headers, use a global default set below.
# The list of available timezones is at: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
default_tz = tz.gettz("America/New_York")

# Clear last_tz so that the last unambiguous timezone from the end of the last batch doesn't affect a potentially 
# ambiguous timezone at the start of this one
last_tz=None

# Upload the photos
upload_photo_range(start_photo, end_photo)
Uploading IMG_8555.JPG, created 2014-05-16 15:46:24-04:00 (994351 bytes) to fluxtream.org
   Success
Uploading IMG_8556.JPG, created 2014-05-16 15:48:41-04:00 (1056126 bytes) to fluxtream.org
   Success
Done!