Guerrilla traffic speed monitoring to inform and push for change

(or: using open source software and readily available tools to see how fast cars move on Forbes Ave)

October was a tragic month for the Oakland community in Pittsburgh. Two pedestrians and a cyclist were killed in car crashes within four days of each other. While the collisions are still under investigation, I have a strong suspicion that speed was a major factor in both. The survivability of a crash decreases dramatically as speed increases from 20 to 40 miles per hour [1].

There have been a number of calls for traffic calming measures along Forbes Avenue, which is one of the few routes east for both cars and cyclists. I was curious what the current average traffic speed is, and whether we could strengthen these calls with some real data. Even though there's no public data and I don't have a RADAR gun, it's possible to collect this data myself with just a high vantage point, a cell phone video, and basic computer vision techniques.

The stretch of Forbes Avenue that I'd like to focus on is between Pitt and CMU:

Overview map of Forbes Ave, courtesy of Google Maps

It's here that the road and surrounding area "opens up." The previous 8-10 blocks go through the tightly packed Oakland business district, with timed traffic lights (matching the 25MPH speed limit) at every intersection. Once you reach Schenley Plaza, though, the buildings recede and there's a sense of freedom. I believe the light timings change at this point, too, allowing you to exceed 25MPH for the first time since the highway exit. Once you get to the Natural History museum, a fourth travel lane is added on the left and the right lane turns into an unmarked, 20ft-wide luxury lane. The right side of this lane is intended as a bus stop, but the lack of markings makes it a bit of a free-for-all when there aren't any buses. I believe all these things contribute to an overall increase in speed.

I don't have a RADAR gun, but I do have a cell phone and access to the Cathedral of Learning. At 3:30pm on a Friday afternoon, I recorded 10 minutes of traffic on Forbes Ave. Here's a snippet of what this looked like:

Two second snippet of the raw footage

You can see Dippy the Dino on the top left, with Schenley Plaza on the top right, and the intersection with Schenley Drive Extension in between. Unfortunately, the trees obscure the section of the road where I think traffic moves the fastest, but there's a great view of about 300ft of the road. I rotated and cropped the image, used basic image processing techniques to detect objects and their locations, converted pixels to meters, and computed their speeds. The full analysis is documented below. It worked surprisingly well:

The same two second snippet, but rotated, cropped, and annotated with speeds in MPH
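
To give a sense of the arithmetic behind those numbers, here's a rough back-of-the-envelope sketch of the pixels-to-speed conversion. The pixel counts and per-frame displacement below are hypothetical placeholders, not calibrated values; those come out of the analysis itself:

roadlength_m  = 300 * 0.3048       # the ~300ft of visible road, in meters
roadlength_px = 1100               # hypothetical: how many pixels that stretch spans after cropping
m_per_px      = roadlength_m / roadlength_px
fps           = 30                 # the video frame rate
px_per_frame  = 5                  # hypothetical: how far a car moved between consecutive frames
speed_mph     = px_per_frame * m_per_px * fps * 2.23694   # ≈ 28 MPH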

Analysis

I'll use Julia v0.4 and a bunch of Julia packages to do this analysis, but the concepts are applicable in any language. Unless otherwise noted, all code is copyright 2015 Matt Bauman, available for use with attribution under the MIT license. All videos and images are similarly copyright 2015 Matt Bauman, available for use with attribution under the Creative Commons Attribution 4.0 International License (CC-BY).

First, there's some setup. I try to make use of as many existing packages as possible. I also define a few helper utilities up front.

In [1]:
using Images, FixedPointNumbers, ImageMagick, Colors, Gadfly, DataFrames, ProgressMeter
import VideoIO
In [2]:
# Let's create a GIF to display a snippet of the raw footage. There aren't any (to my knowledge) native
# Julia libraries to work with GIFs, but we have ImageMagick installed through BinDeps, which uses Homebrew
# since I'm on a Mac.  So let's just create a simple helper function to shell out to the `convert` binary.

# Inspired by Tom Breloff's animated plots: https://github.com/tbreloff/Plots.jl/blob/master/src/animation.jl
immutable GIF
    data::Vector{UInt8}
end
import Homebrew
"""
    animate(f, n; fps=20, width=0)

Call function `f` repeatedly, `n` times. The function `f` must take one argument (the frame number),
and it must return an Image for that frame.  Optionally specify the number of frames per second
and a width for proportional scaling (the default of 0 keeps the original size).
"""
function animate(f, n; fps = 20, width=0)
    mktempdir() do dir
        for i=1:n
            img = f(i)
            frame = width > 0 ? Images.imresize(img, (width, floor(Int, width/size(img, 1) * size(img, 2)))) : img
            Images.save(@sprintf("%s/%06d.png", dir, i), frame)
        end
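        # ImageMagick's -delay is specified in hundredths of a second, so convert from frames per second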
        speed = round(Int, 100 / fps)
        run(`$(Homebrew.brew_prefix)/bin/convert -delay $speed -loop 0 $dir/*.png $dir/result.gif`)
        return GIF(open(readbytes, "$dir/result.gif"))
    end
end
Base.writemime(io::IO, ::MIME"text/html", g::GIF) = write(io, "<img src=\"data:image/gif;base64,$(base64encode(g.data))\" />")
Base.write(io::IO, g::GIF) = write(io, g.data)
Out[2]:
write (generic function with 71 methods)
In [3]:
# The VideoIO library is really great, but it's missing a random access seeking API.
# This should eventually be pushed upstream (https://github.com/kmsquire/VideoIO.jl/issues/30)
function Base.seek(s::VideoIO.VideoReader, time, video_stream=1)
    pCodecContext = s.pVideoCodecContext
    seek(s.avin, time, video_stream)
    VideoIO.avcodec_flush_buffers(pCodecContext)
    s
end
function Base.seek(avin::VideoIO.AVInput, time, video_stream = 1)
    # AVFormatContext
    fc = avin.apFormatContext[1]

    # Get stream information
    stream_info = avin.video_info[video_stream]
    seek_stream_index = stream_info.stream_index0
    stream = stream_info.stream
    time_base = stream_info.codec_ctx.time_base
    ticks_per_frame = stream_info.codec_ctx.ticks_per_frame
    
    # Seek
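    # Convert the time in seconds into timestamp units using the codec's time base and ticks-per-frame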
    ret = VideoIO.av_seek_frame(fc, seek_stream_index, Int(div(time*time_base.den, time_base.num*ticks_per_frame)), VideoIO.AVSEEK_FLAG_ANY)

    ret < 0 && throw(ErrorException("Could not seek to start of stream"))

    return avin
end
# While we're at it, it's very handy to know how many frames there are:
Base.length(s::VideoIO.VideoReader) = s.avin.video_info[1].stream.nb_frames
Out[3]:
length (generic function with 132 methods)
In [4]:
# So now we can load our video, seek to a spot with some nice action, and create a GIF for display
io = VideoIO.open("IMG_2399.MOV")
f = VideoIO.openvideo(io)

seek(f, 5*60+18)
gif = animate(60, fps=30, width=450) do _
    read(f, Image)
end
open("movieclip.gif", "w") do f
    write(f, gif)
end
gif
# While it's handy to embed gifs into the notebook when working interactively,
# it makes the notebook too big to render online. So instead, just point to the saved file.
display("text/html", """<img src="assets/movieclip.gif" />""")

Selecting the region of interest

The very first step in image processing is to define the region of interest. This is often done just by cropping, manually selecting the pixels you're interested in. But in our case we can make life a lot easier if we also rotate the image so the cars travel along just one axis.

Rotating an image is inherently an interpolation-like process. The naive way to rotate an image is to move each pixel to its new location, but those new locations won't end up at integer coordinates. In order to display the image on the screen, you need to interpolate the value of each new pixel from the nearby rotated pixels. This is hard and requires lots of bookkeeping. The easy way to rotate an image is to tilt your head the opposite direction. Or, less facetiously, you can instead rotate the indices into the image in the opposite direction. This is the approach that AffineTransforms.jl takes, with support for all sorts of transformations. Coupled with Interpolations.jl, this allows for fast and robust lazy transformations.
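To make that concrete, here's a minimal standalone sketch of the index-rotation idea (a hypothetical illustration, not the code used below): for each output pixel, apply the inverse rotation to find where it came from in the input, using nearest-neighbor sampling instead of the linear interpolation that Interpolations.jl provides.

# A hypothetical nearest-neighbor version of "rotate the indices the opposite direction":
function naive_rotate(A::AbstractMatrix, θ)
    out = similar(A)
    ci, cj = (size(A, 1) + 1) / 2, (size(A, 2) + 1) / 2  # rotate about the image center
    for j in 1:size(A, 2), i in 1:size(A, 1)
        # Inverse-rotate the output coordinate back into input space
        x =  cos(θ) * (i - ci) + sin(θ) * (j - cj) + ci
        y = -sin(θ) * (i - ci) + cos(θ) * (j - cj) + cj
        ii, jj = round(Int, x), round(Int, y)
        # Fill with zero (black) when the rotated coordinate falls outside the input
        out[i, j] = (1 <= ii <= size(A, 1) && 1 <= jj <= size(A, 2)) ? A[ii, jj] : zero(eltype(A))
    end
    out
end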

In [5]:
using Interpolations, AffineTransforms

"""
Rotate and crop a matrix by the angle θ.

Optional arguments:
* region - a tuple of two arrays that specify the section of the rotated image to return; defaults to the unrotated viewport
* fill - the value to use for regions that fall outside the rotated image; defaults to zero(T)
"""
function rotate_and_crop{T}(A::AbstractMatrix{T}, θ, region=(1:size(A, 1), 1:size(A, 2)), fill=zero(T))
    etp = extrapolate(interpolate(A, BSpline(Linear()), OnGrid()), fill)
    R = TransformedArray(etp, tformrotate(θ))
    Base.unsafe_getindex(R, region[1], region[2]) # Extrapolations can ignore bounds checks
end

# While the above will work for images, it may iterate through them inefficiently depending on the storage order
rotate_and_crop(A::Image, θ, region) = shareproperties(A, rotate_and_crop(A.data, θ, region))
INFO: Recompiling stale cache file /Users/mbauman/.julia/lib/v0.4/AffineTransforms.ji for module AffineTransforms.
Out[5]:
rotate_and_crop (generic function with 4 methods)
In [6]:
# This is what we actually want: a rotated and cropped image that shows just the unobstructed section of Forbes Ave:
img = read(f, Image)
rotate_and_crop(img, 0.321, (721:1821,24:201))
Out[6]:
In [7]:
# This gets called often, so let's optimize it a little bit.  Instead of just 
# using read, I use the internal `retrieve!` with a pre-allocated buffer.
# This is safe since I know it's getting rotated and discarded immediately
const _buffer = Array{UInt8}(3, size(img.data, 1), size(img.data, 2))
function readroi(f::VideoIO.VideoReader)
    VideoIO.retrieve!(f, _buffer)
    # _buffer is a 3-dimensional array (color x width x height), but by reinterpreting
    # it as RGB{UFixed8}, it becomes a matrix of colors that we can rotate
    Image(rotate_and_crop(reinterpret(RGB{UFixed8}, _buffer), 0.321, (721:1821,24:201)), Dict("spatialorder"=>["x","y"]))
end
Out[7]:
readroi (generic function with 1 method)

Object detection

Now that we have our region of interest, we want to identify the vehicles. The first step is to find a frame without any vehicles — this will define the background. Anything matching the background can then be discarded.

In [8]:
seek(f, (2*60+40.5))
background = readroi(f)
Out[8]:

Great! We can now go back to the beginning of the movie and subtract the background from each frame. Pixels that are close in color to the background will be black, whereas new objects in the frame will have a different color value from the background and therefore be brighter (or maybe negative, which is rather nonsensical for a color).
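
As a tiny illustration of the wrap-around problem that the next cell avoids by converting to Float32 (a UFixed8 stores its value in an unsigned 8-bit integer, so the same issue applies):

UInt8(10) - UInt8(30)       # 0xec == 236: unsigned arithmetic wraps around instead of going negative
Float32(10) - Float32(30)   # -20.0f0: floating point keeps the sign, so differences stay meaningful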

In [9]:
seekstart(f)
# To subtract the background, first convert both to RGB{Float32} images.  Subtracting RGB{UFixed8}s
# is problematic because they are just unsigned 8-bit integers. So instead of going negative, they
# *wrap around* to the maximum value. Using floating point numbers to represent the colors fixes this:
img = readroi(f)
convert(Image{RGB{Float32}}, img) - convert(Image{RGB{Float32}}, background)
Out[9]:

You can somewhat see the four cars here. This is a "color" image, but we don't really care what colors the things are -- we just want each pixel's overall deviation from the background. To get that, we can take the absolute value of each color channel and sum them together:
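In other words, the per-pixel quantity we're after is just the L1 distance between a pixel and the corresponding background pixel in RGB space. A hypothetical standalone version (not used in the pipeline) would look like:

# Hypothetical helper: the sum of absolute per-channel differences between a pixel and the background
deviation(p::RGB, bg::RGB) = abs(p.r - bg.r) + abs(p.g - bg.g) + abs(p.b - bg.b)

deviation(RGB(0.5f0, 0.5f0, 0.5f0), RGB(0.4f0, 0.6f0, 0.5f0))  # ≈ 0.2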

In [10]:
# Absolute value is defined for RGB colors, but it's a little wonky -- it's the *sum* of the absolute values
# of the components. It is exactly what we want, but it's not defined for arrays of RGBs, so we add that definition here:
@vectorize_1arg AbstractRGB Base.abs
grayim(abs(convert(Image{RGB{Float32}}, img) - convert(Image{RGB{Float32}}, background)))
Out[10]: