1. Introduction

This project, National Park Optimization, aims to determine an optimal route to visit a subset of National Parks using an RV, while minimizing fuel cost on the trip across the United States.

Currently, there are 59 U.S. National Parks scattered across the country, and they are often visited as vacation destinations; just last year, there were 331 million reported recreational visits to the National Parks [1]. Visitation data has been collected since 1996, and based on this data, the top 10 most popular National Parks are as follows: Great Smoky Mountains, Grand Canyon, Yosemite, Olympic, Yellowstone, Rocky Mountain, Cuyahoga Valley, Zion, Grand Teton, and Acadia. The chart below shows the average annual visits to each of the top ten national parks since 1996. As you can see, this subset of National Parks, like the rest, are located in states all across the country [2].

fixit flowchart

Because some of these parks are in relatively remote parts of the country, it can be advantageous to travel to these parks on road networks. If multiple parks are to be visited during a single trip, it is important to choose an optimal route between the parks to minimize the total distance traveled, which influences time spent and fuel consumed. This will guarantee that travelers get the most out of the trip.

This project will incorporate parameters of distance between 15 National Parks, Madison, Wisconsin, and all TA and Petro fuel stations across the United States. The subset of national parks was chosen based on the above popularity and the preference of the team. All distances were computed in Python using the Google Distance Matrix API, and the fuel price for each station was determined by an 8 month weighted average from TA and Petro fueling data. This 8 month fuel data was analyzed to determine that the most expensive fuel stops always remain the most expensive, and the cheapest fuel stops remain the cheapest, as shown in the figure below. This impacts our model because the output of the recommended fuel stops will remain constant over time, though the cost of the fuel on the trip may vary based on current fuel price.

fuel.JPG

Finally, RV fuel economy and tank size are parameters included in the model, initially determined by an average fuel economy and tank size for a standard RV [3]. In later iterations of the model, the user is able to vary parameters including fuel economy, tank size, and maximum drive time.

This report will walk users through four iterations of the problem, with increasing complexity, in order to provide guidance on the cheapest route for traveling to various national parks while stopping at specific gas stations.

$$ \text{ } "All\ models\ are\ wrong,\ but\ some\ are\ useful." \\-George\ E.\ P.\ Box$$

[4]

2. Mathematical Model

2.A. Model 1: Find the Optimal Route to Visit a National Parks

$\textbf{Model Type: Traveling Salesman Problem (TSP)}$

This model uses a Miller-Tucker-Zemlin formulation of a TSP and seeks to determine an optimal tour between the subset of fifteen national parks chosen for this project, starting in Madison, Wisconsin at the house of a team member. The optimal route is later used as input into the Embedded Minimum Cost Network Flow Model, which will be discribed below.

A later iteration of this model incorporates user input to select a subset of the fifteen National Parks, since realistically, it would be difficult to plan a trip to visit fifteen National Parks at one time. In this variation, the TSP will determine the optimal tour between this subset. This model is the same as the TSP modeled below, with the exception that n does not equal 16; it will be equal to the number of parks input by the user. In this iteration, the user has the choice to start at any National Park in our subset of fifteen, or in Madison.

$\textbf{Parameters and Assumptions:}$

The parameters in this TSP include the number of nodes, N = 16, and the distance between each node $c_{i,j}$, precalculated using the Google Distance Matrix API. It is initially assumed that the start point of this model is Madison, Wisconsin.

$\textbf{Decision Variables:}$

The decision variables are binary decision variables in terms of i and j, in which i represents the origin and j represents the destination. In this case, the user starts in Madison, goes to all 15 National Parks and returns back to Madison, for a total of n = 16 destinations.

$$x_{i,j}= \begin{cases} 1 & \text{if travel from destination i to destination j} \\ 0 & \text{otherwise} \end{cases} $$$$for \ i=1,..,n \ for \ j=1,..,n$$$$u_i \ free \ for \ i=1,..,n$$

$\textbf{Constraints:}$

There must be a flow of 1 into every destination. $$ \text{ } \sum_{i=1}^n x_{i,j}= 1 \text{ } \ \forall j=1,..,n$$

There must be a flow of 1 out of every destination. $$ \text{ } \sum_{j=1}^n x_{i,j}= 1 \text{ } \ \forall i=1,..,n$$

There cannot be flow from a destination to itself. $$ \text{ }x_{i,i}= 0 \text{ } \ \forall i=1,..,n$$

There cannot be any subtours. $$ \text{ } u_i - u_j + Nx_{ij} \le N - 1 \text{ } \ \forall i=1,..,n \ \forall j=2,..,n$$

$\textbf{Objective:}$

$$ Minimize \text{ } \sum_{j=1}^n\sum_{i=1}^n c_{i,j}x_{i,j} $$

where $c_{i,j}$ represents the $i^{th}$ row in and $j^{th}$ column of the cost matrix.

$\textbf{Full Model:}$ $$ \begin{aligned} \underset{x \in \mathbb{R^n} \ u \in \mathbb{R}}{\text{minimize}}\qquad& \text{ } \sum_{j=1}^n\sum_{i=1}^n c_{i,j}x_{i,j} \\ \text{subject to:}\qquad& \text{ } \sum_{i=1}^n x_{i,j}= 1 && j=1,\dots,n\\ & \text{ } \sum_{j=1}^n x_{i,j}= 1 && i=1,\dots,n \\ & \text{ }x_{i,i}= 0 && i=1,\dots,n \\ & \text{ } u_i - u_j + Nx_{ij} \le N - 1 && i=1,\dots,n \ j=2,\dots,n\\ \end{aligned} $$

2.B. Model 2: Determine where to Stop for Refueling the RV

$\textbf{Model Type: Minimum Cost Network Flow}$

This model seeks to find the minimum cost of refueling between any source-sink combination provided to it. The model takes two inputs. The first is provided to it after a solution is found for the TSP described, above; the flow model takes the optimal national park tour order provided from the TSP. The second input is an adjacency matrix describing the relative distances between all TA and Petro fuel stations and national parks in the data set.

The tour provided by the TSP is iteratively looped through in order to obtain the source-sink combinations needed to solve each instance of the flow problem; the source is initially defined as the start point of the TSP, and the sink for this first iteration becomes the first national park stop on the tour. After the first iteration, the source updates to the national park that was the sink in the previous iteration, and the sink then updates to become the next destination on the tour. The model continues to iterate through the tour, replacing the source and sink as the current location and next destination, respectively, until the model returns to the initial start point and terminates

At each iteration, a route between different fuel stations is determined with respect to the fuel prices associated with each station. The fuel stations that are stopped at during each iteration are stored in order to obtain for the entire trip, including both national parks as well as fuel stops.

$\textbf{Parameters and Assumptions:}$

The parameters in this instance of the model include tanks size, RV fuel economy, distance between every combination of fuel stops and national parks, and an 8 month average cost per gallon of fuel at each TA and Petro in the country. The total cost of each arc were calculated by first converting travel distance to gallons required to travel between all nodes, $G_{i,j}$. This was accomplished by assuming a constant fuel economy. The gallons associated with traveling between any two locations was then further converted to a cost, $c_{i,j}$, in dollars by multiplying the cost per gallon associated with a specific fuel station with the number of gallons required to move from place to place. For this problem, I will be defined as i=1..n, and J will be defined as j=1..n.

$\textbf{Decision Variables:}$

Decision variables for the iterative flow problems represent each potential arc from source to sink in the network. All fuel stations are considered for each fuel station, regardless of their geographical location, in order to provide the most robust solution. As a result, there are decision variables for every arc in the entire data set. The flow from the source is fixed to 1, and the demand at the sink is fixed to 1 (represented as -1 in the model); this causes decision variables in the network to take on values of at most 1. The variables are fixed to be nonnegative in order to prevent negative flow across the arcs. The resulting solution yields strictly 1s and 0s for the values of the decision variables by exploiting the fact that network problems yield integer solutions if all values in the objective function and constraint set are also integer. If a decision variable takes on a value of 1, it represents flow moving through it.

$$x_{i,j} \ \ge 0 \ for \ i=1,..,n,\ j=1,..,n $$

$\textbf{Constraints:}$

Flow Balance Constraint: The first constraint set represents a net flow of 0 for every transient node found in the network. This is done in order to ensure that flow is forced all the way from the source to the sink.

$$ \text{ } \sum_{j=1}^J x_{i,j} - x_{j,i} = 0 \text{ } \ \forall \ i \in I = \{2,..,n\} $$

Supply Constraint: The second constraint represents a net supply of 1 coming from the source node. $$ \text{ } \sum_{j=1}^J x_{source,j} - \text{ } \sum_{j=1}^J x_{j,source} = 1 \text{ }$$

Sink Constraint: The third constraint represents a net demand of 1 (represented as -1) at the sink node. $$ \text{ } \sum_{j=1}^J x_{sink,j} - \text{ } \sum_{j=1}^J x_{j,sink} = -1 \text{ }$$

Fuel Constraint: The user cannot use more fuel than the tank can hold without filling up the tank. $G_{i,j}$ is set to 0 in the data pre-processing steps below if the fuel required to travel between nodes exceeds the tank size. This constraint enforces the decision variable for that arc to be 0 if it is not feasible to travel between the nodes, ensuring this arc is not included in the optimal route. $$ \text{ } x_{i,j} \le G_{i,j} \ \forall i \in I \ \forall j \in J$$

$\textbf{Objective:}$

The model seeks to minimize the total costs associated with traveling between the current location and the next destination.

$$ Minimize \text{ } \sum_{j=1}^n\sum_{i=1}^n c_{i,j}x_{i,j} $$

$\textbf{Full Model:}$ $$ \begin{aligned} \underset{x \in \mathbb{R^{n-1}}}{\text{minimize}}\qquad& \text{ } \sum_{j=1}^n\sum_{i=1}^n c_{i,j}x_{i,j} \\ \text{subject to:}\qquad& \text{ } \sum_{j=1}^J x_{source,j} - \text{ } \sum_{j=1}^J x_{j,source} = 1 \\ & \text{ } \sum_{j=1}^J x_{sink,j} - \text{ } \sum_{j=1}^J x_{j,sink} = -1 \\ & \text{ } \sum_{j=1}^J x_{i,j} - x_{j,i} = 0 && i=1,\dots,n\\ & \text{ } x_{i,j} \le G_{i,j} && i \in I \ j\in J\\ \end{aligned} $$

3. Problem Setup

3.A. Destination Nodes

Create an optimal route for an RV road trip to visit the following destinations while minimizing fuel cost:
  1. Madison, WI,
  2. Mesa Verde National Park,
  3. Congaree National Park,
  4. Yellowstone National Park,
  5. Acadia National Park,
  6. Zion National Park,
  7. Yosemite National Park,
  8. Grand Canyon National Park,
  9. Kings Canyon National Park,
  10. Cuyahoga Valley National Park,
  11. Glacier National Park,
  12. Rocky Mountain National Park,
  13. Mammoth Cave National Park,
  14. Theodore Roosevelt National Park,
  15. Crater Lake National Park,
  16. Big Bend National Park

3.B. User Input

Please input the information for your RV.

NOTE: If you change anything in this section, run the entire Data Definition section again before proceeding.


Change the variables below to reflect your average miles per gallon (mpg), size of your gas tank in gallons (tanksize), maximum desired drive time per leg in hours (maxDriveTime), desired starting point (startPoint), and park selection from the list above (selectParks). For startPoint and selectParks, type index numbers from above list. If it is a list of parks (selectParks), please enter each number only once and separate entries by commas.

In [1]:
# NOTE: Please run this model once with the given input data to accurately show Iterations 1 and 2, 
# then change input to observe various cases in Iterations 3 and 4.

# mpg will affect the output of Iterations 2, 3, and 4
mpg = 10 

# tankzise will affect the output Iterations 2, 3, and 4
tanksize = 125

# maxDriveTime will affect the output of Iterations 2, 3, and 4 
# note that the current base case (24 hours) provides an accurate representation of Iteration 2
maxDriveTime = 24

# startPoint will affect the output of Iteration 4 only
startPoint = 7

# selectParks will affect the output of Iteration 4 only
selectParks = [6,13,10,5,16]
;

Are you willing to refuel at National Parks? (enter "yes" or "no" for the decision variable below)

In [2]:
# decision will affect the output of Iterations 2, 3, and 4
# note that the base case of "yes" yields the results of Iteration 2
# changing decision to "no" will acheive the results of Iteration 3
# this was done to reduce the overall length of the code as these two iterations have a very similar structure
decision = "yes"
;

3.C. Data Definition

This section defines all data necessary for solving all iterations of this problem found in the Iterations and Results section below. This data includes necessary vectors, named arrays, matrices, etc. of geographic locations for each destination and gas station, along with the pairwise distance between each item in this network, forming adjacency matrices utilized throughout the problem. One important note is that the overall distance matrix, read in as a .csv file, was pulled from the Google Distance Matrix API so the distances represent true driving distance between all points in the network of destinations and gas stations. To better understand this data extraction procedure, please reference the Python code included in the data folder associated with this model. Costs were calculated using an 8 month average of gas prices for each TA and Petro station across the United States.

In [3]:
# Define the problem data (parks and costs)
using JuMP, NamedArrays, Cbc

parks =

[:"John's House",:"Mesa Verde National Park",:"Congaree National Park",:"Yellowstone National Park",:"Acadia National Park",
:"Zion National Park",:"Yosemite National Park",:"Grand Canyon National Park",:"Kings Canyon National Park",
:"Cuyahoga Valley National Park",:"Glacier National Park",:"Rocky Mountain National Park",:"Mammoth Cave National Park",
:"Theodore Roosevelt National Park",:"Crater Lake National Park",:"Big Bend National Park" ]

gasstations = 

[:"5045",:"5025",:"5036",:"5022",:"5002",:"5024",:"5013",:"5028",:"5037",:"5040",:"5003",:"5030",:"5005",:"5044",:"5020",
:"214",:"5009",:"5004",:"118",:"5039",:"5007",:"5017",:"189",:"5012",:"5047",:"5006",:"5041",:"5001",:"5032",:"186",:"5051",
:"5050",:"319",:"342",:"676",:"785",:"756",:"212",:"729",:"307",:"701",:"317",:"656",:"778",:"5073",:"435",:"433",:"120",
:"709",:"425",:"5014",:"528",:"5034",:"5038",:"5019",:"5048",:"5021",:"5035",:"5015",:"5033",:"424",:"5082",:"178",:"312",
:"502",:"5091",:"322",:"5072",:"439",:"104",:"675",:"755",:"421",:"723",:"713",:"384",:"768",:"310",:"187",:"210",:"217",
:"5071",:"5076",:"5070",:"198",:"5078",:"5018",:"5074",:"6153",:"5079",:"677",:"706",:"5084",:"429",:"770",:"417",:"431",
:"5053",:"6170",:"115",:"428",:"306",:"432",:"315",:"5087",:"604",:"605",:"213",:"725",:"5058",:"5060",:"5064",:"5065",
:"5067",:"510",:"504",:"512",:"102",:"5195",:"672",:"304",:"5199",:"6201",:"6202",:"6207",:"6208",:"6209",:"6210",:"6211",
:"6212",:"6213",:"6214",:"6215",:"6216",:"6218",:"6219",:"6220",:"6221",:"6224",:"6225",:"6226",:"6227",:"6228",:"6229",
:"6230",:"6231",:"6232",:"6233",:"6234",:"5085",:"6236",:"6237",:"6238",:"6239",:"6241",:"6242",:"6243",:"6244",:"6245",
:"6246",:"6247",:"6248",:"6249",:"6250",:"6251",:"6252",:"6253",:"6254",:"6255",:"6256",:"6257",:"6258",:"6259",:"6260",
:"6261",:"6262",:"6264",:"6265",:"6266",:"6301",:"6302",:"6303",:"6304",:"6305",:"6306",:"6307",:"6308",:"6309",:"6310",
:"6311",:"6312",:"6313",:"6314",:"6315",:"6316",:"6317",:"6318",:"6319",:"6320",:"6321",:"6322",:"6323",:"6324",:"6325",
:"6326",:"6327",:"6328",:"6329",:"6330",:"6331",:"6333",:"6336",:"6338",:"6339",:"6340",:"6343",:"6344",:"6345",:"6346",
:"6347",:"6348",:"6349",:"6350",:"6352",:"6354",:"6357",:"6359",:"6361",:"6362",:"6366",:"6367",:"6368",:"6369",:"6370",
:"6371",:"6372",:"6376",:"6377",:"6378",:"6379",:"6380",:"6382",:"6385",:"6386",:"6388",:"6389",:"6392",:"6393",:"6394",
:"5026",:"6396",:"6397",:"6398",:"6399",:"6402",:"6403",:"5086"]


masterlist = vcat(gasstations,parks)

distances = readcsv("National Parks distances.csv")

c = NamedArray(distances,(parks,parks))
N = size(distances,1);
N1 = size(masterlist,1);

latlong = readcsv("Master-Lat-Long.csv") # .csv with latitude and longitude in columns 2 and 3, and gas price in column 4
latlongsize = size(latlong,1)
parklatlong = latlong[(latlongsize-15):latlongsize, 2:3]
gaslatlong = latlong[2:(latlongsize-16), 2:3]
overallLatLong = latlong[2:latlongsize, 2:3]
gasprice = latlong[2:(latlongsize), 4] # gas prices in tenths of cents

# define dictionaries for use in helper functions
lat = Dict(zip(parks,parklatlong[:,1]))
lon = Dict(zip(parks,parklatlong[:,2]))

latg = Dict(zip(gasstations,gaslatlong[:,1]))
long = Dict(zip(gasstations,gaslatlong[:,2]))

latm = Dict(zip(masterlist,overallLatLong[:,1]))
lonm = Dict(zip(masterlist,overallLatLong[:,2]))
;
In [4]:
# we need a dictionary with output from TSP

# creating the distance matrix

using NamedArrays

raw = readcsv("All locations.csv")
(row,col) = size(raw)

n_demand = 2:col   # columns containing supply
n_supply = 2:row   # rows containing demand

demand = raw[1,n_demand][:]   # the list of nutrients (convert to 1-D array)
supply = raw[n_supply,1][:]   # the list of foods (convert to 1-D array)

mph = 55 # average speed for the RV, assumed to be constant
maxDriveTank = (maxDriveTime*mph)/mpg

# if max drive time limits our distance, set the associated gallons to be used as the tank size
if maxDriveTank < tanksize
    tanksize = maxDriveTank
end

# distance matrix from/to each node G = D/mpg # gallons of gas required from/to each node
D = NamedArray( raw[n_supply,n_demand], (supply,demand), ("supply node","demand node") ); 

G = D/mpg      # matrix of gallons
U1 = tanksize  # upper bound of gas after visiting first national park (changes during loop iterations)
U2 = tanksize  # upper bound of gas for all transient nodes
    
startGas = 3060 # price of gas to start, based on an average price of diesel in America
parkGas = 3750 # average gas price at national parks (note that it is higher than overall average)

I = 1:row-1 # i = j cuz square matrix
J = 1:row-1 # i = j

if decision == "yes"
    for i in 258:length(masterlist)
        gasprice[i] = parkGas
    end
end
C = zeros(size(G))
for i in I
    C[i,:] = G[i,:] * gasprice[i] # total dollars spent from/to each node, prices is a matrix of all fuel prices
end

allindices = Dict(zip(1:row-1,masterlist))
allindicesrev = Dict(zip(masterlist,1:row-1))
indices = Dict(zip(parks,258:row-1))
indicesrev = Dict(zip(258:row-1,parks))

# Error-proof the user input to ensure the park selection does not double count the start point
for i in 1:length(selectParks)-1
    if selectParks[i] == startPoint
        deleteat!(selectParks, i)
    end
end
if selectParks[length(selectParks)] == startPoint
    deleteat!(selectParks, length(selectParks))
end
parkSelection = vcat(startPoint, selectParks) # define cleaned vector of user's desired destinations
sortedParks = sort(parkSelection)
ordSortParks = vcat(startPoint, sort(selectParks))

# revise data to fit the user's selection of parks
parksDict = Dict(1=>0,2=>0,3=>0,4=>0,5=>0,6=>0,7=>0,8=>0,9=>0,10=>0,11=>0,12=>0,13=>0,14=>0,15=>0,16=>0)
for i = 1:length(sortedParks)
    park = sortedParks[i]
    parksDict[park] = 1
end

subdist = distances
count = 0
for i = 1:16
    if parksDict[i] == 0
        for j = 1:16
            subdist = deleteat!(subdist[1:end], 1+count)
        end
    else
        for j = 1:16
            if parksDict[j] == 0
                subdist = deleteat!(subdist[1:end], 1+count)
            else
                count = count + 1
            end
        end
    end
end

# find distance matrix with subset of destinations
sq = Int(sqrt(length(subdist)))
subdistances0 = zeros(sq,sq) # intermediate matrix for calculation
count = 1
for i in 1:sq
    for j in 1:sq
        subdistances0[i,j] = subdist[count]
        count = count + 1
    end
end

# reorder distance matrix to account for different start points
subdistances2 = flipdim(subdistances0,2)
indstart = findfirst(sortedParks,startPoint)
subdust = zeros(size(subdistances0)) # intermediate matrix for calculation
count = 0
for i in 1:length(sortedParks)
    if i == indstart
        subdust[1,:] = subdistances2[i,:]
    end
end
for i in 1:length(sortedParks)
    if i != indstart
        subdust[2+count,:] = subdistances2[i,:]
        count = count + 1
    end
end
subdistances = zeros(size(subdistances0))
for i in 1:length(sortedParks)
    for j in 1:length(sortedParks)
        if subdust[j,i] == 0
            subdistances[:,j] = subdust[:,i]
        end
    end
end

#redefine data for new TSP and flow problems with subset of data
parksnew = []
for i in 1:length(ordSortParks)
    push!(parksnew, (indicesrev[ordSortParks[i]+257]))
end
masterlistnew = vcat(gasstations,parksnew) 
cs = NamedArray(subdistances,(parksnew,parksnew)) # define new c matrix to be used in TSP with subset of parks
Nsub = size(subdistances,1);
;
In [5]:
# HELPER FUNCTION: returns the cycle containing the START.
function getSubtour(x,start)
    subtour = [start]
    while true
        j = subtour[end]
        for k in parks
            if x[k,j] == 1
                push!(subtour,k)
                break
            end
        end
        if subtour[end] == start
            break
        end
    end
    return subtour
end

# HELPER FUNCTION: returns a list of all cycles
function getAllSubtours(x)
    nodesRemaining = parks
    subtours = []
    while length(nodesRemaining) > 0
        subtour = getSubtour(x,nodesRemaining[1])
        push!(subtours, subtour)
        nodesRemaining = setdiff(nodesRemaining,subtour)
    end
    return subtours
end

# HELPER FUNCTION FOR MAPPING GEOGRAPHIC LOCATIONS

using PyPlot
using PyCall

@pyimport mpl_toolkits.basemap as basemap

function mapSolution(x=0)
    m=basemap.Basemap(projection="merc", resolution="l",llcrnrlat=23,llcrnrlon=-126,urcrnrlat=50,urcrnrlon=-65)
    m[:drawmapboundary](fill_color="#0971a5")
    m[:fillcontinents](color="#679411")

    # plot gas stations
    for i in gasstations
        m[:plot](long[i], latg[i], "k." ,markersize = 3,latlon=true)
    end
    
    # plot tours
    if x!=0
    tours = getAllSubtours(x)
        for t in tours
            L = length(t)-1
            for i in 1:L
                m[:drawgreatcircle](lon[t[i]],lat[t[i]],lon[t[i+1]],lat[t[i+1]],linewidth=1,color="b")
            end
        end
    end
    
    # plot parks
    for i in parks
        m[:plot](lon[i], lat[i], "ro" ,markersize = 5,latlon=true)
    end
    return
end
;
In [6]:
# HELPER FUNCTION: returns the cycle containing the START.
function getSubtour2(x,start)
    subtour = [start]
    while true
        j = subtour[end]
        for k in parksnew
            if x[k,j] == 1
                push!(subtour,k)
                break
            end
        end
        if subtour[end] == start
            break
        end
    end
    return subtour
end

# HELPER FUNCTION: returns a list of all cycles
function getAllSubtours2(x)
    nodesRemaining = parksnew
    subtours = []
    while length(nodesRemaining) > 0
        subtour = getSubtour2(x,nodesRemaining[1])
        push!(subtours, subtour)
        nodesRemaining = setdiff(nodesRemaining,subtour)
    end
    return subtours
end

# HELPER FUNCTION FOR MAPPING GEOGRAPHIC LOCATIONS

using PyPlot
using PyCall

@pyimport mpl_toolkits.basemap as basemap

function mapSolution2(x=0)
    m=basemap.Basemap(projection="merc", resolution="l",llcrnrlat=23,llcrnrlon=-126,urcrnrlat=50,urcrnrlon=-65)
    m[:drawmapboundary](fill_color="#0971a5")
    m[:fillcontinents](color="#679411")

    # plot tours
    if x!=0
    tours = getAllSubtours2(x)
        for t in tours
            L = length(t)-1
            for i in 1:L
                m[:drawgreatcircle](lon[t[i]],lat[t[i]],lon[t[i+1]],lat[t[i+1]],linewidth=1,color="b")
            end
        end
    end

    # plot gas stations
    for i in gasstations
        m[:plot](long[i], latg[i], "k." ,markersize = 3,latlon=true)
    end
    
    # plot parks
    for i in parks
        m[:plot](lon[i], lat[i], "ro" ,markersize = 5,latlon=true)
    end
    
    # plot subset of parks and starting point
    for i in parksnew
        if i == indicesrev[startPoint+257]
            m[:plot](lon[i], lat[i], "wo" ,markersize = 6,latlon=true)
        else
            m[:plot](lon[i], lat[i], "mo" ,markersize = 6,latlon=true)
        end
    end
    return
end 
;

3.D. Plot all Points before Solving

In [7]:
# Run the previous block in this notebook to load helper functions before continuing!
mapSolution(0)
xlabel("Plot showing the 15 national parks and John's house");
tight_layout()

Destinations, including national parks and Madison, WI, are shown as red points on this map, and all gas stations in the dataset are shown as smaller black points.

4. Iterations of Solutions and Results

4.A. Iteration 1

TSP Formulation with all 15 national parks


The first iteration of our problem solves a simple Traveling Saleman Problem (TSP) starting in Madison, WI and going to all 15 national parks on the list presented in the problem definition. The output of this section will be the optimal route for this trip, a total travel distance if traveling directly from park to park, and a plot of the given route on a map, utilizing great circle curves between each stop.

In [8]:
m = Model(solver = CbcSolver())
@variable(m, x[parks,parks], Bin)                                      # must formulate as IP this time
@constraint(m, c1[j in parks], sum( x[i,j] for i in parks ) == 1)      # one out-edge
@constraint(m, c2[i in parks], sum( x[i,j] for j in parks ) == 1)      # one in-edge
@constraint(m, c3[i in parks], x[i,i] == 0 )                           # no self-loops
@objective(m, Min, sum( x[i,j]*c[i,j] for i in parks, j in parks ))    # minimize total cost
                                    
# MTZ variables and constraints
@variable(m, u[parks])
@constraint(m, c4[i in parks, j in parks[2:end]], u[i] - u[j] + N*x[i,j] <= N-1 )

solve(m)

xx = getvalue(x)
length16TSP = getobjectivevalue(m)
subtours = getAllSubtours(xx)  # get cycle containing start
println("Tour length: ", length16TSP)

# pretty print the solution
xx = getvalue(x)
sol = NamedArray(zeros(Int,N,N),(parks,parks))
for i in parks
    for j in parks
        sol[i,j] = Int(xx[i,j])
    end
end
;
Tour length: 9980.0
In [9]:
# Plot the full 16 park TSP
println("Tour length: ", length16TSP, " miles.")
sleep(1)
mapSolution(sol)
xlabel("TSP for all 15 national parks")
tight_layout()
Tour length: 9980.0 miles.

Total tour length is displayed above, and the optimal tour between parks is displayed on the map as the blue line connecting each destination. This iteration assumes no stops for refueling between destinations, which is not a reasonable assumption and will be addressed in the next section.

4.B. Iteration 2

Embedded Flow Problem Formulation

This iteration takes the tour order output from the TSP found in iteration 1, and embeds minimum cost network flow problems between each ordered pair of destinations, treating the current location as the source and the next location as the sink. This runs through each step in the route until returning to the original start point of Madison, WI. The transient nodes in this network are the gas stations from the dataset and cost associated with each arc in the network is the distance times fuel price at each gas stop. We assumed constant fuel economy throughout this trip, so the costs represent the distance traveled at various gas prices.

In [10]:
using JuMP, Clp

ord = getAllSubtours(sol)
gassol=[(indicesrev[258])]
U1 = tanksize # reset beginning tank limit to full tank

for p in 1:(size(sol,1))
    
    source = indices[ord[1][p]]
    sink = indices[ord[1][(p + 1)]]
    
    G = D/mpg      # reset matrix of gallons
    
    # ensure that the distance traveled from source is feasible based on how much gas we have left from the previous trip
    for j in J
        if G[source,j] - U1 > 0 
            G[source,j] = 0 
        end
    end

    # ensure that the distance traveled after each gas stop is not greater than the max distance traveled on a full tank
    for i in 1:row-1
        if i != source
            for j in J
                if G[i,j] - U2 > 0 
                    G[i,j] = 0 
                end
            end
        end
    end
    
    m = Model(solver = ClpSolver())

    @variable(m, x[1:row-1,1:row-1] >= 0)

    for i in 1:length(masterlist)
        if i != source
            if i != sink
                @constraint(m, sum(x[i,j] - x[j,i] for j in J) == 0) # flow balance, all transient nodes have 0 demand
            end
        end
    end
    
    @constraint(m, sum(x[source,j] for j in J) - sum(x[j,source] for j in J) == 1) # source, S, has 1 unit of supply (shortest path problem)
    @constraint(m, sum(x[sink,j] for j in J) - sum(x[j,sink] for j in J) == -1) # sink, T, has 1 unit of demand (shortest path problem)        
        
    for i in I
        for j in J
            @constraint(m, x[i,j] <= G[i,j])
        end
    end
  
    @objective(m, Min, sum(x .* C))
    
    status = solve(m)
    
    (q,r,s) = findnz(getvalue(x))
    
    count = 0

    while count <= (length(q)-1)
        for i in 1:length(q)
            if q[i] == allindicesrev[gassol[length(gassol)]]
                push!(gassol, allindices[r[i]])
                count = count + 1
            end
        end
    end

    if decision == "no"
        U1 = tanksize - sum(G[i,sink] * getvalue(x[i,sink]) for i in I) # amount left for travel at beginning of the next leg
    end
        
end

println(gassol)
println()
if decision == "no"
    println("NOTE: This is the result for Iteration 3.")
else
    println("NOTE: This is the result for Iteration 2.")
end
String["John's House", "307", "5048", "Cuyahoga Valley National Park", "120", "6211", "Acadia National Park", "6211", "5045", "Congaree National Park", "6238", "5039", "Mammoth Cave National Park", "6347", "5041", "729", "Big Bend National Park", "6230", "5073", "6228", "5070", "Rocky Mountain National Park", "432", "Mesa Verde National Park", "Grand Canyon National Park", "Zion National Park", "6331", "421", "Kings Canyon National Park", "Yosemite National Park", "6170", "5051", "Crater Lake National Park", "5033", "Glacier National Park", "6256", "Yellowstone National Park", "6261", "Theodore Roosevelt National Park", "6361", "307", "John's House"]

NOTE: This is the result for Iteration 2.

This output string shows the optimal order including destination names and gas station ID numbers for all stations used along the way.

In [11]:
# Calculate total cost and distance

cost = startGas*G[allindicesrev[gassol[1]],allindicesrev[gassol[2]]]
distance = D[allindicesrev[gassol[1]],allindicesrev[gassol[2]]]

if decision == "no"
    for i in 2:length(gassol)-1
        if allindicesrev[gassol[i]] < 258
            cost = cost + gasprice[i]*G[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
            distance = distance + D[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
        else
            if allindicesrev[gassol[i-1]] < 258
                cost = cost + gasprice[allindicesrev[gassol[i-1]]]*G[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
                distance = distance + D[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
            else
                cost = cost + gasprice[allindicesrev[gassol[i-2]]]*G[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
                distance = distance + D[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
            end
        end
    end
else
    for i in 2:length(gassol)-1
        if allindicesrev[gassol[i]] < 258
            cost = cost + gasprice[allindicesrev[gassol[i]]]*G[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
            distance = distance + D[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
        else   
            cost = cost + parkGas*G[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
            distance = distance + D[allindicesrev[gassol[i]],allindicesrev[gassol[i+1]]]
        end
    end
end

cost = cost/1000 # convert back from tenths of cents to dollars

println("Total driving distance of trip = ", distance, " miles.")
println("Total gas cost = \$", cost)
println()
println("TSP distance was ", length16TSP, " miles.")
println("To stop for gas, the route deviates ", round(distance - length16TSP, 1), " miles from direct distance.")
Total driving distance of trip = 10271.3 miles.
Total gas cost = $2723.8303600000004

TSP distance was 9980.0 miles.
To stop for gas, the route deviates 291.3 miles from direct distance.
In [12]:
# PLOT THE FINAL ROUTE

using PyPlot
using PyCall

@pyimport mpl_toolkits.basemap as basemap

m=basemap.Basemap(projection="merc", resolution="l",llcrnrlat=23,llcrnrlon=-126,urcrnrlat=50,urcrnrlon=-65)
    m[:drawmapboundary](fill_color="#0971a5")
    m[:fillcontinents](color="#679411")
    
    # plot tours
        tours = []
        push!(tours, gassol) 
        for t in tours
            L = length(t)-1
            for i in 1:L
                m[:drawgreatcircle](lonm[t[i]],latm[t[i]],lonm[t[i+1]],latm[t[i+1]],linewidth=1,color="y")
            end
        end

    # plot gas stations
    for i in gasstations
        m[:plot](long[i], latg[i], "k." ,markersize = 3,latlon=true)
    end
    
    # plot parks
    for i in parks
        m[:plot](lon[i], lat[i], "ro" ,markersize = 5,latlon=true)
    end

xlabel("Final route for national parks with gas station stops")
tight_layout()

The above map shows the finalized travel route given the order of destinations from the TSP, with necessary fuel stops calculated by the embedded flow problems. The final route is depicted as a yellow line connecting both destinations (red points) and fuel stops (black points).

4.C. Iteration 3

Incorporate Maximum Drive Time and the Option to not Refuel at National Parks

This iteration builds off the previous iteration by incorporating the option to not refuel at national parks. This allows the driver to use their loyalty membership with TA/Petro for all refueling needs and avoid higher fuel costs at national park stations. The other element of this iteration includes a maximum drive time per step of the trip to allow the user to choose the maximum time they would like to drive before stopping.

The base information provided begins with a large enough drive time and decision called "yes" to refuel at parks, in order to ignore this iteration. To begin using Iteration 3, change the variables 'maxDriveTime' and 'decision' in the User Input section at the top, and rerun the Data Definition and Iterations 1 and 2 to account for the new information.

Please note that due to the sparsity of the gas station data, reducing the maxDriveTime below a certain point will make the model infeasible. This will be discussed more in the description of model limitations.

4.D. Iteration 4

TSP Formulation with Subset of Destinations

To provide the user with additional routing options, this iteration allows the user to choose any subset combination of destinations, including the option to choose a starting point in the destinations network other than the base case of Madison, WI.

In [13]:
m = Model(solver = CbcSolver())
@variable(m, x[parksnew,parksnew], Bin)                                      # must formulate as IP this time
@constraint(m, c1[j in parksnew], sum( x[i,j] for i in parksnew ) == 1)      # one out-edge
@constraint(m, c2[i in parksnew], sum( x[i,j] for j in parksnew ) == 1)      # one in-edge
@constraint(m, c3[i in parksnew], x[i,i] == 0 )                           # no self-loops
@objective(m, Min, sum( x[i,j]*cs[i,j] for i in parksnew, j in parksnew ))    # minimize total cost
                                    
# MTZ variables and constraints
@variable(m, u[parksnew])
@constraint(m, c4[i in parksnew, j in parksnew[2:end]], u[i] - u[j] + Nsub*x[i,j] <= Nsub-1 )

solve(m)

xx = getvalue(x)
lengthsubTSP = getobjectivevalue(m)
subtours = getAllSubtours2(xx)  # get cycle containing start
println("Tour length: ", lengthsubTSP)

# pretty print the solution
xx = getvalue(x)
solsub = NamedArray(zeros(Int,Nsub,Nsub),(parksnew,parksnew))
for i in parksnew
    for j in parksnew
        solsub[i,j] = Int(xx[i,j])
    end
end
;
Tour length: 7495.000000000001
In [14]:
# Plot the subset TSP
println("Tour length: ", lengthsubTSP, " miles.")
sleep(1)
mapSolution2(solsub)
xlabel("TSP for subset of national parks")
tight_layout()
println("Note: The white point indicates the chosen starting location and magenta points represent the rest of the chosen subset.")
Tour length: 7495.000000000001 miles.
Note: The white point indicates the chosen starting location and magenta points represent the rest of the chosen subset.

Total tour length is displayed above, and the optimal tour between parks is displayed on the map as the blue line connecting each destination. This is the input to the the next section, which accounts for fuel stops.

Embedded Flow Problem Formulation with Subset of Destinations for Iteration 4

In [15]:
using JuMP, Clp

ord1 = getAllSubtours2(solsub)
gassol1=[(indicesrev[startPoint+257])]
U1 = tanksize # reset beginning tank limit to full tank

for p in 1:(size(solsub,1))
    
    source = indices[ord1[1][p]]
    sink = indices[ord1[1][(p + 1)]]
    
    G = D/mpg      # reset matrix of gallons
    
    # ensure that the distance traveled from source is feasible based on how much gas we have left from the previous trip
    for j in J
        if G[source,j] - U1 > 0 
            G[source,j] = 0 
        end
    end

    # ensure that the distance traveled after each gas stop is not greater than the max distance traveled on a full tank
    for i in 1:row-1
        if i != source
            for j in J
                if G[i,j] - U2 > 0 
                    G[i,j] = 0 
                end
            end
        end
    end
    
    m = Model(solver = ClpSolver())

    @variable(m, x[1:row-1,1:row-1] >= 0)

    for i in 1:length(masterlist)
        if i != source
            if i != sink
                @constraint(m, sum(x[i,j] - x[j,i] for j in J) == 0) # flow balance, all transient nodes have 0 demand
            end
        end
    end
    
    @constraint(m, sum(x[source,j] for j in J) - sum(x[j,source] for j in J) == 1) # source, S, has 1 unit of supply (shortest path problem)
    @constraint(m, sum(x[sink,j] for j in J) - sum(x[j,sink] for j in J) == -1) # sink, T, has 1 unit of demand (shortest path problem)        
        
    for i in I
        for j in J
            @constraint(m, x[i,j] <= G[i,j])
        end
    end
  
    @objective(m, Min, sum(x .* C))
    
    status = solve(m)
    
    (q,r,s) = findnz(getvalue(x))
    
    count = 0

    while count <= (length(q)-1)
        for i in 1:length(q)
            if q[i] == allindicesrev[gassol1[length(gassol1)]]
                push!(gassol1, allindices[r[i]])
                count = count + 1
            end
        end
    end

    if decision == "no"
        U1 = tanksize - sum(G[i,sink] * getvalue(x[i,sink]) for i in I) # amount left for travel at beginning of the next leg
    end
        
end

println(gassol1)
String["Yosemite National Park", "429", "417", "6225", "5003", "6350", "Big Bend National Park", "6230", "713", "729", "Mammoth Cave National Park", "6330", "5072", "Cuyahoga Valley National Park", "120", "6211", "Acadia National Park", "6211", "5015", "528", "512", "Zion National Park", "6331", "421", "Yosemite National Park"]

This output string shows the optimal order including destination names and gas station ID numbers for all stations used along the way.

In [16]:
# Calculate total cost and distance

cost = startGas*G[allindicesrev[gassol1[1]],allindicesrev[gassol1[2]]]
distance = D[allindicesrev[gassol1[1]],allindicesrev[gassol1[2]]]

if decision == "no"
    for i in 2:length(gassol1)-1
        if allindicesrev[gassol1[i]] < 258
            cost = cost + gasprice[i]*G[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
            distance = distance + D[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
        else
            if allindicesrev[gassol1[i-1]] < 258
                cost = cost + gasprice[allindicesrev[gassol1[i-1]]]*G[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
                distance = distance + D[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
            else
                cost = cost + gasprice[allindicesrev[gassol1[i-2]]]*G[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
                distance = distance + D[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
            end
        end
    end
else
    for i in 2:length(gassol1)-1
        if allindicesrev[gassol1[i]] < 258
            cost = cost + gasprice[allindicesrev[gassol1[i]]]*G[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
            distance = distance + D[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
        else   
            cost = cost + parkGas*G[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
            distance = distance + D[allindicesrev[gassol1[i]],allindicesrev[gassol1[i+1]]]
        end
    end
end

cost = cost/1000 # convert back from tenths of cents to dollars

println("Total driving distance of trip = ", distance, " miles.")
println("Total gas cost = \$", cost)
println()
println("TSP distance was ", lengthsubTSP, " miles.")
println("To stop for gas, the route deviates ", round(distance - lengthsubTSP, 1), " miles from direct distance.")
Total driving distance of trip = 7532.8 miles.
Total gas cost = $1869.17158

TSP distance was 7495.000000000001 miles.
To stop for gas, the route deviates 37.8 miles from direct distance.
In [17]:
# PLOT THE FINAL ROUTE

using PyPlot
using PyCall

@pyimport mpl_toolkits.basemap as basemap

m=basemap.Basemap(projection="merc", resolution="l",llcrnrlat=23,llcrnrlon=-126,urcrnrlat=50,urcrnrlon=-65)
    m[:drawmapboundary](fill_color="#0971a5")
    m[:fillcontinents](color="#679411")
    
    # plot tours
        tours = []
        push!(tours, gassol1) 
        for t in tours
            L = length(t)-1
            for i in 1:L
                m[:drawgreatcircle](lonm[t[i]],latm[t[i]],lonm[t[i+1]],latm[t[i+1]],linewidth=1,color="y")
            end
        end

    # plot gas stations
    for i in gasstations
        m[:plot](long[i], latg[i], "k." ,markersize = 3,latlon=true)
    end
    
    # plot parks
    for i in parks
        m[:plot](lon[i], lat[i], "ro" ,markersize = 5,latlon=true)
    end

    # plot subset of parks and starting point
    for i in parksnew
        if i == indicesrev[startPoint+257]
            m[:plot](lon[i], lat[i], "wo" ,markersize = 6,latlon=true)
        else
            m[:plot](lon[i], lat[i], "mo" ,markersize = 6,latlon=true)
        end
    end

xlabel("Final route for subset of destinations with gas station stops")
tight_layout()
println("Note: The white point indicates the chosen starting location and magenta points represent the rest of the chosen subset.")
Note: The white point indicates the chosen starting location and magenta points represent the rest of the chosen subset.

The above map shows the finalized travel route given the order of destinations from the TSP, with necessary fuel stops calculated by the embedded flow problems. The final route is depicted as a yellow line connecting both destinations and fuel stops (black points).

5. Conclusion

5.A. Summary of Findings

The above analysis provides interesting routing information for traveling around the United States. Our team found that a simple TSP was not sufficient to provide realistic routing information as fuel stops are naturally required during a large scale trip. By leveraging embedded min cost network flow problems in concert with a TSP, we were able to account for fuel stops on an optimal travel route, with real driving distances. Since refueling is necessary and brings a traveler off of a direct route, we observed that the result when incorporating fuel stops deviated slightly from the distance given by the TSP. As expected, the route including fuel stops was longer than direct distances, but accounted for true driving distance and fuel costs. In the final model, we found that the effect of various starting points was minimal when using the same subset of destinations. The only impact of differing starting points occurs in the case where travelers do not refuel at national parks, allowing for varying fuel amounts remaining in the gas tank in different steps of the route, depending on where the trip began.

Ultimately, this model provided an informed decision of where to stop for gas. The results were intuitive but represented a model that would be too complex to solve with brute force. The problem demonstrates the utility of optimization models for routing problems with multiple stops. Creating a TSP for the entire network of gas stations and destinations would yield a vastly complex, computationally expensive problem with impractical run times. The use of a simplified TSP with embedded network flow problems allows the model to integrate linear subproblems with the results of a reasonably scoped integer program.

5.B. Limitations

Due to the sparsity of the fuel station data, the maximum drive time variable was not able to be used to its full potential, since shortening routes too much would make the model infeasible for parks that are isolated in areas with few gas stations. For the same reason, the gas tank size cannot be made too small without the model becoming infeasible.

This model assumes constant fuel economy, which is a limitation to the result since the RV would not achieve the same miles per gallon for the entire trip. The model also ignores the change in gas prices over time, so the total cost estimate will not be exact for the true time horizon of the trip. This does not affect the final tour suggestion, however, because we proved relative pricing structures to be constant over time (i.e. the cheapest gas stations always have the cheapest price as proven by the graph in the Introduction Section).

The model does not account for real time traffic and accident data that could affect drive time and route if roads are backed up or closed. Additionally, the formulation only includes a specific subset of national parks due to our team's preference, but other users may want to consider other destinations with this type of model.

5.C. Future Direction

Although our team created variations of this problem, an immediate next step would be to allow more national parks to be considered than the subset that was worked with in this problem. The TSP model currently solves the problem relatively quickly with 16 nodes; although the TSP is an NP complete problem, we feel that more parks could be potentially introduced to the model without hurting run time too much. We most likely would not be able to easily solve a problem with all 59 parks, but the current subset could be expanded upon.

Another future step that our team considered was to introduce additional fuel stop locations outside of what is available from TA and Petro. Currently, we only have price data for these fuel stations, and as a result, we were limited to creating our flow network for refueling at these specific locations. If we were able to find more price data for other fuel stations, we could potentially obtain more realistic results with an expanded amount of fuel stations available for the model to select.

Finally, although this step would be more difficult, our team feels that it would be very useful to develop a dynamic adjacency matrix that allows a user to choose their own start location, where all of the relative distances to both national parks and fuel stations would be calculated in reference to this location. This would drastically expand usability of the model and allow users to determine their optimal route, even if their start point is not naturally located near the national park nodes.