Department of Economics, University of Tokyo
From Miranda and Fackler, Applied Computational Economics and Finance, 2002, Section 7.6.3
Julia translation of the Python version
The model is formulated with finite horizon in Section 7.2.1, but solved with infinite horizon in Section 7.6.1. Here we follow the latter.
using QuantEcon
using Plots
plotlyjs();
Plotly javascript loaded.
To load again call
init_notebook(true)
price = 1 # Market price of ore
sbar = 100 # Upper bound of ore stock
beta = 0.9 # Discount rate
n = sbar + 1 # Number of states
m = sbar + 1 # Number of actions
# Cost function
c(s, x) = (x^2) / (1+s);
This approch sets up the reward array R and the transition probability array Q as a 2-dimensional array of shape (n, m) and a 3-simensional array of shape (n, m, n), respectively, where the reward is set to $−∞$ for infeasible state-action pairs (and the transition probability distribution is arbitrary for those pairs).
Reward array:
R = Matrix{Float64}(n, m)
for j in 1:m
for i in 1:n
if j <= i
R[i, j] = price*(j-1) -c(i-1, j-1)
else
R[i, j] = -Inf
end
end
end
(Degenerate) transition probability array:
Q = zeros(n, m, n)
for j in 1:m
for i in 1:n
if j <= i
Q[i, j, i-j+1] = 1.0
else
Q[i, j, 1] = 1.0
end
end
end
Set up the dynamic program as a DiscreteDP instance:
ddp = DiscreteDP(R, Q, beta);
Solve the optimization problem with the solve method, which by defalut uses the policy iteration algorithm:
# Solve the dynamic optimization problem (by policy iteration)
res = solve(ddp, PFI);
The number of iterations:
res.num_iter
The controlled Markov chain is stored in res.mc. To simulate:
res.mc.state_values = 0:100
0:100
# Simulate the controlled Markov chain
nyrs = 15
spath = simulate(res.mc, nyrs+1, init = sbar + 1)
16-element Array{Int64,1}: 100 76 58 44 33 25 19 14 11 8 6 4 3 2 1 0
p1 = plot(0:sbar, res.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2 = plot(0:sbar, res.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3 = plot(0:nyrs+1, spath, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1, p2, p3, legend = false)
This approach assigns the rewards and transition probabilities only to feaslbe state-action pairs, setting up R and Q as a 1-dimensional array of length L and a 2-dimensional array of shape (L, n), respectively.
We need the arrays of feasible state and action indices:
# Arrays of feasible state and action indices
s_indices = []
a_indices = []
for j in 1:m
for i in 1:n
if i - j >= 0
push!(s_indices, i)
push!(a_indices, j)
end
end
end
L = length(s_indices)
s_indices
5151-element Array{Any,1}: 1 2 3 4 5 6 7 8 9 10 11 12 13 ⋮ 100 101 98 99 100 101 99 100 101 100 101 101
a_indices
5151-element Array{Any,1}: 1 1 1 1 1 1 1 1 1 1 1 1 1 ⋮ 97 97 98 98 98 98 99 99 99 100 100 101
Reward vector:
R_sa = zeros(L)
for (i , (s, x)) in enumerate(zip(s_indices.-1, a_indices.-1))
R_sa[i] = price*x - c(s, x)
end
(Degenerate) transition probability array:
Q_sa = Matrix{Float64}(L,n)
for (i, s) in enumerate(s_indices)
x = a_indices[i]
Q_sa[i, s-x+1] = 1
end
Set up the dynamic program:
ddp_sa = DiscreteDP(R_sa, Q_sa, beta, s_indices, a_indices);
Solve the optimization problem with the solve method, which by defalut uses the policy iteration algorithm:
res_sa = solve(ddp_sa, PFI);
Number of iterations:
res_sa.num_iter
Simulate the controlled markov chain:
res_sa.mc.state_values = 0:100
nyrs = 15
spath_sa = simulate(res_sa.mc,nyrs+1,init = sbar + 1)
16-element Array{Int64,1}: 100 76 58 44 33 25 19 14 11 8 6 4 3 2 1 0
Draw the graphs:
p1_sa = plot(0:sbar, res_sa.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2_sa = plot(0:sbar, res_sa.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3_sa = plot(0:nyrs+1, spath_sa, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1_sa, p2_sa, p3_sa, legend=false)