By Tyler Ransom, Duke University Social Science Research Institute

Let's examine how to use a new `Julia` package called `JuMP` to illustrate some of Julia's powerful resources for optimizing objective functions of all kinds.

For an overview of the `JuMP` package, see the `JuMP` documentation. Essentially, `JuMP` makes use of `Julia`'s macros to greatly simplify the construction and optimization of various problems. Here are some features of `JuMP`:

- interfaces seamlessly with many industry-grade solvers (e.g. Ipopt and other AMPL-compatible solvers)
- can be used to solve linear programming, nonlinear programming, and many other types of problems (including constrained optimization)
- automatically differentiates the objective function (automatic, *not* numerical, differentiation), resulting in speed gains
- user-friendly model construction: the user simply writes the objective function and any constraints in mathematical notation; `JuMP` then translates this into fast code at compile time

In this illustration, we will use `JuMP` for a nonlinear programming problem (namely, maximum likelihood estimation). We will be using the open-source optimizer `Ipopt` (pronounced eye-PEE-opt).

So let's load the packages we'll need and get started!

In [ ]:

```
using JuMP
using Ipopt
```

Now let's create some data inside of a function:

In [1]:

```
function datagen(N,T)
    ## Generate data for a linear model to test optimization
    srand(1234)
    N = convert(Int64,N) # inputs to functions such as -ones- need to be integers!
    T = convert(Int64,T) # inputs to functions such as -ones- need to be integers!
    n = convert(Int64,N*T) # (-const- is only valid at global scope, so we return these values below instead)
    # generate the Xs
    X = cat(2,ones(N*T,1),5+3*randn(N*T,1),rand(N*T,1),
            2.5+2*randn(N*T,1),15+3*randn(N*T,1),
            .7-.1*randn(N*T,1),5+3*randn(N*T,1),
            rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1),
            5+3*randn(N*T,1),rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1));
    # generate the betas (X coefficients)
    bAns = [ 2.15; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2 ]
    # generate the std dev of the errors
    sigAns = 0.3
    # generate the Ys from the Xs, betas, and error draws
    draw = 0 + sigAns*randn(N*T,1)
    y = X*bAns+draw
    # return generated data so that other functions (below) have access
    return X,y,bAns,sigAns,n
end
```


Now let's evaluate the function so that the function outputs are in the current scope:

In [3]:

```
X,y,bAns,sigAns,n = datagen(1e4,5);
```

We can estimate the `bAns` parameter vector using OLS:

In [4]:

```
bHat = (X'*X)\(X'*y);
sigHat = sqrt((y-X*bHat)'*(y-X*bHat)/(n-size(X,2)));
```

And now we can compare the estimates with the solution:

In [5]:

```
[round([bHat;sigHat],3) [bAns;sigAns]]
```


The estimator performs well.

## Maximum likelihood estimation with `JuMP`

Now let's estimate this model using maximum likelihood under the (valid) assumption that the errors in our `datagen()` function are independently drawn from a normal distribution with mean 0 and a standard deviation to be estimated.

Recall that the likelihood function for this scenario is: $$ L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \left( -\frac{\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}}\right) $$

or, more simply,

$$ L = \prod_{i=1}^{N} \frac{1}{\sigma} \phi \left(\frac{y_{i}-x_{i}\beta}{\sigma}\right) $$

where $\phi\left(\cdot\right)$ is the probability density function of the standard normal distribution.

In maximum likelihood estimation, we search for the parameters that maximize the log likelihood function. Taking logs, our function above becomes:

$$ \ell = \frac{N}{2}\log\left(\frac{1}{2\pi\sigma^{2}}\right) - \sum_{i=1}^{N}\frac{\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}} $$

## `JuMP` syntax

`JuMP` requires the following syntax elements:

- model name
- solver
- variable (i.e. a parameter to search over)
- objective function
- `solve()` call

We will construct these line-by-line below.

First, let's declare the model name and solver:

In [6]:

```
myMLE = Model(solver=IpoptSolver(tol=1e-6));
```

Now let's tell `JuMP` which variables to search over. For the example above, these are the vector $\beta$ and the scalar $\sigma$.

In [7]:

```
@defVar(myMLE, b[i=1:size(X,2)]);
@defVar(myMLE, s>=0.0);
```

Now we will write the objective function, which is the log likelihood function from above. We will also tell `JuMP` that we are maximizing.

In [8]:

```
@setNLObjective(myMLE, Max, (n/2)*log(1/(2*pi*s^2))-sum{(y[i]-sum{X[i,k]*b[k],
k=1:size(X,2)})^2, i=1:size(X,1)}/(2s^2));
```

Note that we specify the summation term over $i$ using `sum{ ..., i=1:size(X,1)}`. We also have to specify the linear algebra implied by `X*b` as another `sum`, this time over $k$, because `JuMP` doesn't understand matrix multiplication. This last fact adds a little bit of syntax to the objective function, but overall it is still very readable.
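To make the correspondence explicit, the matrix product is simply expanded element-by-element: for observation $i$ and $K$ regressors,

$$ \left(X\beta\right)_{i} = \sum_{k=1}^{K} X_{ik}\,\beta_{k} $$

which is exactly what the inner `sum{X[i,k]*b[k], k=1:size(X,2)}` term expresses.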

Now that we have all of our elements set up, we can issue a `solve()` call and ask `JuMP` to perform the optimization:

In [9]:

```
status = solve(myMLE)
```


We see that the solver finished at a solution that is a local maximum. To store the estimated values of the parameters, we can use the `getValue()` function, applied to each parameter separately:

In [10]:

```
bOpt = getValue(b[:]);
sOpt = getValue(s);
```

And if we compare them with the solution, we see that we have indeed obtained the correct parameter estimates:

In [11]:

```
[round([bOpt;sOpt],3) [bAns;sigAns]]
```


We can also return other helpful values such as the objective function value at the optimum, which is the log likelihood of the estimated model.

In [12]:

```
getObjectiveValue(myMLE)
```


Additionally, we can obtain the Hessian matrix of the model, which serves as a key component to the variance-covariance matrix that is used for statistical inference. The syntax for this is a bit messy, but I will put it below for reference:

In [13]:

```
this_par = myMLE.colVal
m_eval = JuMP.JuMPNLPEvaluator(myMLE);
MathProgBase.initialize(m_eval, [:ExprGraph, :Grad, :Hess])
hess_struct = MathProgBase.hesslag_structure(m_eval)
hess_vec = zeros(length(hess_struct[1]))
numconstr = length(m_eval.m.linconstr) + length(m_eval.m.quadconstr) + length(m_eval.m.nlpdata.nlconstr)
dimension = length(myMLE.colVal)
MathProgBase.eval_hesslag(m_eval, hess_vec, this_par, 1.0, zeros(numconstr))
this_hess_ld = sparse(hess_struct[1], hess_struct[2], hess_vec, dimension, dimension)
hOpt = this_hess_ld + this_hess_ld' - sparse(diagm(diag(this_hess_ld)));
hOpt = -full(hOpt); #since we are maximizing
seOpt = sqrt(diag(full(hOpt)\eye(size(hOpt,1))));
```

The above code obtains the estimated Hessian matrix, takes its negative (because we are maximizing), and then computes standard errors of our parameter estimates from the diagonal elements of the inverse of the negative Hessian.
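In equation form, the last two lines of the code compute

$$ \widehat{V} = \left(-H\right)^{-1}, \qquad \widehat{\mathrm{se}}\left(\hat{\theta}_{j}\right) = \sqrt{\widehat{V}_{jj}} $$

where $H$ is the Hessian of the log likelihood at the optimum and $\hat{\theta}=\left(\hat{\beta},\hat{\sigma}\right)$ is the stacked parameter vector.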

In [14]:

```
[round([bOpt;sOpt],3) round(seOpt,3)]
```


We can compare these with the OLS standard errors and confirm that they are the same:

In [15]:

```
seHat = sqrt((sigHat^2).*diag((X'*X)\eye(size(X,2))));
[round(bHat,3) round(seHat,3)]
```


Constraints can easily be added to the objective function. There are two ways of doing this:

- Directly in the `@defVar` statement; e.g. `@defVar(myMLE, s>=0.0);` restricts `s` to be weakly positive.
- As additional lines after the set of `@defVar` statements, but before the `@setNLObjective` statement. The syntax for this is, e.g., `@addConstraint(myMLE, b[15] == 0)`, which would restrict the 15th element of the `b` vector to equal 0.

`JuMP` accepts literally thousands of constraints, and its simple syntax makes coding them much easier.
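As a concrete sketch (using the same pre-1.0 `JuMP` syntax as the rest of this post, with the hypothetical model name `myMLErestr`), a restricted version of our model that pins the 15th coefficient to zero would look like:

```
# Hypothetical restricted model: same likelihood as before,
# but with b[15] constrained to equal 0
myMLErestr = Model(solver=IpoptSolver(tol=1e-6))
@defVar(myMLErestr, b[i=1:size(X,2)])
@defVar(myMLErestr, s>=0.0)             # inequality constraint inside @defVar
@addConstraint(myMLErestr, b[15] == 0)  # equality constraint on its own line
@setNLObjective(myMLErestr, Max, (n/2)*log(1/(2*pi*s^2))-sum{(y[i]-sum{X[i,k]*b[k],
                k=1:size(X,2)})^2, i=1:size(X,1)}/(2s^2))
solve(myMLErestr)
```

Comparing `getObjectiveValue()` between this restricted model and the unrestricted one would then give the ingredients for a likelihood ratio test.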

Pros of `JuMP`:

- Versatile modeling language that takes advantage of `Julia`'s features.
- Interfaces with a variety of optimizers, meaning the user can simply call a different optimizer without adjusting any underlying code.
- Very fast: roughly 3+ times faster than `Matlab`'s `fminunc` for a simple large-scale problem.
- The user can specify maximization or minimization on the fly without needing to re-express the objective function.

Cons of `JuMP`:

- `JuMP` performs constrained optimization whenever the user specifies constraints on any of the parameters. As a result, the Hessian it returns is the Hessian of the corresponding *Lagrangian*, not the Hessian of the *objective*. This makes inference a bit trickier.
- No matrix multiplication is allowed, so users must specify matrix operations in scalar (summation) form.