By Tyler Ransom, Duke University Social Science Research Institute

Let's examine how to use a new `Julia` package called `JuMP` to illustrate some of Julia's powerful resources for optimizing objective functions of all kinds.

For an overview of the `JuMP` package, see here. Essentially, `JuMP` makes use of `Julia`'s macros to greatly simplify the construction and optimization of various problems.

Here are some features of `JuMP`:

- interfaces seamlessly with many industry-grade solvers (e.g. AMPL)
- can be used to solve linear programming, nonlinear programming, and many other types of problems (including constrained optimization)
- automatically differentiates the objective function (*not* numerical differentiation), resulting in speed gains
- user-friendly model construction: the user simply writes the objective function and any constraints in mathematical notation; `JuMP` then translates this into binaries at compile time

In this illustration, we will use `JuMP` for a nonlinear programming problem: maximum likelihood estimation. We will be using the open-source optimizer `Ipopt` (pronounced eye-PEE-opt).

So let's load the packages we'll need and get started!

In [ ]:

```
using JuMP
using Ipopt
```

Now let's create some data inside of a function:

In [1]:

```
function datagen(N,T)
    ## Generate data for a linear model to test optimization
    srand(1234)
    N = convert(Int64,N) # inputs to functions such as -ones- need to be integers!
    T = convert(Int64,T)
    n = convert(Int64,N*T)
    # generate the Xs
    X = cat(2,ones(N*T,1),5+3*randn(N*T,1),rand(N*T,1),
            2.5+2*randn(N*T,1),15+3*randn(N*T,1),
            .7-.1*randn(N*T,1),5+3*randn(N*T,1),
            rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1),
            5+3*randn(N*T,1),rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1));
    # generate the betas (X coefficients)
    bAns = [ 2.15; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2 ]
    # generate the std dev of the errors
    sigAns = 0.3
    # generate the Ys from the Xs, betas, and error draws
    draw = 0 + sigAns*randn(N*T,1)
    y = X*bAns+draw
    # return the generated data so that other functions (below) have access
    return X,y,bAns,sigAns,n
end
```


Now let's evaluate the function so that the function outputs are in the current scope:

In [3]:

```
X,y,bAns,sigAns,n = datagen(1e4,5);
```

We can estimate the `bAns` parameter vector using OLS:

In [4]:

```
bHat = (X'*X)\(X'*y);
sigHat = sqrt((y-X*bHat)'*(y-X*bHat)/(n-size(X,2)));
```

And now we can compare the estimates with the solution:

In [5]:

```
[round([bHat;sigHat],3) [bAns;sigAns]]
```


The estimator performs well.
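As an aside, the same experiment can be sketched in Python with NumPy (an illustrative translation with a smaller design matrix, not the notebook's Julia code):

```python
import numpy as np

def datagen(n, seed=1234):
    """Simulate y = X*b + noise for a small linear model (illustrative)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n),                      # intercept
                         5 + 3 * rng.standard_normal(n),  # N(5, 3) regressor
                         rng.random(n)])                  # U(0, 1) regressor
    b_ans = np.array([2.15, 0.10, 0.50])                  # true coefficients
    sig_ans = 0.3                                         # true error std dev
    y = X @ b_ans + sig_ans * rng.standard_normal(n)
    return X, y, b_ans, sig_ans

X, y, b_ans, sig_ans = datagen(50_000)
# OLS: bHat = (X'X)^{-1} X'y; sigma estimated from the residuals
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat
sig_hat = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
print(np.round(b_hat, 3), round(sig_hat, 3))
```

With 50,000 observations the OLS estimates land very close to the true parameter values, just as in the Julia version above.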

## Maximum likelihood using `JuMP`

Now let's estimate this model by maximum likelihood under the (valid) assumption that the errors in our `datagen()` function are independently drawn from a normal distribution with mean 0 and a standard deviation to be estimated.

Recall that the likelihood function for this scenario is: $$ L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \left( -\frac{\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}}\right) $$

or, more simply,

$$ L = \prod_{i=1}^{N} \frac{1}{\sigma} \phi \left(\frac{y_{i}-x_{i}\beta}{\sigma}\right) $$

where $\phi\left(\cdot\right)$ is the probability density function of the standard normal distribution.

In maximum likelihood estimation, we search for the parameters that maximize the log likelihood function. Our function above becomes:

$$ \ell = \frac{N}{2}\log\left(\frac{1}{2\pi\sigma^{2}}\right) - \frac{\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}} $$
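To see concretely that maximizing this log likelihood reproduces the OLS estimates in the linear-normal model, here is a small sketch using Python and SciPy (an illustration with made-up simulated data, not the `JuMP` code that follows):

```python
import numpy as np
from scipy.optimize import minimize

# Simulate a small linear-normal model
rng = np.random.default_rng(0)
n = 2_000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(n)

def neg_loglik(theta):
    # theta = [beta..., log(sigma)]; log-parameterizing sigma keeps it positive
    b, s = theta[:-1], np.exp(theta[-1])
    resid = y - X @ b
    return -((n / 2) * np.log(1 / (2 * np.pi * s**2)) - resid @ resid / (2 * s**2))

res = minimize(neg_loglik, np.zeros(3), method="BFGS")
b_mle, s_mle = res.x[:-1], np.exp(res.x[-1])

# The MLE of beta coincides with OLS; the MLE of sigma divides by n, not n-k
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
s_unadj = np.sqrt(np.mean((y - X @ b_ols) ** 2))
```

The maximizer recovers the OLS coefficients; the only difference is the degrees-of-freedom adjustment in the variance estimate.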

## `JuMP` syntax

`JuMP` requires the following syntax elements:

- model name
- solver
- variable (i.e. a parameter to search over)
- objective function
- `solve()` call

We will construct these line-by-line below.

First, let's declare the model name and solver:

In [6]:

```
myMLE = Model(solver=IpoptSolver(tol=1e-6));
```

Next, we need to tell `JuMP` which variables to search over. For the example above, these are the vector $\beta$ and the scalar $\sigma$.

In [7]:

```
@defVar(myMLE, b[i=1:size(X,2)]);
@defVar(myMLE, s>=0.0);
```

Now we write out the objective function and tell `JuMP` that we are maximizing it.

In [8]:

```
@setNLObjective(myMLE, Max, (n/2)*log(1/(2*pi*s^2))-sum{(y[i]-sum{X[i,k]*b[k],
k=1:size(X,2)})^2, i=1:size(X,1)}/(2s^2));
```

Note that the summation over observations is written as `sum{ ,i=1:size(X,1)}`. We also have to specify the linear algebra implied by `X*b` as another `sum`, this time over $k$. This is because `JuMP` doesn't understand matrix multiplication. This last fact adds a little bit of syntax to the objective function, but overall it is very readable.
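The scalar-sum form is algebraically identical to the matrix expression $(n/2)\log(1/(2\pi s^2)) - (y - Xb)'(y - Xb)/(2s^2)$; a quick numerical check in Python (illustrative, with arbitrary values):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 4
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)
b = rng.standard_normal(k)
s = 0.5

# Matrix form of the log likelihood
resid = y - X @ b
ll_matrix = (n / 2) * np.log(1 / (2 * np.pi * s**2)) - resid @ resid / (2 * s**2)

# Scalar form, as JuMP requires: nested sums over observations i and regressors j
ll_scalar = (n / 2) * np.log(1 / (2 * np.pi * s**2)) - sum(
    (y[i] - sum(X[i, j] * b[j] for j in range(k))) ** 2 for i in range(n)
) / (2 * s**2)

assert np.isclose(ll_matrix, ll_scalar)
```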

Finally, we issue the `solve()` call and ask `JuMP` to perform the optimization:

In [9]:

```
status = solve(myMLE)
```


Once the solver has finished, we can recover the estimates via `getValue()` calls, applied to each separate parameter:

In [10]:

```
bOpt = getValue(b[:]);
sOpt = getValue(s);
```

And compare them with the true values:

In [11]:

```
[round([bOpt;sOpt],3) [bAns;sigAns]]
```


We can also retrieve the value of the log likelihood at the optimum using `getObjectiveValue()`:

In [12]:

```
getObjectiveValue(myMLE)
```


For inference, we need standard errors, which come from the hessian of the objective function at the optimum. Extracting the hessian requires some lower-level calls to `JuMP` and `MathProgBase` internals:

In [13]:

```
this_par = myMLE.colVal # parameter values at the optimum
m_eval = JuMP.JuMPNLPEvaluator(myMLE);
MathProgBase.initialize(m_eval, [:ExprGraph, :Grad, :Hess])
# evaluate the sparse lower triangle of the hessian of the Lagrangian
hess_struct = MathProgBase.hesslag_structure(m_eval)
hess_vec = zeros(length(hess_struct[1]))
numconstr = length(m_eval.m.linconstr) + length(m_eval.m.quadconstr) + length(m_eval.m.nlpdata.nlconstr)
dimension = length(myMLE.colVal)
MathProgBase.eval_hesslag(m_eval, hess_vec, this_par, 1.0, zeros(numconstr))
this_hess_ld = sparse(hess_struct[1], hess_struct[2], hess_vec, dimension, dimension)
# symmetrize, since only the lower triangle was returned
hOpt = this_hess_ld + this_hess_ld' - sparse(diagm(diag(this_hess_ld)));
hOpt = -full(hOpt); # negate since we are maximizing
# standard errors are the square roots of the diagonal of the inverse hessian
seOpt = sqrt(diag(full(hOpt)\eye(size(hOpt,1))));
```

Now we can view the estimates alongside their standard errors:

In [14]:

```
[round([bOpt;sOpt],3) round(seOpt,3)]
```


We can compare these with the OLS standard errors and confirm that they are the same:

In [15]:

```
seHat = sqrt((sigHat^2).*diag((X'*X)\eye(size(X,2))));
[round(bHat,3) round(seHat,3)]
```

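For this linear-normal likelihood, the equivalence can be verified analytically: the hessian of the log likelihood with respect to $\beta$ is $-X'X/\sigma^2$, so the inverse of the negative hessian reproduces the OLS variance formula $\sigma^2(X'X)^{-1}$. A quick Python check (illustrative, on freshly simulated data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + 0.3 * rng.standard_normal(n)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat
sig2 = resid @ resid / (n - X.shape[1])

# Hessian of the log likelihood w.r.t. beta is -X'X / sigma^2, so the
# inverse of the negative hessian is sigma^2 (X'X)^{-1}
neg_hess = X.T @ X / sig2
se_hess = np.sqrt(np.diag(np.linalg.inv(neg_hess)))

# OLS standard errors from the usual formula
se_ols = np.sqrt(sig2 * np.diag(np.linalg.inv(X.T @ X)))
```

The two standard-error vectors agree to floating-point precision.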

Constraints can easily be added to the objective function. There are two ways of doing this:

- Directly in the `@defVar` statement: e.g. `@defVar(myMLE, s>=0.0);` restricts `s` to be weakly positive.
- As additional lines after the set of `@defVar` statements, but before the `@setNLObjective` statement. The syntax for this is, e.g., `@addConstraint(myMLE, b[15] == 0)`, which would restrict the 15th element of the `b` vector to equal 0.
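Imposing a zero restriction like `b[15] == 0` is equivalent to simply dropping that regressor. A small Python check of this, using the KKT system for equality-constrained least squares (illustrative, with a three-regressor toy model):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 0.5, 0.0]) + 0.2 * rng.standard_normal(n)

# Constrained least squares with b[2] == 0, via the KKT system
# [X'X  a; a' 0] [b; lam] = [X'y; 0], where a'b = 0 encodes the constraint
a = np.array([0.0, 0.0, 1.0])
K = np.block([[X.T @ X, a[:, None]], [a[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.append(X.T @ y, 0.0))
b_con = sol[:3]

# Equivalent: OLS after dropping the restricted regressor
b_drop = np.linalg.solve(X[:, :2].T @ X[:, :2], X[:, :2].T @ y)
```

The constrained solution sets the restricted coefficient to zero and matches the drop-the-column OLS estimates on the remaining regressors.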

`JuMP` accepts thousands of constraints, and its simple syntax makes coding them much easier.

Pros of `JuMP`:

- Versatile modeling language that takes advantage of `Julia`'s features.
- Interfaces with a variety of optimizers, meaning the user can simply call a different optimizer without adjusting any underlying code.
- Very fast: roughly 3+ times faster than `Matlab`'s `fminunc` for a simple large-scale problem.
- The user can specify maximization or minimization on the fly without needing to re-express the objective function.

Cons of `JuMP`:

- `JuMP` performs constrained optimization whenever the user specifies constraints on any of the parameters. Thus, the hessian it returns is the hessian of the corresponding *Lagrangian*, not the hessian of the *objective*. This makes inference a bit trickier.
- No matrix multiplication is allowed, so users must specify matrix operations in scalar form.