In this example, we show how to use FiniteDiff.jl to compute maximum likelihood estimates for a univariate logistic regression model. We'll use FiniteDiff to approximate the gradients needed during model fitting, and again to compute an approximate Hessian, which serves as an estimate of the observed Fisher information matrix that determines the standard errors of the coefficients. At the end, we'll compare our results against the same model fit in R to show that our work is correct and that the approximations introduced by FiniteDiff are not the dominant source of error in these computations.

The first thing we'll do is import all of the libraries we'll use. If you do not have these libraries installed, you'll need to use Pkg.add to install them before you can continue with this example.
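
For example, assuming each package is registered under the name used in the imports below, installation might look like the following sketch (on newer versions of Julia you would first run import Pkg):

Pkg.add("Distributions")
Pkg.add("FiniteDiff")
Pkg.add("Optim")
Pkg.add("RCall")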

import Distributions: Normal, Bernoulli
import FiniteDiff: gradient, hessian
import Optim: optimize, LBFGS, minimizer
import RCall: @R_str

The link function for a logistic regression model is the logit, so we map the linear predictor back to a probability with its inverse, the inverse logit (also known as the logistic function). We define a function to compute this quantity:

invlogit(z) = 1 / (1 + exp(-z))
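
As a quick sanity check, the inverse logit maps 0 to 0.5 and squashes large positive and negative inputs toward 1 and 0, respectively:

invlogit(0.0)   # 0.5
invlogit(5.0)   # ≈ 0.993
invlogit(-5.0)  # ≈ 0.007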

Next, we'll define a function to simulate data from a logistic regression model:

function simulate_logistic_regression(n_obs, β₀, β₁)
    x = rand(Normal(0, 1), n_obs)

    y = Array(Float64, n_obs)
    for i in 1:n_obs
        z_i = β₀ + β₁ * x[i]
        p_i = invlogit(z_i)
        y[i] = rand(Bernoulli(p_i))
    end

    return x, y
end
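
As a rough check of the simulator (using hypothetical parameter values that are not part of the main example), setting β₁ = 0 makes every success probability equal to invlogit(β₀), so the observed proportion of ones should land near that value:

x_demo, y_demo = simulate_logistic_regression(10000, 1.0, 0.0)
sum(y_demo) / length(y_demo)  # should be close to invlogit(1.0) ≈ 0.73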

Once we have data available, we can define the negative log likelihood function. We'll use a closure that captures the observed values of x and y, so that the parameter vector θ is the resulting function's only argument:

function make_nll(x, y)
    function nll(θ)
        s = 0.0
        for i in 1:length(x)
            p = invlogit(θ[1] + θ[2] * x[i])
            s -= ifelse(y[i] == 1.0, log(p), log(1 - p))
        end
        s
    end

    return nll
end
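
To illustrate the closure (reusing the hypothetical x_demo and y_demo simulated in the check above), the resulting function takes only a two-element parameter vector and returns a scalar; for a sample of this size, its value at the true parameters is almost always smaller than its value at the origin:

nll_demo = make_nll(x_demo, y_demo)
nll_demo([1.0, 0.0])  # NLL at the true (hypothetical) parameters
nll_demo([0.0, 0.0])  # almost always larger for a sample of this size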

We'll also want to be able to compute gradients; this is the first place where we'll use FiniteDiff. Because Optim expects a gradient function with a different calling convention than the gradient functions FiniteDiff generates, we'll wrap the FiniteDiff gradient in a thin translation layer:

function make_nll_gr!(nll, initial_x)
    # Ask FiniteDiff for a mutating gradient function and allocate the work
    # buffer it expects.
    tmp! = gradient(nll, mutates=true)
    buffer = similar(initial_x)

    # Optim expects a gradient function of the form g!(x, gr), so translate
    # between the two calling conventions with a small closure.
    nll_gr! = (x, gr) -> begin
        tmp!(gr, x, buffer)
        return
    end

    return nll_gr!
end
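
To see the wrapper in action (again using the hypothetical nll_demo from above), note that it fills its second argument in place rather than returning the gradient:

nll_gr_demo! = make_nll_gr!(nll_demo, [0.0, 0.0])
gr = zeros(2)
nll_gr_demo!([0.0, 0.0], gr)
gr  # now holds the finite-difference approximation to the gradient at the origin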

Now that we have the core machinery for model fitting in place, we can simulate some data and fit our model to it:

# Simulate data
x, y = simulate_logistic_regression(2500, 0.017, 0.217)

# Construct the negative log likelihood function.
nll = make_nll(x, y)

# Initialize our model parameters as a vector of all zeros.
initial_x = [0.0, 0.0]

# Construct the gradient of the negative log likelihood (NLL) function.
nll_gr! = make_nll_gr!(nll, initial_x)

# Pass the NLL function and its gradient to Optim.optimize to find the MLE.
res = optimize(nll, nll_gr!, initial_x, LBFGS())

# Extract the coefficients as the minimizer of the NLL function.
coefficients = minimizer(res)

# Evaluate the Hessian of the NLL function at the coefficients to approximate
#     the Fisher information matrix.
H = hessian(nll, coefficients)

# The inverse of the Hessian approximates the covariance matrix of the estimates,
#     so the square roots of its diagonal entries give the standard error for
#     each coefficient.
standard_errors = sqrt(diag(inv(H)))

# Write our data to disk so we can confirm our results in R.
writecsv("example_data.csv", hcat(x, y))

# Use the RCall package's R string macro to run R's glm function and confirm
# that its estimates match ours.
R"""
summary(
    glm(
        V2 ~ V1,
        data = read.csv('example_data.csv', header = FALSE),
        family = binomial(link = "logit")
    )
)
"""

# Before terminating the program, clean up the file we created.
rm("example_data.csv")

For readers interested in model fitting, we hope this example illustrates the power of FiniteDiff for approximating gradients and Hessians. In this example, it is easy to look up analytic forms for the gradient and Hessian, but for more complicated models, numerical differentiation can save time while testing out new models.