Forum: help

RE: standard errors in maxLik when supplying grad, hess arguments
By: Ott Toomet on 2016-06-02 02:44
[forum:43246]
I guess you are right :-)

I don't want to dig too deeply into this, but I suspect you have a hyperplane that separates the ind=1 and ind=0 cases (perfect separation). In such cases the binary-choice likelihood has no finite maximum and the estimation does not converge.

You can test by setting the first ind in the iris data to 0:
> iris[1,6] <- 0

Now both glm and your maxLik code work.
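
For what it's worth, here is a quick way to see the separation (a sketch, assuming the subset iris data frame with the added ind column from your script):

tapply(iris$Petal.Length, iris$ind, range)
# ind = 0 (virginica): roughly 4.5 to 6.9
# ind = 1 (setosa):    roughly 1.0 to 1.9
# The ranges do not overlap, so Petal.Length alone separates the two classes;
# the logistic MLE then diverges, and glm typically warns that fitted
# probabilities numerically 0 or 1 occurred.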

Cheers,
Ott

RE: standard errors in maxLik when supplying grad, hess arguments
By: Sam Ackerman on 2016-06-01 19:20
[forum:43245]
You know, I think I figured out the issue. In this particular case, if you look at the logistic regression (which I should have done), you see that the standard errors of the coefficients are very big; the regression does not converge. I tried a different dataset (myopia from https://www.umass.edu/statdata/statdata/stat-logistic.html) on a few variables and both approaches worked fine. At first I thought it might be a small-sample issue, but it turns out I had simply picked a bad example. So it makes sense that the standard errors from the numeric approximations came out smaller than those from the supplied functions. Since my real application has a complicated log-likelihood with many variables, providing the gradient and hessian functions still seems like the better way to go, since it appears to ensure that you do not get NA for the standard errors.
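
For the record, a minimal check I could have done on the glm fit (assuming the logmod object from my original post below):

logmod$converged                # may be FALSE with separated data
summary(logmod)$coefficients    # the huge standard errors are the giveaway
# glm also warns about fitted probabilities numerically 0 or 1 occurring
# when the classes are perfectly separated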

RE: standard errors in maxLik when supplying grad, hess arguments
By: Sam Ackerman on 2016-05-31 17:59
[forum:43231]
Hi Ott,
Thanks for your quick response. If the standard errors depend on the numerical approximations (which in this case give much lower standard errors), are you saying I shouldn't rely on them?

I tried using the BHHH method like you recommended. Here is what I am doing:

# Modify the gradient function to produce a matrix (if as_matrix==TRUE) for the
# BHHH method. The rest of the steps are the same as above.

gen_gf <- function(d, y, as_matrix=FALSE) {

  X <- as.matrix(d)
  p <- ncol(X)
  n <- nrow(X)

  if (as_matrix == FALSE) {
    # summed gradient: one entry per parameter
    gf <- function(beta_par) {
      g <- 1/(1+exp(-1*X%*%beta_par))
      L <- c()
      for (i in 1:p) {
        L[i] <- sum(y*(1-g)*X[,i] - (1-y)*g*X[,i])
      }
      L
    }
  } else {
    # observation-wise gradient: n x p matrix with one row per observation (for BHHH)
    gf <- function(beta_par) {
      g <- 1/(1+exp(-1*X%*%beta_par))
      L <- matrix(0, ncol=p, nrow=n)
      for (i in 1:p) {
        L[,i] <- y*(1-g)*X[,i] - (1-y)*g*X[,i]
      }
      L
    }
  }

  gf
}

#now generate the gradient function. all else is the same

gf_bhhh <- gen_gf(d=iris[,1:4], y=iris$ind, as_matrix=TRUE)
gf_bhhh(init) #confirm, gives a 100x4 matrix

mle_func <- maxLik(logLik=llike_ml, start=init, grad=gf_bhhh, hess=hf, method="BHHH")
summary(mle_func)

# In this case the standard errors are even bigger than with the other two methods.
# Note the estimates also have bigger errors.

Maximum Likelihood estimation
BHHH maximisation, 1 iterations
Return code 1: gradient close to zero
Log-Likelihood: -4.730006e-09
4 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
Sepal.Length 5.452e+00 4.151e+09 0 1
Sepal.Width 7.294e+00 6.050e+09 0 1
Petal.Length -1.096e+01 4.507e+09 0 1
Petal.Width -1.265e+01 4.979e+09 0 1
--------------------------------------------

RE: standard errors in maxLik when supplying grad, hess arguments
By: Ott Toomet on 2016-05-28 02:35
[forum:43230]
Hey Sam,
thank you for the feedback :-)

I hope I can take a closer look tomorrow, but in general numeric derivatives (in particular the Hessian) are unreliable. BHHH is a way to reduce the numeric-differentiation errors, at the cost of introducing problems related to the information equality. Quite often it works well. You have to supply a gradient matrix containing the gradient components (in columns) of the individual (independent!) observations (in rows).
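
Roughly something like this sketch (untested, reusing the X, y, n, p names from your code) is the shape maxLik expects:

gf_obs <- function(beta_par) {
    g <- 1/(1 + exp(-X %*% beta_par))
    # n x p matrix: row i is the gradient contribution of observation i
    drop(y*(1 - g) - (1 - y)*g) * X
}
# maxLik sums these rows to get the total gradient and, as I understand it,
# uses the cross-product of this matrix as the BHHH approximation of the
# information matrix.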

I will look at your example tomorrow.

Cheers,
Ott

standard errors in maxLik when supplying grad, hess arguments
By: Sam Ackerman on 2016-05-27 22:00
[forum:43229]
Hi,
I am working with the maxLik package using logistic likelihoods. My question is about the difference (orders of magnitude) in the standard errors between the case where the grad and hess arguments are supplied and the case where they are omitted and the numeric approximations are used.

I supply an example below, see my notes in the comments. I was only using this as a simple test to understand maxLik.

gf and hf are the gradient and hessian functions given the data. I confirm by plugging in the initial values that they return essentially the same values as the numeric approximations. The hessian differs a little at the initial values, so I also plug in c(.5, .5, .5, .5) to show the two are very close.

(1) My main concern is that the BFGS method (the only one that doesn't return NA for standard errors when grad and hess are omitted) gives very high standard errors when the functions are supplied compared to when the approximation is used. The estimates themselves are otherwise close enough. Below is an example from one run.

If there is another method in R I can use that works better, let me know. I am relatively new to optimization algorithms.

(2) This question in the help forum (https://r-forge.r-project.org/forum/forum.php?thread_id=28362&forum_id=828&group_id=254) talks about using BHHH. However, if I try to use BHHH as my method, it returns the error "BHHH method requires that the gradient function (argument 'grad') returns a numeric matrix." How do I reformat the gf function so that it returns a matrix?

Thank you for all your help!




#build simple logistic likelihood based on the iris dataset

library(maxLik)
data(iris)
head(iris)
summary(iris)
iris <- iris[iris$Species %in% c("setosa","virginica"), ]
iris$ind <- as.numeric(iris$Species=="setosa")


#functions to generate the log-likelihood, gradient, and hessian for this dataset

gen_loglik <- function(d, y, min=TRUE) {
  #X <- as.matrix(cbind(1, d))
  X <- as.matrix(d)
  const <- ifelse(min==TRUE, -1, 1)

  llike <- function(beta_par) {
    e <- exp(X%*%beta_par)
    # y comes from the enclosing call; re-reading iris$ind here would
    # silently ignore the y argument
    const*sum(y*log(e/(1+e)) + (1-y)*log(1/(1+e)))
  }
  llike
}

gen_gf <- function(d, y) {

  X <- as.matrix(d)
  p <- ncol(X)
  n <- nrow(X)

  gf <- function(beta_par) {
    g <- 1/(1+exp(-1*X%*%beta_par))
    L <- c()
    for (i in 1:p) {
      L[i] <- sum(y*(1-g)*X[,i] - (1-y)*g*X[,i])
    }
    # colSums( X*matrix(y*(1-g), ncol=p, nrow=n, byrow=FALSE) - X*matrix((1-y)*g, ncol=p, nrow=n, byrow=FALSE))
    L
  }
  gf
}


gen_hf <- function(d, y) {
  X <- as.matrix(d)
  p <- ncol(X)
  n <- nrow(X)

  hf <- function(beta_par) {
    g <- 1/(1+exp(-1*X%*%beta_par))
    H <- matrix(0, ncol=p, nrow=p)

    for (ii in 1:p) {
      for (jj in ii:p) {
        H[ii, jj] <- -1*sum(g*(1-g)*X[,ii]*X[,jj])
        H[jj, ii] <- H[ii, jj]
      }
    }
    H
  }
  hf
}

#generate log-likelihood (min=FALSE means maximize the LL), gradient, and hessian functions
llike_ml <- gen_loglik(d=iris[,1:4], y=iris$ind, min=FALSE)
gf <- gen_gf(d=iris[,1:4], y=iris$ind)
hf <- gen_hf(d=iris[,1:4], y=iris$ind)



#get true values from the logistic regression, add some noise to the coefficients as initial values
logmod <- glm(ind ~. -1, data=iris[,c(1:4, 6)], family="binomial")
init <- coef(logmod) + 3*rnorm(4)

# Test to make sure the numerical approximations and the functions are basically
# the same at the initial values, close to the MLE

numericGradient(f=llike_ml, t0=init)
gf(init)
numericHessian(f=llike_ml, t0=init)
hf(init)

numericHessian(f=llike_ml, t0=rep(.5,4))
hf(rep(.5,4))

#comparing the derivatives seems to give good results
compareDerivatives(f=llike_ml, grad=gf, hess=hf, t0=init)


# Try 4 different methods. The first fit (mle_func) uses the grad and hess arguments;
# the second (mle_approx) omits them, using the numeric approximations.
# The last 3 methods (SANN, CG, and NR) give NA for the coefficient standard errors
# when the approximation is used.


mle_func <- maxLik(logLik=llike_ml, start=init, grad=gf, hess=hf, method="BFGS")
summary(mle_func)
mle_approx <- maxLik(logLik=llike_ml, start=init,method="BFGS")
summary(mle_approx)




mle_func <- maxLik(logLik=llike_ml, start=init, grad=gf, hess=hf, method="SANN")
summary(mle_func)
mle_approx <- maxLik(logLik=llike_ml, start=init,method="SANN")
summary(mle_approx)


mle_func <- maxLik(logLik=llike_ml, start=init, grad=gf, hess=hf, method="CG")
summary(mle_func)
mle_approx <- maxLik(logLik=llike_ml, start=init,method="CG")
summary(mle_approx)

mle_func <- maxLik(logLik=llike_ml, start=init, grad=gf, hess=hf, method="NR")
summary(mle_func)
mle_approx <- maxLik(logLik=llike_ml, start=init, method="NR")
summary(mle_approx)


# Try the BHHH method -- doesn't work, it gives the error quoted in (2) above

mle_func1 <- maxLik(logLik=llike_ml, start=init, grad=gf, hess=hf, method="BHHH")
summary(mle_func1)
mle_approx1 <- maxLik(logLik=llike_ml, start=init,method="CG")
summary(mle_approx1)
