Forum: help

RE: Help about maxLik [ Reply ]
By: Jean Pierre Munezero on 2019-11-27 22:57

There is std error issue in the results using my dataset with BHHH method. Suggestions!
> data<-c(48.1,48.3,48.3,48.3,48.4,48.5,48.6,48.6,48.7,48.7,48.7,49.1,49.2,49.2,49.3,49.4,50.0,50.0,50.0,50.4,50.5,50.6,50.7,50.8,50.9,51.0,51.1,51.3,51.3,51.5,51.6,51.8,51.8,51.8,51.9,52.0,52.2,52.2,52.3,52.4,52.7,52.7,52.7,52.9,52.9,53.0,53.2,53.2,53.4,53.4,53.6,53.7,53.7,54.0,54.3,54.4,54.5,54.6,54.6,54.7,54.9,55.3,55.3,55.4,55.5,55.7,55.8,55.8,55.8,56.0,56.2,56.2,56.2,56.5,56.9,57.0,57.2,57.2,57.3,57.3,57.4,57.7,57.8,57.9,57.9,58.0,58.2,58.4,58.9,59.1,59.1,59.3,59.3,59.9,60.0,60.0,60.1,60.1,60.5,60.5,60.8,60.9,61.1,61.2,61.3,61.4,61.6,61.6,61.9,62.0,62.1,62.1,62.4,62.5,62.7,62.8,63.4,63.8,63.8,63.8,64.0,64.0,64.3,64.4,64.4,64.5,65.3,65.4,65.5,66.0,66.1,66.3,66.7,66.8,67.1,67.3,67.6,68.0,68.1,68.1,68.2,68.8,69.0,69.1,69.5,69.9,70.0,70.0,70.1,70.2,70.8,70.8,71.6,71.9,72.0,72.0,72.4,72.8,75.9,76.5,77.0,78.0,78.1,78.4,78.7,80.5,81.0,84.3,84.8,84.8,86.3,86.5,86.5,87.0,87.8,87.8,88.1,88.2,90.0,93.4,93.9,94.1,95.2,95.7,98.2,101.1,102.2,106.2,107.7,107.7,108.4,108.9,110.3,126.5,126.5,133.8,135.2,136.9,138.1,148.9,156.4)
> ggp <- function(par, x) {
+ a <- par[1]
+ xi <- par[2]
+ sig <- par[3]
+ if(sig <= 0 | a <= 0)
+ return(NA)
+ z <- 1 + (xi*data)/sig
+ l <- -(a - 1)*log(xi) - log(sig) - log(gamma(a)) +
+ (a - 1)*log(log(z)) -
+ (1 + 1/xi)*log(z)
+ l
+ }
> res <- maxLik(logLik = ggp, start = c(a = 2, xi = 0.5, sig = 1.5), x = data, method = "BHHH")
> print(res)
Maximum Likelihood estimation
BHHH maximisation, 150 iterations
Return code 4: Iteration limit exceeded.
Log-Likelihood: -1171.24 (3 free parameter(s))
Estimate(s): 19.30969 0.1066101 0.4405043
> coef(res)
a xi sig
19.3096850 0.1066101 0.4405043
> summary(res)
--------------------------------------------
Maximum Likelihood estimation
BHHH maximisation, 150 iterations
Return code 4: Iteration limit exceeded.
Log-Likelihood: -1171.24
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
a 19.3097 Inf 0 1
xi 0.1066 Inf 0 1
sig 0.4405 Inf 0 1
--------------------------------------------

RE: Help about maxLik [ Reply ]
By: Ott Toomet on 2019-11-27 20:52

[forum:47212]

Modified your log-likelihood a bit so now it can use BHHH. It does not help :-( Whatever I do, the estimates are unstable. I suspect either:
* data does not fit the model, or so fits the model with xi=0 and the likelihood cannot really handle that.
* derivatives are unstable. Try to add the analytic score. But Nelder-Mead did not do any better...
* start with some sort of simpler data and mold it into a more complicated one step by step and see where it leads you...

I am also a little puzzled by the term xi^(a - 1) in (7). This sort of hints that either a-1 is positive integer, or so xi must be positive. But that does not seem to be required there.

Here is the modfied log-likelihood:

ggp <- function(par, x) {
a <- par[1]
ξ <- par[2]
σ <- par[3]
if(σ <= 0 | a <= 0)
return(NA)
z <- 1 + (ξ*x)/σ
l <- -(a - 1)*log(ξ) - log(σ) - log(gamma(a)) +
(a - 1)*log(log(z)) -
(1 + 1/ξ)*log(z)
l
}

RE: Help about maxLik [ Reply ]
By: Jean Pierre Munezero on 2019-11-27 18:38

[forum:47210]

Thank you for the reply. Yes I am using (7) as the likelihood. The first dataset I shared with you is the data that I am using to model a pdf. The other datasets are the datasets used in that paper. My concern is about the loglikelihood. I want it to work with all the sets of data but it gives me correct values for one of the datasets used in the paper but not for the other dataset. I would like to use the likelihood on my dataset but I cannot use it because I am failing replicate the results of that paper in one of the dataset. See section 4: for the log-likelihood.

RE: Help about maxLik [ Reply ] By: Ott Toomet on 2019-11-27 17:44	[forum:47209]
Thanks. Do I understand correctly: * is your likelihood function coming from the pdf in (7) in the paper? * you are directly modeling pdf, not the hazard rate? * the data is not censored?

RE: Help about maxLik [ Reply ] By: Jean Pierre Munezero on 2019-11-27 17:24	[forum:47208]
Dear Ott, You can check out in the following paper De Andrade, Thiago AN, et al. "The gamma generalized pareto distribution with applications in survival analysis." International Journal of Statistics and Probability 6.3 (2017).

RE: Help about maxLik [ Reply ]
By: Ott Toomet on 2019-11-27 16:32

[forum:47206]

Did you attach a paper? I cannot see it here (not sure either how it works...) Maybe you try again or just give the reference? I'd like to see what we are dealing with before I can suggest solutions.

Otherwise, I'd recommend you to try BHHH method. This requires though that you return not sum of log-likelihood but log-likelihoods by individual observations. I think it is feasible in this case but I cannot really tell unless I have seen a bit of the background...

Ott

RE: Help about maxLik [ Reply ]
By: Jean Pierre Munezero on 2019-11-27 13:24

[forum:47204]

de2017gamma.pdf (30) downloads

Thank you for your response
I tried this log-likelihood function to the dataset with small observations (30) and It works. I got the same results as in the attached paper but for another dataset of 212 observations, it does not give me the same results. Would you be so kind and give me some advice?
For example:
> dt<-c(1.43,0.11,0.71,0.77,2.63,1.49,3.46,2.46,0.59,0.74,1.23,0.94,4.36,0.40,1.74,4.73,2.23,0.45,0.70,1.06,1.46,0.30,1.82,2.37,0.63,1.23,1.24,1.97,1.86,1.17)
> fn<- function(par,x) {
+ a=exp(par[1])
+ sig=exp(par[2])
+ alfa=exp(par[3])
+ z=1+((alfa*dt)/sig)
+ l= (a-1)*(-n*log(alfa)+sum(log(log(z))))-n*(log(sig*gamma(a)))-(1+(1/alfa))*sum(log(z))
+ }
> n=length(dt)
> out <- maxLik(logLik = fn, start = c(a = 2, sig = 1.5, alfa = 0.5), x = dt, method = "SANN")
> print(out)
Maximum Likelihood estimation
SANN maximization, 10000 iterations
Return code 0: successful convergence
Log-Likelihood: -39.62543 (3 free parameter(s))
Estimate(s): 0.7427839 -0.348895 -3.548347
> coef(out)
a sig alfa
0.7427839 -0.3488950 -3.5483473
> summary(out)
--------------------------------------------
Maximum Likelihood estimation
SANN maximization, 10000 iterations
Return code 0: successful convergence
Log-Likelihood: -39.62543
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
a 0.7428 0.3761 1.975 0.0483 *
sig -0.3489 0.6089 -0.573 0.5666
alfa -3.5483 5.3857 -0.659 0.5100
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
but for this below dataset with 212 observations from that paper also, it does not work
> d <- c(194,413,90,74,55,23,97,50,359,50,130,487,102,15,14,10,57,320,261,51,44,9,254,493,18,209,41,58,60,48,56,87,11,102,12,5,100,14,37,186,29,104,7,4,72,270,283,7,57,33,100,61,502,220,120,141,22,603,35,98,54,181,65,49,12,239,14,18,39,3,12,5,32,9,14,70,47,62,142,3,104,85,67,169,24,21,246,47,68,15,2,91,59,447,56,29,176,225,77,197,438,43,134,184,20,386,182,71,80,188,230,152,36,79,59,33,246,1,79,3,27,201,84,27,21,16,88,130,14,118,44,15,42,106,46,230,59,153,104,20,206,5,66,34,29,26,35,5,82,5,61,31,118,326,12,54,36,34,18,25,120,31,22,18,156,11,216,139,67,310,3,46,210,57,76,14,111,97,62,26,71,39,30,7,44,11,63,23,22,23,14,18,13,34,62,11,191,14,16,18,130,90,163,208,1,24,70,16,101,52,208,95)
> n=length(d)
> fn<- function(par,x) {
+ a=par[1]
+ sig=par[2]
+ alfa=par[3]
+ z=1+((alfa*d)/sig)
+ l= (a-1)*(-n*log(alfa)+sum(log(log(z))))-n*(log(sig*gamma(a)))-(1+(1/alfa))*sum(log(z))
+ }
> mle <- maxLik(logLik = fn, start = c(a = 2, sig = 1.5, alfa = 0.5), x = d, method = "BFGS")
There were 28 warnings (use warnings() to see them)
> print(mle)
Maximum Likelihood estimation
BFGS maximization, 121 iterations
Return code 0: successful convergence
Log-Likelihood: -1171.137 (3 free parameter(s))
Estimate(s): 1.163209 59.75303 0.2481807
> coef(mle)
a sig alfa
1.1632095 59.7530320 0.2481807
> summary(mle)
--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 121 iterations
Return code 0: successful convergence
Log-Likelihood: -1171.137
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
a 1.16321 0.06184 18.811 < 2e-16 ***
sig 59.75303 NA NA NA
alfa 0.24818 0.06508 3.813 0.000137 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
Warning messages:
1: In sqrt(diag(vc)) : NaNs produced
2: In sqrt(diag(vc)) : NaNs produced
There is not a particular reason to use SANN method. Unfortunately, I am not familiar with the distributions and I am not a good programmer

RE: Help about maxLik [ Reply ]
By: Ott Toomet on 2019-11-27 01:51

[forum:47200]

I dont know your likelihood function but:
a) NA-s mean your final position is not a concave point (it appears to be a saddle point)
b) do you have a particular reason to use SANN optimizer and 10000 iterations? I get positive std errors if I increase iterlim t0 100k or 1M
c) I also get positive std. errs if I use BFGS instead of SANN.
d) in any case, the results seem to be unstable (noticeably different values if I play with that). This Suggests that either something is wrong with the function, or (quite likely) the numeric gradient is just too unstable to be useful.

Help about maxLik [ Reply ]
By: Jean Pierre Munezero on 2019-11-24 15:17

[forum:47174]

I am working on extremes using R. I am trying to find the maximum likelihood estimator of gamma generalized pareto distribution based on the simulated annealing method using maxLik package. But I can't estimate these parameters for gamma generalized pareto distribution using SANN method. More specifically, I am writing
data<-c(48.1,48.3,48.3,48.3,48.4,48.5,48.6,48.6,48.7,48.7,48.7,49.1,49.2,49.2,49.3,49.4,50.0,50.0,50.0,50.4,50.5,50.6,50.7,50.8,50.9,51.0,51.1,51.3,51.3,51.5,51.6,51.8,51.8,51.8,51.9,52.0,52.2,52.2,52.3,52.4,52.7,52.7,52.7,52.9,52.9,53.0,53.2,53.2,53.4,53.4,53.6,53.7,53.7,54.0,54.3,54.4,54.5,54.6,54.6,54.7,54.9,55.3,55.3,55.4,55.5,55.7,55.8,55.8,55.8,56.0,56.2,56.2,56.2,56.5,56.9,57.0,57.2,57.2,57.3,57.3,57.4,57.7,57.8,57.9,57.9,58.0,58.2,58.4,58.9,59.1,59.1,59.3,59.3,59.9,60.0,60.0,60.1,60.1,60.5,60.5,60.8,60.9,61.1,61.2,61.3,61.4,61.6,61.6,61.9,62.0,62.1,62.1,62.4,62.5,62.7,62.8,63.4,63.8,63.8,63.8,64.0,64.0,64.3,64.4,64.4,64.5,65.3,65.4,65.5,66.0,66.1,66.3,66.7,66.8,67.1,67.3,67.6,68.0,68.1,68.1,68.2,68.8,69.0,69.1,69.5,69.9,70.0,70.0,70.1,70.2,70.8,70.8,71.6,71.9,72.0,72.0,72.4,72.8,75.9,76.5,77.0,78.0,78.1,78.4,78.7,80.5,81.0,84.3,84.8,84.8,86.3,86.5,86.5,87.0,87.8,87.8,88.1,88.2,90.0,93.4,93.9,94.1,95.2,95.7,98.2,101.1,102.2,106.2,107.7,107.7,108.4,108.9,110.3,126.5,126.5,133.8,135.2,136.9,138.1,148.9,156.4)
n=length(data)
ggp<- function(par,x) {
a=par[1]
sig=par[2]
alfa=par[3]
z=1+((alfa*data)/sig)
l= (a-1)*(-n*log(alfa)+sum(log(log(z))))-n*(log(sig*gamma(a)))-(1+(1/alfa))*sum(log(z))
}
res <- maxLik(logLik = ggp, start = c(a = 2, sig = 1.5, alfa = 0.5), x = data, method = "SANN")
summary(res)
and I am taking the follow message from R
> data<-c(48.1,48.3,48.3,48.3,48.4,48.5,48.6,48.6,48.7,48.7,48.7,49.1,49.2,49.2,49.3,49.4,50.0,50.0,50.0,50.4,50.5,50.6,50.7,50.8,50.9,51.0,51.1,51.3,51.3,51.5,51.6,51.8,51.8,51.8,51.9,52.0,52.2,52.2,52.3,52.4,52.7,52.7,52.7,52.9,52.9,53.0,53.2,53.2,53.4,53.4,53.6,53.7,53.7,54.0,54.3,54.4,54.5,54.6,54.6,54.7,54.9,55.3,55.3,55.4,55.5,55.7,55.8,55.8,55.8,56.0,56.2,56.2,56.2,56.5,56.9,57.0,57.2,57.2,57.3,57.3,57.4,57.7,57.8,57.9,57.9,58.0,58.2,58.4,58.9,59.1,59.1,59.3,59.3,59.9,60.0,60.0,60.1,60.1,60.5,60.5,60.8,60.9,61.1,61.2,61.3,61.4,61.6,61.6,61.9,62.0,62.1,62.1,62.4,62.5,62.7,62.8,63.4,63.8,63.8,63.8,64.0,64.0,64.3,64.4,64.4,64.5,65.3,65.4,65.5,66.0,66.1,66.3,66.7,66.8,67.1,67.3,67.6,68.0,68.1,68.1,68.2,68.8,69.0,69.1,69.5,69.9,70.0,70.0,70.1,70.2,70.8,70.8,71.6,71.9,72.0,72.0,72.4,72.8,75.9,76.5,77.0,78.0,78.1,78.4,78.7,80.5,81.0,84.3,84.8,84.8,86.3,86.5,86.5,87.0,87.8,87.8,88.1,88.2,90.0,93.4,93.9,94.1,95.2,95.7,98.2,101.1,102.2,106.2,107.7,107.7,108.4,108.9,110.3,126.5,126.5,133.8,135.2,136.9,138.1,148.9,156.4)
> n=length(data)
> ggp<- function(par,x) {
+ a=par[1]
+ sig=par[2]
+ alfa=par[3]
+ z=1+((alfa*data)/sig)
+ l= (a-1)*(-n*log(alfa)+sum(log(log(z))))-n*(log(sig*gamma(a)))-(1+(1/alfa))*sum(log(z))
+ }
> res <- maxLik(logLik = ggp, start = c(a = 2, sig = 1.5, alfa = 0.5), x = data, method = "SANN")
There were 50 or more warnings (use warnings() to see the first 50)
> print(res)
Maximum Likelihood estimation
SANN maximization, 10000 iterations
Return code 0: successful convergence
Log-Likelihood: -861.163 (3 free parameter(s))
Estimate(s): 11.9501 5.452735 0.005803676
> summary(out1)
Error in summary(out1) : object 'out1' not found
> summary(res)
--------------------------------------------
Maximum Likelihood estimation
SANN maximization, 10000 iterations
Return code 0: successful convergence
Log-Likelihood: -861.163
3 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
a 11.950098 NA NA NA
sig 5.452735 NA NA NA
alfa 0.005804 0.005722 1.014 0.31
--------------------------------------------
Warning messages:
1: In sqrt(diag(vc)) : NaNs produced
2: In sqrt(diag(vc)) : NaNs produced
Why there are NA for std error and t values?
Could you please help me with my problem?
Any assistance would be greatly appreciated!