[#1331] segfault of MatrixModels::glm4(sparse=TRUE) with interaction term including a factor of ~34.000-levels

Date: 2011-03-16 14:19
Priority: 3
State: Open
Submitted by: Philip R. Kensche (pkensche)
Assigned to: Nobody (None)
Hardware: PC
Product: None
Operating System: Linux
Component: None
Version: None
Severity: normal
Resolution: None
URL:
Summary: segfault of MatrixModels::glm4(sparse=TRUE) with interaction term including a factor of ~34.000-levels

Detailed description
Hi,

I am trying to fit a model in which a factor with roughly 34,000 levels interacts with a 2-level factor.
The data look like this:

> str(z)
'data.frame': 407028 obs. of 5 variables:
$ factorB : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
$ replicate: Factor w/ 13 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
$ factorA : Factor w/ 33919 levels "1","2","3","4",..: 2253 13953 16821 15272 29539 2215 2217 13458 10491 25005 ...
$ value : int 37 34 0 0 0 0 0 0 0 0 ...
$ exposure : int 575366 575366 575366 575366 575366 575366 575366 575366 575366 575366 ...

This is the result when fitting the model:

> zm <- glm4(sqrt(value / exposure) ~ factorA + factorA:factorB - 1,
+ data=z,
+ family=gaussian(link="identity"),
+ sparse=TRUE);

*** caught segfault ***
address 0x2b7308f8f000, cause 'memory not mapped'

Traceback:
1: .Call(Csparse_submatrix, x, ii, NULL)
2: subCsp_rows(x, i, drop = drop)
3: x[i = i, , drop = TRUE]
4: x[i = i, , drop = TRUE]
5: eval(expr, envir, enclos)
6: eval(call, sys.frame(sys.parent()))
7: callGeneric(x, i = i, , drop = TRUE)
8: Y[rep(seq_len(ny), each = nx), ]
9: Y[rep(seq_len(ny), each = nx), ]
10: sparse2int(sparseInt.r(rList[-m], do.names = do.names), rList[[m]], do.names = do.names)
11: F(sparse2int(sparseInt.r(rList[-m], do.names = do.names), rList[[m]], do.names = do.names))
12: sparseInt.r(lapply(nmSplits, getR), do.names = TRUE, forceSparse = TRUE)
13: model.spmatrix(t, data, transpose = transpose, drop.unused.levels = drop.unused.levels, row.names = row.names)
14: sparse.model.matrix(object, data = data, contrasts.arg = contrasts.arg, drop.unused.levels = drop.unused.levels, xlev = xlev, ...)
15: model.Matrix(formula, mf, sparse = sparse, drop.unused.levels = drop.unused.levels)
16: .class1(object)
17: as(model.Matrix(formula, mf, sparse = sparse, drop.unused.levels = drop.unused.levels), "predModule")
18: initialize(value, ...)
19: initialize(value, ...)
20: new("glpModel", call = call, resp = mkRespMod(mf, family), pred = as(model.Matrix(formula, mf, sparse = sparse, drop.unused.levels = drop.unused.levels), "predModule"))
21: glm4(sqrt(value/exposure) ~ factorA + factorA:factorB - 1, data = z, family = gaussian(link = "identity"), sparse = TRUE)


I would appreciate your assistance with this!
Regards,

Philip


> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MatrixModels_0.2-1 Matrix_0.999375-46 lattice_0.19-17

loaded via a namespace (and not attached):
[1] grid_2.12.2

Comments:

Date: 2022-10-23 23:52
Sender: Ben Bolker

FWIW I can't get a segfault with current version. Results depend on whether data are balanced or not.

Unbalanced: problem is rank-deficient. glm4 gets a cholmod factorization error (presumably this is correct), glmmTMB (devel version) reduces the model matrix appropriately and gets somewhere.

Balanced: glm4 fails with "problem too big", glmmTMB fails with a memory allocation error
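
The difference between the two cases can be illustrated at a much smaller scale. The following is only a sketch under assumed sizes (5 levels instead of about 34000, and a dense `model.matrix()` instead of the sparse code path); the unbalanced data set is created by emptying a single `fA`:`fB` cell, which is a hypothetical choice made to keep the example deterministic.

```r
## Small-scale sketch; nf and n are illustrative assumptions, not the sizes from the report.
nf <- 5
n  <- 12 * nf
dd <- data.frame(fB = factor(rep(1:2, each = n/2)),
                 fA = factor(rep(seq(nf), length.out = n)))

## Unbalanced variant: drop every observation in the cell fA == 1, fB == 2.
## subset() keeps the factor levels, so the model matrix keeps that column.
ddu <- subset(dd, !(fA == "1" & fB == "2"))

Xb <- model.matrix(~ fA + fA:fB - 1, data = dd)
Xu <- model.matrix(~ fA + fA:fB - 1, data = ddu)

## Compare the number of columns with the numerical rank.
c(ncol = ncol(Xb), rank = qr(Xb)$rank)  # balanced
c(ncol = ncol(Xu), rank = qr(Xu)$rank)  # unbalanced
```

In the balanced data the rank equals the number of columns, while the empty cell in the unbalanced data leaves an all-zero interaction column, so the rank is one less than the number of columns.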


```
R Under development (unstable) (2022-10-14 r83104)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
```

```
set.seed(101)
n <- 12*34000  ## 408000 (the original report had 407028 rows)
nf <- 34000
dd <- data.frame(y = rnorm(n),
                 fB = factor(rep(1:2, each = n/2)),
                 fA = factor(rep(seq(nf), length.out = n))
                 )
## make sure results are balanced (avoid rank-deficiency)
with(dd, table(table(fA, fB)))

ddu <- transform(dd,
                 ## wrap in factor(): transform() would otherwise turn fA into an integer column
                 fA = factor(sample(nf, size = n, replace = TRUE)))

with(ddu, table(table(fA, fB)))
## unbalanced

library(MatrixModels)
glm4(y ~ fA + fA:fB - 1, data = ddu,
family = gaussian(link = "identity"),
sparse = TRUE)
## Error in subCsp_rows(x, i, drop = drop) :
## Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89


library(glmmTMB)
g1 <- glmmTMB(y ~ fA + fA:fB - 1, data = ddu,
family = gaussian(link = "identity"),
sparseX = TRUE)
## In .checkRankX(TMBStruc, control$rank_check) :
## fixed effects in conditional model are rank deficient
## Error: cannot allocate vector of size 206.7 Gb

```
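
As a rough check on the allocation error in the transcript above: the reported 206.7 Gb is consistent with a dense double-precision model matrix for these data. The column count used here (`2 * nf`, i.e. one column per `fA` level plus one per `fA:fB` interaction column) is an assumption about how the formula expands, not something taken from the glmmTMB source.

```r
n  <- 12 * 34000   # rows in the simulated data
nf <- 34000        # levels of fA
p  <- 2 * nf       # assumed number of model-matrix columns (fA plus fA:fB)

## size of a dense n x p matrix of doubles (8 bytes each), in GiB
gib <- n * p * 8 / 2^30
round(gib, 1)
## [1] 206.7
```

This suggests that somewhere in that code path the sparse model matrix is converted to a dense one.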

Date: 2018-03-12 15:34
Sender: Martin Maechler

This has the same "inner design" bug in sparse2int(),
but the result, a segmentation fault, is of course much worse than a decent R error, notably because the backtrace above indicates that there is
an error even in the X[i, , drop=TRUE] operation (for very large 'i').
I can accept R saying that something is too large,
but a segfault is grave.

If you could find a reproducible example that constructs a largish X[] *and*
shows that subsetting (aka indexing) fails with a segfault, that would be an important issue. Thank you in advance if you find time to try.
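
A possible starting point for such an example is sketched below. It mimics the row-repetition index from the backtrace, `Y[rep(seq_len(ny), each = nx), ]`, on a small sparse matrix and checks the result against dense indexing. The sizes `nx`, `ny` and `nobs` are hypothetical and far too small to trigger the problem; scaling them towards the dimensions in the report would be the actual test.

```r
library(Matrix)
set.seed(1)

## Hypothetical small sizes; the report involved about 34000 levels and 407028 observations.
nx   <- 50L     # number of times each row of Y is repeated
ny   <- 4L      # number of rows of Y
nobs <- 1000L   # number of columns of Y

## Y: sparse indicator matrix with exactly one nonzero entry per column
Y <- sparseMatrix(i = sample(ny, nobs, replace = TRUE),
                  j = seq_len(nobs), x = 1,
                  dims = c(ny, nobs))

## the row index pattern seen in the backtrace
i <- rep(seq_len(ny), each = nx)
Z <- Y[i, , drop = FALSE]

## sparse indexing should agree with dense indexing
stopifnot(identical(dim(Z), c(length(i), nobs)),
          all(as.matrix(Z) == as.matrix(Y)[i, ]))
```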

Date: 2015-03-12 18:45
Sender: brandon willard

Isn't this a dupe of #1330?

Attached Files:

Changes

No Changes Have Been Made to This Item
