[datatable-help] segfault with "large" number of rows
"Günter J. Hitsch"
guenter.hitsch at mac.com
Wed Jan 22 21:52:12 CET 2014
I’ve been using data.table for several months. It’s a great package—thank you for developing it!
Here’s my problem: when I work with “large” data tables of many millions of rows, I get segmentation faults when I create columns by group. Example:
library(data.table)
N = 2500 # No. of groups
T = 100000 # No. of observations per group
DT = data.table(group = rep(1:N, each = T), x = 1) # N * T = 250 million rows
setkey(DT, group)
DT[, sum_x := sum(x), by = group] # add the per-group sum of x as a new column
print(head(DT))
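For reference, a quick sanity check I run on the small case: since x is 1 everywhere, every row’s sum_x should equal the group size T.

stopifnot(all(DT$sum_x == T)) # each per-group sum equals the group size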
This runs fine. But when I increase the number of groups, say from 2500 to 3000, I get a segfault:
N = 3000 # No. of groups
T = 100000 # No. of observations per group
...
*** caught segfault ***
address 0x159069140, cause 'memory not mapped'
Traceback:
1: `[.data.table`(DT, , `:=`(sum_x, sum(x)), by = group)
2: DT[, `:=`(sum_x, sum(x)), by = group]
3: eval(expr, envir, enclos)
4: eval(ei, envir)
5: withVisible(eval(ei, envir))
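In case it helps to narrow things down, the workaround I would try is replacing the grouped `:=` with a plain aggregation followed by a keyed join. This is only a sketch; I have not verified that it avoids the crash, and it may well exercise the same grouping code:

agg = DT[, list(sum_x = sum(x)), by = group] # one row per group
setkey(agg, group)
DT = DT[agg] # keyed join appends sum_x to every row of DT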
I can reproduce this problem on:
(1) OS X 10.9, R 3.0.2, data.table 1.8.10
(2) Ubuntu 13.10, R 3.0.1, data.table 1.8.10
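(The versions above were read off with the standard calls, e.g. R.version.string and packageVersion("data.table").)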
Both machines have considerably more RAM than the table requires, so available memory should not be the issue.
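A rough back-of-the-envelope estimate (my own arithmetic, assuming an integer group column and numeric x and sum_x) puts the failing table at under 6 GiB:

rows  = 3000 * 1e5        # 300 million rows in the N = 3000 case
bytes = rows * (4 + 8 + 8) # 4-byte integer plus two 8-byte doubles per row
bytes / 2^30               # roughly 5.6 GiB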
Thanks in advance for your help with this!
Günter