[datatable-help] segfault with "large" number of rows
Arunkumar Srinivasan
aragorn168b at gmail.com
Wed Jan 29 01:41:29 CET 2014
Hi Guenter,
CC: data.table list,
I filed this as bug #5305, and we've now fixed it with commit 1100 in
v1.8.11. Thank you very much once again for reporting!
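
To confirm the fix after updating, a minimal check along the lines of the
example quoted below can be run. This is only a sketch: check_grouped_sum()
and its parameter names are hypothetical (not part of data.table), and it
assumes there is enough RAM for the table size chosen.

```r
library(data.table)

# Hypothetical helper, not part of data.table: builds a table shaped like
# the one in the report and checks the by-group sum added with `:=`.
check_grouped_sum <- function(n_groups, n_per_group) {
  DT <- data.table(group = rep(seq_len(n_groups), each = n_per_group), x = 1)
  setkey(DT, group)
  DT[, sum_x := sum(x), by = group]
  # x is 1 in every row, so each group's sum should equal the group size
  all(DT$sum_x == n_per_group)
}

# Start with a small size to check the logic.
check_grouped_sum(30, 1000)
```

The size from the report corresponds to check_grouped_sum(3000, 100000),
which builds 300 million rows and needs several GB of memory.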
On Wed, Jan 22, 2014 at 9:52 PM, "Günter J. Hitsch"
<guenter.hitsch at mac.com> wrote:
>
> I’ve been using data.table for several months. It’s a great package—thank
> you for developing it!
>
> Here’s my question: I’ve run into a problem when I use “large” data
> tables with many millions of rows. In particular, for such large data
> tables I get segmentation faults when I create columns by groups. Example:
>
> N = 2500 # No. of groups
> T = 100000 # No. of observations per group
>
> DT = data.table(group = rep(1:N, each = T), x = 1)
> setkey(DT, group)
>
> DT[, sum_x := sum(x), by = group]
> print(head(DT))
>
> This runs fine. But when I increase the number of groups, say from 2500
> to 3000, I get a segfault:
>
> N = 3000 # No. of groups
> T = 100000 # No. of observations per group
>
> ...
>
> *** caught segfault ***
> address 0x159069140, cause 'memory not mapped'
>
> Traceback:
> 1: `[.data.table`(DT, , `:=`(sum_x, sum(x)), by = group)
> 2: DT[, `:=`(sum_x, sum(x)), by = group]
> 3: eval(expr, envir, enclos)
> 4: eval(ei, envir)
> 5: withVisible(eval(ei, envir))
>
>
> I can reproduce this problem on:
>
> (1) OS X 10.9, R 3.0.2, data.table 1.8.10
> (2) Ubuntu 13.10, R 3.0.1, data.table 1.8.10
>
> And of course the amount of RAM in my machines is not the issue.
>
> Thanks in advance for your help with this!
>
> Günter