[datatable-help] Memory issue

Gene Leynes gleynes+r at gmail.com
Tue Oct 16 00:50:15 CEST 2012


I've been using data.table heavily since UseR! 2012.

For the most part it's been nothing short of a magical panacea, until
today.

I've hit a strange problem: calling setkey on a data.table makes the saved
file much larger than it should be.

I spent hours trying to reproduce this with similar data that I could
share, but I couldn't get it to happen on simulated data.

Here is an outline of my process:
Data table 1 (DT1) is about 80 MB when I save it
Data table 2 (DT2) is about 10 MB when I save it
Data table 3 (DT3) = cbind(DT1, DT2)

DT3 is about 90 MB when I save it (so far, so good).

If I set the key of DT3 to a particular column (for me it's isotime), the
saved file suddenly takes 212 MB of disk space. If I then change the key to
another column, or set it to NULL, the saved file still takes 212 MB.

HOWEVER, if I never set DT3's key to isotime and instead key it on another
column (such as a "name" field), the saved file is only about 90 MB, as
expected.

The size blow-up only shows up in the saved file. The in-memory sizes of
these tables are about what I'd expect.

I need to step away now, but I can give more information tomorrow if you'd
like.

I'm using R 2.15.1 ("Roasted Marshmallows") on a Windows 7 machine, with
data.table 1.8.2.