[datatable-help] Fwd: Data table hanging on memory allocation failure

Matthew Dowle mdowle at mdowle.plus.com
Fri Aug 2 18:54:27 CEST 2013


Hi,

Interesting. To home in on this my first quick thoughts are:
1. Try in plain R at the prompt rather than RStudio, just to isolate 
that for now.
2. Assign the result  dummy<-dt[,pt:=as.integer(p),by=list(sk, ik, pk)]; 
gc().  That shouldn't make a difference but when printing at the prompt 
(even just the head and tail) I'm aware that makes an internal copy of 
the whole object (to be fixed, and in the meantime a manual print(dt) 
avoids that copy).  If it's a script that's being run then maybe 
printing comes into it.
3. Is it after the last group has been processed, or during grouping?  
To establish this try printing the value of .GRP inside j; i.e.,  
dt[,pt:={print(.GRP);as.integer(p)},by=list(sk, ik, pk)]. This will give 
me a clue where it might be.
4. p is definitely a column of the table dt at that point?  If p is 
actually in calling scope it might be doing the wrong thing (over and 
over again).
5. Does it work with a much smaller subset of dt, say 10 rows? Often this 
reveals that an incorrect (much larger) result is being computed.  Maybe 
related to allow.cartesian.
6. Set options(datatable.verbose=TRUE), run again from scratch in a new 
session and send us the output.  Might be a lot of it but we might get 
lucky, or give further clues.
7. Otherwise, something reproducible would be great if possible. In 
cases like this it doesn't have to reproduce the memory allocation 
problem,  it just has to be pasteable into a fresh R session and 
complete on small data.  Then I can stress test it myself and see if I 
can see where the leak or corruption is happening.
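
To make points 2 to 6 concrete, here is a rough sketch on made-up data. 
The column names sk, ik, pk and p are taken from your call; the table 
size, the values and the helper variable last_grp are assumptions for 
illustration only, so adapt as needed. It records .GRP rather than 
printing it, which keeps the output short.

```r
library(data.table)

# Made-up stand-in for the real table: column names come from the thread,
# the sizes and values are invented for illustration.
set.seed(1)
n  <- 1000L
dt <- data.table(
  sk = sample(5L, n, replace = TRUE),
  ik = sample(5L, n, replace = TRUE),
  pk = sample(5L, n, replace = TRUE),
  p  = runif(n, 0, 100)
)

# Point 4: check that p really is a column of dt, not just a variable
# that happens to exist in the calling scope.
stopifnot("p" %in% names(dt))

# Point 5: try a 10 row subset first; the row count should not change.
small <- dt[1:10]
small[, pt := as.integer(p), by = list(sk, ik, pk)]
stopifnot(nrow(small) == 10L)

# Point 6: turn on verbose output for the full run.
options(datatable.verbose = TRUE)

# Points 2 and 3: assign the result so nothing is auto-printed, and record
# the index of the last group that j was evaluated for.
last_grp <- 0L  # hypothetical helper, not part of data.table
dummy <- dt[, pt := {last_grp <<- .GRP; as.integer(p)}, by = list(sk, ik, pk)]

options(datatable.verbose = FALSE)
```

If the real run stops part way, comparing last_grp with the number of 
distinct (sk, ik, pk) combinations should tell us whether it was still 
grouping or had already finished the last group.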

Matthew

On 02/08/13 15:43, Paul Harding wrote:
> Hi, I've got a big data table and I'm having memory allocation issues. 
> This isn't about the memory issue per se, rather it's about how it 
> gets handled.
>
> The table has 2M+ rows and is about 15G in size. Whilst manipulating 
> the table memory usage grows quite fast, and I'm having to manually 
> garbage collect after each manipulation. Even so it's possible to 
> reach a point (there are a lot of other developers using this server 
> for all sorts of things) where even though there is 28GB memory free I 
> can't allocate a needed 944MB contiguous chunk.
>
> I get the usual error message and it would be convenient if data table 
> exited at that point (then I wouldn't lose my previous work), but it 
> just hangs:
>
> 02-06:30:38.8> dt[,pt:=as.integer(p),by=list(sk, ik, pk)]; gc()
> Error: cannot allocate vector of size 944.8 Mb
>
> And the world holds its breath ... and the world starts turning blue 
> ...I've left it like this for hours, nothing further happens.
>
> Windows Server 2008 R2 Enterprise SP1 // Intel Xeon CPU E7-4830 @ 
> 2.13GHz 4 processors // 128GB memory installed, 28.7GB available, R 
> session 65GB
> R 3.0.0 data.table 1.8.9 rev 874
> RStudio 0.97
>
> Incidentally, after finishing a table manipulation and garbage 
> collecting the R session memory usage drops to 33GB. This is 
> consistent behaviour, there were 5 similar calls prior to this one 
> that executed successfully, with the same behaviour (garbage collected 
> after each). Almost as if there were a copy being made. But that's for 
> info, not shooting off at a tangent (I'll try and do some 
> investigation and maybe ask for help around the temporary memory 
> growth issue later).
>
> I would be really happy if data table exited on this error or if I had 
> that option, even if it's doing something very clever (waiting for 
> memory availability?) because it doesn't seem to succeed.
>
> Regards
> Paul
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
