[datatable-help] Unexpectedly getting "Didn't allocate enough rows..."

Matthew Dowle mdowle at mdowle.plus.com
Thu Jul 22 15:43:37 CEST 2010


This is now fixed, just committed. Bug #952 raised by Georg closed.
Tests 173-175 added, which include Harish's test below.
If anyone can test and confirm, much appreciated.
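
For anyone who wants to confirm, here is a minimal check using Harish's example from below. It assumes a data.table build that already includes the fix; the row count in the check is worked out by hand from the data, not copied from tests 173-175.

```r
library(data.table)

DT <- data.table(
         A=c("a","a","b","b","d","c","a","d"),
         B=c("x1","x2","x2","x1","x2","x1","x1","x2"),
         C=c(5,2,3,4,9,5,1,9)
         )

# This call used to stop with "Didn't allocate enough rows"; it should now return a result.
res <- DT[ , list(C[ C-min(C) < 3 ]), by=list(A,B) ]

# Every group keeps one row, except (d, x2), which keeps both of its 9s: 7 rows in total.
stopifnot(nrow(res) == 7)
```

If the stopifnot() passes silently, the re-allocate is working for this case.
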
Matthew


On Tue, 2010-07-13 at 00:24 +0100, Matthew Dowle wrote: 
> Thanks once again Harish. When I made the changes for fast grouping I
> didn't quite finish it off hence the 'to implement' in the message. I
> didn't want to hold up 1.4 going to CRAN because of it. Sure enough
> pretty quickly after that, and more quickly than I expected, Georg V
> added it to the bug tracker (#952) and I've been meaning to get to it.
> 
> What happens is this. It allocates memory for the largest group and
> re-uses that same memory for all groups. No allocation and no garbage
> collection. That's one reason it's fast on the input data side of things
> for each eval(j). However, on the output side it's also fast because it
> allocates the data.table result in advance. When it gets the result of
> the j expression for each group, it sticks that data directly into the
> result data.table at the correct row. It doesn't build a list() of
> results which is then collapsed down.
> 
> It can't possibly know how many rows to allocate in advance though,
> until it has run the j for all the groups, right? True, so it tries to
> make a very good guess, optimised for most tasks. Most of the time we
> either do i) single row aggregates (j is sum, mean, lm etc), or ii) a
> subset of the group data (j is cumprod or [ or similar returning
> multiple rows per group) or iii) NULL for the side effect of plotting
> where no data output is required.
> 
> First, it runs the j for the first group. Depending on the number of
> rows returned by the j on the first group it decides how to allocate the
> result. If that is a single row for example, it allocates 1*number of
> groups rows for the result. Most of the time that's what we need. Then it
> proceeds to the 2nd group etc.
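
The guess described above can be sketched in plain R using Harish's data from the report further down. The helpers rows_per_group() and guess_rows() are hypothetical names made up for this illustration, and extending the "1*number of groups" rule to "rows from the first group * number of groups" is an assumption, not a description of what dogroups.c does.

```r
# Harish's data as a plain data.frame, so the sketch needs no packages.
DF <- data.frame(
  A = c("a","a","b","b","d","c","a","d"),
  B = c("x1","x2","x2","x1","x2","x1","x1","x2"),
  C = c(5,2,3,4,9,5,1,9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of rows that j = C[C - min(C) < cutoff]
# returns for each (A, B) group.
rows_per_group <- function(df, cutoff) {
  groups <- split(df$C, list(df$A, df$B), drop = TRUE)
  vapply(groups, function(C) length(C[C - min(C) < cutoff]), integer(1))
}

# Hypothetical helper (assumed rule): preallocate as if every group
# returned as many rows as the first group did.
guess_rows <- function(per_group) per_group[[1]] * length(per_group)

need3 <- rows_per_group(DF, 3)
c(guess = guess_rows(need3), needed = sum(need3))   # guess 6, needed 7: too few rows

need5 <- rows_per_group(DF, 5)
c(guess = guess_rows(need5), needed = sum(need5))   # guess 12, needed 8: enough rows
```

With the cutoff of 3 the first group (a, x1) returns a single row, so 6 rows are guessed, but the (d, x2) group returns two rows and pushes the total to 7. That is consistent with the first call failing and the second one not.
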
> 
> If it gets the guess wrong, then it needs to re-allocate memory for the
> result using information from the later groups. That's what isn't done
> yet. It's the right way to do it I think, but the re-allocate just isn't
> implemented yet. In the vast majority of cases, it should only need one
> re-allocate.
> 
> 'slow grow' means the method of either building up a growing list()
> which is later collapsed, or growing the result slowly for example in
> powers of two or by a fixed number of rows somehow.
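
To make the contrast concrete, here is a rough sketch in plain R of both approaches: filling a preallocated result and re-allocating only when the guess turns out short, versus building a list() and collapsing it. Both function names are hypothetical, and the re-allocation policy (estimate the final size from the groups seen so far) is only one possible choice, assumed for illustration; it is not taken from dogroups.c.

```r
# Hypothetical helper: write each group's result straight into a preallocated
# vector, growing it only if the initial guess was too small.
fill_prealloc <- function(chunks, guess) {
  ans <- numeric(guess)
  used <- 0L
  reallocs <- 0L
  for (i in seq_along(chunks)) {
    n <- length(chunks[[i]])
    if (used + n > length(ans)) {
      # Assumed policy: average rows per group so far, applied to the groups
      # still to come, so one re-allocate is usually enough.
      remaining <- length(chunks) - i
      per_group <- ceiling((used + n) / i)
      length(ans) <- used + n + remaining * per_group
      reallocs <- reallocs + 1L
    }
    if (n > 0L) ans[used + seq_len(n)] <- chunks[[i]]
    used <- used + n
  }
  list(result = ans[seq_len(used)], reallocs = reallocs)
}

# Hypothetical helper: the 'slow grow' alternative, a growing list()
# that is collapsed at the end.
grow_slowly <- function(chunks) {
  pieces <- list()
  for (i in seq_along(chunks)) pieces[[i]] <- chunks[[i]]
  unlist(pieces)
}

# Six groups; the guess of 1 * 6 rows is one row short because of the last group.
chunks <- list(1, 4, 5, 2, 3, c(9, 9))
out <- fill_prealloc(chunks, guess = 6)
out$reallocs                                  # 1
identical(out$result, grow_slowly(chunks))    # TRUE
```

Both give the same answer; the difference is that the preallocated version touches the allocator once up front and once more on a miss, rather than on every group.
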
> 
> Why the first group? I did try with the largest group which improves the
> guess, but that messes up side-effect only plotting. The plot appears
> for the largest group first, followed by group 1, group 2, etc. It
> wasn't right and even then a re-allocate might still be needed. So it's
> cleaner to run for the first group, then make the good guess, then
> proceed through groups 2 to n.
> 
> Long answer to a simple question I'm afraid.
> 
> Btw, you don't need to wrap with list in that example :
>    DT[ , list(C[ C-min(C) < 5 ]), by=list(A,B) ]
> you can just do this :
>    DT[ , C[ C-min(C) < 5 ], by=list(A,B) ]
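
A quick way to check that the two forms agree, assuming a data.table version where an unnamed j result gets the same default column name with or without list():

```r
library(data.table)

DT <- data.table(
         A=c("a","a","b","b","d","c","a","d"),
         B=c("x1","x2","x2","x1","x2","x1","x1","x2"),
         C=c(5,2,3,4,9,5,1,9)
         )

with_list    <- DT[ , list(C[ C-min(C) < 5 ]), by=list(A,B) ]
without_list <- DT[ , C[ C-min(C) < 5 ], by=list(A,B) ]

# Compare the two results; this should report TRUE if the forms are equivalent.
all.equal(with_list, without_list)
```
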
> 
> I'll see if I can implement the re-allocate soon. Or if there are any C
> programmers listening, then it's this line
>    // TO DO: implement R_realloc(?) here
> that needs doing in dogroups.c.
> 
> Matthew
> 
> 
> On Sun, 2010-07-11 at 02:09 -0700, Harish wrote:
> > I am unexpectedly getting an error -- Didn't allocate enough rows. Must grow ans (to implement as we don't want default slow grow)
> > 
> > 
> > DT <- data.table(
> >          A=c("a","a","b","b","d","c","a","d"),
> >          B=c("x1","x2","x2","x1","x2","x1","x1","x2"),
> >          C=c(5,2,3,4,9,5,1,9)
> >          )
> > DT[ , list(C[ C-min(C) < 3 ]), by=list(A,B) ]    # Get error
> > 
> > DT[ , list(C[ C-min(C) < 5 ]), by=list(A,B) ]    # No error (as expected)
> > 
> > 
> > Am I doing something that I shouldn't be?
> > 
> > 
> > Harish
> > 
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help