[datatable-help] Unexpectedly getting "Didn't allocate enough rows..."

Matthew Dowle mdowle at mdowle.plus.com
Tue Aug 3 01:47:18 CEST 2010


Glad to hear that - thanks for confirming.
Matthew

P.S. This post is also an excuse to test whether the new Nabble mirror
is working ...

On Mon, 2010-08-02 at 09:01 -0700, Harish wrote:
> It looks good.  Thanks.
> 
> Harish
> 
> 
> --- On Thu, 7/22/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> 
> > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > Subject: Re: [datatable-help] Unexpectedly getting "Didn't allocate enough rows..."
> > To: "Harish" <harishv_99 at yahoo.com>, datatable-help at lists.r-forge.r-project.org
> > Date: Thursday, July 22, 2010, 6:43 AM
> > This is now fixed, just committed.
> > Bug #952 raised by Georg closed.
> > Tests 173-175 added, which include Harish's test below.
> > If anyone can test and confirm, much appreciated.
> > Matthew
> > 
> > 
> > On Tue, 2010-07-13 at 00:24 +0100, Matthew Dowle wrote: 
> > > Thanks once again Harish. When I made the changes for fast grouping I
> > > didn't quite finish it off, hence the 'to implement' in the message. I
> > > didn't want to hold up 1.4 going to CRAN because of it. Sure enough,
> > > pretty quickly after that, and more quickly than I expected, Georg V
> > > added it to the bug tracker (#952) and I've been meaning to get to it.
> > > 
> > > What happens is this. It allocates memory for the largest group and
> > > re-uses that same memory for all groups. No allocation and no garbage
> > > collection. That's one reason it's fast on the input data side of things
> > > for each eval(j). However, on the output side it's also fast because it
> > > allocates the data.table result in advance. When it gets the result of
> > > the j expression for each group, it sticks that data directly into the
> > > result data.table at the correct row. It doesn't build a list() of
> > > results which is then collapsed down.
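
A minimal base-R sketch of the output-side idea described above, for the single-row-aggregate case: the result is allocated once and each group's value is written at its own row. `group_sum_prealloc()` and its arguments are hypothetical names for illustration only; this is not data.table's C implementation.

```r
# Hypothetical helper, not data.table internals: one aggregate per group,
# written straight into a result vector that is allocated up front.
group_sum_prealloc <- function(x, g) {
  groups <- unique(g)
  ans <- numeric(length(groups))        # allocate the whole result in advance
  for (i in seq_along(groups)) {
    ans[i] <- sum(x[g == groups[i]])    # write at the correct row, no list() of pieces
  }
  data.frame(group = groups, V1 = ans)
}

# Data taken from the report further down the thread; A and B pasted into one key.
A <- c("a", "a", "b", "b", "d", "c", "a", "d")
B <- c("x1", "x2", "x2", "x1", "x2", "x1", "x1", "x2")
C <- c(5, 2, 3, 4, 9, 5, 1, 9)
res <- group_sum_prealloc(C, paste(A, B))
```

Here every group yields exactly one row, so a result of `length(groups)` rows is always the right size.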
> > > 
> > > It can't possibly know how many rows to allocate in advance though,
> > > until it has run the j for all the groups, right? True, so it tries to
> > > make a very good guess, optimised for most tasks. Most of the time we
> > > either do i) single row aggregates (j is sum, mean, lm etc), or ii) a
> > > subset of the group data (j is cumprod or [ or similar returning
> > > multiple rows per group) or iii) NULL for the side effect of plotting
> > > where no data output is required.
> > > 
> > > First, it runs the j for the first group. Depending on the number of
> > > rows returned by the j on the first group it decides how to allocate the
> > > result. If that is a single row for example, it allocates 1*number of
> > > groups rows for the result. Most of the time that's what we need. Then it
> > > proceeds to the 2nd group etc.
> > > 
> > > If it gets the guess wrong, then it needs to re-allocate memory for the
> > > result using information from the later groups. That's what isn't done
> > > yet. It's the right way to do it I think, but the re-allocate just isn't
> > > implemented yet. In the vast majority of cases, it should only need one
> > > re-allocate.
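
The guess-then-re-allocate scheme described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `j_by_group()` is a hypothetical name, and the re-allocation size (rows needed so far plus the input rows of the groups still to come) is an assumed heuristic, not what dogroups.c does or will do.

```r
# Hypothetical sketch: size the result from the first group's row count,
# and re-allocate only if a later group shows the guess was too small.
j_by_group <- function(x, g, j) {
  groups <- unique(g)
  first  <- j(x[g == groups[1]])
  # Initial guess: every group returns as many rows as the first one did.
  ans  <- vector(typeof(first), length(first) * length(groups))
  grp  <- vector(typeof(groups), length(ans))
  used <- 0L
  reallocs <- 0L
  for (i in seq_along(groups)) {
    res  <- if (i == 1L) first else j(x[g == groups[i]])
    need <- used + length(res)
    if (need > length(ans)) {
      # Wrong guess: grow, sized from what is left to process (assumes each
      # remaining group returns at most as many rows as it contains).
      remaining <- sum(g %in% groups[-seq_len(i)])
      length(ans) <- need + remaining
      length(grp) <- need + remaining
      reallocs <- reallocs + 1L
    }
    idx <- used + seq_along(res)
    ans[idx] <- res
    grp[idx] <- rep(groups[i], length(res))
    used <- need
  }
  list(result = data.frame(group = grp[seq_len(used)], V1 = ans[seq_len(used)]),
       reallocs = reallocs)
}

# Data from the report further down the thread; A and B pasted into one key.
A <- c("a", "a", "b", "b", "d", "c", "a", "d")
B <- c("x1", "x2", "x2", "x1", "x2", "x1", "x1", "x2")
C <- c(5, 2, 3, 4, 9, 5, 1, 9)
out3 <- j_by_group(C, paste(A, B), function(v) v[v - min(v) < 3])
out5 <- j_by_group(C, paste(A, B), function(v) v[v - min(v) < 5])
```

With the `< 3` filter the first group returns one row, so the guess is 6 rows while 7 are needed, and the sketch re-allocates once. With `< 5` the first group returns two rows, the guess is 12, and no re-allocation happens, which is consistent with only the first call failing in the report.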
> > > 
> > > 'slow grow' means the method of either building up a growing list()
> > > which is later collapsed, or growing the result slowly, for example in
> > > powers of two or by a fixed number of rows somehow.
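
For contrast, the two 'slow grow' approaches named above might look like this in base R. Both functions are hypothetical illustrations, not code from data.table.

```r
# Variant 1: build up a list() of per-group results, then collapse it.
slow_grow_list <- function(x, g, j) {
  pieces <- list()
  for (grp in unique(g)) {
    pieces[[length(pieces) + 1L]] <- j(x[g == grp])  # list grows on every group
  }
  unlist(pieces)                                      # collapse step at the end
}

# Variant 2: start small and double the result vector whenever it fills up.
slow_grow_doubling <- function(x, g, j) {
  ans  <- numeric(1)
  used <- 0L
  for (grp in unique(g)) {
    res <- j(x[g == grp])
    while (used + length(res) > length(ans)) {
      length(ans) <- 2L * length(ans)                 # grow in powers of two
    }
    ans[used + seq_along(res)] <- res
    used <- used + length(res)
  }
  ans[seq_len(used)]                                  # trim unused tail
}

# Data from the report further down the thread; A and B pasted into one key.
A <- c("a", "a", "b", "b", "d", "c", "a", "d")
B <- c("x1", "x2", "x2", "x1", "x2", "x1", "x1", "x2")
C <- c(5, 2, 3, 4, 9, 5, 1, 9)
f <- function(v) v[v - min(v) < 3]
via_list     <- slow_grow_list(C, paste(A, B), f)
via_doubling <- slow_grow_doubling(C, paste(A, B), f)
```

Both give the same values as an up-front allocation would; the difference is the repeated growing and the final collapse or trim.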
> > > 
> > > Why the first group? I did try with the largest group which improves the
> > > guess, but that messes up side-effect only plotting. The plot appears
> > > for the largest group first, followed by group 1, group 2, etc. It
> > > wasn't right and even then a re-allocate might still be needed. So it's
> > > cleaner to run for the first group, then make the good guess, then
> > > proceed through groups 2 to n.
> > > 
> > > Long answer to a simple question I'm afraid.
> > > 
> > > Btw, you don't need to wrap with list in that example :
> > >    DT[ , list(C[ C-min(C) < 5 ]), by=list(A,B) ]
> > > you can just do this :
> > >    DT[ , C[ C-min(C) < 5 ], by=list(A,B) ]
> > > 
> > > I'll see if I can implement the re-allocate soon. Or if there are any C
> > > programmers listening, then it's this line
> > >    // TO DO: implement R_realloc(?) here
> > > that needs doing in dogroups.c.
> > > 
> > > Matthew
> > > 
> > > 
> > > On Sun, 2010-07-11 at 02:09 -0700, Harish wrote:
> > > > > I am unexpectedly getting an error -- "Didn't allocate enough rows.
> > > > > Must grow ans (to implement as we don't want default slow grow)"
> > > > 
> > > > 
> > > > > DT <- data.table(
> > > > >         A=c("a","a","b","b","d","c","a","d"),
> > > > >         B=c("x1","x2","x2","x1","x2","x1","x1","x2"),
> > > > >         C=c(5,2,3,4,9,5,1,9)
> > > > >          )
> > > > > DT[ , list(C[ C-min(C) < 3 ]), by=list(A,B) ]    # Get error
> > > > > 
> > > > > DT[ , list(C[ C-min(C) < 5 ]), by=list(A,B) ]    # No error (as expected)
> > > > 
> > > > 
> > > > Am I doing something that I shouldn't be?
> > > > 
> > > > 
> > > > Harish
> > > > 
> > > > 
> > > > 
> > > >       
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > datatable-help at lists.r-forge.r-project.org
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > 
> > > 
> > > _______________________________________________
> > > datatable-help mailing list
> > > datatable-help at lists.r-forge.r-project.org
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > 
> > 
> > 
> >
> 
> 
> 
>       