[datatable-help] Bug in subsetting/iterating over a data.table with 1 row(?)

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jan 10 17:08:53 CET 2011


Thanks Matthew .. I'll update my local copy.

-steve


On Mon, Jan 10, 2011 at 3:18 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Hi Steve,
> Fixed now.
> Thanks, Matthew
>
> On Thu, 2011-01-06 at 09:48 -0500, Steve Lianoglou wrote:
>> Hi Matthew,
>>
>> On Thu, Jan 6, 2011 at 5:26 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> >
>> > How about writing it this way? It should also invoke the incremental binary
>> > search for efficiency, rather than a repeated binary search for each by group.
>> >
>> >> dt2[dt1[, .SD[1], by=list(name, place)],mult="all"]
>> >     name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> >
>> >> dt2[dt3[, .SD[1], by=list(name, place)],mult="all"]
>> >     name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> > [3,]    b  work     20
>>
>> While doing it this way works for this trivial case, I'm actually
>> doing a fair bit of bookkeeping/computation in my j expression and
>> returning a list of elements that you can't really get from simple
>> joins and stuff.
>>
>> > But doing it your way should work too, so I'll add as a bug.
>>
>> Thanks. I'm currently working around this by adding a dummy row into
>> my dt1 data.table, which I then remove after the `dogroups` stuff
>> finishes.
>>
>> -steve
>>
>> > Another way to get the first row of each group is a fast self-join via i.
>> > There was a thread on that some time ago when i was changed to be evaluated
>> > within the frame of DT too. Something like :
>> >
>> >   dt3[J(unique(name)), mult="first"]    # first of each group
>> >
>> > HTH
>> > Matthew
>> >
>> >
>> > "Steve Lianoglou" <mailinglist.honeypot at gmail.com> wrote in message
>> > news:AANLkTinEhEXFGKWNze8goCn-67fuJmcE3Zu8YOZAxPYk at mail.gmail.com...
>> > Hi,
>> >
>> > I'm calculating some statistics over a large data.table via `dt[,
>> > {somestuff}, by=list(key1,key2)]`.
>> > Sometimes my dt data.table ends up only having one row, which results
>> > in the following error:
>> >
>> >  "Didn't allocate enough rows for result of first group."
>> >
>> > Here is a toy/trivial example.
>> >
>> > R> dt1 <- data.table(name='a', place='home', count=1, key='name,place')
>> > R> dt2 <- data.table(name=c('a', 'a', 'a', 'b'),
>> >                  place=c('home', 'work', 'home', 'work'),
>> >                  length=c(10,20,100, 20), key='name,place')
>> >
>> > R> dt1[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
>> >                          mult='all']$length), by=list(name, place)]
>> > Error in `[.data.table`(dt1, , list(length = dt2[J(.SD$name[1],
>> > .SD$place[1]),  :
>> >  Didn't allocate enough rows for result of first group.
>> >
>> > When my data.table has > 1 row, it works:
>> >
>> > R> dt3 <- data.table(name=c('a', 'b'), place=c('home', 'work'),
>> >                      count=1:2, key='name,place')
>> > R> dt3[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
>> >                          mult='all']$length), by=list(name, place)]
>> >     name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> > [3,]    b  work     20
>> >
>> > I believe if the result of my {somestuff} expression only ever
>> > returned one row, this bug wouldn't happen, but .... it doesn't just
>> > do that :-)
>> >
>> > It looks like the fix belongs where the `byretn` value is calculated in
>> > `[.data.table`, but that code is somewhat inscrutable at first glance
>> > ... can anyone propose a quick fix?
>> >
>> > Thanks,
>> > -steve
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology
>> > | Memorial Sloan-Kettering Cancer Center
>> > | Weill Medical College of Cornell University
>> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >
>
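The dummy-row workaround mentioned in the quoted thread can be sketched roughly as follows. This is only an illustration under assumptions, not the code actually used: the sentinel value `"~dummy~"` and the names `padded` and `res` are hypothetical, the key is passed as a character vector, and the group values are read from `.BY` because current data.table releases leave the `by` columns out of `.SD` by default.

```r
library(data.table)

# Toy tables from the thread, with the key given as a character vector.
dt1 <- data.table(name = "a", place = "home", count = 1,
                  key = c("name", "place"))
dt2 <- data.table(name   = c("a", "a", "a", "b"),
                  place  = c("home", "work", "home", "work"),
                  length = c(10, 20, 100, 20),
                  key    = c("name", "place"))

# Hypothetical sentinel; any string that cannot occur in the real data works.
sentinel <- "~dummy~"

# Pad the one-row table with a dummy row so it holds more than one group.
padded <- rbind(dt1, data.table(name = sentinel, place = sentinel, count = 0))
setkey(padded, name, place)

# Per-group lookup in dt2, mirroring the j expression from the thread.
res <- padded[, {
  nm <- .BY$name    # copy the group values into locals so they are not
  pl <- .BY$place   # mistaken for dt2's own name/place columns
  list(length = dt2[J(nm, pl), mult = "all"]$length)
}, by = list(name, place)]

# Drop the sentinel group again once the grouping step has finished.
res <- res[name != sentinel]
print(res)
```

With the toy data, `res` should hold the two a/home rows with lengths 10 and 100, matching the output shown in the thread. The sentinel group only contributes a single NA row, which the last step removes.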



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
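
The two "first row of each group" idioms discussed in the quoted thread can be compared side by side. The sketch below uses a small hypothetical table `dt` (not one from the thread) in which one name appears twice, and groups by `name` alone so that both approaches answer the same question. It assumes a current data.table release.

```r
library(data.table)

# Hypothetical keyed table in which name "a" appears in two rows.
dt <- data.table(name  = c("a", "a", "b"),
                 place = c("home", "work", "work"),
                 count = 1:3,
                 key   = c("name", "place"))

# Grouping approach: take row 1 of .SD within each name group.
first_by_group <- dt[, .SD[1], by = name]

# Keyed self-join approach: join on the unique values of the first key
# column and keep only the first matching row for each value.
first_by_join <- dt[J(unique(name)), mult = "first"]

print(first_by_group)
print(first_by_join)
```

Both results should contain one row per name. Note that the thread's `.SD[1]` call groups by both `name` and `place`, whereas the self-join matches on `name` only, so the two differ whenever a name occurs with more than one place.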

