[datatable-help] Bug in subsetting/iterating over a data.table with 1 row(?)

Matthew Dowle mdowle at mdowle.plus.com
Thu Jan 6 11:26:17 CET 2011


How about writing it this way? It should also invoke the incremental binary
search, which is more efficient than a repeated binary search for each by group.

> dt2[dt1[, .SD[1], by=list(name, place)],mult="all"]
     name place length
[1,]    a  home     10
[2,]    a  home    100

> dt2[dt3[, .SD[1], by=list(name, place)],mult="all"]
     name place length
[1,]    a  home     10
[2,]    a  home    100
[3,]    b  work     20
>

But doing it your way should work too, so I'll file it as a bug.
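
For anyone who wants to try the join approach end to end, here is a minimal
sketch. It assumes a current data.table release, where mult="all" is the
default and a keyed table joins on its key when no `on` is given. The names
`groups1`, `groups3`, `res1` and `res3` are only illustrative, and
unique() on the key columns stands in for the .SD[1] step above.

```r
library(data.table)

# Toy tables from the quoted message (key given as a character vector).
dt1 <- data.table(name = "a", place = "home", count = 1,
                  key = c("name", "place"))
dt2 <- data.table(name   = c("a", "a", "a", "b"),
                  place  = c("home", "work", "home", "work"),
                  length = c(10, 20, 100, 20),
                  key    = c("name", "place"))
dt3 <- data.table(name = c("a", "b"), place = c("home", "work"),
                  count = 1:2, key = c("name", "place"))

# One row per (name, place) group, keeping only the join columns.
groups1 <- unique(dt1[, list(name, place)])
groups3 <- unique(dt3[, list(name, place)])

# Join dt2 (keyed by name, place) to the groups in a single call,
# returning every matching row of dt2 for each group.
res1 <- dt2[groups1, mult = "all"]
res3 <- dt2[groups3, mult = "all"]
```

The one-row table dt1 goes through the same code path as dt3 here, since the
grouping happens before the join rather than around it.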

Another way to get the first row of each group is a fast self-join via i. 
There was a thread on that some time ago when i was changed to be evaluated 
within the frame of DT too. Something like:

   dt3[J(unique(name)), mult="first"]    # first of each group
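
A small sketch of that self-join, assuming a current data.table release.
The table `dt` is hypothetical (the toy dt3 has only one row per name, so it
would not show the effect), and `first_rows` and `first_by` are just
illustrative names. The grouped .SD[1] form is included for comparison.

```r
library(data.table)

# Hypothetical table with more than one row per name.
dt <- data.table(name  = c("a", "a", "b"),
                 place = c("home", "work", "work"),
                 count = 1:3,
                 key   = c("name", "place"))

# Self-join: i is evaluated within the frame of dt, so `name` is the column.
# Joins on the first key column and keeps the first match for each name.
first_rows <- dt[J(unique(name)), mult = "first"]

# The same rows via grouping, for comparison.
first_by <- dt[, .SD[1], by = name]
```

Both should give one row per name: the first row of each group in key order.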

HTH
Matthew


"Steve Lianoglou" <mailinglist.honeypot at gmail.com> wrote in message 
news:AANLkTinEhEXFGKWNze8goCn-67fuJmcE3Zu8YOZAxPYk at mail.gmail.com...
Hi,

I'm calculating some statistics over a large data.table via `dt[,
{somestuff}, by=list(key1,key2)]`.
Sometimes my dt data.table ends up only having one row, which results
in the following error:

  "Didn't allocate enough rows for result of first group."

Here is a toy/trivial example.

R> dt1 <- data.table(name='a', place='home', count=1, key='name,place')
R> dt2 <- data.table(name=c('a', 'a', 'a', 'b'),
                  place=c('home', 'work', 'home', 'work'),
                  length=c(10,20,100, 20), key='name,place')

R> dt1[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
mult='all']$length), by=list(name, place)]
Error in `[.data.table`(dt1, , list(length = dt2[J(.SD$name[1],
.SD$place[1]),  :
  Didn't allocate enough rows for result of first group.

When my data.table has > 1 row, it works:

R> dt3 <- data.table(name=c('a', 'b'), place=c('home', 'work'),
count=1:2, key='name,place')
R> dt3[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
mult='all']$length), by=list(name, place)]
     name place length
[1,]    a  home     10
[2,]    a  home    100
[3,]    b  work     20

I believe if the result of my {somestuff} expression only ever
returned one row, this bug wouldn't happen, but .... it doesn't just
do that :-)

It looks like the fix is where the `byretn` value is calculated in the
`[.data.table` but that code is somewhat inscrutable at first glance
... can anyone propose a quick fix?

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact 




