[datatable-help] Bug when by=key(DT)

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Feb 24 04:05:27 CET 2011


Hi,

I'm running data.table 1.5.4 (but this also fails w/ data.table in SVN).

One of the bullet points in the news for version 1.5.3 was:

  o    'by' may now be a character vector of column names.
       This allows syntax such as DT[,sum(x),by=key(DT)].

But it fails when the result of the subgroup iteration/summary has
fewer rows than the original subgroup.

For example:

R> library(data.table)
R> dt <- data.table(name=c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'),
                    start=sample(1:50, 8))
R> dt$end <- dt$start + sample(1:50, 8)
R> key(dt) <- 'name'

This is OK:

R> dt[, list(start=max(start), end=max(end)), by='name']
     name start end
[1,]    a    47  69
[2,]    b    35  48
[3,]    c    26  52

This isn't:

R> dt[, list(start=max(start), end=max(end)), by=key(dt)]
Error in `[[<-.data.frame`(`*tmp*`, jj, value = 1:3) :
  replacement has 3 rows, data has 8
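
The error text itself looks like it comes from base R's `[[<-.data.frame`
method rather than from data.table. As a rough sketch of that mechanism
only (it uses a plain data.frame, and the column name `grp` is made up for
illustration; it is not a trace of what data.table does internally),
assigning 3 values to a column of an 8-row data.frame gives the same
message:

```r
# Base R only; data.table is not needed for this sketch.
# An 8-row data.frame with the same 'name' column as in the example.
df <- data.frame(name = c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'))

# Try to store one value per group (3 values) in a column of the
# 8-row data.frame. 8 is not a multiple of 3, so the values cannot be
# recycled and `[[<-.data.frame` stops with an error.
# 'grp' is a hypothetical column name used only for this illustration.
res <- tryCatch({
  df[["grp"]] <- 1:3
  "no error"
}, error = function(e) conditionMessage(e))

# Show the captured error message.
print(res)
```

If that is what is going on, the by=key(dt) code path would be trying to
write the 3 group values into something that still has the original 8
rows, but that is only a guess from the message.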



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

