[datatable-help] [R] Using plyr::dply more (memory) efficiently?
Matthew Dowle
mdowle at mdowle.plus.com
Fri Apr 30 20:15:35 CEST 2010
Looks like Steve found a bug, see below. [ He gave ok to forward to the
list. ] Thanks Steve.
If a data.frame df has a factor column x whose levels are not
sorted, perhaps because it was created somewhere else or by other
code, then dt=data.table(df) doesn't sort those levels.
setkey(dt,x) then doesn't sort it, and lookup doesn't work.
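To illustrate the condition, here is a minimal base R sketch (no data.table needed). The symbols are borrowed from Steve's output further down, but the data.frame itself is made up for illustration. It only shows how a factor ends up with out-of-order levels and why ordering by the factor then differs from ordering by the strings. It does not reproduce the setkey internals.

```r
# Made-up data: levels are given in order of appearance, not alphabetically.
symbols <- c("OR4F5", "SAMD11", "NOC2L")
df <- data.frame(symbol = factor(symbols, levels = symbols))

levels(df$symbol)               # levels stay in the order they were supplied
is.unsorted(levels(df$symbol))  # TRUE: "SAMD11" comes before "NOC2L"

# Ordering a factor uses its integer codes, i.e. the level order ...
by_codes <- order(df$symbol)
# ... which is not the same as ordering the underlying strings.
by_strings <- order(as.character(df$symbol))

identical(by_codes, by_strings)  # FALSE
```

Any lookup that assumes the rows are in alphabetical order of the strings could miss rows when they are really in level order, which would be consistent with the NA result Steve reports.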
The change could be in data.table() (to make sure all factor columns have
sorted levels), or just in setkey() for those columns in the key only.
I'm not sure whether ad hoc 'by' works ok on factors with out-of-order
levels, so the change might need to be in data.table().
Or something else I didn't think of. Any views?
In the meantime, one workaround is to re-sort the levels:
check.cds$symbol = factor(as.character(check.cds$symbol))
key(check.cds) = NULL # to clear the key if it's already there
setkey(check.cds,symbol)
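The re-levelling step of that workaround can be checked in base R on its own. This is a sketch on a made-up data.frame. resort_levels() is a hypothetical helper, not part of data.table, that applies the same factor(as.character(...)) trick to every factor column.

```r
# Hypothetical helper: rebuild every factor column so its levels are sorted.
resort_levels <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], function(f) factor(as.character(f)))
  d
}

# Made-up data with out-of-order levels.
symbols <- c("OR4F5", "SAMD11", "NOC2L")
df <- data.frame(symbol = factor(symbols, levels = symbols),
                 counts = c(0, 3, 12))

fixed <- resort_levels(df)

is.unsorted(levels(df$symbol))     # TRUE before
is.unsorted(levels(fixed$symbol))  # FALSE after
# The values themselves are unchanged; only the level order differs.
identical(as.character(fixed$symbol), as.character(df$symbol))  # TRUE
```

Note that factor(as.character(f)) also drops any unused levels, which may or may not be wanted.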
Matthew
On Thu, 2010-04-29 at 12:46 -0400, Steve Lianoglou wrote:
> Actually, the keys aren't working for me as I expect. Witness that the
> "symbol" column is defined as a key in the `check.cds` object:
>
> R> tables()
>      NAME      NROW   MB COLS                                KEY
> [1,] check.cds 18,829 3  transcript,symbol,counts,exon.width symbol
> [2,] intron    18,532 3  transcript,symbol,counts,exon.width
> [3,] x         18,829 3  transcript,symbol,counts,exon.width
>
> R> head(check.cds)
>      transcript symbol counts exon.width
> [1,] OR4F5      OR4F5       0        125
> [2,] OR4F16     OR4F16      0          0
> [3,] OR4F29     OR4F29      0          0
> [4,] OR4F3      OR4F3       0          0
> [5,] SAMD11     SAMD11      3       2040
> [6,] NOC2L      NOC2L      12       1772
>
> R> check.cds["NOC2L",]
>      transcript symbol counts exon.width
> [1,] <NA>       <NA>       NA         NA
>
> R> check.cds[symbol == "NOC2L",]
>      transcript symbol counts exon.width
> [1,] NOC2L      NOC2L      12       1772
>
> Am I doing something wrong?
>
> I'm using R 2.11 and data.table_1.4
>
> -steve
>