[datatable-help] [R] Using plyr::dply more (memory) efficiently?
Short, Tom
TShort at epri.com
Fri Apr 30 20:52:00 CEST 2010
Interesting issue. Thanks, Steve.
I'd prefer a check or force reordering in setkey rather than in
data.table or [.data.table.
I'd rather not forbid out-of-order levels for non-key columns.
Out-of-order levels are sometimes nice to get legends and panels in the
order I like when plotting with lattice.
By seems to work okay with out-of-order levels:
> a = data.table(a = rep(1:5, 2), b = factor(letters[rep(1:5, each = 2)], levels = letters[5:1]), key = "b")
> a[J("b")] # the problem
a b
[1,] NA <NA>
> a[, b, by = "a"]
a b
[1,] 1 c
[2,] 1 a
[3,] 2 d
[4,] 2 a
[5,] 3 d
[6,] 3 b
[7,] 4 e
[8,] 4 b
[9,] 5 e
[10,] 5 c
> a[, a, by = "b"]
b a
[1,] e 4
[2,] e 5
[3,] d 2
[4,] d 3
[5,] c 5
[6,] c 1
[7,] b 3
[8,] b 4
[9,] a 1
[10,] a 2
- Tom
-----Original Message-----
From: datatable-help-bounces at lists.r-forge.r-project.org
[mailto:datatable-help-bounces at lists.r-forge.r-project.org] On Behalf Of
Matthew Dowle
Sent: Friday, April 30, 2010 2:16 PM
To: datatable-help at lists.r-forge.r-project.org
Cc: lianos at cbio.mskcc.org
Subject: Re: [datatable-help] [R] Using plyr::dply more (memory)
efficiently?
Looks like Steve found a bug, see below. [ He gave ok to forward to the
list. ] Thanks Steve.
If a data.frame df has a factor column x where the levels are not
sorted, perhaps because it's been created somewhere else or by other
code, then dt=data.table(df) doesn't sort those levels.
setkey(dt,x) then doesn't sort them either, and lookup doesn't work.
Change could be in data.table (to make sure all factor columns have
sorted levels), or just in setkey for those columns in the key only.
Not sure if ad hoc 'by' works ok on factors with out-of-order
levels, so the change might need to be in data.table().
Or something else I didn't think of. Any views?
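As a rough illustration of the "check in setkey" option, here is a minimal base-R sketch of how out-of-order levels could be detected. The helper name has_sorted_levels is hypothetical and is not part of data.table; it only shows the kind of test that might be applied to key columns.

```r
# Hypothetical helper (not part of data.table): TRUE when a factor's
# levels are already in sorted order, FALSE otherwise.
has_sorted_levels <- function(f) {
  stopifnot(is.factor(f))
  !is.unsorted(levels(f))
}

# Levels deliberately reversed, as in the failing lookup case.
f_bad  <- factor(letters[1:5], levels = letters[5:1])
# Default factor() call, which sorts the levels.
f_good <- factor(letters[1:5])

has_sorted_levels(f_bad)
has_sorted_levels(f_good)
```

A check like this would leave non-key columns untouched, so out-of-order levels would still be allowed wherever they are not used for lookup.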
In the meantime, one workaround to sort the levels:
check.cds$symbol = factor(as.character(check.cds$symbol))
key(check.cds) = NULL # to clear the key if it's already there
setkey(check.cds,symbol)
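The first line of that workaround relies on factor() sorting the levels of a character vector by default. A small base-R sketch on a toy factor (the names x and y are made up for illustration, not taken from check.cds):

```r
# Toy factor with levels deliberately in reverse order.
x <- factor(c("b", "a", "c"), levels = c("c", "b", "a"))
levels(x)

# Round-tripping through character rebuilds the levels in sorted order
# while leaving the values themselves unchanged.
y <- factor(as.character(x))
levels(y)
as.character(y)
```

Only the level order (and so the underlying integer codes) changes; the values in each row stay the same.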
Matthew
On Thu, 2010-04-29 at 12:46 -0400, Steve Lianoglou wrote:
> Actually, the keys aren't working for me as I expect. Witness that the
> "symbol" column is defined as a key in the `check.cds` object:
>
> R> tables()
> NAME NROW MB COLS KEY
> [1,] check.cds 18,829 3 transcript,symbol,counts,exon.width symbol
> [2,] intron 18,532 3 transcript,symbol,counts,exon.width
> [3,] x 18,829 3 transcript,symbol,counts,exon.width
>
> R> head(check.cds)
> transcript symbol counts exon.width
> [1,] OR4F5 OR4F5 0 125
> [2,] OR4F16 OR4F16 0 0
> [3,] OR4F29 OR4F29 0 0
> [4,] OR4F3 OR4F3 0 0
> [5,] SAMD11 SAMD11 3 2040
> [6,] NOC2L NOC2L 12 1772
>
> R> check.cds["NOC2L",]
> transcript symbol counts exon.width
> [1,] <NA> <NA> NA NA
>
> R> check.cds[symbol == "NOC2L",]
> transcript symbol counts exon.width
> [1,] NOC2L NOC2L 12 1772
>
> Am I doing something wrong?
>
> I'm using R 2.11 and data.table_1.4
>
> -steve
>