[datatable-help] [R] Using plyr::dply more (memory) efficiently?

Matthew Dowle mdowle at mdowle.plus.com
Sat May 1 15:47:21 CEST 2010


Fixed in setkey now. Tests 150:158 added.

Steve, R-Forge usually takes a night or two to update; after that, please
re-install from R-Forge and let us know if it's OK.

Matthew


On Fri, 2010-04-30 at 11:52 -0700, Short, Tom wrote:
> Interesting issue. Thanks, Steve.
> 
> I'd prefer a check, or a forced reordering, in setkey rather than in
> data.table() or [.data.table.
> 
> I'd rather not forbid out-of-order levels for non-key columns.
> Out-of-order levels are sometimes nice to get legends and panels in the
> order I like when plotting with lattice. 
> 
> Grouping with 'by' seems to work okay with out-of-order levels:
> 
> > a = data.table(a = rep(1:5, 2), b = factor(letters[rep(1:5, each = 2)], levels = letters[5:1]), key = "b")
> > a[J("b")] # the problem
>       a    b
> [1,] NA <NA>
> > a[, b, by = "a"]
>       a b
>  [1,] 1 c
>  [2,] 1 a
>  [3,] 2 d
>  [4,] 2 a
>  [5,] 3 d
>  [6,] 3 b
>  [7,] 4 e
>  [8,] 4 b
>  [9,] 5 e
> [10,] 5 c
> > a[, a, by = "b"]
>       b a
>  [1,] e 4
>  [2,] e 5
>  [3,] d 2
>  [4,] d 3
>  [5,] c 5
>  [6,] c 1
>  [7,] b 3
>  [8,] b 4
>  [9,] a 1
> [10,] a 2
> 
> - Tom
> 
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] On Behalf Of
> Matthew Dowle
> Sent: Friday, April 30, 2010 2:16 PM
> To: datatable-help at lists.r-forge.r-project.org
> Cc: lianos at cbio.mskcc.org
> Subject: Re: [datatable-help] [R] Using plyr::dply more (memory)
> efficiently?
> 
> 
> It looks like Steve found a bug; see below. [He gave the OK to forward
> this to the list.] Thanks, Steve.
> 
> If a data.frame df has a factor column x whose levels are not sorted
> (perhaps because it was created elsewhere or by other code), then
> dt=data.table(df) doesn't sort those levels. setkey(dt,x) doesn't sort
> them either, so lookup doesn't work.
> 
> The change could be in data.table() (to make sure all factor columns
> have sorted levels), or just in setkey, for the key columns only.
> 
> I'm not sure whether ad hoc 'by' works correctly on factors with
> out-of-order levels, so the change might need to be in data.table().
> 
> Or something else I didn't think of. Any views?
> 
> In the meantime, here is one workaround to sort the levels:
> 
> check.cds$symbol = factor(as.character(check.cds$symbol))
> key(check.cds) = NULL   # to clear the key if it's already there
> setkey(check.cds,symbol)
> 
> 
> Matthew
> 
> 
> On Thu, 2010-04-29 at 12:46 -0400, Steve Lianoglou wrote:
> > Actually, the keys aren't working for me as I expect. Witness that the
> > "symbol" column is defined as a key in the `check.cds` object:
> > 
> > R> tables()
> >      NAME          NROW MB COLS                                   KEY
> > [1,] check.cds   18,829 3  transcript,symbol,counts,exon.width symbol
> > [2,] intron      18,532 3  transcript,symbol,counts,exon.width
> > [3,] x           18,829 3  transcript,symbol,counts,exon.width
> > 
> > R> head(check.cds)
> >      transcript symbol counts exon.width
> > [1,]      OR4F5  OR4F5      0        125
> > [2,]     OR4F16 OR4F16      0          0
> > [3,]     OR4F29 OR4F29      0          0
> > [4,]      OR4F3  OR4F3      0          0
> > [5,]     SAMD11 SAMD11      3       2040
> > [6,]      NOC2L  NOC2L     12       1772
> > 
> > R> check.cds["NOC2L",]
> >      transcript symbol counts exon.width
> > [1,]       <NA>   <NA>     NA         NA
> > 
> > R> check.cds[symbol == "NOC2L",]
> >      transcript symbol counts exon.width
> > [1,]      NOC2L  NOC2L     12       1772
> > 
> > Am I doing something wrong?
> > 
> > I'm using R 2.11 and data.table_1.4
> > 
> > -steve
> > 
> 
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
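
A simplified base-R model of the failure discussed in this thread. It assumes, without confirmation from the thread, that a keyed lookup behaves like a binary search comparing labels alphabetically, while the rows are physically ordered by the factor's integer codes. `bsearch_chr` is a hypothetical helper written for illustration; it is not data.table's internal code.

```r
# Tom's factor: levels deliberately reversed (e, d, c, b, a)
b <- factor(letters[rep(1:5, each = 2)], levels = letters[5:1])

# Ordering a factor uses its integer codes (level position), so the
# labels come out as e e d d c c b b a a, which is not alphabetical
key_order <- as.character(b[order(b)])

# Hypothetical helper: binary search over a character vector that is
# ASSUMED to be in alphabetical order; returns a matching index or NA
bsearch_chr <- function(x, target) {
  lo <- 1L
  hi <- length(x)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] == target) return(mid)
    if (x[mid] < target) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_
}

bsearch_chr(key_order, "b")   # NA: the search takes a wrong branch
which(key_order == "b")       # 7 8: a full vector scan still finds it

# Rebuilding the factor from its labels (Matthew's workaround) sorts the levels
fixed <- factor(as.character(b))
bsearch_chr(as.character(fixed[order(fixed)]), "b")   # 3
```

Under this model, the vector scan succeeds regardless of level order because it compares labels directly. That is consistent with Steve's report that `check.cds[symbol == "NOC2L",]` found the row while `check.cds["NOC2L",]` returned NA.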

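Tom suggests checking or reordering levels in setkey for the key columns only, leaving out-of-order levels alone elsewhere (for example, for lattice legends). Below is a minimal sketch of that idea, applied to a data.frame before data.table() and setkey() are called. `sort_key_levels` is a hypothetical name, not a data.table function. The sketch also assumes that sorting with the current locale's collation is acceptable.

```r
# Hypothetical helper (not part of data.table): for the named columns only,
# re-sort the levels of any factor whose levels are out of order.
# Other columns, including factors ordered for plotting, are untouched.
sort_key_levels <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    if (is.factor(x) && is.unsorted(levels(x))) {
      # Same label on every row; only the level order (and codes) change.
      # Unlike factor(as.character(x)), this keeps unused levels.
      df[[col]] <- factor(x, levels = sort(levels(x)))
    }
  }
  df
}

df <- data.frame(
  a = rep(1:5, 2),
  b = factor(letters[rep(1:5, each = 2)], levels = letters[5:1]),
  g = factor(letters[rep(1:5, 2)], levels = letters[5:1])
)
out <- sort_key_levels(df, "b")
levels(out$b)   # "a" "b" "c" "d" "e"
levels(out$g)   # "e" "d" "c" "b" "a" (non-key column left as-is)
```

Per Matthew's note at the top of the thread, the fix was made inside setkey itself, so a pre-processing step like this would only matter for versions without that fix.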


