[datatable-help] [R] Using plyr::dply more (memory) efficiently?
Matthew Dowle
mdowle at mdowle.plus.com
Sat May 1 15:47:21 CEST 2010
Fixed in setkey now. Test 150:158 added.
Steve, r-forge usually takes a night or two to update; after that, could
you re-install from r-forge and let us know if it's ok?
Matthew
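
For reference, the level-ordering issue in this thread can be seen in base R
alone. The following is only a minimal sketch, assuming nothing beyond base R:
it makes no data.table calls, and sort_levels is a hypothetical helper written
for illustration, not the code that went into setkey. It uses the factor from
Tom's example quoted below, detects unsorted levels, and re-levels the factor
without changing its values.

```r
# Factor from Tom's example: values a..e, levels deliberately reversed.
f <- factor(letters[rep(1:5, each = 2)], levels = letters[5:1])

levels(f)               # levels are stored in the order e, d, c, b, a
is.unsorted(levels(f))  # TRUE: the levels are not in sorted order

# Hypothetical helper (an assumption, not data.table's actual code):
# re-level a factor so its levels are sorted, leaving the values unchanged.
sort_levels <- function(x) {
  stopifnot(is.factor(x))
  if (is.unsorted(levels(x))) {
    x <- factor(x, levels = sort(levels(x)))
  }
  x
}

g <- sort_levels(f)
levels(g)                                    # a, b, c, d, e
identical(as.character(g), as.character(f))  # TRUE: same values as before
```

Re-levelling with factor(x, levels = sort(levels(x))) has the same effect as
the factor(as.character(x)) workaround quoted below, except that it also keeps
any levels that happen to be unused.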
On Fri, 2010-04-30 at 11:52 -0700, Short, Tom wrote:
> Interesting issue. Thanks, Steve.
>
> I'd prefer a check or force reordering in setkey rather than in
> data.table or [.data.table.
>
> I'd rather not forbid out-of-order levels for non-key columns.
> Out-of-order levels are sometimes nice to get legends and panels in the
> order I like when plotting with lattice.
>
> By seems to work okay with out-of-order levels:
>
> > a = data.table(a = rep(1:5, 2), b = factor(letters[rep(1:5, each = 2)], levels = letters[5:1]), key = "b")
> > a[J("b")] # the problem
> a b
> [1,] NA <NA>
> > a[, b, by = "a"]
> a b
> [1,] 1 c
> [2,] 1 a
> [3,] 2 d
> [4,] 2 a
> [5,] 3 d
> [6,] 3 b
> [7,] 4 e
> [8,] 4 b
> [9,] 5 e
> [10,] 5 c
> > a[, a, by = "b"]
> b a
> [1,] e 4
> [2,] e 5
> [3,] d 2
> [4,] d 3
> [5,] c 5
> [6,] c 1
> [7,] b 3
> [8,] b 4
> [9,] a 1
> [10,] a 2
>
> - Tom
>
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] On Behalf Of
> Matthew Dowle
> Sent: Friday, April 30, 2010 2:16 PM
> To: datatable-help at lists.r-forge.r-project.org
> Cc: lianos at cbio.mskcc.org
> Subject: Re: [datatable-help] [R] Using plyr::dply more (memory)
> efficiently?
>
>
> Looks like Steve found a bug, see below. [ He gave ok to forward to the
> list. ] Thanks Steve.
>
> If a data.frame df has a factor column x where the levels are not
> sorted, perhaps because it's been created somewhere else or by other
> code, then dt=data.table(df) doesn't sort those levels.
> setkey(dt,x) then doesn't sort them either, and lookup doesn't work.
>
> Change could be in data.table (to make sure all factor columns have
> sorted levels), or just in setkey for those columns in the key only.
>
> Not sure if ad hoc 'by' works ok on factors with out-of-order
> levels, so the change might need to be in data.table().
>
> Or something else I didn't think of. Any views?
>
> In the meantime, one workaround to sort the levels :
>
> check.cds$symbol = factor(as.character(check.cds$symbol))
> key(check.cds) = NULL # to clear the key if it's already there
> setkey(check.cds,symbol)
>
>
> Matthew
>
>
> On Thu, 2010-04-29 at 12:46 -0400, Steve Lianoglou wrote:
> > Actually, the keys aren't working for me as I expect. Witness that the
> > "symbol" column is defined as a key in the `check.cds` object:
> >
> > R> tables()
> > NAME NROW MB COLS KEY
> > [1,] check.cds 18,829 3 transcript,symbol,counts,exon.width symbol
> > [2,] intron 18,532 3 transcript,symbol,counts,exon.width
> > [3,] x 18,829 3 transcript,symbol,counts,exon.width
> >
> > R> head(check.cds)
> > transcript symbol counts exon.width
> > [1,] OR4F5 OR4F5 0 125
> > [2,] OR4F16 OR4F16 0 0
> > [3,] OR4F29 OR4F29 0 0
> > [4,] OR4F3 OR4F3 0 0
> > [5,] SAMD11 SAMD11 3 2040
> > [6,] NOC2L NOC2L 12 1772
> >
> > R> check.cds["NOC2L",]
> > transcript symbol counts exon.width
> > [1,] <NA> <NA> NA NA
> >
> > R> check.cds[symbol == "NOC2L",]
> > transcript symbol counts exon.width
> > [1,] NOC2L NOC2L 12 1772
> >
> > Am I doing something wrong?
> >
> > I'm using R 2.11 and data.table_1.4
> >
> > -steve
> >
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help