[datatable-help] Something seems funky. I think with character-to-factor conversion for keys (?)

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat Mar 5 03:43:49 CET 2011


Hi Mel,

On Fri, Mar 4, 2011 at 8:15 PM, Bacou, Melanie <mel at mbacou.com> wrote:
> Steve,
>
> Try instead:
>
> R> m2[J(9)]
>
> It seems your original entrez.id key is integer not character

It's actually a factor:

R> is(m2$entrez.id)
[1] "factor"   "integer"  "oldClass" "numeric"  "vector"

and moreover:

R> '9' %in% levels(m2$entrez.id)
[1] TRUE

and the integer J() maneuver is a no go:

R> m2[J(9)]
Error in `[.data.table`(m2, J(9)) :
  x.entrez.id is a factor but joining to i.V1 which is not a factor.
Factors must join to factors.
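
For what it's worth, the "factor" and "integer" entries in the is() output
fit together because a factor stores integer codes that index into its
levels. A minimal base-R sketch with made-up IDs (the vector `ids` is
hypothetical, not taken from m2) shows why those codes need not match the
numbers the labels spell out:

```r
# Hypothetical IDs, stored as character and then converted to a factor.
ids <- c("9", "10", "12")
f <- factor(ids)

# Levels are the sorted unique strings; they sort as text, not as numbers.
lv <- levels(f)

# The underlying integer codes are positions within levels(f),
# so they generally differ from the numbers the labels spell out.
codes <- as.integer(f)

# Going through character first recovers the numeric value of each label.
values <- as.integer(as.character(f))

print(lv)
print(codes)
print(values)
```

This only illustrates why an integer 9 and the level '9' are different
things to a factor; whether that is what went wrong with m2's key is still
an open question.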

> -- but to be honest I'm not sure why:
>
> R> m2[9]
>
> doesn't work either...

That works, in that it does something, but it just gets the 9th row of
m2, not the row whose key is '9'.
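
The difference between a positional index and a lookup by value can be
sketched in base R. This uses a plain data.frame as a stand-in (the name
`df` is hypothetical, and the three rows are copied from the printed m2),
so it says nothing about data.table's own key machinery:

```r
# Stand-in for the first rows of m2, as a base data.frame (not a data.table).
df <- data.frame(
  entrez.id = c("9", "10", "12"),
  total.tags.liver = c(27, 347, 5076),
  stringsAsFactors = FALSE
)

# Positional: a bare number is treated as a row index.
by_position <- df[1, ]

# By value: find the row whose entrez.id equals the string "9".
by_value <- df[match("9", df$entrez.id), ]

# Row 9 does not exist in this 3-row frame, so base R returns a row of NAs.
out_of_range <- df[9, ]

print(by_value)
print(out_of_range)
```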

Seems like something strange is afoot here ...

-steve

> --Mel.
>
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Steve
> Lianoglou
> Sent: Friday, March 04, 2011 5:46 PM
> To: datatable-help at r-forge.wu-wien.ac.at
> Subject: [datatable-help] Something seems funky. I think with
> character-to-factor conversion for keys (?)
>
> I'll have to apologize in advance because I can't create a
> reproducible example for this behavior, but I'll keep trying .. please
> bear with me.
>
> Somehow I've ended up with a data.table `m2` that looks like this:
>
> R> m2
>      entrez.id total.tags.liver cds.liver intron.liver utr.liver
>  [1,]         9               27         0            0         0
>  [2,]        10              347         0            0         0
>  [3,]        12             5076         0           17         0
>  [4,]        13             2445         0            0         0
>  [5,]        18             2076         0            0         0
>  [6,]        20               15         0            0         0
>  [7,]        25               62         0            0         0
>  [8,]        32              320         0            0         0
>  [9,]        34             1377         0            0         0
> [10,]        35              757         0            0         0
> First 10 rows of 5236 printed.
>
> R> key(m2)
> [1] "entrez.id"
>
> R> any(duplicated(m2$entrez.id))
> [1] FALSE
>
> So far so good -- I stumbled on the following problem when `merge`-ing
> two large data tables, which was giving me a strange error. In the
> process of trying to smoke out the problem, I noticed this unexpected
> behavior:
>
> ## This is expected
> R> subset(m2, entrez.id == '9')
>     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> [1,]         9               27         0            0         0
>
> ## This isn't
> R> m2['9']
>     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> [1,]         9               NA        NA           NA        NA
>
> Woops! Isn't that supposed to return the same as above?
>
> I can fix `m2` by manipulating the key column:
>
> R> key(m2) <- NULL ## probably not necessary
> R> m2$entrez.id <- as.character(m2$entrez.id)
> R> key(m2) <- 'entrez.id'
> R> m2['9']
>     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> [1,]         9               27         0            0         0
>
> (side note: the bug I mentioned when I try to `merge` this w/ another
> data.table is gone after I did the above fix).
>
> So -- I guess my point is that I'm not exactly sure how I got `m2` to
> have a funky key, but I think the fact that it got messed up like this
> is undesired behavior, no?
>
> Does this point to something (maybe obvious) that happened on the way
> to building up `m2`?
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

