[datatable-help] Something seems funky. I think with character-to-factor conversion for keys (?)

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Mar 4 23:46:17 CET 2011


I apologize in advance: I can't yet create a reproducible example for
this behavior, but I'll keep trying, so please bear with me.

Somehow I've ended up with a data.table `m2` that looks like this:

R> m2
      entrez.id total.tags.liver cds.liver intron.liver utr.liver
 [1,]         9               27         0            0         0
 [2,]        10              347         0            0         0
 [3,]        12             5076         0           17         0
 [4,]        13             2445         0            0         0
 [5,]        18             2076         0            0         0
 [6,]        20               15         0            0         0
 [7,]        25               62         0            0         0
 [8,]        32              320         0            0         0
 [9,]        34             1377         0            0         0
[10,]        35              757         0            0         0
First 10 rows of 5236 printed.

R> key(m2)
[1] "entrez.id"

R> any(duplicated(m2$entrez.id))
[1] FALSE

So far so good. I stumbled on the following problem while `merge`-ing
two large data.tables, which was giving me a strange error. While
trying to smoke out the problem, I noticed this unexpected behavior:

## This is expected
R> subset(m2, entrez.id == '9')
     entrez.id total.tags.liver cds.liver intron.liver utr.liver
[1,]         9               27         0            0         0

## This isn't
R> m2['9']
     entrez.id total.tags.liver cds.liver intron.liver utr.liver
[1,]         9               NA        NA           NA        NA

Whoops! Isn't that supposed to return the same row as above?
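
One thing that may be worth checking (a guess on my part, not a
confirmed diagnosis): the rows above print in numeric order (9, 10,
12, ...), whereas a key held as strings or factor levels would
normally sort "10" before "9". Below is a minimal base-R sketch of
that mismatch. The values are toy IDs copied from the printout, and it
is an assumption that the column is a factor built from numbers:

```r
# Toy IDs copied from the first rows of m2 printed above (not the real data).
# Assumption: the key column is a factor that was built from numeric values.
ids <- factor(c(9, 10, 12, 13, 18))

# factor() on numbers orders the levels numerically.
levels(ids)

# Sorting the same labels as strings (C-locale ordering via method = "radix")
# places "10" before "9".
sort(levels(ids), method = "radix")

# FALSE means the level order differs from the string sort order.
levels_sorted <- identical(levels(ids), sort(levels(ids), method = "radix"))
levels_sorted
```

If the real `m2$entrez.id` shows the same mismatch, a lookup that
assumes string-sorted key values could plausibly miss '9'.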

I can fix `m2` by manipulating the key column:

R> key(m2) <- NULL ## probably not necessary
R> m2$entrez.id <- as.character(m2$entrez.id)
R> key(m2) <- 'entrez.id'
R> m2['9']
     entrez.id total.tags.liver cds.liver intron.liver utr.liver
[1,]         9               27         0            0         0

(Side note: the error I mentioned when I tried to `merge` this with
another data.table went away after I applied the above fix.)
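
For what it's worth, the same fix can be sketched on a toy table. This
assumes a data.table version that provides `setkey()` and `:=` (used
here in place of `key(m2) <- ...`), and the `m2` below is a
hypothetical stand-in built from the first three rows printed above,
not my real data:

```r
library(data.table)

# Hypothetical stand-in for m2: a factor ID column built from numbers,
# plus one value column copied from the printout above.
m2 <- data.table(
  entrez.id        = factor(c(9, 10, 12)),
  total.tags.liver = c(27L, 347L, 5076L)
)

# Convert the ID column to plain character, then (re)set the key on it.
m2[, entrez.id := as.character(entrez.id)]
setkey(m2, entrez.id)

# Confirm the key, then look up one ID by key.
key(m2)
m2["9"]
```

This only shows that the convert-then-rekey step gives a working
lookup on clean data; it does not reproduce the broken state itself.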

So -- I guess my point is that I'm not exactly sure how `m2` ended up
with a funky key, but I think the fact that it can get messed up like
this is undesired behavior, no?

Does this point to something (maybe obvious) that happened on the way
to building up `m2`?

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
