[datatable-help] Efficient conversion of data table column to vector
Nicolas Chapados
nicolas.chapados at gmail.com
Tue Aug 31 05:59:55 CEST 2010
Dear data.table friends and maintainers,
First, thanks to the authors for this excellent package: it really fills a
void in the R world. However, I have a question: I'm looking for an
efficient way to get a vector (of the correct type), rather than a data
table, when querying a single column whose name is stored in a variable. As
per the vignette and the FAQ, I use the syntax
my.data.table[, colname, with=FALSE]
(where colname is a variable containing my desired column name) but this
returns another data table, not a vector. Moreover, the eval syntax
suggested in the FAQ simply does not work:
my.data.table[, eval(colname)]
See example below. I could use as.matrix on the result, but this carries
out undesirable type conversion in the case of columns containing dates: see
below.
Here is an example to reproduce this problem:
> require(data.table)
Loading required package: data.table
> a <- data.table(x=seq(1, 2, by=0.1), y=seq(as.POSIXct("2010-01-01"),
as.POSIXct("2010-01-11"), length.out=11))
> a
x y
[1,] 1.0 2010-01-01
[2,] 1.1 2010-01-02
[3,] 1.2 2010-01-03
[4,] 1.3 2010-01-04
[5,] 1.4 2010-01-05
[6,] 1.5 2010-01-06
[7,] 1.6 2010-01-07
[8,] 1.7 2010-01-08
[9,] 1.8 2010-01-09
[10,] 1.9 2010-01-10
[11,] 2.0 2010-01-11
> colname <- "y"
## The following returns a data table. How can I get a vector and still
## preserve type information?
> a[, colname, with=FALSE]
y
[1,] 2010-01-01
[2,] 2010-01-02
[3,] 2010-01-03
[4,] 2010-01-04
[5,] 2010-01-05
[6,] 2010-01-06
[7,] 2010-01-07
[8,] 2010-01-08
[9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11
## The eval recipe suggested in the FAQ does not work.
> a[, eval(colname)]
[1] "y"
## as.vector does not convert away from data.table
> as.vector(a[, colname, with=FALSE])
y
[1,] 2010-01-01
[2,] 2010-01-02
[3,] 2010-01-03
[4,] 2010-01-04
[5,] 2010-01-05
[6,] 2010-01-06
[7,] 2010-01-07
[8,] 2010-01-08
[9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11
> class(as.vector(a[, colname, with=FALSE]))
[1] "data.table"
## as.matrix loses type information (NOTE: in my case it is not acceptable to
## convert this character vector back to a POSIXct, due to the loss of important
## timezone information. Furthermore, this would be very inefficient.)
> as.matrix(a[, colname, with=FALSE])
y
[1,] "2010-01-01"
[2,] "2010-01-02"
[3,] "2010-01-03"
[4,] "2010-01-04"
[5,] "2010-01-05"
[6,] "2010-01-06"
[7,] "2010-01-07"
[8,] "2010-01-08"
[9,] "2010-01-09"
[10,] "2010-01-10"
[11,] "2010-01-11"
> mode(as.matrix(a[, colname, with=FALSE]))
[1] "character"
## Finally, one could go through a data.frame, but this is inefficient
## and it sort of defeats the purpose of using data.table...
> as.data.frame(a[, colname, with=FALSE])[, colname]
[1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
[5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
[9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"
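On the eval attempt above: the result "y" is what base R gives whenever eval is
handed a character string, since a string evaluates to itself. The sketch below
uses a plain list (cols, a hypothetical stand-in for the table's columns) to
show the difference between evaluating the string and evaluating the
corresponding symbol. Whether the symbol form also works inside data.table's [
in version 1.4.1 is an assumption, not something checked here.

```r
# Base-R sketch; `cols` is a hypothetical stand-in for the table's columns.
cols <- list(x = c(1.0, 1.1), y = c(10, 20))
colname <- "y"

# Evaluating a character string simply returns the string itself.
res_string <- eval(colname, cols)

# Converting the name to a symbol first makes eval look the column up.
res_symbol <- eval(as.name(colname), cols)
```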
So at this point, my imagination is running out and I'm turning to this list
for suggestions. This seems to be a fairly common use case, and I'm
surprised it does not appear to have been addressed before.
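One avenue that might be worth trying, sketched here under the assumption that
a data.table is stored as a list of columns in the same way a data.frame is:
list-style extraction with [[ takes the column name from a variable and returns
the column itself, class and attributes included. The sketch uses a base-R
data.frame (d, a hypothetical stand-in) so that it runs without data.table;
that a[[colname]] behaves the same on a data.table is an assumption.

```r
# Base-R sketch: a data.frame stands in for the data.table (assumption:
# both are lists of columns, so list-style extraction behaves alike).
d <- data.frame(x = seq(1, 2, by = 0.1),
                y = seq(as.POSIXct("2010-01-01", tz = "UTC"),
                        as.POSIXct("2010-01-11", tz = "UTC"),
                        length.out = 11))
colname <- "y"

# List-style extraction by a name held in a variable; no copy through
# as.matrix, so the column is not converted to character.
v <- d[[colname]]

class(v)          # the POSIXct class is kept
attr(v, "tzone")  # the timezone attribute travels with the vector
```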
For the record, here is my sessionInfo()
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu
locale:
C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.4.1
Thanks in advance for any help!
+ Nicolas Chapados