[datatable-help] Efficient conversion of data table column to vector

Nicolas Chapados nicolas.chapados at gmail.com
Tue Aug 31 05:59:55 CEST 2010


Dear data.table friends and maintainers,

First, thanks to the authors for this excellent package: it really fills a
void in the R world.  However, I have a question: I'm looking for an
efficient way to convert a data table object to a vector (of the correct
type) when querying a single column whose name is stored in a variable.  As
per the vignette and the FAQ, I use the syntax

    my.data.table[, colname, with=FALSE]

(where colname is a variable containing my desired column name), but this
returns another data table, not a vector.  Moreover, the eval syntax
suggested in the FAQ simply does not work:

    my.data.table[, eval(colname)]

See the example below.  I could use as.matrix on the result, but this
performs an undesirable type conversion on columns containing dates (also
shown below).

Here is an example to reproduce this problem:

> require(data.table)
Loading required package: data.table
> a <- data.table(x=seq(1, 2, by=0.1),
+                 y=seq(as.POSIXct("2010-01-01"), as.POSIXct("2010-01-11"), length.out=11))
> a
        x          y
 [1,] 1.0 2010-01-01
 [2,] 1.1 2010-01-02
 [3,] 1.2 2010-01-03
 [4,] 1.3 2010-01-04
 [5,] 1.4 2010-01-05
 [6,] 1.5 2010-01-06
 [7,] 1.6 2010-01-07
 [8,] 1.7 2010-01-08
 [9,] 1.8 2010-01-09
[10,] 1.9 2010-01-10
[11,] 2.0 2010-01-11
> colname <- "y"

## The following returns a data table.  How can I get a vector, and still
## preserve type information?
> a[, colname, with=FALSE]
               y
 [1,] 2010-01-01
 [2,] 2010-01-02
 [3,] 2010-01-03
 [4,] 2010-01-04
 [5,] 2010-01-05
 [6,] 2010-01-06
 [7,] 2010-01-07
 [8,] 2010-01-08
 [9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11

## The eval recipe suggested in the FAQ does not work.
> a[, eval(colname)]
[1] "y"

## as.vector does not convert away from data.table
> as.vector(a[, colname, with=FALSE])
               y
 [1,] 2010-01-01
 [2,] 2010-01-02
 [3,] 2010-01-03
 [4,] 2010-01-04
 [5,] 2010-01-05
 [6,] 2010-01-06
 [7,] 2010-01-07
 [8,] 2010-01-08
 [9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11
> class(as.vector(a[, colname, with=FALSE]))
[1] "data.table"

## as.matrix loses type information (NOTE: in my case it is not acceptable to
## convert this character vector back to a POSIXct, due to the loss of important
## timezone information. Furthermore, this would be very inefficient.)
> as.matrix(a[, colname, with=FALSE])
       y
 [1,] "2010-01-01"
 [2,] "2010-01-02"
 [3,] "2010-01-03"
 [4,] "2010-01-04"
 [5,] "2010-01-05"
 [6,] "2010-01-06"
 [7,] "2010-01-07"
 [8,] "2010-01-08"
 [9,] "2010-01-09"
[10,] "2010-01-10"
[11,] "2010-01-11"
> mode(as.matrix(a[, colname, with=FALSE]))
[1] "character"

## Finally, one could go through a data.frame, but this is inefficient
## and it sort of defeats the purpose of using data.table...
> as.data.frame(a[, colname, with=FALSE])[, colname]
 [1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
 [5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
 [9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"


So at this point, my imagination is running out and I'm turning to this list
for suggestions.  This seems like a fairly frequent use case, and I'm
surprised it does not appear to have been addressed previously.

For the record, here is my sessionInfo()

> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu

locale:
C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.4.1


Thanks in advance for any help!
+ Nicolas Chapados