[datatable-help] Efficient conversion of data table column to vector

Nicolas Chapados nicolas.chapados at gmail.com
Tue Aug 31 16:24:15 CEST 2010


Hi Matthew,

Many thanks for your quick reply.

I had indeed not realized that a[[colname]] would work just fine.  I also
found a[, eval(as.name(colname))] to work, albeit the syntax is messier than
the double bracket.
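For reference, here is a minimal sketch of the approaches discussed in this thread. It assumes a recent data.table release rather than the 1.4.1 shown in the sessionInfo() below, so older versions may behave differently. The table matches the one in the original question, except that it uses an explicit tz = "GMT". That timezone is an assumption, added only so the result does not depend on the locale. The sketch checks that each way of extracting the column named in colname returns the same POSIXct vector with its timezone attribute intact. The third form, taking [[1]] of the one-column result of with=FALSE, is not from the thread. It relies only on Matthew's point that a data.table is a list.

```r
library(data.table)

# Same shape as the table in the original question, but with an explicit
# tz = "GMT" (an assumption, so the output is locale-independent).
a <- data.table(
  x = seq(1, 2, by = 0.1),
  y = seq(as.POSIXct("2010-01-01", tz = "GMT"),
          as.POSIXct("2010-01-11", tz = "GMT"),
          length.out = 11)
)
colname <- "y"

# 1. A data.table is a list, so [[ ]] extracts the column as-is.
v1 <- a[[colname]]

# 2. Turn the string into a symbol, then let j evaluate it as an expression.
v2 <- a[, eval(as.name(colname))]

# 3. with = FALSE returns a one-column data.table; [[1]] takes its only column.
v3 <- a[, colname, with = FALSE][[1]]

# Each result should keep the POSIXct class and the "tzone" attribute,
# unlike as.matrix(), which produced a character matrix in the question.
print(class(v1))
print(attr(v1, "tzone"))
```

Because [[ does not go through the data.table subsetting method at all, it is probably the cheapest of the three. This sketch does not measure that, however.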

Thanks again,
+ Nicolas


On Tue, Aug 31, 2010 at 3:35 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Nicolas,
>
> Welcome to the list.
>
> Where the documentation mentions 'quoted', it means using the quote()
> function to create an expression, not a character string. Alternatively,
> you can use [[ in the usual way, since a data.table is a list.
>
> > colexp = quote(y)   # rather than "y"
> > a[,eval(colexp)]
>  [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
>  [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
>  [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
>
> or
>
> > colname = "y"
> > a[[colname]]
>  [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
>  [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
>  [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
> >
>
> A single column name is a special case of an expression, so although this
> can create a steeper learning curve, it results in more power and
> flexibility later.
>
> Suggestions on how to improve documentation so that 'quoting' is clearer
> are very welcome. I've added an item to the list so we don't forget.
>
> Matthew
>
>
> On Mon, 2010-08-30 at 23:59 -0400, Nicolas Chapados wrote:
> > Dear data.table friends and maintainers,
> >
> >
> > First, thanks to the authors for this excellent package: it really
> > fills a void in the R world.  However, I have a question: I'm looking
> > for an efficient way to convert a data table object to a vector (of
> > the correct type) when querying a single column whose name is stored
> > in a variable.  As per the vignette and the FAQ, I use the syntax
> >
> >
> >     my.data.table[, colname, with=FALSE]
> >
> >
> > (where colname is a variable containing my desired column name) but
> > this returns another data table, not a vector.  Moreover, the eval
> > syntax suggested in the FAQ simply does not work:
> >
> >
> >     my.data.table[, eval(colname)]
> >
> >
> > See example below.  I could use as.matrix on the result, but this
> > carries out undesirable type conversion in the case of columns
> > containing dates: see below.
> >
> >
> > Here is an example to reproduce this problem:
> >
> >
> > > require(data.table)
> > Loading required package: data.table
> > > a <- data.table(x=seq(1, 2, by=0.1), y=seq(as.POSIXct("2010-01-01"), as.POSIXct("2010-01-11"), length.out=11))
> > > a
> >         x          y
> >  [1,] 1.0 2010-01-01
> >  [2,] 1.1 2010-01-02
> >  [3,] 1.2 2010-01-03
> >  [4,] 1.3 2010-01-04
> >  [5,] 1.4 2010-01-05
> >  [6,] 1.5 2010-01-06
> >  [7,] 1.6 2010-01-07
> >  [8,] 1.7 2010-01-08
> >  [9,] 1.8 2010-01-09
> > [10,] 1.9 2010-01-10
> > [11,] 2.0 2010-01-11
> > > colname <- "y"
> >
> >
> > ## The following returns a data table.  How can I get a vector, and
> > ## still preserve type information?
> > > a[, colname, with=FALSE]
> >                y
> >  [1,] 2010-01-01
> >  [2,] 2010-01-02
> >  [3,] 2010-01-03
> >  [4,] 2010-01-04
> >  [5,] 2010-01-05
> >  [6,] 2010-01-06
> >  [7,] 2010-01-07
> >  [8,] 2010-01-08
> >  [9,] 2010-01-09
> > [10,] 2010-01-10
> > [11,] 2010-01-11
> >
> >
> > ## The eval recipe suggested in the FAQ does not work.
> > > a[, eval(colname)]
> > [1] "y"
> >
> >
> > ## as.vector does not convert away from data.table
> > > as.vector(a[, colname, with=FALSE])
> >                y
> >  [1,] 2010-01-01
> >  [2,] 2010-01-02
> >  [3,] 2010-01-03
> >  [4,] 2010-01-04
> >  [5,] 2010-01-05
> >  [6,] 2010-01-06
> >  [7,] 2010-01-07
> >  [8,] 2010-01-08
> >  [9,] 2010-01-09
> > [10,] 2010-01-10
> > [11,] 2010-01-11
> > > class(as.vector(a[, colname, with=FALSE]))
> > [1] "data.table"
> >
> >
> > ## as.matrix loses type information (NOTE: in my case it is not acceptable to
> > ## convert this character vector back to a POSIXct, due to the loss of important
> > ## timezone information. Furthermore, this would be very inefficient.)
> > > as.matrix(a[, colname, with=FALSE])
> >       y
> >  [1,] "2010-01-01"
> >  [2,] "2010-01-02"
> >  [3,] "2010-01-03"
> >  [4,] "2010-01-04"
> >  [5,] "2010-01-05"
> >  [6,] "2010-01-06"
> >  [7,] "2010-01-07"
> >  [8,] "2010-01-08"
> >  [9,] "2010-01-09"
> > [10,] "2010-01-10"
> > [11,] "2010-01-11"
> > > mode(as.matrix(a[, colname, with=FALSE]))
> > [1] "character"
> >
> >
> > ## Finally, one could go through a data.frame, but this is inefficient
> > ## and it sort of defeats the purpose of using data.table...
> > > as.data.frame(a[, colname, with=FALSE])[, colname]
> >  [1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
> >  [5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
> >  [9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"
> >
> >
> >
> >
> > So at this point, my imagination is running out and I'm turning to
> > this list for suggestions. This seems to be a fairly frequent
> > use case, and I'm surprised it does not appear to have been
> > addressed before.
> >
> >
> > For the record, here is my sessionInfo()
> >
> >
> > > sessionInfo()
> > R version 2.9.2 (2009-08-24)
> > x86_64-pc-linux-gnu
> >
> >
> > locale:
> > C
> >
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> >
> > other attached packages:
> > [1] data.table_1.4.1
> >
> >
> >
> >
> > Thanks in advance for any help!
> > + Nicolas Chapados
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>