[datatable-help] Efficient conversion of data table column to vector

Matthew Dowle mdowle at mdowle.plus.com
Thu Sep 2 08:57:45 CEST 2010


It was the NEWS file I was remembering. 'quoted' was used twice there and
should have been 'quote()-ed'. Fixed now; see the NEWS link on the homepage.
Matthew
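A minimal base-R sketch of the distinction discussed in this thread between a quote()-ed expression and a quoted (character) string. It is not from the original posts. It assumes only base R: a plain data.frame stands in for a data.table, since both are lists of columns, so the sketch does not depend on any particular data.table version. The names df, colname, colexp, s, v1, v2 and v3 are illustrative.

```r
# Illustrative data only: a data.frame stands in for a data.table
# (both are lists of columns), so this needs nothing beyond base R.
df <- data.frame(
  x = seq(1, 2, by = 0.1),
  y = seq(as.POSIXct("2010-01-01", tz = "GMT"),
          as.POSIXct("2010-01-11", tz = "GMT"),
          length.out = 11)
)

colname <- "y"        # a character string
colexp  <- quote(y)   # a quote()-ed expression: the symbol y

# eval() of a character string just returns the string itself, which
# matches the "y" result reported in the thread for a[, eval(colname)].
s <- eval(colname, envir = df)

# eval() of a symbol looks it up among the columns of df.
v1 <- eval(colexp, envir = df)

# A column name held in a string can be turned into a symbol first.
v2 <- eval(as.name(colname), envir = df)

# [[ extracts a list element directly, so the POSIXct class is kept.
v3 <- df[[colname]]

print(s)
print(class(v1))
```

Each of v1, v2 and v3 is a plain POSIXct vector rather than a one-column table, and no as.matrix() round trip through character is involved.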

On Wed, 2010-09-01 at 02:07 +0100, Matthew Dowle wrote:
> Thanks David. It seems I was remembering emails or posts about
> quote()-ing; that doesn't actually appear in the documentation.
> Apologies to Nicolas, who was misled by FAQ 1.5.
> 
> I've added FAQ 1.6 and added use of [[ to FAQ 1.5, closing FR #693 and
> #1038. I'm using your "quote()-ed" style too now; that's neat.
> 
> Background: the last sentence of FAQ 1.5 used to be correct, in that
> mycol="x";DT[,eval(mycol)] *did* return the column data. That works in
> 1.4.1 on CRAN. However, FAQ 1.1 and 1.2 are not true with 1.4.1. Fixing
> that for consistency (see NEWS and posts) made the FAQ's statement about
> DT[,eval(mycol)] untrue.
> Think we're there now hopefully.
> 
> The latest committed vignettes are now on the homepage (allow an hour
> for them to publish) rather than links to the CRAN ones. If the changes
> to FAQ 1.5 and 1.6 aren't fully OK, please just shout.
> 
> Thanks.
> 
> 
> 
> On Tue, 2010-08-31 at 16:17 -0400, David Winsemius wrote:
> > I sent this to Matthew offlist but he wants it "on the record", so  
> > here is what I sent:
> > On Aug 31, 2010, at 11:56 AM, David Winsemius wrote:
> > 
> > >
> > > On Aug 31, 2010, at 3:35 AM, Matthew Dowle wrote:
> > >
> > >>
> > >> Nicolas,
> > >>
> > >> Welcome to the list.
> > >>
> > >> Where the documentation mentions 'quoted' it means using the quote()
> > >> function to create an expression, not quoting as in a character string.
> > >
> > 
> > Matthew;
> > 
> > I think you really should look at FAQ 1.5. It says nothing about  
> > "quoted". It does appear to imply that if someone had executed:
> > 
> > colname="x"
> > 
> > ... that both DT[, colname, with=FALSE]  and DT[, eval(colname)]  
> > should "work". Now you are saying that isn't so, that only the first  
> > will return anything like the expected result.
> > 
> > -- 
> > David
> > >
> > >> Alternatively you
> > >> can use [[ in the usual way since a data.table is a list.
> > >>
> > >>> colexp = quote(y)   # rather than "y"
> > >>> a[,eval(colexp)]
> > >> [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
> > >> [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
> > >> [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
> > >>
> > >> or
> > >>
> > >>> colname = "y"
> > >>> a[[colname]]
> > >> [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
> > >> [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
> > >> [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
> > >>>
> > >>
> > >> A single column name is a special case of an expression, so although
> > >> this can make for a steeper learning curve, it gives more power and
> > >> flexibility later.
> > >>
> > >> Suggestions on how to make 'quoting' clearer in the documentation
> > >> are very welcome. I've added an item to the list so we don't forget.
> > >>
> > >> Matthew
> > >>
> > >>
> > >> On Mon, 2010-08-30 at 23:59 -0400, Nicolas Chapados wrote:
> > >>> Dear data.table friends and maintainers,
> > >>>
> > >>>
> > >>> First, thanks to the authors for this excellent package: it really
> > >>> fills a void in the R world. However, I have a question: I'm looking
> > >>> for an efficient way to convert a data table to a vector (of the
> > >>> correct type) when querying a single column whose name is stored in a
> > >>> variable. As per the vignette and the FAQ, I use the syntax
> > >>>
> > >>>
> > >>>   my.data.table[, colname, with=FALSE]
> > >>>
> > >>>
> > >>> (where colname is a variable containing my desired column name), but
> > >>> this returns another data table, not a vector. Moreover, the eval
> > >>> syntax suggested in the FAQ simply does not work:
> > >>>
> > >>>
> > >>>   my.data.table[, eval(colname)]
> > >>>
> > >>>
> > >>> See example below.  I could use as.matrix on the result, but this
> > >>> carries out undesirable type conversion in the case of columns
> > >>> containing dates: see below.
> > >>>
> > >>>
> > >>> Here is an example to reproduce this problem:
> > >>>
> > >>>
> > >>>> require(data.table)
> > >>> Loading required package: data.table
> > >>>> a <- data.table(x=seq(1, 2, by=0.1),
> > >>>+   y=seq(as.POSIXct("2010-01-01"), as.POSIXct("2010-01-11"), length.out=11))
> > >>>> a
> > >>>       x          y
> > >>> [1,] 1.0 2010-01-01
> > >>> [2,] 1.1 2010-01-02
> > >>> [3,] 1.2 2010-01-03
> > >>> [4,] 1.3 2010-01-04
> > >>> [5,] 1.4 2010-01-05
> > >>> [6,] 1.5 2010-01-06
> > >>> [7,] 1.6 2010-01-07
> > >>> [8,] 1.7 2010-01-08
> > >>> [9,] 1.8 2010-01-09
> > >>> [10,] 1.9 2010-01-10
> > >>> [11,] 2.0 2010-01-11
> > >>>> colname <- "y"
> > >>>
> > >>>
> > >>> ## The following returns a data table. How can I get a vector, and
> > >>> ## still preserve type information?
> > >>>> a[, colname, with=FALSE]
> > >>>              y
> > >>> [1,] 2010-01-01
> > >>> [2,] 2010-01-02
> > >>> [3,] 2010-01-03
> > >>> [4,] 2010-01-04
> > >>> [5,] 2010-01-05
> > >>> [6,] 2010-01-06
> > >>> [7,] 2010-01-07
> > >>> [8,] 2010-01-08
> > >>> [9,] 2010-01-09
> > >>> [10,] 2010-01-10
> > >>> [11,] 2010-01-11
> > >>>
> > >>>
> > >>> ## The eval recipe suggested in the FAQ does not work.
> > >>>> a[, eval(colname)]
> > >>> [1] "y"
> > >>>
> > >>>
> > >>> ## as.vector does not convert away from data.table
> > >>>> as.vector(a[, colname, with=FALSE])
> > >>>              y
> > >>> [1,] 2010-01-01
> > >>> [2,] 2010-01-02
> > >>> [3,] 2010-01-03
> > >>> [4,] 2010-01-04
> > >>> [5,] 2010-01-05
> > >>> [6,] 2010-01-06
> > >>> [7,] 2010-01-07
> > >>> [8,] 2010-01-08
> > >>> [9,] 2010-01-09
> > >>> [10,] 2010-01-10
> > >>> [11,] 2010-01-11
> > >>>> class(as.vector(a[, colname, with=FALSE]))
> > >>> [1] "data.table"
> > >>>
> > >>>
> > >>> ## as.matrix loses type information (NOTE: in my case it is not acceptable to
> > >>> ## convert this character vector back to a POSIXct, due to the loss of
> > >>> ## important timezone information. Furthermore, this would be very inefficient.)
> > >>>> as.matrix(a[, colname, with=FALSE])
> > >>>     y
> > >>> [1,] "2010-01-01"
> > >>> [2,] "2010-01-02"
> > >>> [3,] "2010-01-03"
> > >>> [4,] "2010-01-04"
> > >>> [5,] "2010-01-05"
> > >>> [6,] "2010-01-06"
> > >>> [7,] "2010-01-07"
> > >>> [8,] "2010-01-08"
> > >>> [9,] "2010-01-09"
> > >>> [10,] "2010-01-10"
> > >>> [11,] "2010-01-11"
> > >>>> mode(as.matrix(a[, colname, with=FALSE]))
> > >>> [1] "character"
> > >>>
> > >>>
> > >>> ## Finally, one could go through a data.frame, but this is inefficient
> > >>> ## and it sort of defeats the purpose of using data.table...
> > >>>> as.data.frame(a[, colname, with=FALSE])[, colname]
> > >>> [1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
> > >>> [5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
> > >>> [9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> So at this point my imagination is running out and I'm turning to
> > >>> this list for suggestions. This seems to be a fairly common use
> > >>> case, and I'm surprised it does not appear to have been addressed
> > >>> before.
> > >>>
> > >>>
> > >>> For the record, here is my sessionInfo()
> > >>>
> > >>>
> > >>>> sessionInfo()
> > >>> R version 2.9.2 (2009-08-24)
> > >>> x86_64-pc-linux-gnu
> > >>>
> > >>>
> > >>> locale:
> > >>> C
> > >>>
> > >>>
> > >>> attached base packages:
> > >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> > >>>
> > >>>
> > >>>
> > >>> other attached packages:
> > >>> [1] data.table_1.4.1
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Thanks in advance for any help!
> > >>> + Nicolas Chapados
> > >>> _______________________________________________
> > >>> datatable-help mailing list
> > >>> datatable-help at lists.r-forge.r-project.org
> > >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > >>
> > >>
> > >
> > > David Winsemius, MD
> > > West Hartford, CT
> > >
> > 
> > David Winsemius, MD
> > West Hartford, CT
> > 
> 
> 



