[datatable-help] Efficient conversion of data table column to vector

David Winsemius dwinsemius at comcast.net
Tue Aug 31 22:17:04 CEST 2010


I sent this to Matthew offlist but he wants it "on the record", so  
here is what I sent:
On Aug 31, 2010, at 11:56 AM, David Winsemius wrote:

>
> On Aug 31, 2010, at 3:35 AM, Matthew Dowle wrote:
>
>>
>> Nicolas,
>>
>> Welcome to the list.
>>
>> Where the documentation mentions 'quoted' it means the quote() function
>> to create an expression, not as in a character string.
>

Matthew;

I think you really should look at FAQ 1.5. It says nothing about  
"quoted". It does appear to imply that if someone had executed:

colname="x"

... that both DT[, colname, with=FALSE]  and DT[, eval(colname)]  
should "work". Now you are saying that isn't so, that only the first  
will return anything like the expected result.
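
As a minimal sketch of why the two differ, here is the same distinction in base R only. The plain list `cols` stands in for DT; that is an assumption for illustration and not a claim about how data.table evaluates j internally. The names `from_string`, `from_symbol` and `from_extract` are hypothetical.

```r
# Stand-in for a data.table: a plain named list of columns (assumption).
cols <- list(
  x = seq(1, 2, by = 0.1),
  y = seq(as.POSIXct("2010-01-01", tz = "GMT"),
          as.POSIXct("2010-01-11", tz = "GMT"),
          length.out = 11)
)
colname <- "y"

# eval() of a character string returns the string unchanged.
from_string <- eval(colname, cols)

# eval() of a symbol looks the name up among the columns.
from_symbol <- eval(as.name(colname), cols)

# Plain list extraction by name; the POSIXct class is kept.
from_extract <- cols[[colname]]
```

Here `from_string` is just the string "y", while `from_symbol` and `from_extract` are both the POSIXct column, which is why a column name held in a character variable needs either as.name()/quote() or `[[`.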

-- 
David
>
>> Alternatively you
>> can use [[ in the usual way since a data.table is a list.
>>
>>> colexp = quote(y)   # rather than "y"
>>> a[,eval(colexp)]
>> [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
>> [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
>> [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
>>
>> or
>>
>>> colname = "y"
>>> a[[colname]]
>> [1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
>> [5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
>> [9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
>>>
>>
>> A single column name is a special case of an expression, so although
>> this can create a steeper learning curve, it results in more power and
>> flexibility later.
>>
>> Suggestions on how to improve the documentation so that 'quoting' is
>> clearer are very welcome. I've added an item to the list so we don't
>> forget.
>>
>> Matthew
>>
>>
>> On Mon, 2010-08-30 at 23:59 -0400, Nicolas Chapados wrote:
>>> Dear data.table friends and maintainers,
>>>
>>>
>>> First, thanks to the authors for this excellent package: it really
>>> fills a void in the R world.  However, I have a question: I'm looking
>>> to have an efficient conversion of a data table object to a vector (of
>>> the correct type) when querying a single column whose name is stored
>>> in a variable.  As per the vignette and the FAQ, I use the syntax
>>>
>>>
>>>   my.data.table[, colname, with=FALSE]
>>>
>>>
>>> (where colname is a variable containing my desired column name) but
>>> this returns another data table, not a vector.  Moreover, the eval
>>> syntax suggested in the FAQ simply does not work:
>>>
>>>
>>>   my.data.table[, eval(colname)]
>>>
>>>
>>> See example below.  I could use as.matrix on the result, but this
>>> carries out undesirable type conversion in the case of columns
>>> containing dates: see below.
>>>
>>>
>>> Here is an example to reproduce this problem:
>>>
>>>
>>>> require(data.table)
>>> Loading required package: data.table
>>>> a <- data.table(x=seq(1, 2, by=0.1),
>>>                 y=seq(as.POSIXct("2010-01-01"), as.POSIXct("2010-01-11"),
>>>                       length.out=11))
>>>> a
>>>       x          y
>>> [1,] 1.0 2010-01-01
>>> [2,] 1.1 2010-01-02
>>> [3,] 1.2 2010-01-03
>>> [4,] 1.3 2010-01-04
>>> [5,] 1.4 2010-01-05
>>> [6,] 1.5 2010-01-06
>>> [7,] 1.6 2010-01-07
>>> [8,] 1.7 2010-01-08
>>> [9,] 1.8 2010-01-09
>>> [10,] 1.9 2010-01-10
>>> [11,] 2.0 2010-01-11
>>>> colname <- "y"
>>>
>>>
>>> ## The following returns a data table.  How can I get a vector, and
>>> ## still preserve type information?
>>>> a[, colname, with=FALSE]
>>>              y
>>> [1,] 2010-01-01
>>> [2,] 2010-01-02
>>> [3,] 2010-01-03
>>> [4,] 2010-01-04
>>> [5,] 2010-01-05
>>> [6,] 2010-01-06
>>> [7,] 2010-01-07
>>> [8,] 2010-01-08
>>> [9,] 2010-01-09
>>> [10,] 2010-01-10
>>> [11,] 2010-01-11
>>>
>>>
>>> ## The eval recipe suggested in the FAQ does not work.
>>>> a[, eval(colname)]
>>> [1] "y"
>>>
>>>
>>> ## as.vector does not convert away from data.table
>>>> as.vector(a[, colname, with=FALSE])
>>>              y
>>> [1,] 2010-01-01
>>> [2,] 2010-01-02
>>> [3,] 2010-01-03
>>> [4,] 2010-01-04
>>> [5,] 2010-01-05
>>> [6,] 2010-01-06
>>> [7,] 2010-01-07
>>> [8,] 2010-01-08
>>> [9,] 2010-01-09
>>> [10,] 2010-01-10
>>> [11,] 2010-01-11
>>>> class(as.vector(a[, colname, with=FALSE]))
>>> [1] "data.table"
>>>
>>>
>>> ## as.matrix loses type information (NOTE: in my case it is not
>>> ## acceptable to convert this character vector back to a POSIXct, due
>>> ## to the loss of important timezone information. Furthermore, this
>>> ## would be very inefficient.)
>>>> as.matrix(a[, colname, with=FALSE])
>>>     y
>>> [1,] "2010-01-01"
>>> [2,] "2010-01-02"
>>> [3,] "2010-01-03"
>>> [4,] "2010-01-04"
>>> [5,] "2010-01-05"
>>> [6,] "2010-01-06"
>>> [7,] "2010-01-07"
>>> [8,] "2010-01-08"
>>> [9,] "2010-01-09"
>>> [10,] "2010-01-10"
>>> [11,] "2010-01-11"
>>>> mode(as.matrix(a[, colname, with=FALSE]))
>>> [1] "character"
>>>
>>>
>>> ## Finally, one could go through a data.frame, but this is inefficient
>>> ## and it sort of defeats the purpose of using data.table...
>>>> as.data.frame(a[, colname, with=FALSE])[, colname]
>>> [1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
>>> [5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
>>> [9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"
>>>
>>>
>>>
>>>
>>> So at this point, my imagination is running out and I'm turning to
>>> this list for suggestions. This would seem to be a fairly frequent
>>> use case, and I'm surprised it does not appear to have been addressed
>>> previously.
>>>
>>>
>>> For the record, here is my sessionInfo()
>>>
>>>
>>>> sessionInfo()
>>> R version 2.9.2 (2009-08-24)
>>> x86_64-pc-linux-gnu
>>>
>>>
>>> locale:
>>> C
>>>
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>>
>>>
>>> other attached packages:
>>> [1] data.table_1.4.1
>>>
>>>
>>>
>>>
>>> Thanks in advance for any help!
>>> + Nicolas Chapados
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>

David Winsemius, MD
West Hartford, CT
