[datatable-help] Extract Single Column as Vector

Matthew Dowle mdowle at mdowle.plus.com
Sat May 18 18:19:57 CEST 2013


 

In my mind currently, more pressing than drop=T/F is that long
thread about by-without-by the other week. Need to find a few hours in a
dark room to go through it with fresh eyes, draw together the points and
link up with a few FRs and bug reports. I suspect quite a lot might
simplify if we do change that, and I think that's likely. Then drop=T/F
might go away since that would be what it would do by default, iirc.
drop=T/F is entwined with that anyway. 

Matthew 

On 18.05.2013 10:23, Arunkumar Srinivasan wrote:

> @Matthew,
> On another note, are there plans to implement "drop=T/F" in data.table?
>
> Arun
>
> On Saturday, May 18, 2013 at 5:21 PM, Arunkumar Srinivasan wrote:
> 
>> Matthew wrote "..the length of an input shouldn't change the type of the
>> output (only the type of the input should be able to change the type of
>> the output)."
>> That's a very nice way to put it.
>>
>> Arun
>>
>> On Saturday, May 18, 2013 at 5:18 PM, Matthew Dowle wrote:
>> 
>>> And FAQ 2.17 has a little more on that:
>>>
>>> "In [.data.frame we very often set drop=FALSE. When we forget, bugs can
>>> arise in edge cases where single columns are selected and all of a sudden
>>> a vector is returned rather than a single column data.frame. In
>>> [.data.table we took the opportunity to make it consistent and drop drop."
>>>
>>> If it helps to know, I also use DT[["somename"]] quite a bit.
>>>
>>> Matthew
>>>
>>> On 18.05.2013 10:04, Matthew Dowle wrote: 
>>> 
>>>> All good points. The thinking here has this in mind:
>>>>
>>>> myvars = c("col1","col2")
>>>> DT[, myvars, with=FALSE]
>>>>
>>>> We don't want the type of the result to depend on whether myvars is
>>>> length 1 or not. Otherwise we may end up with surprises (in production
>>>> code, for example) if myvars becomes length 1 in future. That's a strong
>>>> principle that data.table follows: the length of an input shouldn't
>>>> change the type of the output (only the type of the input should be able
>>>> to change the type of the output).
>>>>
>>>> I've just changed those two parts of ?data.table (thanks for
>>>> highlighting):
>>>>
>>>> was:
>>>> "... or (when with=FALSE) same as j in [.data.frame."
>>>> now:
>>>> "... or (when with=FALSE) a vector of names or positions to select."
>>>>
>>>> Matthew
>>>>
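
A minimal sketch of the principle quoted above, assuming a small made-up
table. The helper name `pick` and the column names are hypothetical and not
part of data.table. It illustrates that selecting with with=FALSE gives a
data.table whether the vector of names has length 2 or length 1, whereas
[.data.frame with its default drop=TRUE changes the result type:

```r
library(data.table)

# Made-up example data (assumption, not from the thread)
DT <- data.table(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# Hypothetical helper: select the columns named in a character vector
pick <- function(x, vars) x[, vars, with = FALSE]

two <- pick(DT, c("col1", "col2"))
one <- pick(DT, "col1")

# Both results are data.tables; only the number of columns differs
is.data.table(two)
is.data.table(one)

# Contrast with [.data.frame, where the default drop = TRUE turns a
# single selected column into a plain vector
DF <- as.data.frame(DT)
class(DF[, c("col1", "col2")])  # "data.frame"
class(DF[, "col1"])             # "integer"
```
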
>>>> On 17.05.2013 20:34, Ricardo Saporta wrote: 
>>>>
>>>>> Hm... Eddi does seem to have a point here. While I agree with Frank
>>>>> that once you're used to it, it is rather straightforward to deal with,
>>>>> I can see why one would have the expectation of a vector, i.e., that the
>>>>> last of the following `identical` statements should evaluate to `TRUE`:
>>>>>
>>>>> df <- as.data.frame(dt)
>>>>> > identical(df[, "a"], dt[, get("a")])
>>>>> [1] TRUE
>>>>> > identical(df[, "a"], dt[["a"]])
>>>>> [1] TRUE
>>>>> > identical(df[, "a"], dt[, "a", with=FALSE])
>>>>> [1] FALSE
>>>>> rm(df)
>>>>> -Rick
>>>>>
>>>>> Ricardo Saporta
>>>>> Graduate Student, Data Analytics
>>>>> Rutgers University, New Jersey
>>>>> e: saporta at rutgers.edu [14]
>>>>>
>>>>> On Fri, May 17, 2013 at 4:26 PM, Eduard Antonyan
>>>>> <eduard.antonyan at gmail.com [15]> wrote:
>>>>>
>>>>>> Well, looking at the documentation:
>>>>>> j: A single column name, single expression of column names, list() of
>>>>>> expressions of column names, an expression or function call that
>>>>>> evaluates to list (including data.frame and data.table which are
>>>>>> lists, too), or (WHEN WITH=FALSE) SAME AS J IN [.DATA.FRAME.
>>>>>> ...
>>>>>> with: By default with=TRUE and j is evaluated within the frame of x.
>>>>>> The column names can be used as variables. WHEN WITH=FALSE, J WORKS AS
>>>>>> IT DOES IN [.DATA.FRAME.
>>>>>>
>>>>>> The bolded part of the documentation (capitalized above) doesn't match
>>>>>> the actual behavior.
>>>>>>
>>>>>> On Fri, May 17, 2013 at 2:44 PM, Frank Erickson
>>>>>> <FErickson at psu.edu [11]> wrote:
>>>>>>
>>>>>>> @Arun and eddi: This question has come up before.
>>>>>>> http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html [9]
>>>>>>> (And I'm sure there are other times, too.) I can't say I've heard
>>>>>>> anyone arguing about it, though. :)
>>>>>>> I guess it works that way because
>>>>>>> ...in dt[ ,a], j is an expression which evaluates to a vector
>>>>>>> ...in dt[,"a",with=FALSE] the option turns on the "you must want one
>>>>>>> or more columns" mode, translating j from "a" to list(a)
>>>>>>> It's unintuitive if you're expecting data frame behavior (you know,
>>>>>>> drop=TRUE, as Arun mentioned), but if you've already seen
>>>>>>> dt[,list(a)], it shouldn't be much of a surprise. Adding the drop
>>>>>>> option, and maybe defaulting it to TRUE when with=FALSE might satisfy
>>>>>>> eddi's concern...?
>>>>>>>
>>>>>>> On Fri, May 17, 2013 at 10:22 AM, Eduard Antonyan
>>>>>>> <eduard.antonyan at gmail.com [10]> wrote:
>>>>>>>
>>>>>>>> I don't remember discussing this issue...? What is the conceptual
>>>>>>>> difference between dt[, a] and dt[, "a", with = F] and what does
>>>>>>>> 'drop' have to do with this??
>>>>>>>>
>>>>>>>> On Fri, May 17, 2013 at 10:02 AM, Arunkumar Srinivasan
>>>>>>>> <aragorn168b at gmail.com [6]> wrote:
>>>>>>>>
>>>>>>>>> Eduard, are we discussing the same thing again :)? Wasn't this
>>>>>>>>> somehow your question as well.. the discrepancy between:
>>>>>>>>> dt[, a] and dt[, "a", with=FALSE].
>>>>>>>>> There should be a drop=TRUE/FALSE option (as in the case of
>>>>>>>>> data.frame) that should be used when you use `with=FALSE`. Until
>>>>>>>>> then, the default option seems to be drop=FALSE, which results in a
>>>>>>>>> data.table.
>>>>>>>>> Alexandre, as of now, it could be done as Eduard points out.
>>>>>>>>>
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Friday, May 17, 2013 at 4:59 PM, Eduard Antonyan wrote:
>>>>>>>>>
>>>>>>>>>> Use dt[[colname]], but this seems like a bug to me - I would've
>>>>>>>>>> thought that dt[, a] and dt[, "a", with = F] should return the
>>>>>>>>>> exact same thing.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 17, 2013 at 9:42 AM, Alexandre Sieira
>>>>>>>>>> <alexandre.sieira at gmail.com [3]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry if this is a basic question.
>>>>>>>>>>>
>>>>>>>>>>> I'm using R 3.0.0 and data.table 1.8.8. The documentation for 'j'
>>>>>>>>>>> states that "A single column or single expression returns that
>>>>>>>>>>> type, usually a vector."
>>>>>>>>>>>
>>>>>>>>>>> I am able to obtain this behavior if I know the column name in
>>>>>>>>>>> advance:
>>>>>>>>>>>
>>>>>>>>>>> > dt = data.table(a=c(1, 2, 3), b=c(4, 5, 6))
>>>>>>>>>>> > dt
>>>>>>>>>>>    a b
>>>>>>>>>>> 1: 1 4
>>>>>>>>>>> 2: 2 5
>>>>>>>>>>> 3: 3 6
>>>>>>>>>>> > str(dt[,a])
>>>>>>>>>>>  num [1:3] 1 2 3
>>>>>>>>>>>
>>>>>>>>>>> However, if I don't, no such luck:
>>>>>>>>>>>
>>>>>>>>>>> > colname="a"
>>>>>>>>>>> > str(dt[,colname,with=F])
>>>>>>>>>>> Classes 'data.table' and 'data.frame': 3 obs. of 1 variable:
>>>>>>>>>>>  $ a: num 1 2 3
>>>>>>>>>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>>>>>>>>>
>>>>>>>>>>> Is there a way to extract an entire column as a vector if I have
>>>>>>>>>>> the column name as a character scalar?
>>>>>>>>>>> Thank you!
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Alexandre Sieira
>>>>>>>>>>> CISA, CISSP, ISO 27001 Lead Auditor
>>>>>>>>>>>
>>>>>>>>>>> "The truth is rarely pure and never simple."
>>>>>>>>>>> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> datatable-help mailing list
>>>>>>>>>>> datatable-help at lists.r-forge.r-project.org [1]
>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]
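
For reference, a short sketch of the two ways suggested in this thread to get
a column as a plain vector when its name is held in a character scalar:
dt[[colname]] and dt[, get(colname)]. It reuses the example table from the
original question; the variable names sub, v1 and v2 are illustrative only.

```r
library(data.table)

# Example table from the original question
dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
colname <- "a"

# with = FALSE keeps the result a data.table, even for a single column
sub <- dt[, colname, with = FALSE]
is.data.table(sub)  # TRUE

# List-style extraction returns the column itself, as a plain vector
v1 <- dt[[colname]]

# get() looks the name up among the columns while j is evaluated
v2 <- dt[, get(colname)]

str(v1)
str(v2)
```

Of the two, dt[[colname]] is the form both Eduard and Matthew mention using.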

 

Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:alexandre.sieira at gmail.com
[4] mailto:datatable-help at lists.r-forge.r-project.org
[5] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[6] mailto:aragorn168b at gmail.com
[7] mailto:datatable-help at lists.r-forge.r-project.org
[8] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[9] http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html
[10] mailto:eduard.antonyan at gmail.com
[11] mailto:FErickson at psu.edu
[12] mailto:datatable-help at lists.r-forge.r-project.org
[13] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[14] mailto:saporta at rutgers.edu
[15] mailto:eduard.antonyan at gmail.com