[datatable-help] Extract Single Column as Vector

Matthew Dowle mdowle at mdowle.plus.com
Sat May 18 17:18:31 CEST 2013


 

And FAQ 2.17 has a little more on that:

 "In [.data.frame we very often set drop=FALSE. When we forget, bugs can
 arise in edge cases where single columns are selected and all of a
 sudden a vector is returned rather than a single column data.frame. In
 [.data.table we took the opportunity to make it consistent and drop
 drop."
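
As a rough illustration of the edge case that FAQ entry describes, here is a minimal base R sketch. The data frame and the two selection variables are made up for the example and are not taken from the FAQ:

```r
# Base R only: `[.data.frame` changes its return type with the number of
# selected columns unless drop=FALSE is given.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

two_cols <- c("a", "b")   # hypothetical selection of length 2
one_col  <- "a"           # hypothetical selection of length 1

class(df[, two_cols])               # "data.frame"
class(df[, one_col])                # "numeric": dropped to a plain vector
class(df[, one_col, drop = FALSE])  # "data.frame": type no longer depends on length
```

Only the length of the selection differs between the first two calls, yet the type of the result changes.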

If it helps to know, I also use DT[["somename"]] quite a bit.
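
A small sketch of that idiom, reusing the table from the original question further down the thread. It assumes the data.table package is installed; the variable names v and d1 are only for illustration:

```r
library(data.table)  # assumes the data.table package is installed

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
colname <- "a"       # column name known only at run time

v  <- dt[[colname]]                 # list-style extraction: the column itself
d1 <- dt[, colname, with = FALSE]   # column selection: a one-column data.table

is.numeric(v)          # TRUE
is.data.table(d1)      # TRUE
identical(v, dt[, a])  # TRUE: same vector as naming the column directly in j
```

So [[ is the form to reach for when a vector is wanted, while with=FALSE returns a data.table whether one name or several are supplied.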

Matthew

On 18.05.2013 10:04, Matthew Dowle wrote:

> All good points. The thinking here has this in mind:
> 
> myvars = c("col1","col2")
> DT[, myvars, with=FALSE]
> 
> We don't want the type of the result to depend on whether myvars is
> length 1 or not. Otherwise we may end up with surprises (in production
> code for example) if myvars becomes length 1 in future. That's a strong
> principle that data.table follows: the length of an input shouldn't
> change the type of the output (only the type of the input should be
> able to change the type of the output).
> 
> I've just changed those two parts of ?data.table (thanks for
> highlighting):
>
> was:
> "... or (when with=FALSE) same as j in [.data.frame."
> now:
> "... or (when with=FALSE) a vector of names or positions to select."
> 
> Matthew
> 
> On 17.05.2013 20:34, Ricardo Saporta wrote:
> 
>> Hm... Eddi does seem to have a point here. While I agree with Frank
>> that once you're used to it, it is rather straightforward to deal
>> with, I can see why one would have the expectation of a vector, i.e.,
>> that the last of the following `identical` statements should evaluate
>> to `TRUE`:
>> 
>> df <- as.data.frame(dt)
>> > identical(df[, "a"], dt[, get("a")])
>> [1] TRUE
>> > identical(df[, "a"], dt[["a"]])
>> [1] TRUE
>> > identical(df[, "a"], dt[, "a", with=FALSE])
>> [1] FALSE
>> rm(df)
>> -Rick
>> 
>> Ricardo Saporta 
>> Graduate Student, Data Analytics 
>> Rutgers University, New Jersey
>> e: saporta at rutgers.edu [14] 
>> 
>> On Fri, May 17, 2013 at 4:26 PM, Eduard Antonyan <eduard.antonyan at gmail.com [15]> wrote:
>> 
>>> Well, looking at the documentation:
>>> j: A single column name, single expression of column names, list()
>>> of expressions of column names, an expression or function call that
>>> evaluates to list (including data.frame and data.table which are
>>> lists, too), or *(when with=FALSE) same as j in [.data.frame*.
>>> ...
>>> with: By default with=TRUE and j is evaluated within the frame of x.
>>> The column names can be used as variables. *When with=FALSE, j works
>>> as it does in [.data.frame.*
>>>
>>> The bolded part of the documentation doesn't match the actual
>>> behavior.
>>> 
>>> On Fri, May 17, 2013 at 2:44 PM, Frank Erickson <FErickson at psu.edu [11]> wrote:
>>>
>>>> @Arun and eddi: This question has come up before.
>>>> http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html [9]
>>>> (And I'm sure there are other times, too.) I can't say I've heard
>>>> anyone arguing about it, though. :)
>>>> I guess it works that way because
>>>> ...in dt[ ,a], j is an expression which evaluates to a vector
>>>> ...in dt[,"a",with=FALSE] the option turns on the "you must want one
>>>> or more columns" mode, translating j from "a" to list(a)
>>>> It's unintuitive if you're expecting data frame behavior (you know,
>>>> drop=TRUE, as Arun mentioned), but if you've already seen
>>>> dt[,list(a)], it shouldn't be much of a surprise. Adding the drop
>>>> option, and maybe defaulting it to TRUE when with=FALSE might satisfy
>>>> eddi's concern...?
>>>> 
>>>> On Fri, May 17, 2013 at 10:22 AM, Eduard Antonyan <eduard.antonyan at gmail.com [10]> wrote:
>>>> 
>>>>> I don't remember discussing this issue...? What is the conceptual
>>>>> difference between dt[, a] and dt[, "a", with = F] and what does
>>>>> 'drop' have to do with this??
>>>>> 
>>>>> On Fri, May 17, 2013 at 10:02 AM, Arunkumar Srinivasan <aragorn168b at gmail.com [6]> wrote:
>>>>> 
>>>>>> Eduard, are we discussing the same thing again :)? Wasn't this
>>>>>> somehow your question as well.. the discrepancy between:
>>>>>> dt[, a] and dt[, "a", with=FALSE].
>>>>>> There should be a drop=TRUE/FALSE option (as in the case of
>>>>>> data.frame) that should be used when you use `with=FALSE`. Until
>>>>>> then, the default option seems to be drop=FALSE, which results in
>>>>>> a data.table.
>>>>>> Alexandre, as of now, it could be done as Eduard points out.
>>>>>> 
>>>>>> Arun 
>>>>>> 
>>>>>> On Friday, May 17, 2013 at 4:59 PM, Eduard Antonyan wrote:
>>>>>> 
>>>>>>> Use dt[[colname]], but this seems like a bug to me - I would've
>>>>>>> thought that dt[, a] and dt[, "a", with = F] should return the
>>>>>>> exact same thing.
>>>>>>>
>>>>>>> On Fri, May 17, 2013 at 9:42 AM, Alexandre Sieira <alexandre.sieira at gmail.com [3]> wrote:
>>>>>>> 
>>>>>>>> Sorry if this is a basic question.
>>>>>>>> 
>>>>>>>> I'm using R 3.0.0 and data.table 1.8.8. The documentation for
>>>>>>>> 'j' states that "A single column or single expression returns
>>>>>>>> that type, usually a vector."
>>>>>>>>
>>>>>>>> I am able to obtain this behavior if I know the column name in
>>>>>>>> advance:
>>>>>>>> 
>>>>>>>> > dt = data.table(a=c(1, 2, 3), b=c(4, 5, 6))
>>>>>>>> > dt
>>>>>>>>    a b
>>>>>>>> 1: 1 4
>>>>>>>> 2: 2 5
>>>>>>>> 3: 3 6
>>>>>>>> > str(dt[,a])
>>>>>>>>  num [1:3] 1 2 3
>>>>>>>>
>>>>>>>> However, if I don't, no such luck:
>>>>>>>>
>>>>>>>> > colname="a"
>>>>>>>> > str(dt[,colname,with=F])
>>>>>>>> Classes 'data.table' and 'data.frame': 3 obs. of 1 variable:
>>>>>>>>  $ a: num 1 2 3
>>>>>>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>>>>>>
>>>>>>>> Is there a way to extract an entire column as a vector if I have
>>>>>>>> the column name as a character scalar?
>>>>>>>> Thank you! 
>>>>>>>>

>>>>>>>> -- 
>>>>>>>> Alexandre Sieira
>>>>>>>> CISA, CISSP, ISO 27001 Lead Auditor
>>>>>>>>
>>>>>>>> "The truth is rarely pure and never simple."
>>>>>>>> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>>>>>>> _______________________________________________
>>>>>>>> datatable-help mailing list
>>>>>>>> datatable-help at lists.r-forge.r-project.org [1]
>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]

 

Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:alexandre.sieira at gmail.com
[4] mailto:datatable-help at lists.r-forge.r-project.org
[5] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[6] mailto:aragorn168b at gmail.com
[7] mailto:datatable-help at lists.r-forge.r-project.org
[8] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[9] http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html
[10] mailto:eduard.antonyan at gmail.com
[11] mailto:FErickson at psu.edu
[12] mailto:datatable-help at lists.r-forge.r-project.org
[13] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[14] mailto:saporta at rutgers.edu
[15] mailto:eduard.antonyan at gmail.com