[datatable-help] Extract Single Column as Vector
Matthew Dowle
mdowle at mdowle.plus.com
Sat May 18 17:04:46 CEST 2013
All good points. The thinking here has this in mind:
myvars = c("col1","col2")
DT[, myvars, with=FALSE]
We don't want the type of the result to depend on whether myvars is length 1 or not. Otherwise we may end up with surprises (in production code, for example) if myvars becomes length 1 in the future. That's a strong principle that data.table follows: the length of an input shouldn't change the type of the output (only the type of the input should be able to change the type of the output).
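A minimal sketch of that principle, assuming a current data.table installation and a toy table DT whose columns col1/col2 are hypothetical stand-ins for the names in the myvars example. Selecting two names or one name with with=FALSE gives a data.table either way:

```r
library(data.table)

# Toy table; col1/col2 mirror the hypothetical names in the myvars example
DT <- data.table(col1 = 1:3, col2 = 4:6)

myvars <- c("col1", "col2")
two_cols <- DT[, myvars, with = FALSE]  # two names: a data.table

myvars <- "col1"
one_col <- DT[, myvars, with = FALSE]   # one name: still a data.table, not a vector

# The class depends only on the type of j (character), not on its length
class(two_cols)  # "data.table" "data.frame"
class(one_col)   # "data.table" "data.frame"
```

Code that later receives a length-1 myvars therefore keeps working on a data.table rather than silently switching to a vector.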
I've just changed those two parts of ?data.table (thanks for highlighting):
was:
"... or (when with=FALSE) same as j in [.data.frame."
now:
"... or (when with=FALSE) a vector of names or positions to select."
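To illustrate the revised wording, here is a small sketch (assuming a current data.table and a hypothetical toy table DT) showing that with=FALSE accepts either column names or column positions, and returns the columns in the order requested:

```r
library(data.table)

DT <- data.table(col1 = 1:3, col2 = 4:6)

# with=FALSE: j may be a vector of column names ...
by_name <- DT[, c("col2", "col1"), with = FALSE]

# ... or a vector of column positions
by_pos <- DT[, c(2, 1), with = FALSE]

names(by_name)  # "col2" "col1"
names(by_pos)   # "col2" "col1"
```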
Matthew
On 17.05.2013 20:34, Ricardo Saporta wrote:
> Hm... Eddi does seem to have a point here. While I agree with Frank that once you're used to it, it is rather straightforward to deal with, I can see why one would expect a vector, i.e., that the last of the following `identical` statements should evaluate to `TRUE`:
>
> df <- as.data.frame(dt)
> > identical(df[, "a"], dt[, get("a")])
> [1] TRUE
> > identical(df[, "a"], dt[["a"]])
> [1] TRUE
> > identical(df[, "a"], dt[, "a", with=FALSE])
> [1] FALSE
> rm(df)
> -Rick
>
> Ricardo Saporta
> Graduate Student, Data Analytics
> Rutgers University, New Jersey
> e: saporta at rutgers.edu [14]
>
> On Fri, May 17, 2013 at 4:26 PM, Eduard Antonyan <eduard.antonyan at gmail.com [15]> wrote:
>
>> Well, looking at the documentation:
>> j: A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or *(when with=FALSE) same as j in [.data.frame.*
>> ...
>> with: By default with=TRUE and j is evaluated within the frame of x. The column names can be used as variables. *When with=FALSE, j works as it does in [.data.frame.*
>>
>> The bolded part of the documentation doesn't match the actual behavior.
>>
>> On Fri, May 17, 2013 at 2:44 PM, Frank Erickson <FErickson at psu.edu [11]> wrote:
>>
>>> @Arun and eddi: This question has come up before.
>>> http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html [9]
>>> (And I'm sure there are other times, too.) I can't say I've heard anyone arguing about it, though. :)
>>> I guess it works that way because
>>> ...in dt[, a], j is an expression which evaluates to a vector
>>> ...in dt[, "a", with=FALSE] the option turns on the "you must want one or more columns" mode, translating j from "a" to list(a)
>>>
>>> It's unintuitive if you're expecting data frame behavior (you know, drop=TRUE, as Arun mentioned), but if you've already seen dt[, list(a)], it shouldn't be much of a surprise. Adding the drop option, and maybe defaulting it to TRUE when with=FALSE, might satisfy eddi's concern...?
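Frank's reading can be checked from the outside with a short sketch (assuming a current data.table; this compares results only and makes no claim about how data.table implements with=FALSE internally). Using Alexandre's toy table, dt[, a] is a plain vector, while dt[, list(a)] and dt[, "a", with=FALSE] are both one-column data.tables:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))

v  <- dt[, a]                  # j is an expression evaluating to a vector
l1 <- dt[, list(a)]            # j is a list: a one-column data.table
l2 <- dt[, "a", with = FALSE]  # same shape of result as list(a)

is.data.table(v)   # FALSE
is.data.table(l1)  # TRUE
is.data.table(l2)  # TRUE
```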
>>>
>>> On Fri, May 17, 2013 at 10:22 AM, Eduard Antonyan <eduard.antonyan at gmail.com [10]> wrote:
>>>
>>>> I don't remember discussing this issue...? What is the conceptual difference between dt[, a] and dt[, "a", with = F] and what does 'drop' have to do with this??
>>>>
>>>> On Fri, May 17, 2013 at 10:02 AM, Arunkumar Srinivasan <aragorn168b at gmail.com [6]> wrote:
>>>>
>>>>> Eduard, are we discussing the same thing again :)? Wasn't this somehow your question as well? The discrepancy between:
>>>>> dt[, a] and dt[, "a", with=FALSE].
>>>>>
>>>>> There should be a drop=TRUE/FALSE option (as in the case of data.frame) that is used when you use `with=FALSE`. Until then, the default seems to be drop=FALSE, which results in a data.table.
>>>>>
>>>>> Alexandre, as of now, it can be done as Eduard points out.
>>>>>
>>>>> Arun
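For comparison with Arun's point, the sketch below uses base R's real drop argument of [.data.frame next to data.table's with=FALSE (assuming a current data.table; note that the drop option for data.table discussed in this thread is only a proposal and is not used here):

```r
library(data.table)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
dt <- as.data.table(df)

# data.frame: selecting a single column drops to a vector by default (drop = TRUE)
col_vec <- df[, "a"]

# drop = FALSE keeps the data.frame structure
col_df <- df[, "a", drop = FALSE]

# data.table with with=FALSE: the result keeps its table structure,
# i.e. it behaves like data.frame's drop = FALSE
col_dt <- dt[, "a", with = FALSE]
```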
>>>>>
>>>>> On Friday, May 17, 2013 at 4:59 PM, Eduard Antonyan wrote:
>>>>>
>>>>>> Use dt[[colname]], but this seems like a bug to me: I would've thought that dt[, a] and dt[, "a", with = F] should return the exact same thing.
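A short sketch of ways to get a plain vector when the column name is held in a character variable, assuming a current data.table and Alexandre's toy table. dt[[colname]] is Eduard's suggestion, get() is the form Ricardo uses in his comparison, and the last line simply takes the single column out of the with=FALSE result:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
colname <- "a"

v1 <- dt[[colname]]                      # list-style extraction: plain vector
v2 <- dt[, get(colname)]                 # j evaluates to the column itself
v3 <- dt[, colname, with = FALSE][[1L]]  # one-column data.table, then its only column

identical(v1, v2)  # TRUE
identical(v1, v3)  # TRUE
```

Of the three, dt[[colname]] is the most direct, since it never builds an intermediate data.table.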
>>>>>>
>>>>>> On Fri, May 17, 2013 at 9:42 AM, Alexandre Sieira <alexandre.sieira at gmail.com [3]> wrote:
>>>>>>
>>>>>>> Sorry if this is a basic question.
>>>>>>>
>>>>>>> I'm using R 3.0.0 and data.table 1.8.8. The documentation for 'j' states that "A single column or single expression returns that type, usually a vector."
>>>>>>>
>>>>>>> I am able to obtain this behavior if I know the column name in advance:
>>>>>>>
>>>>>>>> dt = data.table(a=c(1, 2, 3), b=c(4, 5, 6))
>>>>>>>> dt
>>>>>>>    a b
>>>>>>> 1: 1 4
>>>>>>> 2: 2 5
>>>>>>> 3: 3 6
>>>>>>>> str(dt[,a])
>>>>>>>  num [1:3] 1 2 3
>>>>>>>
>>>>>>> However, if I don't, no such luck:
>>>>>>>
>>>>>>>> colname="a"
>>>>>>>> str(dt[,colname,with=F])
>>>>>>> Classes 'data.table' and 'data.frame': 3 obs. of 1 variable:
>>>>>>>  $ a: num 1 2 3
>>>>>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>>>>> Is there a way to extract an entire column as a vector if I have the column name as a character scalar?
>>>>>>> Thank you!
>>>>>>>
>>>>>>> --
>>>>>>> Alexandre Sieira
>>>>>>> CISA, CISSP, ISO 27001 Lead Auditor
>>>>>>>
>>>>>>> "The truth is rarely pure and never simple."
>>>>>>> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> datatable-help mailing list
>>>>>>> datatable-help at lists.r-forge.r-project.org [1]
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]
Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:alexandre.sieira at gmail.com
[4] mailto:datatable-help at lists.r-forge.r-project.org
[5] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[6] mailto:aragorn168b at gmail.com
[7] mailto:datatable-help at lists.r-forge.r-project.org
[8] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[9] http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html
[10] mailto:eduard.antonyan at gmail.com
[11] mailto:FErickson at psu.edu
[12] mailto:datatable-help at lists.r-forge.r-project.org
[13] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[14] mailto:saporta at rutgers.edu
[15] mailto:eduard.antonyan at gmail.com