[datatable-help] Extract Single Column as Vector

Arunkumar Srinivasan aragorn168b at gmail.com
Sat May 18 17:23:46 CEST 2013


@Matthew,  

On another note, are there plans to implement "drop=T/F" in data.table?  

Arun
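A minimal base-R sketch of the `drop` behaviour being asked about (no data.table needed). With `[.data.frame`, selecting a single column by name silently returns a vector unless `drop = FALSE` is passed. This is the edge case that FAQ 2.17 and Matthew's "length of an input" principle describe below. The names `df`, `two_cols`, `one_col` and `one_col_kept` are illustrative only.

```r
# Base R only: how [.data.frame's default drop = TRUE changes the result type.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

two_cols     <- df[, c("a", "b")]        # two names selected: a data.frame
one_col      <- df[, "a"]                # one name selected: dropped to a numeric vector
one_col_kept <- df[, "a", drop = FALSE]  # drop = FALSE: stays a one-column data.frame

class(two_cols)      # "data.frame"
class(one_col)       # "numeric"
class(one_col_kept)  # "data.frame"
```

Per FAQ 2.17 as quoted in this thread, data.table's `[` always behaves like the `drop = FALSE` line, so `DT[, myvars, with = FALSE]` returns a data.table whether `myvars` holds one name or several.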


On Saturday, May 18, 2013 at 5:21 PM, Arunkumar Srinivasan wrote:

> Matthew wrote "..the length of an input shouldn't change the type of the output (only the type of the input should be able to change the type of the output)."
> That's a very nice way to put it.
>  
>  
> Arun
>  
>  
> On Saturday, May 18, 2013 at 5:18 PM, Matthew Dowle wrote:
>  
> >   
> > And FAQ 2.17 has a little more on that:
> > "In [.data.frame we very often set drop=FALSE. When we forget, bugs can arise in edge cases
> > where single columns are selected and all of a sudden a vector is returned rather than a single
> > column data.frame. In [.data.table we took the opportunity to make it consistent and drop
> > drop."  
> >   
> > If it helps to know, I also use DT[["somename"]] quite a bit.  
> >   
> > Matthew
> >   
> > On 18.05.2013 10:04, Matthew Dowle wrote:
> > >   
> > > All good points. The thinking here has this in mind:
> > >  
> > > myvars = c("col1","col2")
> > > DT[, myvars, with=FALSE]
> > >  
> > > We don't want the type of the result to depend on whether myvars has length 1 or not. Otherwise we may end up with surprises (in production code, for example) if myvars becomes length 1 in the future. That's a strong principle that data.table follows: the length of an input shouldn't change the type of the output (only the type of the input should be able to change the type of the output).
> > >  
> > > I've just changed those two parts of ?data.table (thanks for highlighting):
> > > 
> > > was:
> > > "... or (when with=FALSE) same as j in [.data.frame."
> > > now:
> > > "... or (when with=FALSE) a vector of names or positions to select."
> > >  
> > > Matthew  
> > >   
> > > On 17.05.2013 20:34, Ricardo Saporta wrote:
> > > > Hm... Eddi does seem to have a point here. While I agree with Frank that once you're used to it, it is rather straightforward to deal with, I can see why one would expect a vector, i.e., that the last of the following `identical` statements should evaluate to `TRUE`:
> > > >     df <- as.data.frame(dt)
> > > >     > identical(df[, "a"], dt[, get("a")])
> > > >     [1] TRUE
> > > >     > identical(df[, "a"], dt[["a"]])
> > > >     [1] TRUE
> > > >     > identical(df[, "a"], dt[, "a", with=FALSE])
> > > >     [1] FALSE
> > > >     rm(df)
> > > >  
> > > > -Rick
> > > >  
> > > > Ricardo Saporta  
> > > > Graduate Student, Data Analytics
> > > > Rutgers University, New Jersey
> > > > e: saporta at rutgers.edu
> > > >  
> > > >  
> > > >  
> > > >  
> > > > On Fri, May 17, 2013 at 4:26 PM, Eduard Antonyan <eduard.antonyan at gmail.com> wrote:
> > > > > Well, looking at the documentation:  
> > > > > j: A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or (when with=FALSE) same as j in [.data.frame.
> > > > > ...
> > > > > with: By default with=TRUE and j is evaluated within the frame of x. The column names can be used as variables. When with=FALSE, j works as it does in [.data.frame.
> > > > >   
> > > > >  
> > > > > The with=FALSE parts of the documentation (bolded in the original HTML message) don't match the actual behavior.
> > > > >  
> > > > >  
> > > > > On Fri, May 17, 2013 at 2:44 PM, Frank Erickson <FErickson at psu.edu> wrote:
> > > > > > @Arun and eddi: This question has come up before.  
> > > > > > http://r.789695.n4.nabble.com/Better-hacks-getting-a-vector-AND-using-with-inserting-chunks-of-rows-tt4666592.html
> > > > > > (And I'm sure there are other times, too.) I can't say I've heard anyone arguing about it, though. :)
> > > > > > I guess it works that way because
> > > > > > ...in dt[, a], j is an expression which evaluates to a vector
> > > > > > ...in dt[, "a", with=FALSE], the option turns on the "you must want one or more columns" mode, translating j from "a" to list(a)
> > > > > > It's unintuitive if you're expecting data frame behavior (you know, drop=TRUE, as Arun mentioned), but if you've already seen dt[, list(a)], it shouldn't be much of a surprise. Adding the drop option, and maybe defaulting it to TRUE when with=FALSE, might satisfy eddi's concern...?
> > > > > >  
> > > > > >  
> > > > > >  
> > > > > > On Fri, May 17, 2013 at 10:22 AM, Eduard Antonyan <eduard.antonyan at gmail.com> wrote:
> > > > > > > I don't remember discussing this issue...? What is the conceptual difference between dt[, a] and dt[, "a", with = F], and what does 'drop' have to do with this?
> > > > > > >  
> > > > > > >  
> > > > > > > On Fri, May 17, 2013 at 10:02 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > > > > > > > Eduard, are we discussing the same thing again :)? Wasn't this your question as well, the discrepancy between:
> > > > > > > > dt[, a] and dt[, "a", with=FALSE]?
> > > > > > > > There should be a drop=TRUE/FALSE option (as in the case of data.frame) that applies when you use `with=FALSE`. Until then, the default behaves like drop=FALSE, which results in a data.table.
> > > > > > > > Alexandre, as of now, it could be done as Eduard points out.
> > > > > > > > Arun
> > > > > > > >  
> > > > > > > >  
> > > > > > > > On Friday, May 17, 2013 at 4:59 PM, Eduard Antonyan wrote:
> > > > > > > >  
> > > > > > > > > Use dt[[colname]], but this seems like a bug to me; I would've thought that dt[, a] and dt[, "a", with = F] should return exactly the same thing.
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > On Fri, May 17, 2013 at 9:42 AM, Alexandre Sieira <alexandre.sieira at gmail.com> wrote:
> > > > > > > > > > Sorry if this is a basic question.  
> > > > > > > > > >   
> > > > > > > > > > I'm using R 3.0.0 and data.table 1.8.8. The documentation for 'j' states that "A single column or single expression returns that type, usually a vector."
> > > > > > > > > >  
> > > > > > > > > > I am able to obtain this behavior if I know the column name in advance:  
> > > > > > > > > >  
> > > > > > > > > >    
> > > > > > > > > > > dt = data.table(a=c(1, 2, 3), b=c(4, 5, 6))
> > > > > > > > > > > dt
> > > > > > > > > >    a b
> > > > > > > > > > 1: 1 4
> > > > > > > > > > 2: 2 5
> > > > > > > > > > 3: 3 6
> > > > > > > > > > > str(dt[,a])
> > > > > > > > > >  num [1:3] 1 2 3
> > > > > > > > > >   
> > > > > > > > > > However, if I don't, no such luck:
> > > > > > > > > > > colname="a"
> > > > > > > > > > > str(dt[,colname,with=F])
> > > > > > > > > > Classes ‘data.table’ and 'data.frame': 3 obs. of  1 variable:
> > > > > > > > > >  $ a: num  1 2 3
> > > > > > > > > >  - attr(*, ".internal.selfref")=<externalptr>  
> > > > > > > > > >  
> > > > > > > > > > Is there a way to extract an entire column as a vector if I have the column name as a character scalar?
> > > > > > > > > > Thank you!
> > > > > > > > > > --  
> > > > > > > > > > Alexandre Sieira
> > > > > > > > > > CISA, CISSP, ISO 27001 Lead Auditor
> > > > > > > > > >  
> > > > > > > > > > "The truth is rarely pure and never simple."
> > > > > > > > > > Oscar Wilde, The Importance of Being Earnest, 1895, Act I  
> > > > > > > > > > _______________________________________________
> > > > > > > > > > datatable-help mailing list
> > > > > > > > > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org)
> > > > > > > > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help  
>  
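To summarise the answers in this thread for a column name held in a character scalar: `dt[[colname]]` (Eduard, Matthew) and `dt[, get(colname)]` (from Ricardo's `identical` check) both return a vector, while `with = FALSE` returns a one-column data.table. This is a minimal sketch, assuming the data.table package is installed; `colname`, `v1`, `v2` and `dt_a` are illustrative names.

```r
# Assumes the data.table package is installed.
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
colname <- "a"   # column name held in a character scalar

# A data.table is a list of columns, so [[ returns the column itself (a vector).
v1 <- dt[[colname]]

# With the default with = TRUE, j is evaluated inside dt; get() looks the
# column up by name, so j evaluates to a vector.
v2 <- dt[, get(colname)]

# with = FALSE treats j as a selection of columns: the result is a data.table
# whether one name or several are supplied.
dt_a <- dt[, colname, with = FALSE]

str(v1)              # num [1:3] 1 2 3
identical(v1, v2)    # TRUE
class(dt_a)          # "data.table" "data.frame"
```

If a vector is needed after a `with = FALSE` selection, `dt_a[[1]]` extracts it.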
