[datatable-help] Unexpected behavior in setnames()

Arunkumar Srinivasan aragorn168b at gmail.com
Fri Nov 8 21:19:38 CET 2013


Steve,
Maybe, but it's just getting started :) - we now have to decide what counts as ambiguous!
For example: is subsetting by column number ambiguous? By the definition of "ambiguous", probably not. But allowing it would be inconsistent with subsetting by column name, which would raise an error under this proposal. So, should we prioritise consistency over function in this scenario?
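
To make the by-number versus by-name distinction concrete, here is a minimal sketch using a base R data.frame (not data.table, whose behaviour is exactly what is being decided in this thread). The object `DF` and its values are made up for illustration:

```r
# Illustrative only: a base R data.frame with two columns both named "x".
# check.names = FALSE stops data.frame() from renaming the second one to "x.1".
DF <- data.frame(x = 1:3, x = 4:6, check.names = FALSE)

names(DF)                      # both names are "x"

# By position there is exactly one candidate column, so nothing is ambiguous.
by_pos_1 <- DF[[1]]
by_pos_2 <- DF[[2]]

# By name there are two candidates; base R silently returns the first match.
by_name <- DF$x

identical(by_name, by_pos_1)   # TRUE
identical(by_name, by_pos_2)   # FALSE
```

So the question is whether data.table should error on the `DF$x` style of access only, or on both styles for the sake of consistency.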


Arun


On Friday, November 8, 2013 at 9:16 PM, Steve Lianoglou wrote:

> Wow ... did we just reach a consensus? :-)
> 
> -steve
> 
> On Fri, Nov 8, 2013 at 12:08 PM, Eduard Antonyan
> <eduard.antonyan at gmail.com> wrote:
> > Ditto - having dups, but spitting out an error on all ambiguous operations
> > seems like a robust strategy.
> > 
> > 
> > On Fri, Nov 8, 2013 at 2:02 PM, Steve Lianoglou <lianoglou.steve at gene.com>
> > wrote:
> > > 
> > > Hi,
> > > 
> > > I wanted to point out that I'm in Arun's camp on this one:
> > > 
> > > On Fri, Nov 8, 2013 at 7:09 AM, Arunkumar Srinivasan
> > > <aragorn168b at gmail.com> wrote:
> > > 
> > > > In my opinion, the dup-names should be allowed *only* during creation of
> > > > data.table, and setting names (using `setnames`, `setattr` or the bad form
> > > > `names(dt) <- `). Other than that, *ALL* operations should fail (end up in
> > > > error), and that includes subsetting operation. The `setnames` gives the
> > > > option for the user to set the names back before writing to a file, should
> > > > he choose to keep it at the end.
> > > > 
> > > > I think it's much better this way (strict, but avoids confusion). For
> > > > example, in data.frames, doing DF$x (when x occurs twice) implicitly prints
> > > > only the first (no warning/error). Also, split(DF$x, DF$x) uses the first
> > > > column and so does split(DF, DF$x).
> > > > 
> > > 
> > > 
> > > As an opinionated footnote: I can acquiesce that since data.frames
> > > allow duplicated column names, I *guess* data.table should *allow*
> > > them, however as is clear (to me) from this long chain of
> > > "possibilities" that one can do, I strongly feel that computing over a
> > > data.table w/ duplicated columns is a fundamentally broken idea as it
> > > is ambiguous as to what the right behavior should be ... forget about
> > > even the (surely fun) book-keeping code required to make it happen.
> > > 
> > > You want to import a table with duplicate names? Fine (we should warn
> > > on import if it was `fread` or `as.data.table`d).
> > > 
> > > You want to set some names to duplicates? Fine -- warn there too.
> > > 
> > > Want to do any computation inside the data.table via `j` or as a
> > > column in `by`? Throw an error and punt the problem to the user to
> > > figure out how they would like to disambiguate the first column named
> > > "a" from the 10th one -- I don't think we need another FAQ explaining
> > > what "the right" way that this should be done is, and why we picked
> > > it.
> > > 
> > > Or if you really want to compute over a data.table with duplicate
> > > names, you might be better served by having the table in "long" format
> > > -- perhaps that's why there are duplicate column names to begin with
> > > (I'm guessing -- I still don't think I would ever want to have duped
> > > names on purpose)
> > > 
> > > My two cents,
> > > 
> > > -steve
> > > 
> > > --
> > > Steve Lianoglou
> > > Computational Biologist
> > > Bioinformatics and Computational Biology
> > > Genentech
> > > _______________________________________________
> > > datatable-help mailing list
> > > datatable-help at lists.r-forge.r-project.org
> > > 
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > 
> > 
> > 
> > 
> 

