[datatable-help] Unexpected behavior in setnames()

Eduard Antonyan eduard.antonyan at gmail.com
Sun Nov 3 02:43:02 CET 2013


@Arun: OK. Having thought about it a bit, I don't like the continuing-
enumeration solution because it makes the results too unpredictable, but I
could live with appending ".1" etc., which I assume is the idea anyway for
resolving duplicates elsewhere.
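To make the ".1" idea concrete, here is a minimal sketch. It assumes the suffix scheme would behave like base R's make.unique(); that is an assumption, since nothing in this thread says data.table would use that function. dedupe_names() is a hypothetical helper, and a plain data.frame stands in for a data.table.

```r
# Hypothetical sketch: resolve clashing names by appending ".1", ".2", ...
# via base R's make.unique(). Not data.table's actual behaviour.
dedupe_names <- function(x) {          # hypothetical helper, for illustration only
  names(x) <- make.unique(names(x), sep = ".")
  x
}

# Reproduce the clash from dt[, sum(V2), by = V1]: two columns both named "V1"
df <- data.frame(1, 2)
names(df) <- c("V1", "V1")
df <- dedupe_names(df)
print(names(df))                       # "V1"   "V1.1"

# Further clashes keep counting rather than restarting
print(make.unique(c("V1", "V1", "V1")))  # "V1"   "V1.1" "V1.2"
```

The appeal of this scheme over continuing enumeration is that the first occurrence keeps its original name, so existing code that refers to `V1` still finds the same column.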

@Steve: I'm not sure why you think it doesn't hold much water. I think I can
make a parallel argument that replicates all of the duplicated-names
concerns with a column called, e.g., `dt$V1` (imagine forgetting the
backticks there and the world of hurt that potentially awaits once you do).
I'm also curious what Matthew would think about this. This is something
I've encountered and dealt with a lot, so I'm certainly not an unbiased
party here.
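A small base-R sketch of the `dt$V1` parallel: a plain data.frame stands in for a data.table, and the column name and values are invented purely for illustration. A column literally named "dt$V1" is legal, but forgetting the backticks silently refers to something else.

```r
# Hypothetical illustration: a column literally named "dt$V1".
dt <- data.frame(V1 = 10, check.names = FALSE)
dt[["dt$V1"]] <- 99                  # add a second column with the awkward name
print(names(dt))                     # "V1"    "dt$V1"

# With backticks, the awkward column itself is found
with_ticks <- with(dt, `dt$V1`)      # 99

# Without backticks, `dt$V1` parses as "column V1 of the object dt",
# so the wrong column is returned with no error or warning
without_ticks <- with(dt, dt$V1)     # 10
```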


On Sat, Nov 2, 2013 at 8:15 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:

> On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
> <eduard.antonyan at gmail.com> wrote:
> > TBH, I don't see why data presentation and preservation (i.e. if you're
> > reading in data with duplicated columns) is not enough of a use case;
> > that's the only reason we allow arbitrary symbols in column names.
> >
> > So, instead of giving you another use case, how about you tell me what
> > you propose should happen here (instead of what happens now):
> >
> >> dt = data.table(1, 2)
> >> dt
> >    V1 V2
> > 1:  1  2
> >> dt[, sum(V2), by = V1]
> >    V1 V1
> > 1:  1  2
>
> Only Matthew could say for sure, but if I were a gambling man I'd bet
> that this was likely something that slipped through the cracks and
> sleeping dogs were left to lie. I'd be curious to see what his
> opinions on this are.
>
> IMHO the "data presentation" argument doesn't really hold much water.
>
> As for "data preservation," I'd rather see it as imposing structure on
> the data to enable efficient (and sane, unambiguous) computation over it.
> Further, I don't think it is a preservation issue at all: no data is
> lost. The original data is still there in the file that was loaded
> into R. The name of a column is changed when imported (with adequate
> warning) into a data.table so that the user can slice and dice it. I'd
> also guess that a user warned about duplicate names would most likely
> be happy to receive the warning, but the fact that you disagree
> suggests that this isn't an obvious conclusion ;-)
>
> I'm curious: would you argue that an SQL table should allow duplicate
> column names for the same reasons? I do know you can torture SQL into
> producing two identical column names by aliasing, but that also seems to
> have slipped through by accident:
>
> http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf
>
> (which I found from here):
>
> http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table
>
> Perhaps we should email this guy Hugh to see what he thinks about this one
> :-)
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>