[datatable-help] Unexpected behavior in setnames()

Steve Lianoglou lianoglou.steve at gene.com
Sun Nov 3 02:15:38 CET 2013


On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:
> Tbh I don't see why data presentation and preservation (i.e. if you're
> reading in data with duplicated columns) is not enough of a use case -
> that's the only reason we allow arbitrary symbols in column names.
>
> So, instead of giving you another use case, how about you tell me instead
> what do you propose should happen here (instead of what happens now):
>
>> dt = data.table(1, 2)
>> dt
>    V1 V2
> 1:  1  2
>> dt[, sum(V2), by = V1]
>    V1 V1
> 1:  1  2

Only Matthew could say for sure, but if I were a gambling man I'd bet
that this slipped through the cracks and the sleeping dog was simply
left to lie. I'd be curious to hear his opinion on this.

IMHO the "data presentation" argument doesn't really hold much water.

As for "data preservation," I see it more as imposing structure on
the data to enable efficient -- and sane/unambiguous -- computation over it.
Further, I don't think it is a preservation issue at all -- no data is
lost. The original data is still there in the file that was loaded
into R. The name of a column is changed when imported (with adequate
warning) into a data.table so that the user can slice and dice it. I'd
also guess that a user warned about duplicate names would most
likely be happy to receive the warning, but the fact that you disagree
suggests that this isn't an obvious conclusion ;-)
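
A minimal sketch of the unambiguous alternative, assuming a reasonably
recent data.table: naming the aggregate explicitly in `j` keeps it
distinct from the grouping column in the quoted example, and base R's
`make.unique()` is one way (an illustrative choice, not necessarily
what any particular reader uses) to de-duplicate names on import.

```r
library(data.table)

dt <- data.table(1, 2)  # unnamed columns become V1, V2

# Naming the aggregate in j keeps it distinct from the grouping column V1
res <- dt[, list(total = sum(V2)), by = V1]
print(res)
names(res)

# Base R's make.unique() de-duplicates a character vector of names
make.unique(c("V1", "V1"))
```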

I'm curious whether you would argue for allowing duplicate column names
in an SQL table for the same reasons. I do know you can torture SQL into
giving two columns the same name by aliasing, but this also seems to
have slipped through by accident:

http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf

(which I found via this question):
http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table

Perhaps we should email this guy Hugh to see what he thinks about this one :-)

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

