[datatable-help] Unexpected behavior in setnames()

Chinmay Patil chinmay.patil at gmail.com
Mon Nov 4 10:54:27 CET 2013


FWIW, data.frame allows duplicate names as well. Given that data.table
inherits from data.frame, I would expect it to follow the same convention
as data.frame.
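
A quick base-R sketch of what I mean (nothing data.table-specific here; the
object names are made up for illustration, and note that data.frame() only
keeps duplicates if you pass check.names = FALSE, since it repairs names by
default):

```r
# Base R only; df_fixed and df_dup are illustrative names.

# Default behaviour: duplicated names are repaired with a ".1" suffix.
df_fixed <- data.frame(a = 1, a = 2)
names(df_fixed)              # "a" "a.1"

# With check.names = FALSE the duplicated names are kept as given.
df_dup <- data.frame(a = 1, a = 2, check.names = FALSE)
names(df_dup)                # "a" "a"

# Name-based access returns the first matching column without complaint.
df_dup$a                     # 1
df_dup[["a"]]                # 1

# make.unique() is the base helper that appends ".1", ".2", ... to repeats.
make.unique(c("V1", "V1"))   # "V1" "V1.1"
```

So duplicates are allowed, but anything that looks a column up by name only
ever sees the first one.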


On Sun, Nov 3, 2013 at 9:43 AM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:

> @Arun: Ok. Thinking about it a bit - I don't like the continuing
> enumeration solution because it makes the results too unpredictable, but I
> could live with adding a ".1" etc., which I assume is the idea anyway for
> resolving duplicates elsewhere.
>
> @Steve: Not sure why you think it doesn't hold much water - I think I can
> draw a parallel argument that replicates all of the duplicated names
> concerns with a column that is called e.g. `dt$V1` (imagine forgetting the
> backticks there and the world of hurt that potentially awaits once you do
> that). I am also curious what Matthew would think about this. This is
> something I've encountered and dealt with a lot, so I'm certainly not an
> unbiased party here.
>
>
> On Sat, Nov 2, 2013 at 8:15 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>
>> On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
>> <eduard.antonyan at gmail.com> wrote:
>> > Tbh I don't see why data presentation and preservation (i.e. if you're
>> > reading in data with duplicated columns) is not enough of a use case -
>> > that's the only reason we allow arbitrary symbols in column names.
>> >
>> > So, instead of giving you another use case, how about you tell me
>> > what you propose should happen here (instead of what happens now):
>> >
>> > > dt = data.table(1, 2)
>> > > dt
>> >    V1 V2
>> > 1:  1  2
>> > > dt[, sum(V2), by = V1]
>> >    V1 V1
>> > 1:  1  2
>>
>> Only Matthew could say for sure, but if I were a gambling man I'd bet
>> that this was likely something that slipped through the cracks and
>> sleeping dogs were left to lie. I'd be curious to see what his
>> opinions on this are.
>>
>> IMHO the "data presentation" argument doesn't really hold much water.
>>
>> As for "data preservation," I rather see it as imposing structure on
>> it to enable efficient -- and sane/unambiguous -- computation over it.
>> Further, I don't think this is a preservation issue at all -- no data is
>> lost. The original data is still there in the file that was loaded
>> into R. The name of a column is changed when imported (with adequate
>> warning) into a data.table so that the user can slice and dice it. I'd
>> also guess that a user warned about the duplicate names would most
>> likely be happy to receive the warning, but the fact that you disagree
>> suggests that this isn't an obvious conclusion ;-)
>>
>> I'm curious if you would argue for an SQL table to allow duplicate
>> column names for the same reasons? I do know you can torture SQL to
>> get two colnames to be the same by aliasing, but this also seems to
>> have slipped through as an accident:
>>
>> http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf
>>
>> (which I found from here):
>>
>> http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table
>>
>> Perhaps we should email this guy Hugh to see what he thinks about this
>> one :-)
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>>