[datatable-help] Unexpected behavior in setnames()

Eduard Antonyan eduard.antonyan at gmail.com
Wed Nov 6 17:05:04 CET 2013


The last comment here has an example of using duplicated names -
http://stackoverflow.com/a/19809942/817778 - it's very similar to the one I
mentioned earlier.
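
For reference, a minimal sketch of the base R behaviour that the quoted messages below refer to (data.frame accepting duplicate names, and the ".1" style of suffix). It uses base R only; the column name "a" and the values are made up for illustration, and it does not show what data.table itself does.

```r
# Base R only; the column name "a" and the values are hypothetical.
# check.names = FALSE stops data.frame() from repairing the duplicate name.
df <- data.frame(a = 1, a = 2, check.names = FALSE)

names(df)                 # both columns are named "a"
anyDuplicated(names(df))  # index of the first duplicated name (0 if none)
df$a                      # access by name returns only the first match

# make.unique() appends ".1", ".2", ... to repeated names
deduped <- make.unique(names(df))
deduped
```

With duplicates present, the second column can only be reached by position (e.g. df[[2]]), which is the ambiguity being debated.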


On Mon, Nov 4, 2013 at 3:54 AM, Chinmay Patil <chinmay.patil at gmail.com> wrote:

> FWIW, data.frame allows duplicate names as well. Given that
> data.table inherits from data.frame, I would expect it to follow the same
> convention as data.frame.
>
>
> On Sun, Nov 3, 2013 at 9:43 AM, Eduard Antonyan <eduard.antonyan at gmail.com> wrote:
>
>> @Arun: Ok. Thinking about it a bit - I don't like the continuing
>> enumeration solution because it makes the results too unpredictable, but
>> could live with adding a ".1" etc. Which I assume is the idea anyway for
>> resolving duplicates elsewhere.
>>
>> @Steve: Not sure why you think it doesn't hold much water - I think I can
>> draw a parallel argument that replicates all of the duplicated names
>> concerns with a column that is called e.g. `dt$V1` (imagine forgetting the
>> backticks there and the world of hurt that potentially awaits once you do
>> that). I am also curious what Matthew would think about this. This is something
>> I've encountered and dealt with a lot, so I'm certainly not an unbiased
>> party here.
>>
>>
>> On Sat, Nov 2, 2013 at 8:15 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>>
>>> On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
>>> <eduard.antonyan at gmail.com> wrote:
>>> > Tbh I don't see why data presentation and preservation (i.e. if you're
>>> > reading in data with duplicated columns) is not enough of a use case -
>>> > that's the only reason we allow arbitrary symbols in column names.
>>> >
>>> > So, instead of giving you another use case, how about you tell me
>>> > what you propose should happen here (instead of what happens now):
>>> >
>>> >> dt = data.table(1, 2)
>>> >> dt
>>> >    V1 V2
>>> > 1:  1  2
>>> >> dt[, sum(V2), by = V1]
>>> >    V1 V1
>>> > 1:  1  2
>>>
>>> Only Matthew could say for sure, but if I were a gambling man I'd bet
>>> that this was likely something that slipped through the cracks and
>>> sleeping dogs were left to lie. I'd be curious to see what his
>>> opinions on this are.
>>>
>>> IMHO the "data presentation" argument doesn't really hold much water.
>>>
>>> As for "data preservation," I rather see it as imposing structure on
>>> it to enable efficient -- and sane/unambiguous -- computation over it.
>>> Further, I don't think this is a preservation issue at all -- no data is
>>> lost. The original data is still there in the file that was loaded
>>> into R. The name of a column is changed when imported (with adequate
>>> warning) into a data.table so that the user can slice and dice it. I'd
>>> also guess that a user warned about the duplicate names would most
>>> likely be happy to receive the warning, but the fact that you disagree
>>> suggests that this isn't an obvious conclusion ;-)
>>>
>>> I'm curious if you would argue for an SQL table to allow duplicate
>>> column names for the same reasons? I do know you can torture SQL to
>>> get two colnames to be the same by aliasing, but this also seems to
>>> have slipped through as an accident:
>>>
>>> http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf
>>>
>>> (which I found from here):
>>>
>>> http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table
>>>
>>> Perhaps we should email this guy Hugh to see what he thinks about this
>>> one :-)
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Computational Biologist
>>> Bioinformatics and Computational Biology
>>> Genentech
>>>