[datatable-help] Unexpected behavior in setnames()

Eduard Antonyan eduard.antonyan at gmail.com
Sun Nov 3 01:43:56 CET 2013


Tbh I don't see why data presentation and preservation (i.e. if you're
reading in data with duplicated columns) is not enough of a use case -
that's the only reason we allow arbitrary symbols in column names.

So, instead of giving you another use case, how about you tell me what you
propose should happen here (as opposed to what happens now):

> dt = data.table(1, 2)
> dt
   V1 V2
1:  1  2
> dt[, sum(V2), by = V1]
   V1 V1
1:  1  2




On Sat, Nov 2, 2013 at 7:36 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:

>  Eddi,
> While loading the data in, maybe, if it is essential to keep names intact,
> we can probably add an argument, "asis=TRUE" or something like that. But I
> don't see a reason for doing anything else in `data.table` using duplicate
> names and trying to catch errors when nothing meaningful can be done with
> them. Besides data presentation, can you think of any other use for them?
>
> Arun
>
> On Sunday, November 3, 2013 at 1:31 AM, Eduard Antonyan wrote:
>
> The main usage case I've personally encountered is data presentation (for
> either self or others), where I would sometimes organize data like so:
>
> category1 name,colname1,colname2,category2 name,colname1,colname2
> ....numbersandstuff....
>
> Also, in general there are many cases I brought up above that generate
> duplicate names, and I definitely don't want either lost columns or renamed
> columns as a result - both are data loss that I don't appreciate.
>
>
> On Sat, Nov 2, 2013 at 7:10 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>
> Hi,
>
> On Sat, Nov 2, 2013 at 8:41 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> [snip]
> > Overall, I agree keeping duplicate names may help some users. But then, the
> > potential side-effects should be marked with warnings/errors distinctly, in
> > all cases (and preferably documented).
> [/snip]
>
> I guess I must have missed it, but has anyone anywhere (in this
> thread, an FR or something) actually presented a (concrete) compelling
> situation where allowing duplicate column names was actually useful?
>
> I'm hard pressed to come up with any situation where (purposefully)
> keeping duplicate column names in a data.table has more benefit than
> downside. Seems to me that if this ever happens, it most certainly
> would be by mistake.
>
> Can someone help me out here?
>
> In the case of cbinding two data.tables together that end up having
> two duplicate names, I'd imagine unique-ing the names of the
> data.tables and firing a warning that this was done would be most
> useful (uniqueness priority would be from left to right as the
> data.tables are passed into the cbind call)
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>
>
>
>
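Steve's proposal above (make duplicated names unique from left to right and fire a warning) can be sketched in base R. This is a minimal illustration under assumptions, not something data.table provides: `unique_colnames()` is a hypothetical helper, and it relies on base R's `make.unique()`, which keeps the first occurrence of a name unchanged and suffixes later occurrences with `.1`, `.2`, and so on. Note that this is exactly the kind of renaming Eduard objects to as data loss.

```r
# Hypothetical helper (not part of data.table): apply the left-to-right
# uniquing rule proposed in the thread to a vector of column names.
unique_colnames <- function(nms) {
  # make.unique() keeps the first occurrence; later duplicates get .1, .2, ...
  fixed <- make.unique(nms)
  changed <- nms != fixed
  if (any(changed)) {
    # Report every renamed column so the change is not silent
    warning("duplicate column names made unique: ",
            paste(nms[changed], "->", fixed[changed], collapse = ", "))
  }
  fixed
}

# Names as they might look after cbind-ing two tables that both contain V1
# (warns, returns c("V1", "V2", "V1.1")):
unique_colnames(c("V1", "V2", "V1"))
```

Applied to a data.table, the renaming could then be done by reference with `setnames(dt, unique_colnames(names(dt)))`.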

