<div dir="ltr">Tbh I don't see why data presentation and preservation (e.g. when you're reading in data with duplicated columns) isn't enough of a use case; that's the only reason we allow arbitrary symbols in column names.<br>
<br>So, instead of giving you another use case, let me ask: what do you propose should happen here, as opposed to what happens now?<br><br>> dt = data.table(1, 2)<br>> dt<br> V1 V2<br>1: 1 2<br>
> dt[, sum(V2), by = V1]<br>
V1 V1<br>1: 1 2<br><br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Nov 2, 2013 at 7:36 PM, Arunkumar Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
Eddi,
</div><div>If it is essential to keep names intact while loading the data in, we could add an argument, "asis=TRUE" or something like that. But I don't see a reason to support duplicate names anywhere else in `data.table`, or to try to catch errors when nothing meaningful can be done with them. Besides data presentation, can you name any other use for them?</div>
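<div><br>For comparison, base R already has a switch of this kind: `read.csv` takes `check.names`, and setting it to FALSE keeps a duplicated header as written. The "asis" name above is only a suggestion, so this is a rough sketch of the idea using that existing base R argument, with a made-up `csv_text` input:<br>
<pre>
```r
# Base R analogue of the proposed (hypothetical) "asis=TRUE" switch.
# csv_text is illustrative data, not taken from the thread.
csv_text = "colname1,colname2,colname1\n1,2,3"

# check.names = FALSE keeps the header exactly as written, duplicates included.
kept = read.csv(text = csv_text, check.names = FALSE)

# The default (check.names = TRUE) repairs the header so every name is unique.
renamed = read.csv(text = csv_text)

print(names(kept))
print(names(renamed))
```
</pre>
With the default, the second "colname1" comes back as "colname1.1"; with check.names = FALSE both columns keep the name "colname1".</div>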
<div><div><br></div><div>Arun</div><div><br></div></div><div class="HOEnZb"><div class="h5">
<p style="color:#a0a0a8">On Sunday, November 3, 2013 at 1:31 AM, Eduard Antonyan wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px">
<span><div><div><div dir="ltr"><div><div><div>The main use case I've personally encountered is data presentation (for myself or others), where I would sometimes organize data like so:<br><br>
</div>category1 name,colname1,colname2,category2 name,colname1,colname2<br>
</div>....numbersandstuff....<br><br></div><div>Also, in general there are many cases I brought up above that generate duplicate names, and I definitely don't want either lost columns or renamed columns as a result - both are data loss that I don't appreciate.<br>
</div></div><div><br><br><div>On Sat, Nov 2, 2013 at 7:10 PM, Steve Lianoglou <span dir="ltr"><<a href="mailto:lianoglou.steve@gene.com" target="_blank">lianoglou.steve@gene.com</a>></span> wrote:<br><blockquote type="cite">
<div>Hi,<br>
<br>
On Sat, Nov 2, 2013 at 8:41 AM, Arunkumar Srinivasan<br>
<<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>> wrote:<br>
[snip]<br>
<div>> Overall, I agree keeping duplicate names may help some users. But then, the<br>
> potential side-effects should be marked with warnings/errors distinctly, in<br>
> all cases (and preferably documented).<br>
</div>[/snip]<br>
<br>
I guess I must have missed it, but has anyone anywhere (in this<br>
thread, an FR or something) actually presented a (concrete) compelling<br>
situation where allowing duplicate column names was actually useful?<br>
<br>
I'm hard pressed to come up with any situation where (purposefully)<br>
keeping duplicate column names in a data.table has more benefit than<br>
downside. Seems to me that if this ever happens, it most certainly<br>
would be by mistake.<br>
<br>
Can someone help me out here?<br>
<br>
In the case of cbinding two data.tables together that end up having<br>
two duplicate names, I'd imagine unique-ing the names of the<br>
data.tables and firing a warning that this was done would be most<br>
useful (uniqueness priority would be from left to right as the<br>
data.tables are passed into the cbind call)<br>
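<br><br>
A minimal sketch of that behaviour, assuming plain data.frames and base R's `make.unique`; `cbind_unique` is a hypothetical helper for illustration, not an existing data.table function:<br>
<pre>
```r
# Hypothetical helper: bind columns, then rename any duplicated names and warn.
# Written against base R data.frames only; data.table itself is not used here.
cbind_unique = function(...) {
  out = cbind(...)              # cbind on data.frames keeps duplicate names
  old = names(out)
  new = make.unique(old)        # first occurrence (leftmost) keeps its name
  if (!identical(old, new)) {
    warning("duplicate column names renamed to: ",
            paste(new[old != new], collapse = ", "))
    names(out) = new
  }
  out
}

a = data.frame(x = 1, y = 2)
b = data.frame(x = 3, z = 4)
res = suppressWarnings(cbind_unique(a, b))
print(names(res))
```
</pre>
Here the left-hand "x" keeps its name and the right-hand one becomes "x.1", which matches the left-to-right priority described above.<br>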
<span><font color="#888888"><br>
-steve<br>
<br>
--<br>
Steve Lianoglou<br>
Computational Biologist<br>
Bioinformatics and Computational Biology<br>
Genentech<br>
</font></span></div></blockquote></div><br></div>
</div></div></span>
</blockquote>
<div>
<br>
</div>
</div></div></blockquote></div><br></div>