<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">My 2 cents here.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">There are several reasons why, IMHO, allowing multiple columns with the same name is not a good idea:</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"> </div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">- It forces code to use column numbers to access the data predictably (depending on your code, you might not know which of the two same-named columns comes first), so we lose all the delicious syntactic sugar painstakingly added to data.table.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">- For people learning data.table with data.frame, or even the concept of a relational table, as their reference, this is a definite WTF that will cause confusion and complicate troubleshooting. I speak from experience on this matter. 
:)</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Even though there may be some situations where this is a plus, I imagine they are few and far between and could be worked around. I could be wrong; it’s been known to happen. :) But I have never seen, and can’t even imagine, a situation where multiple columns with the same name would be essential. So, on balance, I consider keeping this behavior a bad trade-off for most users.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Having said that, this is a design decision and it's up to the data.table demigods to decide. :)</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">BTW, is there any part of the data.table documentation that covers this? If you choose to keep this behavior, I strongly suggest documenting it somewhere most beginners would read.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">In my own case, I only found this after a rather long troubleshooting session for a very esoteric problem in my code. 
I was renaming a column to a name that already existed, and this broke things in a completely different part of my code. If ‘setnames()’ had at least warned me that it was creating a duplicate column name, I would have found the root cause much faster.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div> <div id="bloop_sign_1383392571821545984"><span style="font-family:helvetica,arial;font-size:13px"></span><div style="font-family: Helvetica; line-height: normal; ">-- </div><div style="font-family: Helvetica; line-height: normal; "><span style="font-family: arial; font-size: small; ">Alexandre Sieira</span><br style="font-family: arial; font-size: small; "><span style="font-family: arial; font-size: small; ">CISA, CISSP, ISO 27001 Lead Auditor</span><br style="font-family: arial; font-size: small; "><br style="font-family: arial; font-size: small; "><span style="font-family: arial; font-size: small; ">"The truth is rarely pure and never simple."</span><br style="font-family: arial; font-size: small; "><span style="font-family: arial; font-size: small; ">Oscar Wilde, The Importance of Being Earnest, 1895, Act I</span></div></div> <br><p style="color:#A0A0A8;">On 1 November 2013 at 21:10:54, datatable-help-request@lists.r-forge.r-project.org (<a href="mailto://datatable-help-request@lists.r-forge.r-project.org">datatable-help-request@lists.r-forge.r-project.org</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div>Send datatable-help mailing list submissions to
<br> datatable-help@lists.r-forge.r-project.org
<br>
<br>To subscribe or unsubscribe via the World Wide Web, visit
<br> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
<br>
<br>or, via email, send a message with subject or body 'help' to
<br> datatable-help-request@lists.r-forge.r-project.org
<br>
<br>You can reach the person managing the list at
<br> datatable-help-owner@lists.r-forge.r-project.org
<br>
<br>When replying, please edit your Subject line so it is more specific
<br>than "Re: Contents of datatable-help digest..."
<br>
<br>
<br>Today's Topics:
<br>
<br> 1. Re: Unexpected behavior in setnames() (Arunkumar Srinivasan)
<br> 2. Re: Unexpected behavior in setnames() (Eduard Antonyan)
<br> 3. Re: Unexpected behavior in setnames() (Arunkumar Srinivasan)
<br>
<br>
<br>----------------------------------------------------------------------
<br>
<br>Message: 1
<br>Date: Sat, 2 Nov 2013 00:02:38 +0100
<br>From: Arunkumar Srinivasan <aragorn168b@gmail.com>
<br>To: Eduard Antonyan <eduard.antonyan@gmail.com>
<br>Cc: "datatable-help@lists.r-forge.r-project.org"
<br> <datatable-help@lists.r-forge.r-project.org>, Alexandre Sieira
<br> <alexandre.sieira@gmail.com>
<br>Subject: Re: [datatable-help] Unexpected behavior in setnames()
<br>Message-ID: <5E98018F047943DE89849EC57A7CF72A@gmail.com>
<br>Content-Type: text/plain; charset="utf-8"
<br>
<br>Yes, it chooses the first. But we won't be able to perform any operation as intended. So why allow duplicate names (ex: in `setnames` as Alexandre asks)?
<br>
<br>Arun
<br>
<br>
<br>On Friday, November 1, 2013 at 11:57 PM, Eduard Antonyan wrote:
<br>
<br>> I think currently it chooses the first "x", but it's definitely a good idea to add a warning there.
<br>>
<br>>
<br>> On Fri, Nov 1, 2013 at 5:51 PM, Arunkumar Srinivasan <aragorn168b@gmail.com> wrote:
<br>> > Ricardo added a bug report here on this topic: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5008&group_id=240&atid=975
<br>> > But I don't think having duplicate names is an easy-to-implement concept. For ex:
<br>> >
<br>> > dt <- data.table(x=1:3, x=4:6, y=c(1,1,2))
<br>> > dt[, print(.SD), by=y]
<br>> > x
<br>> > 1: 1
<br>> > 2: 2
<br>> > x
<br>> > 1: 3
<br>> >
<br>> >
<br>> > .SD loses the second "x". Also, some other questions become difficult to handle. Ex:
<br>> >
<br>> > dt <- data.table(x=c(1,1,2,2), y=c(1,2,3,4), x=c(2,2,1,1))
<br>> > dt[, list(x=x/x[1], y=y), by=x]
<br>> >
<br>> >
<br>> > Which "x" should be chosen for which operation?
<br>> >
<br>> > Arun
<br>> >
<br>> >
<br>> > On Friday, November 1, 2013 at 10:59 PM, Eduard Antonyan wrote:
<br>> >
<br>> > > Having duplicate names is allowed and not that unusual in the data.table framework, so there is no need to signal anything here.
<br>> > >
<br>> > > A different question is whether there should be a warning here:
<br>> > >
<br>> > > dt = data.table(a = 1, a = 2)
<br>> > > dt[, a]
<br>> > >
<br>> > > and I think that'd be a pretty good FR (feature request) to have.
<br>> > >
<br>> > >
<br>> > > On Fri, Nov 1, 2013 at 4:49 PM, Alexandre Sieira <alexandre.sieira@gmail.com> wrote:
<br>> > > > I found this behavior during a debugging session:
<br>> > > >
<br>> > > > > d = data.table(a=1, b=2, c=3)
<br>> > > > > setnames(d, "a", "b")
<br>> > > > > d
<br>> > > > b b c
<br>> > > > 1: 1 2 3
<br>> > > >
<br>> > > > Shouldn't setnames() check whether the new column names already exist before renaming, and signal an error or at least a warning if they do?
<br>> > > > --
<br>> > > > Alexandre Sieira
<br>> > > > CISA, CISSP, ISO 27001 Lead Auditor
<br>> > > >
<br>> > > > "The truth is rarely pure and never simple."
<br>> > > > Oscar Wilde, The Importance of Being Earnest, 1895, Act I
<br>> > > > _______________________________________________
<br>> > > > datatable-help mailing list
<br>> > > > datatable-help@lists.r-forge.r-project.org
<br>> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
<br>> > >
<br>> >
<br>> >
<br>>
<br>
<br>
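<br>A minimal sketch of the check Alexandre asks for, assuming one wanted it outside data.table itself. checked_setnames() and duplicated_after_rename() are hypothetical helpers, not data.table functions, and only name-based (character) `old` arguments are handled. The duplicate check is plain base R; only the final rename calls data.table::setnames().
<pre>
```r
# Hypothetical helper (not part of data.table): given the current column names
# and a rename of `old` to `new`, return every name that would appear more than
# once afterwards (pre-existing duplicates are reported too). Base R only.
duplicated_after_rename <- function(current, old, new) {
  stopifnot(is.character(old), is.character(new), length(old) == length(new))
  idx <- match(old, current)
  if (anyNA(idx)) {
    stop("column(s) not found: ", paste(old[is.na(idx)], collapse = ", "),
         call. = FALSE)
  }
  proposed <- current
  proposed[idx] <- new
  unique(proposed[duplicated(proposed)])
}

# Hypothetical wrapper: refuse (or only warn about) renames that would leave
# duplicate column names, then delegate the actual rename to
# data.table::setnames(), which renames by reference.
checked_setnames <- function(x, old, new, on_dup = c("error", "warning")) {
  on_dup <- match.arg(on_dup)
  dups <- duplicated_after_rename(names(x), old, new)
  if (length(dups) > 0L) {
    msg <- paste0("renaming would leave duplicate column name(s): ",
                  paste(dups, collapse = ", "))
    if (on_dup == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  data.table::setnames(x, old, new)
}
```
</pre>
<br>With this wrapper, checked_setnames(d, "a", "b") on Alexandre's d = data.table(a=1, b=2, c=3) would stop with "renaming would leave duplicate column name(s): b" instead of silently producing the columns b b c; passing on_dup = "warning" would still rename, but with a warning.
<br>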
<br>------------------------------
<br>
<br>Message: 2
<br>Date: Fri, 1 Nov 2013 18:05:46 -0500
<br>From: Eduard Antonyan <eduard.antonyan@gmail.com>
<br>To: Arunkumar Srinivasan <aragorn168b@gmail.com>
<br>Cc: "datatable-help@lists.r-forge.r-project.org"
<br> <datatable-help@lists.r-forge.r-project.org>, Alexandre Sieira
<br> <alexandre.sieira@gmail.com>
<br>Subject: Re: [datatable-help] Unexpected behavior in setnames()
<br>Message-ID:
<br> <CAHZcBOp_kBZtrUGBXH0Op9zwWjnOyXGim-e+5d+uw1eyTnoz1g@mail.gmail.com>
<br>Content-Type: text/plain; charset="windows-1252"
<br>
<br>Because it's very useful for e.g. data presentation purposes.
<br>
<br>
<br>On Fri, Nov 1, 2013 at 6:02 PM, Arunkumar Srinivasan <aragorn168b@gmail.com> wrote:
<br>
<br>> Yes, it chooses the first. But we won't be able to perform any operation
<br>> as intended. So why allow duplicate names (ex: in `setnames` as Alexandre
<br>> asks)?
<br>>
<br>> Arun
<br>
<br>------------------------------
<br>
<br>Message: 3
<br>Date: Sat, 2 Nov 2013 00:10:41 +0100
<br>From: Arunkumar Srinivasan <aragorn168b@gmail.com>
<br>To: Eduard Antonyan <eduard.antonyan@gmail.com>
<br>Cc: "datatable-help@lists.r-forge.r-project.org"
<br> <datatable-help@lists.r-forge.r-project.org>, Alexandre Sieira
<br> <alexandre.sieira@gmail.com>
<br>Subject: Re: [datatable-help] Unexpected behavior in setnames()
<br>Message-ID: <D70F31E4E83842EF95F46C9565E7AEEA@gmail.com>
<br>Content-Type: text/plain; charset="utf-8"
<br>
<br>Hm, I've not encountered that use myself, so I can't comment there. Perhaps, then, it should be allowed everywhere except where choosing between the duplicated columns could be an issue? E.g., subsetting/aggregating/grouping/by-without-by etc. should result in an error (if one has the time, one could check whether the duplicated column is actually used, and only then issue an error/warning).
<br>
<br>At the moment, I'm not convinced that it's worth that much trouble to help data presentation.
<br>
<br>Arun
<br>
<br>
<br>On Saturday, November 2, 2013 at 12:05 AM, Eduard Antonyan wrote:
<br>
<br>> Because it's very useful for e.g. data presentation purposes.
<br>
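<br>Arun's idea of erroring only when a duplicated column is actually used could be approximated by inspecting the symbols in the expression. A rough sketch, assuming the j (or by) expression is available unevaluated; dup_names_used() and check_expr_for_dups() are hypothetical names, while all.vars() is base R. It does not see .SD or other indirect column access, so it is a heuristic rather than a complete rule.
<pre>
```r
# Hypothetical helper (not part of data.table): given an unevaluated j or by
# expression and a table's column names, return the duplicated column names
# that the expression refers to. Base R only.
dup_names_used <- function(expr, col_names) {
  dups <- unique(col_names[duplicated(col_names)])
  # all.vars() lists the variable symbols in the expression; function names
  # such as list() and `[` are excluded by default (functions = FALSE)
  intersect(all.vars(expr), dups)
}

# Hypothetical check: error only when an ambiguous (duplicated) column is
# actually referenced; otherwise return TRUE invisibly.
check_expr_for_dups <- function(expr, col_names) {
  used <- dup_names_used(expr, col_names)
  if (length(used) > 0L) {
    stop("expression refers to duplicated column name(s): ",
         paste(used, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```
</pre>
<br>For the earlier example, dup_names_used(quote(list(x = x/x[1], y = y)), c("x", "y", "x")) returns "x", so check_expr_for_dups() would raise an error (the same check could be applied to by = x), while an expression such as quote(sum(y)) passes.
<br>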
<br>------------------------------
<br>
<br>_______________________________________________
<br>datatable-help mailing list
<br>datatable-help@lists.r-forge.r-project.org
<br>https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
<br>
<br>End of datatable-help Digest, Vol 45, Issue 2
<br>*********************************************
<br></div></div></span></blockquote></body></html>