[datatable-help] changing data.table by-without-by syntax to require a "by"

Eduard Antonyan eduard.antonyan at gmail.com
Mon Apr 29 15:43:19 CEST 2013


It might help to think of this as a proposal for an improvement rather than
a proposal to fix a problem.


On Mon, Apr 29, 2013 at 8:40 AM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:

> Thanks Arun, the examples you give are probably interesting in their own
> right, but your post doesn't address the advantages or disadvantages of
> either the current or the proposed syntax. It simply points out the
> (obvious) fact that the current behavior (and other behavior similar to it
> in some ways) is possible to implement in R.
>
>
> On Sat, Apr 27, 2013 at 10:49 AM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>> Hello,
>> I thought I'd also chip in with my thoughts on eddi's feature request.
>> Short answer: I don't think this feature is necessary. I basically agree
>> with mnel's reply.
>> Long answer: My argument goes along these lines (in addition to the S3/S4
>> methods mnel mentions). If you for example type `[.data.frame` in your
>> R-session, you'd see this snippet:
>>
>>         if (is.matrix(i))
>>             return(as.matrix(x)[i])
>>
>> That is, if you do:
>>
>>     df <- data.frame(x=1:5, y=1:5, z=1:5)
>>     mm <- matrix(1:12, ncol=3)
>>     df[mm] # gives
>>     [1] 1 2 3 4 5 1 2 3 4 5 1 2
>>
>>     df <- data.frame(x=1:2, y=1:2, z=1:2)
>>     df[mm] # gives
>>     [1]  1  2  1  2  1  2 NA NA NA NA NA NA
>>
>> Here, the index is a matrix. It's obvious. Now, should this behaviour
>> be changed because people would be confused that subsetting a data.frame
>> resulted in a vector? Or because it's not user friendly? Even better, try
>> out `df[mm, ]`. If `i` is a matrix, this is what the code does. I am not
>> convinced this is "bad" design. Functions take arguments of different types
>> ALL the time and they return outputs *depending on the type of input*. This
>> is why I am not sold on the point of "bad design". It's essential to know
>> the type of objects `i` can take and *understand* it.
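
A minimal sketch of the point above, using base R only: the same `[` call on a data.frame returns a different kind of object depending on the type of `i`. The names `by_vector` and `by_matrix` are made up for illustration.

```r
# Base R only: how `[.data.frame` reacts to two different types of `i`.
df <- data.frame(x = 1:5, y = 1:5, z = 1:5)
mm <- matrix(1:12, ncol = 3)

# `i` is a plain numeric vector: columns x and z are selected.
by_vector <- df[c(1, 3)]

# `i` is a matrix: the call falls through to as.matrix(df)[mm].
by_matrix <- df[mm]

is.data.frame(by_vector)   # TRUE
is.data.frame(by_matrix)   # FALSE, the result is an atomic vector
length(by_matrix)          # 12, one element per entry of mm
```

Checking the result with `is.data.frame()` before using it further is one way to guard against the type of `i` changing unexpectedly.
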
>>
>> If a function is designed that takes several types of objects for `i` and
>> their behaviour is documented, and the documented behaviour is consistent,
>> then I can't accept there's a problem.
>>
>> I agree there are people who don't read the manual and "try" things out.
>> But they are going to have problems with every other function in R.
>>
>> For example, "unstack" is a function for which the same input type gives
>> different output types. It returns a data.frame if the columns have equal
>> length after unstacking, and a list if they do not. To see this, compare
>> the outputs of:
>>
>>     df <- data.frame(x=rep(1:3, each=3), y=1:9)
>>     unstack(df, y ~ x)
>>
>> with
>>
>>     df <- data.frame(x=c(rep(1:3, each=3), 3), y=1:10)
>>     unstack(df, y ~ x)
>>
>> But if people don't read the documentation, they wouldn't know this
>> difference until they run into errors. Now, making it user-friendly would
>> mean that it "always" returns a list.
>>
>> Now, is this "bad" design because it gives two object types for the same
>> input type? Does it require a change? I personally don't think so.
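
To make the `unstack` comparison concrete, here is a small sketch in base R. The use of `split()` as an "always a list" alternative is an assumption for illustration, not something proposed in the thread, and the variable names are hypothetical.

```r
# Base R only: unstack() returns a data.frame or a list depending on
# whether the groups have equal length.
balanced   <- data.frame(x = rep(1:3, each = 3), y = 1:9)
unbalanced <- data.frame(x = c(rep(1:3, each = 3), 3), y = 1:10)

res_balanced   <- unstack(balanced, y ~ x)
res_unbalanced <- unstack(unbalanced, y ~ x)

is.data.frame(res_balanced)     # TRUE: three groups of three values
is.data.frame(res_unbalanced)   # FALSE: a plain list, group 3 has four values

# Assumption for illustration: split() gives a list in both cases, which is
# one way to get a uniform return type regardless of group sizes.
always_list <- split(unbalanced$y, unbalanced$x)
is.list(always_list)            # TRUE
```
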
>>
>> To sum up, what eddi points out as "not being user-friendly" (or arguably
>> "bad design") is everywhere inside R if you look closely. My view is that
>> it's very clear that there should be some effort in understanding a
>> function before using it. Not all functions are plain and simple. Some
>> functions have exceptions and some packages have a steep learning curve.
>>
>> Best,
>> Arun.
>>
>>
>> On Sat, Apr 27, 2013 at 12:00 PM, <
>> datatable-help-request at lists.r-forge.r-project.org> wrote:
>> >
>> > Send datatable-help mailing list submissions to
>> >         datatable-help at lists.r-forge.r-project.org
>> >
>> > To subscribe or unsubscribe via the World Wide Web, visit
>> >         https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> > or, via email, send a message with subject or body 'help' to
>> >         datatable-help-request at lists.r-forge.r-project.org
>> >
>> > You can reach the person managing the list at
>> >         datatable-help-owner at lists.r-forge.r-project.org
>> >
>> > When replying, please edit your Subject line so it is more specific
>> > than "Re: Contents of datatable-help digest..."
>> >
>> >
>> > Today's Topics:
>> >
>> >    1. Re: changing data.table by-without-by syntax to require a
>> >       "by" (Frank Erickson)
>> >    2. Re: variable column names (Sam Steingold)
>> >    3. Re: variable column names (Matthew Dowle)
>> >    4. Re: changing data.table by-without-by syntax to require a
>> >       "by" (Matthew Dowle)
>> >    5. Re: variable column names (Victor Kryukov)
>> >
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Fri, 26 Apr 2013 15:34:39 -0500
>> > From: Frank Erickson <FErickson at psu.edu>
>> > To: "data.table source forge"
>> >         <datatable-help at lists.r-forge.r-project.org>
>> > Subject: Re: [datatable-help] changing data.table by-without-by syntax
>> >         to require a "by"
>> > Message-ID:
>> >         <CAJd-hdkv1oiSjfA625oBxmXwr5YuVUzz==3GLaWJTakAtzMJVw at mail.gmail.com>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > I disagree with the criticism of data.table's complexity (in the OP).
>> > There's nothing wrong with overloading the syntax (that is what CS people
>> > call it, right?). As long as Matthew's in control of it, it's likely to
>> > have some internal consistency (which, of course, he could explain).
>> > However, I like the suggestion to add options (defaulting to something
>> > globally adjustable) to disable some of the overloading. Along similar
>> > lines (I think), I find unique.data.table very unintuitive. I can see how
>> > it could be useful, but strongly prefer base::unique for my current
>> > applications.
>> >
>> > Anyway, I have nothing particular to say about the piece of syntax you all
>> > are currently discussing. I just registered with this list to chime in
>> > here, instead of further cluttering SO (where eddi answered one of my
>> > questions yesterday). These emails sure are wide; must be like 1500px!
>> > Interesting to try out this ancient mailing-list form of communication.
>> > Please let me know if I should be using "Reply All" or actually quoting
>> > that massive thread (as everyone else seems to be doing with each post).
>> >
>> > Frank
>> >
>> > ------------------------------
>> >
>> > Message: 2
>> > Date: Fri, 26 Apr 2013 18:02:31 -0400
>> > From: Sam Steingold <sds at gnu.org>
>> > To: datatable-help at lists.r-forge.r-project.org
>> > Subject: Re: [datatable-help] variable column names
>> > Message-ID: <87wqrpj6h4.fsf at gnu.org>
>> > Content-Type: text/plain
>> >
>> > > * Sam Steingold <fqf at tah.bet> [2013-04-26 13:05:39 -0400]:
>> > >
>> > >> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-26 17:45:53 +0100]:
>> > >>
>> > >> S.O. is probably better for this kind of question then.
>> > >> But if you don't get an answer there, then come back to datatable-help.
>> > >
>> > > http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
>> >
>> > downvoted, unlikely to be answered.
>> >
>> > --
>> > Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.11300000
>> > http://www.childpsy.net/ http://iris.org.il http://think-israel.org
>> > http://americancensorship.org http://pmw.org.il http://mideasttruth.com
>> > We have preferences. You have biases. They have prejudices.
>> >
>> >
>> >
>> > ------------------------------
>> >
>> > Message: 3
>> > Date: Fri, 26 Apr 2013 23:47:55 +0100
>> > From: Matthew Dowle <mdowle at mdowle.plus.com>
>> > To: <sds at gnu.org>
>> > Cc: datatable-help at lists.r-forge.r-project.org
>> > Subject: Re: [datatable-help] variable column names
>> > Message-ID: <30d6ae8f1a0d6974ebbd54da0d86f3b2 at imap.plus.net>
>> > Content-Type: text/plain; charset=UTF-8; format=flowed
>> >
>> > On 26.04.2013 23:02, Sam Steingold wrote:
>> > >> * Sam Steingold <fqf at tah.bet> [2013-04-26 13:05:39 -0400]:
>> > >>
>> > >>> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-26 17:45:53 +0100]:
>> > >>>
>> > >>> S.O. is probably better for this kind of question then.
>> > >>> But if you don't get an answer there, then come back to
>> > >>> datatable-help.
>> > >>
>> > >> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
>> > >
>> > > downvoted, unlikely to be answered.
>> >
>> > I've read it through.
>> >
>> > Perhaps sleep on it, don't look for 24hrs and look again as if you were
>> > trying to answer it yourself. Are there any small changes you can make
>> > to make it easier to answer?  It wasn't me that downvoted but I suspect
>> > it's been done to encourage you to improve the question. Downvotes can
>> > be (and often are) reversed.  I once had a question with many more
>> > downvotes than yours, but then I improved it and it went to +10.
>> >
>> > And, it's Friday and we've all had a long week!
>> >
>> > Matthew
>> >
>> >
>> >
>> >
>> > ------------------------------
>> >
>> > Message: 4
>> > Date: Sat, 27 Apr 2013 00:35:17 +0100
>> > From: Matthew Dowle <mdowle at mdowle.plus.com>
>> > To: Frank Erickson <FErickson at psu.edu>
>> > Cc: "data.table source forge"
>> >         <datatable-help at lists.r-forge.r-project.org>
>> > Subject: Re: [datatable-help] changing data.table by-without-by syntax
>> >         to require a "by"
>> > Message-ID: <be967ecd9c927ade15c15eb9985d919e at imap.plus.net>
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> >
>> >
>> > Thanks for your comments Frank.
>> >
>> > Ha, yes it's ancient but still has a place. Yes "reply all": Back To:
>> > sender (if it's to someone in particular) and cc the list. But on general
>> > topics where lots of people are on the thread, just To: datatable-help
>> > alone is fine. Personally I prefer "top posting". Like I'm doing now. I
>> > only scroll down if I need to. I didn't notice the history was building
>> > up. If you comment inline later, then say "scroll down for comments
>> > inline" or something at the top. Note that Nabble collapses the history
>> > for you so threads are much easier to read there. Or I tend to read via
>> > RSS (gmane) in Outlook, so it feels like an email inbox which turns bold
>> > on new posts. You only need to subscribe to post (spam control). Most
>> > people turn off mail delivery pretty quickly I imagine (or set up an auto
>> > rule to move into a folder, but then you might as well subscribe to RSS I
>> > guess).
>> >
>> > S.O. is quite strict: must be clear questions with a clear answer, only
>> > one of which can be accepted. No opinion, voting, discussing or notices
>> > (enter mailing lists). Chat room is good but for quick chat when people
>> > are in the room at the same time. Many companies (sensibly) block chat
>> > access, though. Mailing lists allow all timezones a chance at a slower
>> > pace. Anonymity is just as acceptable and as easy in both places.
>> >
>> > Matthew
>> >
>> >
>> > ------------------------------
>> >
>> > Message: 5
>> > Date: Fri, 26 Apr 2013 16:42:04 -0700
>> > From: Victor Kryukov <victor.kryukov at gmail.com>
>> > To: Matthew Dowle <mdowle at mdowle.plus.com>
>> > Cc: datatable-help at lists.r-forge.r-project.org, sds at gnu.org
>> > Subject: Re: [datatable-help] variable column names
>> > Message-ID:
>> >         <CANJmMqTz5+6djLEwpZxsub6LB=3L37=JB3xt5AhG1XgWG=nJgw at mail.gmail.com>
>> > Content-Type: text/plain; charset=ISO-8859-1
>> >
>> > On Fri, Apr 26, 2013 at 3:47 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> > > On 26.04.2013 23:02, Sam Steingold wrote:
>> > >>> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
>> > >>
>> > >> downvoted, unlikely to be answered.
>> > >
>> > > I've read it through.
>> > >
>> > > Perhaps sleep on it, don't look for 24hrs and look again as if you were
>> > > trying to answer it yourself. Are there any small changes you can make to
>> > > make it easier to answer?  It wasn't me that downvoted but I suspect it's
>> > > been done to encourage you to improve the question. Downvotes can be (and
>> > > often are) reversed.  I once had a question with many more downvotes than
>> > > yours, but then I improved it and it went to +10.
>> > >
>> > > And, it's Friday and we've all had a long week!
>> >
>> > Beautiful advice, Matthew!
>> >
>> > Sam - I've provided my answer (and even used Reduce since you seem to
>> > be coming from Lisp land), but I also think some of the down
>> > votes/comments have their merit.
>> >
>> >
>> > ------------------------------
>> >
>> > End of datatable-help Digest, Vol 38, Issue 26
>> > **********************************************
>>
>>
>
>