[datatable-help] changing data.table by-without-by syntax to require a "by"

Eduard Antonyan eduard.antonyan at gmail.com
Mon Apr 29 15:40:27 CEST 2013


Thanks Arun. The examples you give are interesting in their own right, but
your post doesn't weigh the advantages or disadvantages of either the
current or the proposed syntax. It only points out the (obvious) fact that
the current behavior, and other behavior similar to it in some ways, is
possible to implement in R.


On Sat, Apr 27, 2013 at 10:49 AM, Arunkumar Srinivasan <
aragorn168b at gmail.com> wrote:

> Hello,
> I thought I'd also chip in with my thoughts on eddi's feature request.
> Short answer: I don't think this feature is necessary. I basically agree
> with mnel's reply.
> Long answer: My argument goes along these lines (in addition to the S3/S4
> methods mnel mentions). If you for example type `[.data.frame` in your
> R-session, you'd see this snippet:
>
>         if (is.matrix(i))
>             return(as.matrix(x)[i])
>
> That is, if you do:
>
>     df <- data.frame(x=1:5, y=1:5, z=1:5)
>     mm <- matrix(1:12, ncol=3)
>     df[mm] # gives
>     [1] 1 2 3 4 5 1 2 3 4 5 1 2
>
>     df <- data.frame(x=1:2, y=1:2, z=1:2)
>     df[mm] # gives
>     [1]  1  2  1  2  1  2 NA NA NA NA NA NA
>
> Here, the index is a matrix. It's obvious. Now, should this behaviour
> be changed because people would be confused that subsetting a data.frame
> resulted in a vector? Or because it's not user friendly? Even better, try
> out `df[mm, ]`. If `i` is a matrix, this is what the code does. I am not
> convinced this is "bad" design. Functions take arguments of different types
> ALL the time and they return outputs *depending on the type of input*. This
> is why I am not sold on the point of "bad design". It's essential to know
> the type of objects `i` can take and *understand* it.
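
A minimal sketch of the branch quoted above, assuming base R's `[.data.frame` still contains that `is.matrix(i)` check. The object names are illustrative only. It contrasts the single-argument matrix index, which returns an atomic vector, with an ordinary row index, which returns a data.frame:

```r
df <- data.frame(x = 1:5, y = 1:5, z = 1:5)
mm <- matrix(1:12, ncol = 3)

# Single-argument form with a matrix index: per the quoted snippet this is
# as.matrix(df)[mm], so the result is an atomic vector, not a data.frame.
res_vec <- df[mm]
same_as_matrix_index <- identical(res_vec, as.matrix(df)[mm])

# Two-argument form with a plain integer vector as the row index:
# the result keeps the data.frame class.
res_rows <- df[1:4, ]

print(class(res_vec))
print(class(res_rows))
```

The return type follows from the type of `i` and the form of the call, which is the point being made about documented, type-dependent behaviour.
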
>
> If a function is designed that takes several types of objects for `i` and
> their behaviour is documented, and the documented behaviour is consistent,
> then I can't accept there's a problem.
>
> I agree there are people who don't read the manual and "try" things out.
> But they are going to have problems with every other function in R.
>
> For example, "unstack" is a function for which the same input type gives
> different output types. That is, it returns a data.frame if the columns
> have equal length after unstacking and a list if they do not. Compare the
> outputs of:
>
>     df <- data.frame(x=rep(1:3, each=3), y=1:9)
>     unstack(df, y ~ x)
>
> with
>
>     df <- data.frame(x=c(rep(1:3, each=3), 3), y=1:10)
>     unstack(df, y ~ x)
>
> But if people don't read the documentation, they wouldn't know about this
> difference until they run into errors. Now, making it user-friendly would
> mean that it "always" returns a list.
>
> Now, is this "bad" design because it gives two object types for the same
> input type? Does it require a change? I personally don't think so.
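
A short sketch that makes the difference in return types explicit, assuming base R's `unstack` behaves as described above. The variable names are illustrative only:

```r
balanced   <- data.frame(x = rep(1:3, each = 3), y = 1:9)
unbalanced <- data.frame(x = c(rep(1:3, each = 3), 3), y = 1:10)

# Equal group sizes: the unstacked columns fit into a data.frame.
out_balanced <- unstack(balanced, y ~ x)

# Unequal group sizes: the groups cannot form a data.frame, so a plain
# list is returned instead.
out_unbalanced <- unstack(unbalanced, y ~ x)

print(class(out_balanced))
print(class(out_unbalanced))
print(sapply(out_unbalanced, length))
```

Code that consumes the result would have to check `is.data.frame()` or coerce with `as.list()` to handle both cases uniformly.
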
>
> To sum up, what eddi points out as "not being user-friendly" (or arguably
> "bad design") is everywhere inside R if you look closely. My view is that
> it's very clear that there should be some effort in understanding a
> function before using it. Not all functions are plain simple. Some
> functions have exceptions and some packages have a steep learning curve.
>
> Best,
> Arun.
>
>
> On Sat, Apr 27, 2013 at 12:00 PM, <
> datatable-help-request at lists.r-forge.r-project.org> wrote:
> >
> > Today's Topics:
> >
> >    1. Re: changing data.table by-without-by syntax to require a
> >       "by" (Frank Erickson)
> >    2. Re: variable column names (Sam Steingold)
> >    3. Re: variable column names (Matthew Dowle)
> >    4. Re: changing data.table by-without-by syntax to require a
> >       "by" (Matthew Dowle)
> >    5. Re: variable column names (Victor Kryukov)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 26 Apr 2013 15:34:39 -0500
> > From: Frank Erickson <FErickson at psu.edu>
> > To: "data.table source forge"
> >         <datatable-help at lists.r-forge.r-project.org>
> > Subject: Re: [datatable-help] changing data.table by-without-by syntax
> >         to require a "by"
> > Message-ID:
> >         <CAJd-hdkv1oiSjfA625oBxmXwr5YuVUzz==3GLaWJTakAtzMJVw at mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > I disagree with the criticism of data.table's complexity (in the OP).
> > There's nothing wrong with overloading the syntax (that is what CS people
> > call it, right?). As long as Matthew's in control of it, it's likely to
> > have some internal consistency (which, of course, he could explain).
> > However, I like the suggestion to add options (defaulting to something
> > globally adjustable) to disable some of the overloading. Along similar
> > lines (I think), I find unique.data.table very unintuitive. I can see how
> > it could be useful, but strongly prefer base::unique for my current
> > applications.
> >
> > Anyway, I have nothing particular to say about the piece of syntax you
> > all are currently discussing. I just registered with this list to chime in
> > here, instead of further cluttering SO (where eddi answered one of my
> > questions yesterday). These emails sure are wide; must be like 1500px!
> > Interesting to try out this ancient mailing-list form of communication.
> > Please let me know if I should be using "Reply All" or actually quoting
> > that massive thread (as everyone else seems to be doing with each post).
> >
> > Frank
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 26 Apr 2013 18:02:31 -0400
> > From: Sam Steingold <sds at gnu.org>
> > To: datatable-help at lists.r-forge.r-project.org
> > Subject: Re: [datatable-help] variable column names
> > Message-ID: <87wqrpj6h4.fsf at gnu.org>
> > Content-Type: text/plain
> >
> > > * Sam Steingold <fqf at tah.bet> [2013-04-26 13:05:39 -0400]:
> > >
> > >> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-26 17:45:53 +0100]:
> > >>
> > >> S.O. is probably better for this kind of question then.
> > >> But if you don't get an answer there, then come back to datatable-help.
> > >
> > > http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
> >
> > downvoted, unlikely to be answered.
> >
> > --
> > Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.11300000
> > http://www.childpsy.net/ http://iris.org.il http://think-israel.org
> > http://americancensorship.org http://pmw.org.il http://mideasttruth.com
> > We have preferences. You have biases. They have prejudices.
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Fri, 26 Apr 2013 23:47:55 +0100
> > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > To: <sds at gnu.org>
> > Cc: datatable-help at lists.r-forge.r-project.org
> > Subject: Re: [datatable-help] variable column names
> > Message-ID: <30d6ae8f1a0d6974ebbd54da0d86f3b2 at imap.plus.net>
> > Content-Type: text/plain; charset=UTF-8; format=flowed
> >
> > On 26.04.2013 23:02, Sam Steingold wrote:
> > >> * Sam Steingold <fqf at tah.bet> [2013-04-26 13:05:39 -0400]:
> > >>
> > >>> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-26 17:45:53 +0100]:
> > >>>
> > >>> S.O. is probably better for this kind of question then.
> > >>> But if you don't get an answer there, then come back to
> > >>> datatable-help.
> > >>
> > >> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
> > >
> > > downvoted, unlikely to be answered.
> >
> > I've read it through.
> >
> > Perhaps sleep on it, don't look for 24hrs and look again as if you were
> > trying to answer it yourself. Are there any small changes you can make
> > to make it easier to answer?  It wasn't me that downvoted but I suspect
> > it's been done to encourage you to improve the question. Downvotes can
> > be (and often are) reversed.  I once had many more downvotes than you,
> > but then I improved the question and it went to +10.
> >
> > And, it's Friday and we've all had a long week!
> >
> > Matthew
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Sat, 27 Apr 2013 00:35:17 +0100
> > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > To: Frank Erickson <FErickson at psu.edu>
> > Cc: "data.table source forge"
> >         <datatable-help at lists.r-forge.r-project.org>
> > Subject: Re: [datatable-help] changing data.table by-without-by syntax
> >         to require a "by"
> > Message-ID: <be967ecd9c927ade15c15eb9985d919e at imap.plus.net>
> > Content-Type: text/plain; charset="utf-8"
> >
> >
> >
> > Thanks for your comments Frank.
> >
> > Ha, yes it's ancient but still has a place. Yes "reply all": back To:
> > sender (if it's to someone in particular) and cc the list. But on general
> > topics where lots of people are on the thread, just To: datatable-help
> > alone is fine. Personally I prefer "top posting", like I'm doing now. I
> > only scroll down if I need to. I didn't notice the history was building
> > up. If you comment inline later, then say "scroll down for comments
> > inline" or something at the top. Note that Nabble collapses the history
> > for you so threads are much easier to read there. Or I tend to read via
> > RSS (gmane) in Outlook, so it feels like an email inbox which turns bold
> > on new posts. You only need to subscribe to post (spam control). Most
> > people turn off mail delivery pretty quickly I imagine (or set up an auto
> > rule to move into a folder, but then you might as well subscribe to RSS I
> > guess).
> >
> > S.O. is quite strict: must be clear questions with a clear answer, only
> > one of which can be accepted. No opinion, voting, discussing or notices
> > (enter mailing lists). Chat room is good but for quick chat when people
> > are in the room at the same time. Many companies (sensibly) block chat
> > access, though. Mailing lists allow all timezones a chance at a slower
> > pace. Anonymity is just as acceptable and as easy in both places.
> >
> > Matthew
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Fri, 26 Apr 2013 16:42:04 -0700
> > From: Victor Kryukov <victor.kryukov at gmail.com>
> > To: Matthew Dowle <mdowle at mdowle.plus.com>
> > Cc: datatable-help at lists.r-forge.r-project.org, sds at gnu.org
> > Subject: Re: [datatable-help] variable column names
> > Message-ID:
> >         <CANJmMqTz5+6djLEwpZxsub6LB=3L37=JB3xt5AhG1XgWG=nJgw at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > On Fri, Apr 26, 2013 at 3:47 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > > On 26.04.2013 23:02, Sam Steingold wrote:
> > >>> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
> > >>
> > >> downvoted, unlikely to be answered.
> > >
> > > I've read it through.
> > >
> > > Perhaps sleep on it, don't look for 24hrs and look again as if you were
> > > trying to answer it yourself. Are there any small changes you can make
> > > to make it easier to answer?  It wasn't me that downvoted but I suspect
> > > it's been done to encourage you to improve the question. Downvotes can
> > > be (and often are) reversed.  I once had many more downvotes than you,
> > > but then I improved the question and it went to +10.
> > >
> > > And, it's Friday and we've all had a long week!
> >
> > Beautiful advice, Matthew!
> >
> > Sam - I've provided my answer (and even used Reduce since you seem to
> > be coming from Lisp land), but I also think some of the down
> > votes/comments have their merit.
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> > End of datatable-help Digest, Vol 38, Issue 26
> > **********************************************