[datatable-help] changing data.table by-without-by syntax to require a "by"

Arunkumar Srinivasan aragorn168b at gmail.com
Sat Apr 27 17:49:13 CEST 2013


Hello,
I thought I'd also chip in with my thoughts on eddi's feature request.
Short answer: I don't think this feature is necessary. I basically agree
with mnel's reply.
Long answer: My argument goes along these lines (in addition to the S3/S4
methods mnel mentions). If you type, for example, `[.data.frame` in your
R session, you'll see this snippet:

        if (is.matrix(i))
            return(as.matrix(x)[i])

That is, if you do:

    df <- data.frame(x=1:5, y=1:5, z=1:5)
    mm <- matrix(1:12, ncol=3)
    df[mm] # gives
    [1] 1 2 3 4 5 1 2 3 4 5 1 2

    df <- data.frame(x=1:2, y=1:2, z=1:2)
    df[mm] # gives
    [1]  1  2  1  2  1  2 NA NA NA NA NA NA

Here, the index is a matrix, and the snippet above shows exactly what the
code does in that case. Now, should this behaviour be changed because people
would be confused that subsetting a data.frame resulted in a vector? Or
because it's not user-friendly? Even better, try out `df[mm, ]`. I am not
convinced this is "bad" design. Functions take arguments of different types
ALL the time, and they return outputs *depending on the type of input*. It's
essential to know the types of object `i` can take and to *understand* them.
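
To make the dependence on the type of `i` concrete, here is a minimal sketch
using base R only. The data frame and its values are made up for
illustration and are not from the thread; each call passes a different kind
of `i` to `[.data.frame`.

```r
# Illustrative data; values chosen so the results are easy to read.
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)

# Numeric vector i: selects columns, and the result is still a data.frame.
by_vector <- df[2]

# Logical matrix i: takes the is.matrix(i) branch, so the result is a vector.
by_logical <- df[df > 7]

# Two-column numeric matrix i: each row is read as a (row, column) pair.
by_pairs <- df[cbind(c(1, 2), c(2, 3))]

print(class(by_vector))
print(by_logical)
print(by_pairs)
```

The same `[` call gives a data.frame in the first case and a plain vector
in the other two, purely because of what `i` is.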

If a function is designed to take several types of objects for `i`, and its
behaviour is documented and consistent with that documentation, then I can't
accept there's a problem.

I agree there are people who don't read the manual and "try" things out.
But they are going to have problems with every other function in R.

For example, "unstack" is a function for which the same input type gives
different output types. That is, it returns a data.frame if the columns have
equal length after unstacking, and a list if they do not. Compare the
outputs of:

    df <- data.frame(x=rep(1:3, each=3), y=1:9)
    unstack(df, y ~ x)

with

    df <- data.frame(x=c(rep(1:3, each=3), 3), y=1:10)
    unstack(df, y ~ x)

But if people don't read the documentation, they won't know about this
difference until they run into errors. Now, making it user-friendly would
mean that it "always" returns a list.
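
As a sketch of that difference (base R only; the names `df1`, `df2`,
`balanced`, `ragged` and `always_list` are just illustrative), the two calls
above return different classes, whereas `split()` on the same columns returns
a list in both cases. This is only meant to show what an "always a list"
result looks like, not to suggest how `unstack` should be written.

```r
df1 <- data.frame(x = rep(1:3, each = 3), y = 1:9)
df2 <- data.frame(x = c(rep(1:3, each = 3), 3), y = 1:10)

balanced <- unstack(df1, y ~ x)  # all groups have the same length
ragged   <- unstack(df2, y ~ x)  # group 3 has one extra value

print(class(balanced))
print(class(ragged))

# split() returns a list regardless of the group sizes.
always_list <- split(df2$y, df2$x)
print(lengths(always_list))
```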

Now, is this "bad" design because it gives two object types for the same
input type? Does it require a change? I personally don't think so.

To sum up, what eddi points out as "not being user-friendly" (or arguably
"bad design") is everywhere inside R if you look closely. My view is that
one should clearly put some effort into understanding a function before
using it. Not all functions are plain and simple. Some functions have
exceptions and some packages have a steep learning curve.

Best,
Arun.


On Sat, Apr 27, 2013 at 12:00 PM,
<datatable-help-request at lists.r-forge.r-project.org> wrote:
> Today's Topics:
>
>    1. Re: changing data.table by-without-by syntax to require a
>       "by" (Frank Erickson)
>    2. Re: variable column names (Sam Steingold)
>    3. Re: variable column names (Matthew Dowle)
>    4. Re: changing data.table by-without-by syntax to require a
>       "by" (Matthew Dowle)
>    5. Re: variable column names (Victor Kryukov)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 26 Apr 2013 15:34:39 -0500
> From: Frank Erickson <FErickson at psu.edu>
> To: "data.table source forge"
>         <datatable-help at lists.r-forge.r-project.org>
> Subject: Re: [datatable-help] changing data.table by-without-by syntax
>         to require a "by"
> Message-ID:
>         <CAJd-hdkv1oiSjfA625oBxmXwr5YuVUzz==3GLaWJTakAtzMJVw at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I disagree with the criticism of data.table's complexity (in the OP).
> There's nothing wrong with overloading the syntax (that is what CS people
> call it, right?). As long as Matthew's in control of it, it's likely to
> have some internal consistency (which, of course, he could explain).
> However, I like the suggestion to add options (defaulting to something
> globally adjustable) to disable some of the overloading. Along similar
> lines (I think), I find unique.data.table very unintuitive. I can see how
> it could be useful, but strongly prefer base::unique for my current
> applications.
>
> Anyway, I have nothing particular to say about the piece of syntax you all
> are currently discussing. I just registered with this list to chime in
> here, instead of further cluttering SO (where eddi answered one of my
> questions yesterday). These emails sure are wide; must be like 1500px!
> Interesting to try out this ancient mailing-list form of communication.
> Please let me know if I should be using "Reply All" or actually quoting
> that massive thread (as everyone else seems to be doing with each post).
>
> Frank
>
> ------------------------------
>
> Message: 2
> Date: Fri, 26 Apr 2013 18:02:31 -0400
> From: Sam Steingold <sds at gnu.org>
> To: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] variable column names
> Message-ID: <87wqrpj6h4.fsf at gnu.org>
> Content-Type: text/plain
>
> > * Sam Steingold <sds at gnu.org> [2013-04-26 13:05:39 -0400]:
> >
> >> * Matthew Dowle <mdowle at mdowle.plus.com> [2013-04-26 17:45:53 +0100]:
> >>
> >> S.O. is probably better for this kind of question then.
> >> But if you don't get an answer there, then come back to datatable-help.
> >
> > http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
>
> downvoted, unlikely to be answered.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.11300000
> http://www.childpsy.net/ http://iris.org.il http://think-israel.org
> http://americancensorship.org http://pmw.org.il http://mideasttruth.com
> We have preferences. You have biases. They have prejudices.
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 26 Apr 2013 23:47:55 +0100
> From: Matthew Dowle <mdowle at mdowle.plus.com>
> To: <sds at gnu.org>
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] variable column names
> Message-ID: <30d6ae8f1a0d6974ebbd54da0d86f3b2 at imap.plus.net>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 26.04.2013 23:02, Sam Steingold wrote:
> >> * Sam Steingold <sds at gnu.org> [2013-04-26 13:05:39 -0400]:
> >>
> >>> * Matthew Dowle <mdowle at mdowle.plus.com> [2013-04-26 17:45:53 +0100]:
> >>>
> >>> S.O. is probably better for this kind of question then.
> >>> But if you don't get an answer there, then come back to
> >>> datatable-help.
> >>
> >> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
> >
> > downvoted, unlikely to be answered.
>
> I've read it through.
>
> Perhaps sleep on it, don't look for 24hrs and look again as if you were
> trying to answer it yourself. Are there any small changes you can make
> to make it easier to answer?  It wasn't me that downvoted but I suspect
> it's been done to encourage you to improve the question. Downvotes can
> (and often are) reversed.  I've had many more downvotes than you once,
> but then I improved it and it went to +10.
>
> And, it's Friday and we've all had a long week!
>
> Matthew
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Sat, 27 Apr 2013 00:35:17 +0100
> From: Matthew Dowle <mdowle at mdowle.plus.com>
> To: Frank Erickson <FErickson at psu.edu>
> Cc: "data.table source forge"
>         <datatable-help at lists.r-forge.r-project.org>
> Subject: Re: [datatable-help] changing data.table by-without-by syntax
>         to require a "by"
> Message-ID: <be967ecd9c927ade15c15eb9985d919e at imap.plus.net>
> Content-Type: text/plain; charset="utf-8"
>
>
>
> Thanks for your comments Frank.
>
> Ha, yes it's ancient but still has a place. Yes "reply all": Back To:
> sender (if it's to someone in particular) and cc the list. But on general
> topics where lots of people are on the thread, just To: datatable-help
> alone is fine. Personally I prefer "top posting". Like I'm doing now. I
> only scroll down if I need to. I didn't notice the history was building
> up. If you comment inline later, then say "scroll down for comments
> inline" or something at the top. Note that Nabble collapses the history
> for you so threads are much easier to read there. Or I tend to read via
> RSS (gmane) in Outlook, so it feels like an email inbox which turns bold
> on new posts. You only need to subscribe to post (spam control). Most
> people turn off mail delivery pretty quickly I imagine (or setup an auto
> rule to move into a folder, but then you might as well subscribe to RSS I
> guess).
>
> S.O. is quite strict: must be clear questions with a clear answer, only
> one of which can be accepted. No opinion, voting, discussing or notices
> (enter mailing lists). Chat room is good but for quick chat when people
> are in the room at the same time. Many companies (sensibly) block chat
> access, though. Mailing lists allows all timezones a chance at a slower
> pace. Anonymity is just as acceptable and as easy in both places.
>
> Matthew
>
>
> On 26.04.2013 21:34, Frank Erickson wrote:
>
> > I disagree with the criticism of data.table's complexity (in the OP).
> > There's nothing wrong with overloading the syntax (that is what CS people
> > call it, right?). As long as Matthew's in control of it, it's likely to
> > have some internal consistency (which, of course, he could explain).
> > However, I like the suggestion to add options (defaulting to something
> > globally adjustable) to disable some of the overloading. Along similar
> > lines (I think), I find unique.data.table very unintuitive. I can see how
> > it could be useful, but strongly prefer base::unique for my current
> > applications.
> >
> > Anyway, I have nothing particular to say about the piece of syntax you
> > all are currently discussing. I just registered with this list to chime
> > in here, instead of further cluttering SO (where eddi answered one of my
> > questions yesterday). These emails sure are wide; must be like 1500px!
> > Interesting to try out this ancient mailing-list form of communication.
> > Please let me know if I should be using "Reply All" or actually quoting
> > that massive thread (as everyone else seems to be doing with each post).
> >
> > Frank
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 26 Apr 2013 16:42:04 -0700
> From: Victor Kryukov <victor.kryukov at gmail.com>
> To: Matthew Dowle <mdowle at mdowle.plus.com>
> Cc: datatable-help at lists.r-forge.r-project.org, sds at gnu.org
> Subject: Re: [datatable-help] variable column names
> Message-ID:
>         <CANJmMqTz5+6djLEwpZxsub6LB=3L37=JB3xt5AhG1XgWG=nJgw at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Fri, Apr 26, 2013 at 3:47 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > On 26.04.2013 23:02, Sam Steingold wrote:
> >>> http://stackoverflow.com/questions/16241687/summarize-a-data-table-across-multiple-columns
> >>
> >> downvoted, unlikely to be answered.
> >
> > I've read it through.
> >
> > Perhaps sleep on it, don't look for 24hrs and look again as if you were
> > trying to answer it yourself. Are there any small changes you can make to
> > make it easier to answer?  It wasn't me that downvoted but I suspect it's
> > been done to encourage you to improve the question. Downvotes can (and
> > often are) reversed.  I've had many more downvotes than you once, but
> > then I improved it and it went to +10.
> >
> > And, it's Friday and we've all had a long week!
>
> Beautiful advice, Matthew!
>
> Sam - I've provided my answer (and even used Reduce since you seem to
> be coming from Lisp land), but I also think some of the down
> votes/comments have their merit.
>
>
> ------------------------------
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
> End of datatable-help Digest, Vol 38, Issue 26
> **********************************************