[datatable-help] [J(channel) , ] [J(chan) , ] give different results when chan=channel

Matthew Dowle mdowle at mdowle.plus.com
Fri Nov 9 11:02:17 CET 2012


> Thanks, Matthew,
>
> .() is a good idea, it is more intuitive than J() (why join?)

Great. J=Join to stem the confusion between row number lookup, and looking
up the numbers in the key (i.e. joining).

I've seen it said before that .() is intuitive but I've never understood
why really. "." is also used to hide variables in environments (i.e. not
something users should see; . = hidden?) or related to S3 methods (. =
special). What's intuitive about .() as a function for regular users to
use?  At least J() stands for something: join. What's .() stand for? I
only (now) find it intuitive in the context of .() and ..() being
analogous to the file system's ./ and ../   So if we didn't go ahead and
add ..(), I'm not sure I would like .() anymore ;)  Have I missed
something about .()?

All it is is some way to make a list from the integer vector so data.table
knows to join it rather than row number lookup. For character and factor,
it's unambiguous, and the J() (or .()) isn't needed.
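
To make the distinction concrete, here is a minimal sketch. The table and
column names (DT, id, val, DT2, name, n) are made up for illustration, and
it assumes a data.table version in which .() is accepted in i as an alias
for J():

```r
library(data.table)

# Hypothetical table keyed on an integer column
DT <- data.table(id = c(2L, 3L, 5L), val = c("a", "b", "c"), key = "id")

by_row  <- DT[2L]       # bare integer: row number lookup (the 2nd row)
by_join <- DT[J(2L)]    # wrapped in J(): join to the key (rows where id == 2)
by_dot  <- DT[.(2L)]    # .() used as an alias for J()

# Hypothetical table keyed on a character column: no wrapper needed,
# because a character i can only mean a join to the key
DT2 <- data.table(name = c("x", "y"), n = 1:2, key = "name")
by_char <- DT2["y"]
```

The only job of J() or .() here is to turn the integer into a list, so the
two meanings of DT[2L] and DT[J(2L)] stay apart.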

> I
> am not so sure about the ..(). It's certainly intuitive, but thinking
> about
> how to explain that feature to students is a nightmare.

Yes but explaining _anything_ to students is a nightmare. Good luck!

> Probably a feature such as you find in plyr::join would be good: it
> prints
> a text "joining by: id" that reminds you that you had better say explicitly
> what "by" is. So in my case: since there is an ambiguity in "channel", a
> warning should be printed, with a recommendation to use ..() or .().

Great idea. Some extra verbosity in verbosity mode would be good. And then
it would make sense to have several levels of verbosity rather than the
current on/off.

>
> Eats performance, though.

I didn't know this. Does it search the stack each time to see if there is
an ambiguity, then? Presumably it has to search all the way up to confirm
there's no ambiguity. I wouldn't propose to do that in data.table. The
verbosity mode wouldn't detect that; it would just say what it is doing
and, if the variable isn't in the data.table, suggest using ..() to make
the code robust. That is similar to how, if 'by' is provided equal to
key(i), it suggests (now in 1.8.3) removing 'by' and leaving it to
by-without-by.
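
For reference, the ambiguity in the subject line can be sketched as below.
The table and values are made up for illustration; the sketch assumes that
names inside i are looked up in the data.table's columns first and only
then in the calling scope:

```r
library(data.table)

# Hypothetical table whose key column is called 'channel'
DT <- data.table(channel = 1:3, x = c("a", "b", "c"), key = "channel")

chan    <- 2L   # not a column name: found in the calling scope
channel <- 2L   # same value, but the name clashes with the column

n_chan    <- nrow(DT[J(chan)])      # joins to the single value 2
n_channel <- nrow(DT[J(channel)])   # 'channel' resolves to the column,
                                    # so every key value is joined
```

So the two calls differ even though chan and channel hold the same value,
which is why a message saying where the name was found could help.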

Thanks for discussion. It's very useful to get feedback like this at an
early stage.

Matthew




