[datatable-help] Problem with FAQ 2.8

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jun 7 06:34:06 CEST 2013


On Thu, Jun 6, 2013 at 11:50 PM, Michael Nelson
<michael.nelson at sydney.edu.au> wrote:
> This is related to
>
> FR 2693
>
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
>
> What is happening is that the `join` columns must be referenced using their names as defined in `i` (or Y in X[Y] syntax)
>
> The FAQ doesn't explicitly  cover how you are supposed to reference the columns used in the join.
>
> Perhaps some binding magic could be used to ensure that either column name could be used. I don't think it is useful want both to be defined and available as separate objects - -that would mean there were two copies of something that are identical in value (but not name!)
>

Note that the FAQ says that the X variables are "in scope".and the
ordinary meaning of being in scope is that such a variable can be
referenced in an unqualified manner so I think it does imply that
these variables can be accessed.

I assume from your response that the answer to my question is that the
FAQ is wrong and the behavior is as intended.

If that is the case then it would be desirable that the behavior of
the software be changed to make the FAQ correct.  Having tiny little
exceptions like this that are difficult to remember and error prone
just makes the software harder to use.   Another possibility would be
to outlaw having keys in X and Y which have different names (although
that would be drastic and inconvenient though safer and easier to
learn then the current situation).

For example, continuing the code in my post here is a second example
consider what would happen if this were to occur:

> id1 <- 1
> d1[d2, sum(id1 * val)]
   id1 V1
1:   1  1
2:   2  5
3:   4 NA

It would be difficult to realize without close examination that there
is an error in this code (assuming that the writer intended id1 to be
taken from d1).  Here d1$id1 is not in scope (contrary to the FAQ) and
so id1 in the caller is used resulting in wrong output (relative to
the result intended).

Here is another oddity. It seems that in the first case we cannot
access id1 but if we do a join and then access the columns in a
separate [] then we can.

if (exists("id1")) rm(id1)
> d1[d2, id1]
Error in `[.data.table`(d1, d2, id1) : object 'id1' not found
> d1[d2][, id1]
[1] 1 2 2 4

Note that in R's merge one can refer to both keys and in SQL when one
does a join one can as well so the behavior we have been discussing
here seems entirely unexpected.


More information about the datatable-help mailing list