[datatable-help] Return Select/Join that does NOT match?

mdowle at mdowle.plus.com
Wed Jul 28 20:21:47 CEST 2010


Looks like a bug. Could you please paste this into a bug report in the
tracker on R-forge? It saves me a bit of time. Thanks.

> Finally, I can also use "by", but the FAQ says "i expression" is faster

Which FAQ number, please? I think "by" is the right way if you want
all levels in A. One use of i with mult="all" is when you only want a few
levels, like a fast SQL 'having'. Is that the FAQ you mean?
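A rough sketch of that distinction, for illustration only. It assumes a recent data.table release (where .() is available in j and mult already defaults to "all"), not the 1.4 version discussed in this thread; the names only_x and per_level are made up for the example.

```r
library(data.table)

# Same toy table as in the quoted example, with the recycling written out.
DT <- data.table(A = rep(c("o", "x"), 5), B = 1:10, key = "A")

# Keyed join in i: only the rows for the requested level are fetched,
# which is the "fast SQL having" style of use.
only_x <- DT[J("x"), mult = "all"]

# Grouping with by: every level of A is visited.
per_level <- DT[, .(total = sum(B)), by = A]
```

Here only_x holds just the "x" rows, while per_level has one row per level of A.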

Actually, thinking about it, if you want just the first row for each
group, as in your example, then yes joining using i will be a lot faster
than using j=B[1] with 'by'.

If that is the use-case then note that unique(A) is probably a vector scan
(would need to check latest R). levels(A) should be a lot faster,
depending on unused levels and how much data there is per level.
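To make the two routes concrete, here is a minimal sketch, again assuming a recent data.table release rather than 1.4. The join in the quoted 1.4 output appears to return only the first row per group, so mult = "first" is spelled out here; on = is used so the example does not depend on column masking inside i. As far as I know, recent releases keep character columns as character, so levels() only applies if A is a factor. All object names are hypothetical.

```r
library(data.table)

DT <- data.table(A = rep(c("o", "x"), 5), B = 1:10, key = "A")

# Route 1: join on the distinct values of A and keep the first match per group.
first_by_join <- DT[data.table(A = unique(DT$A)), on = "A", mult = "first"]

# Route 2: group with by and take the first B in each group.
first_by_group <- DT[, .(B = B[1]), by = A]

# If A is stored as a factor, levels() reads the values off the attribute
# instead of scanning the column.
A_fac <- factor(DT$A)
same_values <- identical(levels(A_fac), unique(DT$A))
```

Both routes should give one row per value of A; which is faster will depend on the data and the data.table version, so it is worth timing on your own table.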

Matthew


> Thank you very much for a quick answer to [1] and [2], Matthew! Here is
> an example for [3]
>
> using:
>
> R-2.11.0 x32
> data.table 1.4
>
>> DT = data.table(A = c("o", "x"), B = 1:10, key = "A")
>
>       A  B
>  [1,] o  1
>  [2,] o  3
>  [3,] o  5
>  [4,] o  7
>  [5,] o  9
>  [6,] x  2
>  [7,] x  4
>  [8,] x  6
>  [9,] x  8
> [10,] x 10
>
>> DT[CJ(unique(A))]
> Error hint: the i expression sees the column variables. Column names
> (variables) will mask variables in the calling frame. Check for any
> conflicts.
> Error in `[.data.table`(DT, CJ(unique(A))) :
> Error in unique(A) : object 'A' not found
>
>> DT[  {  CJ(unique(A))  }  ]
>      A B
> [1,] o 1
> [2,] x 2
>
> Interestingly, I can replace CJ() with data.table(), then I don't need
> {}
>
>> DT[data.table(unique(A))]
>      A B
> [1,] o 1
> [2,] x 2
>
> *** NOTE that I don't use {} like the example using CJ()
>
> Finally, I can also use "by", but the FAQ says "i expression" is faster
>
>  > DT[, B[1], by = A]
>      A B
> [1,] o 1
> [2,] x 2
>
>
>> [3] Might be a bug there. Please can you provide a small reproducible
>> example and version information. I'm assuming by key1, key2 you really
>> mean col1, col2. Thanks, Matthew
>
>>> [3] I thought I can do this:
>>> DataTable[ CJ( FN(key1), FN(key2), FN(key3) ) ], but it complains
>>> about column names.
>>> *FN is a function
>>> Later I found I can do this, DataTable[ { CJ( FN(key1), FN(key2),
>>> FN(key3) ) } ],
>>> I just add { } outside CJ
>>> Don't understand why, but at least it works. I really wonder whether I
>>> should do this or there is a more correct syntax?
>