[datatable-help] Update table from other table
Arunkumar Srinivasan
aragorn168b at gmail.com
Fri Sep 12 17:14:26 CEST 2014
I think you mean:
dta[dtb, b:=b, by=.EACHI]
and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.
by=.EACHI does exactly what it means, really. It evaluates j for each i match. Let’s first see the matches:
dta[dtb, which=TRUE]
# [1] 1 1 3
So, first row of dtb matches with first of dta. The second of dtb matches with 1st of dta and so on.
When you add by=.EACHI, as shown on the top, j-expression is evaluated on each of these matches. So, it’ll be evaluated 3-times here. On the other hand, without it, j is evaluated once. In this case, it doesn’t make a difference either way. So you should avoid by=.EACHI, as it’ll be slower with it.
It’s particularly useful when you’d like to perform operations in j, that depends on the values in j on that group. For example, consider these data.tables dt1 and dt2:
dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")
And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI feature.. you’d approach the problem like this:
dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
With by=.EACHI, this is simply:
dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
Here, your expression is evaluated on each i.
Another interesting use case is, say, you’d like to create a lagged vector of y:
dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
It’s that simple.. really. Basically, as long as the operation you’re performing in j affects it depending on whether j is executed for that group or as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI has no effect, and therefore you’re wanting to use a normal join there..
This is not a text book definition, rather my understanding of this awesome feature!
Hope this helps.
Arun
From: Juan Manuel Truppia <jmtruppia at gmail.com>
Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
Date: September 11, 2014 at 10:16:41 PM
To: datatable-help at lists.r-forge.r-project.org <datatable-help at lists.r-forge.r-project.org>>
Subject: [datatable-help] Update table from other table
What is the best data.table way of doing something similar to UPDATE FROM in SQL?
I used to do something like
dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
dta[dtb, b := b]
However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I can't determine when.
So, just to be sure, I do
dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
Is the .EACHI and the nomatch necessary?
In this case, I want the row with idx 1 and 3 (the matching ones) to end with a b value from the matching b column in dtb, and the row with idx 2 (the one that isn't in dtb) to end up with NA in column b.
_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140912/59e01ad3/attachment.html>
More information about the datatable-help
mailing list