[datatable-help] Update table from other table

Arunkumar Srinivasan aragorn168b at gmail.com
Fri Sep 12 17:14:26 CEST 2014


I think you mean:

dta[dtb, b:=b, by=.EACHI]
and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.

by=.EACHI does exactly what it means, really. It evaluates j for each i match. Let’s first see the matches:

dta[dtb, which=TRUE]
# [1] 1 1 3
So, first row of dtb matches with first of dta. The second of dtb matches with 1st of dta and so on.

When you add by=.EACHI, as shown on the top, j-expression is evaluated on each of these matches. So, it’ll be evaluated 3-times here. On the other hand, without it, j is evaluated once. In this case, it doesn’t make a difference either way. So you should avoid by=.EACHI, as it’ll be slower with it.

It’s particularly useful when you’d like to perform operations in j, that depends on the values in j on that group. For example, consider these data.tables dt1 and dt2:

dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")
And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI feature.. you’d approach the problem like this:

dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
With by=.EACHI, this is simply:

dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
Here, your expression is evaluated on each i.

Another interesting use case is, say, you’d like to create a lagged vector of y:

dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
It’s that simple.. really. Basically, as long as the operation you’re performing in j affects it depending on whether j is executed for that group or as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI has no effect, and therefore you’re wanting to use a normal join there..

This is not a text book definition, rather my understanding of this awesome feature!

Hope this helps.


Arun

From: Juan Manuel Truppia <jmtruppia at gmail.com>
Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
Date: September 11, 2014 at 10:16:41 PM
To: datatable-help at lists.r-forge.r-project.org <datatable-help at lists.r-forge.r-project.org>>
Subject:  [datatable-help] Update table from other table  

What is the best data.table way of doing something similar to UPDATE FROM in SQL?

I used to do something like

dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
dta[dtb, b := b]

However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I can't determine when.

So, just to be sure, I do 

dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]

Is the .EACHI and the nomatch necessary?

In this case, I want the row with idx 1 and 3 (the matching ones) to end with a b value from the matching b column in dtb, and the row with idx 2 (the one that isn't in dtb) to end up with NA in column b.


_______________________________________________  
datatable-help mailing list  
datatable-help at lists.r-forge.r-project.org  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140912/59e01ad3/attachment.html>


More information about the datatable-help mailing list