[datatable-help] Update table from other table

Juan Manuel Truppia jmtruppia at gmail.com
Fri Sep 12 17:46:39 CEST 2014


Great! (sorry, .EACHI = TRUE was an old definition).
It's good to know also that nomatch = 0 is irrelevant when using :=, I
always used is to avoid the rows in dtb creeping in dtb as NAs.
Also, it's really useful to know that by = EACHI should be used when the
calculations you are perfoming depend on the group or not. This came in
really in handy yesterday, and should be emphasized in .EACHI description.
Should I perform a pull request?

On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan <
aragorn168b at gmail.com> wrote:

> I think you mean:
>
> dta[dtb, b:=b, by=.EACHI]
>
> and not .EACHI = TRUE. Not sure what's the use of nomatch=0L along with :=
> .
>
> by=.EACHI does exactly what it means, really. It evaluates j for each i
> match. Let's first see the matches:
>
> dta[dtb, which=TRUE]
> # [1] 1 1 3
>
> So, first row of dtb matches with first of dta. The second of dtb matches
> with 1st of dta and so on.
>
> When you add by=.EACHI, as shown on the top, j-expression is evaluated on
> each of these matches. So, it'll be evaluated 3-times here. On the other
> hand, without it, j is evaluated once. In this case, it doesn't make a
> difference either way. So you should avoid by=.EACHI, as it'll be slower
> with it.
>
> It's particularly useful when you'd like to perform operations in j, that
> depends on the values in j on *that* group. For example, consider these
> data.tables dt1 and dt2:
>
> dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
> dt2 = data.table(x=3:5, z=10, key="x")
>
> And, you'd like to get sum(y)*z while joining.. If not for the by=.EACHI
> feature.. you'd approach the problem like this:
>
> dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
>
> With by=.EACHI, this is simply:
>
> dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
>
> Here, your expression is evaluated on each i.
>
> Another interesting use case is, say, you'd like to create a lagged vector
> of y:
>
> dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
>
> It's that simple.. really. Basically, as long as the operation you're
> performing in j affects it depending on whether j is executed for that
> group or as a whole, then you're most likely looking for by=.EACHI. If
> not, by=.EACHI has no effect, and therefore you're wanting to use a normal
> join there..
>
> This is not a text book definition, rather my understanding of this
> awesome feature!
>
> Hope this helps.
>
>
> Arun
>
> From: Juan Manuel Truppia <jmtruppia at gmail.com> <jmtruppia at gmail.com>
> Reply: Juan Manuel Truppia <jmtruppia at gmail.com>> <jmtruppia at gmail.com>
> Date: September 11, 2014 at 10:16:41 PM
> To: datatable-help at lists.r-forge.r-project.org
> <datatable-help at lists.r-forge.r-project.org>>
> <datatable-help at lists.r-forge.r-project.org>
> Subject:  [datatable-help] Update table from other table
>
>  What is the best data.table way of doing something similar to UPDATE
> FROM in SQL?
>
> I used to do something like
>
> dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
> dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
> dta[dtb, b := b]
>
> However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but
> I can't determine when.
>
> So, just to be sure, I do
>
> dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
>
> Is the .EACHI and the nomatch necessary?
>
> In this case, I want the row with idx 1 and 3 (the matching ones) to end
> with a b value from the matching b column in dtb, and the row with idx 2
> (the one that isn't in dtb) to end up with NA in column b.
>
>
>  _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140912/243c61c4/attachment.html>


More information about the datatable-help mailing list