[datatable-help] Update table from other table

Juan Manuel Truppia jmtruppia at gmail.com
Mon Sep 15 19:52:06 CEST 2014


Arun, sometimes it helps to have nomatch = 0 when using :=

dta = data.table(idx = c(1,1), key = "idx")
dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")

This fails because of cartesian join not allowed by default

dta[dtb, val := i.val]

but this doesn't

dta[dtb, val := i.val, nomatch = 0]

This is the same as doing

dta[dtb, val := i.val, allow.cartesian = TRUE]

On Fri, Sep 12, 2014 at 8:08 PM, Arunkumar Srinivasan <aragorn168b at gmail.com
> wrote:

> Glad it helped.
> Always welcome "pull requests" :).
>
> Arun
>
> From: Juan Manuel Truppia <jmtruppia at gmail.com> <jmtruppia at gmail.com>
> Reply: Juan Manuel Truppia <jmtruppia at gmail.com>> <jmtruppia at gmail.com>
> Date: September 12, 2014 at 5:46:59 PM
> To: Arunkumar Srinivasan <aragorn168b at gmail.com>> <aragorn168b at gmail.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> <datatable-help at lists.r-forge.r-project.org>>
> <datatable-help at lists.r-forge.r-project.org>
> Subject:  Re: [datatable-help] Update table from other table
>
>  Great! (sorry, .EACHI = TRUE was an old definition).
> It's good to know also that nomatch = 0 is irrelevant when using :=, I
> always used is to avoid the rows in dtb creeping in dtb as NAs.
> Also, it's really useful to know that by = EACHI should be used when the
> calculations you are perfoming depend on the group or not. This came in
> really in handy yesterday, and should be emphasized in .EACHI description.
> Should I perform a pull request?
>
> On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>>  I think you mean:
>>
>> dta[dtb, b:=b, by=.EACHI]
>>
>> and not .EACHI = TRUE. Not sure what's the use of nomatch=0L along with
>> :=.
>>
>> by=.EACHI does exactly what it means, really. It evaluates j for each i
>> match. Let's first see the matches:
>>
>> dta[dtb, which=TRUE]
>> # [1] 1 1 3
>>
>> So, first row of dtb matches with first of dta. The second of dtb
>> matches with 1st of dta and so on.
>>
>> When you add by=.EACHI, as shown on the top, j-expression is evaluated
>> on each of these matches. So, it'll be evaluated 3-times here. On the other
>> hand, without it, j is evaluated once. In this case, it doesn't make a
>> difference either way. So you should avoid by=.EACHI, as it'll be slower
>> with it.
>>
>> It's particularly useful when you'd like to perform operations in j,
>> that depends on the values in j on *that* group. For example, consider
>> these data.tables dt1 and dt2:
>>
>> dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
>> dt2 = data.table(x=3:5, z=10, key="x")
>>
>> And, you'd like to get sum(y)*z while joining.. If not for the by=.EACHI
>> feature.. you'd approach the problem like this:
>>
>> dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
>>
>> With by=.EACHI, this is simply:
>>
>> dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
>>
>> Here, your expression is evaluated on each i.
>>
>> Another interesting use case is, say, you'd like to create a lagged
>> vector of y:
>>
>> dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
>>
>> It's that simple.. really. Basically, as long as the operation you're
>> performing in j affects it depending on whether j is executed for that
>> group or as a whole, then you're most likely looking for by=.EACHI. If
>> not, by=.EACHI has no effect, and therefore you're wanting to use a normal
>> join there..
>>
>> This is not a text book definition, rather my understanding of this
>> awesome feature!
>>
>> Hope this helps.
>>
>>  Arun
>>
>> From: Juan Manuel Truppia <jmtruppia at gmail.com> <jmtruppia at gmail.com>
>> Reply: Juan Manuel Truppia <jmtruppia at gmail.com>> <jmtruppia at gmail.com>
>> Date: September 11, 2014 at 10:16:41 PM
>> To: datatable-help at lists.r-forge.r-project.org
>> <datatable-help at lists.r-forge.r-project.org>>
>> <datatable-help at lists.r-forge.r-project.org>
>> Subject:  [datatable-help] Update table from other table
>>
>>    What is the best data.table way of doing something similar to UPDATE
>> FROM in SQL?
>>
>> I used to do something like
>>
>> dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
>> dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
>> dta[dtb, b := b]
>>
>> However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but
>> I can't determine when.
>>
>> So, just to be sure, I do
>>
>> dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
>>
>> Is the .EACHI and the nomatch necessary?
>>
>> In this case, I want the row with idx 1 and 3 (the matching ones) to end
>> with a b value from the matching b column in dtb, and the row with idx 2
>> (the one that isn't in dtb) to end up with NA in column b.
>>
>>
>>   _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140915/ed52b074/attachment.html>


More information about the datatable-help mailing list