[datatable-help] Update table from other table

Michael Smith my.r.help at gmail.com
Tue Sep 16 05:51:43 CEST 2014


That's interesting.

Internally, which join is more efficient? The first one (using
nomatch=0) I suppose?

M


On 09/16/2014 01:52 AM, Juan Manuel Truppia wrote:
> Arun, sometimes it helps to have nomatch = 0 when using :=
> 
> dta = data.table(idx = c(1,1), key = "idx")
> dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")
> 
> This fails because of cartesian join not allowed by default
> 
> dta[dtb, val := i.val]
> 
> but this doesn't
> 
> dta[dtb, val := i.val, nomatch = 0]
> 
> This is the same as doing
> 
> dta[dtb, val := i.val, allow.cartesian = TRUE]
> 
> On Fri, Sep 12, 2014 at 8:08 PM, Arunkumar Srinivasan
> <aragorn168b at gmail.com <mailto:aragorn168b at gmail.com>> wrote:
> 
>     Glad it helped.
>     Always welcome "pull requests" :).
> 
>     Arun
> 
>     From: Juan Manuel Truppia <jmtruppia at gmail.com>
>     <mailto:jmtruppia at gmail.com>
>     Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
>     <mailto:jmtruppia at gmail.com>
>     Date: September 12, 2014 at 5:46:59 PM
>     To: Arunkumar Srinivasan <aragorn168b at gmail.com>>
>     <mailto:aragorn168b at gmail.com>
>     Cc: datatable-help at lists.r-forge.r-project.org
>     <mailto:datatable-help at lists.r-forge.r-project.org>
>     <datatable-help at lists.r-forge.r-project.org>>
>     <mailto:datatable-help at lists.r-forge.r-project.org>
>     Subject: Re: [datatable-help] Update table from other table
> 
>>     Great! (sorry, .EACHI = TRUE was an old definition).
>>     It's good to know also that nomatch = 0 is irrelevant when using
>>     :=, I always used is to avoid the rows in dtb creeping in dtb as NAs.
>>     Also, it's really useful to know that by = EACHI should be used
>>     when the calculations you are perfoming depend on the group or
>>     not. This came in really in handy yesterday, and should be
>>     emphasized in .EACHI description. Should I perform a pull request?
>>
>>     On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan
>>     <aragorn168b at gmail.com <mailto:aragorn168b at gmail.com>> wrote:
>>
>>         I think you mean:
>>
>>         |dta[dtb, b:=b, by=.EACHI]
>>         |
>>
>>         and not |.EACHI = TRUE|. Not sure what’s the use of
>>         |nomatch=0L| along with |:=|.
>>
>>         |by=.EACHI| does exactly what it means, really. It evaluates
>>         |j| for each |i| match. Let’s first see the matches:
>>
>>         |dta[dtb, which=TRUE]
>>         # [1] 1 1 3
>>         |
>>
>>         So, first row of |dtb| matches with first of |dta|. The second
>>         of |dtb| matches with 1st of |dta| and so on.
>>
>>         When you add |by=.EACHI|, as shown on the top, |j-expression|
>>         is evaluated on each of these matches. So, it’ll be evaluated
>>         3-times here. On the other hand, without it, |j| is evaluated
>>         once. In this case, it doesn’t make a difference either way.
>>         So you should avoid |by=.EACHI|, as it’ll be slower with it.
>>
>>         It’s particularly useful when you’d like to perform operations
>>         in |j|, that depends on the values in |j| on /that/ group. For
>>         example, consider these data.tables |dt1| and |dt2|:
>>
>>         |dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
>>         dt2 = data.table(x=3:5, z=10, key="x")
>>         |
>>
>>         And, you’d like to get |sum(y)*z| while joining.. If not for
>>         the |by=.EACHI| feature.. you’d approach the problem like this:
>>
>>         |dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
>>         |
>>
>>         With |by=.EACHI|, this is simply:
>>
>>         |dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
>>         |
>>
>>         Here, your expression is evaluated on each |i|.
>>
>>         Another interesting use case is, say, you’d like to create a
>>         lagged vector of |y|:
>>
>>         |dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
>>         |
>>
>>         It’s that simple.. really. Basically, as long as the operation
>>         you’re performing in |j| affects it depending on whether j is
>>         executed for that group or as a whole, then you’re most likely
>>         looking for |by=.EACHI|. If not, |by=.EACHI| has no effect,
>>         and therefore you’re wanting to use a |normal join| there..
>>
>>         This is not a text book definition, rather my understanding of
>>         this awesome feature!
>>
>>         Hope this helps.
>>
>>
>>         Arun
>>
>>         From: Juan Manuel Truppia <jmtruppia at gmail.com>
>>         <mailto:jmtruppia at gmail.com>
>>         Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
>>         <mailto:jmtruppia at gmail.com>
>>         Date: September 11, 2014 at 10:16:41 PM
>>         To: datatable-help at lists.r-forge.r-project.org
>>         <mailto:datatable-help at lists.r-forge.r-project.org>
>>         <datatable-help at lists.r-forge.r-project.org>>
>>         <mailto:datatable-help at lists.r-forge.r-project.org>
>>         Subject:  [datatable-help] Update table from other table
>>
>>>         What is the best data.table way of doing something similar to
>>>         UPDATE FROM in SQL?
>>>
>>>         I used to do something like
>>>
>>>         dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
>>>         dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
>>>         dta[dtb, b := b]
>>>
>>>         However, after the 1.9.3 and the explicit .EACHI, it fails
>>>         sometimes, but I can't determine when.
>>>
>>>         So, just to be sure, I do 
>>>
>>>         dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
>>>
>>>         Is the .EACHI and the nomatch necessary?
>>>
>>>         In this case, I want the row with idx 1 and 3 (the matching
>>>         ones) to end with a b value from the matching b column in
>>>         dtb, and the row with idx 2 (the one that isn't in dtb) to
>>>         end up with NA in column b.
>>>
>>>
>>>         _______________________________________________
>>>         datatable-help mailing list
>>>         datatable-help at lists.r-forge.r-project.org
>>>         <mailto:datatable-help at lists.r-forge.r-project.org>
>>>         https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
> 
> 
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 


More information about the datatable-help mailing list