[datatable-help] Update table from other table
Michael Smith
my.r.help at gmail.com
Tue Sep 16 05:51:43 CEST 2014
That's interesting.
Internally, which join is more efficient? The first one (using
nomatch=0) I suppose?
M
On 09/16/2014 01:52 AM, Juan Manuel Truppia wrote:
> Arun, sometimes it helps to have nomatch = 0 when using :=
>
> dta = data.table(idx = c(1,1), key = "idx")
> dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")
>
> This fails because of cartesian join not allowed by default
>
> dta[dtb, val := i.val]
>
> but this doesn't
>
> dta[dtb, val := i.val, nomatch = 0]
>
> This is the same as doing
>
> dta[dtb, val := i.val, allow.cartesian = TRUE]
>
> On Fri, Sep 12, 2014 at 8:08 PM, Arunkumar Srinivasan
> <aragorn168b at gmail.com <mailto:aragorn168b at gmail.com>> wrote:
>
> Glad it helped.
> Always welcome "pull requests" :).
>
> Arun
>
> From: Juan Manuel Truppia <jmtruppia at gmail.com>
> <mailto:jmtruppia at gmail.com>
> Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
> <mailto:jmtruppia at gmail.com>
> Date: September 12, 2014 at 5:46:59 PM
> To: Arunkumar Srinivasan <aragorn168b at gmail.com>>
> <mailto:aragorn168b at gmail.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> <mailto:datatable-help at lists.r-forge.r-project.org>
> <datatable-help at lists.r-forge.r-project.org>>
> <mailto:datatable-help at lists.r-forge.r-project.org>
> Subject: Re: [datatable-help] Update table from other table
>
>> Great! (sorry, .EACHI = TRUE was an old definition).
>> It's good to know also that nomatch = 0 is irrelevant when using
>> :=, I always used is to avoid the rows in dtb creeping in dtb as NAs.
>> Also, it's really useful to know that by = EACHI should be used
>> when the calculations you are perfoming depend on the group or
>> not. This came in really in handy yesterday, and should be
>> emphasized in .EACHI description. Should I perform a pull request?
>>
>> On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan
>> <aragorn168b at gmail.com <mailto:aragorn168b at gmail.com>> wrote:
>>
>> I think you mean:
>>
>> |dta[dtb, b:=b, by=.EACHI]
>> |
>>
>> and not |.EACHI = TRUE|. Not sure what’s the use of
>> |nomatch=0L| along with |:=|.
>>
>> |by=.EACHI| does exactly what it means, really. It evaluates
>> |j| for each |i| match. Let’s first see the matches:
>>
>> |dta[dtb, which=TRUE]
>> # [1] 1 1 3
>> |
>>
>> So, first row of |dtb| matches with first of |dta|. The second
>> of |dtb| matches with 1st of |dta| and so on.
>>
>> When you add |by=.EACHI|, as shown on the top, |j-expression|
>> is evaluated on each of these matches. So, it’ll be evaluated
>> 3-times here. On the other hand, without it, |j| is evaluated
>> once. In this case, it doesn’t make a difference either way.
>> So you should avoid |by=.EACHI|, as it’ll be slower with it.
>>
>> It’s particularly useful when you’d like to perform operations
>> in |j|, that depends on the values in |j| on /that/ group. For
>> example, consider these data.tables |dt1| and |dt2|:
>>
>> |dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
>> dt2 = data.table(x=3:5, z=10, key="x")
>> |
>>
>> And, you’d like to get |sum(y)*z| while joining.. If not for
>> the |by=.EACHI| feature.. you’d approach the problem like this:
>>
>> |dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
>> |
>>
>> With |by=.EACHI|, this is simply:
>>
>> |dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
>> |
>>
>> Here, your expression is evaluated on each |i|.
>>
>> Another interesting use case is, say, you’d like to create a
>> lagged vector of |y|:
>>
>> |dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
>> |
>>
>> It’s that simple.. really. Basically, as long as the operation
>> you’re performing in |j| affects it depending on whether j is
>> executed for that group or as a whole, then you’re most likely
>> looking for |by=.EACHI|. If not, |by=.EACHI| has no effect,
>> and therefore you’re wanting to use a |normal join| there..
>>
>> This is not a text book definition, rather my understanding of
>> this awesome feature!
>>
>> Hope this helps.
>>
>>
>> Arun
>>
>> From: Juan Manuel Truppia <jmtruppia at gmail.com>
>> <mailto:jmtruppia at gmail.com>
>> Reply: Juan Manuel Truppia <jmtruppia at gmail.com>>
>> <mailto:jmtruppia at gmail.com>
>> Date: September 11, 2014 at 10:16:41 PM
>> To: datatable-help at lists.r-forge.r-project.org
>> <mailto:datatable-help at lists.r-forge.r-project.org>
>> <datatable-help at lists.r-forge.r-project.org>>
>> <mailto:datatable-help at lists.r-forge.r-project.org>
>> Subject: [datatable-help] Update table from other table
>>
>>> What is the best data.table way of doing something similar to
>>> UPDATE FROM in SQL?
>>>
>>> I used to do something like
>>>
>>> dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
>>> dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
>>> dta[dtb, b := b]
>>>
>>> However, after the 1.9.3 and the explicit .EACHI, it fails
>>> sometimes, but I can't determine when.
>>>
>>> So, just to be sure, I do
>>>
>>> dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
>>>
>>> Is the .EACHI and the nomatch necessary?
>>>
>>> In this case, I want the row with idx 1 and 3 (the matching
>>> ones) to end with a b value from the matching b column in
>>> dtb, and the row with idx 2 (the one that isn't in dtb) to
>>> end up with NA in column b.
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> <mailto:datatable-help at lists.r-forge.r-project.org>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
More information about the datatable-help
mailing list