<div dir="ltr">Great! (sorry, .EACHI = TRUE was an old definition).<div>It's good to know also that nomatch = 0 is irrelevant when using :=, I always used is to avoid the rows in dtb creeping in dtb as NAs.</div><div>Also, it's really useful to know that by = EACHI should be used when the calculations you are perfoming depend on the group or not. This came in really in handy yesterday, and should be emphasized in .EACHI description. Should I perform a pull request?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><p>I think you mean:</p>
<pre><code>dta[dtb, b:=b, by=.EACHI]
</code></pre>
<p>and not <code>.EACHI = TRUE</code>. Not sure what’s the use of <code>nomatch=0L</code> along with <code>:=</code>.</p>
<p><code>by=.EACHI</code> does exactly what it means, really. It evaluates <code>j</code> for each <code>i</code> match. Let’s first see the matches:</p>
<pre><code>dta[dtb, which=TRUE]
# [1] 1 1 3
</code></pre>
<p>So, first row of <code>dtb</code> matches with first of <code>dta</code>. The second of <code>dtb</code> matches with 1st of <code>dta</code> and so on.</p>
<p>When you add <code>by=.EACHI</code>, as shown on the top, <code>j-expression</code> is evaluated on each of these matches. So, it’ll be evaluated 3-times here. On the other hand, without it, <code>j</code> is evaluated once. In this case, it doesn’t make a difference either way. So you should avoid <code>by=.EACHI</code>, as it’ll be slower with it.</p>
<p>It’s particularly useful when you’d like to perform operations in <code>j</code>, that depends on the values in <code>j</code> on <em>that</em> group. For example, consider these data.tables <code>dt1</code> and <code>dt2</code>:</p>
<pre><code>dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")
</code></pre>
<p>And, you’d like to get <code>sum(y)*z</code> while joining.. If not for the <code>by=.EACHI</code> feature.. you’d approach the problem like this:</p>
<pre><code>dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
</code></pre>
<p>With <code>by=.EACHI</code>, this is simply:</p>
<pre><code>dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
</code></pre>
<p>Here, your expression is evaluated on each <code>i</code>.</p>
<p>Another interesting use case is, say, you’d like to create a lagged vector of <code>y</code>:</p>
<pre><code>dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
</code></pre>
<p>It’s that simple.. really. Basically, as long as the operation you’re performing in <code>j</code> affects it depending on whether j is executed for that group or as a whole, then you’re most likely looking for <code>by=.EACHI</code>. If not, <code>by=.EACHI</code> has no effect, and therefore you’re wanting to use a <code>normal join</code> there..</p>
<p>This is not a text book definition, rather my understanding of this awesome feature!</p>
<p>Hope this helps.</p>
<p></p></div><div><div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div> <div><div style="font-family:helvetica,arial;font-size:13px">Arun</div></div> <div style="color:black"><br>From: <span style="color:black">Juan Manuel Truppia</span> <a href="mailto:jmtruppia@gmail.com" target="_blank"><jmtruppia@gmail.com></a><br>Reply: <span style="color:black">Juan Manuel Truppia</span> <a href="mailto:jmtruppia@gmail.com" target="_blank"><jmtruppia@gmail.com>></a><br>Date: <span style="color:black">September 11, 2014 at 10:16:41 PM</span><br>To: <span style="color:black"><a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a></span> <a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank"><datatable-help@lists.r-forge.r-project.org>></a><br>Subject: <span style="color:black"> [datatable-help] Update table from other table <br></span></div><br> <blockquote type="cite"><span><div><div></div><div><div><div class="h5">
<div dir="ltr">What is the best data.table way of doing something
similar to UPDATE FROM in SQL?
<div><br>
<div>I used to do something like</div>
<div><br></div>
<div>dta = data.table(idx = c(1, 2, 3), a = runif(3), key =
"idx")</div>
<div>dtb = data.table(idx = c(1, 3), b = runif(3), key =
"idx")</div>
<div>dta[dtb, b := b]</div>
<div><br></div>
<div>However, after the 1.9.3 and the explicit .EACHI, it fails
sometimes, but I can't determine when.</div>
<div><br></div>
<div>So, just to be sure, I do </div>
<div><br></div>
<div>dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]<br></div>
</div>
<div><br></div>
<div>Is the .EACHI and the nomatch necessary?</div>
<div><br></div>
<div>In this case, I want the row with idx 1 and 3 (the matching
ones) to end with a b value from the matching b column in dtb, and
the row with idx 2 (the one that isn't in dtb) to end up with NA in
column b.</div>
<div><br></div>
<div><br></div>
</div></div></div>
_______________________________________________
<br>datatable-help mailing list
<br><a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a>
<br><a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div></div></span></blockquote></div><div><p></p></div></div></blockquote></div><br></div>