<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Steve and Matthew,<div><br></div><div>Very helpful solutions indeed! Thanks a lot.</div><div><br></div><div>I played around with all your valuable suggestions a little.</div><div>To me, it seems that the simplest one-step solution that handles ties the way I had hoped is:</div><div><br></div><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 11px/normal Monaco; ">DT<span style="color: #052499">[,</span><span style="color: #b0140f"> </span>cmx<span style="color: #b0140f"> </span><span style="color: #052499">:=</span><span style="color: #b0140f"> </span><span style="color: #052499">rank(</span>p<span style="color: #052499">,</span>ties.method<span style="color: #052499">=</span><span style="color: #b0140f">"max"</span><span style="color: #052499">),</span><span style="color: #b0140f"> </span>by<span style="color: #052499">=</span>grp<span style="color: #052499">]</span></div></div><div><span style="color: #4f4f4f"><br></span></div><div>--Philip</div><div><br><div><div>On Nov 19, 2012, at 11:54 PM, Matthew Dowle wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>On 18.11.2012 20:03, Steve Lianoglou wrote:<br><blockquote type="cite">Hi,<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On Sun, Nov 18, 2012 at 11:19 AM, Philip de Witt Hamer<br></blockquote><blockquote type="cite">&lt;<a href="mailto:pcvdwh@gmail.com">pcvdwh@gmail.com</a>&gt; wrote:<br></blockquote><blockquote type="cite"><blockquote type="cite">Dear all,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">data.table is great! 
thanks for this life(time)saving package.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Now, I run into a difficult nut to crack using ':='.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I'd like to do a calculation using column information conditional on another<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">column<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">first some jumbo data:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">library(data.table)<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">DT <- data.table(<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"> 1:50,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"> rep(1:5,each=10),<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"> runif(50,0,1)<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">)<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">setnames(DT, 1:3, c("id","grp","p"))<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">id's are unique<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">grp's speaks for itself<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">think of p's as e.g. 
p-values<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">next, if I want to obtain the nr of p values at least as extreme as the p of<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">each row from the whole set, this seems to work well:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">DT[,c1 := sum(DT[,p] <= p), by=id]<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">but then, I would like to get the nr of p values at least as extreme as the<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">p of each row for the subset with identical grp, I am having a hard time,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">because these attempts fail:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">DT[,c2 := sum(DT[grp,p] <= p),by=id]<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">DT[,c3 := sum(DT[DT[,grp]==grp,p] <= p), by=id]<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">You will want to group by "grp".<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">This gets you pretty close -- it fails the "ties" criterion:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">DT[, cg := rank(p) - 1, by=grp]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">If you *really* want to keep the ties criterion, perhaps here's a way<br></blockquote><blockquote type="cite">to do so by avoiding a for 
loop:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">DT[, cgo := rowSums(outer(p, p, '-') > 0), by=grp]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The problem is that if your groups are very large, the `outer` call<br></blockquote><blockquote type="cite">might chew lots of RAM, since you'll be creating a p x p matrix (per<br></blockquote><blockquote type="cite">group).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Does that get you where you need to be?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">-steve<br></blockquote><br><br>Grouping by grp feels right to me, too. How about :<br><br> setkey(DT,grp,p)<br><br>and then using the ordered p within each group :<br><br> DT[,c1:=seq_len(.N),by=grp]<br> DT[,c1:=max(c1),by='grp,p'] # to deal with ties<br><br>NB: data.table grouping of numerics is machine tolerance aware. So<br>this ties treatment is more like sum(DT[,p] <= p+tol) which may or<br>may not be what you need. tol = .Machine$double.eps ^ 0.5.<br><br>Or, staying with the self join approach, one trick for the scoping<br>issue you hit is :<br><br> DT[,c3:={i=list(grp);sum(DT[i,p]<=p)},by=id]<br><br>Where the DT[i,...] part relies on the fact that single name i is evaluated<br>in calling scope.<br><br>Or another way in one step is :<br><br> DT[,c3:=sum(DT[eval(.(grp)),p]<=p),by=id]<br><br>which uses the feature that eval() is already like what ..() will do in future.<br><br>But grouping by grp should be much faster and cleaner, if possible.<br><br>Matthew<br><br><br><br></div></blockquote></div><br></div></body></html>