Matthew,<br><br>I experimented with them, and the one that worked best for me was this:<div class="im"><br><br>setkey(DT,A,B)<br>
start = DT[J("a",2),which=TRUE,mult="first"]<br>
end = DT["a",which=TRUE,mult="last"]<br>
DT[start:end, ...]<br></div>Thanks for the suggestions!<br><font color="#888888"><br>Steve</font><br><br><div class="gmail_quote">On Tue, Jul 26, 2011 at 1:35 PM, Matthew Dowle <span dir="ltr">&lt;<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi,<br>
<br>
Good question. The vector scan B>=2 should be quite quick provided it<br>
follows the DT["a"]. It will be a vector scan, yes, but only over a<br>
small subset of DT$B, and that subset will be contiguous in memory. What<br>
might be biting instead is that the chained query DT["a"][B>=2] will<br>
subset all the columns of DT in the first []. That inefficiency could be<br>
dominating depending on how many columns DT has vs how many you really<br>
need to use. If that's the case, you can speed it up a lot like this (B must be among the<br>
columns kept, since the second [] filters on it):<br>
DT["a",list(columns I know I need)][B>=2, expression using those columns]<br>
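A rough base R analogue of the column-copying point above, using a hypothetical data.frame `tbl` in place of DT (columns X1..X3 are made up and stand in for columns you don't need). It only illustrates what the first step copies: unlike the keyed DT["a"], the base R `tbl$A == "a"` step is itself a vector scan, so this says nothing about data.table's lookup speed.<br>

```r
# Hypothetical toy table; X1..X3 stand in for the many extra columns a real table carries.
tbl <- data.frame(A = c("a", "a", "a", "a", "b", "b"),
                  B = c(1, 2, 3, 5, 1, 4),
                  X1 = 1:6, X2 = 11:16, X3 = 21:26,
                  stringsAsFactors = FALSE)

# Chained style: the first step copies every column of the "a" rows,
# then the second step filters that copy on B.
step1_all <- tbl[tbl$A == "a", ]
res_all <- step1_all[step1_all$B >= 2, "X1"]

# Leaner style: keep only the columns actually needed in the first step.
# B must be among them, because the second step filters on it.
step1_few <- tbl[tbl$A == "a", c("B", "X1")]
res_few <- step1_few[step1_few$B >= 2, "X1"]

print(res_all)                              # X1 values of the matching rows
print(c(ncol(step1_all), ncol(step1_few)))  # columns copied by each first step
```

Both styles return the same X1 values; the difference is only how many columns the intermediate result carries, which matters more as the table gets wider.<br>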
<br>
Or, on a different tack, to go as fast as possible (as requested),<br>
perhaps (untested):<br>
<br>
setkey(DT,A,B)<br>
start = DT[J("a",2),which=TRUE,mult="first"]<br>
end = DT["a",which=TRUE,mult="last"]<br>
DT[start:end, ...]<br>
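Because setkey(DT,A,B) sorts by A and then by B, the wanted rows form one contiguous block, which is why start:end works. The base R sketch below illustrates that idea with hypothetical helpers lower_bound() and upper_bound() (plain binary searches written for this example, not data.table internals) on a made-up data.frame `tbl`. One difference to note: the exact-match lookup J("a",2) with mult="first" needs B to contain an exact 2, whereas lower_bound() returns the first B >= 2 whether or not 2 itself is present. Character comparison here uses R's locale collation, assumed to agree with the key order for this toy data.<br>

```r
# Hypothetical toy table, sorted by A then B as setkey(DT, A, B) would leave it.
tbl <- data.frame(A = c("a", "a", "a", "a", "a", "b", "b"),
                  B = c(0, 1, 2, 2, 5, 1, 3),
                  C = 1:7,
                  stringsAsFactors = FALSE)

# First position i in sorted x[lo..hi] with x[i] >= value (hi + 1 if none).
lower_bound <- function(x, value, lo = 1L, hi = length(x)) {
  hi <- hi + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] < value) lo <- mid + 1L else hi <- mid
  }
  lo
}

# First position i in sorted x[lo..hi] with x[i] > value (hi + 1 if none).
upper_bound <- function(x, value, lo = 1L, hi = length(x)) {
  hi <- hi + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] <= value) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Rows with A == a_value and B >= b_low, found with three binary searches.
range_rows <- function(tbl, a_value, b_low) {
  a_start <- lower_bound(tbl$A, a_value)                  # first row of the block
  a_end   <- upper_bound(tbl$A, a_value) - 1L             # last row of the block
  start   <- lower_bound(tbl$B, b_low, a_start, a_end)    # first B >= b_low in block
  if (start > a_end) integer(0) else start:a_end
}

rows <- range_rows(tbl, "a", 2)
print(rows)   # 3 4 5
print(tbl[rows, ])
```

Only the B column inside the "a" block is ever compared, so the cost grows with the logarithm of the table size rather than with the number of rows.<br>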
<br>
Or, getting fancy now, in one step fewer (again, untested):<br>
<br>
w = DT[J("a",c(2,Inf)),which=TRUE,roll=TRUE]<br>
DT[w[1]:w[2],...]<br>
<br>
but that only works if you know 2 exists in B and that there are no<br>
duplicates of 2. Possibly check DT$B[w[1]]>=2 and add 1 to w[1] if not.<br>
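The same rolling idea can be mimicked in base R with findInterval(), which for each value returns the position of the last element of a sorted vector that is <= that value, i.e. a "carry the previous observation forward" lookup similar in spirit to roll=TRUE. This is a hypothetical sketch (roll_rows() and `tbl` are made up for the example) that includes the check-and-add-1 correction; for brevity, the "a" block itself is located with a plain scan rather than a binary search.<br>

```r
# Hypothetical toy table sorted by A then B; note "a" has no exact B == 2.
tbl <- data.frame(A = c("a", "a", "a", "a", "b", "b"),
                  B = c(0, 1, 3, 5, 1, 4),
                  stringsAsFactors = FALSE)

# Rows with A == a_value and B >= b_low, via a roll-forward lookup on c(b_low, Inf).
roll_rows <- function(tbl, a_value, b_low) {
  in_block <- tbl$A == a_value               # plain scan, for brevity only
  if (!any(in_block)) return(integer(0))
  a_start <- which.max(in_block)             # first row of the block
  a_end   <- a_start + sum(in_block) - 1L    # block is contiguous because tbl is sorted by A
  # findInterval(): position of the last element <= each value (0 if none),
  # i.e. the previous observation is carried forward, like a rolling join.
  w <- a_start - 1L + findInterval(c(b_low, Inf), tbl$B[a_start:a_end])
  # The roll may land on the last B below b_low, or before the block if every
  # B is >= b_low; in both cases step forward one row (the "add 1 to w[1]" fix).
  if (w[1] < a_start || tbl$B[w[1]] < b_low) w[1] <- w[1] + 1L
  if (w[1] <= w[2]) w[1]:w[2] else integer(0)
}

print(roll_rows(tbl, "a", 2))
```

Like the rolling-join version, this assumes the lower bound is not duplicated within the block: with B = 0, 1, 2, 2, 5, findInterval() lands on the second 2, so the row holding the first 2 would be skipped.<br>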
<br>
Much neater would be the feature request (FR) for range (i.e. between) queries:<br>
<br>
<a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=203&group_id=240&atid=978" target="_blank">https://r-forge.r-project.org/tracker/index.php?func=detail&aid=203&group_id=240&atid=978</a><br>
<br>
So, if list() columns were easy to create, as discussed in previous threads,<br>
it might simply be:<br>
<br>
setkey(DT,A,B)<br>
DT[J("a",V(low,upp)),...]<br>
<br>
where V() stands for vector and would create a list() join column. Open and closed<br>
ends could be handled by adding or subtracting 1 from low and upp (for integer-valued B), and<br>
one-sided ranges by setting low to -Inf or upp to +Inf. That idiom might allow some funky queries,<br>
such as a different range for each row of i, to be expressed efficiently in terms of both code length and execution speed.<br>
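V() was only a proposal in this thread, so it cannot be run as written. The base R sketch below (hypothetical range_join(), lower_bound() and upper_bound(), and made-up data.frames `tbl` and `queries`) illustrates the semantics described: each row of the query table carries its own closed range [low, upp], an open end is written as +/-1 on integer B, and a one-sided range uses -Inf or Inf. Each query row costs a few binary searches rather than a scan.<br>

```r
# Generic binary searches on a sorted vector x, restricted to positions lo..hi.
# lower_bound: first position with x >= value; upper_bound: first position with x > value.
# Both return hi + 1 when no such position exists.
lower_bound <- function(x, value, lo = 1L, hi = length(x)) {
  hi <- hi + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] < value) lo <- mid + 1L else hi <- mid
  }
  lo
}
upper_bound <- function(x, value, lo = 1L, hi = length(x)) {
  hi <- hi + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] <= value) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Hypothetical range join: for each row of `queries` (a, low, upp), return the
# rows of `tbl` (sorted by A, then B) with A == a and low <= B <= upp.
range_join <- function(tbl, queries) {
  lapply(seq_len(nrow(queries)), function(i) {
    a_start <- lower_bound(tbl$A, queries$a[i])
    a_end   <- upper_bound(tbl$A, queries$a[i]) - 1L
    from <- lower_bound(tbl$B, queries$low[i], a_start, a_end)
    to   <- upper_bound(tbl$B, queries$upp[i], a_start, a_end) - 1L
    if (from > to) integer(0) else from:to
  })
}

tbl <- data.frame(A = c("a", "a", "a", "a", "b", "b"),
                  B = c(1L, 2L, 3L, 5L, 1L, 4L),
                  stringsAsFactors = FALSE)
queries <- data.frame(
  a   = c("a", "a",   "b"),
  low = c(2,   3 + 1, -Inf),  # row 2: open lower end B > 3, written as B >= 3 + 1 (integer B)
  upp = c(Inf, 5,     3),     # row 1: one-sided B >= 2; row 3: one-sided B <= 3
  stringsAsFactors = FALSE
)
res <- range_join(tbl, queries)
str(res)
```

Returning a list of row-index vectors (one per query row) keeps the sketch close to the "which=TRUE" style used earlier in the thread; a real implementation would more likely return the joined rows directly.<br>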
<font color="#888888"><br>
Matthew<br>
</font><div><div></div><div class="h5"><br>
<br>
On Mon, 2011-07-25 at 20:46 -0700, Steve Harman wrote:<br>
> Hello All,<br>
><br>
> I have a data table, DT, and two columns, A and B. A has character<br>
> values and B has numeric values.<br>
> I need to find the rows where A matches "a" AND B is greater than or equal to 2.<br>
> After setkey(DT,A), I am using DT["a"][B>=2].<br>
><br>
> However, since this command needs to be repeated many times for many<br>
> different values, I would like it to be as fast as possible.<br>
><br>
> If I had to test for equality on both variables, I would use<br>
> setkey(DT,A,B) followed by DT[J("a",2)]. However, the second condition<br>
> is "greater than or equal to", which results in slower execution<br>
> than matching for equality on both variables.<br>
><br>
> I wanted to direct this question to the list in case there is a speed<br>
> improvement I might be missing. Thank you very much in advance.<br>
><br>
> Steve<br>
><br>
> _______________________________________________<br>
> datatable-help mailing list<br>
> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
<br>
<br>
<br>
</div></div></blockquote></div><br>