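<div dir="ltr">This is because `x %between% y` works by calling `between(x, y[1], y[2])`, so your call becomes:<div><br></div><div> dt[date %between% c(start, end)] ----> dt[between(date, c(start, end)[1], c(start, end)[2])]<br>
Because `c(start, end)` is one long vector with every `start` value ahead of every `end` value, its first two elements are both `start` values (here both 1), so the filter reduces to `date == 1`.<br>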
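A minimal base-R sketch of this, assuming the operator simply forwards `y[1]` and `y[2]` as stated above. `between_sketch` and `%between_sketch%` are hypothetical stand-ins written for illustration, not data.table's actual source, and the three vectors are made-up data.<br>

```r
# Hypothetical stand-ins that mimic the behaviour described above;
# they are NOT data.table's actual source.
between_sketch <- function(x, lower, upper) x >= lower & x <= upper
`%between_sketch%` <- function(x, y) between_sketch(x, y[1], y[2])

# Made-up columns: two episodes, three dates each
date  <- c(1, 2, 3, 1, 2, 3)
start <- c(1, 1, 1, 3, 3, 3)
end   <- c(2, 2, 2, 3, 3, 3)

# Only the first two elements of the concatenated vector are used,
# and both of them come from `start`
bounds <- c(start, end)[1:2]

# Operator form: every date is compared against bounds[1] and bounds[2]
op_result <- date %between_sketch% c(start, end)

# Function form: each date is compared against its own row's start and end
fn_result <- between_sketch(date, start, end)

print(bounds)
print(op_result)
print(fn_result)
```

With these vectors the operator form keeps only the rows where `date` is 1, while the function form tests each row against its own interval.<br>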
<div><br></div></div><div>I don't know whether anything can be done about this, other than avoiding the operator form when the bounds are vectors.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Oct 6, 2013 at 5:29 PM, drclark <span dir="ltr"><<a href="mailto:clark9876@airquality.dk" target="_blank">clark9876@airquality.dk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear data.table experts,<br>
<br>
I was inspired by the SO topic "How to match two data.frames with an inexact<br>
matching identifier (one identifier has to be in the range of the other)" for<br>
a problem of mine: calculating pollutant statistics during various episodes<br>
from monitoring data. The episodes (like the fiscal quarters in the SO<br>
topic) are defined for each site in a lookup table with starting and ending<br>
dates. The start and end dates can be different at different sites. The SO<br>
answer used >= and <= to check the date was in the range from start to end.<br>
mD[qD][Month>=startMonth & Month<=endMonth]<br>
<br>
This approach may suit my problem, but I thought that I could use "between"<br>
rather than the two logical comparisons. I tried both the between()<br>
function and its equivalent %between% operator -- and I get two different<br>
results. The between() version is correct, but %between% gives a wrong<br>
answer. Am I missing something in the syntax for using between?<br>
<br>
My version of the SO data, merge and results below. I changed the variable<br>
names to suit my work: ID->site, Month->date, MonValue->conc,<br>
QTRValue->episodeID.<br>
<br>
require(data.table) # data.table 1.8.10 on R 3.0.2 under Win7x64<br>
# the measurement data<br>
dat <- data.table(site = rep(c("A","B"), each=10),<br>
date = rep(1:10, times = 2), # could be day or hour<br>
conc = sample(30:50,2*10,replace=TRUE), # the pollutant data<br>
key="site,date")<br>
dat<br>
# site date conc<br>
# 1: A 1 48<br>
# 2: A 2 44<br>
# 3: A 3 50<br>
# 4: A 4 47<br>
# 5: A 5 35<br>
# 6: A 6 47<br>
# 7: A 7 38<br>
# 8: A 8 34<br>
# 9: A 9 46<br>
#10: A 10 35<br>
#11: B 1 45<br>
#12: B 2 35<br>
#13: B 3 40<br>
#14: B 4 41<br>
#15: B 5 37<br>
#16: B 6 37<br>
#17: B 7 32<br>
#18: B 8 41<br>
#19: B 9 31<br>
#20: B 10 32<br>
#<br>
# definitions for the episodes<br>
episode <- data.table(<br>
site = rep(c("A", "B"), each = 3),<br>
start = c(1, 4, 7, 1, 3, 8),<br>
end = c(3, 5, 10, 2, 5, 10),<br>
episodeID = rep(1:3, 2),<br>
key="site")<br>
episode<br>
# site start end episodeID<br>
# 1: A 1 3 1<br>
# 2: A 4 5 2<br>
# 3: A 7 10 3<br>
# 4: B 1 2 1<br>
# 5: B 3 5 2<br>
# 6: B 8 10 3<br>
#<br>
# join measurement data and episode list (for later aggregation using mean() etc.)<br>
# approach from the SO thread -- gives the right result<br>
dat[episode, allow.cartesian=TRUE][date>=start & date<=end]<br>
# site date conc start end episodeID<br>
# 1: A 1 48 1 3 1<br>
# 2: A 2 44 1 3 1<br>
# 3: A 3 50 1 3 1<br>
# 4: A 4 47 4 5 2<br>
# 5: A 5 35 4 5 2<br>
# 6: A 7 38 7 10 3<br>
# 7: A 8 34 7 10 3<br>
# 8: A 9 46 7 10 3<br>
# 9: A 10 35 7 10 3<br>
# 10: B 1 45 1 2 1<br>
# 11: B 2 35 1 2 1<br>
# 12: B 3 40 3 5 2<br>
# 13: B 4 41 3 5 2<br>
# 14: B 5 37 3 5 2<br>
# 15: B 8 41 8 10 3<br>
# 16: B 9 31 8 10 3<br>
# 17: B 10 32 8 10 3<br>
<br>
# using between() -- also gives the desired result<br>
dat[episode, allow.cartesian=TRUE][between(date,start,end)]<br>
# (returns same result as above)<br>
<br>
# using %between% -- gives different result - not the right answer<br>
dat[episode, allow.cartesian=TRUE][date %between% c(start,end)]<br>
# site date conc start end episodeID<br>
# 1: A 1 48 1 3 1<br>
# 2: A 1 48 4 5 2<br>
# 3: A 1 48 7 10 3<br>
# 4: B 1 45 1 2 1<br>
# 5: B 1 45 3 5 2<br>
# 6: B 1 45 8 10 3<br>
<br>
So why does the %between% operator give a different result than between()?<br>
There must be some detail of syntax I need to learn here. I also tried<br>
putting the whole %between% expression in parentheses, but that doesn't make<br>
any difference:<br>
dat[episode, allow.cartesian=TRUE][(date %between% c(start,end))]<br>
<br>
Best regards.<br>
Douglas Clark<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html" target="_blank">http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html</a><br>
</blockquote></div><br></div>