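<div dir="ltr">This is because `x %between% y` works by calling `between(x, y[1], y[2])`, so your call becomes:<div><br></div><div>   dt[date %between% c(start, end)]  ----> dt[between(date, c(start, end)[1], c(start, end)[2])]<br><br>Here c(start, end) is one long vector (all of start followed by all of end), so [1] and [2] pick out just the first two values of start. In your joined table those are both 1, so the filter reduces to date == 1, which is the six rows you got.<br>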
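<br>
<div>A minimal base R sketch of the difference, assuming the operator is defined exactly as described above. `between_sketch` and `%between_sketch%` are hypothetical stand-ins written for this example, not data.table's own code, and the three vectors only mimic the first four rows of the joined table:</div>

```r
# Hypothetical stand-ins so the sketch runs in base R without data.table.
between_sketch <- function(x, lower, upper) x >= lower & x <= upper
`%between_sketch%` <- function(x, y) between_sketch(x, y[1], y[2])

# First four rows of a joined table like the one in the question
# (site A, episode 1: start = 1, end = 3 on every row).
date  <- c(1, 2, 3, 4)
start <- c(1, 1, 1, 1)
end   <- c(3, 3, 3, 3)

# Function form: lower and upper are whole columns, compared row by row.
function_form <- between_sketch(date, start, end)

# Operator form: c(start, end) is a single length-8 vector; only its first
# two elements are used, and both of them come from `start`.
operator_form <- date %between_sketch% c(start, end)

print(function_form)  # TRUE TRUE TRUE FALSE
print(operator_form)  # TRUE FALSE FALSE FALSE
```

<div>The function form keeps dates 1 to 3, while the operator form keeps only date 1, which is the same pattern as in the two results in the question.</div>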


<div><br></div></div><div>I don't know if there is anything that can be done about it (aside from not using the operator version with vectors).</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Oct 6, 2013 at 5:29 PM, drclark <span dir="ltr">&lt;<a href="mailto:clark9876@airquality.dk" target="_blank">clark9876@airquality.dk</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear data.table experts,<br>

<br>

I was inspired by the SO topic "How to match two data.frames with an inexact<br>

matching identifier (one identifier has to be in the range of the other)" for<br>

a problem where I need to calculate pollutant statistics during various episodes<br>

from monitoring data. The episodes (like the fiscal quarters in the SO<br>

topic) are defined for each site in a lookup table with starting and ending<br>

dates. The start and end dates can be different at different sites. The SO<br>

answer used >= and <= to check the date was in the range from start to end.<br>

  mD[qD][Month>=startMonth & Month<=endMonth]<br>

<br>

This approach may suit my problem, but I thought that I could use "between"<br>

rather than the two logical comparisons.  I tried both the between()<br>

function and its equivalent %between% operator -- and I get two different<br>

results. The between() version is correct, but %between% gives a wrong<br>

answer. Am I missing something in the syntax for using between?<br>

<br>

My version of the SO data, merge and results below. I changed the variable<br>

names to suit my work: ID->site, Month->date, MonValue->conc,<br>

QTRValue->episodeID.<br>

<br>

require(data.table)   # data.table 1.8.10  on R 3.0.2 under Win7x64<br>

# the measurement data<br>

dat <- data.table(site = rep(c("A","B"), each=10),<br>

                  date = rep(1:10, times = 2),     # could be day or hour<br>

                  conc = sample(30:50,2*10,replace=TRUE),  # the pollutant data<br>

                  key="site,date")<br>

dat<br>

#    site date conc<br>

# 1:    A    1   48<br>

# 2:    A    2   44<br>

# 3:    A    3   50<br>

# 4:    A    4   47<br>

# 5:    A    5   35<br>

# 6:    A    6   47<br>

# 7:    A    7   38<br>

# 8:    A    8   34<br>

# 9:    A    9   46<br>

#10:    A   10   35<br>

#11:    B    1   45<br>

#12:    B    2   35<br>

#13:    B    3   40<br>

#14:    B    4   41<br>

#15:    B    5   37<br>

#16:    B    6   37<br>

#17:    B    7   32<br>

#18:    B    8   41<br>

#19:    B    9   31<br>

#20:    B   10   32<br>

#<br>

# definitions for the episodes<br>

episode <- data.table(<br>

                site = rep(c("A", "B"), each = 3),<br>

                start = c(1, 4, 7, 1, 3, 8),<br>

                end = c(3, 5, 10, 2, 5, 10),<br>

                episodeID = rep(1:3, 2),<br>

                key="site")<br>

episode<br>

#   site start end episodeID<br>

# 1:    A     1   3         1<br>

# 2:    A     4   5         2<br>

# 3:    A     7  10         3<br>

# 4:    B     1   2         1<br>

# 5:    B     3   5         2<br>

# 6:    B     8  10         3<br>

#<br>

# join measurement data and episode list (for later aggregation using mean() etc.)<br>

# approach from the SO thread -- gives the right result<br>

dat[episode, allow.cartesian=TRUE][date>=start & date<=end]<br>

#      site date conc start end episodeID<br>

#   1:    A    1   48     1   3         1<br>

#   2:    A    2   44     1   3         1<br>

#   3:    A    3   50     1   3         1<br>

#   4:    A    4   47     4   5         2<br>

#   5:    A    5   35     4   5         2<br>

#   6:    A    7   38     7  10         3<br>

#   7:    A    8   34     7  10         3<br>

#   8:    A    9   46     7  10         3<br>

#   9:    A   10   35     7  10         3<br>

# 10:    B    1   45     1   2         1<br>

# 11:    B    2   35     1   2         1<br>

# 12:    B    3   40     3   5         2<br>

# 13:    B    4   41     3   5         2<br>

# 14:    B    5   37     3   5         2<br>

# 15:    B    8   41     8  10         3<br>

# 16:    B    9   31     8  10         3<br>

# 17:    B   10   32     8  10         3<br>

<br>

# using between() -- also gives the desired result<br>

dat[episode, allow.cartesian=TRUE][between (date,start,end)]<br>

#  (returns same result as above)<br>

<br>

# using %between% -- gives different result - not the right answer<br>

dat[episode, allow.cartesian=TRUE][date %between% c(start,end)]<br>

#    site date conc start end episodeID<br>

# 1:    A    1   48     1   3         1<br>

# 2:    A    1   48     4   5         2<br>

# 3:    A    1   48     7  10         3<br>

# 4:    B    1   45     1   2         1<br>

# 5:    B    1   45     3   5         2<br>

# 6:    B    1   45     8  10         3<br>

<br>

So why does the %between% operator give a different result than between()?<br>

There must be some detail of syntax I need to learn here.  I also tried<br>

putting the whole %between% expression in parentheses, but that doesn't make<br>

any difference:<br>

  dat[episode, allow.cartesian=TRUE][(date %between% c(start,end))]<br>

<br>

Best regards.<br>

Douglas Clark<br>

<br>

<br>

<br>

--<br>

View this message in context: <a href="http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html" target="_blank">http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html</a><br>


Sent from the datatable-help mailing list archive at Nabble.com.<br>


</blockquote></div><br></div>