[datatable-help] between() versus %between% - why different results?
Eduard Antonyan
eduard.antonyan at gmail.com
Mon Oct 7 20:31:30 CEST 2013
This is because `x %between% y` works by calling `between(x, y[1], y[2])`,
so your call becomes:
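dt[date %between% c(start, end)]  ---->  dt[between(date, c(start, end)[1], c(start, end)[2])]
Since `start` and `end` are columns with one value per row, `c(start, end)` is
a single long vector, so `c(start, end)[1]` and `c(start, end)[2]` are simply
`start[1]` and `start[2]`. In your joined table both are 1, so the filter
reduces to `between(date, 1, 1)`, i.e. `date == 1`, which is exactly the six
rows you got.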
I don't know if there is anything that can be done about it (aside from not
using the operator version with vectors).
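To see the mechanism without data.table installed, here is a minimal base R
sketch. `between_sketch` and `%between_sketch%` are hypothetical stand-ins
written only for illustration. They assume `between` behaves like
`x >= lower & x <= upper` and that the operator forwards `y[1]` and `y[2]`, as
described above. They are not data.table's actual source. The toy table mimics
the joined data for site A only.

```r
# Hypothetical stand-ins, NOT data.table's own functions: they mimic the
# behaviour described above so it can be run in plain R.
between_sketch <- function(x, lower, upper) x >= lower & x <= upper
`%between_sketch%` <- function(x, y) between_sketch(x, y[1], y[2])

# Toy table shaped like dat[episode] for site A only:
# every date 1..10 is paired with each of the three episodes.
joined <- data.frame(
  date  = rep(1:10, times = 3),
  start = rep(c(1, 4, 7), each = 10),
  end   = rep(c(3, 5, 10), each = 10)
)

# Function form: lower and upper are whole columns, compared row by row.
ok <- with(joined, between_sketch(date, start, end))

# Operator form: c(start, end) is one vector of length 60, so y[1] and y[2]
# are start[1] and start[2] (both 1), and the test collapses to date == 1.
bad <- with(joined, date %between_sketch% c(start, end))

sum(ok)       # 9: the rows that fall inside their own episode
sum(bad)      # 3: only the rows with date == 1, one per episode
joined[bad, ]
```

Passing `start` and `end` as separate arguments, as the function form does,
keeps the comparison row-wise. That is why `between(date, start, end)` gave
the right answer.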
On Sun, Oct 6, 2013 at 5:29 PM, drclark <clark9876 at airquality.dk> wrote:
> Dear data.table experts,
>
> I was inspired by the SO topic "How to match two data.frames with an inexact
> matching identifier (one identifier has to be in the range of the other)"
> for a problem where I have to calculate pollutant statistics during various episodes
> from monitoring data. The episodes (like the fiscal quarters in the SO
> topic) are defined for each site in a lookup table with starting and ending
> dates. The start and end dates can be different at different sites. The SO
> answer used >= and <= to check the date was in the range from start to end.
> mD[qD][Month>=startMonth & Month<=endMonth]
>
> This approach may suit my problem, but I thought that I could use "between"
> rather than the two logical comparisons. I tried both the between()
> function and its equivalent %between% operator -- and I get two different
> results. The between() version is correct, but %between% gives a wrong
> answer. Am I missing something in the syntax for using between?
>
> My version of the SO data, merge and results below. I changed the variable
> names to suit my work: ID->site, Month->date, MonValue->conc,
> QTRValue->episodeID.
>
> require(data.table) # data.table 1.8.10 on R 3.0.2 under Win7x64
> # the measurement data
> dat <- data.table(site = rep(c("A","B"), each=10),
> date = rep(1:10, times = 2), # could be day or hour
> conc = sample(30:50,2*10,replace=TRUE), # the pollutant data
> key="site,date")
> dat
> # site date conc
> # 1: A 1 48
> # 2: A 2 44
> # 3: A 3 50
> # 4: A 4 47
> # 5: A 5 35
> # 6: A 6 47
> # 7: A 7 38
> # 8: A 8 34
> # 9: A 9 46
> #10: A 10 35
> #11: B 1 45
> #12: B 2 35
> #13: B 3 40
> #14: B 4 41
> #15: B 5 37
> #16: B 6 37
> #17: B 7 32
> #18: B 8 41
> #19: B 9 31
> #20: B 10 32
> #
> # definitions for the episodes
> episode <- data.table(
> site = rep(c("A", "B"), each = 3),
> start = c(1, 4, 7, 1, 3, 8),
> end = c(3, 5, 10, 2, 5, 10),
> episodeID = rep(1:3, 2),
> key="site")
> episode
> # site start end episodeID
> # 1: A 1 3 1
> # 2: A 4 5 2
> # 3: A 7 10 3
> # 4: B 1 2 1
> # 5: B 3 5 2
> # 6: B 8 10 3
> #
> # join measurement data and episode list (for later aggregation using
> # mean() etc.)
> # approach from the SO thread -- gives the right result
> dat[episode, allow.cartesian=TRUE][date>=start & date<=end]
> # site date conc start end episodeID
> # 1: A 1 48 1 3 1
> # 2: A 2 44 1 3 1
> # 3: A 3 50 1 3 1
> # 4: A 4 47 4 5 2
> # 5: A 5 35 4 5 2
> # 6: A 7 38 7 10 3
> # 7: A 8 34 7 10 3
> # 8: A 9 46 7 10 3
> # 9: A 10 35 7 10 3
> # 10: B 1 45 1 2 1
> # 11: B 2 35 1 2 1
> # 12: B 3 40 3 5 2
> # 13: B 4 41 3 5 2
> # 14: B 5 37 3 5 2
> # 15: B 8 41 8 10 3
> # 16: B 9 31 8 10 3
> # 17: B 10 32 8 10 3
>
> # using between() -- also gives the desired result
> dat[episode, allow.cartesian=TRUE][between (date,start,end)]
> # (returns same result as above)
>
> # using %between% -- gives different result - not the right answer
> dat[episode, allow.cartesian=TRUE][date %between% c(start,end)]
> # site date conc start end episodeID
> # 1: A 1 48 1 3 1
> # 2: A 1 48 4 5 2
> # 3: A 1 48 7 10 3
> # 4: B 1 45 1 2 1
> # 5: B 1 45 3 5 2
> # 6: B 1 45 8 10 3
>
> So why does the %between% operator give a different result than between()?
> There must be some detail of syntax I need to learn here. I also tried
> putting the whole %between% expression in parentheses, but that doesn't
> make any difference:
> dat[episode, allow.cartesian=TRUE][(date %between% c(start,end))]
>
> Best regards.
> Douglas Clark
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html
> Sent from the datatable-help mailing list archive at Nabble.com.