Regarding fasttime: my understanding is that it only works for dates after 1970.<br><br><div class="gmail_quote">On Mon, Feb 25, 2013 at 7:41 PM, <span dir="ltr"><<a href="mailto:datatable-help-request@lists.r-forge.r-project.org" target="_blank">datatable-help-request@lists.r-forge.r-project.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send datatable-help mailing list submissions to<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
<br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:datatable-help-request@lists.r-forge.r-project.org">datatable-help-request@lists.r-forge.r-project.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:datatable-help-owner@lists.r-forge.r-project.org">datatable-help-owner@lists.r-forge.r-project.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of datatable-help digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. About adding fastmatch and fasttime to data.table (stat quant)<br>
2. Potential bug with sorting/summarizing by POSIXct and logical<br>
column (Victor Kryukov)<br>
3. Re: About adding fastmatch and fasttime to data.table<br>
(Matthew Dowle)<br>
4. Re: Potential bug with sorting/summarizing by POSIXct and<br>
logical column (Michael Nelson)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Mon, 25 Feb 2013 19:40:35 +0100<br>
From: stat quant <<a href="mailto:statquant@outlook.com">statquant@outlook.com</a>><br>
To: <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
Subject: [datatable-help] About adding fastmatch and fasttime to<br>
data.table<br>
Message-ID:<br>
<<a href="mailto:CAJJHHA9qL8hURXF0%2B8OnPaD1t7Y5csoOLX7qDKNUqXc1XpmGCA@mail.gmail.com">CAJJHHA9qL8hURXF0+8OnPaD1t7Y5csoOLX7qDKNUqXc1XpmGCA@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Hello list,<br>
<br>
Looking at fastmatch and fasttime, I realized that each of those packages consists<br>
solely of one C file.<br>
We discussed the possibility of adding those to data.table. I tried to<br>
contact S. Urbanek without success, so I have no feedback from his<br>
side.<br>
Using fastPOSIXct provides a huge gain when one has to load files with<br>
datetime columns: on my laptop, using data.table:::fread, I found that most of<br>
the time is spent casting datetimes to POSIXct (I have several such columns).<br>
<br>
Looking at fasttime, you can see a pretty good improvement (about a factor of 15):<br>
<br>
R) library(fasttime)<br>
R) ts <- as.character(.POSIXct(runif(1e6) * unclass(Sys.time())))<br>
R) system.time(a <- as.POSIXct(ts, "GMT"))<br>
user system elapsed<br>
6.49 0.04 6.57<br>
R) system.time(b <- fastPOSIXct(ts, "GMT"))<br>
user system elapsed<br>
0.40 0.00 0.41<br>
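To illustrate where a speed-up of that size can come from: when every string has the same fixed layout, the date can be read with substr() and plain arithmetic instead of going through strptime() and time-zone handling. The base-R sketch below shows that idea only. It is not fasttime's C code, parse_gmt_fixed is a hypothetical name, and it assumes every input is exactly "YYYY-MM-DD hh:mm:ss" in GMT with no missing values. The day count uses the standard days-from-civil calendar formula.

```r
# Hypothetical illustration (not fasttime's implementation): parse fixed-format
# "YYYY-MM-DD hh:mm:ss" GMT strings with substr() and arithmetic only.
parse_gmt_fixed <- function(x) {
  num <- function(from, to) as.integer(substr(x, from, to))
  y  <- num(1, 4);   m  <- num(6, 7);   d  <- num(9, 10)
  hh <- num(12, 13); mi <- num(15, 16); ss <- num(18, 19)
  # Days since 1970-01-01, treating March as the first month of the year
  # so that the leap day falls at the end of the "year".
  y   <- y - (m <= 2)
  era <- y %/% 400                      # 400-year Gregorian cycles
  yoe <- y - era * 400                  # year within the cycle, 0..399
  doy <- (153 * (m + ifelse(m > 2, -3, 9)) + 2) %/% 5 + d - 1
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy
  days <- era * 146097 + doe - 719468   # 719468 = days from 0000-03-01 to 1970-01-01
  .POSIXct(days * 86400 + hh * 3600 + mi * 60 + ss, tz = "GMT")
}

x <- c("1965-03-01 12:00:00", "2012-02-29 23:59:59", "2013-02-25 19:40:35")
parse_gmt_fixed(x)
```

Because it is plain vectorised arithmetic, this sketch also accepts pre-1970 dates, but it performs no validation of malformed strings.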
<br>
Once colClasses is implemented in fread, may I suggest allowing<br>
fasttime as an option?<br>
Concerning fastmatch, the vignette already shows some nice benchmarks. I<br>
tend to do a lot of selects based on string columns; I'm not sure whether this is<br>
the case for most of us.<br>
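The reply at the top of this digest notes that fasttime is understood to handle only dates after 1970. Until fread offers such an option, a caller could guard the fast path along the following lines. This is a sketch under assumptions: to_gmt_posixct is a hypothetical name, the input is assumed to be fixed-format "YYYY-MM-DD hh:mm:ss" GMT strings, and the year check only mirrors the 1970 limit mentioned in that reply rather than any range documented by fasttime.

```r
# Hypothetical helper: use fasttime::fastPOSIXct() only when the package is
# installed and every parsed year is >= 1970; otherwise fall back to base R.
to_gmt_posixct <- function(x) {
  years <- suppressWarnings(as.integer(substr(x, 1, 4)))
  use_fast <- requireNamespace("fasttime", quietly = TRUE) &&
    all(years >= 1970, na.rm = TRUE)
  if (use_fast) {
    fasttime::fastPOSIXct(x, tz = "GMT")
  } else {
    as.POSIXct(x, tz = "GMT")
  }
}

to_gmt_posixct(c("1965-03-01 12:00:00", "2013-02-25 19:40:35"))
```

The year check costs one substr() pass over the column, which should be cheap next to as.POSIXct(), and the fallback keeps pre-1970 data correct at the price of the slower path.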
<br>
My 0.002 cent<br>
Cheers<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 25 Feb 2013 14:26:28 -0800<br>
From: Victor Kryukov <<a href="mailto:victor.kryukov@gmail.com">victor.kryukov@gmail.com</a>><br>
To: <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
Subject: [datatable-help] Potential bug with sorting/summarizing by<br>
POSIXct and logical column<br>
Message-ID:<br>
<CANJmMqTdpKGL3Bq=y-fYCsWDc8uTe3h-g+VoGBV=<a href="mailto:1X%2Bn5suowA@mail.gmail.com">1X+n5suowA@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Hello,<br>
<br>
I've encountered what looks like a bug while sorting by POSIXct and logical<br>
columns, which may or may not be related to the following bug:<br>
<br>
<a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975" target="_blank">https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975</a><br>
<br>
Here are all the details:<br>
<a href="http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns" target="_blank">http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns</a><br>
<br>
Here is the test case:<br>
<br>
# First some data<br>
library(data.table)<br>
data <- data.table(structure(list(<br>
month = structure(c(1356998400, 1356998400, 1356998400,<br>
1359676800, 1354320000, 1359676800, 1359676800, 1356998400, 1356998400,<br>
1354320000, 1354320000, 1354320000, 1359676800, 1359676800, 1359676800,<br>
1356998400, 1359676800, 1359676800, 1356998400, 1359676800, 1359676800,<br>
1359676800, 1359676800, 1354320000, 1354320000),<br>
class = c("POSIXct", "POSIXt"), tzone = "UTC"),<br>
portal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,<br>
TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,<br>
TRUE, TRUE, TRUE, TRUE),<br>
satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L,<br>
9L, 2L, 8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L,<br>
9L, 10L, 9L, 10L, 10L)),<br>
.Names = c("month", "portal", "satisfaction"),<br>
row.names = c(NA, -25L), class = "data.frame"))<br>
<br>
# Summarizing by month, portal with tapply works:<br>
<br>
> tapply(data$satisfaction, list(data$month, data$portal), mean)<br>
FALSE TRUE<br>
2012-12-01 8.5 8.000000<br>
2013-01-01 10.0 10.000000<br>
2013-02-01 9.0 9.545455<br>
<br>
# Summarizing with 'by' argument of data.table does not:<br>
<br>
> data[, mean(satisfaction), by = 'month,portal']<br>
> data[, mean(satisfaction), by = list(month, portal)]<br>
month portal V1<br>
1: 2013-01-01 FALSE 10.000000<br>
2: 2013-02-01 TRUE 9.000000<br>
3: 2013-01-01 TRUE 10.000000<br>
4: 2012-12-01 FALSE 8.500000<br>
5: 2012-12-01 TRUE 7.333333<br>
6: 2013-02-01 TRUE 9.666667<br>
7: 2013-02-01 FALSE 9.000000<br>
8: 2012-12-01 TRUE 10.000000<br>
<br>
# Summarizing only this year's data works (ymd() comes from the lubridate package):<br>
> library(lubridate)<br>
> data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal']<br>
month portal V1<br>
1: 2013-01-01 TRUE 10.000000<br>
2: 2013-01-01 FALSE 10.000000<br>
3: 2013-02-01 TRUE 9.545455<br>
4: 2013-02-01 FALSE 9.000000<br>
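A base-R cross-check (no data.table or lubridate needed) can confirm what a correct grouping should return. The sketch rebuilds the same 25 rows as a plain data.frame and computes the group means with aggregate(); the variable names are illustrative only. Three months times two portal values give six distinct (month, portal) groups, whereas the by= output above has eight rows, with 2012-12-01 TRUE and 2013-02-01 TRUE each appearing twice.

```r
# Same 25 rows as the test case above, as a base-R data.frame
secs <- c(1356998400, 1356998400, 1356998400, 1359676800, 1354320000,
          1359676800, 1359676800, 1356998400, 1356998400, 1354320000,
          1354320000, 1354320000, 1359676800, 1359676800, 1359676800,
          1356998400, 1359676800, 1359676800, 1356998400, 1359676800,
          1359676800, 1359676800, 1359676800, 1354320000, 1354320000)
portal <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
            TRUE, FALSE, TRUE, FALSE, TRUE, rep(TRUE, 10))
satisfaction <- c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L, 8L, 10L,
                  9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L, 10L, 10L)
df <- data.frame(month = .POSIXct(secs, tz = "UTC"),
                 portal = portal, satisfaction = satisfaction)

# Number of distinct (month, portal) combinations: 3 months x 2 portal values
n_groups <- nrow(unique(df[c("month", "portal")]))

# Reference group means; a correct by = 'month,portal' result should match these
expected <- aggregate(df["satisfaction"], by = df[c("month", "portal")], FUN = mean)
print(expected)
```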
<br>
Yours Sincerely,<br>
Victor Kryukov<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Tue, 26 Feb 2013 00:39:09 +0000<br>
From: Matthew Dowle <<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>><br>
To: <<a href="mailto:statquant@outlook.com">statquant@outlook.com</a>><br>
Cc: <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
Subject: Re: [datatable-help] About adding fastmatch and fasttime to<br>
data.table<br>
Message-ID: <<a href="mailto:aed96221d7d28ff8d77ea8823135b49a@imap.plus.net">aed96221d7d28ff8d77ea8823135b49a@imap.plus.net</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
<br>
<br>
Hi,<br>
<br>
This sounds like a great idea. I don't know why Simon U didn't<br>
reply, or in what way it was without success; that may depend on the way you asked,<br>
whether he is on holiday at the moment, his reaction to the precise<br>
wording of the email you wrote, or some other factor. It is difficult to<br>
tell! But we don't need to wait for him: this is open<br>
source. You have got much further than I have, so if you'd like to add<br>
this, please go ahead and make progress. You're very welcome to join the<br>
project and commit directly. Or, if you can't for some reason, please file it<br>
as a feature request so it doesn't get forgotten.<br>
<br>
Matthew<br>
<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Tue, 26 Feb 2013 00:40:02 +0000<br>
From: Michael Nelson <<a href="mailto:michael.nelson@sydney.edu.au">michael.nelson@sydney.edu.au</a>><br>
To: "<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>"<br>
<<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>><br>
Subject: Re: [datatable-help] Potential bug with sorting/summarizing<br>
by POSIXct and logical column<br>
Message-ID:<br>
<<a href="mailto:6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4@EX-MBX-PRO-04.mcs.usyd.edu.au">6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4@EX-MBX-PRO-04.mcs.usyd.edu.au</a>><br>
<br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
I can't replicate this problem using data.table 1.8.7 (installed about 3 weeks ago) on<br>
R version 2.15.2 (2012-10-26)<br>
Platform: i386-w64-mingw32/i386 (32-bit)<br>
<br>
Michael<br>
<br>
------------------------------<br>
<br>
<br>
End of datatable-help Digest, Vol 36, Issue 8<br>
*********************************************<br>
</blockquote></div><br>