[datatable-help] datatable-help Digest, Vol 36, Issue 8

Alexander Chernyakov alexander.chernyakov at gmail.com
Tue Feb 26 01:46:46 CET 2013


Regarding fasttime: my understanding is that it only works for dates after 1970.
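
A minimal sketch for checking this on a given install, assuming the fasttime package is available (the two sample timestamps are made up for illustration). It parses one timestamp just before the Unix epoch and one just after with both base R and fastPOSIXct, so the two results can be compared side by side. No expected output is shown for fastPOSIXct, since how it reports out-of-range dates is exactly what is being checked.

```r
# Two illustrative timestamps (made up for this example): one just
# before the Unix epoch and one just after it.
stamps <- c("1969-12-31 23:59:59", "1970-01-02 00:00:00")

# Base R handles both; the first is one second before the epoch.
base_parsed <- as.POSIXct(stamps, tz = "GMT")
print(as.numeric(base_parsed))

# Only attempt the comparison if fasttime is installed.
if (requireNamespace("fasttime", quietly = TRUE)) {
  fast_parsed <- fasttime::fastPOSIXct(stamps, tz = "GMT")
  print(data.frame(stamp = stamps,
                   base  = as.numeric(base_parsed),
                   fast  = as.numeric(fast_parsed)))
}
```

If the `fast` column differs from `base` on the first row, the pre-1970 limitation applies to the installed version.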

On Mon, Feb 25, 2013 at 7:41 PM, <datatable-help-request at lists.r-forge.r-project.org> wrote:

> Today's Topics:
>
>    1. About adding fastmatch and fasttime to data.table (stat quant)
>    2. Potential bug with sorting/summarizing by POSIXct and logical
>       column (Victor Kryukov)
>    3. Re: About adding fastmatch and fasttime to data.table
>       (Matthew Dowle)
>    4. Re: Potential bug with sorting/summarizing by POSIXct and
>       logical column (Michael Nelson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 25 Feb 2013 19:40:35 +0100
> From: stat quant <statquant at outlook.com>
> To: datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] About adding fastmatch and fasttime to
>         data.table
> Message-ID:
>         <
> CAJJHHA9qL8hURXF0+8OnPaD1t7Y5csoOLX7qDKNUqXc1XpmGCA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello list,
>
> Looking at fastmatch and fasttime, I realized that each of those packages
> consists solely of one C file.
> We spoke about the possibility of adding those to data.table. I tried to
> contact S. Urbanek without any success, so I do not have feedback from his
> side.
> Using fastPOSIXct provides a huge gain when one has to load files with
> datetimes. On my laptop, using data.table:::fread, I realized that most of
> the time is spent casting datetimes to POSIXct (I have several columns).
>
> Looking at fasttime, you can see a pretty good improvement (about a factor
> of 15):
>
> R) library(fasttime)
> R) ts <- as.character(.POSIXct(runif(1e6) * unclass(Sys.time())))
> R) system.time(a <- as.POSIXct(ts, "GMT"))
>    user  system elapsed
>    6.49    0.04    6.57
> R) system.time(b <- fastPOSIXct(ts, "GMT"))
>    user  system elapsed
>    0.40    0.00    0.41
>
> Once colClasses is implemented in fread, may I suggest allowing fasttime
> as an option?
> Concerning fastmatch, the vignette already shows some nice benchmarks. I
> tend to do a lot of selects based on string columns; I am not sure whether
> this is the case for most of us.
>
> My 0.002 cents
> Cheers
>
> ------------------------------
>
> Message: 2
> Date: Mon, 25 Feb 2013 14:26:28 -0800
> From: Victor Kryukov <victor.kryukov at gmail.com>
> To: datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] Potential bug with sorting/summarizing by
>         POSIXct and logical column
> Message-ID:
>         <CANJmMqTdpKGL3Bq=y-fYCsWDc8uTe3h-g+VoGBV=
> 1X+n5suowA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I've encountered what looks like a bug while sorting by a POSIXct and a
> logical column, which may or may not be related to the following bug:
>
>
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975
>
> Here are all the details:
>
> http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns
>
> Here is the test case:
>
>     library(data.table)
>     library(lubridate)  # provides ymd(), used in the last query below
>
>     # First some data
>     data <- data.table(structure(list(
>       month = structure(c(1356998400, 1356998400, 1356998400, 1359676800,
>                           1354320000, 1359676800, 1359676800, 1356998400,
>                           1356998400, 1354320000, 1354320000, 1354320000,
>                           1359676800, 1359676800, 1359676800, 1356998400,
>                           1359676800, 1359676800, 1356998400, 1359676800,
>                           1359676800, 1359676800, 1359676800, 1354320000,
>                           1354320000),
>                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
>       portal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>                  FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
>                  TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
>       satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L,
>                        8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L,
>                        9L, 10L, 10L)),
>       .Names = c("month", "portal", "satisfaction"),
>       row.names = c(NA, -25L), class = "data.frame"))
>
>     # Summarizing by month and portal with tapply works:
>
>     > tapply(data$satisfaction, list(data$month, data$portal), mean)
>                FALSE      TRUE
>     2012-12-01   8.5  8.000000
>     2013-01-01  10.0 10.000000
>     2013-02-01   9.0  9.545455
>
>     # Summarizing with the 'by' argument of data.table does not:
>
>     > data[, mean(satisfaction), by = 'month,portal']
>     > data[, mean(satisfaction), by = list(month, portal)]
>             month portal        V1
>     1: 2013-01-01  FALSE 10.000000
>     2: 2013-02-01   TRUE  9.000000
>     3: 2013-01-01   TRUE 10.000000
>     4: 2012-12-01  FALSE  8.500000
>     5: 2012-12-01   TRUE  7.333333
>     6: 2013-02-01   TRUE  9.666667
>     7: 2013-02-01  FALSE  9.000000
>     8: 2012-12-01   TRUE 10.000000
>
>     # Summarizing only this year's data works:
>
>     > data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal']
>             month portal        V1
>     1: 2013-01-01   TRUE 10.000000
>     2: 2013-01-01  FALSE 10.000000
>     3: 2013-02-01   TRUE  9.545455
>     4: 2013-02-01  FALSE  9.000000
>
> Yours Sincerely,
> Victor Kryukov
>
> ------------------------------
>
> Message: 3
> Date: Tue, 26 Feb 2013 00:39:09 +0000
> From: Matthew Dowle <mdowle at mdowle.plus.com>
> To: <statquant at outlook.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] About adding fastmatch and fasttime to
>         data.table
> Message-ID: <aed96221d7d28ff8d77ea8823135b49a at imap.plus.net>
> Content-Type: text/plain; charset="utf-8"
>
>
>
> Hi,
>
> This sounds like a great idea. I don't know why Simon U didn't
> reply. It may depend on how you asked, whether he is on holiday
> at the moment, his reaction to the precise wording of your email,
> or some other factor. It is difficult to tell! But we don't need
> to wait for him or for you: this is open source. You have got much
> further than I have, so if you'd like to add this, please go ahead
> and make progress. You're very welcome to join the project and
> commit directly. Or, if you can't for some reason, please file it
> as a feature request so it doesn't get forgotten.
>
> Matthew
>
> ------------------------------
>
> Message: 4
> Date: Tue, 26 Feb 2013 00:40:02 +0000
> From: Michael Nelson <michael.nelson at sydney.edu.au>
> To: "datatable-help at lists.r-forge.r-project.org"
>         <datatable-help at lists.r-forge.r-project.org>
> Subject: Re: [datatable-help] Potential bug with sorting/summarizing
>         by POSIXct      and logical column
> Message-ID:
>         <
> 6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4 at EX-MBX-PRO-04.mcs.usyd.edu.au>
>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I can't replicate this problem using data.table 1.8.7 (installed about 3
> weeks ago) on
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> Michael