[datatable-help] datatable-help Digest, Vol 36, Issue 8

Matthew Dowle mdowle at mdowle.plus.com
Tue Feb 26 11:47:02 CET 2013


 

Thanks. Have added that (1970 potential issue) to statquant's FR to
follow up...


https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2582&group_id=240&atid=978


On 26.02.2013 00:46, Alexander Chernyakov wrote: 

> Regarding fasttime: my understanding is that it only works after 1970.
> 
> On Mon, Feb 25, 2013 at 7:41 PM,
> <datatable-help-request at lists.r-forge.r-project.org [32]> wrote:
> 
>> Send datatable-help mailing list submissions to
>> datatable-help at lists.r-forge.r-project.org [1]
>> 
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]
>> 
>> or, via email, send a message with subject or body 'help' to
>> datatable-help-request at lists.r-forge.r-project.org [3]
>> 
>> You can reach the person managing the list at
>> datatable-help-owner at lists.r-forge.r-project.org [4]
>> 
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of datatable-help digest..."
>> 
>> Today's Topics:
>> 
>> 1. About adding fastmatch and fasttime to data.table (stat quant)
>> 2. Potential bug with sorting/summarizing by POSIXct and logical
>>    column (Victor Kryukov)
>> 3. Re: About adding fastmatch and fasttime to data.table
>>    (Matthew Dowle)
>> 4. Re: Potential bug with sorting/summarizing by POSIXct and
>>    logical column (Michael Nelson)
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Mon, 25 Feb 2013 19:40:35 +0100
>> From: stat quant <statquant at outlook.com [5]>
>> To: datatable-help at lists.r-forge.r-project.org [6]
>> Subject: [datatable-help] About adding fastmatch and fasttime to
>> data.table
>> Message-ID:
>> <CAJJHHA9qL8hURXF0+8OnPaD1t7Y5csoOLX7qDKNUqXc1XpmGCA at mail.gmail.com [7]>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Hello list,
>> 
>> Looking at fastmatch and fasttime, I realized that each of those packages
>> consists of a single C file.
>> We spoke about the possibility of adding them to data.table. I tried to
>> contact S. Urbanek without success, so I have no feedback from his side.
>> Using fastPOSIXct provides a huge gain when one has to load files with
>> datetimes: on my laptop, using data.table:::fread, I realized that most of
>> the time is spent casting datetimes to POSIXct (I have several columns).
>> 
>> Looking at fasttime, you can see a pretty good improvement (roughly a
>> factor of 15):
>> 
>> R) ts R) system.time(a
>>    user  system elapsed
>>    6.49    0.04    6.57
>> R) system.time(b
>>    user  system elapsed
>>    0.40    0.00    0.41
>> 
>> When colClasses is implemented in fread, may I suggest allowing fasttime
>> to be used as an option?
>> Concerning fastmatch, the vignette already shows some nice benchmarks. I
>> tend to do a lot of selects based on string columns; I'm not sure whether
>> this is the case for most of us.
>> 
>> My 0.002 cent
>> Cheers
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130225/f45e5d57/attachment-0001.html [8]>
>> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Mon, 25 Feb 2013 14:26:28 -0800
>> From: Victor Kryukov <victor.kryukov at gmail.com [9]>
>> To: datatable-help at lists.r-forge.r-project.org [10]
>> Subject: [datatable-help] Potential bug with sorting/summarizing by
>> POSIXct and logical column
>> Message-ID:
>> 1X+n5suowA at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Hello,
>> 
>> I've encountered what looks like a bug while sorting by a POSIXct and a
>> logical column, which may or may not be related to the following bug:
>> 
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975 [11]
>> 
>> Here are all the details:
>> http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns [12]
>> 
>> Here is the test case:
>> 
>> # First some data
>> data
>> month = structure(c(1356998400, 1356998400, 1356998400,
>> 1359676800, 1354320000, 1359676800, 1359676800,
>> 1356998400, 1356998400,
>> 1354320000, 1354320000, 1354320000, 1359676800,
>> 1359676800, 1359676800,
>> 1356998400, 1359676800, 1359676800, 1356998400,
>> 1359676800, 1359676800,
>> 1359676800, 1359676800, 1354320000, 1354320000),
>> class = c("POSIXct", "POSIXt"), tzone = "UTC"),
>> portal = c(TRUE, TRUE, FALSE, TRUE,
>> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE,
>> FALSE,
>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>> TRUE, TRUE
>> ),
>> satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L,
>> 9L, 2L, 8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L,
>> 9L, 10L, 9L,
>> 10L, 10L)),
>> .Names = c("month", "portal", "satisfaction"),
>> row.names = c(NA, -25L), class = "data.frame"))
>> 
>> # Summarizing by month, portal with tapply works:
>> 
>> > tapply(data$satisfaction, list(data$month, data$portal), mean)
>>            FALSE      TRUE
>> 2012-12-01   8.5  8.000000
>> 2013-01-01  10.0 10.000000
>> 2013-02-01   9.0  9.545455
>> 
>> # Summarizing with the 'by' argument of data.table does not:
>> 
>> > data[, mean(satisfaction), by = 'month,portal']
>> > data[, mean(satisfaction), by = list(month, portal)]
>>         month portal        V1
>> 1: 2013-01-01  FALSE 10.000000
>> 2: 2013-02-01   TRUE  9.000000
>> 3: 2013-01-01   TRUE 10.000000
>> 4: 2012-12-01  FALSE  8.500000
>> 5: 2012-12-01   TRUE  7.333333
>> 6: 2013-02-01   TRUE  9.666667
>> 7: 2013-02-01  FALSE  9.000000
>> 8: 2012-12-01   TRUE 10.000000
>> 
>> # Summarizing only this year's data works:
>> data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal']
>>         month portal        V1
>> 1: 2013-01-01   TRUE 10.000000
>> 2: 2013-01-01  FALSE 10.000000
>> 3: 2013-02-01   TRUE  9.545455
>> 4: 2013-02-01  FALSE  9.000000
>> 
>> Yours Sincerely,
>> Victor Kryukov
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130225/45b99e3e/attachment-0001.html [13]>
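One way to see what the grouped result in Victor's report should look like, independently of data.table, is to compute the same per-group means in base R. The sketch below is a hypothetical cross-check, not part of the original thread: it rebuilds the 25 rows by hand from the values in the test case (the assignment line that created `data` was lost in the digest), groups on a character month key, and checks that each (month, portal) pair appears only once. The names `df`, `month_key`, `expected` and `has_duplicate_groups` are illustrative assumptions.

```r
# Hypothetical base-R cross-check of the expected per-group means.
# Values are copied from the test case above; no data.table is used.
month <- as.POSIXct(c(1356998400, 1356998400, 1356998400, 1359676800,
                      1354320000, 1359676800, 1359676800, 1356998400,
                      1356998400, 1354320000, 1354320000, 1354320000,
                      1359676800, 1359676800, 1359676800, 1356998400,
                      1359676800, 1359676800, 1356998400, 1359676800,
                      1359676800, 1359676800, 1359676800, 1354320000,
                      1354320000),
                    origin = "1970-01-01", tz = "UTC")
portal <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
            TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
            TRUE, TRUE, TRUE, TRUE, TRUE)
satisfaction <- c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L, 8L,
                  10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L, 10L, 10L)

df <- data.frame(month = month, portal = portal, satisfaction = satisfaction)
# Group on a plain character key so the check does not depend on how
# POSIXct grouping columns are handled.
df$month_key <- format(df$month, "%Y-%m-%d", tz = "UTC")

# aggregate() returns one row per distinct (month_key, portal) combination.
expected <- aggregate(satisfaction ~ month_key + portal, data = df, FUN = mean)
expected <- expected[order(expected$month_key, expected$portal), ]
rownames(expected) <- NULL
print(expected)

# Hypothetical helper: TRUE if any grouping pair occurs more than once,
# which is the symptom visible in the 8-row data.table output above.
has_duplicate_groups <- function(res, keys) anyDuplicated(res[keys]) > 0
print(has_duplicate_groups(expected, c("month_key", "portal")))
```

With 3 months and 2 portal values there should be exactly six rows, agreeing with the tapply() table; a grouped result with more rows, such as 2012-12-01/TRUE appearing twice, points to the grouping problem being reported.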
>> 
>> ------------------------------
>> 
>> Message: 3
>> Date: Tue, 26 Feb 2013 00:39:09 +0000
>> From: Matthew Dowle <mdowle at mdowle.plus.com [14]>
>> To: <statquant at outlook.com [15]>
>> Cc: datatable-help at lists.r-forge.r-project.org [16]
>> Subject: Re: [datatable-help] About adding fastmatch and fasttime to
>> data.table
>> Message-ID: <aed96221d7d28ff8d77ea8823135b49a at imap.plus.net [17]>
>> Content-Type: text/plain; charset="utf-8"
>> 
>> Hi,
>> 
>> This sounds like a great idea. I don't know why Simon U didn't reply;
>> that may depend on the way you asked, whether he is on holiday at the
>> moment, his reaction to the precise wording of the email you wrote, or
>> some other factor. It is difficult to tell! But we don't need to wait
>> for him or for you: this is open source. You have got much further than
>> I have, so if you'd like to add this, please go ahead and make progress.
>> You're very welcome to join the project and commit directly. Or, if you
>> can't for some reason, please file it as a feature request so it doesn't
>> get forgotten.
>> 
>> Matthew
>> 
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130226/643480c3/attachment-0001.html [18]>
>> 
>> ------------------------------
>> 
>> Message: 4
>> Date: Tue, 26 Feb 2013 00:40:02 +0000
>> From: Michael Nelson <michael.nelson at sydney.edu.au [19]>
>> To: "datatable-help at lists.r-forge.r-project.org [20]"
>> <datatable-help at lists.r-forge.r-project.org [21]>
>> Subject: Re: [datatable-help] Potential bug with sorting/summarizing
>> by POSIXct and logical column
>> Message-ID:
>> <6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4 at EX-MBX-PRO-04.mcs.usyd.edu.au [22]>
>> 
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> I can't replicate this problem using data.table 1.8.7 (installed about
>> 3 weeks ago) on
>> R version 2.15.2 (2012-10-26)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>> 
>> Michael
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130226/c1945761/attachment.html [29]>
>> 
>> ------------------------------
>> 
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org [30]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [31]
>> 
>> End of datatable-help Digest, Vol 36, Issue 8
>> *********************************************

 

Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:datatable-help-request at lists.r-forge.r-project.org
[4] mailto:datatable-help-owner at lists.r-forge.r-project.org
[5] mailto:statquant at outlook.com
[6] mailto:datatable-help at lists.r-forge.r-project.org
[7] mailto:CAJJHHA9qL8hURXF0%2B8OnPaD1t7Y5csoOLX7qDKNUqXc1XpmGCA at mail.gmail.com
[8] http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130225/f45e5d57/attachment-0001.html
[9] mailto:victor.kryukov at gmail.com
[10] mailto:datatable-help at lists.r-forge.r-project.org
[11] https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975
[12] http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns
[13] http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130225/45b99e3e/attachment-0001.html
[14] mailto:mdowle at mdowle.plus.com
[15] mailto:statquant at outlook.com
[16] mailto:datatable-help at lists.r-forge.r-project.org
[17] mailto:aed96221d7d28ff8d77ea8823135b49a at imap.plus.net
[18] http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130226/643480c3/attachment-0001.html
[19] mailto:michael.nelson at sydney.edu.au
[20] mailto:datatable-help at lists.r-forge.r-project.org
[21] mailto:datatable-help at lists.r-forge.r-project.org
[22] mailto:6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4 at EX-MBX-PRO-04.mcs.usyd.edu.au
[23] mailto:datatable-help-bounces at lists.r-forge.r-project.org
[24] mailto:datatable-help-bounces at lists.r-forge.r-project.org
[25] mailto:victor.kryukov at gmail.com
[26] mailto:datatable-help at lists.r-forge.r-project.org
[27] https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975
[28] http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns
[29] http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130226/c1945761/attachment.html
[30] mailto:datatable-help at lists.r-forge.r-project.org
[31] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[32] mailto:datatable-help-request at lists.r-forge.r-project.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130226/5757a7e6/attachment-0001.html>

