[datatable-help] segfault with 1.8.3 when i is keyed, by is used and mult='first' or 'last'

Matthew Dowle mdowle at mdowle.plus.com
Tue Oct 9 16:38:12 CEST 2012


Hi Garrett,

That looks reproducible to me. Good stuff. Anything that can be pasted
into a fresh R session, or a file that can be sourced, counts as
reproducible. Sometimes that isn't possible, in which case just do the
best you can. If you really think someone can guess the answer without a
reproducible example, then that's ok - just say that's what you're doing.
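
On the dput question further down: the parse error comes from the
.internal.selfref attribute, which is an external pointer and has no
text form that dget can read back. A minimal sketch of one way around
it, assuming a small stand-in table (dt, dt2, f and k are hypothetical
names, not your data), is to dput a plain data.frame and restore the
key after reading it back:

```r
library(data.table)

# Hypothetical stand-in for the real table; values are made up.
dt <- data.table(Symbol    = c("EUR/USD", "AUD/USD", "EUR/USD"),
                 Bid.Price = c(1.29831, 1.02255, 1.29830),
                 key       = "Symbol")

# Remember the key, then write a plain data.frame, which carries no
# .internal.selfref attribute and so deparses to valid R code.
k <- key(dt)
f <- tempfile(fileext = ".dput")
dput(as.data.frame(dt), file = f)

# Read it back, convert to data.table and reapply the key.
dt2 <- as.data.table(dget(f))
setkeyv(dt2, k)
```

The key is lost on the way through as.data.frame, so whoever reads the
file needs to be told which columns to pass to setkeyv.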

This one is great. And you've saved me quite a bit of time by testing in
1.8.2 vs 1.8.3, so that narrows it down.

Please file a bug.report(package="data.table") so it doesn't get
forgotten. It might take me some time to get to it.
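
For anyone without the attachment or TrueFX, something like the
following might stand in for v. It is only a sketch under stated
assumptions: the prices and timestamps are invented, only three of the
columns are kept, and I have not confirmed that synthetic data triggers
the crash. It runs the two forms reported as working and leaves the
crashing call commented out.

```r
library(data.table)

# Made-up data shaped like the quoted table: three symbols, six ticks
# each, two ticks per timestamp.
syms <- c("AUD/USD", "EUR/USD", "USD/JPY")
t0   <- as.POSIXct("2012-10-09 00:16:28", tz = "UTC")
v <- data.table(
  Symbol    = rep(syms, each = 6),
  Bid.Price = rep(c(1.02255, 1.29831, 78.367), each = 6) +
              rep(0:5, times = 3) * 1e-5,
  TimeStamp = t0 + rep(c(0, 0, 1, 1, 2, 2), times = 3)
)
setkeyv(v, c("Symbol", "TimeStamp"))

# Keyed lookup in i, taking the last bid within each TimeStamp group.
res_keyed <- v["EUR/USD", last(Bid.Price), by = TimeStamp]

# Same result via a logical subset in i instead of the key.
res_scan <- v[Symbol == "EUR/USD", last(Bid.Price), by = TimeStamp]

# The call reported to segfault on 1.8.3 (left commented out):
# v["EUR/USD", Bid.Price, by = TimeStamp, mult = "last"]
```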

Matthew

> Hello,
>
> Since I am brand new to data.table and do not know the syntax very
> well, I have an advantage in finding segfaults.
>
> Before I get to that, I have a basic question about reproducibility.
>
> How do I provide reproducible data?  If I use `dput`, then `dget` or
> `source` gives a parse error, like the one I showed yesterday from
> svSocket::evalServer:
>
>     > dget('~/tmp/v.dput')
>     Error in parse(file = file) : ~/tmp/v.dput:151:35: unexpected '<'
>     150: ), class = c("data.table", "data.frame"), sorted = c("Symbol",
>     151: "TimeStamp"), .internal.selfref = <
>                                       ^
> ------
> I could use something like `dput(as.data.frame(my.data))`, but that
> isn't quite the same and would require me to reset keys and probably
> do other things that I do not understand yet.
>
> For this e-mail, I attached an RData file which may get stripped by
> the mail server (but I'll send directly to Matthew to make sure at
> least someone gets it).  If the attachment is stripped, you can create
> very similar data like this (feel free to scold me for bad code here):
>
> ##install.packages("TrueFX", repos='http://r-forge.r-project.org')
> library(TrueFX)
> x <- data.table(QueryTrueFX(), key="Symbol,TimeStamp")
> v <- x
> endtime <- Sys.time() + 1
> while(Sys.time() < endtime) {
>   v <- data.table(rbindlist(list(v, QueryTrueFX())),
> key='Symbol,TimeStamp')
> }
> #save(v, file='~/tmp/v.RData')
>
> ## If the above isn't sufficient to give a segfault then try letting
> ## it run for longer; e.g. endtime <- Sys.time() + 10
> ------
>
> The data.table looks like this:
>> load('~/tmp/v.RData')
>> v
>       Symbol Bid.Price Ask.Price     High      Low               TimeStamp
>   1: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
>   2: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
>   3: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
>   4: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
>   5: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
>  ---
> 146: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
> 147: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
> 148: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
> 149: USD/JPY  78.36700  78.37200 78.37400 78.27800 2012-10-09 00:16:29.650
> 150: USD/JPY  78.36700  78.37200 78.37400 78.27800 2012-10-09 00:16:29.650
>
>> tables()
>      NAME NROW MB COLS
> [1,] v     150 1  Symbol,Bid.Price,Ask.Price,High,Low,TimeStamp
>      KEY
> [1,] Symbol,TimeStamp
> Total: 1MB
>
>> str(v)
> Classes ‘data.table’ and 'data.frame':	150 obs. of  6 variables:
>  $ Symbol   : chr  "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
>  $ Bid.Price: num  1.02 1.02 1.02 1.02 1.02 ...
>  $ Ask.Price: num  1.02 1.02 1.02 1.02 1.02 ...
>  $ High     : num  1.02 1.02 1.02 1.02 1.02 ...
>  $ Low      : num  1.02 1.02 1.02 1.02 1.02 ...
>  $ TimeStamp: POSIXct, format: "2012-10-09 00:16:28" "2012-10-09 00:16:28"
> ...
>  - attr(*, "sorted")= chr  "Symbol" "TimeStamp"
>  - attr(*, ".internal.selfref")=<externalptr>
>
>
> I now know that I can get the last value of the Bid.Price of EUR/USD
> for each TimeStamp like this:
>
>> v['EUR/USD', last(Bid.Price), by=TimeStamp]
>                  TimeStamp      V1
> 1: 2012-10-09 00:16:28.381 1.29831
> 2: 2012-10-09 00:16:29.625 1.29831
> 3: 2012-10-09 00:16:29.750 1.29831
>
> However, before I tried that, I tried the following, which causes a
> segfault with data.table R-Forge Rev. 655, but not with CRAN v1.8.2:
> # v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']
>
> ------------------------------------------------------------------------------------------------
>> v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']
>
>  *** caught segfault ***
> address 0x104d3e7e0, cause 'memory not mapped'
>
> Traceback:
>  1: `[.data.table`(v, "EUR/USD", Bid.Price, by = TimeStamp, mult = "last")
>  2: v["EUR/USD", Bid.Price, by = TimeStamp, mult = "last"]
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:
>
> #------------------------------------------------------------------------------
>> v["EUR/USD", Bid.Price, keyby='TimeStamp', mult='last']
>
>  *** caught segfault ***
> address 0x1079639e0, cause 'memory not mapped'
>
> Traceback:
>  1: `[.data.table`(v, "EUR/USD", Bid.Price, keyby = "TimeStamp",
> mult = "last")
>  2: v["EUR/USD", Bid.Price, keyby = "TimeStamp", mult = "last"]
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:
>
> ---------------------------------------------------------------------------------------------
> # This does NOT segfault
> #v[Symbol == 'EUR/USD', Bid.Price, by=TimeStamp, mult='last']
> ---------------------------------------------------------------------------------------------
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] data.table_1.8.3
>
> ----------------------------------------------------------------------------------------------
>
> Thank you for your time,
> Garrett
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



