[datatable-help] segfault with 1.8.3 when i is keyed, by is used and mult='first' or 'last'

G See gsee000 at gmail.com
Tue Oct 9 16:05:07 CEST 2012


Hello,

Since I am brand new to data.table and do not know the syntax very
well, I have an advantage in finding segfaults.

Before I get to that, I have a basic question about reproducibility.

How do I provide reproducible data?  If I use `dput`, then `dget` or
`source` fails with a parse error like the one I showed yesterday
from svSocket::evalServer:

    > dget('~/tmp/v.dput')
    Error in parse(file = file) : ~/tmp/v.dput:151:35: unexpected '<'
    150: ), class = c("data.table", "data.frame"), sorted = c("Symbol",
    151: "TimeStamp"), .internal.selfref = <
                                      ^
------
I could use something like `dput(as.data.frame(my.data))`, but that
isn't quite the same and would require me to reset keys and probably
do other things that I do not understand yet.
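
A minimal sketch of that `as.data.frame` workaround, assuming the
conversion drops the `.internal.selfref` pointer so that the `dput`
output parses again.  The toy table below is a made-up stand-in for the
real quotes (its values and the temporary file are illustrative only),
and the key has to be restored by hand after reading it back:

```r
library(data.table)

# Hypothetical stand-in for the real quotes; the values are invented.
v <- data.table(
  Symbol    = c("AUD/USD", "EUR/USD", "EUR/USD"),
  Bid.Price = c(1.02255, 1.29831, 1.29832),
  TimeStamp = as.POSIXct("2012-10-09 00:16:28", tz = "UTC") + c(0, 1, 2),
  key = "Symbol,TimeStamp"
)

# Write the data out as a plain data.frame, so that no external pointer
# ends up in the dput text.
f <- tempfile(fileext = ".dput")
dput(as.data.frame(v), file = f)

# Read it back, convert to a data.table again and restore the key manually.
v2 <- as.data.table(dget(f))
setkey(v2, Symbol, TimeStamp)
```

This round trip is not guaranteed to preserve everything relevant to a
bug, such as attributes beyond the key, so an RData file remains the
safer way to share the exact object.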

For this e-mail, I attached an RData file which may get stripped by
the mail server (but I'll send directly to Matthew to make sure at
least someone gets it).  If the attachment is stripped, you can create
very similar data like this (feel free to scold me for bad code here):

##install.packages("TrueFX", repos='http://r-forge.r-project.org')
library(data.table)
library(TrueFX)
x <- data.table(QueryTrueFX(), key="Symbol,TimeStamp")
v <- x
endtime <- Sys.time() + 1
while(Sys.time() < endtime) {
  v <- data.table(rbindlist(list(v, QueryTrueFX())), key='Symbol,TimeStamp')
}
#save(v, file='~/tmp/v.RData')

## If the above isn't sufficient to give a segfault, then try letting
## it run for longer; e.g. endtime <- Sys.time() + 10
------

The data.table looks like this:
> load('~/tmp/v.RData')
> v
      Symbol Bid.Price Ask.Price     High      Low               TimeStamp
  1: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
  2: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
  3: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
  4: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
  5: AUD/USD   1.02255   1.02262  1.02277  1.01868 2012-10-09 00:16:28.273
 ---
146: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
147: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
148: USD/JPY  78.36700  78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
149: USD/JPY  78.36700  78.37200 78.37400 78.27800 2012-10-09 00:16:29.650
150: USD/JPY  78.36700  78.37200 78.37400 78.27800 2012-10-09 00:16:29.650

> tables()
     NAME NROW MB COLS
[1,] v     150 1  Symbol,Bid.Price,Ask.Price,High,Low,TimeStamp
     KEY
[1,] Symbol,TimeStamp
Total: 1MB

> str(v)
Classes ‘data.table’ and 'data.frame':	150 obs. of  6 variables:
 $ Symbol   : chr  "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
 $ Bid.Price: num  1.02 1.02 1.02 1.02 1.02 ...
 $ Ask.Price: num  1.02 1.02 1.02 1.02 1.02 ...
 $ High     : num  1.02 1.02 1.02 1.02 1.02 ...
 $ Low      : num  1.02 1.02 1.02 1.02 1.02 ...
 $ TimeStamp: POSIXct, format: "2012-10-09 00:16:28" "2012-10-09 00:16:28" ...
 - attr(*, "sorted")= chr  "Symbol" "TimeStamp"
 - attr(*, ".internal.selfref")=<externalptr>


I now know that I can get the last value of the Bid.Price of EUR/USD
for each TimeStamp like this:

> v['EUR/USD', last(Bid.Price), by=TimeStamp]
                 TimeStamp      V1
1: 2012-10-09 00:16:28.381 1.29831
2: 2012-10-09 00:16:29.625 1.29831
3: 2012-10-09 00:16:29.750 1.29831
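
For readers without the attachment, the same `last()`-per-group query
can be sketched on synthetic data.  The table `quotes` and its prices
below are invented for illustration and are not the TrueFX data; only
the column names and the key follow the table shown above:

```r
library(data.table)

# Hypothetical quotes: two symbols, two timestamps each, two ticks per timestamp.
quotes <- data.table(
  Symbol    = rep(c("AUD/USD", "EUR/USD"), each = 4),
  TimeStamp = rep(as.POSIXct("2012-10-09 00:16:28", tz = "UTC") + c(0, 0, 1, 1), 2),
  Bid.Price = c(1.02255, 1.02256, 1.02257, 1.02258,
                1.29831, 1.29832, 1.29833, 1.29834),
  key = "Symbol,TimeStamp"
)

# Select the EUR/USD rows through the key, then take the last Bid.Price
# within each TimeStamp group.
res <- quotes["EUR/USD", last(Bid.Price), by = TimeStamp]
print(res)
```

Here `last()` is applied in `j`, once per `by` group, so this form does
not use the `mult` argument at all.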

However, before I found the `last(Bid.Price)` form, I tried the
following, which segfaults with data.table R-Forge Rev. 655 but not
with CRAN v1.8.2:
# v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']

------------------------------------------------------------------------------------------------
> v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']

 *** caught segfault ***
address 0x104d3e7e0, cause 'memory not mapped'

Traceback:
 1: `[.data.table`(v, "EUR/USD", Bid.Price, by = TimeStamp, mult = "last")
 2: v["EUR/USD", Bid.Price, by = TimeStamp, mult = "last"]

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

#------------------------------------------------------------------------------
> v["EUR/USD", Bid.Price, keyby='TimeStamp', mult='last']

 *** caught segfault ***
address 0x1079639e0, cause 'memory not mapped'

Traceback:
 1: `[.data.table`(v, "EUR/USD", Bid.Price, keyby = "TimeStamp", mult = "last")
 2: v["EUR/USD", Bid.Price, keyby = "TimeStamp", mult = "last"]

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

---------------------------------------------------------------------------------------------
# This does NOT segfault
#v[Symbol == 'EUR/USD', Bid.Price, by=TimeStamp, mult='last']
---------------------------------------------------------------------------------------------

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.3

----------------------------------------------------------------------------------------------

Thank you for your time,
Garrett
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v.RData
Type: application/octet-stream
Size: 912 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20121009/a0003ea5/attachment.obj>

