[datatable-help] segfault with 1.8.3 when i is keyed, by is used and mult='first' or 'last'
G See
gsee000 at gmail.com
Tue Oct 9 16:05:07 CEST 2012
Hello,
Since I am brand new to data.table and do not know the syntax very
well, I have an advantage in finding segfaults.
Before I get to that, I have a basic question about reproducibility.
How do I provide reproducible data? If I use `dput`, then `dget` or
`source` will give an error like the one that I showed yesterday that
svSocket::evalServer gets:
> dget('~/tmp/v.dput')
Error in parse(file = file) : ~/tmp/v.dput:151:35: unexpected '<'
150: ), class = c("data.table", "data.frame"), sorted = c("Symbol",
151: "TimeStamp"), .internal.selfref = <
^
------
I could use something like `dput(as.data.frame(my.data))`, but that
isn't quite the same and would require me to reset keys and probably
do other things that I do not understand yet.
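
A minimal sketch of that `as.data.frame` round trip, on a small made-up
table rather than the real data. It assumes that `as.data.frame` drops
the `.internal.selfref` attribute (so the `dput` output parses again)
and that the key has to be restored by hand afterwards. The names `dt`,
`f` and `restored` are hypothetical:

```r
library(data.table)

# Toy stand-in for the real data (values are made up for illustration).
dt <- data.table(
  Symbol    = c("EUR/USD", "AUD/USD", "EUR/USD"),
  Bid.Price = c(1.29831, 1.02255, 1.29832),
  TimeStamp = as.POSIXct("2012-10-09 00:16:28", tz = "UTC") + c(0, 0, 1),
  key       = c("Symbol", "TimeStamp")
)

# Write a plain data.frame so the external pointer is not deparsed.
f <- tempfile(fileext = ".dput")
dput(as.data.frame(dt), file = f)

# Read it back, convert, and reset the key that the round trip dropped.
restored <- as.data.table(dget(f))
setkeyv(restored, key(dt))
```

The key is not stored in the dput file this way, so whoever reads the
file needs to be told which columns to pass to `setkeyv`.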
For this e-mail, I attached an RData file which may get stripped by
the mail server (but I'll send directly to Matthew to make sure at
least someone gets it). If the attachment is stripped, you can create
very similar data like this (feel free to scold me for bad code here):
##install.packages("TrueFX", repos='http://r-forge.r-project.org')
library(data.table)
library(TrueFX)
x <- data.table(QueryTrueFX(), key="Symbol,TimeStamp")
v <- x
endtime <- Sys.time() + 1
while(Sys.time() < endtime) {
v <- data.table(rbindlist(list(v, QueryTrueFX())), key='Symbol,TimeStamp')
}
#save(v, file='~/tmp/v.RData')
## If the above isn't sufficient to give a segfault then try letting
## it run for longer; e.g. endtime <- Sys.time() + 10
------
The data.table looks like this:
> load('~/tmp/v.RData')
> v
Symbol Bid.Price Ask.Price High Low TimeStamp
1: AUD/USD 1.02255 1.02262 1.02277 1.01868 2012-10-09 00:16:28.273
2: AUD/USD 1.02255 1.02262 1.02277 1.01868 2012-10-09 00:16:28.273
3: AUD/USD 1.02255 1.02262 1.02277 1.01868 2012-10-09 00:16:28.273
4: AUD/USD 1.02255 1.02262 1.02277 1.01868 2012-10-09 00:16:28.273
5: AUD/USD 1.02255 1.02262 1.02277 1.01868 2012-10-09 00:16:28.273
---
146: USD/JPY 78.36700 78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
147: USD/JPY 78.36700 78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
148: USD/JPY 78.36700 78.37100 78.37400 78.27800 2012-10-09 00:16:28.808
149: USD/JPY 78.36700 78.37200 78.37400 78.27800 2012-10-09 00:16:29.650
150: USD/JPY 78.36700 78.37200 78.37400 78.27800 2012-10-09 00:16:29.650
> tables()
NAME NROW MB COLS
[1,] v 150 1 Symbol,Bid.Price,Ask.Price,High,Low,TimeStamp
KEY
[1,] Symbol,TimeStamp
Total: 1MB
> str(v)
Classes ‘data.table’ and 'data.frame': 150 obs. of 6 variables:
$ Symbol : chr "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
$ Bid.Price: num 1.02 1.02 1.02 1.02 1.02 ...
$ Ask.Price: num 1.02 1.02 1.02 1.02 1.02 ...
$ High : num 1.02 1.02 1.02 1.02 1.02 ...
$ Low : num 1.02 1.02 1.02 1.02 1.02 ...
$ TimeStamp: POSIXct, format: "2012-10-09 00:16:28" "2012-10-09 00:16:28" ...
- attr(*, "sorted")= chr "Symbol" "TimeStamp"
- attr(*, ".internal.selfref")=<externalptr>
I now know that I can get the last value of the Bid.Price of EUR/USD
for each TimeStamp like this:
> v['EUR/USD', last(Bid.Price), by=TimeStamp]
TimeStamp V1
1: 2012-10-09 00:16:28.381 1.29831
2: 2012-10-09 00:16:29.625 1.29831
3: 2012-10-09 00:16:29.750 1.29831
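
For anyone without the attachment, the same idiom can be sketched on a
small synthetic table. The data below is made up and `toy`, `res` and
`res_scan` are hypothetical names; it only shows the keyed lookup with
`last()` next to the equivalent vector scan, and does not use `mult`:

```r
library(data.table)

# Two timestamps, two quotes per symbol at each timestamp (made-up values).
ts <- as.POSIXct("2012-10-09 00:16:28", tz = "UTC") + c(0, 0, 1, 1)
toy <- data.table(
  Symbol    = rep(c("AUD/USD", "EUR/USD"), each = 4),
  Bid.Price = c(1.02255, 1.02256, 1.02257, 1.02258,
                1.29830, 1.29831, 1.29832, 1.29833),
  TimeStamp = rep(ts, 2),
  key       = c("Symbol", "TimeStamp")
)

# Keyed lookup on the first key column, then last Bid.Price per TimeStamp.
res <- toy["EUR/USD", last(Bid.Price), by = TimeStamp]

# Same result via a vector scan on Symbol instead of the key.
res_scan <- toy[Symbol == "EUR/USD", last(Bid.Price), by = TimeStamp]
```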
However, before I tried that, I tried the following, which causes a
segfault with data.table R-Forge Rev. 655, but not with CRAN v1.8.2.
# v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']
------------------------------------------------------------------------------------------------
> v['EUR/USD', Bid.Price, by=TimeStamp, mult='last']
*** caught segfault ***
address 0x104d3e7e0, cause 'memory not mapped'
Traceback:
1: `[.data.table`(v, "EUR/USD", Bid.Price, by = TimeStamp, mult = "last")
2: v["EUR/USD", Bid.Price, by = TimeStamp, mult = "last"]
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
#------------------------------------------------------------------------------
> v["EUR/USD", Bid.Price, keyby='TimeStamp', mult='last']
*** caught segfault ***
address 0x1079639e0, cause 'memory not mapped'
Traceback:
1: `[.data.table`(v, "EUR/USD", Bid.Price, keyby = "TimeStamp",
mult = "last")
2: v["EUR/USD", Bid.Price, keyby = "TimeStamp", mult = "last"]
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
---------------------------------------------------------------------------------------------
# This does NOT segfault
#v[Symbol == 'EUR/USD', Bid.Price, by=TimeStamp, mult='last']
---------------------------------------------------------------------------------------------
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.3
----------------------------------------------------------------------------------------------
Thank you for your time,
Garrett
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v.RData
Type: application/octet-stream
Size: 912 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20121009/a0003ea5/attachment.obj>