[datatable-help] ":=" with "by" reassignment/updation + adding new column leads to crash

Matthew Dowle mdowle at mdowle.plus.com
Mon May 13 03:31:23 CEST 2013


 

Hi, 

Now fixed in v1.8.9 : 

  o  Mixing adding and updating into one
     DT[, `:=`(existingCol=...,newCol=...), by=...] now works without error
     or segfault, #2778 and #2528. Many thanks to Arunkumar Srinivasan for
     reporting both with reproducible examples. Tests added.
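
For anyone wanting to confirm the fix locally, here is a minimal sketch that replays the two steps from the report quoted below. It assumes data.table 1.8.9 or later is installed; the version check and the final print are additions for illustration and are not part of the package's own tests.

```r
library(data.table)

# Assumption: the fix described above is present from v1.8.9 onwards.
stopifnot(packageVersion("data.table") >= "1.8.9")

# Same toy table as in the report: two groups in x.
DT <- data.table(x = rep(1:2, c(3, 2)), y = 6:10)

# Step 1: add a new column z by reference, grouped by x.
DT[, `:=`(z = .GRP), by = x]

# Step 2: in a single grouped call, update the existing column z
# and add a new column w. This is the combination that crashed in 1.8.8.
DT[, `:=`(z = .N, w = 2), by = x]

# z should now hold the group sizes (3, 3, 3, 2, 2) and w should be 2 in every row.
print(DT)
```

On an affected version (1.8.8), do not run step 2 in a session with unsaved work, since the report shows it taking down the whole R session.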

Matthew 

On 12.05.2013 11:44, Matthew Dowle wrote:

> Hi, 
> 
> Yes I get that in latest dev too. Thanks for the nice example, please file.
>
> Matthew
>
> On 12.05.2013 08:53, Arunkumar Srinivasan wrote: 
> 
>> Hi, 
>>
>> I just discovered a weird R session crash in data.table. Here's an
>> example to reproduce the crash. I did not find any bug filed regarding
>> this issue. Maybe others can verify this? Then I'll file it as a bug.
>>
>> The issue is this. Suppose you have a data.table with two columns, x
>> and y, as follows:
>>
>> require(data.table)
>> DT <- data.table(x = rep(1:2, c(3,2)), y = 6:10)
>> 
>>    x  y
>> 1: 1  6
>> 2: 1  7
>> 3: 1  8
>> 4: 2  9
>> 5: 2 10
>>
>> Now you want to add a new column "z" by reference, grouped by "x". So
>> you'd do:
>>
>> DT[, `:=`(z = .GRP), by = x]
>> 
>>    x  y z
>> 1: 1  6 1
>> 2: 1  7 1
>> 3: 1  8 1
>> 4: 2  9 2
>> 5: 2 10 2
>>
>> Now, for the sake of reproducing this error, assume that you assigned
>> "z" the wrong value and want to change it, but you have also realised
>> that you want to add another column "w" as well. So you go ahead and do
>> the following (remember to run the previous step first, then this one):
>>
>> DT[, `:=`(z = .N, w = 2), by = x] # R session crashes
>>
>> Here, both the R and RStudio sessions crash with the traceback message:
>>
>> *** caught segfault ***
>> address 0x0, cause 'memory not mapped'
>>
>> Traceback:
>> 1: `[.data.table`(DT, , `:=`(z = .GRP, w = 2), by = x)
>> 2: DT[, `:=`(z = .GRP, w = 2), by = x]
>>
>> This, on the other hand, works as expected if you assign both columns
>> the first time:
>>
>> require(data.table)
>> DT <- data.table(x = rep(1:2, c(3,2)), y = 6:10)
>> DT[, `:=`(z = .GRP, w = 2), by = x] # works fine
>>
>> That is, if you assign by reference (:=) with "by" and re-assign a
>> variable while also creating another variable, there seems to be a
>> segfault. This error may not be limited to this case, but it is the
>> only one I've tested.
>>
>> Here's my sessionInfo() from before the crash:
>>
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] data.table_1.8.8
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.0.0
>>
>> Best,
>> Arun

 