[datatable-help] internal FALSE/TRUE value has been modified

Matt Dowle mdowle at mdowle.plus.com
Thu May 1 17:29:34 CEST 2014


Reproduced, thanks for the nice example. I'm not sure yet, but as of
R 3.1, length 1 logical vectors are stored once only, globally, to
avoid many new allocations for the common case of single TRUE or FALSE
values passed around at C or R level (a nice and welcome change).
Since data.table modifies vectors by reference, if that vector is
length 1 then, as from R 3.1, data.table could end up modifying R's
internal value of TRUE or FALSE whenever length 1 logical vectors
occur. Clearly a serious bug. The test suite immediately broke the day 
after the R-devel change was made (good) and was one reason data.table 
was in error state in CRAN checks for quite a while before R 3.1 
shipped.  It was typically tests of 1-row data.tables that included a 
logical column and modified that logical column that broke. We fixed 
that and put in checks to detect and warn if R's internal value has 
been modified, just in case.  Those changes were in v1.9.2 on CRAN.  I 
think I wasn't 100% confident in the detection test (possible false 
positives) so made it a warning instead of an error.  Now that R 3.1 is 
out and we haven't had any false positives, it should be an error.
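
As a rough illustration of the "detect and warn" idea, here is a minimal
base R sketch. The helper name check_logical_constants is hypothetical
and is not part of data.table; how data.table's own check is implemented
is not shown in this message, and an R-level check like this one may not
catch every form of corruption.

```r
# Hypothetical helper (an assumption for illustration, not data.table's code):
# verify that R's logical constants still convert to the expected integers.
check_logical_constants <- function() {
  true_ok  <- identical(as.integer(TRUE), 1L)
  false_ok <- identical(as.integer(FALSE), 0L)
  if (!(true_ok && false_ok)) {
    stop("internal TRUE/FALSE value appears to have been modified")
  }
  invisible(TRUE)
}

# Call it after an operation that is suspected of writing to shared values.
check_logical_constants()
```

Raising an error rather than a warning matches the direction described
above: once the constants are damaged, later results cannot be trusted.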

The distinctive feature of this upc_table is that all the groups are size 1 :

 > upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 1

If we change the example so that one group has more than 1 row, it works 
ok :

 > upc_table = data.table(upc=c(1:99998,1,1), upc_ver_uc=rep(c(1,2), 
times=50000), is_PL=rep(c(T, F, F, T), each=25000), 
product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
 > upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 2
 > upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc, 
upc_ver_uc)]

So it seems the problem is in the single allocation of working memory 
for the largest group, when that group has just 1 row and the result 
contains a logical column.  Odd, I would have sworn we caught that! 
Will fix.
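
The failing shape can be turned into a small regression check. The
sketch below is only an assumption about what such a test might look
like (the table dt and its columns are made up and much smaller than
upc_table): every group has one row, a logical column is selected, and
the query should complete without any warning.

```r
library(data.table)

# Every (id, ver) group has exactly one row, and one selected column is
# logical, mirroring the shape of the reported example on a small scale.
dt <- data.table(id = 1:8, ver = rep(c(1, 2), times = 4),
                 flag = rep(c(TRUE, FALSE, FALSE, TRUE), each = 2),
                 code = rep(1:4, times = 2))

max_group_size <- dt[, .N, by = list(id, ver)][, max(N)]

# Collect any warnings raised by the grouped query instead of printing them.
warnings_seen <- character(0)
res <- withCallingHandlers(
  dt[, list(flag, code), keyby = list(id, ver)],
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

stopifnot(max_group_size == 1L)          # all groups are size 1
stopifnot(length(warnings_seen) == 0L)   # no "internal TRUE value" warning
stopifnot(identical(res$flag, dt$flag))  # logical column came through intact
stopifnot(identical(TRUE, !FALSE))       # logical constants still behave
```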

R-devel are planning to do more of this small-object-sharing for common 
single integer values e.g. 0-10,  so we'll need to add more tests 
accordingly.
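
A test for the integer case might look something like the sketch below.
It assumes the same size-1-group shape is the one to exercise; the table
small_ints is hypothetical, and whether this would actually trigger a
problem depends on how R-devel ends up sharing small integers.

```r
library(data.table)

# One row per group, with the integer values 0 to 10 in the grouped column.
small_ints <- data.table(id = 1:11, n = 0:10)
res_ints <- small_ints[, list(n), keyby = id]

# The grouped result should reproduce the input values unchanged ...
stopifnot(identical(res_ints$n, 0:10))
# ... and integer literals evaluated afterwards should still be themselves.
stopifnot(identical(c(0L, 1L, 2L, 10L) + 0L, c(0L, 1L, 2L, 10L)))
```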

Thanks,
Matt



On 01/05/14 05:40, James Sams wrote:
> I don't really know what this error message means. A quick example to 
> show what I'm seeing:
>
> > library(data.table)
> data.table 1.9.3  For help type: help("data.table")
> > upc_table = data.table(upc=1:100000, upc_ver_uc=rep(c(1,2), 
> times=50000), is_PL=rep(c(T, F, F, T), each=25000), 
> product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> > upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc, 
> upc_ver_uc)]
> Warning message:
> In `[.data.table`(upc_table, , list(is_PL, product_module_code), :
>   internal TRUE value has been modified
>
> When I continue using R, I eventually start getting more errors, such as:
>
> Error in gettext(domain, unlist(args)) : invalid 'string' value
> Error during wrapup: invalid 'string' value
>
> and then terminal input/output becomes corrupted. I only start getting 
> these error messages once I start using data.table; but the messages 
> don't necessarily occur only with data.table functions.
>
> I don't know if the last statement above is executing correctly or 
> not. I'm rather confused as to what is going on. I was using a 
> somewhat stale (maybe a couple of weeks old) svn version of 
> data.table; but I see the same behavior with the latest data.table 
> (r1263). I'm using CRAN's R 3.1 package for Ubuntu on 13.10 and 14.04.
>
>
>
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C 
> LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 
> LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C LC_ADDRESS=C               
> LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
> [1] data.table_1.9.3
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  stringr_0.6.2
>


