[datatable-help] internal FALSE/TRUE value has been modified
Matt Dowle
mdowle at mdowle.plus.com
Thu May 1 17:29:34 CEST 2014
Reproduced, thanks for the nice example. I'm not sure yet, but here is
what I think is happening. R 3.1 now stores length 1 logical vectors
once only, globally, to avoid many new allocations in the common case of
single TRUE or FALSE values passed around at C or R level (a nice and
welcome change). Since data.table modifies vectors by reference, a bug
that modifies a length 1 logical vector can, as of R 3.1, end up
modifying R's internal value of TRUE or FALSE. Clearly a serious bug.
The test suite broke the day after the R-devel change was made (good),
and that was one reason data.table was in error state in CRAN checks for
quite a while before R 3.1 shipped. It was typically tests on 1-row
data.tables that include a logical column and modify that column that
broke. We fixed that and put in checks to detect and warn if R's
internal value has been modified, just in case. Those changes were in
v1.9.2 on CRAN. I think I wasn't 100% confident in the detection test
(possible false positives) so made it a warning instead of an error.
Now that R 3.1 is out and we haven't had any false positives, it should
be an error.
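
To illustrate the kind of check described above, here is a minimal
R-level sketch. It is not data.table's actual detection code, and the
helper name constants_intact is hypothetical. It only shows the idea of
asking whether TRUE and FALSE still hold the values they should:

```r
# Hypothetical helper, for illustration only (not part of data.table).
# Returns TRUE when R's logical constants still behave as expected.
constants_intact <- function() {
  # as.integer() reads the stored value, so a corrupted shared TRUE or
  # FALSE is expected to show up as the wrong integer here.
  t_ok <- identical(as.integer(TRUE), 1L)
  f_ok <- identical(as.integer(FALSE), 0L)
  # Negation should map each constant onto the other.
  n_ok <- identical(!TRUE, FALSE) && identical(!FALSE, TRUE)
  t_ok && f_ok && n_ok
}

if (!constants_intact()) {
  stop("internal TRUE/FALSE value appears to have been modified")
}
```

In a healthy session the check passes silently. Once the shared values
have been overwritten, results computed from them cannot really be
trusted, so the safest response is to restart R.
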
The notable feature of this upc_table is that all the groups are size 1:
> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 1
If we change the example so that one group has more than 1 row, it works
ok :
> upc_table = data.table(upc=c(1:99998,1,1), upc_ver_uc=rep(c(1,2),
times=50000), is_PL=rep(c(T, F, F, T), each=25000),
product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
[1] 2
> upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
upc_ver_uc)]
So it seems the problem is in the single allocation of working memory
for the largest group, when that group has just 1 row and the result
contains a logical column. Odd, I would have sworn we caught that! Will
fix.
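
Until that is fixed, a rough way to see whether a table matches the
pattern above (largest group has 1 row, plus a logical column) might
look like the sketch below. The helper name
all_groups_size_one_with_logical and the small example table are made
up for illustration. It reuses the same .N by group idiom as the
transcript above:

```r
library(data.table)

# Hypothetical helper, for illustration only: TRUE when every group
# defined by by_cols has exactly one row and the table has at least one
# logical column.
all_groups_size_one_with_logical <- function(dt, by_cols) {
  # Size of the largest group, as in the max(N) check above.
  max_n <- dt[, .N, by = by_cols][, max(N)]
  # Does any column hold logical values?
  has_logical <- any(vapply(dt, is.logical, logical(1)))
  max_n == 1L && has_logical
}

# Small made-up table with the same shape as upc_table.
small <- data.table(upc = 1:4, upc_ver_uc = c(1, 2, 1, 2),
                    is_PL = c(TRUE, FALSE, FALSE, TRUE))
all_groups_size_one_with_logical(small, c("upc", "upc_ver_uc"))
```

This only flags tables that fit the description. It says nothing about
whether a particular data.table version is actually affected.
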
R-devel are planning to do more of this small-object-sharing for common
single integer values e.g. 0-10, so we'll need to add more tests
accordingly.
Thanks,
Matt
On 01/05/14 05:40, James Sams wrote:
> I don't really know what this error message means. A quick example to
> show what I'm seeing:
>
> > library(data.table)
> data.table 1.9.3 For help type: help("data.table")
> > upc_table = data.table(upc=1:100000, upc_ver_uc=rep(c(1,2),
> times=50000), is_PL=rep(c(T, F, F, T), each=25000),
> product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> > upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
> upc_ver_uc)]
> Warning message:
> In `[.data.table`(upc_table, , list(is_PL, product_module_code), :
> internal TRUE value has been modified
>
> When I continue using R, I eventually start getting more errors, such as:
>
> Error in gettext(domain, unlist(args)) : invalid 'string' value
> Error during wrapup: invalid 'string' value
>
> and then terminal input/output becomes corrupted. I only start getting
> these error messages once I start using data.table; but the messages
> don't necessarily occur only with data.table functions.
>
> I don't know if the last statement above is executing correctly or
> not. I'm rather confused as to what is going on. I was using a
> somewhat stale (maybe a couple of weeks old) svn version of
> data.table; but I see the same behavior with the latest data.table
> (r1263). I'm using CRAN's R 3.1 package for Ubuntu on 13.10 and 14.04.
>
>
>
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
> LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.9.3
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2
>