[datatable-help] internal FALSE/TRUE value has been modified
Matt Dowle
mdowle at mdowle.plus.com
Fri Jun 6 02:40:52 CEST 2014
Now fixed in v1.9.3:

  o  The warning "internal TRUE value has been modified" with recently
     released R 3.1, when grouping a table containing a logical column
     and where all groups are just 1 row, is now fixed and tests added.
     Thanks to James Sams for the reproducible example. The warning is
     issued by R and we have asked if it can be upgraded to error.

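
To check whether a table matches the two conditions described above (a logical column, and every group just 1 row), something like the following sketch may help. Note that `hits_trigger` is a hypothetical helper written for illustration only; it is not part of data.table, and the small tables are made-up stand-ins for the reported `upc_table`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): returns TRUE when every
# group defined by `by_cols` has exactly one row AND the table contains
# at least one logical column, i.e. the conditions reported in this thread.
hits_trigger <- function(dt, by_cols) {
  max_group_size <- dt[, .N, by = by_cols][, max(N)]
  has_logical <- any(vapply(dt, is.logical, logical(1)))
  max_group_size == 1L && has_logical
}

# Every (upc, upc_ver_uc) pair is unique, so all groups have 1 row.
all_single <- data.table(upc = 1:8,
                         upc_ver_uc = rep(c(1, 2), times = 4),
                         is_PL = rep(c(TRUE, FALSE), each = 4))

# Rows 1 and 7 share (upc = 1, upc_ver_uc = 1), so one group has 2 rows.
one_pair <- data.table(upc = c(1:6, 1L, 1L),
                       upc_ver_uc = rep(c(1, 2), times = 4),
                       is_PL = rep(c(TRUE, FALSE), each = 4))

res_single <- hits_trigger(all_single, c("upc", "upc_ver_uc"))
res_pair   <- hits_trigger(one_pair, c("upc", "upc_ver_uc"))
```

This only reports whether the shape of the data matches the report; it does not detect whether R's internal TRUE or FALSE has actually been modified.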
Matt
On 01/05/14 16:29, Matt Dowle wrote:
>
> Reproduced, thanks for nice example. Not sure yet but what R 3.1 now
> does is store length 1 logical vectors once only, globally, for
> efficiency to avoid many new allocations for the common case of single
> TRUE or FALSE values passed around at C or R level (a nice and welcome
> change). Since data.table modifies vectors by reference, a new
> data.table bug as of R 3.1 could be modifying R's internal value of
> TRUE or FALSE whenever length 1 logical vectors occur.
> Clearly a serious bug. The test suite
> immediately broke the day after the R-devel change was made (good) and
> was one reason data.table was in error state in CRAN checks for quite
> a while before R 3.1 shipped. It was typically tests of 1-row
> data.tables including a logical column and modifying that logical
> column that broke. We fixed that and put in checks to detect and warn
> if R's internal value has been modified, just in case. Those
> changes were in v1.9.2 on CRAN. I think I wasn't 100% confident in
> the detection test (false positives) so made it a warning instead of
> an error. Now that R 3.1 is out and we haven't had any false
> positives, it should be an error.
>
> The feature of this upc_table is that all the groups are size 1 :
>
> > upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
> [1] 1
>
> If we change the example so that one group has more than 1 row, it
> works ok :
>
> > upc_table = data.table(upc=c(1:99998,1,1), upc_ver_uc=rep(c(1,2),
> times=50000), is_PL=rep(c(T, F, F, T), each=25000),
> product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
> > upc_table[, .N, by=list(upc, upc_ver_uc)][,max(N)]
> [1] 2
> > upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
> upc_ver_uc)]
>
> So it seems the problem is in the single allocation of working memory
> for the largest group when that's just 1 and contains a logical
> column. Odd, I would have sworn we caught that! Will fix.
>
> R-devel are planning to do more of this small-object-sharing for
> common single integer values e.g. 0-10, so we'll need to add more
> tests accordingly.
>
> Thanks,
> Matt
>
>
>
> On 01/05/14 05:40, James Sams wrote:
>> I don't really know what this error message means. A quick example to
>> show what I'm seeing:
>>
>> > library(data.table)
>> data.table 1.9.3 For help type: help("data.table")
>> > upc_table = data.table(upc=1:100000, upc_ver_uc=rep(c(1,2),
>> times=50000), is_PL=rep(c(T, F, F, T), each=25000),
>> product_module_code=rep(1:4, times=25000), ignore.column=2:100001)
>> > upc = upc_table[, list(is_PL, product_module_code), keyby=list(upc,
>> upc_ver_uc)]
>> Warning message:
>> In `[.data.table`(upc_table, , list(is_PL, product_module_code), :
>> internal TRUE value has been modified
>>
>> When I continue using R, I eventually start getting more errors, such
>> as:
>>
>> Error in gettext(domain, unlist(args)) : invalid 'string' value
>> Error during wrapup: invalid 'string' value
>>
>> and then terminal input/output becomes corrupted. I only start
>> getting these error messages once I start using data.table; but the
>> messages don't necessarily occur only with data.table functions.
>>
>> I don't know if the last statement above is executing correctly or
>> not. I'm rather confused as to what is going on. I was using a
>> somewhat stale (maybe a couple of weeks old) svn version of
>> data.table; but I see the same behavior with the latest data.table
>> (r1263). I'm using CRAN's R 3.1 package for Ubuntu on 13.10 and 14.04.
>>
>>
>>
>> > sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
>> LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] data.table_1.9.3
>>
>> loaded via a namespace (and not attached):
>> [1] plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2
>>
>