[datatable-help] dancing with alloc.col

Kaupas, George George.Kaupas at spansion.com
Tue Aug 7 22:24:56 CEST 2012


I'm running into this "truelength is greater than 1000 items over-allocated" warning/error when I use := to add columns to a data.table, e.g.:

tl (1346) is greater than 1000 items over-allocated (ncol = 308). If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().

The background to this is a Stack Overflow thread (http://stackoverflow.com/questions/10015544) in which I needed to update the contents of one data.table with the contents of another.

The solution required the columns of both data.tables to match, hence my pre-processing loop that adds columns to each data.table to satisfy the identical(names(dt1),names(dt2)) criterion. I may have to re-architect this depending on what is going on with this allocation business.
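A minimal sketch of that pre-processing step, assuming data.table's `:=` and `setcolorder()`. The tables and the helper `match_columns()` are hypothetical, and this version adds all missing columns in a single `:=` call rather than one per loop iteration. Because `:=` inside a function can end up reallocating the table, the result is returned and reassigned rather than relying on the caller's variable being updated in place.

```r
library(data.table)

# Hypothetical example tables; the real ones in the thread are much wider.
dt1 <- data.table(id = 1:3, a = c(1.5, 2.5, 3.5))
dt2 <- data.table(id = 2:4, b = c("x", "y", "z"), a = c(10, 20, 30))

# Hypothetical helper: add any columns in `all_cols` that `dt` lacks
# (filled with logical NA), then reorder to match `all_cols` exactly.
match_columns <- function(dt, all_cols) {
  to_add <- setdiff(all_cols, names(dt))
  if (length(to_add) > 0L) {
    # One := call adds every missing column; RHS is one NA vector per column.
    dt[, (to_add) := rep(list(NA), length(to_add))]
  }
  setcolorder(dt, all_cols)
  dt  # return it so the caller can reassign
}

all_cols <- union(names(dt1), names(dt2))   # "id" "a" "b"
dt1 <- match_columns(dt1, all_cols)
dt2 <- match_columns(dt2, all_cols)

identical(names(dt1), names(dt2))
```

Note that the added columns are logical NA, so column types may still differ between the two tables even when the names match.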

If, for example, dt1 has 200 columns and dt2 has 2000, and together they have 2100 unique columns, I'm going to add 1900 columns to dt1. If I set alloc.col to 2100 before my column-adding loop, I'll get slapped, because 2100 is more than 1000 greater than the 200 columns currently in dt1.
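The arithmetic can be checked against the numbers in the warning itself. The sketch below is a hypothetical model of the check, inferred only from the warning text (it flags a table whose truelength exceeds its column count by more than 1000). It is not data.table's actual source, and the helper names are made up. It treats 2100 as the total allocated column count, as in the paragraph above.

```r
# Hypothetical helpers modelling the sanity check described by the warning;
# this is NOT data.table's implementation, just the arithmetic it implies.
over_allocated_by <- function(tl, ncol) tl - ncol
trips_check <- function(tl, ncol) over_allocated_by(tl, ncol) > 1000

# Numbers from the warning quoted above: 1346 - 308 = 1038 spare slots.
trips_check(1346, 308)   # TRUE

# This paragraph's scenario: room for 2100 columns while dt1 has only 200,
# i.e. 1900 spare slots.
trips_check(2100, 200)   # TRUE

# Once dt1 has grown to 1200 columns, 2100 leaves 900 spare slots.
trips_check(2100, 1200)  # FALSE
```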

So do I need to spoon-feed alloc.col, i.e. set it to length(dt1)+1 on every iteration of the loop before adding a column? That seems rather brutal. Alternatively, checking the delta between truelength and length, seeing how close it is to the magic number 1000, and only then adjusting the setting seems fragile.
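For reference, the delta between truelength and length can be inspected directly. This is a minimal sketch, assuming a data.table version where `truelength()` is exported (it is documented alongside alloc.col). `spare_slots()` is a hypothetical helper. The starting headroom depends on the data.table version and on the datatable.alloccol option, so no specific value is shown.

```r
library(data.table)

# Hypothetical helper: number of spare column-pointer slots, i.e. how many
# columns := can still add before data.table has to reallocate.
spare_slots <- function(dt) truelength(dt) - length(dt)

dt1 <- data.table(id = 1:3)
before <- spare_slots(dt1)

# Adding one column by reference uses up one spare slot
# (assuming at least one was available).
dt1[, extra := 0L]
after <- spare_slots(dt1)

before - after
```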

I did try to make sense of the help for alloc.col. Regarding the part about "if two or more variables are bound to the same data.table": the column addition happens inside a function, and only one variable references the data.table, at least within that function's scope. The function that calls it also has a variable for the data.table, so I don't know whether that counts. The help then mentions using copy (I'm not sure how that helps; also, the hyperlink for copy goes to the setkey page, which does mention copy but says "See ?copy", and ?copy just brings up the setkey page again), setting alloc.col, or changing the datatable.alloccol option (which doesn't seem to help).
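To illustrate what "two or more variables bound to the same data.table" means, and why copy() comes up, here is a minimal sketch with made-up table and column names. Plain assignment does not copy a data.table, so := through either name changes the one shared object. copy() instead produces an independent table.

```r
library(data.table)

dt  <- data.table(id = 1:3)
ref <- dt          # a second variable bound to the same data.table; no copy is made
dup <- copy(dt)    # an independent deep copy

dup[, extra := 0]  # := modifies dup by reference; dt and ref are untouched
ref[, note := "x"] # := through ref also changes dt, since both name one object

names(dt)   # "id" "note"
names(dup)  # "id" "extra"
```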

The warning asked for sessionInfo; FWIW, here it is:

R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C                 LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] data.table_1.8.2

Thanks
George
