[datatable-help] dancing with alloc.col

Matthew Dowle mdowle at mdowle.plus.com
Wed Aug 8 12:42:27 CEST 2012


:)

When the column allocation is full, there's a formula to decide how much
to grow the allocation by. The check is there (iirc) to make sure that
formula isn't growing the table too much. If you have 1 million columns, you
probably don't want to double that to 2 million just to add 1 column. But
if you do, then call alloc.col first. That was the thinking, and that
thinking is biting in your case.
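
For anyone following along, here is a rough sketch of how to look at the over-allocation being discussed. It assumes a data.table version that exports truelength() and alloc.col(). The toy table and the value 500 are made up for illustration, and how n is counted (total slots or spare slots) may depend on the version, so check ?alloc.col before relying on it.

```r
library(data.table)

DT <- data.table(a = 1:3, b = letters[1:3])   # toy table, illustration only

n_cols  <- length(DT)        # columns currently in use
n_slots <- truelength(DT)    # column pointer slots allocated, used or not
spare   <- n_slots - n_cols  # room left before := has to grow the allocation

# Ask for more room up front. 500 is an arbitrary illustrative value, kept
# well under the 1000 threshold mentioned in the warning text.
invisible(alloc.col(DT, 500))

truelength(DT)               # inspect the allocation after the request
```

Comparing truelength(DT) with length(DT) before and after is the quickest way to see what a given version actually did with the request.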

Simplest might be to downgrade the warning to a message when verbosity is
on, then.

In the meantime, does wrapping with suppressWarnings() work around it for
now? In your case you know that over-allocating by more than 1000 is
appropriate.

    suppressWarnings(DT[,newcol:=])
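
A runnable sketch of that workaround, with assumptions flagged: the toy table, the assigned values and the helper muffle_overalloc() are all made up for illustration. The second half is a narrower variant that is not part of the suggestion above: it muffles only warnings whose text contains "over-allocated", so unrelated warnings still surface.

```r
library(data.table)

DT <- data.table(id = 1:3)   # toy table, illustration only

# Blanket version: every warning raised while adding the column is discarded.
suppressWarnings(DT[, newcol := NA_real_])

# Narrower version (hypothetical helper): muffle only warnings whose message
# mentions "over-allocated" and let any other warning through.
muffle_overalloc <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("over-allocated", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

muffle_overalloc(DT[, another := 0L])
names(DT)
```

The narrower form is probably the safer habit inside a column-adding loop, since a blanket suppressWarnings() would also hide warnings about things like type coercion.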

Thanks for reporting. Interesting use case.

Matthew

> I'm running into this "truelength is greater than 1000 items
> over-allocated" warning/error as I use := to add columns to a data.table,
> e.g.:
>
> tl (1346) is greater than 1000 items over-allocated (ncol = 308). If you
> didn't set the datatable.alloccol option very large, please report this to
> datatable-help including the result of sessionInfo().
>
> The long preamble to this is a stackoverflow thread
> (http://stackoverflow.com/questions/10015544) in which I needed to update
> the contents of one data.table with the contents of another.
>
> The solution required the columns of both data.tables to match, hence my
> pre-processing loop to add columns to each data.table to satisfy the
> identical(names(dt1),names(dt2)) criteria. I may have to re-architect this
> depending on what is going on with this allocation business.
>
> If, for example, dt1 has 200 columns, and dt2 has 2000, and together they
> have 2100 unique columns, I'm going to add 1900 columns to dt1. If I set
> alloc.col to 2100 before my column-adding loop, I'll get slapped because
> 2100 is more than 1000 greater than the 200 columns present in dt1.
>
> So do I need to spoon-feed alloc.col? Every iteration through the loop set
> it to length(dt1)+1 before adding a column? That seems rather brutal.
> Alternatively checking for the delta between truelength and length, and
> how close that is to the magic 1000 number, and then only adjusting the
> setting seems fragile.
>
> I did try to make sense of the help for alloc.col. Regarding the bit about
> "if two or more variables are bound to the same data.table"; the column
> addition is within a function, and only one variable references the
> data.table, at least in the scope of the function. The function calling
> that function has a variable for the data.table too, so I don't know if
> that counts. Then there is mention of using copy (not sure how that helps,
> and BTW the hyperlink for copy goes to the page for setkey, which does
> mention copy, but suggests "See ?copy" which just conjures up the setkey
> page again), setting alloc.col, or changing datatable.alloccol (doesn't
> seem to help).
>
> The warning asked for sessionInfo; FWIW, here it is:
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> other attached packages:
> [1] data.table_1.8.2
>
> Thanks
> George



