[datatable-help] Fail to add new columns within a function

Matt Dowle mdowle at mdowle.plus.com
Sun Dec 15 01:28:04 CET 2013


Hi,

This isn't really a bug. It's more of a documentation issue, or perhaps
a default that is too low.

When all spare slots are used up, there is no choice but to make a
shallow copy and create a new vector of column pointer slots. The
address (in RAM) of this vector is what any variable names (symbols)
point to, so a new vector means a new address. When this happens,
data.table does a reasonable job of changing the symbol in the calling
scope too, but within a function within a function it's tricky. In your
function, x is actually being updated by reference, but once the spare
slots are used up and the shallow copy happens, the update only lands
in local scope.

By default :

datatable.alloccol = quote(max(100L,ncol(DT)+64L))

Some people just change this to be a much larger number.  That's the 
easiest.  Just over-allocate massively :

options(datatable.alloccol = 10000)

If you have under 50 tables, this won't matter a jot. If you have
thousands of tables, then the spare space could become significant.

Assuming 64bit, 10000 * 8 bytes / 1024 = 78KB per table. Knowing this
allows you to choose the appropriate amount of over-allocation for your
case. For example, 50 tables * 78KB = about 4MB, i.e. roughly 0.01% of
32GB.
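
The arithmetic above can be checked directly in R. This is only a
sketch, assuming 8-byte pointers on a 64bit build and the figures
quoted above; the variable names are just for illustration:

```r
# Illustrative figures: 10000 over-allocated slots, 8-byte pointers (64bit)
slots             <- 10000
bytes_per_pointer <- 8

# Spare pointer space per table, in KB
kb_per_table <- slots * bytes_per_pointer / 1024
kb_per_table                      # 78.125

# Total across 50 tables, in MB
n_tables <- 50
mb_total <- n_tables * kb_per_table / 1024
mb_total                          # about 3.8

# As a percentage of 32GB of RAM
pct <- 100 * mb_total / (32 * 1024)
pct                               # about 0.012
```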

Or,  if you know you are about to add a lot of columns by reference via 
a function,  you can increase the over-allocation of one table using the 
alloc.col function :

alloc.col(DT, 200)
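
To see how many slots a table has before and after, length() and
truelength() can be compared. A rough sketch, assuming a small example
table; the exact numbers printed depend on the data.table version and
on the datatable.alloccol option, so none are shown here:

```r
library(data.table)

DT <- data.table(a = 1:3)

length(DT)       # columns currently in use
truelength(DT)   # column pointer slots allocated, including spare ones

# Ask for more room before adding many columns by reference.
# invisible() only stops the table being printed at top level.
invisible(alloc.col(DT, 200))

truelength(DT)   # never smaller than length(DT)
```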

In case the example is actually close to your real use case, you can
add a lot of columns in one step, and the LHS of := can be an
expression :

# add 101 columns named "a1", "a2" ... "a101", all set to 1
DT[, paste0('a', 1:101) := 1]

set() may also be an easier alternative to := in this case, now that it
can add columns (as of v1.8.11).
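
As a rough sketch of the set() route inside a function: add_cols below
is a hypothetical helper (not part of data.table), and the example
assumes enough slots have been allocated up front with alloc.col so
that no shallow copy is needed part way through:

```r
library(data.table)

# Hypothetical helper: add one column per name, by reference, using set()
add_cols <- function(x, new_names) {
  for (nm in new_names) {
    set(x, j = nm, value = 1)   # length-1 value is recycled to nrow(x)
  }
  invisible(x)
}

DT <- data.table(a = 1:3)

# Assumption: make sure there is room for all the new columns first
invisible(alloc.col(DT, 200))

add_cols(DT, paste0("a", 1:101))
dim(DT)   # 3 rows; the original column plus the 101 new ones
```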

If there is a real world example where this really needs to be wrapped
in a function within a function, then I'd need to see it (or an example
closer to reality) to be convinced (me at least) that we need to do
better here.

HTH,
Matt



On 14/12/13 13:10, Arunkumar Srinivasan wrote:
> Hi Huashan,
> Great reproducible example! Would you mind filing a bug report here 
> <https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975>?
> Thank you,
> Arun
>
> On Saturday, December 14, 2013 at 2:30 AM, Huashan Chen wrote:
>
>> I just found out that when the column quota is reached, adding new
>> columns within a function will fail.
>>
>> Below is the test code:
>>
>> testF2=function(x){
>> add_var<-function(varname){
>> x[, `:=`(eval(substitute(varname)), 1), with=F]
>> }
>> sapply(paste0('a', 1:101), add_var)
>> }
>>
>> dd=data.table(a=1:3)
>> truelength(dd)
>> testF2(dd)
>> dim(dd) # only 100 columns
>>
>> dd[, new:=3]
>> dim(dd) # adding new column outside a function is OK.
>>
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html
>> Sent from the datatable-help mailing list archive at Nabble.com 
>> <http://Nabble.com>.
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org 
>> <mailto:datatable-help at lists.r-forge.r-project.org>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
