[datatable-help] Fail to add new columns within a function
Matt Dowle
mdowle at mdowle.plus.com
Sun Dec 15 01:28:04 CET 2013
Hi,
This isn't really a bug. It's more a documentation issue, or perhaps the default is too low.
When all spare slots are used up, there is no choice but to make a
shallow copy and create a new, larger vector of column pointer slots. That
vector has a new address in RAM, and it is this pointer which any variable
names (symbols) point to. When this happens, data.table does a reasonable
job of changing the symbol in calling scope too, but within a function
within a function it's tricky. In your function, x is actually being
updated by reference right up until the spare slots are used up. At that
point the shallow copy happens and only the x in local scope is changed,
so dd never sees the columns added after that.
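As a minimal sketch of how to watch the slots being used (the table DT here is made up for illustration, and the actual number of slots you see depends on your version and on datatable.alloccol):

```r
library(data.table)

DT <- data.table(a = 1:3)

used      <- length(DT)       # column slots currently in use
allocated <- truelength(DT)   # column slots allocated, including spares
spare     <- allocated - used

# Adding a column while spare slots remain fills one of them in place,
# so the allocation itself does not change.
DT[, b := 2]
length(DT)                    # one more than before
truelength(DT) == allocated   # still the same allocation
```

Once length(DT) catches up with truelength(DT), the next added column is the one that forces the shallow copy.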
By default :
datatable.alloccol = quote(max(100L,ncol(DT)+64L))
Some people just change this to be a much larger number. That's the
easiest. Just over-allocate massively :
options(datatable.alloccol = 10000)
If you have under 50 tables, this won't matter a jot. If you have
1000's of tables, then the spare space could become significant.
Assuming 64bit, 10000 * 8 bytes / 1024 = 78KB per table. Knowing this
allows you to choose the appropriate amount of over-allocation for your
case. 50 tables * 78KB = 4MB (approx) = e.g. 0.01% of 32GB
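The arithmetic can be checked directly in R. This is only a back-of-envelope sketch: the variable names are made up, and it assumes one 8-byte pointer per slot and nothing else. Dividing a byte count by 1024 gives kilobytes.

```r
slots          <- 10000   # the over-allocation set via datatable.alloccol
bytes_per_slot <- 8       # assumption: one pointer on a 64-bit build

kb_per_table <- slots * bytes_per_slot / 1024   # 78.125 KB per table
mb_for_50    <- 50 * kb_per_table / 1024        # about 3.81 MB for 50 tables
pct_of_32gb  <- 100 * mb_for_50 / (32 * 1024)   # about 0.0116 percent of 32GB
```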
Or, if you know you are about to add a lot of columns by reference via
a function, you can increase the over-allocation of one table using the
alloc.col function :
alloc.col(DT, 200)
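A rough sketch of that per-table approach follows. The helper add_cols is hypothetical, and the (nm) := form assumes a data.table version that accepts a parenthesised character variable on the LHS of :=.

```r
library(data.table)

dd <- data.table(a = 1:3)

# Reserve at least 200 spare column slots on this one table up front.
alloc.col(dd, 200)
slots_before <- truelength(dd)

# Hypothetical helper: adds columns by reference from a nested function.
add_cols <- function(x, nms) {
  add_one <- function(nm) x[, (nm) := 1]
  for (nm in nms) add_one(nm)
  invisible(x)
}

add_cols(dd, paste0("a", 1:101))

ncol(dd)                         # 102: a plus a1 ... a101
truelength(dd) == slots_before   # TRUE: no reallocation was needed
```

Because the slots are reserved before the function is called, the shallow copy never has to happen inside it, so dd in the calling scope sees every column.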
If your example is close to your real use case, you can add a lot of
columns in one step, since the LHS of := can be an expression :
# add 101 columns named "a1", "a2" ... "a101", all set to 1
DT[, paste0('a', 1:101) := 1]
set() may also be an easier alternative to := in this case, now that it
can add columns (as of v1.8.11).
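A minimal sketch of the set() route, assuming a version in which set() can add columns. The table DT and the column names are made up, and the alloc.col() call is just a precaution so that enough slots exist whatever the default is:

```r
library(data.table)

DT <- data.table(a = 1:3)
alloc.col(DT, 200)   # make sure there is room for the new columns first

for (nm in paste0("a", 1:101)) {
  set(DT, j = nm, value = 1)   # adds column nm by reference, recycling 1
}

ncol(DT)   # 102: a plus a1 ... a101
```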
If there is a real-world example where this really needs to be wrapped in
a function within a function, then I'd need to see it (or an example closer
to reality) to be convinced that we need to do better here.
HTH,
Matt
On 14/12/13 13:10, Arunkumar Srinivasan wrote:
> Hi Huashan,
> Great reproducible example! Would you mind filing a bug report here
> <https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975>?
> Thank you,
> Arun
>
> On Saturday, December 14, 2013 at 2:30 AM, Huashan Chen wrote:
>
>> I just found out that when the column quota is reached, adding new
>> columns within a function will fail.
>>
>> Below is the test code:
>>
>> testF2=function(x){
>>   add_var<-function(varname){
>>     x[, `:=`(eval(substitute(varname)), 1), with=F]
>>   }
>>   sapply(paste0('a', 1:101), add_var)
>> }
>>
>> dd=data.table(a=1:3)
>> truelength(dd)
>> testF2(dd)
>> dim(dd) # only 100 columns
>>
>> dd[, new:=3]
>> dim(dd) # adding new column outside a function is OK.
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html
>> Sent from the datatable-help mailing list archive at Nabble.com.
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help