[datatable-help] dancing with alloc.col
George.Kaupas at spansion.com
Wed Aug 8 20:17:39 CEST 2012
Thanks for the quick and patient response, as it was indeed my own fault.
In the interest of debugging I had set options(warn=3) as well as options(datatable.verbose=TRUE); setting warn=0 does indeed allow my code to run to satisfactory completion.
I have to stop ignoring the subtle (to me) messages R throws; in this case, "Error in ... (converted from warning)".
So a follow-up general R question: if I use options(warn=0), my .Rout contains a line like "There were 46 warnings (use warnings() to see them)". If I instead wrap the := statements in suppressWarnings(), that line does not appear. Is there a way to suppress the "There were n warnings" message?
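On that last question, a minimal base R sketch, offered as one possible approach rather than a definitive answer. The "There were n warnings" line appears because warn=0 stores warnings until the top-level call returns; a negative value makes R ignore them instead. The helper name quietly() is hypothetical, introduced only for this illustration, and the loop is a stand-in for the real := calls.

```r
# Hypothetical helper (not from base R or data.table): evaluate `expr` with
# warnings ignored, then restore whatever `warn` level was in effect before.
quietly <- function(expr) {
  old <- options(warn = -1)   # negative warn: warnings are ignored, not deferred
  on.exit(options(old), add = TRUE)
  expr                        # lazy evaluation: `expr` runs here, under warn = -1
}

# Stand-in for a loop of := calls that each raise a warning.
res <- quietly({
  for (i in 1:3) warning("stand-in warning ", i)
  "done"
})
print(res)
```

Because the previous option value is restored on exit, any options(warn=...) chosen for debugging stays in force for the rest of the script.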
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
Sent: Wednesday, August 08, 2012 5:45 AM
To: Kaupas, George
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] dancing with alloc.col
Oh and since you're looping the := or set(), then options(warn=0) before the loop is probably faster than repeated calls to suppressWarnings().
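Another way to avoid repeated suppressWarnings() calls, sketched in base R under stated assumptions: wrap the whole loop in a single withCallingHandlers() and muffle only warnings whose text contains "over-allocated", so unrelated warnings still surface. noisy_add() is a hypothetical stand-in for the := or set() call, and the matched phrase is taken from the warning text quoted further down in this thread.

```r
# Hypothetical stand-in for a := / set() call that raises the warning.
noisy_add <- function(i) {
  warning("tl (1346) is greater than 1000 items over-allocated (ncol = 308)")
  i
}

added <- integer(0)
withCallingHandlers(
  for (i in 1:5) added <- c(added, noisy_add(i)),
  warning = function(w) {
    # Muffle only the known, expected warning; let everything else through.
    if (grepl("over-allocated", conditionMessage(w), fixed = TRUE)) {
      invokeRestart("muffleWarning")
    }
  }
)
print(added)
```

The handler is installed once for the whole loop, and the global warn option is left untouched.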
> When the column allocation is full, there's a formula to decide how
> much to grow the allocation by. The check is there (iirc) to make sure
> that's not growing the table too much. If you have 1 million columns,
> you probably don't want to double that to 2 million, just to add 1
> column. But if you do, then use alloc.col first. That was the
> thinking. But that thinking is biting in your case.
> Simplest might be to downgrade the warning to a message when verbosity
> is on, then.
> In the meantime, does wrapping with suppressWarnings() work around it
> for now? Since in your case you know that over-allocating by more than
> 1000 is appropriate.
> Thanks for reporting. Interesting use case.
>> I'm running into this "truelength is greater than 1000 items
>> over-allocated" warning/error as I use := to add columns to a data.table:
>> tl (1346) is greater than 1000 items over-allocated (ncol = 308). If
>> you didn't set the datatable.alloccol option very large, please
>> report this to datatable-help including the result of sessionInfo().
>> The long preamble to this is a stackoverflow thread
>> (http://stackoverflow.com/questions/10015544) in which I needed to
>> update the contents of one data.table with the contents of another.
>> The solution required the columns of both data.tables to match, hence
>> my pre-processing loop to add columns to each data.table to satisfy
>> identical(names(dt1),names(dt2)) criteria. I may have to re-architect
>> this depending on what is going on with this allocation business.
>> If, for example, dt1 has 200 columns, and dt2 has 2000, and together
>> they have 2100 unique columns, I'm going to add 1900 columns to dt1.
>> If I set alloc.col to 2100 before my column-adding loop, I'll get
>> slapped because 2100 is more than 1000 greater than the 200 columns
>> present in dt1.
>> So do I need to spoon-feed alloc.col? Every iteration through the
>> loop set it to length(dt1)+1 before adding a column? That seems
>> rather brutal.
>> Alternatively checking for the delta between truelength and length,
>> and how close that is to the magic 1000 number, and then only
>> adjusting the setting seems fragile.
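For what it's worth, the pre-allocation idea above can be sketched as follows, using the 200-column and 2100-column figures from the example. This is a hedged sketch rather than a recommended fix: it assumes a data.table installation where alloc.col() and truelength() are available, the table contents and new column names are made up, and the meaning of n (total slots versus spare slots) should be checked in ?alloc.col for the installed version. Passing 2100 is chosen so that, under either reading, at least 2100 slots exist before the loop starts. Per the message above, version 1.8.2 may warn on this call.

```r
library(data.table)

# Made-up stand-in for dt1: 2 rows, 200 integer columns named V1..V200.
dt1 <- as.data.table(matrix(0L, nrow = 2, ncol = 200))
new_cols <- paste0("new", seq_len(1900))   # hypothetical names of columns to add

# Reserve column slots once, before the loop, instead of on every iteration.
alloc.col(dt1, 2100L)
slots_before <- truelength(dt1)

for (nm in new_cols) {
  dt1[, (nm) := NA_integer_]               # add one column by reference
}

print(c(ncol = ncol(dt1), truelength = truelength(dt1), slots_before = slots_before))
```

Comparing truelength(dt1) with ncol(dt1) after the loop shows how many spare slots remain, which is the delta the warning is about.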
>> I did try to make sense of the help for alloc.col. Regarding the bit
>> about "if two or more variables are bound to the same data.table";
>> the column addition is within a function, and only one variable
>> references the data.table, at least in the scope of the function. The
>> function calling that function has a variable for the data.table too,
>> so I don't know if that counts. Then there is mention of using copy
>> (not sure how that helps, and BTW the hyperlink for copy goes to the
>> page for setkey, which does mention copy, but suggests "See ?copy"
>> which just conjures up the setkey page again), setting alloc.col, or
>> changing datatable.alloccol (doesn't seem to help).
>> The warning asked for sessionInfo; FWIW, here it is:
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> locale:
>>  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>  LC_PAPER=C LC_NAME=C
>>  LC_ADDRESS=C LC_TELEPHONE=C
>>  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> attached base packages:
>>  stats graphics grDevices utils datasets methods
>>  base
>> other attached packages:
>>  data.table_1.8.2