[datatable-help] dancing with alloc.col
Matthew Dowle
mdowle at mdowle.plus.com
Thu Aug 9 00:57:36 CEST 2012
> Thanks for the quick and patient response, as it was indeed my own fault.
Hardly your fault. The 1000 over-allocation check hasn't come up before
and it isn't documented.
> In the interest of debugging I had set options(warn=3)
Ah, that makes sense. Although warn=3 is the same as warn=2 afaik: any value
of 2 or larger turns warnings into errors.
> as well as
> options(datatable.verbose=TRUE); setting warn=0 does indeed allow my code
> to run to satisfactory completion.
Great, glad the workaround works. I'll still look at downgrading or
removing that warning.
>
> I have to stop ignoring the subtle (to me) messages R throws; in this
> case, "Error in ... (converted from warning)".
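Here is a minimal R sketch of the `warn` levels discussed above. The call `as.integer("x")` only stands in for any call that raises a warning, and `convert_with_warn()` is a hypothetical helper. With `warn` at 2 or more, the warning surfaces as an error whose message carries the "(converted from warning)" prefix. With `warn = 0`, it is deferred instead.

```r
# Hypothetical helper: evaluate a warning-raising call under a given `warn`
# level and report whether it ended in an error.
convert_with_warn <- function(level) {
  old <- options(warn = level)   # options() returns the previous value(s)
  on.exit(options(old))          # restore the caller's setting on exit
  tryCatch(
    {
      as.integer("x")            # stand-in: warns "NAs introduced by coercion"
      "no error"
    },
    error = function(e) conditionMessage(e)
  )
}

msg2 <- convert_with_warn(2)  # warn >= 2: the warning is turned into an error
msg3 <- convert_with_warn(3)  # behaves the same as 2
msg0 <- convert_with_warn(0)  # warn = 0: the warning is deferred, no error
print(c(msg2, msg3, msg0))
```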
>
> So a followup general R question is, if I use options(warn=0), my .Rout
> contains a line like "There were 46 warnings (use warnings() to see
> them)". If instead I wrap the := statements with suppressWarnings(), I
> don't get that. Is there a way to suppress the "There were n warnings"
> message?
Oops, I meant oldwarn=options(warn=-1) (or any negative value, according to
?options) before the loop, to ignore the warnings. Then, after the loop, set it
back to the old value with options(oldwarn), since options() returns the old
values as a list.
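The save-and-restore pattern can be sketched as follows, assuming a current data.table. `add_cols_quietly()` is a hypothetical helper and is not part of data.table. Using `on.exit()` ensures the old `warn` value comes back even if the loop fails. The helper returns the table so the caller can reassign it, which stays safe even if `:=` has to reallocate inside the function.

```r
library(data.table)

# Hypothetical helper (not part of data.table): add each name in `cols`
# to DT as an all-NA column, ignoring warnings only while the loop runs.
add_cols_quietly <- function(DT, cols) {
  oldwarn <- options(warn = -1)  # negative: warnings are ignored
  on.exit(options(oldwarn))      # restore the old value, even on error
  for (nm in cols) {
    DT[, (nm) := NA]             # add the column by reference
  }
  DT                             # return it so the caller can reassign
}

w_before <- getOption("warn")
dt1 <- data.table(a = 1:3)
dt1 <- add_cols_quietly(dt1, c("b", "c"))
print(names(dt1))
```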
>
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
> Sent: Wednesday, August 08, 2012 5:45 AM
> To: Kaupas, George
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] dancing with alloc.col
>
>
> Oh, and since you're calling := or set() in a loop, options(warn=0) before
> the loop is probably faster than repeated calls to suppressWarnings().
>
>>
>> :)
>>
>> When the column allocation is full, there's a formula to decide how
>> much to grow the allocation by. The check is there (iirc) to make sure
>> that's not growing the table too much. If you have 1 million columns,
>> you probably don't want to double that to 2 million, just to add 1
>> column. But if you do, then use alloc.col first. That was the
>> thinking. But that thinking is biting in your case.
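The "use alloc.col first" advice can be sketched as follows, assuming a current data.table: reserve the slots once, then add columns with `:=`. `n_new` is a hypothetical column count. `truelength()` reports the allocated column slots and `length()` the columns in use. The exact amount `alloc.col()` reserves depends on the data.table version, so the sketch checks only a lower bound. It does not reproduce the 1.8.2 warning discussed in this thread.

```r
library(data.table)

n_new <- 1100L                      # hypothetical: number of columns to add
DT <- data.table(id = 1:3)

# Reserve column slots once, up front, instead of letting := grow the
# allocation as it goes. Only a lower bound on truelength() is checked.
DT <- alloc.col(DT, length(DT) + n_new)
stopifnot(truelength(DT) >= length(DT) + n_new)

for (i in seq_len(n_new)) {
  DT[, (paste0("V", i)) := 0L]      # fills a reserved slot by reference
}
cat("columns:", length(DT), " allocated slots:", truelength(DT), "\n")
```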
>>
>> Simplest might be to downgrade the warning to a message when verbosity
>> is on, then.
>>
>> In the meantime, does wrapping with suppressWarnings() work around it
>> for now, given that in your case you know over-allocating by more than
>> 1000 is appropriate?
>>
>> suppressWarnings(DT[,newcol:=])
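Here is a concrete version of the one-liner above. The right-hand side is hypothetical. It was chosen only because it raises a coercion warning, which `suppressWarnings()` then hides. In the thread, the warning came from data.table's over-allocation check instead.

```r
library(data.table)

DT <- data.table(a = 1:3)

# Hypothetical RHS: as.integer() on "x" warns "NAs introduced by coercion".
# suppressWarnings() muffles that warning; the column is still added.
suppressWarnings(DT[, newcol := as.integer(c("1", "x", "3"))])

print(DT)
```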
>>
>> Thanks for reporting. Interesting use case.
>>
>> Matthew
>>
>>> I'm running into this "truelength is greater than 1000 items
>>> over-allocated" warning/error as I use := to add columns to a
>>> data.table, e.g.:
>>>
>>> tl (1346) is greater than 1000 items over-allocated (ncol = 308). If
>>> you didn't set the datatable.alloccol option very large, please
>>> report this to datatable-help including the result of sessionInfo().
>>>
>>> The long preamble to this is a Stack Overflow thread
>>> (http://stackoverflow.com/questions/10015544) in which I needed to
>>> update the contents of one data.table with the contents of another.
>>>
>>> The solution required the columns of both data.tables to match, hence
>>> my pre-processing loop that adds columns to each data.table to satisfy
>>> the identical(names(dt1),names(dt2)) criterion. I may have to
>>> re-architect this depending on what is going on with this allocation
>>> business.
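Here is one way to make `identical(names(dt1), names(dt2))` hold, sketched with small hypothetical tables. It adds each missing column as `NA` by reference, then puts both tables in the same column order with `setcolorder()`. The added columns are logical `NA`, so a real merge may need missing values of a matching type.

```r
library(data.table)

# Hypothetical inputs standing in for the real dt1/dt2.
dt1 <- data.table(id = 1:2, a = c(10, 20))
dt2 <- data.table(id = 3L, b = "x", a = 30)

all_cols <- union(names(dt1), names(dt2))   # "id", "a", "b"

# Add each missing column as NA, by reference.
for (nm in setdiff(all_cols, names(dt1))) dt1[, (nm) := NA]
for (nm in setdiff(all_cols, names(dt2))) dt2[, (nm) := NA]

# Put both tables in the same column order.
setcolorder(dt1, all_cols)
setcolorder(dt2, all_cols)

print(identical(names(dt1), names(dt2)))
```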
>>>
>>> If, for example, dt1 has 200 columns, and dt2 has 2000, and together
>>> they have 2100 unique columns, I'm going to add 1900 columns to dt1.
>>> If I set alloc.col to 2100 before my column-adding loop, I'll get
>>> slapped because 2100 is more than 1000 greater than the 200 columns
>>> present in dt1.
>>>
>>> So do I need to spoon-feed alloc.col, setting it to length(dt1)+1 on
>>> every iteration of the loop before adding a column? That seems rather
>>> brutal. Alternatively, checking the delta between truelength and
>>> length, seeing how close it is to the magic 1000 number, and only then
>>> adjusting the setting seems fragile.
>>>
>>> I did try to make sense of the help for alloc.col. Regarding the bit
>>> about "if two or more variables are bound to the same data.table": the
>>> column addition is within a function, and only one variable references
>>> the data.table, at least in the scope of that function. The function
>>> calling it has a variable for the data.table too, so I don't know if
>>> that counts. The help then mentions using copy (I'm not sure how that
>>> helps; BTW, the hyperlink for copy goes to the page for setkey, which
>>> does mention copy but says "See ?copy", which just brings up the setkey
>>> page again), setting alloc.col, or changing datatable.alloccol (which
>>> doesn't seem to help).
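What "two or more variables bound to the same data.table" means can be seen in a short sketch. It uses a hypothetical table and assumes a current data.table. Plain `<-` does not copy a data.table, so a column added by reference shows up under both names. By contrast, `copy()` gives an independent table.

```r
library(data.table)

dt <- data.table(a = 1:3)
dt_alias <- dt       # `<-` binds a second name to the SAME data.table
dt_copy  <- copy(dt) # copy() creates an independent copy

dt[, b := 0L]        # add a column by reference

print(names(dt_alias))  # sees the new column: same object as dt
print(names(dt_copy))   # the copy does not
```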
>>>
>>> The warning asked for sessionInfo; FWIW, here it is:
>>>
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=C LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods
>>> [7] base
>>>
>>> other attached packages:
>>> [1] data.table_1.8.2
>>>
>>> Thanks
>>> George
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
>
>