[Rcpp-devel] Very Large Matrices in RcppArmadillo

French, Joshua JOSHUA.FRENCH at UCDENVER.EDU
Tue Jul 17 17:44:19 CEST 2012


Thank you all for the responses.

Christian, I didn't know about the copy_aux_mem option.  I will have to
take a look at that.
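
If I understand the suggestion correctly, a minimal sketch of the
copy_aux_mem route would look something like the following (the exported
function and the strict = true choice are just for illustration, not from
Christian's post): Armadillo's advanced mat constructor can reuse the
memory R already allocated instead of copying it.

    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>

    // [[Rcpp::export]]
    arma::rowvec col_means(Rcpp::NumericMatrix Xr) {
        // Advanced constructor: copy_aux_mem = false reuses R's memory
        // (no copy is made); strict = true keeps the matrix bound to that
        // memory, so it cannot be resized.  Treat X as read-only.
        arma::mat X(Xr.begin(), Xr.nrow(), Xr.ncol(), false, true);
        return arma::mean(X, 0);   // column means, computed in place
    }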

Dirk, thanks for looking into the 64-bit matrix indices.

Doug, the place in my code where I get the error is when I multiply
matrices.  I might have matrices X and Y, where X is 300000x500 and Y is
500x300000 and I want Z = X * Y.  I could break up the matrices into
smaller chunks and do the multiplication, but Z is later used in several
other multi-step calculations (with addition and multiplication mostly) so
I think that would be a last resort.  If I can get the 64-bit matrix
indices working in RcppArmadillo, I think that will solve much of the
problem, because I will only need to return very long vectors and not big
matrices.
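
For concreteness, here is a sketch of what I have in mind, assuming that
defining ARMA_64BIT_WORD before the include is enough to switch Armadillo
to 64-bit indices (whether RcppArmadillo's own configuration permits this
is exactly the compile-time question Dirk raises in the quoted thread
below), and keeping in mind that Z itself could never be returned to R:

    // Assumption: ARMA_64BIT_WORD must be defined before the include.
    #define ARMA_64BIT_WORD
    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>

    // [[Rcpp::export]]
    arma::vec big_product_rowsums(const arma::mat& X, const arma::mat& Y) {
        // Z = X * Y can have more than 2^31 - 1 elements; with 64-bit
        // words Armadillo can index it (memory permitting), but only a
        // reduced result (here, the row sums of Z) goes back to R.
        arma::mat Z = X * Y;
        return arma::sum(Z, 1);   // column vector of row sums
    }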

Joshua
-- 
Joshua French, Ph.D.
Assistant Professor
Department of Mathematical and Statistical Sciences
University of Colorado Denver
Joshua.French at ucdenver.edu
http://math.ucdenver.edu/~jfrench/
Ph:  303-556-6265  Fax:  303-556-8550

On 7/17/12 8:56 AM, "Douglas Bates" <bates at stat.wisc.edu> wrote:

>On Tue, Jul 17, 2012 at 8:14 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
>>
>> On 16 July 2012 at 23:30, French, Joshua wrote:
>> | I am doing some linear algebra on large matrices in R and receiving
>> | the following error:  "allocMatrix: too many elements specified".
>> | From what I understand, the error is caused by the fact that R uses
>> | 32-bit ints and not 64-bit ints for matrix indices, so R doesn't
>> | have a way to represent all the elements in the very large matrix.
>> |
>> | My two questions:
>> |
>> | 1.  Armadillo (and presumably RcppArmadillo) will not have this
>> | issue since Armadillo provided support for 64-bit indices as of
>> | version 2.4.0.  Is there a way to easily utilize this functionality
>> | from within RcppArmadillo?
>>
>> I need to double check but this may have been a compile-time option
>> you need to enable. In any event ... R indices are still limited so
>> you may not be able to pass these back and forth.
>>
>> | 2.  I have found in the past that some of the speed gains from
>> | RcppArmadillo in comparison to pure R are lost when passing large
>> | matrices as arguments.  There will always be overhead when passing
>> | arguments (especially large matrix arguments) to pretty much any
>> | function.  Are there any tricks to minimize the overhead when
>> | passing a non-sparse matrix argument of, say, 1,000,000 by 500 from
>> | R to Armadillo?
>>
>> I defer all questions concerning sparse matrices to Doug and other
>> users of sparse matrix code.  I live mostly in a small-to-medium size
>> dense matrix world.
>
>Actually the question was about non-sparse matrices.  It looks as if
>it is the number of rows in the matrix that will be problematic.  An
>upper bound on the number of rows is the maximum integer value divided
>by the number of columns.
>
>> .Machine$integer.max / 500
>[1] 4294967
>
>I would try not to exceed about 1/2 to 1/4 of that bound.
>
>A simple way of handling data sets that are much larger than that is
>to work with a sample of the rows.  If that is not feasible then I
>would create a list of matrices each of size 1,000,000 by 500 or so
>and vertically concatenate them in the C++ code.  Of course, this
>means a copying operation.  Also, when you are finished if you need to
>pass results back to R then you face a similar problem getting a large
>matrix in C++ back into R storage.
>
>You can create a read-only matrix in C++ using the storage from R as
>described by Christian for RcppArmadillo or using the Eigen::Map class
>in RcppEigen.
>
>What are you (Joshua) doing with these large matrices?  If the main
>calculations involve X'X-type products, you can carry them out on
>horizontal chunks of the rows and then assemble the results.
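
A minimal sketch of the chunked X'X idea Doug describes above, assuming
the row chunks arrive from R as a list of matrices; the function name is
made up for illustration, and the no-copy constructor is the copy_aux_mem
route Christian mentioned:

    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>

    // [[Rcpp::export]]
    arma::mat chunked_crossprod(Rcpp::List chunks) {
        arma::mat XtX;   // p x p accumulator, sized by the first chunk
        for (int i = 0; i < chunks.size(); ++i) {
            Rcpp::NumericMatrix Xi = chunks[i];
            // Wrap R's memory without copying: copy_aux_mem = false
            arma::mat A(Xi.begin(), Xi.nrow(), Xi.ncol(), false, true);
            if (i == 0)
                XtX = A.t() * A;
            else
                XtX += A.t() * A;   // accumulate X_i' X_i over chunks
        }
        return XtX;   // small enough to hand back to R
    }

From R this would be compiled with Rcpp::sourceCpp() and called as
chunked_crossprod(list_of_row_blocks); each 1,000,000 by 500 chunk and
the p x p result (500 by 500 in the example above) stay comfortably
under the 32-bit element limit.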