[Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
Dirk Eddelbuettel
edd at debian.org
Mon Apr 1 17:13:54 CEST 2013
On 1 April 2013 at 17:04, Ramon Diaz-Uriarte wrote:
|
|
|
| On Mon, 1 Apr 2013 08:15:48 -0500,Dirk Eddelbuettel <edd at debian.org> wrote:
|
| > On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote:
| > |
| > | Dear All,
| > |
| > | I am confused about creating Rcpp Numeric Matrices larger than
| > | .Machine$integer.max. The code below illustrates some of the points
| > | (probably with too much detail ;-). These are some things that puzzle me:
|
| > Which R version did you use?
|
| Ooops, sorry.
|
| > version
| _
| platform x86_64-pc-linux-gnu
| arch x86_64
| os linux-gnu
| system x86_64, linux-gnu
| status Patched
| major 2
| minor 15.3
I think you can't really expect this to work. R, up to this version, has the
very famous 2^31 - 1 index limit.
| year 2013
| month 03
| day 03
| svn rev 62150
| language R
| version.string R version 2.15.3 Patched (2013-03-03 r62150)
| nickname Security Blanket
|
|
|
| > Does what you attempt work _in straight C code
| > bypassing Rcpp_ ?
|
| In straight C++, using std::vector, this works (though not, as I tried it,
| in naive straight C, as shown in the comments). It will use ~ 35 GB of
| memory:
Sure, but "does not matter" as it is outside of R.
In R, you can do this _if you go the route of outside memory management_, as
e.g. bigmemory and ff do.
| #include <iostream>
| #include <vector>
| #include <iterator>
|
| int main() {
|
| // double v1[500000L * 9000L]; // this segfaults
| // double v1[4300000000]; // this segfaults
|
| std::vector<double> v2(500000L * 9000L);
| std::cout << " Max size v2: " << v2.max_size() << std::endl;
| std::cout << " Current size v2: " << v2.size() << std::endl;
|
| double tt = 0;
| for(size_t t = 0; t < v2.size(); ++t)
| v2[t] = ++tt;
| std::cout << "\n Assigned to vector" << std::endl;
| std::cout << "\n Last value is " << v2[(500000L * 9000L) - 1] << std::endl;
| return 0;
| }
|
| Anyway, I guess the example is not really relevant for this case.
Agreed.
| > If you used R 2.*, then the attempt makes little sense AFAICT.
|
| Sorry, I was not clear. I was not (consciously) _attempting_ to do
| that. In my "for real" code the dimensions of the object are set almost at
| the end of a long simulation and in a few cases those numbers were much
| larger than I expected (I did not realize how big until I started looking
| into the segfaults and the errors).
I understand. But I think you should consider writing some sort of "reducers"
so that you never have to swallow that whole object at once.
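
As a rough illustration of what such a reducer could look like: a minimal
sketch in plain C++ (no Rcpp), assuming the simulation can hand over its
values one at a time and that a mean and variance are the summaries wanted.
RunningSummary and summarise_sequence are hypothetical names made up for this
example, and the 1, 2, ..., count input merely stands in for real simulation
output.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical running summary: consumes one value at a time and keeps only
// three numbers of state, so the full nrow x ncol object never has to exist.
struct RunningSummary {
    std::size_t n = 0;   // number of values seen so far
    double mean = 0.0;   // running mean
    double m2 = 0.0;     // sum of squared deviations from the running mean

    // Welford-style update for one new value.
    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    // Sample variance; needs at least two values.
    double variance() const {
        assert(n > 1);
        return m2 / static_cast<double>(n - 1);
    }
};

// Stand-in for the simulation: feeds 1, 2, ..., count through the reducer.
RunningSummary summarise_sequence(std::size_t count) {
    RunningSummary s;
    for (std::size_t i = 1; i <= count; ++i) {
        s.push(static_cast<double>(i));
    }
    return s;
}
```

Only the summary would then be returned to R, which keeps the result far away
from any vector length limit.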
| What I found confusing was the segmentation fault, because the behavior
| seems inconsistent. Sometimes there was no segfault because the error
| ("negative length vectors are not allowed (...)") was triggered. But
| sometimes the object seemed to have been created (and thus I assumed sizes
| were OK ---yes, before looking at the actual sizes) and then the segfault
| took place later.
<insert Oscar Wilde quote about consistency being ... just kidding>
I think we simply see an error condition for undefined behaviour.
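
The arithmetic behind the inconsistency can be sketched in plain C++ (no
Rcpp). The assumption here, not checked against the Rcpp sources, is that the
number of cells ends up in a 32-bit int somewhere on the way to the
allocation. Signed overflow is formally undefined in C++, so the sketch only
models the two's-complement wrap-around typically seen on x86_64. The function
names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Exact number of cells, computed in 64 bits so that the product of two
// int-sized dimensions cannot overflow.
std::int64_t exact_cells(int nrow, int ncol) {
    assert(nrow >= 0 && ncol >= 0);
    return static_cast<std::int64_t>(nrow) * static_cast<std::int64_t>(ncol);
}

// Value a 32-bit two's-complement int would hold if the product wrapped.
std::int64_t wrapped_cells(int nrow, int ncol) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    const std::int64_t low = exact_cells(nrow, ncol) % two32;  // low 32 bits
    return low > std::numeric_limits<std::int32_t>::max() ? low - two32 : low;
}

// Guard to call before creating the matrix: true if the cell count stays
// within the 2^31 - 1 limit.
bool fits_int_index(int nrow, int ncol) {
    return exact_cells(nrow, ncol) <= std::numeric_limits<std::int32_t>::max();
}
```

With the dimensions from the R script: 4294967 x 500 is 2147483500 and fits;
4294967 x 501 wraps to -2143188829, which would match the "negative length
vectors" error; 500000 x 9000 wraps to 205032704, a positive number, so under
this assumption a much smaller vector is allocated without complaint and the
trouble only shows up once a cell beyond it is touched.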
Dirk
|
|
|
|
| R.
|
|
| > If you used R 3.0.0, then you may have noticed that R is ahead of us, and you
| > are welcome to help close the gap :)
|
| > Dirk
|
|
| > | 1. For some values of number of rows and columns, creating the matrix is
| > | not allowed, with the message "negative length vectors are not allowed",
| > | but with other values the creation of the matrix proceeds without
| > | (apparent) troubles, even when the total size is >> 2^31 - 1.
| > |
| > | 1.a. Is this intended?
| > |
| > | 1.b. I understand the error message is coming from R (not Rcpp) and thus
| > | this is not something that can be made easier to understand?
| > |
| > |
| > | 2. The part I found confusing is that the same problem (number of cells >
| > | 2^31 - 1) is sometimes caught at object creation, but sometimes manifests
| > | itself much later (either in the C++ code or later in R).
| > |
| > | I was expecting (maybe the problem is my expectations) an error early on,
| > | when creating the matrix; if the creation proceeds without trouble, I was
| > | not expecting a segfault (as I think all cells are initialized to zero).
| > |
| > | Is the recommended procedure to check if the product of dimensions is <
| > | 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit
| > | systems?).
| > |
| > |
| > | Best,
| > |
| > | R.
| > |
| > |
| > |
| > | // Beginning of file max-size.cpp
| > |
| > | #include <Rcpp.h>
| > |
| > | using namespace Rcpp;
| > |
| > |
| > | // [[Rcpp::export]]
| > |
| > | NumericMatrix f1(IntegerVector nr, IntegerVector nc,
| > | IntegerVector sf = 0) {
| > | int nrow = as<int>(nr);
| > | int ncol = as<int>(nc);
| > | int segf = as<int>(sf);
| > |
| > | NumericMatrix outM(nrow, ncol);
| > | std::cout << " After creating outM" << std::endl;
| > | outM(nrow - 1, 0) = 1;
| > | std::cout << " After assigning to last row, first column"
| > | << std::endl;
| > |
| > | std::cout << " Some other value: 1, 0: "
| > | << outM(1, 0) << std::endl;
| > |
| > | if( (nrow > 1) && (ncol > 3) )
| > | std::cout << " Some other value: nrow - 1, ncol - 3: "
| > | << outM(nrow - 1, ncol - 3) << std::endl;
| > |
| > | outM(nrow - 1, ncol - 1) = 1;
| > | std::cout << " After assigning something to last cell"
| > | << std::endl;
| > |
| > | std::cout << " Try to return the last assignment: "
| > | << outM(nrow - 1, ncol - 1) << std::endl;
| > |
| > | if((nrow >= 500000) && segf) {
| > | std::cout << "\n Assign a few around/beyond 2^31 - 1. Should segfault\n";
| > | for(int i = 4290; i < 4300; ++i) {
| > | std::cout << " i = " << i << std::endl;
| > | outM(nrow - 1, i) = 0;
| > | }
| > | }
| > |
| > | return wrap(outM);
| > | }
| > |
| > | // End of file max-size.cpp
| > |
| > |
| > |
| > |
| > |
| > | ################################################
| > | library(Rcpp)
| > | sourceCpp("max-size.cpp", verbose = TRUE)
| > |
| > | (tmp <- f1(4, 5))
| > |
| > |
| > | 4294967 * 500 > .Machine$integer.max
| > | tmp <- f1(4294967, 500)
| > | object.size(tmp)/(4294967 * 500) ## ~ 8
| > |
| > | 4294967 * 501 > .Machine$integer.max
| > | tmp <- f1(4294967, 501) ## negative length vectors
| > |
| > | 500000 * 9000 > .Machine$integer.max
| > | tmp <- f1(500000, 9000) ## sometimes segfaults
| > | tmp[500000, 9000]
| > | object.size(tmp) ## things are missing
| > | prod(dim(tmp)) > .Machine$integer.max
| > |
| > | ## using either of these usually leads to segfault
| > |
| > | for(i in (4290:4300)) print(tmp[500000, i])
| > |
| > | f1(500000, 9000, 1)
| > |
| > | #####################################################
| > |
| > |
| > | --
| > | Ramon Diaz-Uriarte
| > | Department of Biochemistry, Lab B-25
| > | Facultad de Medicina
| > | Universidad Autónoma de Madrid
| > | Arzobispo Morcillo, 4
| > | 28029 Madrid
| > | Spain
| > |
| > | Phone: +34-91-497-2412
| > |
| > | Email: rdiaz02 at gmail.com
| > | ramon.diaz at iib.uam.es
| > |
| > | http://ligarto.org/rdiaz
| > |
| > |
| > | _______________________________________________
| > | Rcpp-devel mailing list
| > | Rcpp-devel at lists.r-forge.r-project.org
| > | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
| > --
| > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com