[Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Mon Apr 1 17:04:03 CEST 2013
On Mon, 1 Apr 2013 08:15:48 -0500, Dirk Eddelbuettel <edd at debian.org> wrote:
> On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote:
> |
> | Dear All,
> |
> | I am confused about creating Rcpp Numeric Matrices larger than
> | .Machine$integer.max. The code below illustrates some of the points
> | (probably with too much detail ;-). These are some things that puzzle me:
> Which R version did you use?
Ooops, sorry.
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status         Patched
major          2
minor          15.3
year           2013
month          03
day            03
svn rev        62150
language       R
version.string R version 2.15.3 Patched (2013-03-03 r62150)
nickname       Security Blanket
> Does what you attempt work _in straight C code
> bypassing Rcpp_ ?
In straight C++, using std::vector, this works (though not, as I tried it,
in naive straight C, as shown in the comments). It will use ~ 35 GB of
memory:
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // Plain automatic (stack) arrays of this size segfault, presumably
    // because they are far larger than the stack limit:
    // double v1[500000L * 9000L]; // this segfaults
    // double v1[4300000000];      // this segfaults

    // Heap-allocated vector of 4.5e9 doubles; long is 64 bits on this platform.
    std::vector<double> v2(500000L * 9000L);
    std::cout << " Max size v2: " << v2.max_size() << std::endl;
    std::cout << " Current size v2: " << v2.size() << std::endl;

    // Touch every element so the memory is really used.
    double tt = 0;
    for (std::size_t t = 0; t < v2.size(); ++t)
        v2[t] = ++tt;
    std::cout << "\n Assigned to vector" << std::endl;
    std::cout << "\n Last value is " << v2[(500000L * 9000L) - 1] << std::endl;
    return 0;
}
Anyway, I guess the example is not really relevant for this case.
> If you used R 2.*, then the attempt makes little sense AFAICT.
Sorry, I was not clear. I was not (consciously) _attempting_ to do
that. In my "for real" code the dimensions of the object are set almost at
the end of a long simulation and in a few cases those numbers were much
larger than I expected (I did not realize how big until I started looking
into the segfaults and the errors).
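
Since the dimensions are only known at the end of the simulation, a check along the following lines, run just before allocating, would flag oversized dimensions early. This is only a sketch: cells_fit_in_int is a hypothetical helper name (not part of Rcpp or R), and it assumes the limit that matters is the 2^31 - 1 that R 2.15.x reports as .Machine$integer.max.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper, not part of Rcpp or R.
// Returns true if an nrow x ncol matrix has at most INT_MAX cells
// (assumed here to be the relevant limit, 2^31 - 1 on this platform).
bool cells_fit_in_int(int nrow, int ncol) {
    if (nrow < 0 || ncol < 0)
        return false;
    // Multiply in 64 bits: the product of two ints cannot overflow there.
    const std::int64_t cells =
        static_cast<std::int64_t>(nrow) * static_cast<std::int64_t>(ncol);
    return cells <= INT_MAX;
}
```

With the dimensions from the R script quoted below, this accepts (4294967, 500) and rejects both (4294967, 501) and (500000, 9000), so the caller could raise an error instead of attempting the allocation.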
What I found confusing was the segmentation fault, because the behavior
seems inconsistent. Sometimes there was no segfault because the error
("negative length vectors are not allowed (...)") was triggered first. But
sometimes the object seemed to have been created (so I assumed the sizes
were OK; yes, this was before I looked at the actual sizes) and the
segfault took place only later.
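
One possible reading of this inconsistency, assuming the number of cells ends up in a 32-bit signed int somewhere between the constructor call and the allocation (an assumption, not something verified against the Rcpp or R sources), is plain wrap-around: some products wrap to a negative value, others wrap past zero to a small positive one. The sketch below reproduces that arithmetic for the dimensions used in the R script quoted below; wrapped_int32_product is a hypothetical name introduced only for illustration.

```cpp
#include <cstdint>

// Value a 32-bit signed int would hold if nrow * ncol wrapped modulo 2^32.
// Computed in 64-bit arithmetic, because real signed overflow is undefined
// behavior in C++. Assumes non-negative inputs whose product fits in 64 bits.
constexpr std::int64_t wrapped_int32_product(std::int64_t nrow, std::int64_t ncol) {
    return ((nrow * ncol) % 4294967296LL) > 2147483647LL
               ? ((nrow * ncol) % 4294967296LL) - 4294967296LL
               : ((nrow * ncol) % 4294967296LL);
}

// Checked at compile time for the three cases in the R script.
static_assert(wrapped_int32_product(4294967, 500) == 2147483500LL,
              "fits in int, no wrap");
static_assert(wrapped_int32_product(4294967, 501) == -2143188829LL,
              "wraps to a negative length");
static_assert(wrapped_int32_product(500000, 9000) == 205032704LL,
              "wraps to a small positive length");
```

Under that assumption, f1(4294967, 501) would request a negative length (hence the error from R), while f1(500000, 9000) would request only about 205 million cells, far fewer than the 4.5e9 that the code then indexes. That would be consistent with the creation appearing to succeed and the segfault showing up only on later accesses.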
R.
> If you used R 3.0.0, then you may have noticed that R is ahead of us, and you
> are welcome to help close the gap :)
> Dirk
> | 1. For some values of number of rows and columns, creating the matrix is
> | not allowed, with the message "negative length vectors are not allowed",
> | but with other values the creation of the matrix proceeds without
> | (apparent) troubles, even when the total size is >> 2^31 - 1.
> |
> | 1.a. Is this intended?
> |
> | 1.b. I understand the error message is coming from R (not Rcpp) and thus
> | this is not something that can be made easier to understand?
> |
> |
> | 2. The part I found confusing is that the same problem (number of cells >
> | 2^31 - 1) is sometimes caught at object creation, but sometimes manifests
> | itself much later (either in the C++ code or later in R).
> |
> | I was expecting (maybe the problem is my expectations) an error early on,
> | when creating the matrix; if the creation proceeds without trouble, I was
> | not expecting a segfault (as I think all cells are initialized to zero).
> |
> | Is the recommended procedure to check if the product of dimensions is <
> | 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit
> | systems?).
> |
> |
> | Best,
> |
> | R.
> |
> |
> |
> | // Beginning of file max-size.cpp
> |
> | #include <Rcpp.h>
> |
> | using namespace Rcpp;
> |
> |
> | // [[Rcpp::export]]
> |
> | NumericMatrix f1(IntegerVector nr, IntegerVector nc,
> | IntegerVector sf = 0) {
> | int nrow = as<int>(nr);
> | int ncol = as<int>(nc);
> | int segf = as<int>(sf);
> |
> | NumericMatrix outM(nrow, ncol);
> | std::cout << " After creating outM" << std::endl;
> | outM(nrow - 1, 0) = 1;
> | std::cout << " After assigning to last row, first column"
> | << std::endl;
> |
> | std::cout << " Some other value: 1, 0: "
> | << outM(1, 0) << std::endl;
> |
> | if( (nrow > 1) && (ncol > 3) )
> | std::cout << " Some other value: nrow - 1, ncol - 3: "
> | << outM(nrow - 1, ncol - 3) << std::endl;
> |
> | outM(nrow - 1, ncol - 1) = 1;
> | std::cout << " After assigning something to last cell"
> | << std::endl;
> |
> | std::cout << " Try to return the last assignment: "
> | << outM(nrow - 1, ncol - 1) << std::endl;
> |
> | if((nrow >= 500000) && segf) {
> | std::cout << "\n Assign a few around/beyond 2^32 - 1. Should segfault\n";
> | for(int i = 4290; i < 4300; ++i) {
> | std::cout << " i = " << i << std::endl;
> | outM(nrow - 1, i) = 0;
> | }
> | }
> |
> | return wrap(outM);
> | }
> |
> | // End of file max-size.cpp
> |
> |
> |
> |
> |
> | ################################################
> | library(Rcpp)
> | sourceCpp("max-size.cpp", verbose = TRUE)
> |
> | (tmp <- f1(4, 5))
> |
> |
> | 4294967 * 500 > .Machine$integer.max
> | tmp <- f1(4294967, 500)
> | object.size(tmp)/(4294967 * 500) ## ~ 8
> |
> | 4294967 * 501 > .Machine$integer.max
> | tmp <- f1(4294967, 501) ## negative length vectors
> |
> | 500000 * 9000 > .Machine$integer.max
> | tmp <- f1(500000, 9000) ## sometimes segfaults
> | tmp[500000, 9000]
> | object.size(tmp) ## things are missing
> | prod(dim(tmp)) > .Machine$integer.max
> |
> | ## using either of these usually leads to segfault
> |
> | for(i in (4290:4300)) print(tmp[500000, i])
> |
> | f1(500000, 9000, 1)
> |
> | #####################################################
> |
> |
> | --
> | Ramon Diaz-Uriarte
> | Department of Biochemistry, Lab B-25
> | Facultad de Medicina
> | Universidad Autónoma de Madrid
> | Arzobispo Morcillo, 4
> | 28029 Madrid
> | Spain
> |
> | Phone: +34-91-497-2412
> |
> | Email: rdiaz02 at gmail.com
> | ramon.diaz at iib.uam.es
> |
> | http://ligarto.org/rdiaz
> |
> |
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
--
Ramon Diaz-Uriarte