[Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Tue Apr 2 21:01:11 CEST 2013
On Mon, 1 Apr 2013 10:13:54 -0500, Dirk Eddelbuettel <edd at debian.org> wrote:
> On 1 April 2013 at 17:04, Ramon Diaz-Uriarte wrote:
> |
> |
> |
> | On Mon, 1 Apr 2013 08:15:48 -0500, Dirk Eddelbuettel <edd at debian.org> wrote:
> |
> | > On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote:
> | > |
> | > | Dear All,
> | > |
> | > | I am confused about creating Rcpp Numeric Matrices larger than
> | > | .Machine$integer.max. The code below illustrates some of the points
> | > | (probably with too much detail ;-). These are some things that puzzle me:
> |
> | > Which R version did you use?
> |
> | Ooops, sorry.
> |
> | > version
> | _
> | platform x86_64-pc-linux-gnu
> | arch x86_64
> | os linux-gnu
> | system x86_64, linux-gnu
> | status Patched
> | major 2
> | minor 15.3
> I think you can't really expect this to work. R, up to this version, has the
> very famous 2^31 - 1 index limit.
> | year 2013
> | month 03
> | day 03
> | svn rev 62150
> | language R
> | version.string R version 2.15.3 Patched (2013-03-03 r62150)
> | nickname Security Blanket
> |
> |
> |
> | > Does what you attempt work _in straight C code
> | > bypassing Rcpp_ ?
> |
> | In straight C++, using std::vector, this works (though not, as I tried it,
> | in naive straight C, as shown in the comments). It will use ~ 35 GB of
> | memory:
> Sure, but "does not matter" as it is outside of R.
> In R, you can do this _if you go the route of outside memory management_ as
> eg bigmemory and ff do.
Thanks! However, for the current stuff I definitely want the output to
stay well within the 2^31 - 1 limit.
> | #include <iostream>
> | #include <vector>
> | #include <iterator>
> |
> | int main() {
> |
> | // double v1[500000L * 9000L]; // this segfaults
> | // double v1[4300000000]; // this segfaults
> |
> | std::vector<double> v2(500000L * 9000L);
> | std::cout << " Max size v2: " << v2.max_size() << std::endl;
> | std::cout << " Current size v2: " << v2.size() << std::endl;
> |
> | double tt = 0;
> | for(size_t t = 0; t < v2.size(); ++t)
> | v2[t] = ++tt;
> | std::cout << "\n Assigned to vector" << std::endl;
> | std::cout << "\n Last value is " << v2[(500000L * 9000L) - 1] << std::endl;
> | return 0;
> | }
> |
> | Anyway, I guess the example is not really relevant for this case.
> Agreed.
> | > If you used R 2.*, then the attempt makes little sense AFAICT.
> |
> | Sorry, I was not clear. I was not (consciously) _attempting_ to do
> | that. In my "for real" code the dimensions of the object are set almost at
> | the end of a long simulation and in a few cases those numbers were much
> | larger than I expected (I did not realize how big until I started looking
> | into the segfaults and the errors).
> I understand. But I think you should consider writing some sort of "reducers"
> so that you do not have to swallow that whole object.
Yes, agreed; that is what I'm trying now.
> | What I found confusing was the segmentation fault, because the behavior
> | seems inconsistent. Sometimes there was no segfault because the error
> | ("negative length vectors are not allowed (...)") was triggered. But
> | sometimes the object seemed to have been created (and thus I assumed sizes
> | were OK ---yes, before looking at the actual sizes) and then the segfault
> | took place later.
> <insert Oscar Wilde quote about consistency being ... just kidding>
C++ is still way tooooo big for me to try the imaginative route; for now,
I'll stay inside the box ;-).
R.
> I think we simply see an error condition for undefined behaviour.
> Dirk
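
One plausible reading of the inconsistency discussed above, offered only as a sketch: if the product of the two dimensions is squeezed into a signed 32-bit int somewhere along the way, some oversized products wrap to a negative number (hence "negative length vectors are not allowed"), while others wrap to a small positive number and allocation appears to succeed. The helper name wrap_to_int32 is hypothetical, and two's-complement wraparound is an assumption (signed overflow is formally undefined behaviour in C and C++).

```cpp
#include <cstdint>

// Hypothetical helper: the value a 64-bit product would take if only its low
// 32 bits were kept and read back as a signed two's-complement int.
std::int64_t wrap_to_int32(std::int64_t value) {
    // Conversion to an unsigned type is modular, so this step is well defined.
    const std::uint32_t low = static_cast<std::uint32_t>(value);
    // Bit patterns above INT32_MAX stand for negative numbers.
    if (low > static_cast<std::uint32_t>(INT32_MAX)) {
        return static_cast<std::int64_t>(low) - (std::int64_t{1} << 32);
    }
    return static_cast<std::int64_t>(low);
}
```

Under that assumption, wrap_to_int32(4294967LL * 501) is negative, which would match the error at creation, whereas wrap_to_int32(500000LL * 9000) is 205032704, a valid but far too small length, which would match a matrix that seems to be created and only fails later.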
> |
> |
> |
> |
> | R.
> |
> |
> | > If you used R 3.0.0, then you may have noticed that R is ahead of us, and you
> | > are welcome to help close the gap :)
> |
> | > Dirk
> |
> |
> | > | 1. For some values of number of rows and columns, creating the matrix is
> | > | not allowed, with the message "negative length vectors are not allowed",
> | > | but with other values the creation of the matrix proceeds without
> | > | (apparent) troubles, even when the total size is >> 2^31 - 1.
> | > |
> | > | 1.a. Is this intended?
> | > |
> | > | 1.b. I understand the error message is coming from R (not Rcpp) and thus
> | > | this is not something that can be made easier to understand?
> | > |
> | > |
> | > | 2. The part I found confusing is that the same problem (number of cells >
> | > | 2^31 - 1) is sometimes caught at object creation, but sometimes manifests
> | > | itself much later (either in the C++ code or later in R).
> | > |
> | > | I was expecting (maybe the problem is my expectations) an error early on,
> | > | when creating the matrix; if the creation proceeds without trouble, I was
> | > | not expecting a segfault (as I think all cells are initialized to zero).
> | > |
> | > | Is the recommended procedure to check if the product of dimensions is <
> | > | 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit
> | > | systems?).
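
A minimal sketch of the pre-creation check asked about above, assuming the relevant limit is the signed 32-bit maximum (2^31 - 1, i.e. .Machine$integer.max). The function name fits_int_index is hypothetical and is not part of Rcpp or R.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical guard: true when a matrix with nrow * ncol cells can be
// indexed with a signed 32-bit int.
bool fits_int_index(std::int64_t nrow, std::int64_t ncol) {
    if (nrow < 0 || ncol < 0) return false;      // negative dimensions never fit
    if (nrow == 0 || ncol == 0) return true;     // empty matrix, zero cells
    const std::int64_t limit = std::numeric_limits<std::int32_t>::max();
    // Dividing instead of multiplying avoids overflowing the 64-bit product.
    return nrow <= limit / ncol;
}
```

In a function such as f1 this could be called before constructing the NumericMatrix, raising an error (for example via Rcpp::stop) when it returns false. Whether the bound should be relaxed for R 3.0.0 long vectors depends on what the installed Rcpp supports, which this sketch does not try to detect.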
> | > |
> | > |
> | > | Best,
> | > |
> | > | R.
> | > |
> | > |
> | > |
> | > | // Beginning of file max-size.cpp
> | > |
> | > | #include <Rcpp.h>
> | > |
> | > | using namespace Rcpp;
> | > |
> | > |
> | > | // [[Rcpp::export]]
> | > |
> | > | NumericMatrix f1(IntegerVector nr, IntegerVector nc,
> | > | IntegerVector sf = 0) {
> | > | int nrow = as<int>(nr);
> | > | int ncol = as<int>(nc);
> | > | int segf = as<int>(sf);
> | > |
> | > | NumericMatrix outM(nrow, ncol);
> | > | std::cout << " After creating outM" << std::endl;
> | > | outM(nrow - 1, 0) = 1;
> | > | std::cout << " After assigning to last row, first column"
> | > | << std::endl;
> | > |
> | > | std::cout << " Some other value: 1, 0: "
> | > | << outM(1, 0) << std::endl;
> | > |
> | > | if( (nrow > 1) && (ncol > 3) )
> | > | std::cout << " Some other value: nrow - 1, ncol - 3: "
> | > | << outM(nrow - 1, ncol - 3) << std::endl;
> | > |
> | > | outM(nrow - 1, ncol - 1) = 1;
> | > | std::cout << " After assigning something to last cell"
> | > | << std::endl;
> | > |
> | > | std::cout << " Try to return the last assignment: "
> | > | << outM(nrow - 1, ncol - 1) << std::endl;
> | > |
> | > | if((nrow >= 500000) && segf) {
> | > | std::cout << "\n Assign a few around/beyond 2^31 - 1. Should segfault\n";
> | > | for(int i = 4290; i < 4300; ++i) {
> | > | std::cout << " i = " << i << std::endl;
> | > | outM(nrow - 1, i) = 0;
> | > | }
> | > | }
> | > |
> | > | return wrap(outM);
> | > | }
> | > |
> | > | // End of file max-size.cpp
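
To see why the loop over columns 4290 to 4299 sits right at the boundary, here is a small sketch of the index arithmetic. R stores matrices in column-major order; the assumption here is that element (row, col) of a matrix with nrow rows lives at zero-based offset row + nrow * col. Both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: zero-based offset of (row, col) in a column-major
// matrix with nrow rows, computed in 64 bits so it cannot wrap here.
std::int64_t column_major_offset(std::int64_t nrow, std::int64_t row,
                                 std::int64_t col) {
    return row + nrow * col;
}

// Hypothetical helper: first column of the last row whose offset no longer
// fits in a signed 32-bit int, or -1 if every column among ncol still fits.
std::int64_t first_column_past_int_max(std::int64_t nrow, std::int64_t ncol) {
    for (std::int64_t col = 0; col < ncol; ++col) {
        if (column_major_offset(nrow, nrow - 1, col) > INT32_MAX) {
            return col;
        }
    }
    return -1;
}
```

For nrow = 500000 the last row of column 4293 is at offset 2146999999, still below 2^31 - 1 = 2147483647, while column 4294 is at 2147499999, just past it. So the loop in f1 straddles the point where a 32-bit index stops being able to address the cell.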
> | > |
> | > |
> | > |
> | > |
> | > |
> | > | ################################################
> | > | library(Rcpp)
> | > | sourceCpp("max-size.cpp", verbose = TRUE)
> | > |
> | > | (tmp <- f1(4, 5))
> | > |
> | > |
> | > | 4294967 * 500 > .Machine$integer.max
> | > | tmp <- f1(4294967, 500)
> | > | object.size(tmp)/(4294967 * 500) ## ~ 8
> | > |
> | > | 4294967 * 501 > .Machine$integer.max
> | > | tmp <- f1(4294967, 501) ## negative length vectors
> | > |
> | > | 500000 * 9000 > .Machine$integer.max
> | > | tmp <- f1(500000, 9000) ## sometimes segfaults
> | > | tmp[500000, 9000]
> | > | object.size(tmp) ## things are missing
> | > | prod(dim(tmp)) > .Machine$integer.max
> | > |
> | > | ## using either of these usually leads to segfault
> | > |
> | > | for(i in (4290:4300)) print(tmp[500000, i])
> | > |
> | > | f1(500000, 9000, 1)
> | > |
> | > | #####################################################
> | > |
> | > |
> | > | --
> | > | Ramon Diaz-Uriarte
> | > | Department of Biochemistry, Lab B-25
> | > | Facultad de Medicina
> | > | Universidad Autónoma de Madrid
> | > | Arzobispo Morcillo, 4
> | > | 28029 Madrid
> | > | Spain
> | > |
> | > | Phone: +34-91-497-2412
> | > |
> | > | Email: rdiaz02 at gmail.com
> | > | ramon.diaz at iib.uam.es
> | > |
> | > | http://ligarto.org/rdiaz
> | > |
> | > |
> | > | _______________________________________________
> | > | Rcpp-devel mailing list
> | > | Rcpp-devel at lists.r-forge.r-project.org
> | > | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> | > --
> | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz