[Rcpp-devel] Report of CRAN compilation problem and solution with architecture x86_32

Juan Domingo Esteve Juan.Domingo at uv.es
Sat Nov 26 21:10:30 CET 2022


Keywords: runtime error, package check, 32 bit architectures, large files

This is the second of two reports on CRAN check problems that I found in my package and
that affect only some particular architectures (in this case, x86_32).

Problem description:

  When compiling a package with C++ source code using Rcpp in a Linux system,
  kernel 5.19.16-100, distribution Fedora 35, the generated package passed
  R CMD check --as-cran test, giving no compilation warnings and no execution errors.

  Nevertheless, the runtime tests on the CRAN server raised an error, and only
  for the x86_32 architecture (found mostly in old PCs).

  Let's suppose you have stored a variable of unsigned long long type at the end
  of a binary file. You think you can read it with:

  unsigned long long endofbindata;
  std::string fname="yourfilename";

  std::ifstream f(fname.c_str());
  f.seekg(-sizeof(unsigned long long),std::ios::end);
  f.read((char *)&endofbindata,sizeof(unsigned long long));

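  and indeed you can, but ONLY in 64-bit architectures. The call to seekg does not
  work as expected in 32-bit architectures. The first parameter (offset) is of the
  signed type streamoff, but sizeof yields an unsigned value (size_t), so
  -sizeof(unsigned long long) is not -8: it wraps around to a huge positive number.
  With a 64-bit size_t the conversion to streamoff happens to give -8 again; with a
  32-bit size_t and a wider streamoff (which seems to be what g++ uses on x86_32)
  it gives 2^32-8, a position far beyond the end of the file. The result is absurd
  values on execution EVEN IF THE FILE is smaller than 2^32 bytes (at compilation, even on
  a 32-bit computer, no error or warning is raised, so you don't notice the problem).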
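
  As a minimal sketch of how the seek could be written so that the offset is signed
  on every architecture: the helper name ReadTrailingValue is hypothetical (it is not
  part of the package), and the only assumption is that the value was written in the
  native byte order of the machine that reads it.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the unsigned long long stored at the end of
// a binary file. Returns false if the file cannot be opened or read.
bool ReadTrailingValue(const std::string &fname, unsigned long long &value)
{
    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        return false;

    // Convert sizeof (unsigned size_t) to the signed streamoff BEFORE
    // negating, so the offset is really -8 on 32-bit and 64-bit builds.
    const std::streamoff back =
        static_cast<std::streamoff>(sizeof(unsigned long long));
    f.seekg(-back, std::ios::end);

    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    return !f.fail();
}
```

  The cast is the whole point: negating after the conversion keeps the arithmetic in
  a signed type, so the result no longer depends on the width of size_t.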

My solution has been:

  Make a more portable function to get the size of a file using the stat system call, like:

  #include <sys/stat.h>
  #include <cerrno>
  #include <string>
  #include <Rcpp.h>

  // Returns the size of the file in bytes; stops with an R error if stat fails.
  unsigned long long GetFileSize(const std::string &fname)
  {
         struct stat stat_buf;
         if (stat(fname.c_str(), &stat_buf) != 0)
         {
          const int saved_errno = errno;   // keep it before anything else can change errno
          std::string err="Cannot obtain information (with stat system call) of file "+fname+"\n";
          if (saved_errno == EOVERFLOW)
          {
           err += "This is probably because you are running this in a 32-bit architecture and the file is bigger than 2 GB.\n";
           err += "Unfortunately, we have not found yet a solution for that and, if you need to manage so big files,\n";
           err += "probably you should consider using a 64-bit architecture.\n";
          }
          // NOTE: maybe compiling with -D_FILE_OFFSET_BITS=64 (which makes glibc set __USE_FILE_OFFSET64)
          // could solve this, but it might provoke other problems...
          Rcpp::stop(err);
         }
         return static_cast<unsigned long long>(stat_buf.st_size);
  }

  According to the stat manual, stat can fail with this error:

    EOVERFLOW
         pathname or fd refers to a file whose size, inode number, or number of blocks cannot be represented in, respectively, the types
         off_t, ino_t, or blkcnt_t. This error can occur when, for example, an application compiled on a 32-bit platform without
         -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bytes.

  Once that is done, use the returned size (if the call has succeeded) to go to the end of the file, or to the end minus an offset, with an f.seekg call.
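
  A sketch of that last step, under some assumptions: FileSize and
  ReadTrailingValueBySize are hypothetical names, and FileSize is a variant of
  GetFileSize above that reports failure through its return value instead of calling
  Rcpp::stop, only so that the example compiles without Rcpp.

```cpp
#include <sys/stat.h>
#include <fstream>
#include <string>

// Hypothetical variant of GetFileSize: returns false instead of stopping.
bool FileSize(const std::string &fname, unsigned long long &size)
{
    struct stat stat_buf;
    if (stat(fname.c_str(), &stat_buf) != 0)
        return false;
    size = static_cast<unsigned long long>(stat_buf.st_size);
    return true;
}

// Hypothetical helper: reads the trailing unsigned long long by seeking to
// an absolute position (file size minus 8) from the beginning of the file.
bool ReadTrailingValueBySize(const std::string &fname, unsigned long long &value)
{
    unsigned long long size = 0;
    if (!FileSize(fname, size) || size < sizeof(value))
        return false;   // stat failed or the file is too short

    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        return false;

    // The position is non-negative, so no negative offset is ever formed.
    f.seekg(static_cast<std::streamoff>(size - sizeof(value)), std::ios::beg);
    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    return !f.fail();
}
```

  Seeking from std::ios::beg with a position computed in unsigned long long avoids
  negating anything, at the price of one extra system call per file.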

  As you see, I have not found a real solution, but at least this warns the user about the problem of using large files
  in 32-bit architectures.
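
  One way to find out whether a given build is exposed to EOVERFLOW is to look at the
  width of off_t, which is the type the manual excerpt above refers to. The sketch
  below only reports that width; OffTBits and LargeFilesSupported are hypothetical
  names, and I have not tested whether compiling the package with
  -D_FILE_OFFSET_BITS=64 is free of side effects.

```cpp
#include <sys/types.h>
#include <climits>

// Hypothetical helper: number of bits of off_t in this translation unit.
// On 32-bit glibc builds this is normally 32 unless the code is compiled
// with -D_FILE_OFFSET_BITS=64.
int OffTBits()
{
    return static_cast<int>(sizeof(off_t) * CHAR_BIT);
}

// Hypothetical helper: true if stat() can represent sizes above (1<<31)-1.
bool LargeFilesSupported()
{
    return OffTBits() >= 64;
}
```

  A package could call LargeFilesSupported() once and warn the user early, before
  any large file is opened.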

  This should now be infrequent in practice, since fewer 32-bit computers remain in use every day,
  but since CRAN still checks with them I have preferred to document it, just in case anyone else may
  benefit from the information.

     Juan

-- 
________________________________________________________________
Juan Domingo Esteve
Dept. of Informatics, School of Engineering
University of Valencia
Avda. de la Universidad, s/n.
         46100-Burjasot (Valencia)
            SPAIN

Telephone:      +34-963543572
Fax:            +34-963543550
email:  Juan.Domingo at uv.es
________________________________________________________________

