[Rcpp-devel] Report of CRAN compilation problem and solution with architecture x86_32
Juan Domingo Esteve
Juan.Domingo at uv.es
Sat Nov 26 21:10:30 CET 2022
Keywords: runtime error, package check, 32 bit architectures, large files
This is the second of two reports on CRAN check problems that I found in my package and
that affect only some particular architectures (in this case, x86_32).
Problem description:
When compiling a package with C++ source code using Rcpp on a Linux system
(kernel 5.19.16-100, distribution Fedora 35), the generated package passed the
R CMD check --as-cran test, giving no compilation warnings and no execution errors.
Nevertheless, the runtime tests on the CRAN server produced an error exclusively
for the x86_32 architecture (found mostly in old PCs).
Suppose you have stored a variable of type unsigned long long at the end
of a binary file. You might think you can read it with:
unsigned long long endofbindata;
std::string fname="yourfilename";
std::ifstream f(fname.c_str());
f.seekg(-sizeof(unsigned long long),std::ios::end);
f.read((char *)&endofbindata,sizeof(unsigned long long));
and indeed you can, but ONLY on 64-bit architectures. The function seekg does not
work as expected on 32-bit architectures. Its first parameter (offset) is of the signed
type std::streamoff, but -sizeof(unsigned long long) is computed in the unsigned type
size_t, whose width differs between 32-bit and 64-bit builds. The most likely cause is
that in a 32-bit build the negated value wraps around to 2^32 - 8 and, where streamoff
is wider than size_t, reaches seekg as a huge positive offset instead of -8. This produces
absurd results on execution EVEN IF THE FILE is smaller than 2^32 bytes (at compile time,
even on a 32-bit computer, no error or warning is raised, so you don't notice the problem).
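One way to keep the relative seek is to convert the size to the signed offset type before
negating it, so that no unsigned wrap-around can happen. What follows is only a minimal
sketch and has not been run on the CRAN x86_32 machine; the function name ReadTrailingULL
is hypothetical, and a std::runtime_error is used instead of Rcpp::stop just to keep the
example self-contained.

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the unsigned long long stored in the last
// sizeof(unsigned long long) bytes of a binary file.
unsigned long long ReadTrailingULL(const std::string &fname)
{
    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        throw std::runtime_error("Cannot open file " + fname);

    // Convert to the signed offset type first and negate afterwards, so the
    // value is -8 whatever the width of size_t is.
    const std::streamoff back = -static_cast<std::streamoff>(sizeof(unsigned long long));
    f.seekg(back, std::ios::end);

    unsigned long long value = 0;
    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    if (!f)
        throw std::runtime_error("Cannot read the trailing value of file " + fname);
    return value;
}
```

A call such as ReadTrailingULL("yourfilename") would then replace the five lines shown
before. Files too big for the offset type of the build are still not handled by this.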
My solution has been to write a more portable function that gets the size of a file
using the stat system call, like:
#include <string>
#include <sys/stat.h>
#include <Rcpp.h>

// Returns the size in bytes of the file fname, or stops with an R error if stat fails.
unsigned long long GetFileSize(const std::string &fname)
{
    struct stat stat_buf;
    int rc = stat(fname.c_str(), &stat_buf);
    if (rc != 0)
    {
        std::string err="Cannot obtain information (with stat system call) about file "+fname+"\n";
        err += "This is probably because you are running this in a 32-bit architecture and the file is bigger than 4 GB.\n";
        err += "Unfortunately, we have not yet found a solution for that and, if you need to manage such big files,\n";
        err += "you should probably consider using a 64-bit architecture.\n";
        Rcpp::stop(err);
        // NOTE: maybe compiling with -D_FILE_OFFSET_BITS=64 could solve this, but it might cause other problems...
    }
    // Rcpp::stop throws, so this point is reached only if stat has succeeded.
    return static_cast<unsigned long long>(stat_buf.st_size);
}
According to the stat manual, stat can fail with this error (reported in errno):
    EOVERFLOW
        pathname or fd refers to a file whose size, inode number, or number of blocks
        cannot be represented in, respectively, the types off_t, ino_t, or blkcnt_t.
        This error can occur when, for example, an application compiled on a 32-bit
        platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size
        exceeds (1<<31)-1 bytes.
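Since stat can also fail for other reasons (a missing file, for instance), the message
could be chosen by looking at errno instead of always blaming the file size. A small
sketch, assuming a POSIX system; the function name DescribeStatFailure and the wording
of the messages are hypothetical.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: returns an empty string if stat succeeds on fname,
// otherwise a message that depends on the errno value set by stat.
std::string DescribeStatFailure(const std::string &fname)
{
    struct stat stat_buf;
    if (stat(fname.c_str(), &stat_buf) == 0)
        return std::string();

    const int saved_errno = errno;  // Saved before any other call can change it.
    if (saved_errno == EOVERFLOW)
        return "File " + fname + " is too big for a build without 64-bit file offsets.";
    return "Cannot stat file " + fname + ": " + std::strerror(saved_errno);
}
```

In a package, a non-empty result could be passed to Rcpp::stop.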
Once that is done, use the returned number (if the call has succeeded) as the position to go to
(or that position minus an offset) with an f.seekg call.
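Reading the trailing value this way could look roughly as follows. This is a sketch and
not the code of the package: the function name ReadLastULLUsingStat is hypothetical, the
stat call is inlined, and a std::runtime_error replaces Rcpp::stop so that the example
stands on its own.

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: reads the unsigned long long stored at the end of a
// binary file, locating it from the file size reported by stat.
unsigned long long ReadLastULLUsingStat(const std::string &fname)
{
    struct stat stat_buf;
    if (stat(fname.c_str(), &stat_buf) != 0)
        throw std::runtime_error("Cannot stat file " + fname);

    const unsigned long long fsize = static_cast<unsigned long long>(stat_buf.st_size);
    if (fsize < sizeof(unsigned long long))
        throw std::runtime_error("File " + fname + " is too small");

    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        throw std::runtime_error("Cannot open file " + fname);

    // Non-negative offset counted from the beginning: nothing is negated.
    f.seekg(static_cast<std::streamoff>(fsize - sizeof(unsigned long long)), std::ios::beg);

    unsigned long long value = 0;
    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    if (!f)
        throw std::runtime_error("Cannot read the trailing value of file " + fname);
    return value;
}
```

The offset passed to seekg is never negative here, so the result does not depend on how
a negated size_t converts to the offset type.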
As you can see, I have not found a real solution, but at least this warns the user about the problem
of using large files on 32-bit architectures.
This should now be infrequent in practice, since fewer 32-bit computers remain in use every day,
but since CRAN still checks on them I have preferred to document it, in case anyone else may
benefit from the information.
Juan
--
________________________________________________________________
Juan Domingo Esteve
Dept. of Informatics, School of Engineering
University of Valencia
Avda. de la Universidad, s/n.
46100-Burjasot (Valencia)
SPAIN
Telephone: +34-963543572
Fax: +34-963543550
email: Juan.Domingo at uv.es
________________________________________________________________