[Rcpp-devel] Logic error causes R memory to be corrupt

Anton Bossenbroek anton.bossenbroek at me.com
Thu Oct 6 16:52:00 CEST 2016


Hi Everyone,

I want to store a large number of C++ objects, each managed by a `shared_ptr`, in a `vector`. However, when I push the number of objects that I allocate, the data in R becomes inconsistent.

I will first show the test script and then the C++ file that causes the error. The expected results are shown at the bottom.

# test script
The test script adds an arbitrary number of objects to a vector.

require(Rcpp)
sourceCpp(file="~/tmp/example.cpp")

add_children <- function(number_of_children = 2) {
  p <- 0
  st <- initialize_storage(p, p)
  for (i in 1 : number_of_children) {
    st <- add_node(st, i, i)
  }
  return(st)
}

# example.cpp
The vector is stored in an object that is managed by R, but the elements in the vector are managed by `shared_ptr` and created with `make_shared`.

// [[Rcpp::plugins(cpp11)]]
#include <RcppCommon.h>

#include <memory>
#include <vector>

using namespace std;

struct Element : public std::enable_shared_from_this< Element > {
  SEXP key_;

  /* Simple constructor that assigns the key. */
  Element(SEXP key) : key_(key) {}

  /* Convert the object to a R object. */
  operator SEXP() const;
};

typedef shared_ptr<Element> element_sp;
typedef vector<element_sp> element_sp_vec;
typedef shared_ptr<element_sp_vec> element_sp_vec_sp;

struct Storage {
  /* Internal storage of nodes. */
  element_sp_vec nodes_;

  /* Empty constructor. */
  Storage() {}

  /* Add a node to the storage with its key set to key. */
  void add_element(SEXP key) {
    /* Since Element objects are managed by shared_ptr we create a new
     * instance with make_shared. */
    element_sp e = make_shared<Element>(key);
    /* Add the node to the internal storage. */
    nodes_.push_back(e);
  }

  element_sp_vec_sp get_nodes() {
    /* Create a shared pointer that will hold all the results. Although we
     * could do this more simply, it mimics the logic I implemented in my real
     * program. There I need to swap elements in the list after the copy of the
     * vector. */
    element_sp_vec_sp res(new element_sp_vec());
    /* Copy the data in the nodes vector to the result vector. */
    *res = nodes_;
    return res;
  }
};

#include <Rcpp.h>

using namespace Rcpp;

/* Convert the Element object to a list with key set to its internal member. */
Element::operator SEXP() const
{
  List serial;

  serial["key"] = key_;

  return serial;
}

typedef XPtr<Storage> st_xptr;

// [[Rcpp::export]]
SEXP
initialize_storage()
{
  /* Create a new storage managed by R. */
  Storage* st = new Storage();
  st_xptr p(st, true);

  return p;
}

// [[Rcpp::export]]
SEXP
add_element(SEXP st_sexp, SEXP key)
{
  st_xptr st(st_sexp);
  /* Add a new element to the internal storage. */
  st->add_element(key);

  return st;
}

// [[Rcpp::export]]
List
get_nodes(SEXP st_sexp)
{
  st_xptr st(st_sexp);
  /* Retrieve the elements in the internal storage. */
  element_sp_vec_sp c_res = st->get_nodes();

  /* Allocate a List to store all our results. */
  List result(c_res->size());
  int i = 0;
  /* Iterate through the results and store the result in our list. */
  for (auto it : *c_res) {
    result[i] = wrap(*it);
    ++i;
  }

  return result;
}
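
As a side note, the copy made in `get_nodes` can be looked at in isolation from R. The following is a minimal sketch and not part of example.cpp: `Item`, `Store`, `snapshot` and `snapshot_is_independent` are hypothetical names, and an `int` key stands in for the `SEXP` key so that it compiles without Rcpp. It only illustrates that copying a `vector` of `shared_ptr` copies the pointers, so swapping entries in the copy does not reorder the original. It says nothing about the lifetime of the `SEXP` keys themselves.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for Element: an int key replaces the SEXP key_
// so that the sketch compiles without R or Rcpp.
struct Item {
  int key;
  explicit Item(int k) : key(k) {}
};

typedef std::shared_ptr<Item> item_sp;
typedef std::vector<item_sp> item_sp_vec;
typedef std::shared_ptr<item_sp_vec> item_sp_vec_sp;

// Hypothetical stand-in for Storage.
struct Store {
  item_sp_vec nodes_;

  void add(int key) { nodes_.push_back(std::make_shared<Item>(key)); }

  // Same shape as Storage::get_nodes(): copy the vector of pointers.
  // The Item objects themselves are shared, not duplicated.
  item_sp_vec_sp snapshot() const {
    return std::make_shared<item_sp_vec>(nodes_);
  }
};

// Returns true when a swap inside the snapshot leaves the order in the store
// intact while both vectors still point at the same Item objects.
bool snapshot_is_independent(const Store& st) {
  assert(st.nodes_.size() >= 2);  // two entries are needed for the swap
  item_sp_vec_sp copy = st.snapshot();
  std::swap(copy->front(), copy->back());
  return copy->front() == st.nodes_.back() &&
         copy->back() == st.nodes_.front();
}
```

While a snapshot is alive, each `Item` is owned by two `shared_ptr` instances (one in the store, one in the copy), so the elements stay valid even if the store is destroyed first.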

Below are a few test cases of the script, with the behavior that I see on macOS Sierra with clang.

## n = 10

Everything works fine

n <- 10
a <- add_children(number_of_children = n)
res <- sapply(get_nodes(a), function(x) x[["key"]])
all(res == 0 : n)
# [1] TRUE

## n = 100

Everything works fine

n <- 100
a <- add_children(number_of_children = n)
res <- sapply(get_nodes(a), function(x) x[["key"]])
all(res == 0 : n)
# [1] TRUE

## n = 10000

Something goes wrong.

n <- 10000
a <- add_children(number_of_children = n)

res <- sapply(get_nodes(a), function(x) x[["key"]])
all(res == 0 : n)
# [1] FALSE
# There were 50 or more warnings (use warnings() to see the first 50)

Further inspection shows that the warnings are:

warnings()
# Warning messages:
# 1: NAs introduced by coercion
# 2: NAs introduced by coercion
# 3: NAs introduced by coercion
# 4: NAs introduced by coercion
### etc.

A closer inspection of the contents of `res` shows that it contains a non-numeric value,

res[1000]
# [[1]]
# [1] "data"

which is surprising to me since the script only added numeric `SEXP` values to the `vector`. My expected output for this value of `n` would be the same as the cases above.

## gctorture
I reran the `n=10000` example with `gctorture(TRUE)`. I did not receive any warnings, but the data is still corrupt. Two arbitrary elements from the `res` list:

# [[998]]
# <CHARSXP: "\"key\"">
# 
# [[999]]
# [1] "srcref"

## Replication
I replicated these results on macOS Sierra as well as in a Docker image based on rocker.

sessionInfo()
# R version 3.3.1 (2016-06-21)
# Platform: x86_64-apple-darwin15.5.0 (64-bit)
# Running under: OS X 10.12 (Sierra)
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] Rcpp_0.12.7    setwidth_1.0-4 colorout_1.1-2
# 
# loaded via a namespace (and not attached):
# [1] tools_3.3.1

### uname

uname -prsv
Darwin 16.0.0 Darwin Kernel Version 16.0.0: Mon Aug 29 17:56:20 PDT 2016; root:xnu-3789.1.32~3/RELEASE_X86_64 i386

### clang

clang -v
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin16.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Any advice on what may be the problem here?

