# [Rcpp-devel] Performance/memory management question

Toki Loo tokiloo1 at yahoo.fr
Thu Mar 28 00:54:07 CET 2013

```I have an initial (10^5, 20) matrix containing observations for a set of individuals (the individual ID is one column of the matrix).
I want to sample the list of unique individuals with replacement and get the corresponding list of observations (possibly with repetitions).
Simplified example: m(5, 2)
Given m:
Ind  Obs
1   3.4
1   3.6
2   5
3   6
4   7

resample(m) may give
1 3.4
1 3.6
2 5
1 3.4
1 3.6
1 3.4
1 3.6
if 1 2 1 1 were sampled from the individuals 1 2 3 4.

I'm trying to do it via Rcpp, and here is some code:

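#include <Rcpp.h>
#include <tr1/random>
#include <list>
#include <map>
#include <set>
#include <vector>
using namespace Rcpp;

// Assumption: the post does not show how Engine is defined;
// a Mersenne Twister typedef is one plausible choice.
typedef std::tr1::mt19937 Engine;

// [[Rcpp::export]]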
void resample(NumericMatrix mat) {

int nrow = mat.nrow();
IntegerVector d1(nrow);
for (int i = 0; i < nrow; i++) {
d1[i] = mat(i, 0);
}
std::cout << "Number of elements in mat:  " << d1.length() << std::endl;
std::multimap<int, NumericVector> m;
for (int i = 0; i < nrow; i++) {
NumericVector d = mat.row(i);
m.insert(std::pair<int, NumericVector>(d1[i], d));
}

// Create vector of deduplicated entries:
std::set<int> keys_dedup;
for (int i = 0; i < nrow; ++i) keys_dedup.insert(d1[i]);
std::cout << "Number of elements in set :  " << keys_dedup.size() << std::endl;
std::vector<int> vec;
vec.assign(keys_dedup.begin(), keys_dedup.end());
std::cout << "Number of elements in vec :  " << vec.size() << std::endl;

//sampling among the unique keys
Engine eng;
eng.seed((unsigned int) 123);
std::tr1::uniform_int<int> unif(0, vec.size() - 1);
std::list<NumericVector> samples;
for (int i = 0; i < vec.size(); ++i) {
int u = unif(eng);
std::cout << u << " : " << vec[u] << std::endl;

std::pair<std::multimap<int, NumericVector>::iterator,
std::multimap<int, NumericVector>::iterator> ret =
m.equal_range(vec[u]);
for (std::multimap<int, NumericVector>::iterator it = ret.first;
it != ret.second; ++it) {
samples.push_back(it->second);
}
}
std::cout << "Number of elements in samples :  " << samples.size() << std::endl;

//    NumericMatrix matR(samples.size(), mat.ncol());
//        for (int i = 0; i < samples.size(); ++i) {
//            matR.row(i) = Rcpp::as(samples[i]);
//        }
//    return matR;
}

I have a performance-related question:
m is a 10^5 * 20 matrix.
If I submit: system.time(m <- resample(m))

I see:
Number of elements in mat:  100000
Number of elements in set :  939
Number of elements in vec :  939
Number of elements in samples :  99008  (in the console it takes less than 1 second to get here)
user          system     elapsed
38.531       0.004      38.631

I would like to know, if possible, how to reduce the 38 seconds between the last std::cout (in the C++ code) and the end of execution in R.
Could this be due to memory management or garbage collection, given that the last cout appears in the R console in less than 1 second?