<html><body><div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt"><div>I have an initial (10^5, 20) matrix of observations for a set of individuals (the individual ID is one column of the matrix).<br>I want to sample the list of unique individuals with replacement and get the corresponding list of observations (with possible repetitions).<br>Simplified example: m(5, 2)<br>Given m:</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: times new roman,new york,times,serif; background-color: transparent; font-style: normal;">Ind Obs<br>1 3.4<br>1 3.6<br>2 5<br>3 6<br>4 7<br><br>resample(m) may give<br>1 3.4<br>1 3.6<br>2 5<br>1 3.4<br>1 3.6<br>1 3.4<br>1 3.6<br>if 1 2 1 1 were sampled from the individuals 1 2 3 4.<br><br>I'm trying to do it via Rcpp, and here is some code:<br><br>// [[Rcpp::export]]<br>void resample(NumericMatrix mat)
{<br><br> int nrow = mat.nrow();<br> IntegerVector d1(nrow);<br> for (int i = 0; i < nrow; i++) {<br> d1[i] = mat(i, 0);<br> }<br> std::cout << "Number of elements in mat: " << d1.length() << std::endl;<br> std::multimap<int, NumericVector> m;<br> for (int i = 0; i < nrow; i++) {<br> NumericVector d = mat.row(i);<br> m.insert(std::pair<int, NumericVector>(d1[i], d));<br> }<br><br> // Create vector of deduplicated entries: <br> std::set<int> keys_dedup;<br> for (int i = 0; i < nrow; ++i) keys_dedup.insert(d1[i]);<br> std::cout << "Number of elements in set : " <<
keys_dedup.size() << std::endl;<br> std::vector<int> vec;<br> vec.assign(keys_dedup.begin(), keys_dedup.end());<br> std::cout << "Number of elements in vec : " << vec.size() << std::endl;<br><br> // Sample among the unique keys<br> Engine eng;<br> eng.seed((unsigned int) 123);<br> std::tr1::uniform_int<int> unif(0, vec.size() - 1);<br> std::list<NumericVector> samples;<br> for (int i = 0; i < vec.size(); ++i) {<br> int u = unif(eng);<br> std::cout << u << " : " << vec[u] << std::endl;<br><br> std::pair<std::multimap<int, NumericVector>::iterator,<br>
std::multimap<int, NumericVector>::iterator> ret =<br> m.equal_range(vec[u]);<br> for (std::multimap<int, NumericVector>::iterator it = ret.first;<br> it != ret.second; ++it) {<br> samples.push_back(it->second);<br> }<br> }<br> std::cout << "Number of elements in samples : " << samples.size() << std::endl;<br><br> // NumericMatrix matR(samples.size(), mat.ncol());<br> // for (int i = 0; i < samples.size(); ++i) {<br> //
matR.row(i) = Rcpp::as(samples[i]);<br> // }<br>// return matR; <br>}<br><br>I have a performance-related question:<br>m is a 10^5 x 20 matrix.<br>If I submit: system.time(m <- resample(m))<br><br>I see:<br>Number of elements in mat: 100000<br>Number of elements in set : 939<br>Number of elements in vec : 939<br>Number of elements in samples : 99008 (the console reaches this point in less than 1 second)<br>user system elapsed<br> 38.531 0.004 38.631 <br><br><br>I would like to know whether it is possible to reduce the 38 seconds that pass between the last std::cout (in the C++ code) and the end of the execution in R.<br>Could this be due to memory
management/garbage collection, given that the last cout appears in the R console in less than 1 second?<br><br>Please advise.<br>Toki<br></div></div></body></html>