OK, I started with smaller examples. I more or less understand how to manipulate IntegerVectors, but not StringVectors (see below), so I can't even start manipulating a simple list of StringVectors, even though I have looked at the mailing lists, StackOverflow, the package PDFs and the source code on R-Forge.<br>
<br>Compiling the following code gives "warning: cannot pass objects of non-POD type ‘struct Rcpp::internal::string_proxy<16>’ through ‘...’; call will abort at runtime". Why does it complain about printing the string in vec_s[i]?<br>
<br>fn <- cxxfunction(signature(l_in="list"),<br> body='<br>using namespace Rcpp;<br>List l = List(l_in);<br>Rprintf("list size: %d\\n", l.size());<br><br>IntegerVector vec_i= IntegerVector(2);<br>
vec_i[0] = 1;<br>vec_i[1] = 2;<br>List l2 = List::create(_["vec"] = vec_i);<br>Rprintf("vec_i size: %d\\n", vec_i.size());<br>for(int i=0; i<vec_i.size(); ++i)<br> Rprintf("vec_i[%d]=%d\\n", i, vec_i[i]);<br>
<br>StringVector vec_s = StringVector::create("toto");<br>vec_s[0] = "toto";<br>Rprintf("vec_s size: %d\\n", vec_s.size());<br>for(int i=0; i<vec_s.size(); ++i)<br> Rprintf("vec_s[%d]=%s\\n", i, vec_s[i]);<br>
<br>return l2;<br>',<br> plugin="Rcpp", verbose=TRUE)<br>print(fn(list(a=c(1,2,3), b=c("a","b","c"))))<br><br>Moreover, how can I access a component of a list given as input, such as "l_in" above? Should I use l.begin(), l[1] or l["a"]? None of them seems to compile successfully.<br>
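On the warning above: Rprintf is a C-style variadic function, and vec_s[i] is a proxy object rather than a plain const char*, which is why the compiler refuses to pass it through '...'. The same restriction applies to std::string with std::snprintf, so a minimal plain-C++ sketch (no Rcpp needed; format_element is a hypothetical helper) can show the pattern: convert to const char* before the variadic call. In Rcpp the analogous step would presumably be an explicit conversion of vec_s[i] to a C string; that part is an assumption and is not exercised here.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Formats "vec_s[i]=value" the way the Rprintf call in the question intends.
// std::snprintf is variadic like Rprintf, so "%s" must receive a plain
// const char*; passing the string object itself would be undefined behaviour.
std::string format_element(const std::vector<std::string>& vec_s, int i) {
    char buf[128];
    // .c_str() stands in for an explicit conversion of an Rcpp string
    // proxy to a C string (assumption, see the text above).
    std::snprintf(buf, sizeof buf, "vec_s[%d]=%s", i, vec_s[i].c_str());
    return std::string(buf);
}
```

The point of the sketch is only that the conversion happens before the value reaches the '...' argument list.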
<br><div class="gmail_quote">On Thu, Aug 11, 2011 at 8:54 PM, Dirk Eddelbuettel <span dir="ltr"><<a href="mailto:edd@debian.org">edd@debian.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
Howdy,<br>
<div><div></div><div class="h5"><br>
On 11 August 2011 at 20:44, Walrus Foolhill wrote:<br>
| Ok, thanks for your answer, but I wasn't clear enough. So here are more details<br>
| of what I want to do.<br>
|<br>
| I have one list named "probes":<br>
| probes <- list(chr1=data.frame(name=c("p1","p2"),<br>
| start=c(81,95),<br>
| end=c(85,100),<br>
| stringsAsFactors=FALSE))<br>
|<br>
| I also have one list named "genes":<br>
| genes <- list(chr1=data.frame(name=c("g1","g2"), start=c(11,111), end=c(90,190)),<br>
| chr2=data.frame(name="g3", start=11, end=90))<br>
|<br>
| I need to compare those two lists in order to obtain the following list which<br>
| contains, for each gene, the name of the probes included in it:<br>
| links <- list(chr1=list(g1=c("p1")))<br>
|<br>
| Here is my R function (assuming that the probes are sorted based on their start<br>
| and end coordinates):<br>
|<br>
| fun.l <- function(genes, probes){<br>
| links <- lapply(names(genes), function(chr.name){<br>
| if(! chr.name %in% names(probes))<br>
| return(NULL)<br>
| <br>
| res <- list()<br>
| <br>
| genes.c <- genes[[chr.name]]<br>
| probes.c <- probes[[chr.name]]<br>
| <br>
| for(gene.name in genes.c$name){<br>
| gene <- genes.c[genes.c$name == gene.name,]<br>
| res[[gene.name]] <- vector()<br>
| for(probe.name in probes.c$name){<br>
| probe <- probes.c[probes.c$name == probe.name,]<br>
| if(probe$start >= gene$start && probe$end <= gene$end)<br>
| res[[gene.name]] <- append(res[[gene.name]], probe.name)<br>
| else if(probe$start > gene$end)<br>
| break<br>
| }<br>
| if(length(res[[gene.name]]) == 0)<br>
| res[[gene.name]] <- NULL<br>
| }<br>
| <br>
| if(length(res) == 0)<br>
| res <- NA<br>
| return(res)<br>
| })<br>
| names(links) <- names(genes)<br>
| links <- Filter(function(links.c){!is.null(links.c)}, links)<br>
| return(links)<br>
| }<br>
|<br>
| And here is the beginning of my attempt using Rcpp:<br>
|<br>
| src <- '<br>
| using namespace Rcpp;<br>
|<br>
| List genes = List(genes_in);<br>
| int genes_nb_chr = genes.length();<br>
| std::vector<std::string> genes_chr = genes.names();<br>
|<br>
| List probes = List(probes_in);<br>
| int probes_nb_chr = probes.length();<br>
|<br>
| std::vector< std::vector<std::string> > links;<br>
|<br>
| // the main task is performed in this loop<br>
| for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){<br>
| DataFrame genes_c = DataFrame(genes[chrnum]);<br>
| // ... add code to map probes on genes, that is fill "links" ...<br>
| }<br>
|<br>
| return wrap(links);<br>
| '<br>
|<br>
| funC <- cxxfunction(signature(genes_in="list",<br>
| probes_in="list"),<br>
| body=src, plugin="Rcpp")<br>
|<br>
| The problem starts quite early: when I compile this piece of code, I get<br>
| "error: call of overloaded ‘DataFrame(Rcpp::internal::generic_proxy<19>)’ is<br>
| ambiguous".<br>
<br>
</div></div>Try a simpler mock-up. I don't have it in me to work through this now.<br>
DataFrames are a little different when seen from C++ -- start by trying to summarize in<br>
just a vector, or a collection of vectors.<br>
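As a sketch of the "collection of vectors" idea, the inner loops of fun.l can be written in plain C++ on column-wise vectors for one chromosome. The Features struct and the map_probes name are hypothetical, and the sketch assumes, as fun.l does, that the probes are sorted by start. Extracting the columns from the R data frames on the Rcpp side is not shown.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One chromosome's features, stored column-wise like the data frames in
// the question (hypothetical struct, not an Rcpp type).
struct Features {
    std::vector<std::string> name;
    std::vector<double> start;
    std::vector<double> end;
};

// For each gene, collect the names of the probes fully contained in it.
// Assumes probes are sorted by start; genes without any probe are left
// out of the result, as in fun.l.
std::map<std::string, std::vector<std::string> >
map_probes(const Features& genes, const Features& probes) {
    std::map<std::string, std::vector<std::string> > links;
    for (std::size_t g = 0; g < genes.name.size(); ++g) {
        std::vector<std::string> hits;
        for (std::size_t p = 0; p < probes.name.size(); ++p) {
            if (probes.start[p] >= genes.start[g] && probes.end[p] <= genes.end[g])
                hits.push_back(probes.name[p]);
            else if (probes.start[p] > genes.end[g])
                break;  // sorted probes: none of the later ones can fit
        }
        if (!hits.empty())
            links[genes.name[g]] = hits;
    }
    return links;
}
```

With the chr1 data from the question (g1 11-90, g2 111-190; p1 81-85, p2 95-100) this yields a single entry, g1 with p1, which matches the expected links list.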
<div class="im"><br>
| What should I do to go through the "probes" and "genes" lists given as input?<br>
| Maybe more generically, how can we go through a list of lists (of lists...)<br>
| with Rcpp?<br>
|<br>
| 2nd (small) question: I can't manage to use Rprintf when using inline. For<br>
| instance, with Rprintf("%d\n", i); it complains about the quotes. What should I do<br>
| to print a statement from within the for loop?<br>
<br>
</div>The backslashes need escaping as in<br>
<br>
R> printing <- cxxfunction(, plugin="Rcpp", body=' Rprintf("foo\\n"); ')<br>
R> printing()<br>
foo<br>
NULL<br>
R><br>
<div class="im"><br>
| Thanks in advance. As my question is very long, I won't mind if you tell me to<br>
| find another way by myself. But maybe one of you can put me on the right track.<br>
<br>
</div>You are doing well, but you have a decent-sized problem. Try breaking it into<br>
smaller pieces and getting a handle on each problem in turn.<br>
<br>
Dirk<br>
<div><div></div><div class="h5"><br>
|<br>
| On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <<a href="mailto:edd@debian.org">edd@debian.org</a>> wrote:<br>
|<br>
|<br>
| On 11 August 2011 at 03:06, Walrus Foolhill wrote:<br>
| | Hello,<br>
| | I need to create a list and then fill it sequentially by adding components in a<br>
| | for loop. Here is an example that works:<br>
| |<br>
| | library(inline)<br>
| | src <- '<br>
| | Rcpp::List mylist(2);<br>
| | for(int i=0; i<2; ++i)<br>
| | mylist[i] = i;<br>
| | mylist.names() = CharacterVector::create("a","b");<br>
| | return mylist;<br>
| | '<br>
| | fun <- cxxfunction(body=src, plugin="Rcpp")<br>
| | print(fun())<br>
| |<br>
| | But what I really want is to create an empty list and then fill it, that is,<br>
| | without specifying its number of components beforehand. This is because I<br>
| | don't know in advance at which step of the for loop I will need to create a new<br>
| | component. Here is an example that obviously doesn't work, but that should<br>
| | show what I am looking for:<br>
| |<br>
| | Rcpp::List mylist;<br>
| | CharacterVector names = CharacterVector::create("a", "b");<br>
|<br>
| If you know how long names is, you know how long mylist is going to be ....<br>
|<br>
| | for(int i=0; i<2; ++i){<br>
| | mylist.add(names[i], IntegerVector::create());<br>
| | mylist[names[i]].push_back(i);<br>
|<br>
| I don't understand what that is trying to do.<br>
|<br>
| | }<br>
| | return mylist;<br>
| |<br>
| | Do you know how I could achieve this? Thanks.<br>
|<br>
| Rcpp::List is an alias for Rcpp::GenericVector, and derives from Vector. You<br>
| can look at the public member functions -- there are things like<br>
|<br>
| push_back()<br>
| push_front()<br>
| insert()<br>
|<br>
| etc that behave like STL functions __but are inefficient as we (almost<br>
| always) need to copy the whole object__ so they are not recommended.<br>
|<br>
| When I had to deal with returning 'unknown quantities of data', I was mostly<br>
| able either to turn it into a 'fixed or known columns, unknown rows' problem<br>
| (easy, just grow row-wise) or to 'cache' the data in a C++ data structure first,<br>
| before returning to R via Rcpp structures -- and then I knew the dimensions for<br>
| the to-be-created object too.<br>
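The "cache in a C++ data structure first" approach might look like the following minimal sketch. It uses only the standard library; the collect function and the (name, value) record layout are hypothetical, chosen to mirror the "a"/"b" example from the original question. Each named component is created the first time its name is seen, so nothing about the final size has to be known up front.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Accumulates (name, value) records when the number of distinct names is
// not known in advance. std::map and std::vector grow cheaply, unlike
// repeated push_back() on an R-backed list, which copies the whole object.
std::map<std::string, std::vector<int> >
collect(const std::vector<std::pair<std::string, int> >& records) {
    std::map<std::string, std::vector<int> > cache;
    for (std::size_t i = 0; i < records.size(); ++i) {
        // operator[] creates an empty vector the first time a name appears
        cache[records[i].first].push_back(records[i].second);
    }
    return cache;
}
```

Once the loop is done, cache.size() is the final number of components, so the R list can be allocated once at the right size. Rcpp::wrap is also expected to convert such a map to a named list, but that is an assumption not checked here.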
|<br>
| Dirk<br>
|<br>
|<br>
| --<br>
| Two new Rcpp master classes for R and C++ integration scheduled for<br>
| New York (Sep 24) and San Francisco (Oct 8), more details are at<br>
| <a href="http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10" target="_blank">http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10</a><br>
|<br>
|<br>
<br>
</div></div>--<br>
</blockquote></div><br>