[Rcpp-devel] add new components to list without specifying list size initially

Dirk Eddelbuettel edd at debian.org
Fri Aug 12 23:37:13 CEST 2011


On 12 August 2011 at 16:26, Walrus Foolhill wrote:
| Thanks for your advice, I now understand how to manipulate one-level lists:
| 
| fn <- cxxfunction(signature(l_in="list"),
|                   body='
| using namespace Rcpp;
| List l(l_in);
| IntegerVector lf = l["foo"];
| CharacterVector lb = l["bar"];
| for(int i=0; i<lf.size(); ++i)
|   Rprintf("l[%s][%i] %i\\n", "foo", i, lf[i]);
| for(int i=0; i<lb.size(); ++i)
|   Rprintf("l[%s][%i] %s\\n", "bar", i, std::string(lb[i]).c_str());
| ', plugin="Rcpp", verbose=TRUE)
| z <- fn(list(foo=c(1,2,3,4),bar=c("bar1","bar2")))
| 
| But what about 2-level lists? Why doesn't the following code compile?
| 
| fn <- cxxfunction(signature(l_in="list"),
|                   body='
| using namespace Rcpp;
| List l(l_in);
| List lf(l["foo"]);
| ', plugin="Rcpp", verbose=TRUE)
| z <- fn(list(foo=list(bar=1)))
| 
| And what does the following message mean? "error: call of overloaded
| ‘Vector(Rcpp::internal::generic_name_proxy<19>)’ is ambiguous"
| 
| I had a look at "runit.Vector.R" on r-forge, but couldn't find any test
| involving 2-level (or more) lists, although on SO in June 2010
| (http://stackoverflow.com/questions/3088650/how-do-i-create-a-list-of-vectors-in-rcpp/3088744#3088744),
| you said that it should work.
| 
| I checked that I can create a 2-level list, but the code below doesn't compile
| if I uncomment the last Rprintf line:

There can be times when the C++ templating gets in the way, so if this
doesn't work in a single statement, decompose it into two (one to assign to a
temp, another to print the temp) and move on.

I have done two-level lists in the past; one key is that a list ... is just
another SEXP, or can be wrap()'ed to a SEXP, and you can hence assign a list
to be a component of another. And then another and so on...
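
To illustrate why a single statement can be ambiguous while the two-step
version compiles, here is a toy sketch in plain C++. ToyProxy and ToyList are
hypothetical stand-ins, not Rcpp classes. The only assumption is that the
object handed back by l["foo"] can convert itself to several types, and that
the list class has more than one single-argument constructor.

```cpp
#include <cassert>

// Hypothetical stand-in for the proxy that operator[] hands back.
// Its templated conversion operator lets it turn into any requested type.
struct ToyProxy {
    int payload;
    template <typename T>
    operator T() const { return T(payload); }
};

// Hypothetical stand-in for a container with more than one constructor
// that accepts a single argument.
struct ToyList {
    int n;
    explicit ToyList(int size) : n(size) {}
    explicit ToyList(long size) : n(static_cast<int>(size)) {}
};

// Build a ToyList from a proxy in two steps and report its size.
int size_via_temp(const ToyProxy& p) {
    // ToyList direct(p);  // ambiguous: p could become an int, a long or a ToyList
    int tmp = p;           // step 1: the target type is spelled out, so T = int
    assert(tmp == p.payload);
    ToyList l(tmp);        // step 2: an ordinary int picks ToyList(int)
    return l.n;
}
```

Calling size_via_temp() on a ToyProxy holding 7 returns 7. In the Rcpp case
the analogous move would presumably be to assign l["foo"] to a SEXP temporary
and construct the List from that; this is not tested here.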

Dirk


| 
| fn <- cxxfunction(signature(),
|                   body='
| using namespace Rcpp;
| IntegerVector vi(2);
| vi[0] = 2;
| vi[1] = 8;
| List ll = List::create(Named("bar")=vi);
| Rprintf("ll.size %i\\n", ll.size());
| List l = List::create(Named("foo")=ll);
| Rprintf("l.size %i\\n", l.size());
| //Rprintf("l.ll.size %i\\n", l["foo"].size());
| return l;
| ', plugin="Rcpp", verbose=TRUE)
| print(fn())
| 
| Thus once again I'm stuck, but if I know how to access 2-level lists, I think I
| will be able to go back to my original problem, and stop sending emails on this
| mailing list ;)
| 
| On Fri, Aug 12, 2011 at 8:09 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| 
| 
|     On 12 August 2011 at 01:22, Walrus Foolhill wrote:
|     | Ok, I started with smaller examples. I understand more or less how to
|     | manipulate IntegerVectors, but not StringVectors (see below), and thus I
|     | can't even start manipulating a simple list of StringVectors. Even so I
|     | looked at mailing lists, StackOverflow, package pdf, source code on
|     | R-Forge...
|     |
|     | The following code tells me "warning: cannot pass objects of non-POD type
|     | ‘struct Rcpp::internal::string_proxy<16>’ through ‘...’; call will abort
|     | at runtime": why does it complain about printing the string in vec_s[i]?
| 
|     Again, simpler helps. That is the standard C / C++ error message of
| 
|            std::string foo = "bar";
|            printf("String is %s \n", foo);
| 
|     where you need foo.c_str() to pass a char* to printf.
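
A minimal standalone sketch of that fix, in plain C++ without Rcpp. The helper
name describe() is hypothetical, and snprintf stands in for printf so that the
result can be inspected. The point is only that %s receives foo.c_str().

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a std::string with a printf-style function.
// describe() is a hypothetical helper name used only for this sketch.
std::string describe(const std::string& foo) {
    char buf[64];
    // %s expects a const char*, so hand over foo.c_str(), not foo itself.
    int n = std::snprintf(buf, sizeof buf, "String is %s", foo.c_str());
    assert(n >= 0);  // snprintf returns a negative value on an encoding error
    return std::string(buf);
}
```

Passing foo itself would push a non-POD object through the variadic argument
list, which is what the compiler warning above complains about.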
| 
|     | fn <- cxxfunction(signature(l_in="list"),
|     |                   body='
|     | using namespace Rcpp;
|     | List l = List(l_in);
|     | Rprintf("list size: %d\\n", l.size());
|     |
|     | IntegerVector vec_i= IntegerVector(2);
|     | vec_i[0] = 1;
|     | vec_i[1] = 2;
|     | List l2 = List::create(_["vec"] = vec_i);
|     | Rprintf("vec_i size: %d\\n", vec_i.size());
|     | for(int i=0; i<vec_i.size(); ++i)
|     |   Rprintf("vec_i[%d]=%d\\n", i, vec_i[i]);
|     |
|     | StringVector vec_s = StringVector::create("toto");
|     | vec_s[0] = "toto";
|     | Rprintf("vec_s size: %d\\n", vec_s.size());
|     | for(int i=0; i<vec_s.size(); ++i)
|     |   Rprintf("vec_s[%d]=%s\\n", i, vec_s[i]);
| 
|     Try vec_s[i].c_str() instead.
| 
|     Dirk
| 
|     | return l2;
|     | ',
|     |                   plugin="Rcpp", verbose=TRUE)
|     | print(fn(list(a=c(1,2,3), b=c("a","b","c"))))
|     |
|     | Moreover, how can I access the component of a list given as input, as
|     | "l_in" above? Should I use l.begin()? or l[1]? or l["a"]? none of them
|     | seems to compile successfully.
|     |
|     | On Thu, Aug 11, 2011 at 8:54 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
|     |
|     |
|     |     Howdy,
|     |
|     |     On 11 August 2011 at 20:44, Walrus Foolhill wrote:
|     |     | Ok, thanks for your answer, but I wasn't clear enough. So here are
|     |     | more details of what I want to do.
|     |     |
|     |     | I have one list named "probes":
|     |     | probes <- list(chr1=data.frame(name=c("p1","p2"),
|     |     |                  start=c(81,95),
|     |     |                  end=c(85,100),
|     |     |                  stringsAsFactors=FALSE))
|     |     |
|     |     | I also have one list named "genes":
|     |     | genes <- list(chr1=data.frame(name=c("g1","g2"), start=c(11,111),
|     |     |                               end=c(90,190)),
|     |     |               chr2=data.frame(name="g3", start=11, end=90))
|     |     |
|     |     | I need to compare those two lists in order to obtain the following
|     |     | list which contains, for each gene, the name of the probes included
|     |     | in it:
|     |     | links <- list(chr1=list(g1=c("p1")))
|     |     |
|     |     | Here is my R function (assuming that the probes are sorted based on
|     |     | their start and end coordinates):
|     |     |
|     |     | fun.l <- function(genes, probes){
|     |     |   links <- lapply(names(genes), function(chr.name){
|     |     |     if(! chr.name %in% names(probes))
|     |     |       return(NULL)
|     |     |    
|     |     |     res <- list()
|     |     |    
|     |     |     genes.c <- genes[[chr.name]]
|     |     |     probes.c <- probes[[chr.name]]
|     |     |    
|     |     |     for(gene.name in genes.c$name){
|     |     |       gene <- genes.c[genes.c$name == gene.name,]
|     |     |       res[[gene.name]] <- vector()
|     |     |       for(probe.name in probes.c$name){
|     |     |         probe <- probes.c[probes.c$name == probe.name,]
|     |     |         if(probe$start >= gene$start && probe$end <= gene$end)
|     |     |           res[[gene.name]] <- append(res[[gene.name]], probe.name)
|     |     |         else if(probe$start > gene$end)
|     |     |           break
|     |     |       }
|     |     |       if(length(res[[gene.name]]) == 0)
|     |     |         res[[gene.name]] <- NULL
|     |     |     }
|     |     |    
|     |     |     if(length(res) == 0)
|     |     |       res <- NA
|     |     |     return(res)
|     |     |   })
|     |     |   names(links) <- names(genes)
|     |     |   links <- Filter(function(links.c){!is.null(links.c)}, links)
|     |     |   return(links)
|     |     | }
|     |     |
|     |     | And here is the beginning of my attempt using Rcpp:
|     |     |
|     |     | src <- '
|     |     | using namespace Rcpp;
|     |     |
|     |     | List genes = List(genes_in);
|     |     | int genes_nb_chr = genes.length();
|     |     | std::vector<std::string> genes_chr = genes.names();
|     |     |
|     |     | List probes = List(probes_in);
|     |     | int probes_nb_chr = probes.length();
|     |     |
|     |     | std::vector< std::vector<std::string> > links;
|     |     |
|     |     | // the main task is performed in this loop
|     |     | for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){
|     |     |   DataFrame genes_c = DataFrame(genes[chrnum]);
|     |     |   // ... add code to map probes on genes, that is fill "links" ...
|     |     | }
|     |     |
|     |     | return wrap(links);
|     |     | '
|     |     |
|     |     | funC <- cxxfunction(signature(genes_in="list",
|     |     |                                 probes_in="list"),
|     |     |                       body=src, plugin="Rcpp")
|     |     |
|     |     | The problem starts quite early: when I compile this piece of code,
|     |     | I get "error: call of overloaded
|     |     | ‘DataFrame(Rcpp::internal::generic_proxy<19>)’ is ambiguous".
|     |
|     |     Try a simpler mock-up. I don't have it in me to work through this
|     |     now. DataFrames are a little different from C++ -- start by trying to
|     |     summarize in just a vector, or collection of vectors.
|     |
|     |     | What should I do to go through the "probes" and "genes" lists given
|     |     | as input? Maybe more generically, how can we go through a list of
|     |     | lists (of lists...) with Rcpp?
|     |     |
|     |     | 2nd (small) question, I don't manage to use Rprintf when using
|     |     | inline, for instance Rprintf("%d\n", i);, it complains about the
|     |     | quotes. What should I do to print statement from within the for loop?
|     |
|     |     The backslashes need escaping as in
|     |
|     |      R> printing <- cxxfunction(, plugin="Rcpp", body=' Rprintf("foo\\n"); ')
|     |      R> printing()
|     |      foo
|     |      NULL
|     |      R>
|     |
|     |     | Thanks in advance. As my question is very long, I won't mind if you
|     |     | tell me to find another way by myself. But maybe one of you can put
|     |     | me on the good track.
|     |
|     |     You are doing well, but you have a decent-sized problem. Try breaking
|     |     it into smaller pieces and get a handle on each problem in turn.
|     |
|     |     Dirk
|     |
|     |     |
|     |     | On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
|     |     |
|     |     |
|     |     |     On 11 August 2011 at 03:06, Walrus Foolhill wrote:
|     |     |     | Hello,
|     |     |     | I need to create a list and then fill it sequentially by
|     |     |     | adding components in a for loop. Here is an example that works:
|     |     |     |
|     |     |     | library(inline)
|     |     |     | src <- '
|     |     |     | Rcpp::List mylist(2);
|     |     |     | for(int i=0; i<2; ++i)
|     |     |     |   mylist[i] = i;
|     |     |     | mylist.names() = CharacterVector::create("a","b");
|     |     |     | return mylist;
|     |     |     | '
|     |     |     | fun <- cxxfunction(body=src, plugin="Rcpp")
|     |     |     | print(fun())
|     |     |     |
|     |     |     | But what I really want is to create an empty list and then
|     |     |     | fill it, that is without specifying its number of components
|     |     |     | beforehand... This is because I don't know in advance at which
|     |     |     | step of the for loop I will need to create a new component.
|     |     |     | Here is an example, that obviously doesn't work, but that
|     |     |     | should show what I am looking for:
|     |     |     |
|     |     |     | Rcpp::List mylist;
|     |     |     | CharacterVector names = CharacterVector::create("a", "b");
|     |     |
|     |     |     If you know how long names is, you know how long mylist is
|     |     |     going to be ....
|     |     |
|     |     |     | for(int i=0; i<2; ++i){
|     |     |     |   mylist.add(names[i], IntegerVector::create());
|     |     |     |   mylist[names[i]].push_back(i);
|     |     |
|     |     |     I don't understand what that is trying to do.
|     |     |
|     |     |     | }
|     |     |     | return mylist;
|     |     |     |
|     |     |     | Do you know how I could achieve this? Thanks.
|     |     |
|     |     |     Rcpp::List is an alias for Rcpp::GenericVector, and derives
|     |     |     from Vector. You can look at the public member functions --
|     |     |     there are things like
|     |     |
|     |     |        push_back()
|     |     |        push_front()
|     |     |        insert()
|     |     |
|     |     |     etc that behave like STL functions __but are inefficient as we
|     |     |     (almost always) need to copy the whole object__ so they are not
|     |     |     recommended.
|     |     |
|     |     |     When I had to deal with 'unknown quantities of data' returning
|     |     |     I was mostly able to either turn it into a 'fixed or known
|     |     |     columns, unknown rows' problem (easy, just grow row-wise) or I
|     |     |     'cached' in a C++ data structure first before returning to R
|     |     |     via Rcpp structures -- and then I knew the dimensions for the
|     |     |     to-be-created object too.
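
The 'cache in a C++ data structure first' approach could look roughly like the
sketch below, written in plain standard C++ so that it stands on its own. The
Interval struct, the Links typedef and map_probes_to_genes() are hypothetical
names chosen for illustration. They mirror the probes/genes example from later
in this thread, reduced to a single chromosome and without the early exit for
sorted probes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record standing in for one row of a probes or genes data frame.
struct Interval {
    std::string name;
    int start;
    int end;
};

// Gene name -> names of the probes found inside that gene.
typedef std::map<std::string, std::vector<std::string> > Links;

// Grow the result in an STL container; its final size is only known at the end.
Links map_probes_to_genes(const std::vector<Interval>& genes,
                          const std::vector<Interval>& probes) {
    Links links;
    for (std::size_t g = 0; g < genes.size(); ++g) {
        assert(genes[g].start <= genes[g].end);
        for (std::size_t p = 0; p < probes.size(); ++p) {
            // Keep a probe only when it lies entirely inside the gene.
            if (probes[p].start >= genes[g].start &&
                probes[p].end <= genes[g].end) {
                // operator[] creates the entry on first use, so genes
                // without any probe never get a component.
                links[genes[g].name].push_back(probes[p].name);
            }
        }
    }
    return links;
}
```

With genes g1 (11 to 90) and g2 (111 to 190) and probes p1 (81 to 85) and
p2 (95 to 100), the map ends up with the single entry g1 holding p1. Once the
loop is done, links.size() gives the length of the list to create. As far as I
know Rcpp::wrap() can also convert such a map to a named list directly, but
that step is an assumption and is not shown here.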
|     |     |
|     |     |     Dirk
|     |     |
|     |     |
|     |     |     --
|     |     |     Two new Rcpp master classes for R and C++ integration scheduled
|     |     |     for New York (Sep 24) and San Francisco (Oct 8), more details
|     |     |     are at
|     |     |     http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
|     |     |
|     |     |
|     |
| 

-- 
Two new Rcpp master classes for R and C++ integration scheduled for 
New York (Sep 24) and San Francisco (Oct 8), more details are at
http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php

