[Rcpp-devel] add new components to list without specifying list size initially

Dirk Eddelbuettel edd at debian.org
Fri Aug 12 03:54:13 CEST 2011


Howdy,

On 11 August 2011 at 20:44, Walrus Foolhill wrote:
| Ok, thanks for your answer, but I wasn't clear enough. So here are more details
| of what I want to do.
| 
| I have one list named "probes":
| probes <- list(chr1=data.frame(name=c("p1","p2"),
|                  start=c(81,95),
|                  end=c(85,100),
|                  stringsAsFactors=FALSE))
| 
| I also have one list named "genes":
| genes <- list(chr1=data.frame(name=c("g1","g2"), start=c(11,111), end=c
| (90,190)),
|                 chr2=data.frame(name="g3", start=11, end=90))
| 
| I need to compare those two lists in order to obtain the following list which
| contains, for each gene, the name of the probes included in it:
| links <- list(chr1=list(g1=c("p1")))
| 
| Here is my R function (assuming that the probes are sorted based on their start
| and end coordinates):
| 
| fun.l <- function(genes, probes){
|   links <- lapply(names(genes), function(chr.name){
|     if(! chr.name %in% names(probes))
|       return(NULL)
|    
|     res <- list()
|    
|     genes.c <- genes[[chr.name]]
|     probes.c <- probes[[chr.name]]
|    
|     for(gene.name in genes.c$name){
|       gene <- genes.c[genes.c$name == gene.name,]
|       res[[gene.name]] <- vector()
|       for(probe.name in probes.c$name){
|         probe <- probes.c[probes.c$name == probe.name,]
|         if(probe$start >= gene$start && probe$end <= gene$end)
|           res[[gene.name]] <- append(res[[gene.name]], probe.name)
|         else if(probe$start > gene$end)
|           break
|       }
|       if(length(res[[gene.name]]) == 0)
|         res[[gene.name]] <- NULL
|     }
|    
|     if(length(res) == 0)
|       res <- NA
|     return(res)
|   })
|   names(links) <- names(genes)
|   links <- Filter(function(links.c){!is.null(links.c)}, links)
|   return(links)
| }
| 
| And here is the beginning of my attempt using Rcpp:
| 
| src <- '
| using namespace Rcpp;
| 
| List genes = List(genes_in);
| int genes_nb_chr = genes.length();
| std::vector<std::string> genes_chr = genes.names();
| 
| List probes = List(probes_in);
| int probes_nb_chr = probes.length();
| 
| std::vector< std::vector<std::string> > links;
| 
| // the main task is performed in this loop
| for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){
|   DataFrame genes_c = DataFrame(genes[chrnum]);
|   // ... add code to map probes on genes, that is fill "links" ...
| }
| 
| return wrap(links);
| '
| 
| funC <- cxxfunction(signature(genes_in="list",
|                                 probes_in="list"),
|                       body=src, plugin="Rcpp")
| 
| The problem starts quite early: when I compile this piece of code, I get
| "error: call of overloaded ‘DataFrame(Rcpp::internal::generic_proxy<19>)’ is
| ambiguous".

Try a simpler mock-up. I don't have it in me to work through this now.
DataFrames are a little different from C++ -- start by trying to summarize in
just a vector, or collection of vectors.
 
| What should I do to go through the "probes" and "genes" lists given as input?
| Maybe more generically, how can we go through a list of lists (of lists...)
| with Rcpp?
| 
| 2nd (small) question, I don't manage to use Rprintf when using inline, for
| instance Rprintf("%d\n", i);, it complains about the quotes. What should I do
| to print statement from within the for loop?

The backslashes need escaping as in

  R> printing <- cxxfunction(, plugin="Rcpp", body=' Rprintf("foo\\n"); ')
  R> printing()
  foo
  NULL
  R> 
 
| Thanks in advance. As my question is very long, I won't mind if you tell me to
| find another way by myself. But maybe one of you can put me on the good track.

You are doing good but you have decent size problem. Try breaking into
smaller pieces and a handle on each problem in turn.

Dirk

| 
| On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| 
| 
|     On 11 August 2011 at 03:06, Walrus Foolhill wrote:
|     | Hello,
|     | I need to create a list and then fill it sequentially by adding
|     components in a
|     | for loop. Here is an example that works:
|     |
|     | library(inline)
|     | src <- '
|     | Rcpp::List mylist(2);
|     | for(int i=0; i<2; ++i)
|     |   mylist[i] = i;
|     | mylist.names() = CharacterVector::create("a","b");
|     | return mylist;
|     | '
|     | fun <- cxxfunction(body=src, plugin="Rcpp")
|     | print(fun())
|     |
|     | But what I really want is to create an empty list and then fill it, that
|     is
|     | without specifying its number of components before hand... This is
|     because I
|     | don't know in advance at which step of the for loop I will need to create
|     a new
|     | component. Here is an example, that obviously doesn't work, but that
|     should
|     | show what I am looking for:
|     |
|     | Rcpp::List mylist;
|     | CharacterVector names = CharacterVector::create("a", "b");
| 
|     If you know how long names is, you know how long mylist going to be ....
| 
|     | for(int i=0; i<2; ++i){
|     |   mylist.add(names[i], IntegerVector::create());
|     |   mylist[names[i]].push_back(i);
| 
|     I don't understand what that is trying to do.
| 
|     | }
|     | return mylist;
|     |
|     | Do you know how I could achieve this? Thanks.
| 
|     Rcpp::List is an alias for Rcpp::GenericVector, and derives from Vector.
|     You
|     can look at the public member functions -- there are things like
| 
|        push_back()
|        push_front()
|        insert()
| 
|     etc that behave like STL functions __but are inefficient as we (almost
|     always) need to copy the whole object__ so they are not recommended.
| 
|     When I had to deal with 'unknown quantities of data' returning I was mostly
|     able to either turn it into a 'fixed or known columns, unknow rows' problem
|     (easy, just grow row-wise) or I 'cached' in a C++ data structure first
|     before
|     returning to R via Rcpp structures -- and then I knew the dimensions for
|     the
|     to-be-created object too.
| 
|     Dirk
| 
| 
|     --
|     Two new Rcpp master classes for R and C++ integration scheduled for
|     New York (Sep 24) and San Francisco (Oct 8), more details are at
|     http://dirk.eddelbuettel.com/blog/2011/08/04#
|     rcpp_classes_2011-09_and_2011-10
| 
| 

-- 
Two new Rcpp master classes for R and C++ integration scheduled for 
New York (Sep 24) and San Francisco (Oct 8), more details are at
http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php


More information about the Rcpp-devel mailing list