[Rcpp-devel] add new components to list without specifying list size initially

Walrus Foolhill walrus.foolhill at gmail.com
Fri Aug 12 08:22:08 CEST 2011


Ok, I started with smaller examples. I understand more or less how to
manipulate IntegerVectors, but not StringVectors (see below), and thus I
can't even start manipulating a simple list of StringVectors, even though I
looked at the mailing lists, StackOverflow, the package PDF, and the source
code on R-Forge.

The following code tells me "warning: cannot pass objects of non-POD type
‘struct Rcpp::internal::string_proxy<16>’ through ‘...’; call will abort at
runtime": why does it complain about printing the string in vec_s[i]?

fn <- cxxfunction(signature(l_in="list"),
                  body='
using namespace Rcpp;
List l = List(l_in);
Rprintf("list size: %d\\n", l.size());

IntegerVector vec_i= IntegerVector(2);
vec_i[0] = 1;
vec_i[1] = 2;
List l2 = List::create(_["vec"] = vec_i);
Rprintf("vec_i size: %d\\n", vec_i.size());
for(int i=0; i<vec_i.size(); ++i)
  Rprintf("vec_i[%d]=%d\\n", i, vec_i[i]);

StringVector vec_s = StringVector::create("toto");
vec_s[0] = "toto";
Rprintf("vec_s size: %d\\n", vec_s.size());
for(int i=0; i<vec_s.size(); ++i)
  Rprintf("vec_s[%d]=%s\\n", i, vec_s[i]);

return l2;
',
                  plugin="Rcpp", verbose=TRUE)
print(fn(list(a=c(1,2,3), b=c("a","b","c"))))
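
For what it's worth, the warning looks like a general C++ restriction rather
than something specific to Rcpp: Rprintf takes its arguments through "..."
like printf, and only plain types (int, double, const char*) can safely travel
that way. vec_i[i] is a plain int, but vec_s[i] is a proxy object. The
standalone sketch below illustrates the idea with std::snprintf. StringProxy
and format_element are hypothetical stand-ins invented for this example; they
are not Rcpp classes or functions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a string proxy: it is non-POD because it owns
// a std::string, so it must not be passed through "..." directly.
struct StringProxy {
    std::string value;
    // Accessor returning a type that printf-style functions understand.
    const char* c_str() const { return value.c_str(); }
};

// Builds "vec_s[i]=<value>\n", which is what the Rprintf call intends to print.
std::string format_element(int i, const StringProxy& elem) {
    char buf[128];
    // Pass the plain const char*, never the proxy object itself.
    int n = std::snprintf(buf, sizeof buf, "vec_s[%d]=%s\n", i, elem.c_str());
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);  // no truncation
    return std::string(buf, static_cast<std::size_t>(n));
}
```

In the cxxfunction body above, the analogous change would be to hand Rprintf a
const char*, for example CHAR(STRING_ELT(vec_s, i)) from the R C API. I have
not compiled that variant here, so treat it as a suggestion to try.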

Moreover, how can I access a component of a list given as input, such as
"l_in" above? Should I use l.begin(), l[1], or l["a"]? None of them seems to
compile successfully.

On Thu, Aug 11, 2011 at 8:54 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

>
> Howdy,
>
> On 11 August 2011 at 20:44, Walrus Foolhill wrote:
> | Ok, thanks for your answer, but I wasn't clear enough. So here are more
> | details of what I want to do.
> |
> | I have one list named "probes":
> | probes <- list(chr1=data.frame(name=c("p1","p2"),
> |                  start=c(81,95),
> |                  end=c(85,100),
> |                  stringsAsFactors=FALSE))
> |
> | I also have one list named "genes":
> | genes <- list(chr1=data.frame(name=c("g1","g2"), start=c(11,111),
> |                               end=c(90,190)),
> |               chr2=data.frame(name="g3", start=11, end=90))
> |
> | I need to compare those two lists in order to obtain the following list,
> | which contains, for each gene, the names of the probes included in it:
> | links <- list(chr1=list(g1=c("p1")))
> |
> | Here is my R function (assuming that the probes are sorted based on their
> | start and end coordinates):
> |
> | fun.l <- function(genes, probes){
> |   links <- lapply(names(genes), function(chr.name){
> |     if(! chr.name %in% names(probes))
> |       return(NULL)
> |
> |     res <- list()
> |
> |     genes.c <- genes[[chr.name]]
> |     probes.c <- probes[[chr.name]]
> |
> |     for(gene.name in genes.c$name){
> |       gene <- genes.c[genes.c$name == gene.name,]
> |       res[[gene.name]] <- vector()
> |       for(probe.name in probes.c$name){
> |         probe <- probes.c[probes.c$name == probe.name,]
> |         if(probe$start >= gene$start && probe$end <= gene$end)
> |           res[[gene.name]] <- append(res[[gene.name]], probe.name)
> |         else if(probe$start > gene$end)
> |           break
> |       }
> |       if(length(res[[gene.name]]) == 0)
> |         res[[gene.name]] <- NULL
> |     }
> |
> |     if(length(res) == 0)
> |       res <- NA
> |     return(res)
> |   })
> |   names(links) <- names(genes)
> |   links <- Filter(function(links.c){!is.null(links.c)}, links)
> |   return(links)
> | }
> |
> | And here is the beginning of my attempt using Rcpp:
> |
> | src <- '
> | using namespace Rcpp;
> |
> | List genes = List(genes_in);
> | int genes_nb_chr = genes.length();
> | std::vector<std::string> genes_chr = genes.names();
> |
> | List probes = List(probes_in);
> | int probes_nb_chr = probes.length();
> |
> | std::vector< std::vector<std::string> > links;
> |
> | // the main task is performed in this loop
> | for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){
> |   DataFrame genes_c = DataFrame(genes[chrnum]);
> |   // ... add code to map probes on genes, that is fill "links" ...
> | }
> |
> | return wrap(links);
> | '
> |
> | funC <- cxxfunction(signature(genes_in="list",
> |                                 probes_in="list"),
> |                       body=src, plugin="Rcpp")
> |
> | The problem starts quite early: when I compile this piece of code, I get
> | "error: call of overloaded ‘DataFrame(Rcpp::internal::generic_proxy<19>)’
> | is ambiguous".
>
> Try a simpler mock-up. I don't have it in me to work through this now.
> DataFrames are a little different from C++ -- start by trying to summarize
> in just a vector, or collection of vectors.
>
> | What should I do to go through the "probes" and "genes" lists given as
> | input? Maybe more generically, how can we go through a list of lists (of
> | lists...) with Rcpp?
> |
> | 2nd (small) question: I can't manage to use Rprintf when using inline. For
> | instance, Rprintf("%d\n", i); complains about the quotes. What should I do
> | to print a statement from within the for loop?
>
> The backslashes need escaping as in
>
>  R> printing <- cxxfunction(, plugin="Rcpp", body=' Rprintf("foo\\n"); ')
>  R> printing()
>  foo
>  NULL
>  R>
>
> | Thanks in advance. As my question is very long, I won't mind if you tell
> | me to find another way by myself. But maybe one of you can put me on the
> | right track.
>
> You are doing well, but you have a decent-sized problem. Try breaking it
> into smaller pieces and getting a handle on each problem in turn.
>
> Dirk
>
> |
> | On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
> |
> |
> |     On 11 August 2011 at 03:06, Walrus Foolhill wrote:
> |     | Hello,
> |     | I need to create a list and then fill it sequentially by adding
> |     | components in a for loop. Here is an example that works:
> |     |
> |     | library(inline)
> |     | src <- '
> |     | Rcpp::List mylist(2);
> |     | for(int i=0; i<2; ++i)
> |     |   mylist[i] = i;
> |     | mylist.names() = CharacterVector::create("a","b");
> |     | return mylist;
> |     | '
> |     | fun <- cxxfunction(body=src, plugin="Rcpp")
> |     | print(fun())
> |     |
> |     | But what I really want is to create an empty list and then fill it,
> |     | that is, without specifying its number of components beforehand. This
> |     | is because I don't know in advance at which step of the for loop I
> |     | will need to create a new component. Here is an example that obviously
> |     | doesn't work, but that should show what I am looking for:
> |     |
> |     | Rcpp::List mylist;
> |     | CharacterVector names = CharacterVector::create("a", "b");
> |
> |     If you know how long names is, you know how long mylist is going to be ....
> |
> |     | for(int i=0; i<2; ++i){
> |     |   mylist.add(names[i], IntegerVector::create());
> |     |   mylist[names[i]].push_back(i);
> |
> |     I don't understand what that is trying to do.
> |
> |     | }
> |     | return mylist;
> |     |
> |     | Do you know how I could achieve this? Thanks.
> |
> |     Rcpp::List is an alias for Rcpp::GenericVector, and derives from Vector.
> |     You can look at the public member functions -- there are things like
> |
> |        push_back()
> |        push_front()
> |        insert()
> |
> |     etc that behave like STL functions __but are inefficient as we (almost
> |     always) need to copy the whole object__ so they are not recommended.
> |
> |     When I had to deal with 'unknown quantities of data' returning I was
> |     mostly able to either turn it into a 'fixed or known columns, unknown
> |     rows' problem (easy, just grow row-wise) or I 'cached' in a C++ data
> |     structure first before returning to R via Rcpp structures -- and then I
> |     knew the dimensions for the to-be-created object too.
> |
> |     Dirk
> |
> |
> |     --
> |     Two new Rcpp master classes for R and C++ integration scheduled for
> |     New York (Sep 24) and San Francisco (Oct 8), more details are at
> |     http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
> |
> |
>
>