[Rcpp-devel] Rcpp "version" of R's match function

Romain Francois romain at r-enthusiasts.com
Fri Nov 16 08:56:57 CET 2012


We need to find the right compromise between bloating Rcpp, which is 
already quite huge:

wc src/* inst/include/**/** inst/include/* 2> /dev/null | tail -n1
    66180  784183 6425152 total

and supporting things that are generic enough.

I can see things like union and setdiff being generic enough (we already 
have unique btw).


Then for other things, being in another package is not that bad.
And after all, this is what Rcpp is really about: giving others the tools.

Romain

On 15/11/12 20:07, Søren Højsgaard wrote:
> Dear list
>
> I am not sure if Hadley's remark below was an invitation to make a "wish list", but I'll take the risk:
>
> 1) I have made several packages related to graphical models for multivariate data. Much of the code in these packages deals with "book keeping": operations on sets of subsets of a finite set of variables. These packages therefore make heavy use of union(), setdiff(), etc., and these functions all rely heavily on match(). The same applies to unique(), which is also based on match(). It would be very nice to have these in C++ form. With a C++ version of match(), they should be low-hanging fruit.
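
A minimal sketch of how a hash-based match() and the set operations built on it might look in plain C++. This uses only the standard library rather than Rcpp types; the names match_sketch, setdiff_sketch and union_sketch are hypothetical, and 0 stands in for R's NA when a value is not found.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: for each x[i], the 1-based position of its first
// occurrence in table, or 0 if absent (R would return NA here).
std::vector<int> match_sketch(const std::vector<int>& x,
                              const std::vector<int>& table) {
    std::unordered_map<int, int> first_pos;
    first_pos.reserve(table.size());
    for (std::size_t i = 0; i < table.size(); ++i) {
        // emplace does nothing if the key exists, so the earliest position wins
        first_pos.emplace(table[i], static_cast<int>(i) + 1);
    }
    std::vector<int> out;
    out.reserve(x.size());
    for (int v : x) {
        auto it = first_pos.find(v);
        out.push_back(it == first_pos.end() ? 0 : it->second);
    }
    return out;
}

// Unique elements of x that do not occur in y, in order of first appearance.
std::vector<int> setdiff_sketch(const std::vector<int>& x,
                                const std::vector<int>& y) {
    std::unordered_set<int> seen(y.begin(), y.end());
    std::vector<int> out;
    for (int v : x) {
        if (seen.insert(v).second) out.push_back(v);
    }
    return out;
}

// Unique elements of x followed by those of y, in order of first appearance.
std::vector<int> union_sketch(const std::vector<int>& x,
                              const std::vector<int>& y) {
    std::unordered_set<int> seen;
    std::vector<int> out;
    for (const std::vector<int>* src : {&x, &y}) {
        for (int v : *src) {
            if (seen.insert(v).second) out.push_back(v);
        }
    }
    return out;
}
```

In an Rcpp setting the same idea would presumably be written against Rcpp vector types and would have to handle NA and non-integer types, which this sketch ignores.
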
>
> 2) Also of relevance to the graphical model packages is a C++ version of aperm() for permuting an array.
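
For illustration, a rough sketch of an aperm()-like permutation in plain C++, assuming the array is stored column-major (first index varying fastest) as in R. The name aperm_sketch is hypothetical, the permutation is 0-based (R's aperm() is 1-based), and dimnames are not handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: permute the dimensions of a column-major array.
// perm[k] names the input dimension that becomes output dimension k.
std::vector<double> aperm_sketch(const std::vector<double>& x,
                                 const std::vector<std::size_t>& dim,
                                 const std::vector<std::size_t>& perm) {
    const std::size_t nd = dim.size();

    // Strides of the input array: stride[k] is the step for dimension k.
    std::vector<std::size_t> stride(nd, 1);
    for (std::size_t k = 1; k < nd; ++k) {
        stride[k] = stride[k - 1] * dim[k - 1];
    }

    std::vector<std::size_t> idx(nd, 0);  // multi-index into the output
    std::vector<double> out(x.size());
    for (std::size_t o = 0; o < out.size(); ++o) {
        // Map the output multi-index back to a linear input offset.
        std::size_t in = 0;
        for (std::size_t k = 0; k < nd; ++k) {
            in += idx[k] * stride[perm[k]];
        }
        out[o] = x[in];

        // Advance the output multi-index, first dimension fastest.
        for (std::size_t k = 0; k < nd; ++k) {
            if (++idx[k] < dim[perm[k]]) break;
            idx[k] = 0;
        }
    }
    return out;
}
```

With a 2x3 matrix and perm = {1, 0}, this reduces to an ordinary transpose.
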
>
> 3) There are operations on such arrays which I imagine could be conveniently implemented in the Rcpp framework. Consider a 2x2x2 contingency table with dimnames a,b,c. Call this table n(a,b,c). The all-two-factor log-linear model has generators (a,b)(a,c)(c,b). Iterative proportional fitting works as follows: let m(a,b,c) denote the array of fitted values (at the current iteration). Then the update for the (c,b) generator is
>
>   m(a,b,c) <- m(a,b,c) * n(c,b)/m(c,b)
>
> To do this one must have
>   marginalization: n(a,b,c) -> n(b,c)
>   permutation: n(b,c) -> n(c,b)
>   division: n(c,b)/m(c,b)
>   multiplication: m(a,b,c) * ( n(c,b)/m(c,b) )
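
The four steps listed above could be sketched in plain C++ roughly as follows, assuming a three-way table stored column-major as in R. Table3, margin_bc and ipf_update_cb are hypothetical names. Dimensions are addressed by position rather than by dimnames, so the permutation step is absorbed into the indexing, and cells with a zero fitted margin are simply skipped, which is a choice made here rather than anything taken from loglin or gRbase.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 3-way table, column-major: element (a,b,c) lives at
// a + A * (b + B * c).
struct Table3 {
    std::size_t A, B, C;
    std::vector<double> v;  // length A * B * C

    double& at(std::size_t a, std::size_t b, std::size_t c) {
        return v[a + A * (b + B * c)];
    }
    double at(std::size_t a, std::size_t b, std::size_t c) const {
        return v[a + A * (b + B * c)];
    }
};

// Marginalization: sum over a, giving a B x C margin indexed as b + B * c.
std::vector<double> margin_bc(const Table3& t) {
    std::vector<double> out(t.B * t.C, 0.0);
    for (std::size_t c = 0; c < t.C; ++c)
        for (std::size_t b = 0; b < t.B; ++b)
            for (std::size_t a = 0; a < t.A; ++a)
                out[b + t.B * c] += t.at(a, b, c);
    return out;
}

// One IPF update for the (c,b) generator:
//   m(a,b,c) <- m(a,b,c) * n(c,b) / m(c,b)
void ipf_update_cb(Table3& m, const Table3& n) {
    const std::vector<double> n_bc = margin_bc(n);
    const std::vector<double> m_bc = margin_bc(m);
    for (std::size_t c = 0; c < m.C; ++c) {
        for (std::size_t b = 0; b < m.B; ++b) {
            const std::size_t k = b + m.B * c;
            if (m_bc[k] == 0.0) continue;  // assumption: leave such cells alone
            const double ratio = n_bc[k] / m_bc[k];  // division step
            for (std::size_t a = 0; a < m.A; ++a) {
                m.at(a, b, c) *= ratio;              // multiplication step
            }
        }
    }
}
```

After one such update the (b,c) margin of m agrees with that of n; a full IPF pass would cycle through the other generators in the same way.
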
>
> I am aware that iterative proportional fitting is already implemented in loglin, but there are other kinds of (graphical) models where similar updates are needed. In connection with message passing in Bayesian networks, one operation often needed is
>
>   m(a,b,c) <- n(a,b) * n(c,b)
>
> which will result in an array with dimensions (a,b,c). All of this is implemented in the gRbase package as R functions, and it would be very convenient to have these operations as C++ functions. In the gRbase implementation the arrays are required to have dimnames, and I guess the same must hold in C++.
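
As a sketch of the table product described above, assuming two different two-way tables p(a,b) and q(c,b) that share the variable b and are stored column-major. The name table_product_sketch and its argument layout are hypothetical. Variables are matched by position here, whereas matching by dimnames, as gRbase requires, would need an extra lookup step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: out(a,b,c) = p(a,b) * q(c,b).
// p is A x B with p(a,b) at a + A * b; q is C x B with q(c,b) at c + C * b.
// The result is A x B x C with out(a,b,c) at a + A * (b + B * c).
std::vector<double> table_product_sketch(const std::vector<double>& p,
                                         std::size_t A, std::size_t B,
                                         const std::vector<double>& q,
                                         std::size_t C) {
    std::vector<double> out(A * B * C);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t b = 0; b < B; ++b)
            for (std::size_t a = 0; a < A; ++a)
                out[a + A * (b + B * c)] = p[a + A * b] * q[c + C * b];
    return out;
}
```
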
>
> I am perfectly aware that I should program these facilities in C++ using Rcpp, but I just can't resist mentioning these wishes, in case they are "almost there" in C++.
>
> Best regards
> Søren
>
> Hmmm - see http://cran.r-project.org/web/packages/fastmatch/index.html
>
> Hadley
>
> PS.  Would you be interested in a set of R functions that, from a quick skim of the R sources, I think could be much, much faster if implemented in Rcpp?
>
>
> --
> RStudio / Rice University
> http://had.co.nz/
>


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible


