[Rcpp-devel] Rcpp "version" of R's match function

Søren Højsgaard sorenh at math.aau.dk
Thu Nov 15 20:07:11 CET 2012


Dear list

I am not sure if Hadley's remark below was an invitation to make a "wishlist", but I'll take the risk:

1) I have made several packages related to graphical models for multivariate data. Much of the code in these packages deals with "book keeping": operations on sets of subsets of a finite set of variables. The packages therefore make much use of union(), setdiff(), etc., and these functions all rely heavily on match(). The same applies to unique(), which is also based on match(). It would be very nice to have these in c++ form. With a c++ version of match(), these should be low-hanging fruit.
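To make point 1 concrete, here is a minimal sketch in plain standard C++ (no Rcpp headers) of a match()-like lookup built on a hash table, with a setdiff() analogue layered on top of it. The names `match_first`, `set_diff` and `NO_MATCH` are hypothetical, and returning -1 where R would return NA is an assumption made for this illustration; it is not how R or Rcpp implements these functions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: -1 stands in for R's NA when an element has no match.
const int NO_MATCH = -1;

// For each element of x, the 1-based position of its first occurrence
// in table, or NO_MATCH if it does not occur there.
std::vector<int> match_first(const std::vector<std::string>& x,
                             const std::vector<std::string>& table) {
    std::unordered_map<std::string, int> first_pos;
    first_pos.reserve(table.size());
    for (std::size_t i = 0; i < table.size(); ++i) {
        // emplace never overwrites, so the first occurrence wins
        first_pos.emplace(table[i], static_cast<int>(i) + 1);
    }
    std::vector<int> out;
    out.reserve(x.size());
    for (const std::string& v : x) {
        auto it = first_pos.find(v);
        out.push_back(it == first_pos.end() ? NO_MATCH : it->second);
    }
    return out;
}

// setdiff(x, y) analogue: the distinct elements of x that are not in y,
// in order of first appearance.
std::vector<std::string> set_diff(const std::vector<std::string>& x,
                                  const std::vector<std::string>& y) {
    std::vector<int> in_y = match_first(x, y);
    std::vector<int> in_x = match_first(x, x);  // marks first occurrences
    std::vector<std::string> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        bool first_occurrence = in_x[i] == static_cast<int>(i) + 1;
        if (in_y[i] == NO_MATCH && first_occurrence) out.push_back(x[i]);
    }
    return out;
}
```

Building the hash table once per call makes the lookup roughly linear in the combined length of the two vectors, which is why the other set operations become cheap once a match()-like primitive exists.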

2) Also of relevance to the graphical model packages is a c++ version of aperm() for permuting an array. 
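For point 2, an aperm()-like permutation can be sketched as follows, assuming the array is stored column-major (first index varying fastest, as in R). The `Array` struct and the function name `permute_dims` are hypothetical, `perm` is 0-based here rather than 1-based as in R, and the sketch only checks that the entries of `perm` are in range, not that they form a valid permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a dense column-major array.
struct Array {
    std::vector<std::size_t> dim;  // extents; the first index varies fastest
    std::vector<double> data;      // values in column-major order
};

// Dimension k of the result is dimension perm[k] of the input (0-based).
Array permute_dims(const Array& a, const std::vector<std::size_t>& perm) {
    const std::size_t rank = a.dim.size();
    assert(perm.size() == rank);

    // stride[k]: offset step in a.data when index k of the input grows by one
    std::vector<std::size_t> stride(rank, 1);
    for (std::size_t k = 1; k < rank; ++k) {
        stride[k] = stride[k - 1] * a.dim[k - 1];
    }

    Array out;
    out.dim.resize(rank);
    for (std::size_t k = 0; k < rank; ++k) {
        assert(perm[k] < rank);
        out.dim[k] = a.dim[perm[k]];
    }
    out.data.resize(a.data.size());

    // Walk the output in storage order, tracking its multi-index like an odometer.
    std::vector<std::size_t> idx(rank, 0);
    for (std::size_t pos = 0; pos < out.data.size(); ++pos) {
        std::size_t src = 0;
        for (std::size_t k = 0; k < rank; ++k) src += idx[k] * stride[perm[k]];
        out.data[pos] = a.data[src];
        for (std::size_t k = 0; k < rank; ++k) {
            if (++idx[k] < out.dim[k]) break;
            idx[k] = 0;
        }
    }
    return out;
}
```

With `perm = {1, 0}` on a 2x3 matrix this reduces to an ordinary transpose.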

3) There are operations on such arrays which I imagine could be conveniently made in the Rcpp framework. Consider a 2x2x2 contingency table with dimnames a,b,c. Call this table n(a,b,c). The all-two-factor log-linear model will have generators (a,b)(a,c)(c,b). Iterative proportional fitting works as follows: Let m(a,b,c) denote the array of fitted values (at the current iteration). Then the update for the (c,b) generator is

 m(a,b,c) <- m(a,b,c) * n(c,b) / m(c,b)

To do this one must have 
 marginalization: n(a,b,c) -> n(b,c)
 permutation: n(b,c) -> n(c,b)
 division: n(c,b)/m(c,b)
 multiplication: m(a,b,c) * ( n(c,b)/m(c,b) )
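The four steps listed above can be sketched in plain C++ along the following lines. The `Table` struct (a column-major array whose dimensions carry variable names) and the functions `var_pos`, `cell_map`, `marginalize` and `ipf_update` are all hypothetical names introduced for this illustration. Marginalizing directly into the variable order of the generator folds the permutation step into the marginalization, and treating 0/0 as 0 is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: a column-major array with named dimensions.
struct Table {
    std::vector<std::string> vars;  // one name per dimension, e.g. {"a","b","c"}
    std::vector<std::size_t> dim;   // extents; the first index varies fastest
    std::vector<double> data;
};

// Position of `name` among t.vars (asserts that it is present).
std::size_t var_pos(const Table& t, const std::string& name) {
    auto it = std::find(t.vars.begin(), t.vars.end(), name);
    assert(it != t.vars.end());
    return static_cast<std::size_t>(it - t.vars.begin());
}

// For every cell of `big`, the offset of the corresponding cell in a table
// over `sub_vars` (a subset of big.vars, listed in any order).
std::vector<std::size_t> cell_map(const Table& big,
                                  const std::vector<std::string>& sub_vars) {
    // step[k]: how far the sub-table offset moves when index k of big grows
    std::vector<std::size_t> step(big.dim.size(), 0);
    std::size_t stride = 1;
    for (const std::string& v : sub_vars) {
        std::size_t k = var_pos(big, v);
        step[k] = stride;
        stride *= big.dim[k];
    }
    std::vector<std::size_t> map(big.data.size());
    std::vector<std::size_t> idx(big.dim.size(), 0);
    for (std::size_t pos = 0; pos < map.size(); ++pos) {
        std::size_t off = 0;
        for (std::size_t k = 0; k < idx.size(); ++k) off += idx[k] * step[k];
        map[pos] = off;
        for (std::size_t k = 0; k < idx.size(); ++k) {  // odometer increment
            if (++idx[k] < big.dim[k]) break;
            idx[k] = 0;
        }
    }
    return map;
}

// Marginalization: sum t down to `keep`, with dimensions ordered as in `keep`.
Table marginalize(const Table& t, const std::vector<std::string>& keep) {
    Table out;
    out.vars = keep;
    std::size_t n = 1;
    for (const std::string& v : keep) {
        out.dim.push_back(t.dim[var_pos(t, v)]);
        n *= out.dim.back();
    }
    out.data.assign(n, 0.0);
    std::vector<std::size_t> map = cell_map(t, keep);
    for (std::size_t pos = 0; pos < t.data.size(); ++pos) {
        out.data[map[pos]] += t.data[pos];
    }
    return out;
}

// One update for one generator: m <- m * n_gen / m_gen, cell by cell.
void ipf_update(Table& m, const Table& n, const std::vector<std::string>& gen) {
    Table n_gen = marginalize(n, gen);
    Table m_gen = marginalize(m, gen);
    std::vector<std::size_t> map = cell_map(m, gen);
    for (std::size_t pos = 0; pos < m.data.size(); ++pos) {
        double denom = m_gen.data[map[pos]];
        // Assumed convention: a zero fitted margin gives a zero fitted cell.
        m.data[pos] = denom > 0.0 ? m.data[pos] * n_gen.data[map[pos]] / denom : 0.0;
    }
}
```

After calling `ipf_update(m, n, {"c", "b"})`, the (c,b) margin of `m` equals the (c,b) margin of `n`, which is the property that each step of iterative proportional fitting is meant to enforce.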

I am aware that iterative proportional fitting is already implemented in loglin, but there are other kinds of (graphical) models where similar updates are needed. In connection with message passing in Bayesian networks, one operation often needed is

 m(a,b,c) <- n(a,b) * n(c,b)

which will result in an array with dimensions (a,b,c). All of this stuff is implemented in the gRbase package as R functions, and it would be very convenient to have these operations as c++ functions. In the gRbase implementation it is required that the arrays do have dimnames, and I guess the same must hold in c++.
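The product m(a,b,c) <- n(a,b) * n(c,b) can be sketched in plain C++ as below. The `Table` struct and the function `multiply` are hypothetical names for this illustration and do not reflect the gRbase implementation. The variables of the result are taken to be those of the first table followed by any new ones from the second, and dimensions are matched by name, which is why named dimensions are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: a column-major array with named dimensions.
struct Table {
    std::vector<std::string> vars;  // one name per dimension
    std::vector<std::size_t> dim;   // extents; the first index varies fastest
    std::vector<double> data;
};

// Cell-wise product over the union of the variables of x and y.
Table multiply(const Table& x, const Table& y) {
    Table out;
    out.vars = x.vars;
    out.dim = x.dim;

    // y_pos[k]: where dimension k of y sits in the result
    std::vector<std::size_t> y_pos;
    for (std::size_t k = 0; k < y.vars.size(); ++k) {
        auto it = std::find(out.vars.begin(), out.vars.end(), y.vars[k]);
        std::size_t j = static_cast<std::size_t>(it - out.vars.begin());
        if (it == out.vars.end()) {
            out.vars.push_back(y.vars[k]);  // new variable: append it
            out.dim.push_back(y.dim[k]);
        } else {
            assert(out.dim[j] == y.dim[k]);  // shared variable: extents must agree
        }
        y_pos.push_back(j);
    }

    // Offset steps into x and y for each dimension of the result
    const std::size_t rank = out.dim.size();
    std::vector<std::size_t> step_x(rank, 0), step_y(rank, 0);
    std::size_t stride = 1;
    for (std::size_t k = 0; k < x.dim.size(); ++k) {
        step_x[k] = stride;
        stride *= x.dim[k];
    }
    stride = 1;
    for (std::size_t k = 0; k < y.dim.size(); ++k) {
        step_y[y_pos[k]] = stride;
        stride *= y.dim[k];
    }

    std::size_t n = 1;
    for (std::size_t d : out.dim) n *= d;
    out.data.resize(n);

    // Walk the result in storage order, tracking its multi-index like an odometer.
    std::vector<std::size_t> idx(rank, 0);
    for (std::size_t pos = 0; pos < n; ++pos) {
        std::size_t ox = 0, oy = 0;
        for (std::size_t k = 0; k < rank; ++k) {
            ox += idx[k] * step_x[k];
            oy += idx[k] * step_y[k];
        }
        out.data[pos] = x.data[ox] * y.data[oy];
        for (std::size_t k = 0; k < rank; ++k) {
            if (++idx[k] < out.dim[k]) break;
            idx[k] = 0;
        }
    }
    return out;
}
```

Replacing the multiplication in the inner loop by a division would give the corresponding table division under the same indexing scheme.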

I am perfectly aware that I should program these facilities in c++ using Rcpp, but I just can't resist mentioning these wishes, in case they are "almost there" in c++.

Best regards
Søren


Hmmm - see http://cran.r-project.org/web/packages/fastmatch/index.html

Hadley

PS. Would you be interested in a list of R functions that, from a quick skim of the R sources, I think could be much, much faster if implemented in Rcpp?


--
RStudio / Rice University
http://had.co.nz/

