[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

Tue Oct 9 18:17:34 CEST 2012

Dear Dirk,

I don't see how to do that in Armadillo, but I think I can create a matrix of
the same size:
NumericMatrix B = is.na(X);
Maybe I could use this for indexing, to save time and to avoid introducing
extra looping?
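
A minimal sketch of the indexing idea, assuming plain C++ with std::vector
rather than Armadillo so that it stays self-contained. The helper names
complete_rows and cor_on_rows are hypothetical, not part of Rcpp or Armadillo.
The first pass builds the index vector of rows where both columns are present.
The second computes Pearson's correlation on those rows only. In Armadillo the
index vector would presumably be a uvec passed to .elem().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: indices of rows where neither x nor y is NaN.
// R's NA_real_ is itself a NaN at the IEEE level, so std::isnan catches it too.
std::vector<std::size_t> complete_rows(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!std::isnan(x[i]) && !std::isnan(y[i])) idx.push_back(i);
    }
    return idx;
}

// Hypothetical helper: Pearson correlation restricted to the rows in idx.
// Returns NaN when fewer than two rows remain or when a column is constant.
double cor_on_rows(const std::vector<double>& x,
                   const std::vector<double>& y,
                   const std::vector<std::size_t>& idx) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (idx.size() < 2) return nan;

    // Means over the retained rows.
    double mx = 0.0, my = 0.0;
    for (std::size_t i : idx) { mx += x[i]; my += y[i]; }
    mx /= static_cast<double>(idx.size());
    my /= static_cast<double>(idx.size());

    // Centered sums of squares and cross-products.
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i : idx) {
        const double dx = x[i] - mx;
        const double dy = y[i] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return nan;
    return sxy / std::sqrt(sxx * syy);
}
```

Building the index once per pair of columns keeps the missing-value check out
of the arithmetic loops, at the cost of one extra pass over the data.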

Also, I want to regress column X on Z, and column Y on Z.
Does arma::solve(...) handle NaN values, or do I have to check for them
myself?
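
A NaN propagates through ordinary floating-point arithmetic, so a missing value
left in the inputs of a least-squares computation will generally contaminate
the result. The sketch below therefore filters rows first. It assumes plain C++
with one predictor plus an intercept, and it is not a statement about what
arma::solve does internally. LineFit and fit_complete_cases are hypothetical
names introduced for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type: coefficients of x = intercept + slope * z,
// plus the number of rows that were actually used.
struct LineFit {
    double intercept;
    double slope;
    std::size_t n_used;
};

// Hypothetical helper: least-squares fit of x on z, skipping every row
// where either value is NaN (which includes R's NA_real_).
LineFit fit_complete_cases(const std::vector<double>& x,
                           const std::vector<double>& z) {
    assert(x.size() == z.size());
    const double nan = std::numeric_limits<double>::quiet_NaN();

    // First pass: sums and count over complete rows.
    double sx = 0.0, sz = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(z[i])) continue;
        sx += x[i];
        sz += z[i];
        ++n;
    }
    if (n < 2) return LineFit{nan, nan, n};
    const double mx = sx / static_cast<double>(n);
    const double mz = sz / static_cast<double>(n);

    // Second pass: centered sum of squares of z and cross-product with x.
    double szz = 0.0, szx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(z[i])) continue;
        const double dz = z[i] - mz;
        szz += dz * dz;
        szx += dz * (x[i] - mx);
    }
    if (szz == 0.0) return LineFit{nan, nan, n};  // z is constant

    const double slope = szx / szz;
    return LineFit{mx - slope * mz, slope, n};
}
```

With several predictors, the same row filter would be applied to the design
matrix and the response before handing them to a solver.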

The function works nicely: while the R implementation takes at least 40 min
(surely more, because I terminated it), Armadillo computes everything in less
than one minute. But I skipped columns with missing cells, which I now want
to include.

Thanks,
Mateusz

On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:

>
> On 9 October 2012 at 10:08, Douglas Bates wrote:
> | You may find it easier to use the Rcpp class NumericMatrix than to use
> | RcppArmadillo.  Detection of NA's is built in to R and Rcpp classes but not
> | RcppArmadillo. For each pair of columns, run a loop that checks for NA's at
> | each position in each column, skips the position if NA's are detected and
> | otherwise increments the squared sums, cross-product and number of elements.
>
> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it is
> convenient to stay there.
>
> You should be able to set up a little "sweeper" function which applies one of
>
>     R_IsNA              just NA
>     R_IsNaN             just NaN
>     R_finite            false for NA, NaN or Inf
>
> across a vector or matrix and returns you an index vector. Armadillo can use
> indexing vectors in ways that are similar to R.
>
> Dirk
>
>
> |
> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com" <mateusz.kaduk at gmail.com>
> | wrote:
> |
> |     Hi,
> |
> |     I have written a small piece of C++ code using Armadillo and inline
> |     with the RcppArmadillo package.
> |     The input is data.matrix(X). Some cells might be NAs. Example in R:
> |     X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
> |
> |     I am calculating conditional correlation on columns of that matrix, just
> |     picking vectors, so cor(X,Y).
> |     The problem is that sometimes I might have an empty cell in one or both
> |     vectors; in that case I would like to skip that row and proceed with
> |     calculating Pearson's correlation on the remaining data. I know that
> |     there will be a difference in degrees of freedom, but I have over 100
> |     rows, so skipping a few shouldn't matter that much.
> |
> |     Basically my question boils down to solving the problem:
> |     How to find which colvec cells are nan, and remove this index from both X
> |     and Y colvec, before calculating correlation.
> |
> |     I would be very grateful for help,
> |
> |     Kind regards,
> |     Mateusz Kaduk
> |
> |     _______________________________________________
> |     Rcpp-devel mailing list
> |     Rcpp-devel at lists.r-forge.r-project.org
> |     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> |
> |
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>