[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
mateusz.kaduk at gmail.com
Tue Oct 9 18:17:34 CEST 2012
Dear Dirk,
I don't see how to do that in Armadillo, but I think I can create a same-size
NumericMatrix B = is.na(X);
Maybe I could use this for indexing? That would save time and avoid
introducing extra looping.
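Here is a minimal sketch of the mask idea. It is written in plain standard C++ rather than Rcpp so that it stands alone. The NaN test runs once per cell of a column-major matrix (the layout R and Armadillo both use), and each column pair afterwards only reads the mask. The names `missingMask` and `completePairs` are hypothetical, and the sketch assumes the matrix arrives as a flat `std::vector<double>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: same-size "missing" mask for a flat column-major
// matrix, the analogue of R's is.na(X). R's NA_real_ is a NaN, so
// std::isnan flags both NA and NaN.
std::vector<char> missingMask(const std::vector<double>& X) {
    std::vector<char> mask(X.size());
    for (std::size_t k = 0; k < X.size(); ++k)
        mask[k] = std::isnan(X[k]) ? 1 : 0;
    return mask;
}

// Hypothetical helper: count rows where both column j and column k are
// observed, reading only the precomputed mask of an n-row matrix.
std::size_t completePairs(const std::vector<char>& mask, std::size_t n,
                          std::size_t j, std::size_t k) {
    assert(mask.size() >= n * (std::max(j, k) + 1));
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (!mask[i + j * n] && !mask[i + k * n]) ++count;
    return count;
}
```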
Also, I want to regress column X on Z, and column Y on Z.
Does arma::solve(...) handle NaN values, or do I have to check for them
myself?
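The thread does not settle whether `arma::solve()` handles NaN inputs in any useful way. This sketch therefore avoids relying on it and drops every row where either value is non-finite before fitting. It assumes Z is a single column and fits x ~ a + b*z with the closed-form least-squares formulas. `residualsOn` is a hypothetical name. If Z has several columns, the same row filter would go in front of a general least-squares solve.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: residuals of the least-squares fit x ~ a + b*z,
// fitted only on rows where both x[i] and z[i] are finite. Skipped rows
// get NaN, so positions stay aligned with the input.
std::vector<double> residualsOn(const std::vector<double>& x,
                                const std::vector<double>& z) {
    assert(x.size() == z.size());
    const double nan = std::numeric_limits<double>::quiet_NaN();
    double n = 0, sx = 0, sz = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::isfinite(x[i]) && std::isfinite(z[i])) {
            n += 1;
            sx += x[i];
            sz += z[i];
        }
    std::vector<double> r(x.size(), nan);
    if (n < 2) return r;  // not enough complete rows to fit a line
    const double mx = sx / n, mz = sz / n;
    double szz = 0, sxz = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::isfinite(x[i]) && std::isfinite(z[i])) {
            szz += (z[i] - mz) * (z[i] - mz);
            sxz += (z[i] - mz) * (x[i] - mx);
        }
    if (szz == 0) return r;  // z is constant on the kept rows
    const double b = sxz / szz, a = mx - b * mz;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::isfinite(x[i]) && std::isfinite(z[i]))
            r[i] = x[i] - (a + b * z[i]);
    return r;
}
```

For a conditional (partial) correlation, you would then correlate the residuals of X on Z with those of Y on Z. Strictly, both fits should use the rows that are complete in X, Y and Z together, and this single-pair helper does not enforce that.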
The function works nicely: while the R implementation takes at least 40 minutes
(surely more, since I terminated it), Armadillo computes everything in less
than one minute. But I skipped columns with missing cells, which I now want
to include.
Thanks,
Mateusz
On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 9 October 2012 at 10:08, Douglas Bates wrote:
> | You may find it easier to use the Rcpp class NumericMatrix than to use
> | RcppArmadillo. Detection of NAs is built into R and the Rcpp classes but
> | not into RcppArmadillo. For each pair of columns, run a loop that checks
> | for NAs at each position in each column, skips the position if an NA is
> | detected, and otherwise increments the sums, squared sums, cross-product
> | and number of elements.
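The loop described above can be sketched in plain C++. It uses no Rcpp types, so it compiles on its own, and `pairwiseCor` is a hypothetical name. The sketch relies on R's `NA_real_` being a NaN, so `std::isnan` catches both NA and NaN. It finishes with the one-pass formula r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: Pearson correlation of x and y in a single pass,
// skipping every row where either value is NaN (R's NA_real_ is a NaN too).
double pairwiseCor(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double n = 0, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;  // skip incomplete row
        n += 1;
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (n < 2) return nan;  // too few complete rows
    const double den = std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    if (!(den > 0)) return nan;  // constant column (or rounding): undefined
    return (n * sxy - sx * sy) / den;
}
```

The one-pass formula is fast, but it can lose precision when the values are large relative to their spread. A two-pass version that subtracts the means first is more robust.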
>
> All true, but at the same time, once you are in RcppArmadillo or RcppEigen,
> it is convenient to stay there.
>
> You should be able to set up a little "sweeper" function which applies one
> of
>
>   R_IsNA      just NA
>   R_IsNaN     just NaN
>   R_finite    NA, NaN or Inf (returns false for all of them)
>
> across a vector or matrix and returns you an index vector. Armadillo can
> use indexing vectors in ways that are similar to R.
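Here is a sketch of such a sweeper. It works on `std::vector<double>` and takes a caller-supplied predicate instead of calling the R API, so it compiles without R's headers. `sweep` and `subset` are hypothetical names. In Rcpp code the predicate could wrap `R_IsNA`, `R_IsNaN` or `R_finite`. For doubles, `std::isfinite` behaves like `R_finite`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "sweeper": return the indices i for which keep(v[i]) is true.
template <typename Pred>
std::vector<std::size_t> sweep(const std::vector<double>& v, Pred keep) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (keep(v[i])) idx.push_back(i);
    return idx;
}

// Hypothetical gather step: v[idx[0]], v[idx[1]], ... in order.
std::vector<double> subset(const std::vector<double>& v,
                           const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < v.size());
        out.push_back(v[i]);
    }
    return out;
}
```

In Armadillo, the index vector would be an `arma::uvec` and the gather step `x.elem(idx)`. Recent Armadillo versions also provide `find_finite()`, which builds such an index vector directly.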
>
> Dirk
>
>
> |
> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com"
> | <mateusz.kaduk at gmail.com> wrote:
> |
> | Hi,
> |
> | I have written a small piece of C++ code using Armadillo and inline with
> | the RcppArmadillo package.
> | The input is data.matrix(X). Some cells might be NAs. Example in R:
> | X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
> |
> | I am calculating conditional correlation on columns of that matrix, just
> | picking vectors, so cor(X,Y).
> | The problem is that sometimes I might have an empty cell in one or both
> | vectors. In that case I would like to skip that row and proceed with
> | calculating Pearson's correlation on the remaining data. I know that there
> | will be a difference in degrees of freedom, but I have over 100 rows, so
> | skipping a few shouldn't matter that much.
> |
> | Basically my question boils down to this:
> | How do I find which colvec cells are NaN, and remove those indices from
> | both the X and Y colvecs before calculating the correlation?
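For the question as asked, here is a sketch that keeps only the rows where both values are present and then computes Pearson's r with a two-pass formula. For two vectors, keeping only the complete rows matches what R's `cor(x, y, use = "complete.obs")` does. The sketch uses plain `std::vector<double>` in place of `arma::colvec`, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: keep only rows where both x[i] and y[i] are non-NaN,
// so the two filtered vectors stay aligned row by row.
std::pair<std::vector<double>, std::vector<double>>
dropIncompleteRows(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> xs, ys;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (!std::isnan(x[i]) && !std::isnan(y[i])) {
            xs.push_back(x[i]);
            ys.push_back(y[i]);
        }
    return {xs, ys};
}

// Hypothetical helper: two-pass Pearson correlation on filtered,
// equal-length vectors; yields NaN if either vector is constant.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= x.size();
    my /= y.size();
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Subtracting the means before accumulating avoids the precision loss that a one-pass sums formula can suffer. The cost is a second pass and a copy of the complete rows.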
> |
> | I would be very grateful for any help,
> |
> | Kind regards,
> | Mateusz Kaduk
> |
> | _______________________________________________
> | Rcpp-devel mailing list
> | Rcpp-devel at lists.r-forge.r-project.org
> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> |
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>