[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

mateusz.kaduk at gmail.com mateusz.kaduk at gmail.com
Tue Oct 9 19:07:09 CEST 2012


Can you provide an example of how to convert an Armadillo colvec to a uvec,
which I assume works as a selector for rows?

Thanks
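A minimal sketch of the colvec-to-uvec step, written in plain standard C++ so it compiles without R or Armadillo. Here std::vector<double> stands in for arma::colvec and std::vector<std::size_t> for arma::uvec; the helper names indicator_to_indices and select_rows are hypothetical. In Armadillo itself, find() returns a uvec with the positions of the non-zero elements and X.elem(idx) picks those elements out, so the two helpers mirror that pair of calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn a 0/1 indicator column (stored as doubles, the
// way an arma::colvec would hold it) into a list of row indices, playing
// the role of an arma::uvec. Any non-zero entry counts as "selected".
std::vector<std::size_t> indicator_to_indices(const std::vector<double>& keep) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < keep.size(); ++i) {
        if (keep[i] != 0.0) idx.push_back(i);
    }
    return idx;
}

// Hypothetical helper: gather the selected rows of a column, analogous to
// x.elem(idx) in Armadillo.
std::vector<double> select_rows(const std::vector<double>& x,
                                const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < x.size());
        out.push_back(x[i]);
    }
    return out;
}
```

The same index vector can be reused for X and Y, which keeps the two columns aligned after filtering.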

On 9 October 2012 18:17, mateusz.kaduk at gmail.com <mateusz.kaduk at gmail.com> wrote:

> Dear Dirk,
>
> I don't see how to do that in Armadillo, but I think I can create a
> matrix of the same size:
> NumericMatrix B = is.na(X);
> Maybe I could use this for indexing? That would save time and avoid
> introducing extra loops.
>
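A sketch of the one-pass mask idea, assuming the data arrive as a column-major block (the layout R and Armadillo both use) held in a hypothetical ColMajor struct. R's NA for doubles is a NaN with a particular payload, so std::isnan flags both NA and NaN here. The mask is computed once and then reused for every column pair; nan_mask and complete_rows are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix: element (i, j) lives at data[i + j * nrow],
// matching how R and Armadillo lay out their storage.
struct ColMajor {
    std::size_t nrow, ncol;
    std::vector<double> data;
};

// is.na()-style mask for the whole matrix, built in one pass:
// mask[k] is true where data[k] is NaN (R's NA_real_ is also a NaN).
std::vector<bool> nan_mask(const ColMajor& m) {
    std::vector<bool> mask(m.data.size());
    for (std::size_t k = 0; k < m.data.size(); ++k) mask[k] = std::isnan(m.data[k]);
    return mask;
}

// Rows usable for the column pair (a, b): neither cell is missing.
std::vector<bool> complete_rows(const ColMajor& m, const std::vector<bool>& mask,
                                std::size_t a, std::size_t b) {
    assert(a < m.ncol && b < m.ncol && mask.size() == m.data.size());
    std::vector<bool> ok(m.nrow);
    for (std::size_t i = 0; i < m.nrow; ++i)
        ok[i] = !mask[i + a * m.nrow] && !mask[i + b * m.nrow];
    return ok;
}
```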
> Also, I want to regress column X on Z, and column Y on Z.
> Does arma::solve(...) handle NaN values, or do I have to check for that
> myself?
>
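A sketch of the conditional (partial) correlation step that drops incomplete rows before any regression, so it does not depend on how arma::solve treats NaN input. It assumes Z is a single column and fits an intercept plus slope by ordinary least squares; with several conditioning columns one would instead solve the least-squares problem on the filtered rows (for example with arma::solve, which returns a least-squares solution for a non-square system). The names ols_residuals, pearson and partial_cor are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Residuals of y regressed on z with an intercept (one-predictor OLS).
// Assumes z is not constant on the rows supplied.
static std::vector<double> ols_residuals(const std::vector<double>& y,
                                         const std::vector<double>& z) {
    const std::size_t n = y.size();
    double my = 0, mz = 0;
    for (std::size_t i = 0; i < n; ++i) { my += y[i]; mz += z[i]; }
    my /= n; mz /= n;
    double szz = 0, szy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        szz += (z[i] - mz) * (z[i] - mz);
        szy += (z[i] - mz) * (y[i] - my);
    }
    const double slope = szy / szz;
    std::vector<double> r(n);
    // y - (intercept + slope * z), with intercept = my - slope * mz
    for (std::size_t i = 0; i < n; ++i) r[i] = (y[i] - my) - slope * (z[i] - mz);
    return r;
}

// Pearson correlation of two equal-length, NaN-free vectors.
static double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    double ma = 0, mb = 0;
    for (std::size_t i = 0; i < n; ++i) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;
    double saa = 0, sbb = 0, sab = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - ma, db = b[i] - mb;
        saa += da * da; sbb += db * db; sab += da * db;
    }
    return sab / std::sqrt(saa * sbb);
}

// Correlation of x and y conditional on z: drop every row where x, y or z
// is NaN, regress x on z and y on z, then correlate the two residual vectors.
double partial_cor(const std::vector<double>& x, const std::vector<double>& y,
                   const std::vector<double>& z) {
    assert(x.size() == y.size() && y.size() == z.size());
    std::vector<double> xs, ys, zs;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i]) || std::isnan(z[i])) continue;
        xs.push_back(x[i]); ys.push_back(y[i]); zs.push_back(z[i]);
    }
    return pearson(ols_residuals(xs, zs), ols_residuals(ys, zs));
}
```

Filtering once per (X, Y, Z) triple keeps both regressions on exactly the same rows, which the residual correlation requires.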
> The function works nicely: while the R implementation takes at least 40 min
> (surely more, because I terminated it), Armadillo computes everything in
> less than one minute. But I skipped columns with missing cells, which I now
> want to include.
>
> Thanks,
> Mateusz
>
>
> On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:
>
>>
>> On 9 October 2012 at 10:08, Douglas Bates wrote:
>> | You may find it easier to use the Rcpp class NumericMatrix than to use
>> | RcppArmadillo. Detection of NAs is built into R and Rcpp classes but not
>> | RcppArmadillo. For each pair of columns, run a loop that checks for NAs at
>> | each position in each column, skips the position if an NA is detected and
>> | otherwise increments the squared sums, cross-product and number of elements.
>>
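The loop described above can be sketched in plain C++, with std::isnan standing in for Rcpp's NA test (an assumption: it treats NA and NaN alike). It uses the textbook one-pass formula r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)); that form is fast but can lose precision when the values are large relative to their spread, in which case a two-pass or Welford-style update is safer. pairwise_cor is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-pass Pearson correlation over the rows where neither x[i] nor y[i] is
// NaN, accumulating the count, sums, squared sums and cross-product.
// Optionally reports how many rows were used; returns NaN if fewer than two
// complete rows remain.
double pairwise_cor(const std::vector<double>& x, const std::vector<double>& y,
                    std::size_t* n_used = nullptr) {
    assert(x.size() == y.size());
    std::size_t n = 0;
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;  // skip this row
        ++n;
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    if (n_used) *n_used = n;
    if (n < 2) return std::nan("");
    const double dn = static_cast<double>(n);
    return (dn * sxy - sx * sy) /
           std::sqrt((dn * sxx - sx * sx) * (dn * syy - sy * sy));
}
```

Reporting n_used makes it easy to see how many degrees of freedom each pair actually had.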
>> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it is
>> convenient to stay there.
>>
>> You should be able to set up a little "sweeper" function which applies one of
>>
>>     R_IsNA              just NA
>>     R_IsNaN             just NaN
>>     R_IsFinite          NA, NaN or Inf
>>
>> across a vector or matrix and returns an index vector. Armadillo can use
>> indexing vectors in ways similar to R.
>>
>> Dirk
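A sketch of such a sweeper in portable C++, assuming std::isnan and std::isfinite as stand-ins for the R predicates: plain std::isnan cannot tell R's NA apart from an ordinary NaN, so is_missing below matches both. The names sweep, is_missing and is_nonfinite are hypothetical. Newer Armadillo releases also document find_finite() and find_nonfinite(), which return a uvec directly; it is worth checking whether the version you build against has them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "sweeper": apply a per-element predicate and return the
// positions where it holds, i.e. an index vector in the spirit of arma::uvec.
template <typename Pred>
std::vector<std::size_t> sweep(const std::vector<double>& v, Pred pred) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (pred(v[i])) idx.push_back(i);
    return idx;
}

// Portable stand-ins for the R predicates: is_missing matches NA and NaN
// alike; is_nonfinite additionally matches +Inf and -Inf.
inline bool is_missing(double d) { return std::isnan(d); }
inline bool is_nonfinite(double d) { return !std::isfinite(d); }
```

Passing a lambda such as `[](double d) { return std::isfinite(d); }` instead returns the rows to keep rather than the rows to drop.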
>>
>>
>> |
>> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com" <mateusz.kaduk at gmail.com>
>> | wrote:
>> |
>> |     Hi,
>> |
>> |     I have written a small piece of C++ code using Armadillo and inline with
>> |     the RcppArmadillo package.
>> |     The input is data.matrix(X). Some cells might be NAs. Example in R:
>> |     X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
>> |
>> |     I am calculating conditional correlation on columns of that matrix,
>> |     just picking vectors, so cor(X,Y).
>> |     The problem is that sometimes I might have an empty cell in one or both
>> |     vectors; in that case I would like to skip that row and proceed with
>> |     calculating Pearson's correlation on the remaining data. I know that the
>> |     degrees of freedom will differ, but I have over 100 rows, so skipping
>> |     a few shouldn't matter that much.
>> |
>> |     Basically my question boils down to solving this problem:
>> |     How to find which colvec cells are NaN, and remove those indices from
>> |     both the X and Y colvecs before calculating the correlation.
>> |
>> |     I would be very grateful for help,
>> |
>> |     Kind regards,
>> |     Mateusz Kaduk
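A sketch of the remove-from-both-vectors step: it compacts x and y in place, keeping only rows where both values are present, so the surviving rows stay aligned and in order. It assumes missing values arrive as NaN (which is how R's NA_real_ looks from C++); drop_incomplete_rows is a hypothetical name. The compacted vectors can then go into an ordinary Pearson correlation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Remove, from both x and y, every row where either value is NaN,
// keeping the remaining rows aligned and in their original order.
// Returns the number of rows dropped.
std::size_t drop_incomplete_rows(std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    std::size_t keep = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;
        x[keep] = x[i];  // shift the complete row down to the next free slot
        y[keep] = y[i];
        ++keep;
    }
    const std::size_t dropped = x.size() - keep;
    x.resize(keep);
    y.resize(keep);
    return dropped;
}
```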
>> |
>> |     _______________________________________________
>> |     Rcpp-devel mailing list
>> |     Rcpp-devel at lists.r-forge.r-project.org
>> |     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>> |
>> |
>> --
>> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>>
>
>