[Rcpp-devel] Correlation in RcppArmadillo with missing values (NaN)
mateusz.kaduk at gmail.com
Tue Oct 9 19:07:09 CEST 2012
Can you provide an example of how to convert an Armadillo colvec to a uvec,
which I assume works as a selector for rows?
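A minimal sketch of the row-selector idea, written in plain C++ so it runs without R or Armadillo installed. The helpers `complete_rows` and `select_rows` are hypothetical names, and `std::vector<std::size_t>` stands in for `arma::uvec`. In Armadillo itself, `arma::find()` returns a `uvec` of indices and `v.elem(idx)` gathers the selected elements, so the same two steps should carry over, but check the Armadillo documentation for the version in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the row indices at which BOTH columns hold a
// finite value (the role an arma::uvec row selector would play).
// std::isfinite is false for NaN, +/-Inf and R's NA_real_ (a NaN payload).
std::vector<std::size_t> complete_rows(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    assert(x.size() == y.size());  // two columns of the same matrix
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(y[i])) {
            idx.push_back(i);
        }
    }
    return idx;
}

// Hypothetical helper: gathers v[idx[0]], v[idx[1]], ... into a new vector,
// i.e. the same selection Armadillo's v.elem(idx) performs.
std::vector<double> select_rows(const std::vector<double>& v,
                                const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        out.push_back(v[i]);
    }
    return out;
}
```

Computing the index once and applying it to both X and Y keeps the two columns aligned row by row.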
Thanks
On 9 October 2012 18:17, mateusz.kaduk at gmail.com <mateusz.kaduk at gmail.com> wrote:
> Dear Dirk,
>
> I don't see how to do that in Armadillo, but I think I can create a
> same-size NumericMatrix B = is.na(X);
> Maybe I could use this for indexing? That would save time and avoid
> extra looping.
>
> Also, I want to regress column X on Z, and column Y on Z.
> Does arma::solve(...) handle NaN values, or do I have to check for them
> myself?
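As far as I know, `arma::solve` does not drop rows containing NaN on its own, so incomplete rows have to be removed before the fit. A minimal sketch for the simplest case, assuming Z is a single column and the model includes an intercept; `LineFit` and `fit_line_complete` are hypothetical names. With several columns in Z, the same row filter would be applied to every column before calling `arma::solve` on the reduced matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of a hypothetical single-regressor fit x ~ intercept + slope * z.
struct LineFit {
    double intercept;
    double slope;
    std::size_t n_used;  // rows that were complete in both x and z
};

// Ordinary least squares of x on one regressor z (plus intercept), using only
// rows where both values are finite. Two passes: means first, then centred
// cross-products, which loses less precision than raw sums.
LineFit fit_line_complete(const std::vector<double>& x,
                          const std::vector<double>& z) {
    assert(x.size() == z.size());
    double sx = 0.0, sz = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(z[i])) {
            sx += x[i];
            sz += z[i];
            ++n;
        }
    }
    assert(n >= 2);  // a slope needs at least two complete rows
    const double mx = sx / static_cast<double>(n);
    const double mz = sz / static_cast<double>(n);
    double szz = 0.0, sxz = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(z[i])) {
            szz += (z[i] - mz) * (z[i] - mz);
            sxz += (z[i] - mz) * (x[i] - mx);
        }
    }
    // If z is constant on the complete rows, szz is 0 and the slope is undefined.
    const double b = sxz / szz;
    return {mx - b * mz, b, n};
}
```

The residuals `x[i] - (intercept + slope * z[i])` on the complete rows are what would then be carried forward.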
>
> The function works nicely: while the R implementation takes at least 40 min
> (surely more, since I terminated it), Armadillo computes everything in less
> than one minute. But I skipped columns with missing cells, which I now want
> to include.
>
> Thanks,
> Mateusz
>
>
> On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:
>
>>
>> On 9 October 2012 at 10:08, Douglas Bates wrote:
>> | You may find it easier to use the Rcpp class NumericMatrix than to use
>> | RcppArmadillo. Detection of NAs is built into R and Rcpp classes but not
>> | RcppArmadillo. For each pair of columns, run a loop that checks for NAs at
>> | each position in each column, skips the position if NAs are detected, and
>> | otherwise increments the squared sums, cross-product and number of
>> | elements.
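The single-pass loop described above can be sketched as follows, assuming plain `std::vector<double>` columns instead of `NumericMatrix` so it compiles without R headers; `pairwise_cor` is a hypothetical name. Because R stores `NA_real_` as a NaN, `std::isnan` flags both NA and NaN here. Besides the squared sums and cross-product, the plain sums are also accumulated, since Pearson's r needs them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One pass over a pair of columns, skipping every row where either value is
// NA/NaN, then Pearson's r from the accumulated sums.
double pairwise_cor(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;  // incomplete row
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
        ++n;
    }
    if (n < 2) return std::numeric_limits<double>::quiet_NaN();
    const double dn = static_cast<double>(n);
    const double num = dn * sxy - sx * sy;
    const double den = std::sqrt((dn * sxx - sx * sx) * (dn * syy - sy * sy));
    return num / den;
}
```

The raw-sums formula is fast but can lose precision when the values are large relative to their spread; a centred two-pass version trades a second loop for accuracy.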
>>
>> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it is
>> convenient to stay there.
>>
>> You should be able to set up a little "sweeper" function which applies one of
>>
>>    R_IsNA       just NA
>>    R_IsNaN      just NaN
>>    R_IsFinite   NA, NaN or Inf
>>
>> across a vector or matrix and returns you an index vector. Armadillo can use
>> indexing vectors in ways similar to R.
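A sketch of such a sweeper in plain C++, assuming the column-major layout that R, Rcpp and Armadillo all use; `ColMajor` and `sweep` are hypothetical names, not part of any of those libraries. The predicate is passed in, so one function covers both the NA/NaN case and the non-finite case. Note that `std::isnan` cannot tell NA from NaN; that distinction is what `R_IsNA` is for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal column-major matrix: element (i, j) lives at data[i + j * nrow].
struct ColMajor {
    std::size_t nrow, ncol;
    std::vector<double> data;
};

// Hypothetical "sweeper": applies a predicate to every element and returns
// the 0-based linear indices where it holds, much like R's which().
// An index vector of this kind is what an arma::uvec would carry.
template <typename Pred>
std::vector<std::size_t> sweep(const ColMajor& m, Pred pred) {
    assert(m.data.size() == m.nrow * m.ncol);
    std::vector<std::size_t> idx;
    for (std::size_t k = 0; k < m.data.size(); ++k) {
        if (pred(m.data[k])) idx.push_back(k);
    }
    return idx;
}
```

For example, `sweep(m, [](double v) { return !std::isfinite(v); })` returns the linear indices of NA, NaN and Inf cells; a linear index `k` refers to row `k % nrow` and column `k / nrow`. A lambda is used because `std::isnan` is overloaded and cannot be passed directly as a function pointer.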
>>
>> Dirk
>>
>>
>> |
>> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com" <mateusz.kaduk at gmail.com> wrote:
>> |
>> | Hi,
>> |
>> | I have written a small piece of C++ code using Armadillo and inline with
>> | the RcppArmadillo package.
>> | The input is data.matrix(X). Some cells might be NAs. Example in R:
>> | X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
>> |
>> | I am calculating conditional correlation on columns of that matrix, just
>> | picking vectors, so cor(X,Y).
>> | The problem is that sometimes I might have an empty cell in one or both
>> | vectors; in that case I would like to skip that row and proceed with
>> | calculating Pearson's correlation on the remaining data. I know that there
>> | will be a difference in degrees of freedom, but I have over 100 rows, so
>> | skipping a few shouldn't matter that much.
>> |
>> | Basically my question boils down to solving this problem:
>> | how to find which colvec cells are NaN, and remove those indices from both
>> | the X and Y colvec before calculating the correlation.
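The two steps asked about here (drop every row where either column is missing, then compute Pearson's r on what is left) can be sketched in plain C++ as follows. `cor_complete` is a hypothetical name, and all non-finite values (NA, NaN, Inf) are treated as missing. For data without Inf values, this should agree with R's `cor(x, y, use = "complete.obs")` for a single pair of columns.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pearson correlation on complete cases: first drop every row where x or y
// is non-finite, then apply the two-pass, mean-centred formula.
double cor_complete(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> xs, ys;  // filtered copies, kept row-aligned
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(y[i])) {
            xs.push_back(x[i]);
            ys.push_back(y[i]);
        }
    }
    const std::size_t n = xs.size();
    if (n < 2) return std::numeric_limits<double>::quiet_NaN();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mx += xs[i];
        my += ys[i];
    }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = xs[i] - mx;
        const double dy = ys[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Filtering into copies costs extra memory but keeps the correlation step identical to the no-missing-data case.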
>> |
>> | I would be very grateful for help,
>> |
>> | Kind regards,
>> | Mateusz Kaduk
>> |
>> | _______________________________________________
>> | Rcpp-devel mailing list
>> | Rcpp-devel at lists.r-forge.r-project.org
>> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>> --
>> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>>
>
>