[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

Douglas Bates bates at stat.wisc.edu
Tue Oct 9 19:43:43 CEST 2012


By the way, how are you calculating this correlation in R?  Are you
using the cor function?

I'm confused because the cor function in R does a bit of bookkeeping
then calls a C function using .Internal.  It seems unlikely to me that
one could make a C++/Rcpp function run much faster.

On Tue, Oct 9, 2012 at 12:07 PM, mateusz.kaduk at gmail.com
<mateusz.kaduk at gmail.com> wrote:
> Can you provide an example of how to convert an Armadillo colvec to a uvec,
> which I assume works as a selector for rows?
>
> Thanks
>
> On 9 October 2012 18:17, mateusz.kaduk at gmail.com <mateusz.kaduk at gmail.com>
> wrote:
>>
>> Dear Dirk,
>>
>> I don't see how to do that in Armadillo, but I think I can create a
>> same-size NumericMatrix B = is.na(X);
>> Maybe I could use this for indexing, to save time and avoid introducing
>> extra looping?
>>
>> Also, I want to regress column X on Z, and column Y on Z.
>> Does arma::solve(...) handle nan values, or do I have to check for them
>> myself?
>>
>> The function works nicely: while the R implementation takes at least 40 min
>> (surely more, because I terminated it), Armadillo computes everything in
>> less than one minute. But I skipped columns with missing cells, which I now
>> want to include.
>>
>> Thanks,
>> Mateusz
>>
>>
>> On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>
>>>
>>> On 9 October 2012 at 10:08, Douglas Bates wrote:
>>> | You may find it easier to use the Rcpp class NumericMatrix than to use
>>> | RcppArmadillo.  Detection of NA's is built in to R and Rcpp classes but
>>> | not RcppArmadillo. For each pair of columns, run a loop that checks for
>>> | NA's at each position in each column, skips the position if NA's are
>>> | detected and otherwise increments the squared sums, cross-product and
>>> | number of elements.
>>>
>>> All true, but at the same time, once in RcppArmadillo or RcppEigen ...
>>> it is convenient to stay there.
>>>
>>> You should be able to set up a little "sweeper" function which applies
>>> one of
>>>
>>>     R_IsNA              true for NA only
>>>     R_IsNaN             true for NaN only (not NA)
>>>     R_finite            false for NA, NaN or Inf
>>>
>>> across a vector or matrix and returns you an index vector. Armadillo can
>>> use indexing vectors in ways that are similar to R.
>>>
>>> Dirk
>>>
>>>
>>> |
>>> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com"
>>> | <mateusz.kaduk at gmail.com> wrote:
>>> |
>>> |     Hi,
>>> |
>>> |     I have written a small piece of code in C++ using Armadillo and
>>> |     inline with the RcppArmadillo package.
>>> |     The input is data.matrix(X). Some cells might be NA. Example in R:
>>> |     X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
>>> |
>>> |     I am calculating conditional correlation on columns of that matrix,
>>> |     just picking vectors, so cor(X,Y).
>>> |     The problem is that sometimes I might have an empty cell in one or
>>> |     both vectors. In that case I would like to skip that row and proceed
>>> |     with calculating Pearson's correlation on the remaining data. I know
>>> |     that there will be a difference in degrees of freedom, but I have
>>> |     over 100 rows, so skipping a few shouldn't matter that much.
>>> |
>>> |     Basically, my question boils down to this problem:
>>> |     how to find which colvec cells are nan and remove those indices from
>>> |     both the X and Y colvec before calculating the correlation.
>>> |
>>> |     I would be very grateful for help,
>>> |
>>> |     Kind regards,
>>> |     Mateusz Kaduk
>>> |
>>> |     _______________________________________________
>>> |     Rcpp-devel mailing list
>>> |     Rcpp-devel at lists.r-forge.r-project.org
>>> |
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>> |
>>> |
>>> --
>>> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>>
>>
>
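
For reference, the two ideas discussed in the quoted messages (a "sweeper"
that returns an index vector of usable rows, and a loop that accumulates sums,
squared sums and the cross-product over those rows only) can be sketched in
plain standard C++. This is only an illustration under stated assumptions, not
code from the thread: the names complete_rows and cor_complete are
hypothetical, std::vector stands in for an Armadillo colvec or an Rcpp
NumericVector, and std::isfinite plays the role of R_finite (R's NA is stored
as a NaN, so it is rejected as well). With Armadillo, the index vector would
presumably be an arma::uvec passed to .elem().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical "sweeper": indices of rows where both x[i] and y[i] are
// finite, i.e. neither NA, NaN nor +/-Inf.
std::vector<std::size_t> complete_rows(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(y[i])) keep.push_back(i);
    }
    return keep;
}

// Pearson correlation using only the complete rows.
// Returns NaN if fewer than two complete rows remain or if either
// column has no variance on those rows.
double cor_complete(const std::vector<double>& x,
                    const std::vector<double>& y) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const std::vector<std::size_t> keep = complete_rows(x, y);
    if (keep.size() < 2) return nan;
    const double n = static_cast<double>(keep.size());

    // Accumulate sums, squared sums and the cross-product in one pass.
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i : keep) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }

    const double cov = sxy - sx * sy / n;  // n times the covariance
    const double vx  = sxx - sx * sx / n;  // n times the variance of x
    const double vy  = syy - sy * sy / n;  // n times the variance of y
    if (vx <= 0.0 || vy <= 0.0) return nan;
    return cov / std::sqrt(vx * vy);
}
```

On the R side, cor(x, y, use = "complete.obs") gives a value to compare
against. The one-pass sums are simple but can lose precision when the means
are large relative to the spread; centering the data first is the safer
choice in that case.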

