[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
Douglas Bates
bates at stat.wisc.edu
Tue Oct 9 19:43:43 CEST 2012
By the way, how are you calculating this correlation in R? Are you
using the cor function?
I'm confused because the cor function in R does a bit of bookkeeping
then calls a C function using .Internal. It seems unlikely to me that
one could make a C++/Rcpp function run much faster.
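For reference, the NA-skipping loop suggested earlier in this thread (quoted below) could be sketched roughly as follows. This is only an illustration in plain C++: std::vector stands in for the matrix columns, and pairwise_cor is a made-up name, not part of Rcpp or Armadillo. It assumes missing values arrive as NaN, which holds for R's NA_real_ at the C level.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pearson correlation of x and y over the positions where both values are
// present. Hypothetical helper for illustration; std::vector stands in for
// two columns of a numeric matrix.
double pairwise_cor(const std::vector<double>& x, const std::vector<double>& y) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (x.size() != y.size()) return nan;

    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // R's NA_real_ is a NaN with a special payload, so std::isnan
        // is true for both NA and NaN.
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
        ++n;
    }
    if (n < 2) return nan;  // not enough complete pairs

    // Centered sums of squares and cross-products.
    const double cxy = sxy - sx * sy / n;
    const double cxx = sxx - sx * sx / n;
    const double cyy = syy - sy * sy / n;
    if (cxx <= 0.0 || cyy <= 0.0) return nan;  // a constant column
    return cxy / std::sqrt(cxx * cyy);
}
```

On the R side, cor(x, y, use = "pairwise.complete.obs") performs the same row skipping, which makes it a convenient check for a hand-written version.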
On Tue, Oct 9, 2012 at 12:07 PM, mateusz.kaduk at gmail.com
<mateusz.kaduk at gmail.com> wrote:
> Can you provide an example of how to convert an Armadillo colvec to a uvec
> vector, which I assume works as a selector for rows?
>
> Thanks
>
> On 9 October 2012 18:17, mateusz.kaduk at gmail.com <mateusz.kaduk at gmail.com>
> wrote:
>>
>> Dear Dirk,
>>
>> I don't see how to do that in Armadillo, but I think I can create a matrix of
>> the same size, NumericMatrix B = is.na(X);
>> Maybe I could use this for indexing? That would save time and avoid
>> introducing extra looping.
>>
>> Also, I want to regress column X on Z, and column Y on Z.
>> Does arma::solve(...) handle NaN values, or do I have to check for them
>> myself?
>>
>> The function works nicely: while the R implementation takes at least 40 min
>> (surely more, because I terminated it), Armadillo computes everything in less
>> than one minute. But I skipped columns with missing cells, which I now want
>> to include.
>>
>> Thanks,
>> Mateusz
>>
>>
>> On 9 October 2012 17:37, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>
>>>
>>> On 9 October 2012 at 10:08, Douglas Bates wrote:
>>> | You may find it easier to use the Rcpp class NumericMatrix than to use
>>> | RcppArmadillo. Detection of NA's is built in to R and Rcpp classes but not
>>> | RcppArmadillo. For each pair of columns, run a loop that checks for NA's at
>>> | each position in each column, skips the position if NA's are detected and
>>> | otherwise increments the squared sums, cross-product and number of elements.
>>>
>>> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it is
>>> convenient to stay there.
>>>
>>> You should be able to set up a little "sweeper" function which applies one of
>>>
>>>    R_IsNA      just NA
>>>    R_IsNaN     just NaN
>>>    R_finite    false for NA, NaN or Inf
>>>
>>> across a vector or matrix and returns you an index vector. Armadillo can use
>>> indexing vectors in ways that are similar to R.
>>>
>>> Dirk
>>>
>>>
>>> |
>>> | On Oct 9, 2012 9:53 AM, "mateusz.kaduk at gmail.com" <mateusz.kaduk at gmail.com>
>>> | wrote:
>>> |
>>> | Hi,
>>> |
>>> | I have written a small piece of C++ code using Armadillo and inline with
>>> | the RcppArmadillo package.
>>> | The input is data.matrix(X). Some cells might be NA. Example in R:
>>> | X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)
>>> |
>>> | I am calculating conditional correlation on columns of that matrix, just
>>> | picking vectors, so cor(X,Y).
>>> | The problem is that sometimes I might have an empty cell in one or both
>>> | vectors; in that case I would like to skip that row and proceed with
>>> | calculating Pearson's correlation on the remaining data. I know that there
>>> | will be a difference in degrees of freedom, but I have over 100 rows, so
>>> | skipping a few shouldn't matter that much.
>>> |
>>> | Basically my question boils down to solving this problem:
>>> | how to find which colvec cells are NaN, and remove those indices from both
>>> | the X and Y colvec before calculating the correlation.
>>> |
>>> | I would be very grateful for help,
>>> |
>>> | Kind regards,
>>> | Mateusz Kaduk
>>> |
>>> | _______________________________________________
>>> | Rcpp-devel mailing list
>>> | Rcpp-devel at lists.r-forge.r-project.org
>>> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>> --
>>> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>>
>>
>
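The index-vector idea from the quoted messages above (sweep both columns once, keep the positions where both are usable, then subset with that index) can be sketched as below. This is an illustration in plain C++ under stated assumptions: complete_rows and take are hypothetical names, and std::vector<std::size_t> stands in for an Armadillo uvec. With Armadillo itself, a uvec of positions can be passed to .elem() to select elements of a colvec, and newer releases also provide find_finite() to build such an index.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Positions where both x and y hold finite values (not NA, NaN or Inf).
// Hypothetical helper; the result plays the role of an index vector.
std::vector<std::size_t> complete_rows(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    std::vector<std::size_t> idx;
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::isfinite(x[i]) && std::isfinite(y[i])) idx.push_back(i);
    }
    return idx;
}

// Gather the elements of v at the given positions, in order.
std::vector<double> take(const std::vector<double>& v,
                         const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) out.push_back(v[i]);
    return out;
}
```

Computing the index once per pair of columns and applying it to both keeps the two subsets aligned row by row, which is what the correlation needs.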