[Rcpp-devel] iterator for sparse Matrix

Saurabh B saurabh.writes at gmail.com
Thu Jan 23 00:04:25 CET 2014


Hi there,

I am new to RcppArmadillo and very excited about the sparse matrix
functionality it offers. I have a large optim that I wish to break down
into stochastic descent in Rcpp to speed things up.

To start small, I used a simple logisitic regression function that I have
working in R -

Norm <- function(w1, w2){
  sqrt(sum((w2 - w1)^2))
}

xentropy <- function(x, y, w) {
  - ((y * x) / (1 + exp(y * t(w) %*% x)) )
}

stochDesc <- function(param, eta = 0.01, maxit = 10, reltol = .01, x, y) {
  # Initialize
  w = param
  iter = 0
  rel_err = reltol + 1
  x_dash = cbind(rep(1, nrow(x)), x) # Add bias

  while(iter < maxit & rel_err > reltol ) {
    iter = iter + 1

    prev_w <- w
    for(curidx in sample(nrow(x))) { # Random shuffle
      grad <- xentropy(x_dash[curidx, ], y[curidx], w)
      w <- w - eta *  grad # Update weights
    }
    rel_err = Norm(w, prev_w)
    # could also use:
    # rel_err = base::norm(matrix(prev_w - w, nrow = 1), type = "F")
    cat(paste("Epoch ", iter, " Relative cost:", rel_err, "\n"))
  }
  list(w = w, iter = iter)
}

This does every operation that my more complex gradient descent will do,
but it is pretty straightforward so I started converting it to Rcpp -

#include <RcppArmadillo.h>
#include <iostream>
using namespace std;
using namespace Rcpp ;
// [[Rcpp::depends(RcppArmadillo)]]

double Norm(NumericVector w1, NumericVector w2) {
  // Compute the Forbius Norm
  double op = pow(sum(pow(w2 - w1, 2)), .5);
  return(op);
}


NumericVector xentropy(NumericVector x, NumericVector y, NumericVector w) {
  // Return the cross entropy
  // NumericVector op = - ((y * x) / (1 + exp(y * trans(w) %*% x)) ); //
How to transpose?
  NumericVector op = w; // Not correct
  return (op);
}


// [[Rcpp::export]]
NumericVector stochDescCpp(NumericVector param, NumericMatrix x,
NumericVector y,
                          double eta = 0.01, int maxit = 10, double reltol
= .01) {
  // Run stochastic Descent to compute logistic regression
  // Initialize
  NumericVector w (param);
  NumericVector prev_w (w);
  int epoch = 0;
  NumericVector grad (param.length());

  double rel_err = reltol + 1;
  // Don't know how to add bias, will do that in R before passing it here.
  //NumericMatrix x_dash(x.nrow(), x.ncol() + 1, rep(1, x.nrow()),
x.begin());

  // Loop through for each epoch
  while(epoch < maxit && rel_err > reltol) {
    epoch++;

    prev_w = w;
    for(int curidx = 0; curidx < x.nrow(); ++curidx) { // How to random
shuffle?
      grad = xentropy(x(curidx, _), y[curidx], w);
      w = w - eta * grad; // Update weights
    }
    rel_err = Norm(w, prev_w);
    cout << "Epoch " << epoch << " Relative cost:" << rel_err << endl;
  }
  return(w);
}

I am running into many issues. Here they are ranked in order of importance)
-
1) When I run this (incorrect) program -
stochDescCpp(param = rep(0,3), x = X, y = Y, maxit = 100)

I get -
Error in .Primitive(".Call")(<pointer: 0x0000000069d42560>, param, x,  :
  negative length vectors are not allowed

All the inputs are valid and work fine with the R version. So I don't know
where this is coming from? I put a bunch of couts and I know that it is
happening inside the for loop but don't know why. In general, what is the
best way to debug code written this way? I am using RStudio.

2) I don't know how to transpose the matrix in rCpp. I read that I could
that using rCppArmadillo, but then I don't know how to convert from
NumericMatrix to arma matrix. I don't want to copy as this operation will
happen 1000x of times, so this has to be fast.

3) I don't know how to randomly shuffle the rows. I tried RcppArmadillo::sample
but haven't gotten the output to work. E.g. do I access elements as
x(rowOrd[curIdx],_)?

4) (minor issue) How do I add the bias unit? In other words, add a new
column to the input matrix filled with all 1s. Do I have to create a new
matrix, copy it column by column? this is not a dealbreaker since I can do
this in R before I send in the inputs.

Sorry if my questions are too basic and don't seem well researched. I am
trying to go for a practical example.

Any help will be greatly appreciated.

Sincerely,
Saurabh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140122/4335a454/attachment.html>


More information about the Rcpp-devel mailing list