[Rcpp-devel] Building/linking trouble with cxxfunction()

Ken.Williams at thomsonreuters.com Ken.Williams at thomsonreuters.com
Fri Feb 18 23:50:21 CET 2011


On 2/16/11 3:55 PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>I decided that it would be simple enough to write a small package to
>show exactly how you would pass a dgCMatrix to an Rcpp-based C++
>function and create a smat object.

Hi Doug, did you ever get a test case working for this?

As a comparison, here's some code that calls SVDLIBC's 'svd' executable
(which I put into /usr/local/bin) successfully on a sparse matrix of class
dgCMatrix.

This takes 4 CPU seconds to accomplish, with only 0.132 seconds of that
time actually computing the SVD.  The rest is reading & writing the matrix
files.


=============================================
write.sparse <- function (m, to) {
  ## Writes in a format that SVDLIBC can read
  stopifnot(inherits(m, "dgCMatrix"))
  fh <- file(to, open="w")
  
  wl <- function(...) cat(..., "\n", file=fh)
  
  ## header
  wl(dim(m), length(m at x))
  
  globalCount <- 1
  nper <- diff(m at p)
  for(i in 1:ncol(m)) {
    wl(nper[i])  ## Number of entries in this column
    if (nper[i]==0) next
    for(j in 1:nper[i]) {
      wl(m at i[globalCount], m at x[m at p[i]+j])
      globalCount <- globalCount+1
    }
  }
}

my.svd <- function(x, nu, nv) {
  stopifnot(nu==nv)
  write.sparse(x, "/tmp/sparse.m")
  rc <- system(paste("/usr/local/bin/svd -o /tmp/sout -d", nu,
"/tmp/sparse.m"))
  if (rc != 0)
    stop("Couldn't run external svd code")
  d <- scan("/tmp/sout-S", skip=1)
  ut <- read.table("/tmp/sout-Ut", skip=1)
  vt <- read.table("/tmp/sout-Vt", skip=1)
  list(d=d, u=t(ut), v=t(vt))
}


set.seed(123)
A <- sparseMatrix(i=sample(10000, 1000),
                  j=sample(12000, 1000),
                  x=runif(1000),
                  dims=c(10000, 12000))


res <- my.svd(A, 3, 3)

# Loading the matrix...
# Computing the SVD...
# SOLVING THE [A^TA] EIGENPROBLEM
# NO. OF ROWS               =  10000
# NO. OF COLUMNS            =  12000
# NO. OF NON-ZERO VALUES    =   1000
# MATRIX DENSITY            =   0.00%
# MAX. NO. OF LANCZOS STEPS =  10000
# MAX. NO. OF EIGENPAIRS    =      3
# LEFT  END OF THE INTERVAL = -1.00E-30
# RIGHT END OF THE INTERVAL =  1.00E-30
# KAPPA                     =  1.00E-06
# 
# TRANSPOSING THE MATRIX FOR SPEED
# NUMBER OF LANCZOS STEPS   =    169
# RITZ VALUES STABILIZED    =      4
# SINGULAR VALUES FOUND     =      3
# 
# ELAPSED CPU TIME          =  0.132 sec.
# MULTIPLICATIONS BY A      =    176
# MULTIPLICATIONS BY A^T    =    173





res$d
[1] 0.999651 0.999274 0.997689


dim(res$u)
[1] 10000     3

dim(res$v)
[1] 12000     3
=============================================



--
Ken Williams
Senior Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.williams at thomsonreuters.com
http://labs.thomsonreuters.com








>  The package is at
>http://www.stat.wisc.edu/~bates/spSVD_1.0.tar.gz  It manages to pass
>the values from the dgCMatrix object to the smat struct, although this
>was somewhat more difficult than I had anticipated because the smat
>structure uses long int and not int values.  So if you look at SVD.cpp
>in that package you will see that I allocate std::vector<long> objects
>then copy the contents of the Rcpp::IntegerVector objects into them.
>
>There was one more "gotcha".  There is a C function called "machar"
>defined in the las2.c file and there is one in the R API.  The one in
>the R API wins because R is loaded before the dll for the package and
>the calls are not compatible.  I changed the name of the one in las2.c
>to svd_machar.  I'm currently running a test case
>
>> library(Matrix)
>Loading required package: lattice
>
>Attaching package: 'Matrix'
>
>The following object(s) are masked from 'package:base':
>
>    det
>
>> library(spSVD)
>Loading required package: Rcpp
>Warning message:
>In loadNamespace(package, c(which.lib.loc, lib.loc), keep.source =
>keep.source) :
>  package 'spSVD' contains no R code
>> data(KNex)
>> str(KNex$mm)
>Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
>  ..@ i       : int [1:8755] 0 2 25 27 163 165 1258 1261 1276 1278 ...
>  ..@ p       : int [1:713] 0 13 17 26 38 43 52 56 61 67 ...
>  ..@ Dim     : int [1:2] 1850 712
>  ..@ Dimnames:List of 2
>  .. ..$ : NULL
>  .. ..$ : NULL
>  ..@ x       : num [1:8755] 0.277 0.277 0.277 0.277 0.277 ...
>  ..@ factors : list()
>> .Call("spSVD", KNex$mm, 713L)
>SOLVING THE [A^TA] EIGENPROBLEM
>NO. OF ROWS               =   1850
>NO. OF COLUMNS            =    712
>NO. OF NON-ZERO VALUES    =   8755
>MATRIX DENSITY            =   0.66%
>MAX. NO. OF LANCZOS STEPS =    712
>MAX. NO. OF EIGENPAIRS    =    712
>LEFT  END OF THE INTERVAL = -1.00E-30
>RIGHT END OF THE INTERVAL =  1.00E-30
>KAPPA                     =  1.00E-06
>
>but it is taking a long time.  I don't know if it is a programming
>error or the wrong calling sequence or just a slow algorithm but it
>has been chugging away for 7 minutes.  In constrast, it takes about 2
>seconds to calculate the singular values of that matrix as a dense
>matrix
>
>> data(KNex)
>> mm1 <- as(KNex$mm, "matrix")
>> str(mm1)
> num [1:1850, 1:712] 0.277 0 0.277 0 0 ...
> - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : NULL
>
>> system.time(dsvd <- La.svd(mm1, nu=0, nv=0))
>   user  system elapsed
>  2.280   0.030   2.664
>



More information about the Rcpp-devel mailing list