[Rcpp-devel] Building/linking trouble with cxxfunction()

Sat Feb 19 16:20:27 CET 2011

On Fri, Feb 18, 2011 at 4:50 PM,  <Ken.Williams at thomsonreuters.com> wrote:

> On 2/16/11 3:55 PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>I decided that it would be simple enough to write a small package to
>>show exactly how you would pass a dgCMatrix to an Rcpp-based C++
>>function and create a smat object.

> Hi Doug, did you ever get a test case working for this?

I didn't try further.  I was hoping that you would pick up the sources
for the package and experiment with it. :-)

I was using a value of 712 for the number of eigenvalue-eigenvector
pairs to determine, which may have been a reason that it was taking so
long, relative to the example you show below where only 3 pairs are
determined.  I would suggest grabbing the sources for the package,
installing it and seeing if you can reproduce the results from the
external call.

This sparse SVN code is based on a somewhat older algorithm that was
coded in Fortran and then translated, perhaps with f2c.  I know that
for dense matrices there have been substantial advances is algorithms
for the singular value decomposition in the last 10-20 years and
perhaps the same is true for sparse matrices.  I see that Tim Davis is
planning on incorporating the SVD in his SparseSuite code but I don't
see indications of of when (see the notes on "pending changes" at
http://www.cise.ufl.edu/research/sparse/).  I am copying Tim on this
reply in case he would like to comment.  In my opinion Tim's code is
the state of the art in sparse matrix methods (except for the fact
that it is primarily in C rather than expressed as classes in C++).

> As a comparison, here's some code that calls SVDLIBC's 'svd' executable
> (which I put into /usr/local/bin) successfully on a sparse matrix of class
> dgCMatrix.
>
> This takes 4 CPU seconds to accomplish, with only 0.132 seconds of that
> time actually computing the SVD.  The rest is reading & writing the matrix
> files.
>
>
> =============================================
> write.sparse <- function (m, to) {
>  ## Writes in a format that SVDLIBC can read
>  stopifnot(inherits(m, "dgCMatrix"))
>  fh <- file(to, open="w")
>
>  wl <- function(...) cat(..., "\n", file=fh)
>
>  ## header
>  wl(dim(m), length(m at x))
>
>  globalCount <- 1
>  nper <- diff(m at p)
>  for(i in 1:ncol(m)) {
>    wl(nper[i])  ## Number of entries in this column
>    if (nper[i]==0) next
>    for(j in 1:nper[i]) {
>      wl(m at i[globalCount], m at x[m at p[i]+j])
>      globalCount <- globalCount+1
>    }
>  }
> }
>
> my.svd <- function(x, nu, nv) {
>  stopifnot(nu==nv)
>  write.sparse(x, "/tmp/sparse.m")
>  rc <- system(paste("/usr/local/bin/svd -o /tmp/sout -d", nu,
> "/tmp/sparse.m"))
>  if (rc != 0)
>    stop("Couldn't run external svd code")
>  d <- scan("/tmp/sout-S", skip=1)
>  ut <- read.table("/tmp/sout-Ut", skip=1)
>  vt <- read.table("/tmp/sout-Vt", skip=1)
>  list(d=d, u=t(ut), v=t(vt))
> }
>
>
> set.seed(123)
> A <- sparseMatrix(i=sample(10000, 1000),
>                  j=sample(12000, 1000),
>                  x=runif(1000),
>                  dims=c(10000, 12000))
>
>
> res <- my.svd(A, 3, 3)
>
> # Loading the matrix...
> # Computing the SVD...
> # SOLVING THE [A^TA] EIGENPROBLEM
> # NO. OF ROWS               =  10000
> # NO. OF COLUMNS            =  12000
> # NO. OF NON-ZERO VALUES    =   1000
> # MATRIX DENSITY            =   0.00%
> # MAX. NO. OF LANCZOS STEPS =  10000
> # MAX. NO. OF EIGENPAIRS    =      3
> # LEFT  END OF THE INTERVAL = -1.00E-30
> # RIGHT END OF THE INTERVAL =  1.00E-30
> # KAPPA                     =  1.00E-06
> #
> # TRANSPOSING THE MATRIX FOR SPEED
> # NUMBER OF LANCZOS STEPS   =    169
> # RITZ VALUES STABILIZED    =      4
> # SINGULAR VALUES FOUND     =      3
> #
> # ELAPSED CPU TIME          =  0.132 sec.
> # MULTIPLICATIONS BY A      =    176
> # MULTIPLICATIONS BY A^T    =    173
>
>
>
>
>
> res$d
> [1] 0.999651 0.999274 0.997689
>
>
> dim(res$u)
> [1] 10000     3
>
> dim(res$v)
> [1] 12000     3
> =============================================
>
>
>
> --
> Ken Williams
> Senior Research Scientist
> Thomson Reuters
> Phone: 651-848-7712
> ken.williams at thomsonreuters.com
> http://labs.thomsonreuters.com
>
>
>
>
>
>
>
>
>>  The package is at
>>http://www.stat.wisc.edu/~bates/spSVD_1.0.tar.gz  It manages to pass
>>the values from the dgCMatrix object to the smat struct, although this
>>was somewhat more difficult than I had anticipated because the smat
>>structure uses long int and not int values.  So if you look at SVD.cpp
>>in that package you will see that I allocate std::vector<long> objects
>>then copy the contents of the Rcpp::IntegerVector objects into them.
>>
>>There was one more "gotcha".  There is a C function called "machar"
>>defined in the las2.c file and there is one in the R API.  The one in
>>the R API wins because R is loaded before the dll for the package and
>>the calls are not compatible.  I changed the name of the one in las2.c
>>to svd_machar.  I'm currently running a test case
>>
>>> library(Matrix)
>>Loading required package: lattice
>>
>>Attaching package: 'Matrix'
>>
>>The following object(s) are masked from 'package:base':
>>
>>    det
>>
>>> library(spSVD)
>>Loading required package: Rcpp
>>Warning message:
>>In loadNamespace(package, c(which.lib.loc, lib.loc), keep.source =
>>keep.source) :
>>  package 'spSVD' contains no R code
>>> data(KNex)
>>> str(KNex$mm)
>>Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
>>  ..@ i       : int [1:8755] 0 2 25 27 163 165 1258 1261 1276 1278 ...
>>  ..@ p       : int [1:713] 0 13 17 26 38 43 52 56 61 67 ...
>>  ..@ Dim     : int [1:2] 1850 712
>>  ..@ Dimnames:List of 2
>>  .. ..$ : NULL
>>  .. ..$ : NULL
>>  ..@ x       : num [1:8755] 0.277 0.277 0.277 0.277 0.277 ...
>>  ..@ factors : list()
>>> .Call("spSVD", KNex$mm, 713L)
>>SOLVING THE [A^TA] EIGENPROBLEM
>>NO. OF ROWS               =   1850
>>NO. OF COLUMNS            =    712
>>NO. OF NON-ZERO VALUES    =   8755
>>MATRIX DENSITY            =   0.66%
>>MAX. NO. OF LANCZOS STEPS =    712
>>MAX. NO. OF EIGENPAIRS    =    712
>>LEFT  END OF THE INTERVAL = -1.00E-30
>>RIGHT END OF THE INTERVAL =  1.00E-30
>>KAPPA                     =  1.00E-06
>>
>>but it is taking a long time.  I don't know if it is a programming
>>error or the wrong calling sequence or just a slow algorithm but it
>>has been chugging away for 7 minutes.  In constrast, it takes about 2
>>seconds to calculate the singular values of that matrix as a dense
>>matrix
>>
>>> data(KNex)
>>> mm1 <- as(KNex$mm, "matrix")
>>> str(mm1)
>> num [1:1850, 1:712] 0.277 0 0.277 0 0 ...
>> - attr(*, "dimnames")=List of 2
>>  ..$ : NULL
>>  ..$ : NULL
>>
>>> system.time(dsvd <- La.svd(mm1, nu=0, nv=0))
>>   user  system elapsed
>>  2.280   0.030   2.664
>>
>
>