[Rcpp-devel] Building/linking trouble with cxxfunction()
Ken.Williams at thomsonreuters.com
Ken.Williams at thomsonreuters.com
Fri Feb 18 23:50:21 CET 2011
On 2/16/11 3:55 PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>I decided that it would be simple enough to write a small package to
>show exactly how you would pass a dgCMatrix to an Rcpp-based C++
>function and create a smat object.
Hi Doug, did you ever get a test case working for this?
As a comparison, here's some code that calls SVDLIBC's 'svd' executable
(which I put into /usr/local/bin) successfully on a sparse matrix of class
dgCMatrix.
This takes 4 CPU seconds to accomplish, with only 0.132 seconds of that
time actually computing the SVD. The rest is reading & writing the matrix
files.
=============================================
write.sparse <- function (m, to) {
## Writes in a format that SVDLIBC can read
stopifnot(inherits(m, "dgCMatrix"))
fh <- file(to, open="w")
wl <- function(...) cat(..., "\n", file=fh)
## header
wl(dim(m), length(m at x))
globalCount <- 1
nper <- diff(m at p)
for(i in 1:ncol(m)) {
wl(nper[i]) ## Number of entries in this column
if (nper[i]==0) next
for(j in 1:nper[i]) {
wl(m at i[globalCount], m at x[m at p[i]+j])
globalCount <- globalCount+1
}
}
}
my.svd <- function(x, nu, nv) {
stopifnot(nu==nv)
write.sparse(x, "/tmp/sparse.m")
rc <- system(paste("/usr/local/bin/svd -o /tmp/sout -d", nu,
"/tmp/sparse.m"))
if (rc != 0)
stop("Couldn't run external svd code")
d <- scan("/tmp/sout-S", skip=1)
ut <- read.table("/tmp/sout-Ut", skip=1)
vt <- read.table("/tmp/sout-Vt", skip=1)
list(d=d, u=t(ut), v=t(vt))
}
set.seed(123)
A <- sparseMatrix(i=sample(10000, 1000),
j=sample(12000, 1000),
x=runif(1000),
dims=c(10000, 12000))
res <- my.svd(A, 3, 3)
# Loading the matrix...
# Computing the SVD...
# SOLVING THE [A^TA] EIGENPROBLEM
# NO. OF ROWS = 10000
# NO. OF COLUMNS = 12000
# NO. OF NON-ZERO VALUES = 1000
# MATRIX DENSITY = 0.00%
# MAX. NO. OF LANCZOS STEPS = 10000
# MAX. NO. OF EIGENPAIRS = 3
# LEFT END OF THE INTERVAL = -1.00E-30
# RIGHT END OF THE INTERVAL = 1.00E-30
# KAPPA = 1.00E-06
#
# TRANSPOSING THE MATRIX FOR SPEED
# NUMBER OF LANCZOS STEPS = 169
# RITZ VALUES STABILIZED = 4
# SINGULAR VALUES FOUND = 3
#
# ELAPSED CPU TIME = 0.132 sec.
# MULTIPLICATIONS BY A = 176
# MULTIPLICATIONS BY A^T = 173
res$d
[1] 0.999651 0.999274 0.997689
dim(res$u)
[1] 10000 3
dim(res$v)
[1] 12000 3
=============================================
--
Ken Williams
Senior Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.williams at thomsonreuters.com
http://labs.thomsonreuters.com
> The package is at
>http://www.stat.wisc.edu/~bates/spSVD_1.0.tar.gz It manages to pass
>the values from the dgCMatrix object to the smat struct, although this
>was somewhat more difficult than I had anticipated because the smat
>structure uses long int and not int values. So if you look at SVD.cpp
>in that package you will see that I allocate std::vector<long> objects
>then copy the contents of the Rcpp::IntegerVector objects into them.
>
>There was one more "gotcha". There is a C function called "machar"
>defined in the las2.c file and there is one in the R API. The one in
>the R API wins because R is loaded before the dll for the package and
>the calls are not compatible. I changed the name of the one in las2.c
>to svd_machar. I'm currently running a test case
>
>> library(Matrix)
>Loading required package: lattice
>
>Attaching package: 'Matrix'
>
>The following object(s) are masked from 'package:base':
>
> det
>
>> library(spSVD)
>Loading required package: Rcpp
>Warning message:
>In loadNamespace(package, c(which.lib.loc, lib.loc), keep.source =
>keep.source) :
> package 'spSVD' contains no R code
>> data(KNex)
>> str(KNex$mm)
>Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ..@ i : int [1:8755] 0 2 25 27 163 165 1258 1261 1276 1278 ...
> ..@ p : int [1:713] 0 13 17 26 38 43 52 56 61 67 ...
> ..@ Dim : int [1:2] 1850 712
> ..@ Dimnames:List of 2
> .. ..$ : NULL
> .. ..$ : NULL
> ..@ x : num [1:8755] 0.277 0.277 0.277 0.277 0.277 ...
> ..@ factors : list()
>> .Call("spSVD", KNex$mm, 713L)
>SOLVING THE [A^TA] EIGENPROBLEM
>NO. OF ROWS = 1850
>NO. OF COLUMNS = 712
>NO. OF NON-ZERO VALUES = 8755
>MATRIX DENSITY = 0.66%
>MAX. NO. OF LANCZOS STEPS = 712
>MAX. NO. OF EIGENPAIRS = 712
>LEFT END OF THE INTERVAL = -1.00E-30
>RIGHT END OF THE INTERVAL = 1.00E-30
>KAPPA = 1.00E-06
>
>but it is taking a long time. I don't know if it is a programming
>error or the wrong calling sequence or just a slow algorithm but it
>has been chugging away for 7 minutes. In constrast, it takes about 2
>seconds to calculate the singular values of that matrix as a dense
>matrix
>
>> data(KNex)
>> mm1 <- as(KNex$mm, "matrix")
>> str(mm1)
> num [1:1850, 1:712] 0.277 0 0.277 0 0 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : NULL
>
>> system.time(dsvd <- La.svd(mm1, nu=0, nv=0))
> user system elapsed
> 2.280 0.030 2.664
>
More information about the Rcpp-devel
mailing list