[NMF-user] NMF v0.17.5: difference between consensus silhouette values

Gordon Robertson grobertson at bcgsc.ca
Tue Aug 27 06:40:53 CEST 2013


Renaud,

I've been using v0.17.5 on R 3.0.1 since you made it available recently. I'm on a MacBook Pro with OS X 10.7.5. It's now being installed on the Linux system at the GSC, so more people will soon be using this version. 

Today I realized that the consensus silhouette width I write out from the 30-run rank survey for, say, 4 groups can differ by a few percent from the average silhouette width I calculate from the 200-run, 4-group main run. For example, for 4 groups and 260 miRNA-seq samples, the consensus silhouette is 0.891 from the rank survey and 0.86 from the main run. 
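To put a number on "a few percent", here is the arithmetic for the two values quoted above, as a small R snippet (only those two numbers are taken from my runs; the variable names are just for illustration):

```r
# Values quoted above: 4 groups, 260 miRNA-seq samples
sil.survey <- 0.891   # consensus silhouette from the 30-run rank survey
sil.main   <- 0.86    # average silhouette width from the 200-run main run

abs.diff <- sil.survey - sil.main     # absolute gap, about 0.031
rel.diff <- abs.diff / sil.survey     # relative gap, about 0.035
round(100 * rel.diff, 1)              # gap in percent of the survey value
```

So the main-run value is roughly 3.5% lower than the rank-survey value.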

As we discussed quite some time ago, after an NMF run I calculate a silhouette width with this:
...
# rank survey, now calculates silhouette width
estim.r <- nmfEstimateRank( V.matrix, range=2:12, nrun=30, .opt='v', .pbackend=7 )
write.table( estim.r$measures, "rank.survey.txt", sep="\t", quote=FALSE, row.names=FALSE )

# Now do the main run
use.this.k <- 4
res <- nmf( V.matrix, use.this.k, nrun=200, .options='tvP', .pbackend=7 )
...
library(cluster)
...
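x <- consensus(res)                             # consensus matrix from the 200 runs
hc <- hclust(as.dist(1-x), method='average')    # average-linkage tree on 1 - consensus
cl <- cutree(hc, k = use.this.k)                # cut the tree into use.this.k groups
cl.hp <- cl[hp$colInd]                          # cl in the column order of hp (hp comes from the elided code)
sil <- silhouette( cl, as.dist(1-x) )           # reuse cl rather than calling cutree() again
write.table( sil, "silhouette.UNsorted.txt", sep="\t" )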
pdf(file="consensusmap.silhouette.pdf")
plot(sil)
dev.off()
sil.summary <- summary(sil)
write(sil.summary$avg.width, "silhouette.avg.width.txt")
...
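The same steps can be packaged as a small helper, so that exactly the same computation is applied to any consensus matrix. This is a minimal sketch assuming only base R and the cluster package; the function name `consensus.avg.width` is my own (hypothetical, not part of NMF). The toy matrix has two perfectly stable groups, for which every sample's silhouette is 1.

```r
library(cluster)

# Hypothetical helper (not part of NMF): average silhouette width of the
# consensus clustering, using the same steps as the script above:
# distance = 1 - consensus, average linkage, cut into k groups.
consensus.avg.width <- function(C, k) {
  d   <- as.dist(1 - C)
  hc  <- hclust(d, method = "average")
  cl  <- cutree(hc, k = k)
  sil <- silhouette(cl, d)
  summary(sil)$avg.width
}

# Toy consensus matrix: 6 samples that always co-cluster in 2 groups
C <- matrix(0, 6, 6)
C[1:3, 1:3] <- 1
C[4:6, 4:6] <- 1
consensus.avg.width(C, k = 2)
```

Applied to `consensus(res)` with k = 4, it should reproduce the value written to silhouette.avg.width.txt. If the rank-survey object also keeps its per-rank consensus matrices (I believe as `estim.r$consensus`, but that element name is an assumption to check against the NMF documentation), the helper could be run on both fits, so that any remaining gap reflects the runs rather than the code.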

I hadn't noticed this difference before. Some difference is very likely expected, given that the rank survey and the main run are independent fits with different numbers of runs (30 vs 200), and that NMF is stochastic rather than deterministic. 
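To check whether run-to-run noise alone could plausibly produce a gap like this, here is a toy R simulation. It is my own simplified model, not what NMF does internally: in each "run", 12 samples from 2 true groups are labelled, each label is flipped independently with probability p, and the consensus matrix is the average of the per-run connectivity matrices. The function `simulate.consensus` and the parameters `n` and `p` are hypothetical; `consensus.avg.width` repeats the silhouette steps from the script above.

```r
library(cluster)

# Toy model (an assumption, not NMF's algorithm): each run labels n samples
# from 2 true groups, flipping each label with probability p.
simulate.consensus <- function(nrun, n = 12, p = 0.1) {
  truth <- rep(1:2, each = n / 2)
  C <- matrix(0, n, n)
  for (r in seq_len(nrun)) {
    flip <- runif(n) < p
    lab  <- ifelse(flip, 3 - truth, truth)  # swap group 1 <-> 2 if flipped
    C <- C + outer(lab, lab, "==")          # connectivity matrix for this run
  }
  C / nrun                                  # consensus = mean connectivity
}

# Same silhouette computation as in the script above
consensus.avg.width <- function(C, k) {
  d  <- as.dist(1 - C)
  cl <- cutree(hclust(d, method = "average"), k = k)
  summary(silhouette(cl, d))$avg.width
}

set.seed(1)
w30  <- replicate(20, consensus.avg.width(simulate.consensus(30),  k = 2))
w200 <- replicate(20, consensus.avg.width(simulate.consensus(200), k = 2))
rbind(runs30 = range(w30), runs200 = range(w200))
```

Comparing the spread of `w30` and `w200` (with `range()` or `sd()`) shows how much the average silhouette width moves between independent consensus estimates. With more runs one would expect the spread to shrink, although a toy model like this says nothing about the size of the effect on real miRNA-seq data.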

Am I understanding this difference correctly? Or is the silhouette calculated differently in the rank survey?

Thank you,

Gordon
--
Gordon Robertson
Michael Smith Genome Sciences Centre
BC Cancer Agency
Vancouver BC Canada
www.bcgsc.ca

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.0-5  doParallel_1.0.3    iterators_1.0.6    
 [4] foreach_1.4.1       ggplot2_0.9.3.1     NMF_0.17.5         
 [7] bigmemory_4.4.3     BH_1.51.0-1         bigmemory.sri_0.1.2
[10] Biobase_2.20.1      BiocGenerics_0.6.0  digest_0.6.3       
[13] rngtools_1.2        pkgmaker_0.16       registry_0.2       
[16] cluster_1.14.4      edgeR_3.2.4         limma_3.16.6       

loaded via a namespace (and not attached):
 [1] codetools_0.2-8  colorspace_1.2-2 compiler_3.0.1   dichromat_2.0-0 
 [5] gridBase_0.4-6   gtable_0.1.2     labeling_0.2     MASS_7.3-27     
 [9] munsell_0.4.2    plyr_1.8         proto_0.3-10     reshape2_1.2.2  
[13] scales_0.2.3     stringr_0.6.2    xtable_1.7-1    


