[NMF-user] NMF v0.17.5: difference between consensus silhouette values

Tue Aug 27 08:39:49 CEST 2013

Hi Gordon,

The silhouette values are indeed expected to differ, because:
- the two sets of runs are independent, i.e. they start from different
random seeds (as you mention);
- the number of runs is different, so the consensus matrix is computed from
more fits when nrun=200 than when nrun=30.

To check this, you could set the random seed to the same value (e.g. 123)
in both nmf calls and use the same number of runs (see code below).

Bests,
Renaud

#### REPRODUCING SILHOUETTE WIDTH ####
use.this.k <- 4
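# rank survey restricted to k = 4:5, seeded so that it can be reproduced
estim.r <- nmfEstimateRank( V.matrix, range=4:5, nrun=30, .opt='v',
                            .pbackend=7, seed=123 )
s <- silhouette(estim.r$fit[[1L]], 'consensus')  # first fit, i.e. rank 4
# main run with the same seed and the same number of runs
res <- nmf( V.matrix, use.this.k, nrun=30, .options='tv', .pbackend=7,
            seed=123 )
s2 <- silhouette(res, 'consensus')
identical(s, s2)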

library(cluster)
x <- consensus(res)
hc <- hclust(as.dist(1-x), method='average')
cl <- cutree(hc, k = use.this.k)
sil <- silhouette( cl, as.dist(1-x) )

# samples in consensus silhouettes (in object `s`) are ordered to match the
# sample order in the consensus heatmap
dr <- as.dendrogram(hc)
o <- order.dendrogram(reorder(dr, rowMeans(consensus(res), na.rm=TRUE)))
identical(setNames(s[, 'sil_width'], NULL), sil[o, 'sil_width'])
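
The effect of the seed and of the number of runs can also be seen on a
toy example that does not need NMF at all. This is only a rough sketch
under stated assumptions: kmeans() stands in for NMF as the stochastic
clustering step, the data are random, and toy_consensus() and avg_sil()
are made-up helper names, not functions from the NMF package.

```r
library(cluster)  # silhouette()

# Hypothetical helper: consensus matrix from `nrun` stochastic clusterings.
# kmeans() is a stand-in for NMF, used only to keep the sketch self-contained.
toy_consensus <- function(X, k, nrun, seed) {
  set.seed(seed)
  n <- nrow(X)
  C <- matrix(0, n, n)
  for (i in seq_len(nrun)) {
    cl <- kmeans(X, centers = k, nstart = 1)$cluster
    C <- C + outer(cl, cl, "==")  # 1 where two samples co-cluster in this run
  }
  C / nrun  # fraction of runs in which each pair of samples co-clusters
}

# Hypothetical helper: average silhouette width, using the same
# hclust/cutree recipe as above on the distance 1 - consensus
avg_sil <- function(C, k) {
  d <- as.dist(1 - C)
  hc <- hclust(d, method = "average")
  summary(silhouette(cutree(hc, k = k), d))$avg.width
}

set.seed(1)
X <- matrix(rnorm(60 * 5), nrow = 60)  # toy data: 60 samples, 5 features

a    <- avg_sil(toy_consensus(X, k = 4, nrun = 30,  seed = 123), k = 4)
b    <- avg_sil(toy_consensus(X, k = 4, nrun = 30,  seed = 123), k = 4)
c200 <- avg_sil(toy_consensus(X, k = 4, nrun = 200, seed = 123), k = 4)

identical(a, b)                # same seed and same nrun: reproducible
c(nrun30 = a, nrun200 = c200)  # different nrun: the two widths may differ
```

With the same seed and the same number of runs the two consensus
matrices are built from exactly the same sequence of fits, so the
average widths match. Changing nrun changes which fits enter the
consensus matrix, so a difference of a few percent is not surprising.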

On 27 August 2013 07:40, Gordon Robertson <grobertson at bcgsc.ca> wrote:

> Renaud,
>
> I've been using v0.17.5 on R 3.0.1 since you made it available recently.
> I'm on a Macbook Pro with OS X 10.7.5. Now, it's being installed on the
> Linux system at the GSC, so that more people will soon use this version.
>
> Today I realized that the consensus silhouette width that I write out from
> the 30-iteration rank survey, for, say, 4 groups, can be a few percent
> different from the average silhouette width that I calculate from a
> 200-iteration, 4 group, main run. E.g. the consensus silhouette = 0.891
> from the rank survey and 0.86 from the main run, for 4 groups and 260
> miRNA-seq samples.
>
> As we discussed quite some time ago, I calculate a silhouette width, after
> an NMF run, with this:
> ...
> # rank survey, now calculates silhouette width
> estim.r <- nmfEstimateRank( V.matrix, range=2:12, nrun=30, .opt='v',
> .pbackend=7 )
> write.table( estim.r$measures, "rank.survey.txt", sep="\t", quote=FALSE,
> row.names=F )
>
> # Now do the main run
> use.this.k <- 4
> res <- nmf( V.matrix, use.this.k, nrun=200, .options='tvP', .pbackend=7 )
> ...
> library(cluster)
> ...
> x <- consensus(res)
> hc <- hclust(as.dist(1-x), method='average')
> cl <- cutree(hc, k = use.this.k)
> cl.hp <- cl[hp$colInd]
> sil <- silhouette( cutree(hc, k = use.this.k), as.dist(1-x) )
> write.table( sil, "silhouette.UNsorted.txt", sep="\t" )
> pdf(file="consensusmap.silhouette.pdf")
> plot(sil)
> dev.off()
> sil.summary <- summary(sil)
> write(sil.summary$avg.width, "silhouette.avg.width.txt")
> ...
>
> I'd not noticed this difference before. Some differences are very likely
> expected, given that the rank survey run and the main run are independent,
> and NMF is stochastic rather than deterministic.
>
> Am I understanding this difference correctly? Or is the silhouette
> calculated differently in the rank survey?
>
> Thank you,
>
> Gordon
> --
> Gordon Robertson
> Michael Smith Genome Sciences Centre
> BC Cancer Agency
> Vancouver BC Canada
> www.bcgsc.ca
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] parallel  grid      stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] RColorBrewer_1.0-5  doParallel_1.0.3    iterators_1.0.6
>  [4] foreach_1.4.1       ggplot2_0.9.3.1     NMF_0.17.5
>  [7] bigmemory_4.4.3     BH_1.51.0-1         bigmemory.sri_0.1.2
> [10] Biobase_2.20.1      BiocGenerics_0.6.0  digest_0.6.3
> [13] rngtools_1.2        pkgmaker_0.16       registry_0.2
> [16] cluster_1.14.4      edgeR_3.2.4         limma_3.16.6
>
> loaded via a namespace (and not attached):
>  [1] codetools_0.2-8  colorspace_1.2-2 compiler_3.0.1   dichromat_2.0-0
>  [5] gridBase_0.4-6   gtable_0.1.2     labeling_0.2     MASS_7.3-27
>  [9] munsell_0.4.2    plyr_1.8         proto_0.3-10     reshape2_1.2.2
> [13] scales_0.2.3     stringr_0.6.2    xtable_1.7-1
>
>
>

-- 
Renaud Gaujoux, PhD
Computational Biology - University of Cape Town, South Africa