<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Thanks Renaud. I'll try controlling the seed.<div><br><div>Gordon<br><div apple-content-edited="true">


</div>

<br><div><div>On 2013-08-26, at 11:39 PM, Renaud Gaujoux wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div><div><div><div><div>Hi Gordon,<br><br></div>the silhouette values are indeed expected to be different because:<br></div>- the two sets of runs are independent, i.e. they start from different initial random seeds (as you mention).<br>


</div>- the number of runs is different, so the consensus matrix is computed from more fits when nrun=200 than when nrun=30.<br><br></div>To check this, you could set the random seed to the same value (e.g. 123) in both nmf calls and use the same number of runs (see code below).<br>

<br>Bests,<br></div>Renaud<br><div>


<br></div><div>#### REPRODUCING SILHOUETTE WIDTH ####<br></div><div>use.this.k <- 4<br>estim.r <- nmfEstimateRank( V.matrix, range=4:5, nrun=30, .opt='v', .pbackend=7, seed = 123)<br>s <- silhouette(estim.r$fit[[1L]], 'consensus')<br>

res <- nmf( V.matrix, use.this.k, nrun=30, .options='tv', .pbackend=7, seed = 123)<br>


s2 <- silhouette(res, 'consensus')<br><div>identical(s, s2)<br><br>library(cluster)<br>x <- consensus(res)<br>

hc <- hclust(as.dist(1-x), method='average')<br>

cl <- cutree(hc, k = use.this.k)<br>

sil <- silhouette( cutree(hc, k = use.this.k), as.dist(1-x) )<br></div><div><br></div><div># samples in consensus silhouettes (in object `s`) are ordered to match the sample order in the consensus heatmap<br>dr <- as.dendrogram(hc)<br>

o <- order.dendrogram(reorder(dr, rowMeans(consensus(res), na.rm=TRUE)))<br>identical(setNames(s[, 'sil_width'], NULL), sil[o, 'sil_width'])<br>
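<br>For intuition, here is a rough, self-contained sketch of the two effects listed above. It does not use NMF at all: repeated k-means restarts stand in for the NMF runs, and the helpers toy_consensus() and avg_sil() are hypothetical names made up for this illustration. It assumes only base R and the cluster package.<br>

```r
library(cluster)  # provides silhouette()

# Hypothetical helper (NOT part of the NMF package): build a consensus
# matrix from `nrun` k-means restarts, as a stand-in for repeated NMF fits.
toy_consensus <- function(x, k, nrun, seed) {
  set.seed(seed)                      # fix the RNG state for all restarts
  n <- nrow(x)
  cons <- matrix(0, n, n)
  for (i in seq_len(nrun)) {
    cl <- kmeans(x, centers = k, nstart = 1)$cluster
    cons <- cons + outer(cl, cl, "==")  # 1 where two samples co-cluster
  }
  cons / nrun                         # fraction of runs with co-clustering
}

# Hypothetical helper: average silhouette width of the consensus clustering,
# computed the same way as in the hclust/cutree code above.
avg_sil <- function(cons, k) {
  d <- as.dist(1 - cons)
  hc <- hclust(d, method = "average")
  summary(silhouette(cutree(hc, k = k), d))$avg.width
}

set.seed(1)
x <- matrix(rnorm(60 * 5), nrow = 60)   # toy data: 60 samples, 5 features

a    <- avg_sil(toy_consensus(x, k = 4, nrun = 30,  seed = 123), 4)
b    <- avg_sil(toy_consensus(x, k = 4, nrun = 30,  seed = 123), 4)
c200 <- avg_sil(toy_consensus(x, k = 4, nrun = 200, seed = 123), 4)

identical(a, b)   # same seed, same number of runs
c(nrun30 = a, nrun200 = c200)
```

With the same seed and the same number of runs the two averages are identical; changing only the number of runs gives a consensus matrix built from more fits, so the average width will generally differ.<br>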


<br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 27 August 2013 07:40, Gordon Robertson <span dir="ltr"><<a href="mailto:grobertson@bcgsc.ca" target="_blank">grobertson@bcgsc.ca</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Renaud,<br>

<br>

I've been using v0.17.5 on R 3.0.1 since you made it available recently. I'm on a MacBook Pro with OS X 10.7.5. It's now being installed on the Linux system at the GSC, so more people will soon be using this version.<br>


<br>

Today I realized that the consensus silhouette width that I write out from the 30-iteration rank survey, for, say, 4 groups, can differ by a few percent from the average silhouette width that I calculate from a 200-iteration, 4-group main run. E.g. the consensus silhouette is 0.891 from the rank survey and 0.86 from the main run, for 4 groups and 260 miRNA-seq samples.<br>


<br>

As we discussed quite some time ago, I calculate a silhouette width, after an NMF run, with this:<br>

...<br>

# rank survey, now calculates silhouette width<br>

estim.r <- nmfEstimateRank( V.matrix, range=2:12, nrun=30, .opt='v', .pbackend=7 )<br>

write.table( estim.r$measures, "rank.survey.txt", sep="\t", quote=FALSE, row.names=FALSE )<br>

<br>

# Now do the main run<br>

use.this.k <- 4<br>

res <- nmf( V.matrix, use.this.k, nrun=200, .options='tvP', .pbackend=7 )<br>

...<br>

library(cluster)<br>

...<br>

x <- consensus(res)<br>

hc <- hclust(as.dist(1-x), method='average')<br>

cl <- cutree(hc, k = use.this.k)<br>

cl.hp <- cl[hp$colInd]<br>

sil <- silhouette( cutree(hc, k = use.this.k), as.dist(1-x) )<br>

write.table( sil, "silhouette.UNsorted.txt", sep="\t" )<br>

pdf(file="consensusmap.silhouette.pdf")<br>

plot(sil)<br>

dev.off()<br>

sil.summary <- summary(sil)<br>

write(sil.summary$avg.width, "silhouette.avg.width.txt")<br>

...<br>

<br>

I'd not noticed this difference before. Some differences are very likely expected, given that the rank survey run and the main run are independent, and NMF is stochastic rather than deterministic.<br>

<br>

Am I understanding this difference correctly? Or is the silhouette calculated differently in the rank survey?<br>

<br>

Thank you,<br>

<br>

Gordon<br>

--<br>

Gordon Robertson<br>

Michael Smith Genome Sciences Centre<br>

BC Cancer Agency<br>

Vancouver BC Canada<br>

<a href="http://www.bcgsc.ca/" target="_blank">www.bcgsc.ca</a><br>

<br>

> sessionInfo()<br>

R version 3.0.1 (2013-05-16)<br>

Platform: x86_64-apple-darwin10.8.0 (64-bit)<br>

<br>

locale:<br>

[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8<br>

<br>

attached base packages:<br>

[1] parallel  grid      stats     graphics  grDevices utils     datasets<br>

[8] methods   base<br>

<br>

other attached packages:<br>

 [1] RColorBrewer_1.0-5  doParallel_1.0.3    iterators_1.0.6<br>

 [4] foreach_1.4.1       ggplot2_0.9.3.1     NMF_0.17.5<br>

 [7] bigmemory_4.4.3     BH_1.51.0-1         bigmemory.sri_0.1.2<br>

[10] Biobase_2.20.1      BiocGenerics_0.6.0  digest_0.6.3<br>

[13] rngtools_1.2        pkgmaker_0.16       registry_0.2<br>

[16] cluster_1.14.4      edgeR_3.2.4         limma_3.16.6<br>

<br>

loaded via a namespace (and not attached):<br>

 [1] codetools_0.2-8  colorspace_1.2-2 compiler_3.0.1   dichromat_2.0-0<br>

 [5] gridBase_0.4-6   gtable_0.1.2     labeling_0.2     MASS_7.3-27<br>

 [9] munsell_0.4.2    plyr_1.8         proto_0.3-10     reshape2_1.2.2<br>

[13] scales_0.2.3     stringr_0.6.2    xtable_1.7-1<br>

<br>

<br>

</blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><div><font>Renaud Gaujoux, PhD<br></font></div><font>Computational Biology - University of Cape Town, South Africa<br></font></div>

</div>

</blockquote></div><br></div></div></body></html>