[NMF-user] setting the fraction of genes randomly sampled in an iteration?

Gordon Robertson grobertson at bcgsc.ca
Mon Mar 18 15:49:52 CET 2013


Yes, the error messages would have told me what the problem was. I was going so quickly at that point that all I saw was 'threw an error'. It's hard to believe, now, looking at the screen output, but that's what occurred.

Your suggestion:
> dummy <- nmf(V.matrix, 2, maxIter=20)
Error: NMF::nmf - Input matrix x contains at least one null row.

Repeating my original survey call:
> dummy.estim.r <- nmfEstimateRank( V.matrix, range=2:12, nrun=30, .opt='v', .pbackend=7 )
NMF algorithm: 'brunet'
Multiple runs: 30
...
...
Runs: |==================================================| 100%
Timing stopped at: 0.18 0.107 0.168
Error in function (...)  : All the runs produced an error:
-#1 [r=2] -> NMF::nmf - 30/30 fit(s) threw an error.
# Error(s) thrown:
  - run #1: NMF::nmf - Input matrix x contains at least one null row.
-#2 [r=3] -> NMF::nmf - 30/30 fit(s) threw an error.

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] synchronicity_1.1.0 cluster_1.14.3      NMF_0.9
 [4] Biobase_2.18.0      BiocGenerics_0.4.0  RColorBrewer_1.0-5
 [7] colorspace_1.2-1    stringr_0.6.2       gridBase_0.4-6
[10] digest_0.6.3        bigmemory_4.4.0     BH_1.51.0-0
[13] bigmemory.sri_0.1.2 doParallel_1.0.1    iterators_1.0.6
[16] foreach_1.4.0       registry_0.2        rngtools_1.1
[19] pkgmaker_0.10.1

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 xtable_1.7-1


Thanks,

G
--
Gordon Robertson
Michael Smith Genome Sciences Centre
BC Cancer Agency
Vancouver BC Canada
www.bcgsc.ca


On 2013-03-18, at 4:56 AM, Renaud Gaujoux wrote:

Good.
But wasn't the error message about null rows already displayed without calling traceback()?
It should have appeared at the end of the messages after the call to nmf().

Generally, it is good practice to perform a single plain test run of nmf on the data before launching bigger estimations.
For example, in your case, the following should have given you a quick taste of the error:

dummy <- nmf(x, 2, maxIter=20)


Renaud


2013/3/18 Gordon Robertson <grobertson at bcgsc.ca>
Thank you.  traceback() pointed to the problem:
...
    }("[r=2] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s) thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null row.",
        "[r=3] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s) thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null row.",
        "[r=4] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s) thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null row.",
        "[r=5] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s) thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null row.",
        "[r=6] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s) thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null row.")

Sure enough, there were two miRs (rows) with zero values in all samples. I'd probably seen them in setting up the original run, but was going quickly…
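
A check along these lines would flag all-zero rows before a run. This is only a minimal base R sketch on a made-up toy matrix; the names V, null_rows and V_clean are illustrative and are not part of the NMF package.

```r
# Toy stand-in for the abundance matrix: 5 features (rows) x 4 samples (columns).
# matrix() fills column by column, so rows miR2 and miR4 are zero everywhere.
V <- matrix(c(3, 0, 1, 0, 2,
              1, 0, 4, 0, 0,
              0, 0, 2, 0, 5,
              2, 0, 1, 0, 1),
            nrow = 5,
            dimnames = list(paste0("miR", 1:5), paste0("S", 1:4)))

# A row is "null" when it has no non-zero entry.
null_rows <- rowSums(V != 0) == 0
names(which(null_rows))

# Keep only rows with at least one non-zero value.
# drop = FALSE keeps the result a matrix even if a single row remains.
V_clean <- V[!null_rows, , drop = FALSE]
dim(V_clean)
```

The filtered matrix, rather than the original, would then be the one passed to nmf().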

When I deleted the rows, I was able to run the nmf command on the matrix (35 data rows, 66 samples):
>res <- nmf(x, 2:6, .opt='v9')
...
…
# libPaths:
   /Library/Frameworks/R.framework/Versions/2.15/Resources/library
Runs:  1  32 4 5 6 7 8* 11* 13* 10* 16* 12* 14* 15 9* 19 18 21* 24* 20 27 22** 17* 23 26 30 29 28 25* ... DONE
# Processing partial results ... OK
System time:
   user  system elapsed
 46.820   1.128   8.759
## Cleaning up ...
# Restoring NMF options ... OK
# Restoring foreach backend ... OK
# Updating RNG settings ... OK
# RNG kind:  Mersenne-Twister / Inversion
# RNG state: 403L, 2L, ..., 270725601L [84247f48b334857a5b8c4b029d25909b]
# Deleting directory './NMF_20f56511e3f' ... OK
+ measures ... OK
>plot(res)

How should NMF handle such cases? The docs likely warn against submitting zero-valued rows, and I typically never do, but this was a test case in which I was running with a selected subset of miRs.

Thanks again!

Gordon


On 2013-03-18, at 4:05 AM, Renaud Gaujoux wrote:

Could you please post some reproducible example (commands, output of errors, traceback and sessionInfo)?
This will simplify tracing the issue.
If confidentiality is an issue, email only to me.

e.g.:

# run NMF
res <- nmf(x, 2:6, .opt='v9')
traceback()
sessionInfo()

Thank you.
Renaud


2013/3/18 Gordon Robertson <grobertson at bcgsc.ca>
Renaud,

Thanks for clarifying this.

I asked because I tried to run NMF on a miRNA-seq abundance matrix that had 66 samples (columns) and only a small set of miRs (rows), say 20 miRs. I've used NMF routinely for larger miRNA-seq data matrices for some time (using 200-300 miRs), including on a 300-miR matrix for the same samples, but this time the survey returned only errors. I was able to get results from Matt Wilkerson's Consensus Cluster Plus package. I'll look more carefully at what happens to the NMF runs as I progressively remove miRs.

G



On 2013-03-18, at 12:55 AM, Renaud Gaujoux wrote:

Hi,

no, all genes/features are included in each run. What changes is the seed, i.e. starting point, which is different and randomly generated at each run.
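
To illustrate the idea only: the sketch below draws two random non-negative starting points for the same matrix, using every row each time. The helper random_start is hypothetical, and drawing uniformly between 0 and max(V) is an assumption made for the example, not a description of the package internals.

```r
# Hypothetical helper, not part of the NMF package: draw a random
# non-negative starting point (W, H) for a rank-r factorization of V.
random_start <- function(V, r, seed) {
  set.seed(seed)
  list(
    W = matrix(runif(nrow(V) * r, max = max(V)), nrow = nrow(V)),  # features x r
    H = matrix(runif(r * ncol(V), max = max(V)), nrow = r)         # r x samples
  )
}

V <- matrix(1:12, nrow = 4)           # toy 4 x 3 matrix
s1 <- random_start(V, 2, seed = 1)
s2 <- random_start(V, 2, seed = 2)

# Both starting points cover all rows (features) of V; only the values differ.
dim(s1$W)
dim(s2$W)
identical(s1$W, s2$W)
```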

Standard consensus clustering analysis would use a different set of _samples_ for each run. This is fine for evaluating the accuracy/stability of classification, but makes it difficult to link features to sample groups, since each run (vote) returns a somewhat different set of component-specific features: which set of features or basis components should be used? An average? A consensus?
It would be nice to incorporate a function/option to easily perform such an analysis, though.
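
For illustration only, drawing a fraction of the samples for one such run could look like the base R sketch below. The helper subsample_columns and the 0.9 fraction are hypothetical; nothing here is an existing option of the NMF package.

```r
# Hypothetical helper: keep a random fraction of the columns (samples) of V,
# preserving the original column order.
subsample_columns <- function(V, fraction = 0.9) {
  n_keep <- round(fraction * ncol(V))
  keep <- sort(sample(ncol(V), size = n_keep))
  V[, keep, drop = FALSE]
}

set.seed(42)
V <- matrix(runif(10 * 20), nrow = 10)   # toy matrix: 10 features x 20 samples

# One resampled data set: all 10 features, 18 of the 20 samples.
V_sub <- subsample_columns(V, fraction = 0.9)
dim(V_sub)
```

Each resampled matrix would be factorized separately, which is exactly where the question of how to combine the per-run feature sets comes up.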

There is still some methodology to be developed around this point. A technical issue also arises in terms of memory/speed, if one wants to compute complete feature consensus matrices.
I am happy to hear thoughts on this and discuss it.
My time is currently very limited, although bringing the package back to CRAN is quite high on my todo list.

Renaud



2013/3/14 Gordon Robertson <grobertson at bcgsc.ca>
From what I understand, in each iteration (of, say, 200) in a run, a random subset of genes is used. Is it possible to set the fractional value retained, e.g. 0.90, 0.95?
Thanks,
G
--
Gordon Robertson
Michael Smith Genome Sciences Centre
BC Cancer Agency
Vancouver BC Canada
www.bcgsc.ca



_______________________________________________
nmf-user mailing list
nmf-user at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/nmf-user



--

Renaud Gaujoux
Computational Biology - University of Cape Town
South Africa




