[NMF-user] setting the fraction of genes randomly sampled in an iteration?

Renaud Gaujoux renaud at cbio.uct.ac.za
Mon Mar 18 12:56:52 CET 2013


Good.
But wasn't the error message about null rows already displayed without
calling traceback()?
It should have appeared at the end of the messages after the call to nmf().

Generally, it is good practice to perform a plain single test run of nmf on
the data, before launching bigger estimations.
e.g., in your case, the following should have given you a quick taste of
the error:

dummy <- nmf(x, 2, maxIter=20)
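
Null rows can also be caught before calling nmf() at all. Below is a minimal
sketch in base R, assuming that "null row" means a row that is zero in every
sample (as with the two miRs in this thread) and that the matrix has no NA
values. The toy matrix and the names null_rows and x_clean are made up for
illustration; they are not part of the NMF package.

```r
# Hypothetical toy data: 5 features (rows) x 4 samples (columns).
# matrix() fills column by column, so miR2 and miR4 are zero in every sample.
x <- matrix(c(3, 0, 1, 0, 2,
              5, 0, 4, 0, 1,
              2, 0, 6, 0, 3,
              1, 0, 2, 0, 7),
            nrow = 5,
            dimnames = list(paste0("miR", 1:5), paste0("S", 1:4)))

# Flag rows in which no entry is non-zero (assumes x contains no NA).
null_rows <- rowSums(x != 0) == 0
rownames(x)[null_rows]   # miR2 and miR4 in this toy matrix

# Drop the flagged rows; drop = FALSE keeps the result a matrix
# even if only a single row survives.
x_clean <- x[!null_rows, , drop = FALSE]

# The quick single test run would then use the filtered matrix:
# library(NMF)
# dummy <- nmf(x_clean, 2, maxIter = 20)
```

Running the check first shows which features to remove, instead of finding
out from the errors of a full multi-rank survey.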


Renaud


2013/3/18 Gordon Robertson <grobertson at bcgsc.ca>

> Thank you.  traceback() pointed to the problem:
> ...
>     }("[r=2] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s)
> thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null
> row.",
>         "[r=3] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s)
> thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null
> row.",
>         "[r=4] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s)
> thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null
> row.",
>         "[r=5] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s)
> thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null
> row.",
>         "[r=6] -> NMF::nmf - 30/30 fit(s) threw an error.\n# Error(s)
> thrown:\n  - run #1: NMF::nmf - Input matrix x contains at least one null
> row.")
>
> Sure enough, there were two miRs (rows) with zero values in all samples.
> I'd probably seen them in setting up the original run, but was going
> quickly…
>
> When I deleted the rows, I was able to run the nmf command on the matrix
> (35 data rows, 66 samples):
> >res <- nmf(x, 2:6, .opt='v9')
> ...
> # libPaths:
>    /Library/Frameworks/R.framework/Versions/2.15/Resources/library
> Runs:  1  32 4 5 6 7 8* 11* 13* 10* 16* 12* 14* 15 9* 19 18 21* 24* 20 27
> 22** 17* 23 26 30 29 28 25* ... DONE
> # Processing partial results ... OK
> System time:
>    user  system elapsed
>  46.820   1.128   8.759
> ## Cleaning up ...
> # Restoring NMF options ... OK
> # Restoring foreach backend ... OK
> # Updating RNG settings ... OK
> # RNG kind:  Mersenne-Twister / Inversion
> # RNG state: 403L, 2L, ..., 270725601L [84247f48b334857a5b8c4b029d25909b]
> # Deleting directory './NMF_20f56511e3f' ... OK
> + measures ... OK
> >plot(res)
>
> How should NMF handle such cases? The docs likely warn against submitting
> zero-valued rows, and I typically never do, but this was a test case in
> which I was running with a selected subset of miRs.
>
> Thanks again!
>
> Gordon
>
>
> On 2013-03-18, at 4:05 AM, Renaud Gaujoux wrote:
>
> Could you please post some reproducible example (commands, output of
> errors, traceback and sessionInfo)?
> This will simplify tracing the issue.
> If confidentiality is an issue, email only to me.
>
> e.g.:
>
> # run NMF
> res <- nmf(x, 2:6, .opt='v9')
> traceback()
> sessionInfo()
>
> Thank you.
> Renaud
>
>
> 2013/3/18 Gordon Robertson <grobertson at bcgsc.ca>
>
>> Renaud,
>>
>> Thanks for clarifying this.
>>
>> I asked because I tried to run NMF on a miRNA-seq abundance matrix that
>> had 66 samples (columns) and only a small set of miRs (rows), say 20 miRs.
>> I've used NMF routinely for larger miRNA-seq data matrices for some time
>> (using 200-300 miRs), including on a 300-miR matrix for the same samples,
>> but this time the survey returned only errors. I was able to get results
>> from Matt Wilkerson's Consensus Cluster Plus package. I'll look more
>> carefully at what happens to the NMF runs as I progressively remove miRs.
>>
>> G
>>
>>
>>
>> On 2013-03-18, at 12:55 AM, Renaud Gaujoux wrote:
>>
>> Hi,
>>
>> no, all genes/features are included in each run. What changes is the
>> seed, i.e. starting point, which is different and randomly generated at
>> each run.
>>
>> Standard consensus clustering analysis would use a different set of
>> _samples_ for each run. This is fine for evaluating the accuracy/stability
>> of classification, but makes it difficult to link features to sample
>> groups, since each run (vote) returns a somewhat different set of
>> component-specific features: which set of features or basis components
>> should be used? The average? The consensus?
>> It would be nice to incorporate a function/option to easily perform such
>> an analysis though.
>>
>> There is still some methodology to be developed around this point. A
>> technical issue also arises in terms of memory/speed, if one wants to
>> compute complete feature consensus matrices.
>> I am happy to hear about or discuss this.
>> My time is currently very limited, although bringing the package back to
>> CRAN is quite high on my todo list.
>>
>> Renaud
>>
>>
>>
>> 2013/3/14 Gordon Robertson <grobertson at bcgsc.ca>
>>
>>> From what I understand, in each iteration (of, say, 200) in a run, a
>>> random subset of genes is used. Is it possible to set the fractional value
>>> retained, e.g. 0.90, 0.95?
>>> Thanks,
>>> G
>>>  --
>>> Gordon Robertson
>>> Michael Smith Genome Sciences Centre
>>> BC Cancer Agency
>>> Vancouver BC Canada
>>> www.bcgsc.ca
>>>
>>>
>>>
>>> _______________________________________________
>>> nmf-user mailing list
>>> nmf-user at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/nmf-user
>>>
>>
>>
>>
>> --
>>
>> Renaud Gaujoux
>> Computational Biology - University of Cape Town
>> South Africa
>>
>>
>>


-- 

Renaud Gaujoux
Computational Biology - University of Cape Town
South Africa