[adegenet-forum] calculating and presenting sPCA variance components
Jombart, Thibaut
t.jombart at imperial.ac.uk
Mon Oct 22 20:19:57 CEST 2012
Hi there,
In general, I'm not a big fan of the % of the total variance as an indication of relevant PCs. With 10 variables, 10% of variance on the first PC is basically what you'd expect at random, so pretty lame; with 10,000 variables, 10% on a single PC is pretty amazing already.
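To put rough numbers on this, here is a minimal sketch in Python (not adegenet code). It assumes the "random" baseline is total variance spread evenly over as many PCs as there are variables, so each PC gets 1/p on average; the function names are made up for illustration.

```python
def baseline_share(n_variables):
    """Average share of total variance per PC if variance were spread evenly."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return 1.0 / n_variables


def fold_over_baseline(observed_share, n_variables):
    """How many times larger the observed share is than the even-spread baseline."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return observed_share * n_variables


# Same 10% on the first PC, very different meaning depending on p.
for p in (10, 10_000):
    print(p, baseline_share(p), fold_over_baseline(0.10, p))
```

With 10 variables the 10% is about 1 time the baseline; with 10,000 variables it is about 1000 times the baseline. In real data the first PC of pure noise will usually sit somewhat above 1/p, so treat this as an order-of-magnitude check only.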
That said, in sPCA it does make sense to compare the variance of a PC to a 'standard'. This standard is given by the first eigenvalue of the PCA, which is, by definition, the variance of the linear combination of variables that has the highest possible variance. This is why the summary of spca objects gives information about the sPCA as well as about the equivalent PCA. The variance of the first PC of PCA is also the right-hand side boundary of the rectangle containing eigenvalues in screeplot.spca.
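The comparison itself is just a ratio. A minimal sketch in Python (again not adegenet code), assuming you have copied the variance of an sPCA axis and the eigenvalues of the equivalent PCA out of your own analysis; the function name and the numbers are invented for illustration.

```python
def variance_relative_to_pca_max(spca_axis_variance, pca_eigenvalues):
    """Variance of one sPCA axis as a fraction of the first (largest) PCA eigenvalue."""
    if not pca_eigenvalues:
        raise ValueError("pca_eigenvalues must not be empty")
    max_variance = max(pca_eigenvalues)  # first eigenvalue of the ordinary PCA
    if max_variance <= 0:
        raise ValueError("largest PCA eigenvalue must be positive")
    return spca_axis_variance / max_variance


# Hypothetical values, not taken from any real dataset.
pca_eigenvalues = [1.2, 0.5, 0.3]
spca_axis_variance = 0.6
print(variance_relative_to_pca_max(spca_axis_variance, pca_eigenvalues))
```

A value close to 1 would mean the sPCA axis carries nearly as much variance as the best possible linear combination of the variables.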
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Pip Griffin [pip.griffin at gmail.com]
Sent: 21 October 2012 22:05
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] calculating and presenting sPCA variance components
Hi Thibaut and adegenet forum users,
I'd appreciate some advice about how to present sPCA variance components.
I understand that it's meaningless to calculate "% variance explained"
by a single eigenvalue as a proportion of the total, the way you can
for a PCA, since each eigenvalue comprises both genetic and spatial
components.
And I see, in the summary(spca), the eigenvalue decomposition of the
retained axes into the genetic variance and Moran's I components.
But, I am not clear how this relates to the *proportion* of total (or
maximum possible) variance (of each type), because the decomposition
is presented only for the eigenvalues that were retained.
It would make sense to present the spatial variance explained by Axis
1 as Axis1$moran/Imax (from the 'connection network statistics' also
presented in the summary) - i.e. as a proportion of the maximum
possible.
But then for the "maximum possible" genetic variance: where can I find
this figure? Should it be the sum of all PCA eigenvalue components
(not just those retained)? - findable by doing an independent PCA on
the same dataset?
Or am I taking the wrong approach here?
thanks in advance for your help
Pip