[adegenet-forum] Using PCA of SPCA in linear models with environmental data.

Hanan Sela hans at tauex.tau.ac.il
Fri Jul 20 08:56:12 CEST 2012


Hello
One more question about PCNM. I have 68 wild wheat genotypes collected
from 35 sites, which means that some sample pairs have zero spatial
distance. How should I calculate the PCNM?
1. Use the coordinates of the 68 samples, even though there is redundancy.
2. Use the coordinates of the 35 sites.
I have done both calculations and the results are somewhat different.
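
To make the difference between the two options concrete, here is a small sketch of the first PCNM steps. It assumes the usual construction (Euclidean distances truncated at the longest edge of the minimum spanning tree, with larger distances set to four times that threshold). The three toy sites, the per-site genotype counts and all function names are made up for illustration, and plain Python is used only to keep the example self-contained; it is not the "pcnm" function from "vegan".

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between coordinate tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def mst_longest_edge(d):
    """Longest edge of the minimum spanning tree (Prim's algorithm)."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each point to the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # add the point that is closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
    return longest

def truncate(d, threshold):
    """Keep distances up to the threshold, set larger ones to 4 * threshold."""
    return [[x if x <= threshold else 4.0 * threshold for x in row] for row in d]

# Hypothetical data: 3 sites, with 2, 1 and 2 genotypes respectively.
sites = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
counts = [2, 1, 2]
samples = [xy for xy, k in zip(sites, counts) for _ in range(k)]

d_sites = distance_matrix(sites)
d_samples = distance_matrix(samples)
t_sites = mst_longest_edge(d_sites)
t_samples = mst_longest_edge(d_samples)

# Count pairs of distinct samples that share a location (distance 0).
zero_pairs = sum(
    1
    for i in range(len(samples))
    for j in range(i + 1, len(samples))
    if d_samples[i][j] == 0.0
)

print("threshold (sites, samples):", t_sites, t_samples)
print("matrix size (sites, samples):", len(d_sites), len(d_samples))
print("co-located sample pairs:", zero_pairs)
```

In this toy case both inputs give the same threshold, but the sample-level matrix is larger and contains off-diagonal zeros, so co-located genotypes give their site more weight in the principal coordinate analysis that follows. That may be one reason why the two calculations differ.
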
Have a nice weekend
Hanan

On Mon, Jul 16, 2012 at 3:29 PM, Jombart, Thibaut
<t.jombart at imperial.ac.uk> wrote:

>
> Hello,
>
> in fact this is a trivial result, and there is nothing wrong in your data.
> CCA is a Correspondence Analysis on predicted variables; in your case, you
> have exactly 2 predictors (the 2 PCNM), which are already uncorrelated (by
> construction). Thus the best plane in 2D is exactly that of your 2 PCNMs.
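
This argument can be checked numerically. The sketch below rests on stated assumptions: it uses ordinary least squares (as in RDA) instead of the weighted regression of CCA, and the predictors x1, x2 and the random response columns are invented stand-ins for the two PCNMs and the SNP matrix. It shows that any axis built from the fitted values is a perfect linear function of the two predictors, whatever the response data are.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centre(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def fit(y, x1, x2):
    """Least-squares fit of a centred y on two orthogonal, centred predictors."""
    b1 = dot(y, x1) / dot(x1, x1)
    b2 = dot(y, x2) / dot(x2, x2)
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]

def r_squared(y, x1, x2):
    """Share of the variation in a centred y explained by x1 and x2."""
    fitted = fit(y, x1, x2)
    resid = [a - f for a, f in zip(y, fitted)]
    return 1.0 - dot(resid, resid) / dot(y, y)

# Two centred, mutually orthogonal predictors (hypothetical "PCNMs").
x1 = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0]
x2 = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0]

# Four centred random response columns (hypothetical "SNP" data).
random.seed(1)
responses = [centre([random.random() for _ in x1]) for _ in range(4)]

# Constrained ordination works on the fitted values, not the raw responses.
fitted = [fit(y, x1, x2) for y in responses]

# Any constrained axis is a linear combination of the fitted columns.
weights = [0.5, -1.0, 2.0, 0.3]   # arbitrary illustration weights
axis = [sum(w * col[i] for w, col in zip(weights, fitted)) for i in range(len(x1))]

r2_axis = r_squared(axis, x1, x2)          # axis built from fitted values
r2_raw = r_squared(responses[0], x1, x2)   # one raw response column
print("R2 of constrained axis on predictors:", r2_axis)
print("R2 of a raw response on predictors:", r2_raw)
```

The perfect fit of the constrained axis says nothing about the strength of the spatial signal; the proportion of variation explained by the constrained axes (and a permutation test on it) is the quantity that depends on the data.
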
>
> Cheers
>
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Hanan
> Sela [hans at tauex.tau.ac.il]
> Sent: 14 July 2012 14:57
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] Using PCA of SPCA in linear models with
> environmental data.
>
> Hello list
> I have done what Thibaut suggested, using the "pcnm" function in "vegan"
> (with no weights). I used the first two PCNM PCs as predictors in a
> canonical correspondence analysis (CCA), with the SNP matrix as the
> dependent matrix, using the "cca" function in "vegan".
> The results are in the attached PDF file. They show that the first
> two PCNM PCs fit the first two CCA PCs exactly. To remind you, the PCNM PCs
> are derived from spatial data and the CCA PCs are derived from genetic SNP
> data. My explanation is that I have a bias in the sampling that
> may result in artifacts. In my data there are 1-5 genotypes from the same
> site (spatial distance = 0), with an average of 1.9 genotypes per site.
> I suspect that the structure of the sampling, which is not spatially
> uniform, may contribute to the high correlation of the PCs. When I choose
> one genotype per site, the correlation is lower but still very high.
> I would like to hear your opinion.
> Hanan
>
> On Thu, Jul 12, 2012 at 3:35 PM, Jombart, Thibaut
> <t.jombart at imperial.ac.uk> wrote:
>
> Yes, there have been quite a few methods developed since. A starting point
> would be:
>
> Dray, S.; Legendre, P. & Peres-Neto, P. Spatial modelling: a comprehensive
> framework for principal coordinate analysis of neighbour matrices (PCNM).
> Ecological Modelling, 2006, 196, 483-493.
>
> Cheers
>
> Thibaut
>
> ________________________________________
> From: Hanan Sela [dooshra at gmail.com]
> Sent: 12 July 2012 12:44
> To: Jombart, Thibaut
> Cc: adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] Using PCA of SPCA in linear models with
> environmental data.
>
> Thank you for the answer
> I want to test whether space (lat+lon) has a significant effect on the
> genetic structure. Therefore I would like to use spatial variables on the
> right side of the model. Can you suggest a better representation of the
> spatial structures than lat-lon?
> Thank you
> Hanan
>
> On Thu, Jul 12, 2012 at 1:58 PM, Jombart, Thibaut
> <t.jombart at imperial.ac.uk> wrote:
> Dear Hanan,
>
> this is a tricky question, and I don't think there is a single universal
> answer. Technically speaking, the only requirement is that your residuals
> are independent, so you need to make sure there is no spatial
> autocorrelation left there. Otherwise minimizing the sum of squared
> residuals is no longer the ML solution.
>
> The real problem relates to the interpretation, and the assumptions
> implicitly made by the model. There are several reasons why spatial genetic
> patterns can occur. Your model has the form:
> genetic pattern = lat+lon + environment + residuals
>
> Which means that beyond linear trends, genetic patterns are due to the
> environment. It makes sense to treat spatial autocorrelation as a
> confounding factor first removed from the analysis. But lat+lon is often
> not enough to capture all spatial structures. In this respect, using PCs
> from PCA on the left side is probably better than sPCA (no need to seek
> spatial structures to remove them afterwards).
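
The independence requirement can be checked on the model residuals, for example with Moran's I. What follows is a minimal sketch, assuming a binary neighbour matrix for five sites along a line and made-up residual values; none of these numbers come from the thread, and "morans_i" is a hypothetical helper, not a function from adegenet or vegan.

```python
def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)   # total of all spatial weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / s0) * num / den

# Hypothetical neighbour matrix: five sites on a line, adjacent sites are neighbours.
n_sites = 5
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n_sites)] for i in range(n_sites)]

# Made-up residuals: one set with a leftover spatial trend, one alternating in sign.
resid_trend = [1.0, 2.0, 3.0, 4.0, 5.0]
resid_alternating = [1.0, -1.0, 1.0, -1.0, 1.0]

i_trend = morans_i(resid_trend, w)
i_alternating = morans_i(resid_alternating, w)
expected_null = -1.0 / (n_sites - 1)   # expectation without spatial autocorrelation

print("Moran's I, trend residuals:", i_trend)
print("Moran's I, alternating residuals:", i_alternating)
print("Expected value under no autocorrelation:", expected_null)
```

Values well above the null expectation suggest that spatial structure remains in the residuals; in practice the significance would be assessed with a permutation test rather than by eye.
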
>
> Cheers
>
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Hanan
> Sela [dooshra at gmail.com]
> Sent: 12 July 2012 07:34
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Using PCA of SPCA in linear models with
> environmental data.
>
> Hello all
> I am trying to estimate the major factors affecting the spatial
> distribution of wild wheat genotypes. I am using a linear model where the
> first and second axes of the PCA or the sPCA are the dependent variables and
> the environmental variables are the predictors. Additionally, I am using the
> longitude and the latitude as predictors. Since there is a spatial
> reference on the left side of the formula, I was wondering whether using
> sPCA on the right side would be a problem.
> Thank you
> Hanan
>
>
>
> --
> Hanan Sela Ph.D.
> Curator of the Lieberman Cereal Germplasm Bank
> The Institute for Cereal Crops Improvement
> Tel-Aviv University
> P.O. Box 39040
> Tel Aviv 69978
> Israel
>
> hans at tauex.tau.ac.il
> Phone: 972-3-6405773
> Cell: 972-50-5727458
> Fax: 972-3-6407857


-- 
Hanan Sela Ph.D.
Curator of the Lieberman Cereal Germplasm Bank
The Institute for Cereal Crops Improvement
Tel-Aviv University
P.O. Box 39040
Tel Aviv 69978
Israel

hans at tauex.tau.ac.il
Phone: 972-3-6405773
Cell: 972-50-5727458
Fax: 972-3-6407857