Dear Hanan,<div><br></div><div>I will give you the useless opinion of someone working at 9pm on a Friday night. I guess the difference here could come from the spatial matrix you are using. If it's a binary connectivity matrix, individuals from the same location will simply be treated as neighbours, and it won't matter that they share the same coordinates. If you are instead using a weighted matrix, I suppose that having identical weights associated with different quantitative variables may be confusing rather than merely redundant. In this second case I would probably calculate the gene frequencies across all the individuals at each site, as if they were populations, and perform the spatial analysis on those... that's all that comes to my mind.</div>
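<div>A minimal sketch of the "sites as populations" idea, in plain Python rather than R. It assumes each SNP is coded 0/1 per individual (an assumption, not something stated in the thread); the function name and the toy data are hypothetical.</div>
<pre>
```python
from collections import defaultdict


def site_allele_frequencies(genotypes, sites):
    """Average 0/1 SNP calls over all individuals collected at the same site.

    genotypes: one list of 0/1 allele calls per individual (assumed coding)
    sites:     one site label per individual, in the same order
    Returns a dict mapping each site to its per-SNP frequency of allele 1.
    """
    grouped = defaultdict(list)
    for site, calls in zip(sites, genotypes):
        grouped[site].append(calls)

    freqs = {}
    for site, rows in grouped.items():
        n = len(rows)
        # Column-wise mean over the individuals sharing this site
        freqs[site] = [sum(col) / n for col in zip(*rows)]
    return freqs


# Toy data (made up): 3 individuals, 2 sites, 3 SNPs
genotypes = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
sites = ["A", "A", "B"]
site_freqs = site_allele_frequencies(genotypes, sites)
print(site_freqs)
```
</pre>
<div>The resulting table has one row per site, so it lines up with spatial predictors computed from the 35 site coordinates instead of the 68 individual ones.</div>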
<div><br></div><div>Have a nice weekend too</div><div><br></div><div>Valeria </div><div><br><div class="gmail_quote">On 20 July 2012 08:56, Hanan Sela <span dir="ltr"><<a href="mailto:hans@tauex.tau.ac.il" target="_blank">hans@tauex.tau.ac.il</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello <br>One more question about PCNM. I have 68 wild wheat genotypes collected from 35 sites. This means that some sample pairs have zero spatial distance. How should I calculate the PCNM?<br>1. Use the coordinates of the 68 samples even though there is redundancy.<br>
2. Use the coordinates of the 35 sites. <br>I have done both calculations and the results are somewhat different. <br>Have a nice weekend<span class="HOEnZb"><font color="#888888"><br>Hanan</font></span><div><div class="h5">
<br><br><div class="gmail_quote">On Mon, Jul 16, 2012 at 3:29 PM, Jombart, Thibaut <span dir="ltr"><<a href="mailto:t.jombart@imperial.ac.uk" target="_blank">t.jombart@imperial.ac.uk</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hello,<br>
<br>
In fact this is a trivial result, and there is nothing wrong with your data. CCA is a Correspondence Analysis on predicted variables; in your case, you have exactly 2 predictors (the 2 PCNMs), which are already uncorrelated (by construction). Thus the best plane in 2D is exactly that of your 2 PCNMs.<br>
<br>
Cheers<br>
<br>
Thibaut<br>
<br>
________________________________________<br>
From: <a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a> [<a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a>] on behalf of Hanan Sela [<a href="mailto:hans@tauex.tau.ac.il" target="_blank">hans@tauex.tau.ac.il</a>]<br>
Sent: 14 July 2012 14:57<br>
To: <a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.r-project.org</a><br>
<div>Subject: Re: [adegenet-forum] Using PCA of SPCA in linear models with environmental data.<br>
<br>
</div><div>Hello list<br>
I have done what Thibaut suggested using the "pcnm" function in "vegan" (with no weights). I used the first two PCNM PCs in a canonical correspondence analysis (CCA), with the SNP matrix as the dependent matrix and the PCNM PCs as predictors. I used the "cca" function in "vegan". The results are in the attached PDF file. They show that the first two PCNM PCs fit the first two CCA PCs exactly. To remind you, the PCNM PCs are derived from spatial data and the CCA PCs are derived from genetic SNP data. My explanation for this is that I have a bias in the sampling that may result in artifacts. In my data there are 1-5 genotypes from the same site (spatial distance = 0),<br>
with an average of 1.9 genotypes per site. I suspect that the structure of the sampling, which is not spatially uniform, may contribute to the high correlation of the PCs. When I choose one genotype per site, the correlation is lower but still very high. I would like to hear your opinion.<br>
Hanan<br>
<br>
</div><div>On Thu, Jul 12, 2012 at 3:35 PM, Jombart, Thibaut <<a href="mailto:t.jombart@imperial.ac.uk" target="_blank">t.jombart@imperial.ac.uk</a>> wrote:<br>
<br>
Yes, there have been quite a few methods developed since. A starting point would be:<br>
<br>
Dray, S., Legendre, P. & Peres-Neto, P. (2006). Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196, 483-493.<br>
<br>
Cheers<br>
<br>
Thibaut<br>
<br>
________________________________________<br>
</div>From: Hanan Sela [<a href="mailto:dooshra@gmail.com" target="_blank">dooshra@gmail.com</a>]<br>
<div>Sent: 12 July 2012 12:44<br>
To: Jombart, Thibaut<br>
</div>Cc: <a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.r-project.org</a><br>
<div>Subject: Re: [adegenet-forum] Using PCA of SPCA in linear models with environmental data.<br>
<br>
Thank you for the answer<br>
I want to test whether space (lat+lon) has significant effect on the genetic structure. Therefore I would like to use spatial variables in the right side of the model. Can you suggest a better representation of the spatial structures than lat-lon?<br>
Thank you<br>
Hanan<br>
<br>
</div><div>On Thu, Jul 12, 2012 at 1:58 PM, Jombart, Thibaut <<a href="mailto:t.jombart@imperial.ac.uk" target="_blank">t.jombart@imperial.ac.uk</a>> wrote:<br>
Dear Hanan,<br>
<br>
this is a tricky question, and I don't think there is a single universal answer. Technically speaking, the only requirement is that your residuals are independent, so you need to make sure there is no spatial autocorrelation left there. Otherwise minimizing the sum of squared residuals is no longer the ML solution.<br>
<br>
The real problem relates to the interpretation, and to the assumptions implicitly made by the model. There are several reasons why spatial genetic patterns can occur. Your model has the form:<br>
genetic pattern = lat+lon + environment + residuals<br>
<br>
This means that, beyond linear trends, genetic patterns are due to the environment. It makes sense to treat spatial autocorrelation as a confounding factor to be removed from the analysis first. But lat+lon is often not enough to capture all spatial structures. In this respect, using PCs from PCA on the left side is probably better than sPCA (no need to seek spatial structures only to remove them afterwards).<br>
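<div>One common way to check whether spatial autocorrelation is left in the residuals is Moran's I. The sketch below is plain Python with made-up residuals and a hypothetical neighbour matrix; it is only an illustration of the statistic, not the procedure used in this thread, and it does not include a significance test (which would typically be done by permutation).</div>
<pre>
```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products between every pair of sites
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Toy example (made up): 4 sites on a line, each linked to its adjacent sites
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # neighbours have opposite signs
print(clustered, alternating)
```
</pre>
<div>Positive values indicate that neighbouring residuals tend to be similar, negative values that they tend to differ; values close to zero are what one hopes to see once the spatial predictors have done their job.</div>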
<br>
Cheers<br>
<br>
Thibaut<br>
<br>
________________________________________<br>
</div>From: <a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a> [<a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a>] on behalf of Hanan Sela [<a href="mailto:dooshra@gmail.com" target="_blank">dooshra@gmail.com</a>]<br>
<div>Sent: 12 July 2012 07:34<br>
</div>To: <a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.r-project.org</a><br>
<div>Subject: [adegenet-forum] Using PCA of SPCA in linear models with environmental data.<br>
<br>
Hello all<br>
I am trying to estimate the major factors affecting the spatial distribution of wild wheat genotypes. I am using a linear model where the first and second axes of the PCA or the SPCA are the dependent variables and the environmental variables are the predictors. Additionally, I am using the longitude and the latitude as predictors. Since there is a spatial reference on the left side of the formula, I was wondering if using SPCA on the right side will not be a problem.<br>
Thank you<br>
Hanan<br>
<br>
<br>
<br>
--<br>
Hanan Sela Ph.D.<br>
Curator of the Lieberman Cereal Germplasm Bank<br>
The Institute for Cereal Crops Improvement<br>
Tel-Aviv University<br>
P.O. Box 39040<br>
Tel Aviv 69978<br>
Israel<br>
<br>
</div><a href="mailto:hans@tauex.tau.ac.il" target="_blank">hans@tauex.tau.ac.il</a><br>
<div><div>Phone: 972-3-6405773<br>
Cell: 972-50-5727458<br>
Fax: 972-3-6407857<br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><div>
<div>Hanan Sela Ph.D.</div>
<div>Curator of the Lieberman Cereal Germplasm Bank</div>
<div>The Institute for Cereal Crops Improvement<br>Tel-Aviv University</div>
<div>P.O. Box 39040</div>
<div>Tel Aviv 69978 </div>
<div>Israel </div> </div>
<div><a href="mailto:hans@tauex.tau.ac.il" target="_blank">hans@tauex.tau.ac.il</a> <br></div>
<div>Phone: 972-3-6405773</div>
<div>Cell: 972-50-5727458<br>Fax: 972-3-6407857</div></div><br>
</div></div></div>
<br>_______________________________________________<br>
adegenet-forum mailing list<br>
<a href="mailto:adegenet-forum@lists.r-forge.r-project.org">adegenet-forum@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum</a><br></blockquote></div><br></div>