<html dir="ltr">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>

</head>

<body ocsi="0" fpstyle="1">

<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;"><br>

<div><font size="2">Hi there, <br>

<br>

No, PCA is not sensitive to the ordering of samples.<br>

<br>
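As a quick check of the statement that PCA does not depend on sample order, here is a minimal sketch on simulated genotypes. It uses base R's prcomp rather than glPca, and all object names are made up for the illustration:<br>
<pre>
```r
# Simulated genotype matrix: 20 individuals (rows) x 50 SNPs (columns),
# coded 0/1/2. All names and sizes are hypothetical.
set.seed(1)
geno <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20,
               dimnames = list(paste0("ind", 1:20), paste0("snp", 1:50)))

# Same data with the individuals in a random order
perm <- sample(nrow(geno))
geno_shuffled <- geno[perm, ]

pca_a <- prcomp(geno)            # centred, unscaled PCA
pca_b <- prcomp(geno_shuffled)

# Eigenvalues do not depend on the row order
same_eig <- isTRUE(all.equal(pca_a$sdev, pca_b$sdev))

# Each individual gets the same scores, up to the arbitrary sign of an axis
scores_a <- pca_a$x[rownames(geno_shuffled), 1:2]
scores_b <- pca_b$x[, 1:2]
same_scores <- isTRUE(all.equal(abs(scores_a), abs(scores_b), tolerance = 1e-6))
```
</pre>
If both same_eig and same_scores are TRUE, reordering the rows has only reordered the points, not moved them.<br>
<br>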

Note: given the size of the dataset, it is probably easier to use the basic PCA procedure (dudi.pca). genlight objects are meant for data that your computer could not otherwise store in memory.<br>

<br>

If your missing data are not randomly distributed, then having many NAs is a problem: individuals with similar missing data will be seen as artificially similar, and SNPs with similar NAs will be seen as artificially correlated.

<br>
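To illustrate how shared missing data can make individuals look artificially similar, here is a small simulation. It assumes NAs are replaced by the mean of each SNP (a common choice, used here only for the illustration), and every name and number in it is hypothetical:<br>
<pre>
```r
set.seed(42)
n_ind <- 30
n_snp <- 400

# Random, unstructured genotypes: no true groups among individuals
geno <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), nrow = n_ind)

# Individuals 1 to 5 all lack the same 200 SNPs (non-random NAs)
geno[1:5, 1:200] <- NA

# Replace each NA by the mean of its SNP
snp_means <- colMeans(geno, na.rm = TRUE)
na_idx <- which(is.na(geno), arr.ind = TRUE)
geno_imp <- geno
geno_imp[na_idx] <- snp_means[na_idx[, "col"]]

# Mean pairwise Euclidean distance within each set of individuals
d <- as.matrix(dist(geno_imp))
within_na_group <- mean(d[1:5, 1:5][upper.tri(diag(5))])
among_others    <- mean(d[6:30, 6:30][upper.tri(diag(25))])
```
</pre>
Although the genotypes were drawn without any structure, the five individuals sharing the same NAs end up closer to each other (within_na_group) than the other individuals are to one another (among_others).<br>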

<br>

It is safer to use less data, of better quality. In this case, you may want to remove SNPs with many NAs.

<br>
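Removing SNPs with many NAs could look like the sketch below. The data are simulated, the 20% threshold is an arbitrary value chosen for the example, and the mean replacement of the remaining NAs is an assumption (dudi.pca does not accept NAs, so something has to be done with them):<br>
<pre>
```r
# Simulated 0/1/2 matrix: 40 individuals x 100 SNPs, with 20% NAs overall
set.seed(7)
geno <- matrix(rbinom(40 * 100, size = 2, prob = 0.3), nrow = 40)
geno[sample(length(geno), 800)] <- NA

# Keep only SNPs whose proportion of NAs is at most max_na
max_na <- 0.2                               # hypothetical threshold
na_per_snp <- colMeans(is.na(geno))
geno_kept <- geno[, na_per_snp <= max_na, drop = FALSE]

# Replace the remaining NAs by the mean of their SNP
for (j in seq_len(ncol(geno_kept))) {
  miss <- is.na(geno_kept[, j])
  geno_kept[miss, j] <- mean(geno_kept[, j], na.rm = TRUE)
}

# Basic PCA with ade4 (loaded with adegenet), run only if it is installed
if (requireNamespace("ade4", quietly = TRUE)) {
  pca1 <- ade4::dudi.pca(as.data.frame(geno_kept), center = TRUE,
                         scale = FALSE, scannf = FALSE, nf = 2)
}
```
</pre>
With real data, geno would be the matrix read from your file, with individuals in rows and SNPs in columns; it is worth trying a few values of max_na to see how stable the picture is.<br>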

<br>

Cheers<br>

Thibaut<br>

<br>

<br>

</font></div>

<div style="font-family: Times New Roman; color: #000000; font-size: 16px">

<hr tabindex="-1">

<div style="direction: ltr;" id="divRpF873621"><font size="2" color="#000000" face="Tahoma"><b>From:</b> adegenet-forum-bounces@lists.r-forge.r-project.org [adegenet-forum-bounces@lists.r-forge.r-project.org] on behalf of zuzmus [zuzmus@gmail.com]<br>

<b>Sent:</b> 09 October 2014 10:55<br>

<b>To:</b> adegenet-forum@lists.r-forge.r-project.org<br>

<b>Subject:</b> [adegenet-forum] PCA sensitive to order of samples?<br>

</font><br>

</div>

<div></div>

<div>

<div dir="ltr">

<div dir="auto">

<div>Dear colleagues,</div>

<div><br>

</div>

<div>I would like to perform a PCA with the adegenet package, and I managed to go through the procedure to the end. The problem is that the results don't make sense, and I see an obvious bias related to the order of the samples in the input matrix.

<br>

<br>

The matrix has 140 samples from 11 putative species and approximately 2800 SNPs obtained with the RAD-seq method (only biallelic SNPs included; coded 0 - more frequent allele, 1 - heterozygote, 2 - rarer allele, NA - missing data).

<br>

<br>

</div>

<div>I used the following code:<br>

<br>

> data <- read.table("/Users/zuzana/Matrix_for_adegenet_cutSNPsTo2484_NoHybrids.txt")<br>

> x <- new("genlight", data)<br>

> pca1 <- glPca(x)<br>

> scatter(pca1, posi="bottomleft")<br>

</div>

<div><br>

The results always show the first 5-7 individuals strongly separated along PC1 and PC2, while the rest form one cluster. When I repeated the same analysis after removing the first few individuals from the matrix, the pattern stayed the same: the new first individuals became separated.<br>

<br>

<img alt="Embedded image 1" src="cid:ii_148f43790a1a1263" width="485" height="530">

<br>

</div>

<div><br>

</div>

<div>I also tried playing with most of the options of the glPca command, following the manual and the help in R, but always got similar results...<br>

<br>

</div>

<div>

<div>Another issue is that I have quite a lot of missing data in my matrix (10 - 35% per SNP, and approximately 10 - 50% per individual), but this was the trade-off of the experimental design ("sequence as much as possible, as cheaply as possible..."). However, the first individuals in the list are quite well sequenced, so they are not the worst in terms of missing data...<br>

</div>

<br>

I wonder whether I missed some basics, did something wrong, or whether there really is a bias due to the order of the samples in the matrix. I would be very happy if somebody could help me find out how to solve this issue.<br>

<br>

</div>

<div>Thank you very much for any help and suggestions! :-)<br>

<br>

</div>

<div>With regards,<br>

<br>

Zuzana<br>

</div>

<br>

<div>---

<div>Zuzana Musilova, PhD.</div>

<div>Zoological Institute</div>

<div>University of Basel</div>

<div>Vesalgasse 1 | 4051 Basel</div>

<div>Switzerland | Europe</div>

<div><span style="font-size:13pt">)><(((@>....<@)))><(</span></div>

</div>

</div>

</div>

</div>

</div>

</div>

</body>

</html>