From mirainoshojo at gmail.com  Wed May  1 00:16:12 2013
From: mirainoshojo at gmail.com (Valeria Montano)
Date: Wed, 1 May 2013 00:16:12 +0200
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
Message-ID:

Hi Nate,

I think Thibaut's answer is already more than appropriate and actually points out the main question among your questions. As I understand it, your population is not easy to deal with, since its high genetic homogeneity does not leave much room for imagination (a bit frustrating, I believe). Focusing on outliers can be an option, but it really depends on your scientific aim. If I were you, I would try a statistic estimating individual genetic distances (for instance the mean number of pairwise differences, using dist.gene in the ape package), calculate the mean distance of every individual from all the others, and then set a threshold to define 'outliers'. Does that make sense? A wee bit arbitrary maybe... Moreover, in this case you would have 'outliers' compared to the general population, and I am not sure it would help...

On the other hand, to understand whether outliers are immigrants from distant populations, you could build a network or use any phylogenetic reconstruction and see whether outliers appear as long but derived branches within their geographic neighbours or whether they are more basal. This is the only tool that comes to my mind.

Anyway, good luck with it; flat populations are upsetting.

While I'm at it, happy Labor Day everybody! (or happy transition from Spring to Summer, just in case you follow the Celtic tradition)

Valeria

On 30 April 2013 12:14, Jombart, Thibaut wrote:

> Dear Nate,
>
> the problem here is that it is not clear what is meant by 'outliers'.
> If we're talking about a few migrants from another population, then they should fall in a small cluster of their own (e.g. using find.clusters). If the definition is spatial, then 'outliers' may be individuals that are genetically distinct from their neighbours (without having to be migrants from another population). Or, 'outliers' can be individuals with rare/original alleles (without having to be any of the above). Or 'outliers' can be whatever does not fall within the inertia ellipse, and in this case you will always have 'outliers' with the default parameters of s.class.
>
> All of these definitions of 'outliers' would require different techniques to pin them down. I would really avoid anything based on the distance from the centroid. This implies that the cloud of points of the population is well represented in only 2D and, more importantly, is spherical, which is very unlikely. Detection based on inertia ellipses (not intertia - inertia is the squared length of a vector, which in PCA is the variance of the corresponding scores) is bound to fail too. There the assumption is that the cloud of points of the population is bivariate normal, which again is unlikely. But if it is the case, the default inertia ellipse in s.class contains 2/3 of the points. It would be far-fetched to call the remaining third 'outliers'. One can change this parameter, but again, that means arbitrarily deciding on a fixed number of outliers.
>
> But again, the problem here as I understand it is not technical (for now) - what is meant by 'outliers' needs to be clarified first.
>
> All the best
>
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Nathan Truelove [nathan.truelove at manchester.ac.uk]
> Sent: 23 April 2013 13:46
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
>
> Dear Thibaut and Adegenet Users,
>
> I would like to begin by thanking Thibaut and everyone else who created Adegenet, it has to be the most useful data analysis tool that I have used for my PhD research.
>
> I am a PhD student working on the population genetics of Caribbean spiny lobster using 16 microsatellite markers. The species has a huge potential for migration since it can spend up to a year floating/swimming in ocean currents before settling in shallow coastal habitat. Adults can also migrate 10s to 100s of km. It's no big surprise that I am finding very little differentiation in PCA, PCoA, and DAPC analyses. The trend that comes out in all these analyses is that ~80% of individuals from all sampling sites fall within the interia ellipse (s.class) or the contour polygon (s.chull). Several of the individuals outside the interia ellipse (or polygons) are located quite far away from the "core" of individuals within the ellipse. These outlier individuals are not associated with any particular site; however, on the spatial level, there appear to be more outliers in southern sites than in northern sites. I've been trying a variety of techniques to try and figure out the ecological importance of these outlier individuals. For example, a recent paper by Elphie et al. entitled "Detecting immigrants in a highly genetically homogeneous spiny lobster population (Palinurus elephas) in the northwest Mediterranean Sea" explores a similar issue in a different species of lobster.
> In this paper the authors use non-metric multidimensional scaling to separate out the genetic distances of their individuals in multivariate space. They then classified all individuals within a 50% radius of the barycentre as the "reference population" and all individuals outside the 50% radius as an "assignment population". They then used Geneclass2 to run assignment tests and any individuals that had a p-value < 0.05 are considered "genetically different". The authors argue that the most likely explanation for the genetic differences is that the genetically unique individuals detected in Geneclass are migrants from populations that have genetically diverged. I imagine there are several other ecological or selective processes that could also lead to genetically unique individuals, so calling them migrants is up for debate.
>
> For my data I ran a similar analysis in Adegenet using the functions s.class and s.chull along with dudi.pca to select the reference and assignment populations for Geneclass2. I compared these results to a similar analysis using non-metric multidimensional scaling in the Vegan package. The Adegenet PCA analyses contained about twice as many individuals in the reference population as the nMDS technique, yet the overall trend of Geneclass finding more unique individuals in the south than the north was consistent among all techniques. Also, most of the distant outliers in the PCA analysis in Adegenet were also significantly different in the Geneclass analysis.
>
> It would be excellent to get your opinions on this technique and discuss potential options for improving it:
>
> 1) Would it be possible to get additional information using Adegenet on how different the outliers in PCA are from the "core" of individuals inside the inertia ellipse? It would be nice to run the entire analysis in Adegenet and not have to use Geneclass2 at all.
>
> 2) Is there a simple way to identify each individual within an inertia ellipse? I have been using the function identify to select the individuals that are located within the ellipse, yet it is rather clunky since you have to click on every point.
>
> 3) Any additional advice concerning how to detect genetic outliers in homogeneous populations using Adegenet would be greatly appreciated.
>
> Thank you very much for your time.
>
> Best Wishes,
>
> Nate
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From mirainoshojo at gmail.com  Wed May  1 00:37:37 2013
From: mirainoshojo at gmail.com (Valeria Montano)
Date: Wed, 1 May 2013 00:37:37 +0200
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
Message-ID:

OK, I just realised you only have microsatellites; a phylogeny wouldn't work much :P
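
The distance-based screen Valeria suggested in her previous message (mean pairwise distance of every individual from all the others, then a threshold) could be sketched roughly as below. This is only an illustration under stated assumptions: the genotype table is a made-up toy dataset, the helper `flag_outliers` and its multiplier `k` are hypothetical names that do not come from the thread or from ape, and the threshold stays as arbitrary as she says it is.

```r
library(ape)  # provides dist.gene()

# Toy genotype table (hypothetical data): 60 individuals x 16 loci.
# Individuals 1-57 are near-identical; individuals 58-60 carry other alleles.
n.ind <- 60
n.loc <- 16
geno <- matrix("A/A", nrow = n.ind, ncol = n.loc,
               dimnames = list(paste0("ind", seq_len(n.ind)),
                               paste0("loc", seq_len(n.loc))))
for (i in 1:57) geno[i, ((i - 1) %% n.loc) + 1] <- "A/B"  # one variant locus each
geno[58:60, ] <- "C/C"                                    # planted 'outliers'

# Number of loci at which each pair of individuals differs.
D <- dist.gene(geno, method = "pairwise")

# Mean distance of every individual from all the others.
M <- as.matrix(D)
diag(M) <- NA
mean.dist <- rowMeans(M, na.rm = TRUE)
names(mean.dist) <- rownames(geno)

# Hypothetical helper: flag individuals whose mean distance exceeds
# k times the average of all mean distances (k is an arbitrary choice).
flag_outliers <- function(mean.dist, k = 2) {
  names(mean.dist)[mean.dist > k * mean(mean.dist)]
}

outliers <- flag_outliers(mean.dist, k = 2)
print(outliers)
```

With real microsatellite data the genotype table would come from the user's own file, and missing genotypes would need handling (for example through the `pairwise.deletion` argument of `dist.gene`). The flagged set depends entirely on `k`, so it is worth checking how it changes over a range of values.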

From t.jombart at imperial.ac.uk  Wed May  1 09:45:02 2013
From: t.jombart at imperial.ac.uk (Thibaut Jombart)
Date: Wed, 01 May 2013 08:45:02 +0100
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
Message-ID: <80470rfqqj65t8geoa0sdf8l.1367394302297@email.android.com>

True, but reconstructing a tree is still possible. One can use nj on the squared Euclidean distances on allelic profiles (@tab) and still assess outliers that way. May be worth a try.

Cheers

Thibaut

Valeria Montano wrote:

> ok, just realised you have only microsat, phylogeny wouldn't work much :P
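
Thibaut's suggestion (neighbour-joining on squared Euclidean distances between allelic profiles) might look something like the sketch below. It is an illustration rather than his code: it uses the `nancycats` example data shipped with adegenet as a stand-in for the lobster data, replaces missing values by mean allele frequencies (an assumption, since the thread does not say how missing data should be treated), and ranks terminal branch lengths as one possible way to look for long branches.

```r
library(ape)       # nj()
library(adegenet)  # genind class, tab(), example data

# Example data shipped with adegenet; substitute your own genind object.
data(nancycats)
x <- nancycats

# Table of allelic profiles, with missing values replaced by the mean
# allele frequency. In older adegenet versions the raw table is x@tab.
X <- tab(x, NA.method = "mean")

# Squared Euclidean distances between individual allelic profiles.
D <- dist(X)^2

# Neighbour-joining tree built on those distances.
tre <- nj(D)

# Length of the terminal branch leading to each individual; unusually
# long terminal branches are candidate 'outliers' worth a closer look.
tip.len <- tre$edge.length[match(seq_along(tre$tip.label), tre$edge[, 2])]
names(tip.len) <- tre$tip.label
print(head(sort(tip.len, decreasing = TRUE)))

# plot(tre, type = "unrooted", show.tip.label = FALSE)  # visual check
```

Neighbour-joining can return slightly negative branch lengths, so the ranking is better treated as a screening aid than as a test.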

From nathan.truelove at manchester.ac.uk  Thu May  2 18:28:44 2013
From: nathan.truelove at manchester.ac.uk (Nathan Truelove)
Date: Thu, 2 May 2013 16:28:44 +0000
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
Message-ID: <22F72AA6-1811-4814-A955-C73212FEE32A@postgrad.manchester.ac.uk>

Dear Thibaut,

Thank you for your email and your advice. I agree I should have been much clearer about what is meant by 'outliers'.

The main research question of my PhD is to test the hypothesis that large-scale ocean currents shape spatial patterns of genetic variation in Caribbean spiny lobster species. We specifically collected genetic samples from regions with contrasting ocean currents, in particular advective regions (fast moving currents: Gulf Stream Current, Yucatan Current, and Caribbean Current) and retentive regions (slow moving circular currents that retain marine larvae). Oceanographic and biological modeling studies have predicted that retentive regions have high levels of self-recruitment (lobster larvae returning to their natal spawning site after spending 6 months in the open ocean), whilst advective regions have low levels of self-recruitment. Therefore, for comparing the genetics data from my PhD to previous modeling studies, the definition of 'outlier' is spatial. I would specifically like to test the hypothesis that in retentive regions individuals are more genetically similar to their neighbors, whilst in advective regions they are more genetically different.

I recently went through all the steps in your sPCA vignette to look for spatial patterns of global or local structure. None of the tests for spatial structure came out significant (mantel.randtest, global.rtest, local.rtest). I continued along with the sPCA vignette and tried using both the Delaunay triangulation and the neighborhood by distance connection networks. However, I'm assuming that I shouldn't be very confident of any sPCA results since none of the initial statistical tests indicated the presence of spatial structure. Using the Delaunay triangulation network, the s.value results indicated global structure in one large advective region and local structure in the rest of the locations. When I used the neighborhood by distance network, I allowed the maximum distance between neighbors to be high enough that all sites could be connected to each other. This was probably too connected, whilst the Delaunay probably wasn't connected enough. When I used s.value for this analysis, all sites except for Bermuda (the most distant) displayed global structure.

It would be great to get your opinion on using sPCA for Caribbean spiny lobster. Does the lack of spatial structure according to the mantel.randtest indicate that sPCA shouldn't be used? If you think sPCA should be pursued, I should be able to get access to oceanographic modeling data that could be used to create a potentially more realistic connectivity network than either the Delaunay or neighborhood networks.

Also on the topic of 'outliers', perhaps it would be more appropriate to focus on individuals with rare/original alleles, since the spatial signal appears to be relatively weak.

Thanks again for all your time and advice. It's been really helpful.

Best Wishes,

Nate
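
For readers who want to reproduce the kind of checks Nate describes (Mantel, global and local tests under two connection networks), a minimal sketch follows. It is not his script: it runs on the `spcaIllus` example data from adegenet, the upper distance `d2` for the neighbourhood-by-distance graph is an arbitrary illustrative choice, and the number of permutations is kept small for speed.

```r
library(adegenet)  # chooseCN(), global.rtest(), local.rtest(); attaches ade4

# Example data shipped with adegenet; replace with your own genind object,
# with spatial coordinates stored in other(obj)$xy.
data(spcaIllus)
obj <- spcaIllus$dat2A
xy  <- other(obj)$xy
X   <- tab(obj, NA.method = "mean")  # allele table, NAs replaced by means

# Two candidate connection networks.
cn.del  <- chooseCN(xy, type = 1, result.type = "listw",
                    ask = FALSE, plot.nb = FALSE)        # Delaunay triangulation
cn.dist <- chooseCN(xy, type = 5, d1 = 0, d2 = median(dist(xy)),
                    result.type = "listw",
                    ask = FALSE, plot.nb = FALSE)        # neighbours within d2

set.seed(1)  # permutation tests are stochastic

# Mantel test between genetic and geographic distances.
m.test <- ade4::mantel.randtest(dist(X), dist(xy), nrepet = 99)

# Global and local structure tests under each network.
g.del  <- global.rtest(X, cn.del,  nperm = 99)
l.del  <- local.rtest(X,  cn.del,  nperm = 99)
g.dist <- global.rtest(X, cn.dist, nperm = 99)
l.dist <- local.rtest(X,  cn.dist, nperm = 99)

pvals <- c(mantel      = m.test$pvalue,
           global.del  = g.del$pvalue,  local.del  = l.del$pvalue,
           global.dist = g.dist$pvalue, local.dist = l.dist$pvalue)
print(pvals)
```

Running the same tests under several networks, as here, makes it easier to see whether a result depends on how the neighbours were defined.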
> The Adegenet PCA analyses contained about twice as many individuals in the reference population as the nMDS technique, yet the overall trend of Geneclass finding more unique individuals in the south than the north was consistent among all techniques. Also, most of the distant outliers in the PCA analysis in Adegenet were also significantly different in the Geneclass analysis.
>
> It would be excellent to get your opinions on this technique and discuss potential options for improving it:
>
> 1) Would it be possible to get additional information using Adegenet on how different the outliers in PCA are from the "core" of individuals inside the inertia ellipse? It would be nice to run the entire analysis in Adegenet and not have to use Geneclass2 at all.
>
> 2) Is there a simple way to identify each individual within an inertia ellipse? I have been using the function identify to select the individuals that are located within the ellipse, yet it is rather clunky since you have to click on every point.
>
> 3) Any additional advice concerning how to detect genetic outliers in homogeneous populations using Adegenet would be greatly appreciated.
>
> Thank you very much for your time.
>
> Best Wishes,
>
> Nate

From nathan.truelove at manchester.ac.uk Thu May 2 22:40:48 2013
From: nathan.truelove at manchester.ac.uk (Nathan Truelove)
Date: Thu, 2 May 2013 20:40:48 +0000
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
Message-ID:

Dear Valeria,

Thank you for your advice and encouragement. I definitely need it working with flat populations! The overall aim of my research is to see if ocean currents are shaping spatial patterns of genetic variation in spiny lobster species.
For example, we would like to investigate if individuals are more genetically different from their neighbors in advective oceanographic regions and more genetically similar to their neighbors in retentive regions. I was also interested in trying to figure out if any individuals that are genetically different from their neighbors happen to be immigrants from a distant population. After reading Thibaut's response to my first email it looks like I was heading down the wrong path by trying to force my data into 2 dimensions and basing my identification of 'outliers' on distance from the centroid. Some new techniques are definitely needed to search for the presence of any genetically unique individuals within my data.

I really like your ideas of creating a tree using nj in ape and also using dist.gene in ape to calculate the mean number of pairwise distances of every individual from all others. I took your advice and ran your analyses. The tree in ape had 3 nodes. A long branch in the second node contained 22 individuals and stood out from all other branches. I then calculated the mean of the distances of every individual from all others and it came out to be 41. I sorted out all individuals that were 'arbitrarily' 2X above this threshold and almost all of these individuals also belonged to the branch that stood out in the second node of the tree.

It would be great to get your opinion on these results. Perhaps it would be best if I sent you an image of the tree since it's a little tricky to describe it properly. Just let me know what you prefer. Your advice has been really helpful.

Best Wishes,

Nate
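
The distance-based screening described in this message can be sketched in a few lines of R. What follows is only a rough illustration under stated assumptions, not the analysis that was actually run: the genotype matrix is simulated, the helper pair_diff is hypothetical, and the distance (number of loci at which two individuals differ) is meant to be similar in spirit to the "pairwise" method of dist.gene in ape. The 2X cut-off mirrors the arbitrary threshold mentioned above.

```r
# Hypothetical toy data: 30 individuals x 16 loci in a very homogeneous population.
set.seed(1)
n_ind <- 30
n_loc <- 16
geno <- matrix(
  sample(c("A/A", "A/B", "B/B"), n_ind * n_loc, replace = TRUE,
         prob = c(0.90, 0.08, 0.02)),
  nrow = n_ind,
  dimnames = list(paste0("ind", seq_len(n_ind)), paste0("loc", seq_len(n_loc)))
)
# Assumption for illustration only: two individuals carry genotypes seen nowhere else.
geno[1:2, ] <- "C/C"

# Hypothetical helper: distance = number of loci at which two individuals differ.
pair_diff <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- sum(x[i, ] != x[j, ])
    }
  }
  d
}
d <- pair_diff(geno)

# Mean distance of each individual to all the others (the zero diagonal is excluded).
mean_dist <- rowSums(d) / (nrow(d) - 1)

# Arbitrary cut-off: twice the overall mean, as in the message above.
threshold <- 2 * mean(mean_dist)
flagged <- names(mean_dist)[mean_dist > threshold]
print(flagged)

# Optional: a neighbour-joining tree on the same distances, if ape is installed.
if (requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::nj(as.dist(d))  # inspect with plot(tree)
}
```

Whether the flagged individuals also sit on a long branch of the tree is then a matter of visual inspection; the multiplier in the threshold remains a judgment call and should be reported with the results.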

From mirainoshojo at gmail.com Tue May 7 10:18:40 2013
From: mirainoshojo at gmail.com (Valeria Montano)
Date: Tue, 7 May 2013 10:18:40 +0200
Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA657057A75AF1@icexch-m1.ic.ac.uk>
Message-ID:

Hi Nate,

I don't know how much I can help you, but if you want you can send me the plot off-list.

All the best

Valeria

From t.jombart at imperial.ac.uk Mon May 20 17:47:28 2013
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 20 May 2013 15:47:28 +0000
Subject: [adegenet-forum] new release out
Message-ID: <2CB2DA8E426F3541AB1907F98ABA657063865749@icexch-m1.ic.ac.uk>

Dear all,

a new version of adegenet has been released, and is now available on CRAN. For more info, check out the news section on the website:
http://adegenet.r-forge.r-project.org/

In particular, the new cross-validation procedure to select the number of PCA components will be useful to DAPC users.

Cheers

Thibaut

From thomassm at tcd.ie Fri May 24 15:31:01 2013
From: thomassm at tcd.ie (Muriel Thomasset)
Date: Fri, 24 May 2013 15:31:01 +0200
Subject: [adegenet-forum] Stepwise forward discriminant analysis
Message-ID:

Dear all,

I'm using over 300 SNPs to discriminate 5 different populations. I'm doing DAPC and looking at allele contributions, which is very useful. I would like to reduce the number of markers to use and so find the combination that would show the best discrimination. I was thinking of doing a stepwise forward discriminant analysis. I was wondering if anyone has experience doing so with molecular markers. For example, which package or functions should I use?

Best regards

Muriel

--
Thomasset Muriel
Trinity College
Dublin 2, Ireland
+353858269170

From t.jombart at imperial.ac.uk Sun May 26 00:03:28 2013
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sat, 25 May 2013 22:03:28 +0000
Subject: [adegenet-forum] Stepwise forward discriminant analysis
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA657063867534@icexch-m1.ic.ac.uk>

Hello,

I would recommend cross-validation, now available in the latest release of adegenet. See function xvalDapc.

Best

Thibaut

From jrdupuis at ualberta.ca Fri May 31 21:06:56 2013
From: jrdupuis at ualberta.ca (Julian Dupuis)
Date: Fri, 31 May 2013 13:06:56 -0600
Subject: [adegenet-forum] xvalDapc error message
Message-ID:

Hello,

I am trying to use the new xvalDapc function to determine the ideal number of PCs to retain in my DAPC analysis, but am having trouble getting it to work.
Here's the code I'm inputting:

xval <- xvalDapc(JRD1NoNa@tab, pop(JRD1), n.pca.max=150, n.da=NULL, n.pca=NULL, center=TRUE, scale=FALSE, n.rep=10)

And this is the error message I receive:

Error in ldaX$scaling[, 1:n.da, drop = FALSE] : subscript out of bounds

I've searched around for similar problems, but haven't found anything relating specifically to the lda function in MASS. I'm wondering if it might just be a problem with MASS being out of date with the new version of R/adegenet? Any help would be appreciated, and please let me know if I could include anything else to help identify the problem (my R expertise is pretty minimal).

Also, if anyone has any insight/opinions on alternate ways to determine the ideal number of PCs to retain in a DAPC (e.g. the optim.a.score function), I would be interested to hear them.

Thanks in advance,

Julian Dupuis

--
Julian Rowe Dupuis
Ph.D. Candidate
Dept of Biological Sciences
CW405, Biol. Sci. Centre
University of Alberta
Edmonton, Alberta, CAN T6G 2E9
Office: Earth Sciences 1-52A

From t.jombart at imperial.ac.uk Fri May 31 21:15:09 2013
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 31 May 2013 19:15:09 +0000
Subject: [adegenet-forum] xvalDapc error message
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA65706386CA1C@icexch-m1.ic.ac.uk>

Hi there,

is this working?

###
library(adegenet)
data(sim2pop)
xval <- xvalDapc(sim2pop@tab, pop(sim2pop), n.pca.max=100, n.rep=3)
xval
boxplot(xval$success~xval$n.pca, xlab="Number of PCA components", ylab="Classification success", main="DAPC - cross-validation")
###

If yes, please send me off list some code and dataset to reproduce the error - might be a bug.

Cheers

Thibaut
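
For readers wondering what cross-validating the number of retained PCs involves, the general idea behind the discussion above can be sketched with prcomp and MASS::lda on simulated data. This is a rough illustration under stated assumptions and not adegenet's implementation of xvalDapc: the data, the helper xval_once, the 90% training fraction and the grid of PC numbers are all hypothetical choices made for the example.

```r
library(MASS)  # lda(); a recommended package shipped with standard R installations

set.seed(42)
# Hypothetical data: 2 groups of 50 individuals and 40 numeric variables,
# with group B shifted on the first 5 variables.
n <- 50
p <- 40
x <- matrix(rnorm(2 * n * p), nrow = 2 * n)
grp <- factor(rep(c("A", "B"), each = n))
x[grp == "B", 1:5] <- x[grp == "B", 1:5] + 1.5

# Hypothetical helper: one cross-validation replicate for a given number of PCs.
xval_once <- function(x, grp, n_pca, train_frac = 0.9) {
  # Stratified sampling, so that every group is represented in the training set.
  train <- unlist(lapply(split(seq_along(grp), grp), function(i) {
    sample(i, size = round(train_frac * length(i)))
  }))
  # PCA is fitted on the training individuals only.
  pca <- prcomp(x[train, , drop = FALSE], center = TRUE, scale. = FALSE)
  train_scores <- pca$x[, seq_len(n_pca), drop = FALSE]
  # Held-out individuals are projected onto the training axes.
  test_scores <- predict(pca, x[-train, , drop = FALSE])[, seq_len(n_pca), drop = FALSE]
  fit <- lda(train_scores, grouping = grp[train])
  pred <- predict(fit, test_scores)$class
  mean(pred == grp[-train])  # proportion of held-out individuals correctly assigned
}

n_pca_grid <- c(2, 5, 10, 20)
n_rep <- 10
success <- sapply(n_pca_grid, function(k) {
  mean(replicate(n_rep, xval_once(x, grp, k)))
})
names(success) <- n_pca_grid
print(round(success, 2))
```

The number of PCs giving the highest mean assignment success on held-out individuals would be the candidate to retain; with few replicates the estimates are noisy, so increasing n_rep is advisable before drawing conclusions.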