From clarklc at appstate.edu Thu Nov 2 20:10:03 2017
From: clarklc at appstate.edu (Logan Clark)
Date: Thu, 2 Nov 2017 15:10:03 -0400
Subject: [adegenet-forum] Problems with genind object
Message-ID:

Hi,

I am working on a project involving microsatellite data in a relatively standard diploid species. I have 141 samples across 17 loci, with alleles formatted as 168/184 for a heterozygote and 168/168 for a homozygote. I have transferred my data from a .csv file to a data frame and then to a genind object using df2genind. I have been able to run other statistics on this genind object, such as HWE, but when I try to use the chao_bootstrap function in mmod with it I keep getting the error message:

"Error in tapply(y, pop(x), mean, na.rm = TRUE) :
  arguments must have same length"

I'm not sure what this means or how to fix it. I am trying to run all individuals as one population, but I am not sure whether that has any effect. Should I create another vector of 1's to code for the population and concatenate the two? Does anyone have any advice on how to solve this problem? Thank you in advance for any help with this issue.

--
Logan Clark
Graduate Student
Biology Department
Appalachian State University
P: 252-370-0034
E: clarklc at appstate.edu

From mstagliamonte at ufl.edu Fri Nov 3 01:27:33 2017
From: mstagliamonte at ufl.edu (Tagliamonte,Massimiliano S)
Date: Fri, 3 Nov 2017 00:27:33 +0000
Subject: [adegenet-forum] Problems with genind object
In-Reply-To:
References:
Message-ID: <1509668852383.90303@ufl.edu>

Hi Logan,

Out of curiosity, do your population or sample names include non-alphanumeric characters, such as dots? If they do, try removing those characters and see if it helps. Make sure the new names are still unique.

Good luck with your analyses,
Max

Massimiliano S. Tagliamonte
Graduate Student
University of Florida
College of Veterinary Medicine
Department of Infectious Diseases and Immunology

From roman.lustrik at biolitika.si Fri Nov 3 09:12:50 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Fri, 3 Nov 2017 09:12:50 +0100 (CET)
Subject: [adegenet-forum] Problems with genind object
In-Reply-To:
References:
Message-ID: <456187293.353689.1509696770256.JavaMail.zimbra@biolitika.si>

Can you provide a small example we can work on?

Cheers,
Roman

----
In god we trust, all others bring data.
> Request IJZ at https://kurc.biolitika.si

From thibautjombart at gmail.com Fri Nov 3 13:37:55 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 3 Nov 2017 12:37:55 +0000
Subject: [adegenet-forum] Problems with genind object
In-Reply-To: <456187293.353689.1509696770256.JavaMail.zimbra@biolitika.si>
References: <456187293.353689.1509696770256.JavaMail.zimbra@biolitika.si>
Message-ID:

Yes, we need more information.
We don't know what 'y' is here, or what command line you called in the first place.

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658
From thibautjombart at gmail.com Fri Nov 3 16:19:57 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 3 Nov 2017 15:19:57 +0000
Subject: [adegenet-forum] Problems with genind object
In-Reply-To:
References: <456187293.353689.1509696770256.JavaMail.zimbra@biolitika.si>
Message-ID:

I suspect your issue is that pop(x) is NULL. This is because you did not set it up when creating the genind object. You can add it by reading this data into R, and then using

pop(Hex) <- your_pop_factor

Best
Thibaut

On 3 November 2017 at 13:47, Logan Clark wrote:
> My code so far has been
> Hexas <- read.csv(Hexastylis1)
> Hexa <- data.frame(Hexas)
> Hex <- df2genind(Hexas, sep="/")
> HexBoot <- chao_bootstrap(Hex, n=1000)
> Error in tapply(y, pop(x), mean, na.rm = TRUE) :
>   arguments must have same length
>
> I also included a subset of the data. Thank you for your help, I really
> appreciate it. I am still new to R and learning the ropes.
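A minimal base-R sketch of why that error appears, assuming (as suggested in the reply) that pop(x) is NULL: tapply() needs one grouping value per element of y, and a NULL grouping vector has length zero. The values in y are made up for illustration. In adegenet, something like pop(Hex) <- rep("all", nInd(Hex)) should assign every individual to a single population (an untested assumption here), although whether a differentiation statistic is informative with only one population is a separate question.

```r
# Made-up per-individual values standing in for the 'y' that the
# error message shows being passed to tapply().
y <- c(0.2, 0.5, 0.1, 0.4)

# A NULL grouping vector has length 0 while y has length 4,
# so tapply() stops with the same message seen in the thread.
err <- tryCatch(
  tapply(y, NULL, mean, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)

# One factor level per individual (here a single population)
# gives a grouping vector of matching length, so tapply() succeeds.
one_pop <- factor(rep("all", length(y)))
res <- tapply(y, one_pop, mean, na.rm = TRUE)
print(res)
```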
From JudyC at si.edu Fri Nov 3 22:04:35 2017
From: JudyC at si.edu (Judy (Duffie), Caroline)
Date: Fri, 3 Nov 2017 21:04:35 +0000
Subject: [adegenet-forum] newbie help: formatting xy data stored in genind object for use in a sPCA
Message-ID: <66466C5A-F609-46E1-9DCD-3DE3387BA55C@si.edu>

Dear Adegenet forum,

I created a genind object using the "read.structure" command. As you can see in the code below, I specified columns [,3:8] as "other". These original data contain four traits and xy coordinates [,7:8]. After conversion to a genind object, @other is a list with one object, $X, which is a 158 x 6 matrix. Its columns hold the four traits and the x and y coordinates, each in a separate column.

How can I reformat $X into five objects: the four traits (each as a separate object), and the xy data as a single 158 x 2 matrix for use in a downstream sPCA?

Many thanks in advance for help.
Caroline

------------------------------------------------------

> mydata <- read.structure("~/Documents/Trochilus/second_chapter/Analysis/structure/input/GBS_all_pop_pheno.stru",
+                          n.ind=158,
+                          n.loc=6451,
+                          onerowperind=TRUE,
+                          col.lab=1,
+                          col.pop=2,
+                          col.others=3:8,
+                          row.marknames=0,
+                          ask=FALSE
+ )

Converting data from a STRUCTURE .stru file to a genind object...

> mydata@other
$X
  [,1]     [,2]    [,3] [,4]    [,5]          [,6]
1 "female" "black" "1"  "3.72"  "18.02911667" "-76.3886"
2 "female" "black" "1"  "3.55"  "18.02911667" "-76.3886"
3 "male"   "black" "1"  "3.97"  "18.02953333" "-76.38783333"
4 "male"   "black" "1"  "3.55"  "18.0337"     "-76.38993333"
5 "female" "black" "1"  "3.735" "18.0337"     "-76.38993333"
6 "male"   "black" "1"  "3.95"  "18.0295"     "-76.38831667"
7 "male"   "black" "1"  "3.99"  "18.02953333" "-76.38783333"

Caroline D. Judy
PhD Candidate (LSU)
Peter Buck Predoctoral Fellow (NMNH)
email: judyc at si.edu

From mneel at umd.edu Sat Nov 4 22:35:31 2017
From: mneel at umd.edu (Maile C Neel)
Date: Sat, 4 Nov 2017 17:35:31 -0400
Subject: [adegenet-forum] Handling Missing Spatial Data in genind2genpop
Message-ID:

Apologies if this seemingly simple question has been answered before. I have found nothing in extensive searches of the manuals and the adegenet archives.

In my data set of 1,110+ individuals from 25 sites, a few individuals at 2 sites lack GPS coordinates. For pairwise comparisons of individuals that require coordinates, I eliminate samples with missing location data.

However, when I collapse the data set to populations using genind2genpop, I would like to keep the genetic data from all samples and ignore the NA values in the genind@other$utms slot. If even one sample in a population is missing UTM coordinates, the population values are NA.

I want the population centroid returned to the genpop@other$utms slot to be the mean of the samples that DO have UTM coordinates.
The distances among populations are so large that any error introduced by a small number of missing locations will not affect my results.

I know how to specify a basic function for other.action (e.g., other.action=mean), but I can't see how to specify something like na.remove, na.omit, or complete.cases that would calculate the mean ignoring missing data. Another option would be to replace missing @other$utms values with population means based on complete cases from within the genind2genpop call, but that also does not appear to be possible.

Is there an easy way to keep all my genetic observations and get population centroids that ignore missing coordinates?

____________
Maile Neel
Professor; Director of the Norton-Brown Herbarium
University of Maryland
Department of Plant Science and Landscape Architecture &
Department of Entomology

From zkamvar at gmail.com Mon Nov 6 16:45:31 2017
From: zkamvar at gmail.com (Zhian Kamvar)
Date: Mon, 6 Nov 2017 09:45:31 -0600
Subject: [adegenet-forum] Handling Missing Spatial Data in genind2genpop
In-Reply-To:
References:
Message-ID:

Hello Dr. Neel,

You can use an anonymous function, i.e.:

other.action = function(x){ mean(x, na.rm = TRUE) }

Hope that helps!
Zhian

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln
ORCID: 0000-0003-1458-7108
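The effect of that anonymous function can be checked in base R before passing it to genind2genpop(). The sketch below uses made-up UTM coordinates for five individuals in two populations, one of them without a location, and applies the function per population and per coordinate column. It only mimics the per-population averaging step; it is not adegenet's own code, and the object names are illustrative.

```r
# Made-up UTM coordinates for five individuals in two populations;
# the third individual has no recorded location.
utms <- data.frame(
  easting  = c(500100, 500300, NA, 610000, 610200),
  northing = c(4300200, 4300400, NA, 4410000, 4410400)
)
pop <- factor(c("A", "A", "A", "B", "B"))

# Plain mean(): one missing value makes population A's centroid NA.
plain <- sapply(utms, function(col) tapply(col, pop, mean))

# The anonymous function from the reply: drop NAs before averaging.
na_mean <- function(x) { mean(x, na.rm = TRUE) }
centroids <- sapply(utms, function(col) tapply(col, pop, na_mean))
print(plain)
print(centroids)
```

Population A's centroid is then the mean of its two located individuals, while population B is unaffected.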
From zkamvar at gmail.com Mon Nov 6 17:11:03 2017
From: zkamvar at gmail.com (Zhian Kamvar)
Date: Mon, 6 Nov 2017 10:11:03 -0600
Subject: [adegenet-forum] newbie help: formatting xy data stored in genind object for use in a sPCA
In-Reply-To:
References:
Message-ID:

Hi Caroline,

To format your spatial data:

By default, spca will look in the "other" slot to find a data frame or matrix called "xy" that has numeric coordinates. You have a matrix called "X" in your other slot, which contains character data. To convert the last two columns to xy coordinates, there are three steps: 1) subset X to the last two columns (longitude, column 6, first, so that x is longitude and y is latitude), 2) save these as a matrix in the "other" slot called "xy", 3) convert xy to numeric:

# column 5 of X holds latitude and column 6 longitude
other(mydata)$xy <- other(mydata)$X[, c(6, 5)]
mode(other(mydata)$xy) <- "numeric"
colnames(other(mydata)$xy) <- c("x", "y")

To format your trait data:

If you will be using any of the categorical traits to separate your data, you can add them as strata:

strata(mydata) <- as.data.frame(other(mydata)$X[, 1:3])
# you can rename them with nameStrata()

Otherwise, you can add the traits one by one to your "other" slot like so:

other(mydata)$sex <- other(mydata)$X[, 1, drop = TRUE]
...

I hope that helps,
Zhian
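A base-R sketch of the conversion steps, using the first three rows of $X as printed earlier in the thread. It assumes, from the printed values (about 18 and -76, i.e. Jamaica), that column 5 holds latitude and column 6 longitude, so the columns are taken in the order 6, 5 to give x = longitude and y = latitude. The object names xy and trait4 are illustrative only.

```r
# First three rows of other(mydata)$X as printed in the thread:
# a character matrix with four trait columns and two coordinate columns.
X <- matrix(c(
  "female", "black", "1", "3.72", "18.02911667", "-76.3886",
  "female", "black", "1", "3.55", "18.02911667", "-76.3886",
  "male",   "black", "1", "3.97", "18.02953333", "-76.38783333"
), ncol = 6, byrow = TRUE)

# Take longitude (column 6) as x and latitude (column 5) as y.
xy <- X[, c(6, 5)]
# Convert the character matrix to numeric; mode<- keeps the dimensions.
mode(xy) <- "numeric"
colnames(xy) <- c("x", "y")

# Numeric trait columns need the same conversion (column 4 here).
trait4 <- as.numeric(X[, 4])
print(xy)
```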
From caroline.duffie at gmail.com Mon Nov 6 21:25:38 2017
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Mon, 6 Nov 2017 15:25:38 -0500
Subject: [adegenet-forum] help with sPCA basics: plotting a raster map with xy points overlaid
Message-ID:

Dear Adegenet forum,

As a first step to the sPCA, I am trying to plot the xy points on the topo map using the following code, similar to the "rupica" example on page 27 of the sPCA tutorial:

plot(data$other$topo.jam)
points(xy, col="red", pch=20)

Both the xy points and the map of my study region are part of the genind object I created; see details below.
---------------
/// GENIND OBJECT /////////

// 158 individuals; 6,451 loci; 12,902 alleles; size: 10.7 Mb

// Basic content
   @tab:  158 x 12902 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 12902 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type:  codom
   @call: read.structure(file = "~/Documents/Trochilus/second_chapter/Analysis/structure/input/GBS_all_pop_pheno.stru", n.ind = 158, n.loc = 6451, onerowperind = TRUE, col.lab = 1, col.pop = 2, col.others = 3:8, row.marknames = 0, ask = FALSE)

// Optional content
   @pop: population of each individual (group size range: 6-37)
   @strata: a data frame with 3 columns ( sex, phenotype, HI )
   @other: a list containing: X  xy  topo  topo.east  topo.jam
--------------------------------------------------------------------------------------
$topo.jam
class       : RasterLayer
band        : 1  (of  3  bands)
dimensions  : 1416, 2558, 3622128  (nrow, ncol, ncell)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : -78.27649, -76.14482, 17.48716, 18.66716  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : /Users/carolineduffie/Documents/Trochilus/second_chapter/Analysis/PCA/sPCA/jamaica_elevation.tif
names       : jamaica_elevation
values      : 0, 255  (min, max)

From caroline.duffie at gmail.com Mon Nov 6 21:37:15 2017
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Mon, 6 Nov 2017 15:37:15 -0500
Subject: [adegenet-forum] help with sPCA basics: plotting the raster map with xy points
Message-ID:

Dear Adegenet forum,

Sorry for the incomplete email I accidentally sent just a few minutes ago.
Here's the full email:

As a first step to the sPCA, I am trying to plot my xy points on the map of my study region, similar to the "rupica" example on page 27 of the sPCA tutorial, using the following code:

plot(data$other$topo.jam)
points(xy, col="red", pch=20)

I don't get an error message, but the points won't draw on the map. A PDF of the map is attached (without the points). Is this a format issue with the raster file? I've tried running the code with the xy data and map stored outside of the genind object, to no avail.

Thanks in advance for any assistance

From roman.lustrik at biolitika.si Tue Nov 7 09:21:42 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Tue, 7 Nov 2017 09:21:42 +0100 (CET)
Subject: [adegenet-forum] help with sPCA basics: plotting the raster map with xy points
In-Reply-To:
References:
Message-ID: <929137668.368346.1510042902893.JavaMail.zimbra@biolitika.si>

You do not see any added points because the two sets of data are on different scales. The loadings are in their own system and are not compatible with your map. See page 36: points are plotted with the function s.value, using coordinates from other(rupica)$xy and using the loadings of the PCA to determine the size of the squares.

Can you provide a small reproducible example?

Cheers,
Roman
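One thing worth ruling out, which was not raised in the thread: points that fall outside the plotted extent are typically clipped, so they can fail to appear without any error. The base-R sketch below compares a few coordinates from $X with the topo.jam extent printed earlier in the thread. inside_extent() is a hypothetical helper written for this check, not an adegenet or raster function, and the column interpretation (column 5 latitude, column 6 longitude) is an assumption based on the printed values.

```r
# Extent of other(data)$topo.jam as printed in the thread.
ext <- c(xmin = -78.27649, xmax = -76.14482, ymin = 17.48716, ymax = 18.66716)

# Hypothetical helper: TRUE where (x, y) lies inside the extent.
inside_extent <- function(x, y, e) {
  x >= e[["xmin"]] & x <= e[["xmax"]] & y >= e[["ymin"]] & y <= e[["ymax"]]
}

# Values from columns 5 and 6 of $X (first rows shown in the thread).
lat <- c(18.02911667, 18.02953333, 18.0337)
lon <- c(-76.3886, -76.38783333, -76.38993333)

# Column order (5, 6): x is about 18 and y about -76, far off the map.
swapped <- inside_extent(lat, lon, ext)
# Column order (6, 5): longitude as x, latitude as y.
correct <- inside_extent(lon, lat, ext)
print(swapped)  # FALSE FALSE FALSE
print(correct)  # TRUE TRUE TRUE
```

If the coordinates pass this check, the other explanations offered in the thread (an undefined xy object, or plotting values on a different scale) remain the ones to pursue.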
From thibautjombart at gmail.com Fri Nov 10 13:15:23 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 10 Nov 2017 12:15:23 +0000
Subject: [adegenet-forum] help with sPCA basics: plotting a raster map with xy points overlaid
In-Reply-To:
References:
Message-ID:

Hello

Err... I can't see a question in your message. For the record, you don't have to store your GIS layer in the @other slot, although it is possible.

From your code it doesn't seem that 'xy' is defined, but other(data)$xy is.

Best
Thibaut
From mmorales at williams.edu  Fri Nov 10 17:20:05 2017
From: mmorales at williams.edu (Manuel A. Morales)
Date: Fri, 10 Nov 2017 11:20:05 -0500
Subject: [adegenet-forum] Fstat vs pairwise.fst for 2 populations
Message-ID: <1510330805.2758.12.camel@williams.edu>

I may have a fundamental misunderstanding of what's happening, but it seems to me that the functions fstat() and pairwise.fst() should give the same value for the case of two populations, which they do not.

A reproducible example:

data(nancycats)
obj1 <- seppop(nancycats)$P01
obj2 <- seppop(nancycats)$P02
obj3 <- repool(obj1, obj2)
fstat(obj3)
pairwise.fst(obj3)

And output:

> fstat(obj3)
            pop       Ind
Total 0.1307741 0.2804306
pop   0.0000000 0.1721722
> pairwise.fst(obj3)
         1
2 0.080185

Any help would be very much appreciated.

Best,
Manuel

From JudyC at si.edu  Tue Nov  7 18:13:11 2017
From: JudyC at si.edu (Judy (Duffie), Caroline)
Date: Tue, 7 Nov 2017 17:13:11 +0000
Subject: [adegenet-forum] genind object too big for sPCA?
Message-ID: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>

Hi all,

I'm having trouble running an sPCA on a genind object (10.6 Mb) that contains about 160 individuals and 6500 SNPs. When I run the command 'mySpca <- spca(data, ask=FALSE, type=1, scannf=FALSE)', R crashes, i.e. I get the 'whirling ball of death' and the program becomes unresponsive.

I've seen some older messages on the forum that similarly report problems with larger genind objects, but the responses indicate that there shouldn't be a memory issue (http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2012-June/000513.html). I'm running on a MBP with 16 GB of memory.

Any tips or tricks for running an object of this size? Interestingly, I've been able to run a PCA and DAPC without issue.

#Convert structure file to a genind object.
> data <- read.structure("~/Documents/Trochilus/second_chapter/Analysis/structure/input/GBS_all_pop_pheno.stru",
+ n.ind=158,
+ n.loc=6451,
+ onerowperin=TRUE,
+ col.lab=1,
+ col.pop=2,
+ col.others=3:8,
+ row.marknames=0,
+ ask=FALSE,
+ )

 Converting data from a STRUCTURE .stru file to a genind object...

> #add xy data as a separate element in the list $other
> other(data)$xy <- other(data)$X[, 5:6]
> mode(other(data)$xy) <- "numeric"
> colnames(other(data)$xy) <- c("x", "y")
> #define strata
> strata(data) <- as.data.frame(other(data)$X[, 1:3])
> nameStrata(data) <-c("sex","phenotype", "HI")
>
> # add jitter
> data$other$xy <-jitter(data$other$xy, factor = 1, amount = NULL)
> mySpca <- spca(data, ask=FALSE, type=1, scannf=FALSE)

Caroline D. Judy
PhD Candidate (LSU)
Peter Buck Predoctoral Fellow (NMNH)
email: judyc at si.edu

From JudyC at si.edu  Wed Nov  8 15:50:06 2017
From: JudyC at si.edu (Judy (Duffie), Caroline)
Date: Wed, 8 Nov 2017 14:50:06 +0000
Subject: [adegenet-forum] genind object too big for sPCA?
In-Reply-To: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>
References: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>

Update - I tried running the same script on a computer with 64 GB of memory. Same issues.

Caroline D. Judy
PhD Candidate (LSU)
Peter Buck Predoctoral Fellow (NMNH)
email: judyc at si.edu

From JudyC at si.edu  Wed Nov  8 19:06:05 2017
From: JudyC at si.edu (Judy (Duffie), Caroline)
Date: Wed, 8 Nov 2017 18:06:05 +0000
Subject: [adegenet-forum] genind object too big for sPCA?
In-Reply-To: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>
References: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>

Update - I tried running the same script on a computer with 64 GB of memory. Same issues.

Caroline D. Judy
PhD Candidate (LSU)
Peter Buck Predoctoral Fellow (NMNH)
email: judyc at si.edu

From hvh22 at cam.ac.uk  Thu Nov 16 18:21:09 2017
From: hvh22 at cam.ac.uk (Harriet Hunt)
Date: Thu, 16 Nov 2017 17:21:09 +0000
Subject: [adegenet-forum] Reading in data from Stacks output files

Hi Thibaut et al,

I am trying to read in a SNP data set output by Julian Catchen's Stacks program for downstream multivariate analyses (PCAs, genetic distance measures, etc.).
I have tried converting both the STRUCTURE file format and the VCF format, but they don't seem to give the same genind results: there are 2136 alleles (1068 loci, diploid) in the genind converted from the STRUCTURE file, but 3592 alleles in the genind converted from the VCF file. Some of this is done using the snpStats package rather than adegenet, but maybe someone can answer the question anyway? I would like to know whether there is an error in my code that causes these conflicting results, or whether it is just the way the data are coded in the VCF.

My code is:

matrix98str <- read.structure("98percent.str", n.ind=371, n.loc=1068, onerowperind = FALSE, col.lab=1, col.pop=2, row.marknames=1, NA.char=0)

vcf <- readVcf("98percent.vcf")
library("snpStats")
matrix98vcf <- genotypeToSnpMatrix(vcf)
matrix98vcfSNPs <- df2genind(matrix98vcf$genotypes, ploidy=2, sep="/", ind.names=rownames(matrix98vcf$genotypes), loc.names=colnames(matrix98vcf$genotypes), NA.char=NA)

and then I am comparing the two genind objects, matrix98str and matrix98vcfSNPs.

Thanks for any help!
Harriet

--
Dr Harriet Hunt
Research Associate
McDonald Institute for Archaeological Research
University of Cambridge
Downing Street
Cambridge CB2 3ER
UK
Tel: +44 (0)1223 339330
e-mail: hvh22 at cam.ac.uk

From roman.lustrik at biolitika.si  Fri Nov 17 10:31:24 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Fri, 17 Nov 2017 10:31:24 +0100 (CET)
Subject: [adegenet-forum] Reading in data from Stacks output files
Message-ID: <332087209.409661.1510911084759.JavaMail.zimbra@biolitika.si>

Can you also share the files? It's hard to guess by code alone and I (we?) don't like hard.

Cheers,
Roman

----
In god we trust, all others bring data.
> Request IJZ at https://kurc.biolitika.si

From roman.lustrik at biolitika.si  Fri Nov 17 10:33:45 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Fri, 17 Nov 2017 10:33:45 +0100 (CET)
Subject: [adegenet-forum] genind object too big for sPCA?
References: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu>
Message-ID: <977500053.409673.1510911225675.JavaMail.zimbra@biolitika.si>

Can you check the memory usage? Is it consuming all the RAM? If not, then it's probably not a memory issue and the culprit is somewhere else. Have you tried running the analysis on a subset of the data?

Cheers,
Roman

----
In god we trust, all others bring data.
> Request IJZ at https://kurc.biolitika.si

From thibautjombart at gmail.com  Fri Nov 17 12:48:44 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 17 Nov 2017 11:48:44 +0000
Subject: [adegenet-forum] Fstat vs pairwise.fst for 2 populations
In-Reply-To: <1510330805.2758.12.camel@williams.edu>
References: <1510330805.2758.12.camel@williams.edu>

Hi Manuel,

I think this has been discussed already on the hierfstat issues; best to check and report this there. It may be a bug, or different estimators being used (incl. different group weightings), but I don't have time to check this now.
Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658

From thibautjombart at gmail.com  Fri Nov 17 14:56:46 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 17 Nov 2017 13:56:46 +0000
Subject: [adegenet-forum] genind object too big for sPCA?
In-Reply-To: <977500053.409673.1510911225675.JavaMail.zimbra@biolitika.si>
References: <02BC6CA7-D4B6-4BC9-901D-AEF27CAB7A3C@si.edu> <977500053.409673.1510911225675.JavaMail.zimbra@biolitika.si>

Hello,

this probably comes from the fact that you have to run the eigenanalysis on a large matrix, which I would guess is around 13,000 x 13,000 in this case (one row and one column per allele). Unlike regular PCA, sPCA cannot diagonalise in the smallest dimension (# individuals vs. # alleles).

One quick solution would be to reduce the number of alleles, e.g. by keeping only the alleles that are strong contributors (e.g. squared loadings > .01 or 0.5) to the first xxx axes of a PCA.
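The allele-filtering idea in the paragraph above might look roughly like the sketch below. It is an illustration under stated assumptions, not code from this thread: it uses adegenet's bundled nancycats data as a stand-in for the 158 x 12,902 object, ade4's dudi.pca() for the ordinary PCA, and the number of axes (n_axes) and the cut-off (threshold) are hypothetical values to tune.

```r
library(adegenet)  # genind class, tab(), nancycats example data
library(ade4)      # dudi.pca()

data(nancycats)  # stand-in for your own genind object

# Allele frequency table; missing values replaced by the mean frequency
X <- tab(nancycats, freq = TRUE, NA.method = "mean")

# Ordinary (non-spatial) centred PCA on the allele table
n_axes <- 10  # hypothetical: the "first xxx axes" to inspect
pca1 <- dudi.pca(X, center = TRUE, scale = FALSE, scannf = FALSE, nf = n_axes)

# Squared allele loadings (pca1$c1 holds the unit-norm principal axes)
sq_load <- as.matrix(pca1$c1)^2

# Keep an allele if its squared loading exceeds the cut-off on any retained axis
threshold <- 0.01  # hypothetical cut-off; the message mentions .01 or 0.5
keep <- apply(sq_load, 1, max) > threshold

# Subset the genind by allele columns
reduced <- nancycats[, keep]
c(alleles_before = ncol(X), alleles_after = ncol(tab(reduced)))
```

The reduced object could then be passed to spca() in place of the full one. The trade-off is that alleles below the cut-off are ignored entirely, so the threshold and number of axes are judgment calls.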
Otherwise, the same trick used in DAPC can be used in sPCA: run the analysis through a PCA first, then run the sPCA on the principal components.

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658

From rob.syme at gmail.com  Mon Nov 20 10:41:11 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Mon, 20 Nov 2017 09:41:11 +0000
Subject: [adegenet-forum] Question about seploc error: "Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent"

In trying to run seploc on a genind object (output from gl.read.silicodart from dartR v0.93), I get the error:

> is.genind(gl)
[1] TRUE
> seploc(gl)
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent

10. `colnames<-`(`*tmp*`, value = seq(ncol(tab)))
9. .local(.Object, ...)
8. initialize(value, ...)
7. initialize(value, ...)
6. new("genind", ...)
5. FUN(X[[i]], ...)
4. lapply(kX, genind, pop = x@pop, prevcall = prevcall, ploidy = x@ploidy, type = x@type)
3. .local(x, ...)
2. seploc(gl)
1. seploc(gl)

Is there any way we can track down what peculiarity of our input data is causing the error?

Thanks
Rob Syme
Curtin University

From jean-pierre.peros at inra.fr  Mon Nov 20 13:46:23 2017
From: jean-pierre.peros at inra.fr (Jean-Pierre Peros)
Date: Mon, 20 Nov 2017 12:46:23 +0000
Subject: [adegenet-forum] sPCA and cpDNA data
Message-ID: <21CE7A57-D2A9-4958-888B-E0B375306474@inra.fr>

Hello,

I am wondering whether it is appropriate to analyse chloroplast microsatellite data with sPCA, given the complete linkage between these markers. I have three data sets (three different species), each with 15 loci, 30 to 58 alleles and 80 to 200 individuals. These objects provided sPCA results that confirmed the results from nuclear data (SSR and SNP).

Thanks for your advice and best regards,
Jean-Pierre

*Please note my new email address: jean-pierre.peros at inra.fr*

Jean-Pierre Péros
Research Director, INRA, Centre Occitanie-Montpellier
UMR AGAP, Head of the DAAV team: « Diversité, Adaptation et Amélioration de la Vigne »
2, place Viala
34060 Montpellier cedex 1
Tel: 33-(0)4-99-61-20-26
Fax: 33-(0)4-99-61-20-64
http://www.montpellier.inra.fr

From sheenatalma at gmail.com  Tue Nov 21 10:49:36 2017
From: sheenatalma at gmail.com (sheena talma)
Date: Tue, 21 Nov 2017 11:49:36 +0200
Subject: [adegenet-forum] Removing Non-biallelic samples
Message-ID: <8F2E6BCC-85AA-4DFC-ACF0-1454F1B9A2C0@gmail.com>

Hi,

I am new to R and programming.

Does anyone know whether there is a way to remove non-biallelic SNPs using R?
Thanks
Sheena

From zkamvar at gmail.com  Tue Nov 21 16:29:23 2017
From: zkamvar at gmail.com (Zhian Kamvar)
Date: Tue, 21 Nov 2017 09:29:23 -0600
Subject: [adegenet-forum] Removing Non-biallelic samples

Hi Sheena,

The solution to your question is the following (change "dat" to whatever you named your genind):

loci_to_keep <- nAll(dat) < 3
trimmed_dat <- dat[loc = loci_to_keep]

The reason why this works: nAll() gives you the number of alleles per locus in your genind object. The vector loci_to_keep is a logical vector specifying which loci have fewer than three alleles. When used with the loc subsettor for the genind object, it will remove all the loci where the corresponding entry in loci_to_keep is FALSE.

Hope that helps,
Zhian

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln
ORCID: 0000-0003-1458-7108
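The two lines in the reply above can be checked on a toy object. The sketch below builds a hypothetical three-individual, three-locus genind with df2genind() (the genotypes are invented purely for illustration) and then applies the same nAll() filter.

```r
library(adegenet)

# Invented genotypes for three diploid individuals at three SNP loci,
# written as "allele/allele"
toy <- data.frame(
  locA = c("A/A", "A/T", "T/T"),  # 2 alleles
  locB = c("A/C", "C/G", "G/G"),  # 3 alleles: not biallelic
  locC = c("C/C", "C/C", "C/T"),  # 2 alleles
  stringsAsFactors = FALSE
)
dat <- df2genind(toy, sep = "/", ploidy = 2)

nAll(dat)  # number of alleles per locus: locA 2, locB 3, locC 2

# Same filter as in the reply: keep loci with fewer than 3 alleles
loci_to_keep <- nAll(dat) < 3
trimmed_dat <- dat[loc = loci_to_keep]
locNames(trimmed_dat)  # "locA" "locC"
```

Note that `nAll(dat) < 3` also keeps monomorphic loci (a single allele); if those should be dropped as well, `nAll(dat) == 2` keeps strictly biallelic loci only.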
From leplat.florian at gmail.com  Tue Nov 21 19:45:25 2017
From: leplat.florian at gmail.com (Florian Leplat)
Date: Tue, 21 Nov 2017 19:45:25 +0100
Subject: [adegenet-forum] Grouping crossvalidation DAPC

Dear adegenet users,

I have been using the adegenet package for genetic applications, and recently started using the DAPC function. The method is very informative; however, there are some details that I would like to understand better.

The main objective is to group a number of plant genotypes according to their genotypic background. Each genotype is a homozygous line, and I have no prior information for any genotype. The idea is therefore to assign a "group number" to each of my plants. This works fine, and the groups make good sense given the information I have for each plant (origin, parents...).

However, one of the limitations I face is the repeatability of the grouping. If I run the analysis several times, some plants will be assigned to a different group. Is there a cross-validation procedure at this step, in order to look at the percentage of plants always grouped together across runs? Furthermore, even when my groupings are quite consistent group-wise from one run to the next, the group numbers change. Is there a way to solve that (for instance, by giving a group number to the founder genotypes of our population)?

Then I have a second issue. After the first step, which defines my groups, I would like to plot hybrid genotypes, which should (could) be an admixture between two groups. I cannot use them to build my groups, as they add noise to the model. I wanted to use the procedure described in the tutorial using supplementary individuals, but I realized that it only works if the supplementary individuals already have grouping information, and I don't have this information. Indeed, I only want to see (mostly visually) where the hybrids are positioned in relation to the groups that I previously defined. Is there a way to use the script to do that?

Thanks in advance for your help.
Best regards,
*Flo*

From patriciacruzgguedes at gmail.com  Fri Nov 24 12:36:22 2017
From: patriciacruzgguedes at gmail.com (Patricia Guedes)
Date: Fri, 24 Nov 2017 11:36:22 +0000
Subject: [adegenet-forum] Post a question

Hi,

I'd like to post a question in the adegenet forum. It is not exactly related to coding, but I don't know where else to ask. If you know of a better place to ask this question, please redirect me.

I'm performing a DAPC in R using the adegenet package. I only have 84 individuals, and I'm using a genind object with about 5000 SNPs to see whether there are genetic differences between sample sites (4). I'm using the grp function to find how many clusters I should use. However, when I run the grp function, if I choose to keep 84 PCs, the best BIC indicates 2 clusters; if I choose to keep 28 PCs, the BIC indicates 4 clusters. Why does this happen? If I'm using more PCs and adding more information, shouldn't that give me more clusters?

Thanks!
Best wishes,
Patrícia Guedes
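The "grp function" in the question above is presumably the object returned by find.clusters(). One way to see how strongly the BIC-selected number of clusters depends on the number of retained PCs is to run find.clusters() non-interactively over several n.pca values and record the K with the lowest BIC. The sketch below does this on adegenet's bundled nancycats data as a stand-in for an 84-individual SNP dataset; the n.pca values, max.n.clust and the seed are arbitrary choices for illustration.

```r
library(adegenet)

data(nancycats)  # stand-in for your own genind object

set.seed(1)  # k-means uses random starts, so fix the seed for repeatability

n_pca_values <- c(10, 20, 40)  # arbitrary numbers of retained PCs to compare

chosen_k <- sapply(n_pca_values, function(n) {
  grp <- find.clusters(nancycats, n.pca = n, max.n.clust = 20,
                       stat = "BIC", choose.n = FALSE, criterion = "min")
  # grp$Kstat holds the BIC for K = 1..max.n.clust; grp$grp is the chosen partition
  length(levels(grp$grp))
})
names(chosen_k) <- paste0("n.pca=", n_pca_values)
chosen_k
```

Inspecting the whole BIC curve for each run (grp$Kstat) may be more informative than relying on its single minimum.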