From jphill01 at uoguelph.ca Fri Aug 17 15:16:47 2018 From: jphill01 at uoguelph.ca (Jarrett Phillips) Date: Fri, 17 Aug 2018 13:16:47 +0000 Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() Message-ID: My inquiry concerns seqTrack(), which employs haploGen() to simulate a haplotype genealogy through time. haploGen() generates DNA sequences under a Kimura-2-Parameter substitution model with equal base frequencies and differing transition/transversion rates. My question is: how can we be certain that the sequences really do conform to a K2P model? Since these are simulated data, it would be difficult to choose the most parsimonious model based on the BIC (using MEGA software for instance), as one is required to select the appropriate codon table (e.g., standard vs. mitochondrial), to ensure proper placement of STOP codons. Trying this approach for each of standard and mitochondrial codon tables seems to give conflicting results as to the best substitution model. Could someone shed some light on this? I would appreciate any insight one might be able to offer. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibautjombart at gmail.com Mon Aug 20 15:27:30 2018 From: thibautjombart at gmail.com (Thibaut Jombart) Date: Mon, 20 Aug 2018 14:27:30 +0100 Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() In-Reply-To: References: Message-ID: Hi Jarrett if this is something you want to use for reconstructing transmission trees, I would recommend using a more advanced method such as outbreaker2 ( http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very crude first pass at that problem. As for the substitution model, in outbreak reconstruction is usually doesn't matter. But if you are doing phylogenetic reconstruction then I would use the usual ML / likelihood ratio tests to compare different models in phangorn. AIC / BIC should hopefully give similar results. Best Thibaut -- Dr Thibaut Jombart Senior Lecturer in Genetic Analysis, Imperial College London Associate Professor in Outbreak Analytics, London School of Hygiene and Tropical Medicine Head of RECON: repidemicsconsortium.org https://thibautjombart.netlify.com Twitter: @TeebzR +44(0)20 7594 3658 On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips wrote: > My inquiry concerns seqTrack(), which employs haploGen() to simulate a > haplotype genealogy through time. > > haploGen() generates DNA sequences under a Kimura-2-Parameter substitution > model with equal base frequencies and differing transition/transversion > rates. > > *My question is*: how can we be certain that the sequences really > do conform to a K2P model? Since these are simulated data, it would be > difficult to choose the most parsimonious model based on the BIC (using > MEGA software for instance), as one is required to select the appropriate > codon table (e.g., standard vs. mitochondrial), to ensure proper placement > of STOP codons. > > Trying this approach for each of standard and mitochondrial codon tables > seems to give conflicting results as to the best substitution model. > > Could someone shed some light on this? I would appreciate any insight > one might be able to offer. > > Thanks. > _______________________________________________ > adegenet-forum mailing list > adegenet-forum at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum -------------- next part -------------- An HTML attachment was scrubbed... URL: From jphill01 at uoguelph.ca Mon Aug 20 20:53:22 2018 From: jphill01 at uoguelph.ca (Jarrett Phillips) Date: Mon, 20 Aug 2018 18:53:22 +0000 Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() In-Reply-To: References: , Message-ID: Thanks for the reply Thibaut. I am actually just looking to generate some DNA sequences according to a given model of nucleotide substitution. I opened a GitHub issue for the adegenet repository a few weeks ago, which was handled by Emmanuel Paradis, expressing my interest in being able to simulate DNA sequences without having to specify a phylogenetic tree as input. As I understand, the majority of R packages require a tree in order to simulate DNA sequences along branches. haploGen/seqTrack appears to be the only such R function that generates DNA sequences automatically, given a pre-specified mutation rate of transitions. I do have a followup question, as I am tweaking your implemented code to handle other evolutionary models: How exactly is the number of DNA sequences to be contained in each haplotype specified? Is this done with the repro() function in haploGen? That is, is the number of DNA sequences for each haplotype generated according to a Poisson(lambda = 1.5) distribution? Thanks. - Jarrett ________________________________ From: Thibaut Jombart Sent: Monday, August 20, 2018 9:27:30 AM To: Jarrett Phillips Cc: adegenet-forum at lists.r-forge.r-project.org Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() Hi Jarrett if this is something you want to use for reconstructing transmission trees, I would recommend using a more advanced method such as outbreaker2 (http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very crude first pass at that problem. As for the substitution model, in outbreak reconstruction is usually doesn't matter. But if you are doing phylogenetic reconstruction then I would use the usual ML / likelihood ratio tests to compare different models in phangorn. AIC / BIC should hopefully give similar results. Best Thibaut -- Dr Thibaut Jombart Senior Lecturer in Genetic Analysis, Imperial College London Associate Professor in Outbreak Analytics, London School of Hygiene and Tropical Medicine Head of RECON: repidemicsconsortium.org https://thibautjombart.netlify.com Twitter: @TeebzR +44(0)20 7594 3658 On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips > wrote: My inquiry concerns seqTrack(), which employs haploGen() to simulate a haplotype genealogy through time. haploGen() generates DNA sequences under a Kimura-2-Parameter substitution model with equal base frequencies and differing transition/transversion rates. My question is: how can we be certain that the sequences really do conform to a K2P model? Since these are simulated data, it would be difficult to choose the most parsimonious model based on the BIC (using MEGA software for instance), as one is required to select the appropriate codon table (e.g., standard vs. mitochondrial), to ensure proper placement of STOP codons. Trying this approach for each of standard and mitochondrial codon tables seems to give conflicting results as to the best substitution model. Could someone shed some light on this? I would appreciate any insight one might be able to offer. Thanks. _______________________________________________ adegenet-forum mailing list adegenet-forum at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibautjombart at gmail.com Wed Aug 22 13:27:24 2018 From: thibautjombart at gmail.com (Thibaut Jombart) Date: Wed, 22 Aug 2018 12:27:24 +0100 Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() In-Reply-To: References: Message-ID: I am not sure I understand the question. Each haplotype is, by definition, one sequence. I assume we are talking about the number of descendants for a given haplotype in the simulation. If so, you are right, the number of descendants is determined by the function 'repro', which defaults to: function(){rpois(1,1.5)} Then for each descendant: - the time of evolution since the ancestral sequence is determined - transitions and transversions are determined - resulting sequences are output Does it make sense? Best Thibaut On Mon, 20 Aug 2018 at 19:53, Jarrett Phillips wrote: > Thanks for the reply Thibaut. > > > I am actually just looking to generate some DNA sequences according to a > given model of nucleotide substitution. > > > I opened a GitHub issue for the adegenet repository a few weeks ago, which > was handled by Emmanuel Paradis, expressing my interest in being able to > simulate DNA sequences without having to specify a phylogenetic tree as > input. As I understand, the majority of R packages require a tree in order > to simulate DNA sequences along branches. > > > haploGen/seqTrack appears to be the only such R function that generates > DNA sequences automatically, given a pre-specified mutation rate of > transitions. > > > I do have a followup question, as I am tweaking your implemented code to > handle other evolutionary models: > > > How exactly is the number of DNA sequences to be contained in each > haplotype specified? Is this done with the repro() function in haploGen? > That is, is the number of DNA sequences for each haplotype generated > according to a > > Poisson(lambda = 1.5) distribution? > > > Thanks. > > > - Jarrett > > > ------------------------------ > *From:* Thibaut Jombart > *Sent:* Monday, August 20, 2018 9:27:30 AM > *To:* Jarrett Phillips > *Cc:* adegenet-forum at lists.r-forge.r-project.org > *Subject:* Re: [adegenet-forum] Nucleotide substitution model used by > haploGen()/SeqTrack() > > Hi Jarrett > > if this is something you want to use for reconstructing transmission > trees, I would recommend using a more advanced method such as outbreaker2 ( > http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very > crude first pass at that problem. > > As for the substitution model, in outbreak reconstruction is usually > doesn't matter. But if you are doing phylogenetic reconstruction then I > would use the usual ML / likelihood ratio tests to compare different models > in phangorn. AIC / BIC should hopefully give similar results. > > Best > Thibaut > > -- > Dr Thibaut Jombart > Senior Lecturer in Genetic Analysis, Imperial College London > Associate Professor in Outbreak Analytics, London School of Hygiene and > Tropical Medicine > Head of RECON: repidemicsconsortium.org > https://thibautjombart.netlify.com > Twitter: @TeebzR > +44(0)20 7594 3658 > > > On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips > wrote: > > My inquiry concerns seqTrack(), which employs haploGen() to simulate a > haplotype genealogy through time. > > haploGen() generates DNA sequences under a Kimura-2-Parameter substitution > model with equal base frequencies and differing transition/transversion > rates. > > *My question is*: how can we be certain that the sequences really > do conform to a K2P model? Since these are simulated data, it would be > difficult to choose the most parsimonious model based on the BIC (using > MEGA software for instance), as one is required to select the appropriate > codon table (e.g., standard vs. mitochondrial), to ensure proper placement > of STOP codons. > > Trying this approach for each of standard and mitochondrial codon tables > seems to give conflicting results as to the best substitution model. > > Could someone shed some light on this? I would appreciate any insight > one might be able to offer. > > Thanks. > _______________________________________________ > adegenet-forum mailing list > adegenet-forum at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum > > _______________________________________________ > adegenet-forum mailing list > adegenet-forum at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum -------------- next part -------------- An HTML attachment was scrubbed... URL: From jphill01 at uoguelph.ca Thu Aug 23 17:50:26 2018 From: jphill01 at uoguelph.ca (Jarrett Phillips) Date: Thu, 23 Aug 2018 15:50:26 +0000 Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() In-Reply-To: References: , Message-ID: Thanks Thibaut, this make sense. Best, - Jarrett ________________________________ From: Thibaut Jombart Sent: Wednesday, August 22, 2018 7:27:24 AM To: Jarrett Phillips Cc: adegenet-forum at lists.r-forge.r-project.org Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() I am not sure I understand the question. Each haplotype is, by definition, one sequence. I assume we are talking about the number of descendants for a given haplotype in the simulation. If so, you are right, the number of descendants is determined by the function 'repro', which defaults to: function(){rpois(1,1.5)} Then for each descendant: - the time of evolution since the ancestral sequence is determined - transitions and transversions are determined - resulting sequences are output Does it make sense? Best Thibaut On Mon, 20 Aug 2018 at 19:53, Jarrett Phillips > wrote: Thanks for the reply Thibaut. I am actually just looking to generate some DNA sequences according to a given model of nucleotide substitution. I opened a GitHub issue for the adegenet repository a few weeks ago, which was handled by Emmanuel Paradis, expressing my interest in being able to simulate DNA sequences without having to specify a phylogenetic tree as input. As I understand, the majority of R packages require a tree in order to simulate DNA sequences along branches. haploGen/seqTrack appears to be the only such R function that generates DNA sequences automatically, given a pre-specified mutation rate of transitions. I do have a followup question, as I am tweaking your implemented code to handle other evolutionary models: How exactly is the number of DNA sequences to be contained in each haplotype specified? Is this done with the repro() function in haploGen? That is, is the number of DNA sequences for each haplotype generated according to a Poisson(lambda = 1.5) distribution? Thanks. - Jarrett ________________________________ From: Thibaut Jombart > Sent: Monday, August 20, 2018 9:27:30 AM To: Jarrett Phillips Cc: adegenet-forum at lists.r-forge.r-project.org Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack() Hi Jarrett if this is something you want to use for reconstructing transmission trees, I would recommend using a more advanced method such as outbreaker2 (http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very crude first pass at that problem. As for the substitution model, in outbreak reconstruction is usually doesn't matter. But if you are doing phylogenetic reconstruction then I would use the usual ML / likelihood ratio tests to compare different models in phangorn. AIC / BIC should hopefully give similar results. Best Thibaut -- Dr Thibaut Jombart Senior Lecturer in Genetic Analysis, Imperial College London Associate Professor in Outbreak Analytics, London School of Hygiene and Tropical Medicine Head of RECON: repidemicsconsortium.org https://thibautjombart.netlify.com Twitter: @TeebzR +44(0)20 7594 3658 On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips > wrote: My inquiry concerns seqTrack(), which employs haploGen() to simulate a haplotype genealogy through time. haploGen() generates DNA sequences under a Kimura-2-Parameter substitution model with equal base frequencies and differing transition/transversion rates. My question is: how can we be certain that the sequences really do conform to a K2P model? Since these are simulated data, it would be difficult to choose the most parsimonious model based on the BIC (using MEGA software for instance), as one is required to select the appropriate codon table (e.g., standard vs. mitochondrial), to ensure proper placement of STOP codons. Trying this approach for each of standard and mitochondrial codon tables seems to give conflicting results as to the best substitution model. Could someone shed some light on this? I would appreciate any insight one might be able to offer. Thanks. _______________________________________________ adegenet-forum mailing list adegenet-forum at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum _______________________________________________ adegenet-forum mailing list adegenet-forum at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum -------------- next part -------------- An HTML attachment was scrubbed... URL: