[adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()

Jarrett Phillips jphill01 at uoguelph.ca
Thu Aug 23 17:50:26 CEST 2018


Thanks Thibaut, this make sense.


Best,


- Jarrett

________________________________
From: Thibaut Jombart <thibautjombart at gmail.com>
Sent: Wednesday, August 22, 2018 7:27:24 AM
To: Jarrett Phillips
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()

I am not sure I understand the question. Each haplotype is, by definition, one sequence. I assume we are talking about the number of descendants for a given haplotype in the simulation. If so, you are right, the number of descendants is determined by the function 'repro', which defaults to:
function(){rpois(1,1.5)}

Then for each descendant:
- the time of evolution since the ancestral sequence is determined
- transitions and transversions are determined
- resulting sequences are output

Does it make sense?
Best
Thibaut



On Mon, 20 Aug 2018 at 19:53, Jarrett Phillips <jphill01 at uoguelph.ca<mailto:jphill01 at uoguelph.ca>> wrote:

Thanks for the reply Thibaut.


I am actually just looking to generate some DNA sequences according to a given model of nucleotide substitution.


I opened a GitHub issue for the adegenet repository a few weeks ago, which was handled by Emmanuel Paradis, expressing my interest in being able to simulate DNA sequences without having to specify a phylogenetic tree as input. As I understand, the majority of R packages require a tree in order to simulate DNA sequences along branches.


haploGen/seqTrack appears to be the only such R function that generates DNA sequences automatically, given a pre-specified mutation rate of transitions.


I do have a followup question, as I am tweaking your implemented code to handle other evolutionary models:


How exactly is the number of DNA sequences to be contained in each haplotype specified? Is this done with the repro() function in haploGen? That is, is the number of DNA sequences for each haplotype generated according to a

Poisson(lambda = 1.5) distribution?


Thanks.


- Jarrett


________________________________
From: Thibaut Jombart <thibautjombart at gmail.com<mailto:thibautjombart at gmail.com>>
Sent: Monday, August 20, 2018 9:27:30 AM
To: Jarrett Phillips
Cc: adegenet-forum at lists.r-forge.r-project.org<mailto:adegenet-forum at lists.r-forge.r-project.org>
Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()

Hi Jarrett

if this is something you want to use for reconstructing transmission trees, I would recommend using a more advanced method such as outbreaker2 (http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very crude first pass at that problem.

As for the substitution model, in outbreak reconstruction is usually doesn't matter. But if you are doing phylogenetic reconstruction then I would use the usual ML / likelihood ratio tests to compare different models in phangorn. AIC / BIC should hopefully give similar results.

Best
Thibaut

--
Dr Thibaut Jombart
Senior Lecturer in Genetic Analysis, Imperial College London
Associate Professor in Outbreak Analytics, London School of Hygiene and Tropical Medicine
Head of RECON: repidemicsconsortium.org<http://repidemicsconsortium.org>
https://thibautjombart.netlify.com
Twitter: @TeebzR
+44(0)20 7594 3658


On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips <jphill01 at uoguelph.ca<mailto:jphill01 at uoguelph.ca>> wrote:
My inquiry concerns seqTrack(), which employs haploGen() to simulate a haplotype genealogy through time.

haploGen() generates DNA sequences under a Kimura-2-Parameter substitution model with equal base frequencies and differing transition/transversion rates.

My question is: how can we be certain that the sequences really do conform to a K2P model? Since these are simulated data, it would be difficult to choose the most parsimonious model based on the BIC (using MEGA software for instance), as one is required to select the appropriate codon table (e.g., standard vs. mitochondrial), to ensure proper placement of STOP codons.

Trying this approach for each of standard and mitochondrial codon tables seems to give conflicting results as to the best substitution model.


Could someone shed some light on this? I would appreciate any insight one might be able to offer.

Thanks.
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org<mailto:adegenet-forum at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org<mailto:adegenet-forum at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20180823/f511f139/attachment.html>


More information about the adegenet-forum mailing list