[adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()

Mon Aug 20 20:53:22 CEST 2018

Thanks for the reply Thibaut.

I am actually just looking to generate some DNA sequences according to a given model of nucleotide substitution.

I opened a GitHub issue for the adegenet repository a few weeks ago, which was handled by Emmanuel Paradis, expressing my interest in being able to simulate DNA sequences without having to specify a phylogenetic tree as input. As I understand, the majority of R packages require a tree in order to simulate DNA sequences along branches.

haploGen/seqTrack appears to be the only such R function that generates DNA sequences automatically, given a pre-specified mutation rate of transitions.

I do have a followup question, as I am tweaking your implemented code to handle other evolutionary models:

How exactly is the number of DNA sequences to be contained in each haplotype specified? Is this done with the repro() function in haploGen? That is, is the number of DNA sequences for each haplotype generated according to a

Poisson(lambda = 1.5) distribution?

Thanks.

- Jarrett

________________________________
From: Thibaut Jombart <thibautjombart at gmail.com>
Sent: Monday, August 20, 2018 9:27:30 AM
To: Jarrett Phillips
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()

Hi Jarrett

if this is something you want to use for reconstructing transmission trees, I would recommend using a more advanced method such as outbreaker2 (http://www.repidemicsconsortium.org/outbreaker2/) as SeqTrack was a very crude first pass at that problem.

As for the substitution model, in outbreak reconstruction is usually doesn't matter. But if you are doing phylogenetic reconstruction then I would use the usual ML / likelihood ratio tests to compare different models in phangorn. AIC / BIC should hopefully give similar results.

Best
Thibaut

--
Dr Thibaut Jombart
Senior Lecturer in Genetic Analysis, Imperial College London
Associate Professor in Outbreak Analytics, London School of Hygiene and Tropical Medicine
Head of RECON: repidemicsconsortium.org<http://repidemicsconsortium.org>
https://thibautjombart.netlify.com
Twitter: @TeebzR
+44(0)20 7594 3658

On Fri, 17 Aug 2018 at 14:16, Jarrett Phillips <jphill01 at uoguelph.ca<mailto:jphill01 at uoguelph.ca>> wrote:
My inquiry concerns seqTrack(), which employs haploGen() to simulate a haplotype genealogy through time.

haploGen() generates DNA sequences under a Kimura-2-Parameter substitution model with equal base frequencies and differing transition/transversion rates.

My question is: how can we be certain that the sequences really do conform to a K2P model? Since these are simulated data, it would be difficult to choose the most parsimonious model based on the BIC (using MEGA software for instance), as one is required to select the appropriate codon table (e.g., standard vs. mitochondrial), to ensure proper placement of STOP codons.

Trying this approach for each of standard and mitochondrial codon tables seems to give conflicting results as to the best substitution model.

Could someone shed some light on this? I would appreciate any insight one might be able to offer.

Thanks.
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org<mailto:adegenet-forum at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20180820/b2cfa06d/attachment.html>