[adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()
Zhian Kamvar
kamvarz at science.oregonstate.edu
Thu Sep 27 12:14:04 CEST 2018
Hi Jarrett,
The code below doesn't work as subst.model, transi.rate, and transiv.rate
have not been defined. Could these be the problems you are running into?
Also, you may wish to include a link to the previous discussion so people
can understand what's going on.
Best,
Zhian
On Thu, Sep 27, 2018 at 11:00 AM <
adegenet-forum-request at lists.r-forge.r-project.org> wrote:
> Send adegenet-forum mailing list submissions to
> adegenet-forum at lists.r-forge.r-project.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
> or, via email, send a message with subject or body 'help' to
> adegenet-forum-request at lists.r-forge.r-project.org
>
> You can reach the person managing the list at
> adegenet-forum-owner at lists.r-forge.r-project.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of adegenet-forum digest..."
>
>
> Today's Topics:
>
> 1. Nucleotide substitution model used by haploGen()/SeqTrack()
> (Jarrett Phillips)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 26 Sep 2018 15:46:07 +0000
> From: Jarrett Phillips <jphill01 at uoguelph.ca>
> To: "adegenet-forum at lists.r-forge.r-project.org"
> <adegenet-forum at lists.r-forge.r-project.org>
> Subject: [adegenet-forum] Nucleotide substitution model used by
> haploGen()/SeqTrack()
> Message-ID:
> <
> YQXPR01MB069458BFD63BE601F9505BCAD5150 at YQXPR01MB0694.CANPRD01.PROD.OUTLOOK.COM
> >
>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Dear Thibaut,
>
>
> This question stems from ones I asked in August.
>
>
> I've been experimenting with the R code from haploGen() and include it
> below (with my own slight modifications for my specific problem). Hopefully
> you can lend some advice as to the issue I am facing (see below).
>
>
> Note here I refer to a haplotype as a unique DNA sequence.
>
>
>
> library(pegas)
>
>
> set.seed(17)
>
>
> sim.seqs <- TRUE
>
> length.seqs <- 500
>
> num.seqs <- 100 # number of DNA sequences
>
>
>
> if (sim.seqs == TRUE) {
>
>
>
> nucl <- as.DNAbin(c('a','c','g','t'))
>
>
> res <- sample(nucl, size = length.seqs, replace = TRUE, prob =
> rep(0.25, 4))
>
>
> if (subst.model == "K80") {
>
>
>
> transi.set <- list('a' = as.DNAbin('g'),
>
> 'c' = as.DNAbin('t'),
>
> 'g' = as.DNAbin('a'),
>
> 't' = as.DNAbin('c'))
>
> transv.set <- list('a' = as.DNAbin(c('c', 't')),
>
> 'c' = as.DNAbin(c('a', 'g')),
>
> 'g' = as.DNAbin(c('c', 't')),
>
> 't' = as.DNAbin(c('a', 'g')))
>
>
>
> transi <- function(res) {
>
> unlist(transi.set[as.character(res)])
>
> }
>
>
>
> transv <- function(res) {
>
> sapply(transv.set[as.character(res)], sample, 1)
>
> }
>
>
>
> duplicate.seq <- function(res) {
>
> num.transi <- rbinom(n = 1, size = length.seqs, prob =
> transi.rate) # total number of transitions
>
> if (num.transi > 0) {
>
> idx <- sample(length.seqs, size = num.transi, replace = FALSE)
>
> res[idx] <- transi(res[idx])
>
> }
>
>
>
> num.transv <- rbinom(n = 1, size = length.seqs, prob =
> transv.rate) # total number of transversions
>
> if (num.transv > 0) {
>
> idx <- sample(length.seqs, size = num.transv, replace = FALSE)
>
> res[idx] <- transv(res[idx])
>
> }
>
> res
>
> }
>
> }
>
>
>
> res <- matrix(replicate(num.seqs, duplicate.seq(res)), byrow = TRUE,
> nrow = num.seqs)
>
>
>
> class(res) <- "DNAbin"
>
>
>
> # write.dna(res, file = "seqs.fas", format = "fasta")
>
>
>
> h <- sort(haplotype(res), decreasing = TRUE, what = "frequencies")
>
> rownames(h) <- 1:nrow(h)
>
>
>
> }
>
>
> ## Output
>
>
> h
>
>
> Haplotypes extracted from: res
>
>
> Number of haplotypes: 5
>
> Sequence length: 500
>
>
> Haplotype labels and frequencies:
>
>
> 1 2 3 4 5
>
> 96 1 1 1 1
>
>
>
> If the code is run multiple times (without the seed), the same pattern
> emerges: h is always skewed toward the most dominant haplotype.
>
>
> Is there a way that you've implemented in haploGen() (and that I have
> potentially glossed over) that gives different trends each time the code is
> run?
>
>
> For example, I want to be able to run the code (without seed) and get
> something like
>
>
> h
>
>
> Haplotypes extracted from: res
>
>
> Number of haplotypes: 5
>
> Sequence length: 500
>
>
> Haplotype labels and frequencies:
>
>
> 1 2 3 4 5
>
> 35 15 25 20 5
>
>
>
> haploGen() does exactly this. Each time haploGen() is run, a different
> distribution for h is outputted. I just want to emulate what you have
> done, but am stuck at this roadblock and I m uncertain of what is causing
> this issue.
>
>
> Note: I am just wanting to generalize your approach to generate DNA
> sequences (not necessarily for outbreak modelling).
>
>
> Can you please shed some light?
>
>
>
> Thank you in advance.
>
>
>
> Sincerely,
>
>
> Jarrett Phillips
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20180926/408416d5/attachment-0001.html
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
> ------------------------------
>
> End of adegenet-forum Digest, Vol 121, Issue 3
> **********************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20180927/b479f31d/attachment.html>
More information about the adegenet-forum
mailing list