<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, Helvetica, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<p style="margin-top:0; margin-bottom:0">Dear Thibaut,</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">This question stems from ones I asked in August.</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">I've been experimenting with the R code from haploGen() and include it below (with my own slight modifications for my specific problem). <span style="font-size: 12pt;">Hopefully you can lend some advice as to the issue
I am facing (see below).</span></p>
<p></p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">Note here I refer to a haplotype as a unique DNA sequence.</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">library(pegas)</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">set.seed(17)</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">sim.seqs <- TRUE</p>
<p style="margin-top:0; margin-bottom:0">length.seqs <- 500 </p>
<p style="margin-top:0; margin-bottom:0">num.seqs <- 100 # number of DNA sequences</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p>if (sim.seqs == TRUE) {</p>
<p> </p>
<p> nucl <- as.DNAbin(c('a','c','g','t'))</p>
<p><span style="font-size:12pt"><br>
</span></p>
<p><span style="font-size:12pt"> res <- sample(nucl, size = length.seqs, replace = TRUE, prob = rep(0.25, 4))</span><br>
</p>
<p><span style="font-size:12pt"><br>
</span></p>
<p> if (subst.model == "K80") {</p>
<p> </p>
<p> transi.set <- list('a' = as.DNAbin('g'), </p>
<p> 'c' = as.DNAbin('t'),</p>
<p> 'g' = as.DNAbin('a'), </p>
<p> 't' = as.DNAbin('c'))</p>
<p> transv.set <- list('a' = as.DNAbin(c('c', 't')),</p>
<p> 'c' = as.DNAbin(c('a', 'g')),</p>
<p> 'g' = as.DNAbin(c('c', 't')), </p>
<p> 't' = as.DNAbin(c('a', 'g')))</p>
<p> </p>
<p> transi <- function(res) {</p>
<p> unlist(transi.set[as.character(res)])</p>
<p> }</p>
<p> </p>
<p> transv <- function(res) {</p>
<p> sapply(transv.set[as.character(res)], sample, 1)</p>
<p> }</p>
<p> </p>
<p> duplicate.seq <- function(res) {</p>
<p> num.transi <- rbinom(n = 1, size = length.seqs, prob = transi.rate) # total number of transitions</p>
<p> if (num.transi > 0) {</p>
<p> idx <- sample(length.seqs, size = num.transi, replace = FALSE)</p>
<p> res[idx] <- transi(res[idx])</p>
<p> }</p>
<p> </p>
<p> num.transv <- rbinom(n = 1, size = length.seqs, prob = transv.rate) # total number of transversions</p>
<p> if (num.transv > 0) {</p>
<p> idx <- sample(length.seqs, size = num.transv, replace = FALSE)</p>
<p> res[idx] <- transv(res[idx])</p>
<p> }</p>
<p> res</p>
<p> }</p>
<p> }</p>
<p> </p>
<p> res <- matrix(replicate(num.seqs, duplicate.seq(res)), byrow = TRUE, nrow = num.seqs)</p>
<p> </p>
<p> class(res) <- "DNAbin"</p>
<p> </p>
<p> # write.dna(res, file = "seqs.fas", format = "fasta")</p>
<p> </p>
<p> h <- sort(haplotype(res), decreasing = TRUE, what = "frequencies")</p>
<p> rownames(h) <- 1:nrow(h)</p>
<p></p>
<p> </p>
<p> }</p>
<p><br>
</p>
<p>## Output </p>
<p><br>
</p>
<p>h</p>
<p><br>
</p>
<p>Haplotypes extracted from: res </p>
<p><br>
</p>
<p> Number of haplotypes: 5 </p>
<p> Sequence length: 500 </p>
<p><br>
</p>
<p>Haplotype labels and frequencies:</p>
<p><br>
</p>
<p> 1 2 3 4 5 </p>
<p></p>
<p>96 1 1 1 1 </p>
<p></p>
<p><br>
</p>
<p><br>
</p>
<p>If the code is run multiple times (without the seed), the same pattern emerges: h is always skewed toward the most dominant haplotype.</p>
<p><br>
</p>
<p>Is there a way that you've implemented in haploGen() (and that I have potentially glossed over) that gives different trends each time the code is run?</p>
<p><br>
</p>
<p>For example, I want to be able to run the code (without seed) and get something like</p>
<p><br>
</p>
<p>h</p>
<p><br>
</p>
<p>Haplotypes extracted from: res </p>
<p><br>
</p>
<p> Number of haplotypes: 5 </p>
<p> Sequence length: 500 </p>
<p><br>
</p>
<p>Haplotype labels and frequencies:</p>
<p><br>
</p>
<p> 1 2 3 4 5 </p>
<p></p>
<p>35 15 25 20 5</p>
<p><br>
</p>
<p><br>
</p>
<p>haploGen() does exactly this. Each time haploGen() is run, a different distribution for h is outputted. I just want to emulate what you have done, but am stuck at this roadblock and I m uncertain of what is causing this issue.</p>
<p><br>
</p>
<p>Note: I am just wanting to generalize your approach to generate DNA sequences (not necessarily for outbreak modelling). </p>
<p><br>
</p>
<p>Can you please shed some light?</p>
<p><br>
</p>
<p><br>
</p>
<p>Thank you in advance.</p>
<p><br>
</p>
<p><br>
</p>
<p>Sincerely,</p>
<p><br>
</p>
<p>Jarrett Phillips </p>
<p></p>
</div>
</body>
</html>