<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

</head>

<body dir="ltr">

<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, Helvetica, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">

<p style="margin-top:0; margin-bottom:0">Dear Thibaut,</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">This question stems from  ones I asked in August.</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">I've been experimenting with the R code from haploGen() and include it below (with my own slight modifications for my specific problem). <span style="font-size: 12pt;">Hopefully you can lend some advice as to the issue

 I am facing (see below).</span></p>

<p></p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">Note here I refer to a haplotype as a unique DNA sequence.</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">library(pegas)</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">set.seed(17)</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0">sim.seqs <- TRUE</p>

<p style="margin-top:0; margin-bottom:0">length.seqs <- 500 </p>

<p style="margin-top:0; margin-bottom:0">num.seqs <- 100 # number of DNA sequences</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p style="margin-top:0; margin-bottom:0"><br>

</p>

<p>if (sim.seqs == TRUE) {</p>

<p>      </p>

<p>      nucl <- as.DNAbin(c('a','c','g','t'))</p>

<p><span style="font-size:12pt"><br>

</span></p>

<p><span style="font-size:12pt">     res <- sample(nucl, size = length.seqs, replace = TRUE, prob = rep(0.25, 4))</span><br>

</p>

<p><span style="font-size:12pt"><br>

</span></p>

<p>     if (subst.model == "K80") {</p>

<p>        </p>

<p>        transi.set <- list('a' = as.DNAbin('g'), </p>

<p>                           'c' = as.DNAbin('t'),</p>

<p>                           'g' = as.DNAbin('a'), </p>

<p>                           't' = as.DNAbin('c'))</p>

<p>        transv.set <- list('a' = as.DNAbin(c('c', 't')),</p>

<p>                           'c' = as.DNAbin(c('a', 'g')),</p>

<p>                           'g' = as.DNAbin(c('c', 't')), </p>

<p>                           't' = as.DNAbin(c('a', 'g')))</p>

<p>        </p>

<p>        transi <- function(res) {</p>

<p>          unlist(transi.set[as.character(res)])</p>

<p>        }</p>

<p>        </p>

<p>        transv <- function(res) {</p>

<p>          sapply(transv.set[as.character(res)], sample, 1)</p>

<p>        }</p>

<p>        </p>

<p>        duplicate.seq <- function(res) {</p>

<p>          num.transi <- rbinom(n = 1, size = length.seqs, prob = transi.rate) # total number of transitions</p>

<p>          if (num.transi > 0) {</p>

<p>            idx <- sample(length.seqs, size = num.transi, replace = FALSE)</p>

<p>            res[idx] <- transi(res[idx])</p>

<p>          }</p>

<p>          </p>

<p>          num.transv <- rbinom(n = 1, size = length.seqs, prob = transv.rate) # total number of transversions</p>

<p>          if (num.transv > 0) {</p>

<p>            idx <- sample(length.seqs, size = num.transv, replace = FALSE)</p>

<p>            res[idx] <- transv(res[idx])</p>

<p>          }</p>

<p>          res</p>

<p>          }</p>

<p>        }</p>

<p>    </p>

<p>      res <- matrix(replicate(num.seqs, duplicate.seq(res)), byrow = TRUE, nrow = num.seqs)</p>

<p>      </p>

<p>      class(res) <- "DNAbin"</p>

<p>    </p>

<p>      # write.dna(res, file = "seqs.fas", format = "fasta")</p>

<p>    </p>

<p>      h <- sort(haplotype(res), decreasing = TRUE, what = "frequencies")</p>

<p>      rownames(h) <- 1:nrow(h)</p>

<p></p>

<p>    </p>

<p>    }</p>

<p><br>

</p>

<p>## Output </p>

<p><br>

</p>

<p>h</p>

<p><br>

</p>

<p>Haplotypes extracted from: res </p>

<p><br>

</p>

<p>    Number of haplotypes: 5 </p>

<p>         Sequence length: 500 </p>

<p><br>

</p>

<p>Haplotype labels and frequencies:</p>

<p><br>

</p>

<p>  1  2  3  4  5 </p>

<p></p>

<p>96  1  1  1  1 </p>

<p></p>

<p><br>

</p>

<p><br>

</p>

<p>If the code is run multiple times (without the seed), the same pattern emerges: h is always skewed toward the most dominant haplotype.</p>

<p><br>

</p>

<p>Is there a way that you've implemented in haploGen() (and that I have potentially glossed over) that gives different trends each time the code is run?</p>

<p><br>

</p>

<p>For example, I want to be able to run the code (without seed) and get something like</p>

<p><br>

</p>

<p>h</p>

<p><br>

</p>

<p>Haplotypes extracted from: res </p>

<p><br>

</p>

<p>    Number of haplotypes: 5 </p>

<p>         Sequence length: 500 </p>

<p><br>

</p>

<p>Haplotype labels and frequencies:</p>

<p><br>

</p>

<p>  1    2    3    4   5 </p>

<p></p>

<p>35  15  25  20  5</p>

<p><br>

</p>

<p><br>

</p>

<p>haploGen() does exactly this. Each time haploGen() is run, a different distribution for h  is outputted. I just want to emulate what you have done, but am stuck at this roadblock and I m uncertain of what is causing this issue.</p>

<p><br>

</p>

<p>Note: I am just wanting to generalize your approach to generate DNA sequences (not necessarily for outbreak modelling). </p>

<p><br>

</p>

<p>Can you please shed some light?</p>

<p><br>

</p>

<p><br>

</p>

<p>Thank you in advance.</p>

<p><br>

</p>

<p><br>

</p>

<p>Sincerely,</p>

<p><br>

</p>

<p>Jarrett Phillips </p>

<p></p>

</div>

</body>

</html>