From nvincigue at gmail.com  Sat Sep  1 15:28:22 2018
From: nvincigue at gmail.com (nvincigue at gmail.com)
Date: Sat, 1 Sep 2018 06:28:22 -0700
Subject: [adegenet-forum] Randomly subsetting genind object
Message-ID: <9DAB5C98-0C7C-4222-A481-35D76F0885E0@gmail.com>

Is there a way to randomly sample a subset of my SNP data, say 13 SNPs out of ~3k, in adegenet when I read in a STRUCTURE file?

    boot_1 <-
      read.structure(
        "snps.str",
        onerowperind = TRUE,
        n.ind = 16,
        n.loc = 2468,
        ask = FALSE,
        quiet = TRUE
      )

From zkamvar at gmail.com  Sat Sep  8 12:08:42 2018
From: zkamvar at gmail.com (Zhian Kamvar)
Date: Sat, 8 Sep 2018 11:08:42 +0100
Subject: [adegenet-forum] Randomly subsetting genind object
Message-ID: <881A7FF6-65F3-4BB7-9F4C-04C6F6EFB0FD@gmail.com>

If you are looking to take a random sample of loci, you can just sample them after you read in the data by using the "loc" argument:

    myData[loc=sample(nLoc(myData), 13)]
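For readers who want to try the suggestion without a STRUCTURE file at hand, here is a minimal sketch of the same idea. It assumes the `microbov` example dataset shipped with adegenet as a stand-in for the object returned by read.structure(); the object names `keep`, `sub` and `reps`, and the seed, are illustrative only and do not come from the thread.

```r
library(adegenet)

# microbov is an example genind object distributed with adegenet; it stands
# in here for the object returned by read.structure() (an assumption made
# purely for illustration).
data(microbov)

set.seed(1)  # only so that the draw is repeatable

# Draw 13 locus indices without replacement, then keep only those loci.
keep <- sample(nLoc(microbov), 13)
sub  <- microbov[loc = keep]

nLoc(sub)      # number of loci retained
locNames(sub)  # names of the sampled loci

# Several independent random subsets, e.g. for resampling over loci.
reps <- lapply(1:3, function(i) microbov[loc = sample(nLoc(microbov), 13)])
```

Sampling after reading keeps the file-reading step unchanged, so n.loc in read.structure() still describes the full file.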
From jphill01 at uoguelph.ca  Wed Sep 26 17:46:07 2018
From: jphill01 at uoguelph.ca (Jarrett Phillips)
Date: Wed, 26 Sep 2018 15:46:07 +0000
Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()
Message-ID: <YQXPR01MB069458BFD63BE601F9505BCAD5150 at YQXPR01MB0694.CANPRD01.PROD.OUTLOOK.COM>

Dear Thibaut,

This question stems from ones I asked in August.

I've been experimenting with the R code from haploGen() and include it below (with my own slight modifications for my specific problem). Hopefully you can lend some advice as to the issue I am facing (see below).

Note here I refer to a haplotype as a unique DNA sequence.

    library(pegas)

    set.seed(17)

    sim.seqs <- TRUE
    length.seqs <- 500
    num.seqs <- 100 # number of DNA sequences

    if (sim.seqs == TRUE) {

      nucl <- as.DNAbin(c('a','c','g','t'))

      res <- sample(nucl, size = length.seqs, replace = TRUE, prob = rep(0.25, 4))

      if (subst.model == "K80") {

        transi.set <- list('a' = as.DNAbin('g'),
                           'c' = as.DNAbin('t'),
                           'g' = as.DNAbin('a'),
                           't' = as.DNAbin('c'))
        transv.set <- list('a' = as.DNAbin(c('c', 't')),
                           'c' = as.DNAbin(c('a', 'g')),
                           'g' = as.DNAbin(c('c', 't')),
                           't' = as.DNAbin(c('a', 'g')))

        transi <- function(res) {
          unlist(transi.set[as.character(res)])
        }

        transv <- function(res) {
          sapply(transv.set[as.character(res)], sample, 1)
        }

        duplicate.seq <- function(res) {
          num.transi <- rbinom(n = 1, size = length.seqs, prob = transi.rate) # total number of transitions
          if (num.transi > 0) {
            idx <- sample(length.seqs, size = num.transi, replace = FALSE)
            res[idx] <- transi(res[idx])
          }

          num.transv <- rbinom(n = 1, size = length.seqs, prob = transv.rate) # total number of transversions
          if (num.transv > 0) {
            idx <- sample(length.seqs, size = num.transv, replace = FALSE)
            res[idx] <- transv(res[idx])
          }
          res
        }
      }

      res <- matrix(replicate(num.seqs, duplicate.seq(res)), byrow = TRUE, nrow = num.seqs)

      class(res) <- "DNAbin"

      # write.dna(res, file = "seqs.fas", format = "fasta")

      h <- sort(haplotype(res), decreasing = TRUE, what = "frequencies")
      rownames(h) <- 1:nrow(h)

    }

    ## Output

    h

    Haplotypes extracted from: res

        Number of haplotypes: 5
             Sequence length: 500

    Haplotype labels and frequencies:

     1  2  3  4  5
    96  1  1  1  1

If the code is run multiple times (without the seed), the same pattern emerges: h is always skewed toward the most dominant haplotype.

Is there a way that you've implemented in haploGen() (and that I have potentially glossed over) that gives different trends each time the code is run?

For example, I want to be able to run the code (without seed) and get something like

    h

    Haplotypes extracted from: res

        Number of haplotypes: 5
             Sequence length: 500

    Haplotype labels and frequencies:

     1  2  3  4  5
    35 15 25 20  5

haploGen() does exactly this. Each time haploGen() is run, a different distribution for h is output. I just want to emulate what you have done, but I am stuck at this roadblock and I'm uncertain of what is causing this issue.

Note: I just want to generalize your approach to generate DNA sequences (not necessarily for outbreak modelling).

Can you please shed some light?

Thank you in advance.

Sincerely,

Jarrett Phillips

From kamvarz at science.oregonstate.edu  Thu Sep 27 12:14:04 2018
From: kamvarz at science.oregonstate.edu (Zhian Kamvar)
Date: Thu, 27 Sep 2018 11:14:04 +0100
Subject: [adegenet-forum] Nucleotide substitution model used by haploGen()/SeqTrack()
In-Reply-To:
References:
Message-ID:

Hi Jarrett,

The code doesn't work as posted, because subst.model, transi.rate, and transv.rate have not been defined. Could these be the problems you are running into?
Also, you may wish to include a link to the previous discussion so people can understand what's going on.

Best,
Zhian
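To make the point about the undefined objects concrete, here is a minimal base-R sketch of the same duplication scheme with subst.model, transi.rate and transv.rate filled in. The two rate values are hypothetical (the thread never states them), plain character vectors are used instead of DNAbin so that no package is needed, and `ancestor`, `mutate.seq`, `seqs`, `haps` and `freq` are illustrative names, not part of the original post or of haploGen().

```r
set.seed(17)

length.seqs <- 500   # sequence length, as in the post
num.seqs    <- 100   # number of sequences, as in the post

# Hypothetical values: the post never defines these three objects.
subst.model <- "K80"
transi.rate <- 1e-4  # assumed per-site transition probability
transv.rate <- 5e-5  # assumed per-site transversion probability

nucl     <- c("a", "c", "g", "t")
ancestor <- sample(nucl, size = length.seqs, replace = TRUE)

# Transition partner of each base, and transversion candidates of each base.
transi.set <- c(a = "g", c = "t", g = "a", t = "c")
transv.set <- list(a = c("c", "t"), c = c("a", "g"),
                   g = c("c", "t"), t = c("a", "g"))

# Copy a sequence, applying a binomial number of transitions and transversions.
mutate.seq <- function(x) {
  n.ti <- rbinom(1, length.seqs, transi.rate)
  if (n.ti > 0) {
    idx <- sample(length.seqs, n.ti)
    x[idx] <- unname(transi.set[x[idx]])
  }
  n.tv <- rbinom(1, length.seqs, transv.rate)
  if (n.tv > 0) {
    idx <- sample(length.seqs, n.tv)
    x[idx] <- vapply(transv.set[x[idx]], sample, character(1), size = 1)
  }
  x
}

# Every copy is derived independently from the same starting sequence,
# mirroring replicate(num.seqs, duplicate.seq(res)) in the post.
seqs <- t(replicate(num.seqs, mutate.seq(ancestor)))  # num.seqs x length.seqs

# Collapse each row to a string and tabulate haplotype frequencies.
haps <- apply(seqs, 1, paste, collapse = "")
freq <- sort(table(haps), decreasing = TRUE)
unname(freq)
```

With per-site rates this small, a copy is unlikely to receive any mutation at all, and every copy starts from the same sequence, so most copies would be expected to remain identical to it. That may be one contributing factor to the skew described in the question, but the thread does not confirm it.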