[Seqinr-forum] extract A/C/G/T positions in a FASTA file

Jean Lobry jean.lobry at univ-lyon1.fr
Mon Aug 23 10:35:21 CEST 2021


Dear Jie,

I'm not sure exactly what you are trying to do, but here is some
code you can use as a starting point:

library(seqinr)
# read a DNA alignment from a FASTA file
myfile <- system.file("sequences/Anouk.fasta", package = "seqinr")
myali <- read.alignment(myfile, format = "fasta")

# Get the (row = sequence, col = position) indices of "a" in the alignment;
# read.alignment() stores bases in lower case by default
which(as.matrix(myali) == "a", arr.ind = TRUE)

# Get the indices of "a" in the consensus sequence
mycon <- consensus(myali)
which(mycon == "a")
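Since your question was about a plain (unaligned) FASTA file, here is a
minimal sketch along the same lines that uses read.fasta() instead of
read.alignment(). It assumes DNA records and relies on read.fasta()
lower-casing DNA by default. The helper base_positions() is a hypothetical
name introduced only for this example, and the seqinr example file simply
stands in for your own "MY-FASTA.fa":

```r
library(seqinr)

# Example input shipped with seqinr; replace with your own path,
# e.g. "MY-FASTA.fa"
myfile <- system.file("sequences/Anouk.fasta", package = "seqinr")

# read.fasta() returns a list with one element per sequence; by default
# each element is a vector of single lower-case characters
myseqs <- read.fasta(myfile, seqtype = "DNA")

# Hypothetical helper: for one sequence, return a named list holding the
# 1-based positions of each base
base_positions <- function(s, bases = c("a", "c", "g", "t")) {
  setNames(lapply(bases, function(b) which(s == b)), bases)
}

# Apply the helper to every sequence in the file
allpos <- lapply(myseqs, base_positions)

# First few positions of "a" in the first sequence
head(allpos[[1]]$a)
```

allpos[[i]]$a then holds the positions of "a" in the i-th sequence as a
plain integer vector, rather than the list of ranges returned by
vmatchPattern().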

HTH,

JLO

On 09/08/2021 at 21:33, jiehuang001 at gmail.com wrote:
> Hi, guys:
> 
> Previously I have been using library(Biostrings).
> 
> For example, I have used the following two lines to read in a SARS-CoV-2
> FASTA file and find the positions of all "A" alleles.
> 
> fa <- readDNAStringSet("MY-FASTA.fa", format = "fasta")
> 
> I could then use vmatchPattern("A", fa, max.mismatch=0)
> 
> However, the output from the above vmatchPattern() command is a bit messy.
> 
> I was hoping the SeqinR package could do this more straightforwardly.
> 
> If so, could someone please tell me how to write the equivalent of my
> Biostrings commands above in SeqinR?
> 
> Thank you very much & best regards,
> 
> Jie
> 
> 
> _______________________________________________
> Seqinr-forum mailing list
> Seqinr-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum
> 


