[Seqinr-forum] read.alignment truncated FASTA header

Haruo Suzuki haruo at sfc.keio.ac.jp
Tue Feb 27 12:51:42 CET 2018


Dear Dr. Jean Lobry,

I have confirmed that the fix works properly, as follows:

-----------------------
> whole.header.test <- 
+  read.alignment(file = system.file("sequences/LTPs128_SSU_aligned_First_Two.fasta", 
+  package = "seqinr"), format = "fasta", whole.header = TRUE)
> whole.header.test$nam
[1] "D50541\t1\t1411\t1411bp\trna\tAbiotrophia defectiva\tAerococcaceae"      
[2] "KP233895\t1\t1520\t1520bp\trna\tAbyssivirga alkaniphila\tLachnospiraceae"
-----------------------
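
For anyone following the thread without the dev version installed, here is a minimal base-R sketch of the difference between the truncated and the whole header. It does not call seqinr and is not how read.fasta() is implemented. The two header strings are copied from the output above, the variable names are only illustrative, and the column names are taken from the LTP readme quoted later in this thread.

```r
# Two header lines shaped like the LTP examples: tab-separated fields,
# with a space inside the organism name.
headers <- c(
  ">D50541\t1\t1411\t1411bp\trna\tAbiotrophia defectiva\tAerococcaceae",
  ">KP233895\t1\t1520\t1520bp\trna\tAbyssivirga alkaniphila\tLachnospiraceae"
)

# Drop the leading ">" to get the whole header.
whole <- sub("^>", "", headers)

# Cutting at the first space mimics the truncated names reported earlier
# (an illustration of the symptom only, not seqinr's actual code).
truncated <- sub(" .*$", "", whole)

# Split the whole header on tabs to recover the seven documented fields.
fields <- do.call(rbind, strsplit(whole, "\t", fixed = TRUE))
colnames(fields) <- c("accession", "start", "stop", "length",
                      "type", "fullname_ltp", "hi_tax_ltp")

print(truncated)
print(fields[, c("accession", "fullname_ltp", "hi_tax_ltp")])
```

With whole.header = TRUE, the same strsplit() step should apply directly to the names returned by read.alignment(), since they then keep the tabs and the full organism name.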

Thank you,

Haruo Suzuki

On Feb 24, 2018, at 23:57, Jean Lobry <jean.lobry at univ-lyon1.fr> wrote:

> Dear All,
> 
> I have committed a fix:
> 
> http://seqinr.r-forge.r-project.org/src/appendix/releasenotes.pdf
> 
> available under the dev version of seqinr:
> 
> install.packages("seqinr", repos="http://R-Forge.R-project.org")
> 
> Best,
> 
> JLO
> 
> Le 23/02/2018 à 12:40, Jean Lobry a écrit :
>> Dear Simon,
>> I was able to reproduce the behaviour described by
>> Haruo Suzuki below.
>> I've found the culprit in read.fasta() which is
>> called by read.alignment(). The name is indeed
>> truncated after the first space.
>> I'll commit a fix that doesn't break previous
>> code asap.
>> Best,
>> JLO
>> Le 22/02/2018 à 09:39, Haruo Suzuki a écrit :
>>> Dear Simon,
>>> 
>>> I hope all is well with you.
>>> 
>>> The LTP datasets based on SILVA release 128 were downloaded from the [Archive](https://www.arb-silva.de/no_cache/download/archive/living_tree/LTP_release_128/) using:
>>> -----------------------
>>>     wget https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_128/LTPs128_SSU/LTPs128_SSU_aligned.fasta.tar.gz 
>>>     tar xvzf LTPs128_SSU_aligned.fasta.tar.gz
>>> -----------------------
>>> 
>>> Here are FASTA header lines:
>>> -----------------------
>>> $ grep "^>" LTPs128_SSU_aligned.fasta | head -n 2
>>>> D50541    1    1411    1411bp    rna    Abiotrophia defectiva    Aerococcaceae
>>>> KP233895    1    1520    1520bp    rna    Abyssivirga alkaniphila    Lachnospiraceae
>>> -----------------------
>>> 
>>> The `read.alignment` function of SeqinR (Version: 3.4-5) did not return the whole FASTA header lines. The descriptions were truncated, probably because there is a space " " between genus and species in the organism names (e.g. "Abiotrophia defectiva" and "Abyssivirga alkaniphila"), as follows:
>>> -----------------------
>>>> aln <- read.alignment("LTPs128_SSU_aligned.fasta", format = "fasta")
>>>> head(aln$nam, 2)
>>> [1] "D50541\t1\t1411\t1411bp\trna\tAbiotrophia"
>>> [2] "KP233895\t1\t1520\t1520bp\trna\tAbyssivirga"
>>> -----------------------
>>> 
>>> # References
>>> -----------------------
>>> https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_128/readme_LTP_SSUs128_LSUs123.pdf 
>>> LTPs128_SSU_aligned.fasta: multifasta alignments of type strains. The headers of the sequences accordingly stand for the following information: accession number, start and stop position, length, type of sequence, fullname_ltp, hi_tax_ltp. Also compressed for download as LTPs128_SSU_aligned.fasta.tar.gz
>>> -----------------------
>>> 
>>> Yours sincerely,
>>> 
>>> Haruo Suzuki
>>> 
>>> 
>>> On Nov 27, 2017, at 18:14, Simon Penel <simon.penel at univ-lyon1.fr> wrote:
>>> 
>> _______________________________________________
>> Seqinr-forum mailing list
>> Seqinr-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum
> 
