[Seqinr-forum] read.alignment truncated FASTA header

Jean Lobry jean.lobry at univ-lyon1.fr
Fri Feb 23 12:40:27 CET 2018


Dear Simon,

I was able to reproduce the behaviour described by
Haruo Suzuki below.

I've found the culprit in read.fasta() which is
called by read.alignment(). The name is indeed
truncated after the first space.
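
For illustration, here is a minimal base-R sketch of the effect. It is not
the actual read.fasta() code, only an assumption about what cutting a header
at the first space looks like. The two header lines are taken from the report
below; the sequences and the temporary file are made-up placeholders.

```r
# Toy FASTA file: real header lines from the report, placeholder sequences.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(
  ">D50541\t1\t1411\t1411bp\trna\tAbiotrophia defectiva\tAerococcaceae",
  "ACGT--ACGT",
  ">KP233895\t1\t1520\t1520bp\trna\tAbyssivirga alkaniphila\tLachnospiraceae",
  "ACGA--ACGT"
), fasta)

# Read the raw lines and keep the header lines, without the leading ">".
lines <- readLines(fasta)
headers <- sub("^>", "", lines[grepl("^>", lines)])

# Cutting each header at its first space (tabs are left alone)
# gives the same names as those reported for aln$nam.
truncated <- sub(" .*$", "", headers)

print(truncated)  # names cut after the genus
print(headers)    # full header lines, as a stop-gap way to recover them
```

Until the fix is available, reading the header lines directly as above is
one possible way to get the full descriptions back.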

I'll commit a fix that doesn't break previous
code asap.

Best,

JLO

Le 22/02/2018 à 09:39, Haruo Suzuki a écrit :
> Dear Simon,
> 
> I hope all is well with you.
> 
> LTP datasets based on SILVA release 128 were downloaded from [Archive](https://www.arb-silva.de/no_cache/download/archive/living_tree/LTP_release_128/) using:
> -----------------------
> 	wget https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_128/LTPs128_SSU/LTPs128_SSU_aligned.fasta.tar.gz
> 	tar xvzf LTPs128_SSU_aligned.fasta.tar.gz
> -----------------------
> 
> Here are FASTA header lines:
> -----------------------
> $ grep "^>" LTPs128_SSU_aligned.fasta | head -n 2
>> D50541	1	1411	1411bp	rna	Abiotrophia defectiva	Aerococcaceae
>> KP233895	1	1520	1520bp	rna	Abyssivirga alkaniphila	Lachnospiraceae
> -----------------------
> 
> The `read.alignment` function of SeqinR (version 3.4-5) did not return the whole FASTA header lines. The descriptions were truncated, probably because there is a space " " between genus and species in the organism names (e.g. "Abiotrophia defectiva" and "Abyssivirga alkaniphila"), as follows:
> -----------------------
>> aln <- read.alignment("LTPs128_SSU_aligned.fasta", format = "fasta")
>> head(aln$nam, 2)
> [1] "D50541\t1\t1411\t1411bp\trna\tAbiotrophia"
> [2] "KP233895\t1\t1520\t1520bp\trna\tAbyssivirga"
> -----------------------
> 
> # References
> -----------------------
> https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_128/readme_LTP_SSUs128_LSUs123.pdf
> LTPs128_SSU_aligned.fasta: multifasta alignments of type strains. The headers of the sequences accordingly stand for the following information: accession number, start and stop position, length, type of sequence, fullname_ltp, hi_tax_ltp. Also compressed for download as LTPs128_SSU_aligned.fasta.tar.gz
> -----------------------
> 
> Yours sincerely,
> 
> Haruo Suzuki
> 
> 
> On Nov 27, 2017, at 18:14, Simon Penel <simon.penel at univ-lyon1.fr> wrote:
> 


