[Seqinr-forum] read.fasta fails to read a gzip-compressed file from an FTP server

Jean Lobry jean.lobry at univ-lyon1.fr
Sun Apr 22 17:03:06 CEST 2018


Dear Haruo Suzuki,

I think you already have what you want with:

uco(unlist(ec999), index="rscu")

because all CDS here are complete, so concatenating
all genes keeps every codon in frame and introduces
no frameshift.
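
As a cross-check, the pooled RSCU can also be rebuilt by hand from the summed codon counts. This is only a sketch: it assumes the standard genetic code (the default of translate()) and that every CDS length is a multiple of 3. The names counts, aa, rscu_manual and rscu_pooled are illustrative and not part of seqinr.

```r
library(seqinr)
data(ec999)

# Check the assumption behind pooling: every CDS length is a multiple of 3,
# so concatenation keeps all genes in the same reading frame.
all(lengths(ec999) %% 3 == 0)

# Pooled absolute codon counts, summed over all genes.
counts <- rowSums(sapply(ec999, uco, index = "eff"))

# Amino acid encoded by each codon (standard genetic code assumed).
aa <- sapply(names(counts), function(codon) translate(s2c(codon)))

# RSCU = observed count / mean count within the synonymous codon family.
rscu_manual <- counts / ave(counts, aa, FUN = mean)

# Same quantity from the concatenated sequence, for comparison.
rscu_pooled <- uco(unlist(ec999), index = "rscu")
all.equal(as.numeric(rscu_manual), as.numeric(rscu_pooled[names(rscu_manual)]))
```

If the comparison does not come out as expected, the first thing to check is whether some sequences are incomplete or out of frame.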

If you want RSCU values on a gene by gene basis,
just apply the function to the list:

resRSCU <- sapply(ec999, uco, index="rscu")
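
A short sketch of how the per-gene result might be inspected. The summary step is only an illustration (mean_rscu is a hypothetical name); note that averaging per-gene RSCU values is not the same thing as the pooled RSCU of the concatenated genes, because short genes weigh as much as long ones.

```r
library(seqinr)
data(ec999)

# One column per gene, one row per codon.
resRSCU <- sapply(ec999, uco, index = "rscu")
dim(resRSCU)

# A gene that lacks an amino acid has no defined RSCU for that codon family,
# so missing values have to be skipped when summarising across genes.
mean_rscu <- rowMeans(resRSCU, na.rm = TRUE)
head(mean_rscu)
```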

Best,

JLO

On 22/04/2018 at 16:34, Haruo Suzuki wrote:
> Dear Dr. Lobry,
> 
> Thank you for your reply.
> I would be most grateful if you could provide an example code to do the job.
> 
> uco(ec999[[1]], index="rscu") # it worked for a single gene (vector)
> uco(ec999, index="rscu") # it did not work for all genes (list) and generated NA
> uco(unlist(ec999), index="rscu") # for a concatenation of all genes
> 
> I would like to compute RSCU values for Global codon usage like Table 1 in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2671203/
> 
> Yours sincerely,
> Haruo Suzuki
> 
> On Apr 22, 2018, at 22:28, Jean Lobry <jean.lobry at univ-lyon1.fr> wrote:
> 
>> Dear Haruo Suzuki,
>>
>> I think that the option index = "rscu" of the uco() function
>> should do the job.
>>
>> Best,
>>
>> JLO
>>
>> On 22/04/2018 at 14:44, Haruo Suzuki wrote:
>>> Dear Dr. Lobry:
>>> Thank you for your speedy response. The problem was solved by using gzcon().
>>> I was wondering if there is any easy way to compute Relative Synonymous Codon Usage (RSCU) for a group of genes.
>>> Based on Examples for `dotchart.uco` at https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
>>> absolute codon frequencies and relative codon frequencies for a collection of all genes can be computed as follows:
>>>      # Load dataset:
>>>      data(ec999)
>>>      # Compute codon usage for all coding sequences:
>>>      ec999.uco <- sapply(ec999, uco, index="eff")
>>>      # Compute absolute codon frequencies
>>>      af <- rowSums(ec999.uco)
>>>      # Compute relative codon frequencies
>>>      rf <- af / sum(af)
>>>      # How to compute Relative Synonymous Codon Usage (RSCU) values for the collection of all genes (average gene)?
>>> Yours sincerely,
>>> Haruo Suzuki
>>> On Apr 22, 2018, at 3:23, Jean Lobry <jean.lobry at univ-lyon1.fr> wrote:
>>>> Found it!
>>>>
>>>> just use gzcon() to encapsulate the thing, this is explained at:
>>>>
>>>> http://seqinr.r-forge.r-project.org/src/mainmatter/getseqflat.pdf
>>>>
>>>> Here it is:
>>>>
>>>> --------------------- BEGIN
>>>> $ R --quiet
>>>> > library(seqinr)
>>>> > filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
>>>> > ld <- read.fasta(file = gzcon(url(filename)))
>>>> > head(names(ld))
>>>> [1] "lcl|NC_002695.1_rrna_1" "lcl|NC_002695.1_trna_2" "lcl|NC_002695.1_trna_3"
>>>> [4] "lcl|NC_002695.1_rrna_4" "lcl|NC_002695.1_rrna_5" "lcl|NC_002695.1_trna_6"
>>>>
>>>> ----------------------- END
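
The same idea can be tried offline with a local file. This is a minimal sketch, not the exact session above: the temporary file and the two toy sequences are made up for illustration, and file(path, "rb") stands in for url(filename).

```r
library(seqinr)

# Hypothetical toy data: two short sequences written to a gzip-compressed file.
path <- tempfile(fileext = ".fna.gz")
out <- gzfile(path, open = "w")
writeLines(c(">seq1", "atgaaatga", ">seq2", "atgccctaa"), out)
close(out)

# Open the file as a raw binary connection and let gzcon() decompress on the fly.
# For a remote file, url(filename) would take the place of file(path, "rb").
con <- gzcon(file(path, open = "rb"))
ld <- read.fasta(file = con)
close(con)  # we opened the connection ourselves, so close it explicitly

names(ld)
```

Wrapping the connection matters for remote files because the compressed bytes are otherwise handed to the FASTA parser as if they were text.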
>>>>
>>>> Best,
>>>>
>>>> JLO
>>>>
>>>> On 21/04/2018 at 20:02, Jean Lobry wrote:
>>>>> Dear all,
>>>>> I was able to reproduce the well documented faulty behaviour as follows:
>>>>> -------------------------- BEGIN ------------------------------------
>>>>> $ R --quiet
>>>>> > library("seqinr")
>>>>> > filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
>>>>> > ld <- read.fasta(file = filename)
>>>>> Error in substr(lines, 1L, 1L) :
>>>>>    invalid multibyte string at '<aa><89><b4>b<68><8d><a4>c0<ed>[... raw gzip bytes omitted ...]'
>>>>> In addition: There were 27 warnings (use warnings() to see them)
>>>>> > sessionInfo()
>>>>> R version 3.4.1 (2017-06-30)
>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>> Running under: macOS Sierra 10.12.6
>>>>> Matrix products: default
>>>>> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
>>>>> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
>>>>> locale:
>>>>> [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>> other attached packages:
>>>>> [1] seqinr_3.4-5
>>>>> loaded via a namespace (and not attached):
>>>>> [1] compiler_3.4.1 ade4_1.7-8
>>>>> > q()
>>>>> Save workspace image? [y/n/c]: n
>>>>> -------------------------- END ------------------------------------
>>>>> This rings a bell; I'll post a solution as soon as possible.
>>>>> Best,
>>>>> JLO
>>>>> -------- Forwarded message --------
>>>>> Subject:  Re: Error in library(help=seqinr)
>>>>> Date:     Sun, 22 Apr 2018 02:21:23 +0900
>>>>> From:     Haruo Suzuki <haruo at sfc.keio.ac.jp>
>>>>> To:       Simon Penel <simon.penel at univ-lyon1.fr>
>>>>> Cc:       jean.lobry at univ-lyon1.fr
>>>>> Dear Simon,
>>>>> The `read.fasta` function of seqinr package failed to load the gzipped FASTA file directly from the ftp site, simply by specifying a full URL, as shown in attachment.
>>>>> Yours sincerely,
>>>>> Haruo Suzuki
>>>>> _______________________________________________
>>>>> Seqinr-forum mailing list
>>>>> Seqinr-forum at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum