[Seqinr-forum] Query regarding running "Online Synonymous Codon Usage Analyses with the ade4 and seqinR packages"

sourav roy choudhury src.bio at gmail.com
Mon Nov 9 20:33:43 CET 2009


Dear Dr. Jean Lobry and Colleagues,

The code worked fine, the analysis is done and plots are generated as
it was expected.
Thank you very much Dr. Lobry for your cordial help. It was so nice of you.

with regards,
Sincerely yours
Sourav


SOURAV ROY CHOUDHURY.
Dept of Neuroscience
University of Calcutta
West Bengal, INDIA







On Sat, Nov 7, 2009 at 2:56 PM, Jean lobry <lobry at biomserv.univ-lyon1.fr> wrote:
>> Dear Dr. Jean Lobry and Colleagues,
>>
>> Yes, the progress bar went to the end and it is now showing a total of
>> 2207194 codons. So now we positively have the whole data.
>>
>>
>> Thank you very much, have a nice weekend ahead.
>>
>> Sincerely
>> Sourav
>>
>>
>> SOURAV ROY CHOUDHURY.
>> Dept of Neuroscience
>> University of Calcutta
>> West Bengal, INDIA
>>
>
> Dear Sourav,
>
> So now you can run the analyses, try this:
>
> ######
> load("tabcod.RData") # load dataset
> library(seqinr)
> library(ade4) # for multivariate analyses
>
> #
> # Select CDS with >50 codons:
> #
>
> tabcod2 <- tabcod[rowSums(tabcod) > 50, ]
>
> #
> # Remove Stop, Trp and Met codons:
> #
>
> tablecode() # just to see the genetic code
> removed.codons <- match(c("taa", "tag", "tga", "tgg", "atg"),
> colnames(tabcod))
> tabcod2 <- tabcod2[ , -removed.codons]
> nrow(tabcod2) # 4882 CDS
> ncol(tabcod2) # 59 codons analyzed
> sum(tabcod2)  # 2,128,850 total codon count
>
> #
> # Make a qualitative variable for amino-acid:
> #
>
> facaa <- factor(sapply(colnames(tabcod2), function(x)
> aaa(translate(s2c(x)))))
> facaa
> length(levels(facaa)) # 18 since 2 aminoacids were deleted
>
> #
> # Make a qualitative variable for genomes:
> #
>
> facgen <- factor(sapply(rownames(tabcod2),
>  function(x) unlist(strsplit(x, split = "\\."))[1]))
> length(levels(facgen)) # 49 genomes
> levels(facgen)
> # The first 4 characters "NC_0" are not informative, we delete them:
> levels(facgen) <- substr(levels(facgen), 5, 20)
>
> #
> # Run global codon usage analysis:
> #
>
> tot <- dudi.coa(tabcod2, scann = F)
>
> #
> # Run synonymous codon uage analysis:
> #
>
> witaa <- t(within(t(tot), facaa, scann = F))
>
> #
> # Run non-synonymous codon usage analysis (aa usage)
> #
>
> betaa <- t(between(t(tot), facaa, scann = F))
>
> #
> # Show eigenvalue screeplots:
> #
>
> par(mfrow = c(1,3))
> ylim <- c(0, tot$eig[1])
> neig <- 15
> barplot(witaa$eig[1: neig], ylim = ylim, main = "Synonymous")
> barplot(betaa$eig[1: neig], ylim = ylim, main = "Non synonymous")
> barplot(tot$eig[1: neig], ylim = ylim, main = "Total")
>
> #
> # There is one dominating factor, bowth at the synonymous and
> # non-synonymous level. Most of the variability is at the
> # synonymous level, as expected. Let's see the first
> # factorial map for synonymous codon usage:
>
> par(mfrow = c(1, 1))
> s.class(witaa$li, facgen, cpoint = 0, clab = 0.5, col = rainbow(49),
>  cstar = 0, axesell = F)
> s.label(witaa$co, add.plot = TRUE, clab = 0.75)
>
> #
> # The most important factor of variability is obvioulsy the GC
> # content since all AT-ending codons are on one side and all
> # GC-ending codons on the other side. The genome hypothesis holds
> # here since ellipses are well discriminated.
> #
> ######
>
> Best,
>
> Jean
> --
> Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
> Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
> 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
> allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
> http://pbil.univ-lyon1.fr/members/lobry/
>
>
>


More information about the Seqinr-forum mailing list