[Seqinr-forum] Query regarding running "Online Synonymous Codon Usage Analyses with the ade4 and seqinR packages"
Jean lobry
lobry at biomserv.univ-lyon1.fr
Sat Nov 7 10:26:54 CET 2009
>Dear Dr. Jean Lobry and Colleagues,
>
>Yes, the progress bar went to the end and it is now showing a total of
>2207194 codons. So now we positively have the whole data.
>
>
>Thank you very much, have a nice weekend ahead.
>
>Sincerely
>Sourav
>
>
>SOURAV ROY CHOUDHURY.
>Dept of Neuroscience
>University of Calcutta
>West Bengal, INDIA
>
Dear Sourav,
So now you can run the analyses, try this:
######
load("tabcod.RData") # load dataset
library(seqinr)
library(ade4) # for multivariate analyses
#
# Select CDS with >50 codons:
#
tabcod2 <- tabcod[rowSums(tabcod) > 50, ]
#
# Remove Stop, Trp and Met codons:
#
tablecode() # just to see the genetic code
removed.codons <- match(c("taa", "tag", "tga", "tgg", "atg"), colnames(tabcod))
tabcod2 <- tabcod2[ , -removed.codons]
nrow(tabcod2) # 4882 CDS
ncol(tabcod2) # 59 codons analyzed
sum(tabcod2) # 2,128,850 total codon count
#
# Make a qualitative variable for amino-acid:
#
facaa <- factor(sapply(colnames(tabcod2), function(x) aaa(translate(s2c(x)))))
facaa
length(levels(facaa)) # 18 since 2 aminoacids were deleted
#
# Make a qualitative variable for genomes:
#
facgen <- factor(sapply(rownames(tabcod2),
function(x) unlist(strsplit(x, split = "\\."))[1]))
length(levels(facgen)) # 49 genomes
levels(facgen)
# The first 4 characters "NC_0" are not informative, we delete them:
levels(facgen) <- substr(levels(facgen), 5, 20)
#
# Run global codon usage analysis:
#
tot <- dudi.coa(tabcod2, scann = F)
#
# Run synonymous codon uage analysis:
#
witaa <- t(within(t(tot), facaa, scann = F))
#
# Run non-synonymous codon usage analysis (aa usage)
#
betaa <- t(between(t(tot), facaa, scann = F))
#
# Show eigenvalue screeplots:
#
par(mfrow = c(1,3))
ylim <- c(0, tot$eig[1])
neig <- 15
barplot(witaa$eig[1: neig], ylim = ylim, main = "Synonymous")
barplot(betaa$eig[1: neig], ylim = ylim, main = "Non synonymous")
barplot(tot$eig[1: neig], ylim = ylim, main = "Total")
#
# There is one dominating factor, bowth at the synonymous and
# non-synonymous level. Most of the variability is at the
# synonymous level, as expected. Let's see the first
# factorial map for synonymous codon usage:
par(mfrow = c(1, 1))
s.class(witaa$li, facgen, cpoint = 0, clab = 0.5, col = rainbow(49),
cstar = 0, axesell = F)
s.label(witaa$co, add.plot = TRUE, clab = 0.75)
#
# The most important factor of variability is obvioulsy the GC
# content since all AT-ending codons are on one side and all
# GC-ending codons on the other side. The genome hypothesis holds
# here since ellipses are well discriminated.
#
######
Best,
Jean
--
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo : +33 472 43 27 56 fax : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/
More information about the Seqinr-forum
mailing list