[Seqinr-forum] rho() and zscore() functions in SeqinR
Coghlan, Avril
A.Coghlan at ucc.ie
Tue Mar 2 12:44:46 CET 2010
Dear all,
I have written some code to extend the rho() function in SeqinR so that
it can calculate rho values for wordsizes of greater than 2.
I have included my suggested code below for an extended rho() function.
It has an extra argument, which is the word size that the user wants to
calculate rho for (default = 2).
Would you consider extending the current SeqinR rho() function in this
way?
By the way, if you do extend the current SeqinR rho() function in this
way so that it can calculate rho values for wordsizes of greater than 2,
then as far as I can see, it is very simple to extend the zscore()
function so that it also can calculate zscore values for wordsizes of
greater than 2 (by using simulated sequences to calculate the expected
mean and variance for rho).
It would just involve adding an extra argument "wordsize" to zscore(),
and then calling my rho() function (below) with that wordsize value.
I think it would also be nice to give a P-value for a particular zscore.
I think this could just be calculated as:
1 - pnorm(zscore, mean=1,sd=1)
Is this true?
Would you consider adding this too?
Kind Regards,
Avril
Avril Coghlan
University College Cork, Ireland
rho <- function (sequence, alphabet = s2c("acgt"), wordsize=2)
{
wordcount <- count(sequence, wordsize, freq=TRUE, alphabet =
alphabet)
uni <- count(sequence, 1, freq = TRUE, alphabet = alphabet)
# Calculate the total number of possible words of size "wordsize"
total_words <- wordsize^4
# Calculate the expected wordcount
expected_wordcount <- rep(uni,(4^(wordsize-1))) # The expected
frequencies of the first letters in the total_words possible words
for (i in 1:(wordsize-1)) # Find the expected frequencies of the next
(wordsize-1) letters in the total_words possible words
{
# If this is the second letter of a 4-letter word, should have:
i=1, rep(rep(uni, each = 4),16)
# If this is the third letter of a 4-letter word, should have:
i=2, rep(rep(uni, each = 16, 4)
# If this is the fourth letter of a 4-letter word, should have:
i=3, rep(rep(uni, each = 64, 1)
expected_freq_for_letter <- rep(rep(uni, each = 4^i),
4^(wordsize-i-1))
expected_wordcount <- expected_wordcount *
expected_freq_for_letter
}
rho <- wordcount/expected_wordcount
return(rho)
}
More information about the Seqinr-forum
mailing list