[Seqinr-forum] rho() and zscore() functions in SeqinR

Coghlan, Avril A.Coghlan at ucc.ie
Tue Mar 2 12:44:46 CET 2010


Dear all,

I have written some code to extend the rho() function in SeqinR so that
it can calculate rho values for wordsizes of greater than 2.
I have included my  suggested code below for an extended rho() function.
It has an extra argument, which is the word size that the user wants to
calculate rho for (default = 2).

Would you consider extending the current SeqinR rho() function in this
way?

By the way, if you do extend the current SeqinR rho() function in this
way so that it can calculate rho values for wordsizes of greater than 2,
then as far as I can see, it is very simple to extend the zscore()
function so that it also can calculate zscore values for wordsizes of
greater than 2 (by using simulated sequences to calculate the expected
mean and variance for rho).
It would just involve adding an extra argument "wordsize" to zscore(),
and then calling my rho() function (below) with that wordsize value.

I think it would also be nice to give a P-value for a particular zscore.

I think this could just be calculated as: 
    1 - pnorm(zscore, mean=1,sd=1)
Is this true?
Would you consider adding this too?

Kind Regards,
Avril

Avril Coghlan
University College Cork, Ireland

rho <- function (sequence, alphabet = s2c("acgt"), wordsize=2)
{
   wordcount <- count(sequence, wordsize, freq=TRUE, alphabet =
alphabet)
   uni <- count(sequence, 1, freq = TRUE, alphabet = alphabet)
 
   # Calculate the total number of possible words of size "wordsize"
   total_words <- wordsize^4

   # Calculate the expected wordcount
   expected_wordcount <- rep(uni,(4^(wordsize-1))) # The expected
frequencies of the first letters in the total_words possible words

   for (i in 1:(wordsize-1)) # Find the expected frequencies of the next
(wordsize-1) letters in the total_words possible words
   {
      # If this is the second letter of a 4-letter word, should have:
i=1, rep(rep(uni, each = 4),16)
      # If this is the third letter of a 4-letter word, should have:
i=2, rep(rep(uni, each = 16, 4)
      # If this is the fourth letter of a 4-letter word, should have:
i=3, rep(rep(uni, each = 64, 1)
      expected_freq_for_letter <- rep(rep(uni, each = 4^i),
4^(wordsize-i-1))
      expected_wordcount <- expected_wordcount *
expected_freq_for_letter
   }

   rho <- wordcount/expected_wordcount

   return(rho)
}



More information about the Seqinr-forum mailing list