[Seqinr-forum] rho() and zscore() functions in SeqinR

Jean lobry lobry at biomserv.univ-lyon1.fr
Sun Mar 7 12:25:15 CET 2010


At 11:44 +0000 2/03/10, Coghlan, Avril wrote:
>Dear all,
>
>I have written some code to extend the rho() function in SeqinR so that
>it can calculate rho values for wordsizes of greater than 2.
>I have included my  suggested code below for an extended rho() function.
>It has an extra argument, which is the word size that the user wants to
>calculate rho for (default = 2).
>
>Would you consider extending the current SeqinR rho() function in this
>way?


Hi Avril,

I think it's a very good idea. I have just commited the thereafter
modification for the rho() function, main difference with yours
is that it can handle alphabets of any size. Will be available
on R-forge soon (revision >= 1741).

###
rho <- function (sequence, wordsize = 2, alphabet = s2c("acgt"))
{
   wordcount <- count(sequence, wordsize, freq = FALSE, alphabet = alphabet)
   uni <- count(sequence, 1, freq = TRUE, alphabet = alphabet)

   expected_wordfreq <- function (wordsize, uni)
   {
     if (wordsize == 1)
         return(uni)
     else kronecker(uni, expected_wordfreq(wordsize - 1, uni))
   }

   expected_wordcount <- sum(wordcount)*expected_wordfreq(wordsize, uni)

   return(wordcount/expected_wordcount)
}
###

>By the way, if you do extend the current SeqinR rho() function in this
>way so that it can calculate rho values for wordsizes of greater than 2,
>then as far as I can see, it is very simple to extend the zscore()
>function so that it also can calculate zscore values for wordsizes of
>greater than 2 (by using simulated sequences to calculate the expected
>mean and variance for rho).
>It would just involve adding an extra argument "wordsize" to zscore(),
>and then calling my rho() function (below) with that wordsize value.

Good idea too, but I would like to see first if the new rho() version
does not break previous code.

>I think it would also be nice to give a P-value for a particular zscore.
>
>I think this could just be calculated as:
>     1 - pnorm(zscore, mean=1,sd=1)
>Is this true?
>Would you consider adding this too?

I would feel much more comfortable if the responsability for using
the normal approximation was left on the user side :-)

Best,

Z.

-- 
Pr. Jean R. Lobry (Dr. phil. hab.)
Head of Laboratoire de Police Scientifique de Lyon
Institut National de Police Scientifique
31 av. F. Roosevelt, F-69134 ECULLY CEDEX
Allo: +33 (0)4 72 86 89 60
e-mail: jean.lobry at interieur.gouv.fr




More information about the Seqinr-forum mailing list