[Seqinr-forum] zscore() function in SeqinR - "base" model

Coghlan, Avril A.Coghlan at ucc.ie
Tue Mar 2 13:21:49 CET 2010


Dear all,

I have been looking at the zscore() function in SeqinR, and noticed that
the "modele="base" option generates a random sequence using the
permutation() function.  

As far as I can see, zscore() function calls permutation() with default
options, which means that permutation() will generate a random sequence
by randomly sampling the letters in the input sequence without
replacement. 

However, I am wondering whether it would be better if zscore() called
permutation() with the "replace=TRUE" option, so that permutation() will
generate a random sequence by randomly sampling the letters in the input
sequence *with replacement*. 

I think it would be better to shuffle the letters with replacement,
because then that would be generating a random sequence by using a
multinomial probability distribution, where the probabilities of A, C,
G, and T are set equal to their frequencies in the input sequence. This
will result in sequences that do not have exactly the same frequencies
of A, C, G, and T as the input sequence (in contrast, generating a
random sequence by sampling the letters without replacement will
generate a random sequence with exactly the same frequencies of A, C, G,
and T in the input sequence).

My thinking is that if you generate the random sequences that do not
have exactly the same frequencies of A, C, G, and T as the input
sequence, you will get a more accurate estimate of the expected mean and
variance of rho (in particular, of the expected variance of rho, which
will probably be estimated as a larger number), and this will in turn
give a more accurate estimate of zscore (probably a more conservative
estimate of rho, as the estimate for the expected variance of rho will
probably be larger). 

Do you agree with me? 
If so, would you consider changing the lines in the zscore function:
   rhopermut <- sapply(seq(simulations), function(x) {
            rho(permutation(sequence = sequence, modele = modele, 
                ...))
        })
to be:
   rhopermut <- sapply(seq(simulations), function(x) {
            rho(permutation(sequence = sequence, modele = modele,
replace = TRUE, 
                ...))
        })
instead ?

Kind Regards,
Avril 

Avril Coghlan
University College Cork, Ireland










More information about the Seqinr-forum mailing list