[Seqinr-forum] zscore() function in SeqinR - "base" model

Sun Mar 7 12:53:57 CET 2010

>Dear all,
>
>I have been looking at the zscore() function in SeqinR, and noticed that
>the "modele="base" option generates a random sequence using the
>permutation() function.
>
>As far as I can see, zscore() function calls permutation() with default
>options, which means that permutation() will generate a random sequence
>by randomly sampling the letters in the input sequence without
>replacement.
>
>However, I am wondering whether it would be better if zscore() called
>permutation() with the "replace=TRUE" option, so that permutation() will
>generate a random sequence by randomly sampling the letters in the input
>sequence *with replacement*.
>
>I think it would be better to shuffle the letters with replacement,
>because then that would be generating a random sequence by using a
>multinomial probability distribution, where the probabilities of A, C,
>G, and T are set equal to their frequencies in the input sequence. This
>will result in sequences that do not have exactly the same frequencies
>of A, C, G, and T as the input sequence (in contrast, generating a
>random sequence by sampling the letters without replacement will
>generate a random sequence with exactly the same frequencies of A, C, G,
>and T in the input sequence).
>
>My thinking is that if you generate the random sequences that do not
>have exactly the same frequencies of A, C, G, and T as the input
>sequence, you will get a more accurate estimate of the expected mean and
>variance of rho (in particular, of the expected variance of rho, which
>will probably be estimated as a larger number), and this will in turn
>give a more accurate estimate of zscore (probably a more conservative
>estimate of rho, as the estimate for the expected variance of rho will
>probably be larger).
>
>Do you agree with me?
>If so, would you consider changing the lines in the zscore function:
>    rhopermut <- sapply(seq(simulations), function(x) {
>             rho(permutation(sequence = sequence, modele = modele,
>                 ...))
>         })
>to be:
>    rhopermut <- sapply(seq(simulations), function(x) {
>             rho(permutation(sequence = sequence, modele = modele,
>replace = TRUE,
>                 ...))
>         })
>instead ?
>
>Kind Regards,
>Avril
>
>Avril Coghlan
>University College Cork, Ireland
>

Hi Avril,

thanks to the dot-dot-dot argument you can already do that if you
want: the 'replace' argument will be forwarded to the permutation()
function. For instance:

###
>  set.seed(1)
>  sequence <- sample(x = s2c("acgt"), size = 6000, replace = TRUE)
>  set.seed(1)
>  zscore(sequence, modele = "base", simulation = 10)

         aa         ac         ag         at         ca         cc 
cg         ct
  0.4964146  0.2304339 -0.2904738 -0.5860529 -1.2189081 -0.2670835 
0.6720315  1.4506118
         ga         gc         gg         gt         ta         tc 
tg         tt
-0.5799489  0.5563278 -0.7861945  0.4491624  1.2411472 -0.4528621 
0.2757699 -1.6667786
>  set.seed(1)
>  zscore(sequence, modele = "base", simulation = 10, replace = TRUE)

         aa         ac         ag         at         ca         cc 
cg         ct
  0.9257696  0.2787500 -0.4081940 -0.7942528 -2.0559923 -0.1371734 
0.7658307  0.7498670
         ga         gc         gg         gt         ta         tc 
tg         tt
-0.3817298  0.2818443 -0.7577013  1.1394403  1.0662898 -0.5083124 
0.3406378 -0.7612934
###

Not sure to follow you on saying it's better to do so. For me it looks like
removing the margin sums constraints in a chi-square independency 
test performed
by simulation.

Best,

Z.

-- 
Pr. Jean R. Lobry (Dr. phil. hab.)
Head of Laboratoire de Police Scientifique de Lyon
Institut National de Police Scientifique
31 av. F. Roosevelt, F-69134 ECULLY CEDEX
Allo: +33 (0)4 72 86 89 60
e-mail: jean.lobry at interieur.gouv.fr