[Seqinr-forum] zscore() function in SeqinR - "base" model
Jean lobry
lobry at biomserv.univ-lyon1.fr
Sun Mar 7 12:53:57 CET 2010
>Dear all,
>
>I have been looking at the zscore() function in SeqinR, and noticed that
>the "modele="base" option generates a random sequence using the
>permutation() function.
>
>As far as I can see, zscore() function calls permutation() with default
>options, which means that permutation() will generate a random sequence
>by randomly sampling the letters in the input sequence without
>replacement.
>
>However, I am wondering whether it would be better if zscore() called
>permutation() with the "replace=TRUE" option, so that permutation() will
>generate a random sequence by randomly sampling the letters in the input
>sequence *with replacement*.
>
>I think it would be better to shuffle the letters with replacement,
>because then that would be generating a random sequence by using a
>multinomial probability distribution, where the probabilities of A, C,
>G, and T are set equal to their frequencies in the input sequence. This
>will result in sequences that do not have exactly the same frequencies
>of A, C, G, and T as the input sequence (in contrast, generating a
>random sequence by sampling the letters without replacement will
>generate a random sequence with exactly the same frequencies of A, C, G,
>and T in the input sequence).
>
>My thinking is that if you generate the random sequences that do not
>have exactly the same frequencies of A, C, G, and T as the input
>sequence, you will get a more accurate estimate of the expected mean and
>variance of rho (in particular, of the expected variance of rho, which
>will probably be estimated as a larger number), and this will in turn
>give a more accurate estimate of zscore (probably a more conservative
>estimate of rho, as the estimate for the expected variance of rho will
>probably be larger).
>
>Do you agree with me?
>If so, would you consider changing the lines in the zscore function:
> rhopermut <- sapply(seq(simulations), function(x) {
> rho(permutation(sequence = sequence, modele = modele,
> ...))
> })
>to be:
> rhopermut <- sapply(seq(simulations), function(x) {
> rho(permutation(sequence = sequence, modele = modele,
>replace = TRUE,
> ...))
> })
>instead ?
>
>Kind Regards,
>Avril
>
>Avril Coghlan
>University College Cork, Ireland
>
Hi Avril,
thanks to the dot-dot-dot argument you can already do that if you
want: the 'replace' argument will be forwarded to the permutation()
function. For instance:
###
> set.seed(1)
> sequence <- sample(x = s2c("acgt"), size = 6000, replace = TRUE)
> set.seed(1)
> zscore(sequence, modele = "base", simulation = 10)
aa ac ag at ca cc
cg ct
0.4964146 0.2304339 -0.2904738 -0.5860529 -1.2189081 -0.2670835
0.6720315 1.4506118
ga gc gg gt ta tc
tg tt
-0.5799489 0.5563278 -0.7861945 0.4491624 1.2411472 -0.4528621
0.2757699 -1.6667786
> set.seed(1)
> zscore(sequence, modele = "base", simulation = 10, replace = TRUE)
aa ac ag at ca cc
cg ct
0.9257696 0.2787500 -0.4081940 -0.7942528 -2.0559923 -0.1371734
0.7658307 0.7498670
ga gc gg gt ta tc
tg tt
-0.3817298 0.2818443 -0.7577013 1.1394403 1.0662898 -0.5083124
0.3406378 -0.7612934
###
Not sure to follow you on saying it's better to do so. For me it looks like
removing the margin sums constraints in a chi-square independency
test performed
by simulation.
Best,
Z.
--
Pr. Jean R. Lobry (Dr. phil. hab.)
Head of Laboratoire de Police Scientifique de Lyon
Institut National de Police Scientifique
31 av. F. Roosevelt, F-69134 ECULLY CEDEX
Allo: +33 (0)4 72 86 89 60
e-mail: jean.lobry at interieur.gouv.fr
More information about the Seqinr-forum
mailing list