[GenABEL-dev] Roadmap ProbABEL

Wed Jan 8 13:23:02 CET 2014

Hi Yurii,

On 07-01-14 09:09, Yury Aulchenko wrote:
>
> On Jan 6, 2014, at 23:21, Maarten Kooyman <kooyman at gmail.com> wrote:
>
>> Dear List,
>>
>> Lennart and I did discus a roadmap for ProbABEL. We made this roadmap after comparing ProbABEL with other GWAS applications (more on this later). We are welcome for comments, ideas or patches. You will find here a small summary of our discussion.
>>
>>
>> -ProbABEL 0.4.3: bugfix release
>> - Fix regression in converting numbers from filevector/DatABEL files. Lennart already started on this :http://lists.r-forge.r-project.org/pipermail/genabel-commits/2014-January/000922.html
>>
>> -ProbABEL 0.5.0: P-values and Faster
>> - Add P-values to output. (Most likely with help of BOOST library)
>> - Log likelihood is disabled by default since wald test is better.
>
> In what sense it is better? In principle, it can be argued that the likelihood ratio test is the "better" because of its statistical properties (after all, Wald is an approximation to the LRT); score is "better" because it is faster...
>
> I can imagine that Wald is "better" because of technical reasons (which is totally fine) - but please comment.
>

After I had re-implemented the LRT-based chi^2 in ProbABEL I had a
discussion with a colleague who knows more about statistics and she said
that Wald was more accurate than LRT and reminded me of an example we
had in a statistical genetics course.

Yesterday I looked up the example (see below) and it turns out she was
wrong. As you can see the LRT-based p-value is closer to the exact one.

So you are right. We can scratch this from the roadmap.

The example:

Say we want to do hypothesis testing on a binomial problem:
tossing a coin 20 times, we observe 4 heads. Question: is the coin fair?

H_0: p = 0.5
H_1: p != 0.5

The exact (binomial) solution can be calculated in that case, as well as
the normal approximation, the Wald test, the chi^2 test and the LRT. I
quickly coded this in R (see attachment) and got
the following results:

     pval.binom pval.norm pval.chi2 pval.wald   pval.lrt
     0.01181793 0.0139063 0.0139063 0.002107897 0.005492213

So, although the p-values of Wald and LRT are quite far off, the
LRT-based p-value is closer to the binomial one.

Thanks for pointing this out!

Lennart.

> best,
> Yurii
>
>> - masking data in matrix before regression in a more effective way. Now this operation is done twice.
>> - Vectorize code for palinear with EIGEN library
>> - Optimize compiler flags: executables are unnecessary large in size and matrix calculations can go faster.
>> - Other fixes/opportunities.
>>
>> - ProbABEL 0.5.1: palogist optimisations and bugfixes.
>> - Vectorize code for palogist
>> - bug fixes
>>
>>
>> With kind regards,
>>
>> Maarten
>>
>>
>>
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
## Hypothesis testing and various approximations
##
## Time-stamp: <2014-01-08 10:53:52 (L.C. Karssen)>
##
## Problem:
## we want to do hypothesis testing on a binomial problem:
## toss a coin 20 times, observe 4 heads. Question is the coin fair?

p <- 0.5                                # prob for heads if coin is fair
q <- 1 - p
n <- 20                                 # nr of tries
X <- 4                                  # observed nr of heads

bot <- 0:X
top <- (n - X):n

## Binomial (exaxt):
pval.binom <- sum(dbinom(bot, n, p), dbinom(top, n, p))

## Normal approximation (score test):
mean <- n * p
stddev <- sqrt(n * p * q)
z.norm <- (X + 0.5 - mean) / stddev     # 0.5 is continuity correction
pval.norm <- 2 * pnorm(z.norm)

## Wald test:
p.obs <- X / n
q.obs <- 1 - p.obs
stddev <- sqrt(n * p.obs * q.obs)
z.wald <- (X + 0.5 - mean)/stddev
pval.wald <- 2 * pnorm(z.wald)

## Chi^2 test
## class 1: 4 heads observed, 10 expected
## class 2: 16 tails observed, 10 expected
chi2 <- (X - mean + 0.5)^2 / mean + ((n - X) - mean - 0.5)^2 / mean
pval.chi2 <- pchisq(chi2, df=1, lower.tail=FALSE)

## Likelihood-ratio test
## H_0: p = 0.5
## H_1: p.obs = X/n = 4/20
p.0 <- dbinom(X, n, p)
p.1 <- dbinom(X, n, p.obs)
lrt <- -2 * log(p.0 / p.1)
pval.lrt <- pchisq(lrt, df=1, lower.tail=FALSE)

results <- data.frame(pval.binom, pval.norm, pval.chi2, pval.wald, pval.lrt)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140108/c2c2b836/attachment.sig>