[GenABEL-dev] Roadmap ProbABEL

Yurii Aulchenko yurii.aulchenko at gmail.com
Tue Jan 21 11:47:07 CET 2014


Re: Wald 2df

I have figured it out a couple of times, so if we ever have an hour to
spend I should be able to reproduce that in R...

----------------------
Yurii Aulchenko
(sent from mobile device)

> On Jan 21, 2014, at 3:11 PM, "L.C. Karssen" <lennart at karssen.org> wrote:
>
> Dear Maria,
>
> You are right. As developers we should be more explicit about what we mean
> by 'better'. At the moment we would like to have the most accurate
> test in ProbABEL, which to my knowledge is the LRT (except when mmscore
> is used, then the Wald test is used). Some users have asked for the Wald
> test in all cases. We may add that as an option.
>
> By the way, for the 2DF model in combination with mmscore we still have
> not implemented a chi^2 statistic. This would require the implementation
> of the 2df Wald test as outlined in the ProbABEL paper, but when I tried
> to implement it last year I didn't get the expected results. I guess I
> should invest some more time in that, maybe I just missed something
> simple (help is very welcome!).
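For reference, a minimal sketch of a generic 2 df Wald test, assuming the estimates of the two genotype effects and their 2x2 covariance matrix are already available. This is the textbook form W = beta' V^{-1} beta, compared against a chi^2 distribution with 2 degrees of freedom; it is not taken from the ProbABEL source, it does not address how the covariance matrix is obtained under mmscore, and the function name `wald_2df` is hypothetical.

```python
import math


def wald_2df(beta, cov):
    """Joint Wald test for two regression coefficients.

    beta: (b1, b2), the two effect estimates.
    cov:  ((v11, v12), (v21, v22)), their covariance matrix.
    Returns (W, p) with W = beta' cov^{-1} beta and p from chi^2 with 2 df.
    """
    b1, b2 = beta
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form written out with the explicit inverse of a 2x2 matrix.
    w = (b1 * b1 * v22 - b1 * b2 * (v12 + v21) + b2 * b2 * v11) / det
    # For 2 degrees of freedom the chi^2 upper tail is exactly exp(-W/2).
    return w, math.exp(-w / 2.0)


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    w, p = wald_2df((1.0, 1.0), ((1.0, 0.5), (0.5, 1.0)))
    print(w, p)
```

A common slip in such an implementation is to use only the diagonal of the covariance matrix; the off-diagonal term matters whenever the two estimates are correlated.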
>
>
> Best,
>
> Lennart.
>
>> On 17-01-14 12:40, Maria G wrote:
>> Dear all,
>>
>> thanks for the open discussion!
>>
>> For me personally it was always an important question what developers mean by choosing a specific test as the default. The same goes for the term "better": I have also noticed that in many scripts meant for huge data sets, "better" means "faster" rather than "more accurate". Which is fine, but good to know in advance ;)
>>
>> best regards
>>
>> Maria
>>
>> 08.01.2014, 16:23, "L.C. Karssen" <lennart at karssen.org>:
>>> Hi Yurii,
>>>
>>>> On 07-01-14 09:09, Yury Aulchenko wrote:
>>>>
>>>>> On Jan 6, 2014, at 23:21, Maarten Kooyman <kooyman at gmail.com> wrote:
>>>>> Dear List,
>>>>>
>>>>> Lennart and I discussed a roadmap for ProbABEL. We made this roadmap after comparing ProbABEL with other GWAS applications (more on this later). Comments, ideas and patches are welcome. Below is a short summary of our discussion.
>>>>>
>>>>> - ProbABEL 0.4.3: bugfix release
>>>>> - Fix regression in converting numbers from filevector/DatABEL files. Lennart already started on this: http://lists.r-forge.r-project.org/pipermail/genabel-commits/2014-January/000922.html
>>>>>
>>>>> - ProbABEL 0.5.0: P-values and faster
>>>>> - Add P-values to output. (Most likely with the help of the BOOST library)
>>>>> - Log likelihood is disabled by default since the Wald test is better.
>>>> In what sense is it better? In principle, it can be argued that the likelihood ratio test is "better" because of its statistical properties (after all, Wald is an approximation to the LRT); score is "better" because it is faster...
>>>>
>>>> I can imagine that Wald is "better" because of technical reasons (which is totally fine) - but please comment.
>>>
>>> After I had re-implemented the LRT-based chi^2 in ProbABEL I had a
>>> discussion with a colleague who knows more about statistics and she said
>>> that Wald was more accurate than LRT and reminded me of an example we
>>> had in a statistical genetics course.
>>>
>>> Yesterday I looked up the example (see below) and it turns out she was
>>> wrong. As you can see the LRT-based p-value is closer to the exact one.
>>>
>>> So you are right. We can scratch this from the roadmap.
>>>
>>> The example:
>>>
>>> Say we want to do hypothesis testing on a binomial problem:
>>> tossing a coin 20 times, we observe 4 heads. Question: is the coin fair?
>>>
>>> H_0: p = 0.5
>>> H_1: p != 0.5
>>>
>>> The exact (binomial) solution can be calculated in that case, as well as
>>> the normal approximation, the Wald test, the chi^2 test and the LRT. I
>>> quickly coded this in R (see attachment) and got
>>> the following results:
>>>
>>>     pval.binom  pval.norm  pval.chi2  pval.wald    pval.lrt
>>>     0.01181793  0.0139063  0.0139063  0.002107897  0.005492213
>>>
>>> So, although the p-values of Wald and LRT are quite far off, the
>>> LRT-based p-value is closer to the binomial one.
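The R attachment is not part of this archive, so here is a minimal sketch of the same comparison using only the Python standard library. It assumes a continuity correction of 0.5 counts in the normal, chi^2 and Wald tests; that is an inference from the quoted numbers, not something taken from the original script. Under that assumption the results should come out close to the table above. The function names are hypothetical.

```python
import math


def normal_two_sided(z):
    """Two-sided p-value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_sf_1df(x):
    """Upper tail of the chi^2 distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def coin_pvalues(n=20, k=4, p0=0.5):
    """P-values for H_0: p = p0 given k heads in n tosses (needs 0 < k < n)."""
    phat = k / n

    # Exact binomial test: add up all outcomes that are no more likely
    # under H_0 than the observed one.
    pmf = [math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    p_binom = sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7))

    # Deviation in counts, shrunk by the assumed continuity correction.
    dev = max(abs(k - n * p0) - 0.5, 0.0)

    # Normal approximation: variance evaluated under H_0.
    p_norm = normal_two_sided(dev / math.sqrt(n * p0 * (1 - p0)))

    # Pearson chi^2 on the two cells (heads, tails), same correction.
    chi2 = dev**2 / (n * p0) + dev**2 / (n * (1 - p0))
    p_chi2 = chi2_sf_1df(chi2)

    # Wald test: variance evaluated at the estimate phat.
    p_wald = normal_two_sided(dev / math.sqrt(n * phat * (1 - phat)))

    # Likelihood ratio test: twice the log-likelihood difference.
    g = 2.0 * (k * math.log(phat / p0)
               + (n - k) * math.log((1 - phat) / (1 - p0)))
    p_lrt = chi2_sf_1df(g)

    return {"binom": p_binom, "norm": p_norm, "chi2": p_chi2,
            "wald": p_wald, "lrt": p_lrt}


if __name__ == "__main__":
    for name, value in coin_pvalues().items():
        print(f"pval.{name:6s} {value:.6g}")
```

The only difference between the normal approximation and the Wald test here is where the variance is evaluated: at p0 = 0.5 for the former, at the estimate 0.2 for the latter, which is what pushes the Wald p-value further from the exact one.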
>>>
>>> Thanks for pointing this out!
>>>
>>> Lennart.
>>>
>>>> best,
>>>> Yurii
>>>>> - Mask data in the matrix before regression more efficiently. At the moment this operation is done twice.
>>>>> - Vectorize code for palinear with the EIGEN library
>>>>> - Optimize compiler flags: executables are unnecessarily large and matrix calculations can go faster.
>>>>> - Other fixes/opportunities.
>>>>>
>>>>> - ProbABEL 0.5.1: palogist optimisations and bugfixes.
>>>>> - Vectorize code for palogist
>>>>> - bug fixes
>>>>>
>>>>> With kind regards,
>>>>>
>>>>> Maarten
>>>>>
>>>>> _______________________________________________
>>>>> genabel-devel mailing list
>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>
>>> --
>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>> L.C. Karssen
>>> Utrecht
>>> The Netherlands
>>>
>>> lennart at karssen.org
>>> http://blog.karssen.org
>>> GPG key ID: A88F554A
>>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>>
>
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

