[GenABEL-dev] faster polygenic

Mon Mar 7 22:10:45 CET 2011

Xia et al.,

here are results

> I just 'sudo-randomly' tried some different argument value in the hglm function call, and happened to have solved the bias issue (at least it seems to be so). Please try putting the following arguments in your hglm wrapper:
>
> hglm(y = ph, X = desmat, Z = L, conv = 1e-6, maxit = 40)
>
> I re-did 100 simulations and the bias is not sig from 0 with p-value = 0.77!
>

To be even more strict, I have set

hglm( ... , conv = 1e-8, maxit = 100)

and ran the tests. Results attached as PDF (+ the script)

First, some great news:

See figure, columns a) and b): there is almost 1-to-1 identity in h2
estimates generated by polygenic_hglm and FMM. I take this as a sign
that both of them do a great job, while 'old' polygenic underestimates
h2 slightly.

Do not quite understand whathappens with old -- and not quite willing
to dig into it yet, as lambdas are closest to 1 with old -- so I think
we stick to the old form of old-polygenic till we can explain the
whole story. Will wait till polygenic-FMM is out to do another round
of comparisons.

So, I thin Xia's suggestion

> I guess the problem here is just numerical. Since IBS matrix is generally sparse (close to identity), the estimation of variance components will be sensitive numerically. Just make the convergence criteria more strict, and it works.
>

is right. And the 0.4-strangeness must be something to deal with the
starting values, what do you think?

> Note that this might slow down hglm a bit (not very much I guess), but I think it should be more than 20x faster than the old polygenic. Not familiar with fmm, but: 1. our hglm can deal with any GLM family of the trait distribution, even non-normal genetic effects. 2. we produce standard errors for everything - every part is checkable using h-likelihood :)
>

It did become slower with the strict criteria -- now speed-up is x5.3
(range: x1.7 - x80.8). Still, this is quite incredible :) Next step
will be William's polygenic_fmm -- his speed-up is
more-than-incredible :)

Ok, what about finalizing this so we have 'polygenic_hglm'
ready-to-use in the next release?

Here is a plan:

I will take care of updating documentation to 'polygenic' so it
references to 'polygenic_hglm' as an alternative; also main package
documentation. You will see that on genabel-commits.

---- Things we absolutely need before it can be released:

* Test if resulting object works at all with grammar procedure and
produces roughly similar results with old-polygenic object
* Pull the se's of the estimates in the output -- this allows for
testing significance of covariates
* Update documentation:
** Xia, please make sure you mention everything you want to mention --
give proper credit, references, etc.
** Better explanation of the method. We can of cause always refer
people to the main 'hglm' man page, but it will be nice to outline --
e.g. that REML is used by default
** Examples: how do you test for a SNP effect? See for example of
situation with old polygenic:

http://forum.genabel.org/viewtopic.php?f=6&t=134&sid=c5dcf7fc39fd3f7d07b1308bafed5286

----- Things, which will be great to have, but not critical to the release:

** Similar to old-polygenic fixh2 argument -- in which case we can
also output -2xLogLik (but need to make sure people understand that eg
in REML setting this can only be used for testing random effect / h2)
[also look up what are other options available in old-polygenic but
not in polygenic_hglm]
** optimization of the code -- many things which are computed in
wrapper should have been computed by 'hglm', so we can gain some speed

---- Nice, but not-so-urgent things:

* unify and formalize 'polygenic' as S4-class

best wishes, and hear from you,
Yurii

PS To be followed-up later: why the hell does old-polygenic+mmscore
still generates Lambdas closest to 1 (see figure, columns c), d))?
Something to deal with uncertainty in gkin, as suggested by William
(very interesting question!), conservativity of the 'mmscore' as
suggested by Yurii, etc. At the end, we want to know which one is the
most powerful procedure. This is something we can test directly.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_old_hglm_FMM.pdf
Type: application/pdf
Size: 149460 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20110307/04a18fcd/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_polygenic_hglm_fmm_2011.03.07.R
Type: application/octet-stream
Size: 3723 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20110307/04a18fcd/attachment-0001.obj>