[GenABEL-dev] faster polygenic

Tue Mar 22 17:33:11 CET 2011

Hi Yurii,

Attached is an updated version of 'polygenic_hglm' function with documentation. I've fixed:

- Brief description of the estimation procedure;

- Reference list;

- An example for testing significance of the polygenic variance component;

- Some minor typos.

Perhaps the default 'conv' can be relaxed a bit. 1e-6 is good enough to me, and also 1e-8 might be two strict for some binary trait - getting a precision of 8 digits for binary data is too difficult, sometimes it might not even converge.

We are working on updating the 'hglm' package on CRAN, where I will make extracting internal variance-covariance matrix possible.

Cheers

Xia

-------------- next part --------------
A non-text attachment was scrubbed...
Name: polygenic_hglm.R
Type: application/octet-stream
Size: 4993 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20110322/1179c1b3/attachment.obj>
-------------- next part --------------

On Mar 7, 2011, at 10:10 PM, Yurii Aulchenko wrote:

> Xia et al.,
> 
> here are results
> 
>> I just 'sudo-randomly' tried some different argument value in the hglm function call, and happened to have solved the bias issue (at least it seems to be so). Please try putting the following arguments in your hglm wrapper:
>> 
>> hglm(y = ph, X = desmat, Z = L, conv = 1e-6, maxit = 40)
>> 
>> I re-did 100 simulations and the bias is not sig from 0 with p-value = 0.77!
>> 
> 
> To be even more strict, I have set
> 
> hglm( ... , conv = 1e-8, maxit = 100)
> 
> and ran the tests. Results attached as PDF (+ the script)
> 
> First, some great news:
> 
> See figure, columns a) and b): there is almost 1-to-1 identity in h2
> estimates generated by polygenic_hglm and FMM. I take this as a sign
> that both of them do a great job, while 'old' polygenic underestimates
> h2 slightly.
> 
> Do not quite understand whathappens with old -- and not quite willing
> to dig into it yet, as lambdas are closest to 1 with old -- so I think
> we stick to the old form of old-polygenic till we can explain the
> whole story. Will wait till polygenic-FMM is out to do another round
> of comparisons.
> 
> So, I thin Xia's suggestion
> 
>> I guess the problem here is just numerical. Since IBS matrix is generally sparse (close to identity), the estimation of variance components will be sensitive numerically. Just make the convergence criteria more strict, and it works.
>> 
> 
> is right. And the 0.4-strangeness must be something to deal with the
> starting values, what do you think?
> 
>> Note that this might slow down hglm a bit (not very much I guess), but I think it should be more than 20x faster than the old polygenic. Not familiar with fmm, but: 1. our hglm can deal with any GLM family of the trait distribution, even non-normal genetic effects. 2. we produce standard errors for everything - every part is checkable using h-likelihood :)
>> 
> 
> It did become slower with the strict criteria -- now speed-up is x5.3
> (range: x1.7 - x80.8). Still, this is quite incredible :) Next step
> will be William's polygenic_fmm -- his speed-up is
> more-than-incredible :)
> 
> Ok, what about finalizing this so we have 'polygenic_hglm'
> ready-to-use in the next release?
> 
> Here is a plan:
> 
> I will take care of updating documentation to 'polygenic' so it
> references to 'polygenic_hglm' as an alternative; also main package
> documentation. You will see that on genabel-commits.
> 
> ---- Things we absolutely need before it can be released:
> 
> * Test if resulting object works at all with grammar procedure and
> produces roughly similar results with old-polygenic object
> * Pull the se's of the estimates in the output -- this allows for
> testing significance of covariates
> * Update documentation:
> ** Xia, please make sure you mention everything you want to mention --
> give proper credit, references, etc.
> ** Better explanation of the method. We can of cause always refer
> people to the main 'hglm' man page, but it will be nice to outline --
> e.g. that REML is used by default
> ** Examples: how do you test for a SNP effect? See for example of
> situation with old polygenic:
> 
> http://forum.genabel.org/viewtopic.php?f=6&t=134&sid=c5dcf7fc39fd3f7d07b1308bafed5286
> 
> 
> ----- Things, which will be great to have, but not critical to the release:
> 
> ** Similar to old-polygenic fixh2 argument -- in which case we can
> also output -2xLogLik (but need to make sure people understand that eg
> in REML setting this can only be used for testing random effect / h2)
> [also look up what are other options available in old-polygenic but
> not in polygenic_hglm]
> ** optimization of the code -- many things which are computed in
> wrapper should have been computed by 'hglm', so we can gain some speed
> 
> ---- Nice, but not-so-urgent things:
> 
> * unify and formalize 'polygenic' as S4-class
> 
> best wishes, and hear from you,
> Yurii
> 
> PS To be followed-up later: why the hell does old-polygenic+mmscore
> still generates Lambdas closest to 1 (see figure, columns c), d))?
> Something to deal with uncertainty in gkin, as suggested by William
> (very interesting question!), conservativity of the 'mmscore' as
> suggested by Yurii, etc. At the end, we want to know which one is the
> most powerful procedure. This is something we can test directly.
> <test_old_hglm_FMM.pdf><test_polygenic_hglm_fmm_2011.03.07.R>