[GenABEL-dev] faster polygenic

Fri Feb 25 22:37:09 CET 2011

Thanks, Xia!

Did what you suggested and just submitted the code to the repository

https://r-forge.r-project.org/scm/viewvc.php/pkg/GenABEL/R/polygenic_hglm.R?view=markup&root=genabel

Naturally, you are mentioned as the first author of this procedure :)

Something to take care later on: more details in documentation
(especially what people should paste in their papers when writing down
'methods'), cross-links to 'hglm', references, so you get citations
when polygenic_hglm is used.

On lines 82-92 of 'polygenic_hglm.R' I construct 'polygenic' object
from hglm results.

It seems this procedure works just great time-wise -- on a sample of
200 IDs I got the mean speed-up of x30 (range x10-x70)! Also, the
results look acceptable in general.

However, when I dig a bit more to it, there seems to be a point of
improvement. In GWAS, among the main parameter of concern is Lambda
(genomic control inflation factor, ratio between the observed median
of the chi2 1df test stat and the median expected, 0.455). To estimate
lambdas, I simulated polygenically controlled trait, and then ran
GWAS: estimated polygenic object by either polygenic-old or
polygenic-hglm, and then use them for GWAS with mmscore. If correction
is done well, I expect that Lambda is close to one.

I ran ~100 simulations (script attached) to see the respective
behavior of polygenic-old and polygenic-hgml. Results are presented in
the attached figure.

What I have observed initially is that SD of lambda-hglm (0.023) is
wider than the variance of lambda-old (0.019). Also, the mean of
lambda-hglm was 0.986, which was smaller than the mean of lambda-old
(0.992), suggesting that hglm-based procedure is more conservative,
though the difference is not huge.

The most important graph is on panel f) (last). What appears
worrisome: while majority of points lie on the line with intercept of
0 and beta=1, there is clearly a group following different pattern
(marked by blue color). In this group of simulations lambda-hglm >
lambda-old, and as a rule lambda-hglm<1, meaning the hglm-based test
statistics gets too conservative. Hence, compared to the old
procedure, reduced power is to be expected.

I have tried to figure out what is going on, and results are interesting.

Panel a) compares heritability estimates. They are not quite the same.
It seems that when old estimate of h2 is < (some point between
0.3-0.4), estimate from hglm is always slightly higher; when h2-old >
0.3-0.4, it goes other way around

Panel b) compares the estimates of total variance. It is less visible,
but also there are distinct groups there, as you will see from d).

Panels c) show density plot for the DIFFERENCE between heritability
estimates established using old and hglm procedures. It is very
visible that a funny thing happens -- the difference tends to be
either positive, or negative (bi-modality). The same thing happens
with the difference in total Var estimate (panel d))

In panel e), I explored the relation between the differences in
h2-estimates and total Variance estimates from the two procedures. It
seems there is quite a pattern there (I would expect a cloud if
everything was ok).

In panel f), the dots where polygenic-hglm appears to lead to
conservativity (blue) are easily identified if I select iterations
when the estimates of h2 from hglm were greater then estimates from
the old procedure (and this happens at h2-old < (some point between
0.3-0.4)!).

I tend to think that the problem is in polygenic_hglm, rather than in
polygenic-old, as SD of lambda-hglm (0.023) is wider than the variance
of lambda-old (0.019), and the mean of lambda-hglm is lower (=more
conservative) (0.986) than the mean of lambda-old (0.992), though the
differences are not huge. Difference with 1 is hugely significant
(p<1e-8) for lambda-hglm, and less significant (p=0.001) for
lambda-old.

Any ideas? Quite likely I have messed up in the part where I translate
hglm results to 'polygenic' slots, could you have a look? If not, and
if you have no solution -- please discuss with Lars. We need to solve
this issue before we release and recommend polygenic_hglm for users.

I attach my test-script with this e-mail for your convenience. Some
minor comments on your suggestions below in re:...

best wishes,
Yurii

On Fri, Feb 25, 2011 at 12:31 AM, Xia Shen <xia.shen at lcb.uu.se> wrote:
> 1) do I understand correctly that fixed effects estimates are in
> hglm(...)$fixef
>
> Exactly.
>
> 2) do I understand correctly that 'breeding values' are in hglm(...)$ranef
>
> This gives you random effects estimates, for BV for each individual, using
> your code, should be L%*%hglm(...)$ranef.
>
> 3) do I understand correctly that environmental residuals are in
> hglm(...)$resid
>
> Yes, this gives you standardized residuals. For raw residuals, simply
> hglm(...)$fv - y.

rather, used

y - hglm(...)$fv

>
> 4) how can I get hold on the estimates of polygenic and environmental
> (residual) variances, and find out what was heritability estimate
>
> Polygenic (random effect) variance is given by variance component
> hglm(...)$varRanef, and environmental (residual) variance is given by
> variance component hglm(...)$varFix. Then you can calculate the intra-class
> correlation, which should be the heritability.
>
> 5) is there an easy way to get hold on inverse of variance-covariance
> matrix. I mean given parameters estimates, it is no problem to compute that,
> but just in case it is already hanging around would be stupid to do so
>
> In fact, we do have this matrix in the procedure, however it's not included
> in the output in the current version, therefore you might need to calculate
> it, and I'll include it as output in the next version of hglm.

this may be handy -- now I re-compute that directly in the wrapper,
which is a bit computationally expensive

> If your
> purpose is to calculate the REML likelihood, then hglm(..., method =
> 'REML')$ProfLogLik is it.

good point. We can use that for testing variance component, I believe.
In that, introducing fixh2 parameter, in the same manner as in
'polygenic' makes sense

> Hope these clarify your wondering :)
> Cheers
> Xia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_polygenic_hglm.R
Type: application/octet-stream
Size: 2776 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20110225/2ecfe6bf/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_hglm.pdf
Type: application/pdf
Size: 64267 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20110225/2ecfe6bf/attachment-0001.pdf>