[GenABEL-dev] Interesting idea by Gulnara Svischeva

Wed May 18 20:56:25 CEST 2011

Dear All,

Just to explain the story behind Lars's e-mail.

The use of mixed models (MMs) in genomic analysis is very promising,
e.g. for GWAS (methods like grammar, mmscore, FASTA, EMMAX, etc.). The
classical application of MMs is to correct for relationship structure
in family-based samples or in samples from genetically isolated
populations, but there are, and will be, more and more uses of MMs.

One of the practical problems with applying mixed models to genomic
data (leaving grammar-like methods aside) is that the computational
time grows quadratically as the number of people in the sample
increases. This is apparent with 'polygenic', but also with
'mmscore', 'FastMixedModel' or 'GWFGLS'. For example, running an
'mmscore' GWAS on data from 1000 individuals is quite feasible and
usually finishes in one day. With, say, 3000 people this starts to
become a problem, and one may expect to spend a week on a single GWAS.

Dr Gulnara Svischeva came up with something which we feel may be
really important: a way to compute the likelihood which, at the
moment, seems to scale close to linearly in the GWAS setting.
Basically, some quite expensive operations are done once on the
relationship matrix, and the results can then be reused for all the
GWAS tests, which are very quick. I would consider the availability of
such a non-quadratic method for MMs in GWAS a major breakthrough. I
leave it to other people to judge whether Gulya's idea can actually be
used more broadly.
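
To illustrate the "expensive once, cheap per test" pattern, here is a
minimal sketch in Python/NumPy. It is not Gulya's formulation, which
is not spelled out in this message. It assumes that the one-off step
is an eigendecomposition of the relationship matrix K, that the
covariance is V = h2*K + (1 - h2)*I, and that h2 is already known. All
function names are hypothetical. The check at the end compares the
rotated weighted least squares fit with a direct GLS fit.

```python
import numpy as np

def rotate_data(K, y, X):
    """One-off expensive step: K = U diag(d) U', then rotate y and X by U'."""
    d, U = np.linalg.eigh(K)
    return d, U, U.T @ y, U.T @ X

def gls_rotated(d, y_rot, Z_rot, h2):
    """Weighted least squares on rotated data (covariance is diagonal)."""
    w = 1.0 / (h2 * d + (1.0 - h2))   # inverse of the diagonal variances
    ZtW = Z_rot.T * w                 # Z' W without forming an n x n matrix
    return np.linalg.solve(ZtW @ Z_rot, ZtW @ y_rot)

def gls_direct(K, y, Z, h2):
    """Reference: GLS using the full n x n covariance matrix."""
    V = h2 * K + (1.0 - h2) * np.eye(len(y))
    Vinv_Z = np.linalg.solve(V, Z)
    return np.linalg.solve(Z.T @ Vinv_Z, Vinv_Z.T @ y)

# Toy data (simulated, for illustration only)
rng = np.random.default_rng(1)
n = 50
A = rng.normal(size=(n, 200))
K = A @ A.T / 200                     # toy positive-definite "relationship" matrix
X = np.ones((n, 1))                   # intercept only
g = rng.binomial(2, 0.3, size=n).astype(float)   # one SNP coded 0/1/2
y = rng.normal(size=n)

d, U, y_rot, X_rot = rotate_data(K, y, X)        # done once per trait
g_rot = U.T @ g                                  # per SNP: matrix-vector product
beta_rot = gls_rotated(d, y_rot, np.column_stack([X_rot, g_rot]), 0.4)
beta_dir = gls_direct(K, y, np.column_stack([X, g]), 0.4)
```

In this sketch the fit itself only touches vectors of length n once
the data are rotated. Rotating each SNP is still a matrix-vector
product, so this version is not linear end to end.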

Now Gulya, Lars and I have set off to quickly extend and implement
these ideas and write a paper.

Following the idea of 'open methodology', we are going to develop
these methods, implement them, and write the paper 'in public' using
the GenABEL project infrastructure (primarily the SVN repo and this
mailing list).

So, as I see it, the plan is:

1) Gulya, with the help of Lars, will formulate the REML and the score
test based on her 'efficient' formulation of the likelihood, aiming to
keep the relation between time and sample size linear. They will also
develop and test R code implementing these ideas.

2) Unless someone else volunteers, I will translate the time-critical
parts of the code to C++ and develop procedures allowing genome-wide
testing. To be implemented for GWAS (in order of priority): score
test, REML, ML (the latter is unlikely, as I presume there may be too
many difficulties in getting good numerical optimization set up and
tested).

3) We run the resulting procedures on progressively larger data sets
(500 people, 1000, 2000, ...) along with other comparable methods and
record the time. Question to all: I think of mmscore, GWFGLS, EMMAX,
and FastMixedModel as the methods to compare with. Any other ideas?
Another question: what data should we use? I am inclined to use real
data, and can get hold of GWAS data on 1000-3000 people from a human
genetically isolated population. Any other suggestions are more than
welcome!
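
For the timing comparison in step 3, one simple way to summarise how
time grows with sample size is the slope of log(time) against log(n):
about 1 for linear growth, about 2 for quadratic. The sketch below
(Python/NumPy) only shows the bookkeeping. The names `time_once`,
`scaling_exponent` and `toy_fit` are hypothetical, and `toy_fit` is a
stand-in on simulated data, not any of the methods named above.

```python
import time
import numpy as np

def time_once(fit, n, rng):
    """Time one call of fit(K, y) on simulated data of size n (seconds)."""
    A = rng.normal(size=(n, 2 * n))
    K = A @ A.T / (2 * n)             # toy positive-definite "relationship" matrix
    y = rng.normal(size=n)
    start = time.perf_counter()
    fit(K, y)
    return time.perf_counter() - start

def scaling_exponent(sizes, times):
    """Slope of log(time) vs log(n): ~1 is linear, ~2 is quadratic."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return float(slope)

def toy_fit(K, y):
    """Hypothetical stand-in for a method under test: one dense solve."""
    return np.linalg.solve(K + np.eye(len(y)), y)

rng = np.random.default_rng(0)
sizes = [100, 200, 400]               # small sizes so the demo runs quickly
times = [time_once(toy_fit, n, rng) for n in sizes]
exponent = scaling_exponent(sizes, times)
```

On real runs one would repeat each timing several times and use sizes
such as 500, 1000, 2000, since single timings on small inputs are
noisy.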

Hopefully we can have the implementation and test results ready in 1-2
months and submit the paper in 3-4 months.

with best wishes,
Yurii

2011/5/18 Lars Rönnegård <lrn at du.se>:
> Hi,
>
> I attach an implementation of an idea proposed by Gulnara. It uses weighted
> least squares to compute the REML heritability estimate.
>
> The function is found in “REMLrotateFun.R” and some example code is found in
> “ExampleCode_REMLrotate.R” that uses the data in “RemlEx.RData”.
>
> If someone would be interested in trying it (and possibly also suggest
> improvements), I would appreciate it.
>
> It works for linear mixed models having a polygenic random effect and iid
> residuals. It seems to be very fast. The example I attach contains 680
> observations.
>
> Is this a new algorithm or has it been done already?
>
> Best wishes,
>
> Lars Rönnegård
>
> Dalarna University