[GenABEL-dev] The GenABEL project fundamentals: post #1
L.C. Karssen
l.karssen at erasmusmc.nl
Sun May 15 20:46:42 CEST 2011
In light of Yurii's recent posts on the core philosophy of the GenABEL
project and the set of howtos we've been working on over the past few
months, I thought the following might be relevant.
I recently came across a blog post [1] that discusses the use of
processes (basically a standardised set of rules/guidelines that
everyone in a project should adhere to) in open source development.
In short, the post boils down to this: in open source development,
having processes in place is important, but make sure that they are
reasonable and not too complicated; otherwise people will ignore them
and/or potential contributors will be pushed away (especially those who
only send a quick fix every now and then, I'd say).
Bringing this up now doesn't mean that I think Yurii's guidelines are
too strict. Not at all! I just wanted to point out that having general
guidelines/best practices in place is good and necessary, but we should
always take care (especially since we're still relatively young and have
a small contributor base) not to make contributing too hard.
Enjoy the rest of the weekend,
Lennart.
[1] http://theironlion.net/blog/2011/05/10/what-do-your-processes-say-about-your-free-softwar/
On Fri, 2011-05-13 at 00:40 +0200, Yurii Aulchenko wrote:
> As promised, here is the first post on the GenABEL project
> fundamentals. This first post describes my general view on
> (statistical genomics) methodology development.
>
> As suggested by Lennart, after discussion on this mailing list, this
> post is likely to become part of the project's documentation, to be
> published on www.genabel.org; if there is little discussion, this will
> be reflected in a footnote.
>
> Yurii
>
>
> ---------------------------
>
> The GenABEL project: methodology to address real world problems
>
> The GenABEL project is dedicated to the development of statistical
> genomics methodology with a large impact on the real world. From this
> perspective, methodology development includes the statistical
> methodology itself, its implementation in usable software, and the
> application of this software to real data in order to generate new
> knowledge.
>
> Thus, we see methodology development as a three-stage process
> comprising the mathematical formulation of the method, the formulation
> and implementation of an algorithm in software, and, finally, the
> application of the methodology to real data. In fact, most of the
> time it is the data that call for a new method; in that sense,
> application comes before mathematical formulation. The presence of all
> three stages, and the feedback between them, is a key aspect of our
> approach to statistical genomics methodology development.
>
> Why are all three stages critical?
>
> For example, you may develop something that looks like a nice and
> promising piece of math, but when you try to implement it, you find
> out that you did not completely understand the problem, or that you
> were operating under implicit assumptions that are unlikely to be
> correct. You may also find that the computational complexity is too
> high and that you need to change the method to make it practically
> applicable.
>
> Next, it is important to apply your methodology to simulated data
> (for which you know the answer): nonsense results will provide
> feedback on the implementation (is that a bug?) or even on the
> methodology (ah, formula 15-3 was wrong!).
>
> It is even more important to apply your methodology to REAL data as
> early as possible: it will provide feedback on the implementation (is
> it feasible to run my analysis in reality?) and on the methodology; a
> method that works on simulated data but fails miserably on real data
> is not that uncommon! Also, trying to use real data will tell you
> which data formats people are using, and will eventually make your
> implementation truly usable. There are examples of great methods
> implemented in software that requires such a specific data format
> that it becomes almost impossible to use them.
>
> To conclude: methodology development should be viewed as an integral
> process that includes development of the methodology itself,
> development of software, and application of the software to real data.
>
> While such an integral approach is a tall order for an individual
> researcher or even a (smaller) group of researchers, it is feasible if
> an open source approach (commons-based production through the open
> exchange of ideas and collaboration) is applied throughout. I will
> elaborate on this point in future posts.
>
> Disclaimer: this is my personal position, which is open for
> discussion. In this post, I am not speaking on behalf of the GenABEL
> project community, but rather seeding the discussion that will
> eventually set the project's standards.
>
> I would like to thank Dr. Lennart Karssen for the valuable and ongoing
> discussions through which many of my views on the GenABEL project have
> evolved.
--
-----------------------------------------------
L.C. Karssen
Erasmus MC
Department of Epidemiology
Room 2224
Postbus 2040
3000 CA Rotterdam
The Netherlands
phone: +31-10-7044217
fax: +31-10-7044657
e-mail: l.karssen at erasmusmc.nl
GPG key ID: 0E1D39E3
-----------------------------------------------