[GenABEL-dev] The GenABEL project fundamentals: post #1

Yurii Aulchenko yurii.aulchenko at gmail.com
Tue May 17 20:48:00 CEST 2011


Thank you, Lennart, for raising this issue.

I totally agree that guidelines/best practices should be there to
help, while 'processes'=rules should be minimized; otherwise it may
indeed come to the point where a willing contributor is discouraged
just by looking at the pile of 'rules'.

That being said, my series on fundamentals should be viewed as
'optional reading' -- those who have time to read it, please do; not
reading is totally fine. I think if one sticks to the 'open source
spirit' throughout, one does not need to read my posts, and cannot
make a mistake. Probably this is something to say explicitly when we
publish these statements.

I must also admit that if I have too much free time I tend to start
developing processes / SOPs (= standard operating procedures :) ), so
please keep me under control guys! You should not take me (or anyone
else) too seriously; after all the GenABEL project is about "Getting
Yurii out of the way"; see

http://yurii-aulchenko.blogspot.com/2010/10/get-him-out-of-way.html

bw,
Yurii



On Sun, May 15, 2011 at 8:46 PM, L.C. Karssen <l.karssen at erasmusmc.nl> wrote:
> In light of Yurii's recent posts on the core philosophy of the GenABEL
> project and the set of howto's we've been working on in the past few
> months, I thought the following is relevant.
>
> I recently came across a blog post [1] that discusses the use of
> processes (basically a standardised set of rules/guidelines that
> everyone in a project should adhere to) in open source development.
> In short, the post boils down to: in open source development having
> processes in place is important, but make sure that they are reasonable
> and not too complicated, otherwise people will ignore them and/or
> potential contributors will be pushed away (especially those that only
> send a quick fix every now and then, I'd say).
>
> Bringing this up now doesn't mean that I think Yurii's guidelines are
> too strict. Not at all! I just wanted to point out that having general
> guidelines/best practices in place is good and necessary, but we should
> always take care (especially since we're still relatively young and have
> a small contributor base) not to make contributing too hard.
>
>
> Enjoy the rest of the weekend,
>
> Lennart.
>
> [1]
> http://theironlion.net/blog/2011/05/10/what-do-your-processes-say-about-your-free-softwar/
>
> On Fri, 2011-05-13 at 00:40 +0200, Yurii Aulchenko wrote:
>> As promised, here is the first post on the GenABEL project
>> fundamentals. This first post describes my general view on
>> (statistical genomics) methodology development.
>>
>> As suggested by Lennart, after discussion on this mailing list, this
>> post is likely to become part of the project's documentation, to be
>> published on www.genabel.org; if there is little discussion, this
>> will be noted in a footnote.
>>
>> Yurii
>>
>>
>> ---------------------------
>>
>> The GenABEL project: methodology to address real world problems
>>
>> The GenABEL project is dedicated to the development of statistical
>> genomics methodology with a large impact on the real world. From this
>> perspective, methodology development includes the statistical
>> methodology itself, its implementation in usable software, and the
>> application of this software to real data in order to generate new
>> knowledge.
>>
>> Thus, we see methodology development as a three-stage process
>> including mathematical formulation of the method, formulation and
>> implementation of an algorithm in software, and, finally,
>> application of the methodology to real data. Actually, most of the
>> time, the data will call for a new method. In that case, application
>> comes before the mathematical formulation. The presence of all three
>> stages, and feedback between them, is a key aspect of our approach to
>> statistical genomics methodology development.
>>
>> Why are all three stages critical?
>>
>> For example, you may develop something that looks like a nice and
>> promising piece of math, but when you try to implement it, you find
>> out that you did not completely understand the problem, or that you
>> were operating under some implicit assumptions that are unlikely
>> to be correct. You may also find out that the computational complexity
>> is too high, and you need to change the method in order for it to be
>> practically applicable.
>>
>> Next, it is important to apply your methodology to simulated data
>> (for which you know the answer): nonsense results will provide
>> feedback on implementation (is that a bug?) or even on methodology
>> (ah, formula 15-3 was wrong!).
>>
>> It is even more important to apply your methodology to REAL data as
>> early as possible: it will provide feedback on implementation (is it
>> feasible to run my analysis in reality?), and on methodology -- the
>> situation where a method works on simulated data, but fails miserably
>> on real data, is not that uncommon! Also, trying to use real data will
>> tell you about the data formats people are using, and will eventually
>> make your implementation really usable. There are examples of great
>> methods implemented in software requiring such a specific data format
>> that it becomes almost impossible to use them.
>>
>> To conclude: methodology development should be viewed as an integral
>> process including development of the methodology itself, development
>> of software, and application of the software to real data.
>>
>> While such an integral approach is a tall order for an individual
>> researcher or even a (smaller) group of researchers, it is feasible
>> if an open source approach -- commons-based production through open
>> exchange of ideas and collaboration -- is applied throughout. I will
>> elaborate on this point in my further posts.
>>
>> Disclaimer. This is my personal position, which is open to
>> discussion. In this post, I am not speaking on behalf of the GenABEL
>> project community, but rather seeding the discussion that will
>> eventually set the project's standards.
>>
>> I would like to thank Dr. Lennart Karssen for valuable and continuous
>> discussion through which many of my views on the GenABEL project have
>> evolved.
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
> --
> -----------------------------------------------
> L.C. Karssen
> Erasmus MC
> Department of Epidemiology
> Room 2224
>
> Postbus 2040
> 3000 CA Rotterdam
> The Netherlands
>
> phone: +31-10-7044217
> fax: +31-10-7044657
> e-mail: l.karssen at erasmusmc.nl
> GPG key ID: 0E1D39E3
> -----------------------------------------------
>