[GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL

L.C. Karssen lennart at karssen.org
Fri Apr 18 16:35:29 CEST 2014


Dear list,

In the past few months Maarten has made several speed improvements to
ProbABEL. Many of these speedups make use of the EIGEN library that was
first introduced into ProbABEL in v0.3.0.
After merging Maarten's branch with trunk (and after I independently
added more extensive checks in Jenkins) we found that compilation fails
after configuring ProbABEL with
    ./configure --without-eigen
Fixing this is not trivial, so we are hereby proposing to remove
the --without-eigen option. This doesn't necessarily mean that all
mematrix code needs to be removed immediately, but by insisting on using
EIGEN we can at least start removing the old code.

Impact analysis for users and developers:
1) positive: consistent (and faster) analysis speed for all users,
because everybody will use EIGEN

2) positive: reduction of maintenance/development time because we no
longer need to maintain the non-EIGEN parts of the code.

3) possibly negative: we need to make a choice on whether we will
distribute EIGEN with the ProbABEL code, or whether we 'force' the user
to download the code themselves.


Point 3) is similar to the debate about libfilevector: do we go for a
simple user experience where all requirements are bundled in the
distributed source code, or do we make use of the modularity of the code
and its dependencies and let people download and install the
dependencies themselves (or use packages provided by the OS)?
In the upcoming release we also plan to include calculation of p-values
using the Boost libraries [0]. The same issue will arise there again.

Therefore, I would like to start/continue the discussion here on how to
proceed with external dependencies. I'm really looking forward to your
opinions. Below I've outlined several options I could think of on how to
go forward. Let me know what you think of them or if you have any other
ideas.


Thanks a lot,

Lennart.



Note 1: For ProbABEL we provide pre-compiled MS Windows binaries, so
that platform is not part of this discussion.
Note 2: EIGEN consists of header files only; there is no separate EIGEN
library to build, neither for compiling ProbABEL nor for running it.

I see the following options:
a) include a copy of the EIGEN source code in the ProbABEL code base (in
SVN)
b) include a copy of the EIGEN source code in the official released
ProbABEL tar.gz.
c) don't include the EIGEN source code, but provide very clear
instructions on how to obtain EIGEN.
d) include a script that downloads and extracts the latest EIGEN and
mention that script in the installation instructions.
e) Automatic download and extraction of the EIGEN source code during the
./configure (or make) process of ProbABEL.

More details about these options:
a):
 - Licence-wise this seems possible as EIGEN is released under the MPL2.
But Q14 of http://www.mozilla.org/MPL/2.0/FAQ.html doesn't immediately
make clear to me what the requirements/repercussions are. More thorough
reading of the licence is probably required.
 - ProbABEL contains both GPL and LGPL licensed files (a complete
overview had to be made for the Debian package and can be found at [1]),
so I'm not overly happy to add yet another type of licence.
 - simple for the user; everything is included and compiles cleanly.
 - developers don't need to keep up with updates of EIGEN, so there is no
risk of incompatibility; we can keep the current EIGEN code in there
forever (as was done with parts of the code from the R survival package)
 - However, with a frozen copy of the EIGEN code in SVN we don't benefit
from bug fixes and improvements in EIGEN.

b):
 - The same licence issues as in a) apply
 - simple for the user
 - developers will need to keep up with new EIGEN releases, but we
benefit from their improvements and bug fixes (unless we always
distribute EIGEN version 3.2.1, the current version).

c):
 - This is what we currently do. This allows
users/administrators/packagers to use EIGEN either by downloading and
extracting it themselves or by using OS-provided packages. Maybe we can
improve the documentation to make it even easier.
 - This requires more 'investment' from the user: they need to carefully
read the installation instructions AND download and extract EIGEN AND
add the path with extracted code to the ./configure
--with-eigen-include-path=/your/path/to/eigen option.
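
As a rough sketch of those steps, assuming the EIGEN tarball has already
been downloaded by hand and unpacks into a single top-level directory
(the function name and the paths are made up for illustration; only the
--with-eigen-include-path option comes from ProbABEL itself):

```shell
#!/bin/sh
# Sketch only: unpack a manually downloaded EIGEN tarball and print the
# matching ./configure call. The function name is hypothetical.
eigen_configure_cmd() {
    tarball=$1      # archive downloaded by the user
    destdir=$2      # directory to extract it into

    mkdir -p "$destdir" || return 1
    # GNU tar and bsdtar detect the compression type themselves with -xf.
    tar -xf "$tarball" -C "$destdir" || return 1

    # Assumption: the archive holds one top-level directory, which is
    # the one containing the Eigen/ headers.
    for dir in "$destdir"/*/; do
        [ -d "$dir" ] || return 1
        echo "./configure --with-eigen-include-path=${dir%/}"
        return 0
    done
}
```

Calling eigen_configure_cmd with the tarball and a target directory
prints the configure line to run from the ProbABEL source directory.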

d):
 - This would be easy to do, but would require the user to have wget or
curl installed (are these available for all architectures?). Does that
really make things easier for the user? The good thing is that we can
fix the extraction directory so the ./configure
--with-eigen-include-path option can be preset.
 - No hassle with licences
 - users/developers/packagers who want to use an OS-provided EIGEN
package can do so
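
A minimal sketch of the download part of such a script, assuming a POSIX
shell. The function names are hypothetical, and the URL is passed in by
the caller, so no particular EIGEN download location is assumed here:

```shell
#!/bin/sh
# Sketch only: fetch a file with whichever of wget or curl is installed.

# Print the name of the first available download tool, or fail.
pick_downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo wget
    elif command -v curl >/dev/null 2>&1; then
        echo curl
    else
        return 1
    fi
}

# fetch URL OUTFILE: download URL to OUTFILE, or tell the user what to do.
fetch() {
    url=$1
    outfile=$2
    case $(pick_downloader) in
        wget) wget -O "$outfile" "$url" ;;
        curl) curl -L -o "$outfile" "$url" ;;
        *)    echo "Neither wget nor curl found; please download $url manually." >&2
              return 1 ;;
    esac
}
```

If neither tool is present the script falls back to the manual route of
option c), so users on such systems are no worse off than they are now.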

e):
 - simple for the user
 - no hassle with licences
 - same dependency on wget or curl as d)
 - I'm not sure how to do that in configure.ac, but I think it can be done.
 - unless we add a --dont-download-eigen option to configure.ac,
users/developers/packagers who want to use OS-provided EIGEN packages
won't be happy.




[0] http://www.boost.org/
[1] http://sources.debian.net/src/probabel/0.4.3-1/debian/copyright

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
