From lennart at karssen.org  Tue Apr  1 09:15:34 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Tue, 01 Apr 2014 09:15:34 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1664	-
	branches/ProbABEL-0.50/src
In-Reply-To: <244CF001646FF74FB34F372310A332C57AD8B4@MBX2.rwth-ad.de>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org>
 <A2211A25-7C31-4261-808C-494665D91603@gmail.com> <5335CDCF.8090503@gmail.com>,
 <5335F579.9070608@karssen.org>,
 <244CF001646FF74FB34F372310A332C57AD899@MBX2.rwth-ad.de>
 <244CF001646FF74FB34F372310A332C57AD8B4@MBX2.rwth-ad.de>
Message-ID: <533A6796.4030004@karssen.org>

Hi Alvaro,

Thanks for joining in. Much appreciated!


On 31-03-14 23:07, Frank, Alvaro Jesus wrote:
> Perhaps what I mentioned earlier cannot be done:
> 
> http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Ah, Goldberg comes up again. I read the paper several years ago, but
it sounds like it's time to reread it.

> 
> Since a float 0.677 can't be represented as a double any other way than 0.67699998617172241, for example.

Indeed, I think that's what we've hit here.
It would be great if this could be converted 'cheaply' into something
that more closely resembles 0.67700000000 (cheaper than snprintf()).
Alternatively, we need to identify where in the calculations we have the
biggest loss of precision.
As I wrote in my answer to Yurii, our input is not likely to have more
than 4 significant digits. I'm definitely willing to print only 4
significant digits in the output, but at the moment I see differences in the
third digit (of the chi^2 values) when looking at the example data.


Best,

Lennart.


> ________________________________________
> From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of Frank, Alvaro Jesus [alvaro.frank at rwth-aachen.de]
> Sent: Monday, March 31, 2014 10:48 PM
> To: L.C. Karssen; genabel-devel at lists.r-forge.r-project.org
> Subject: Re: [GenABEL-dev] [Genabel-commits] r1664      -       branches/ProbABEL-0.50/src
> 
> Dear all,
> 
> How about, instead of going float -> text -> double, just applying a binary mask after the (error-introducing) cast?
> I.e. 0.67699998617172241 -> 0.67700000000000000 by applying a mask to every number?
> This image shows which bits need to be set to zero:
> http://cnx.org/content/m32770/latest/graphics1.png
> 
> Masking is super fast in C/C++.
> 
> This may not be portable, though. But the way floating-point numbers are stored should be standard (IEEE 754) anyway.
> 
> -Alvaro
> 
> 
> ________________________________________
> From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C. Karssen [lennart at karssen.org]
> Sent: Friday, March 28, 2014 11:19 PM
> To: genabel-devel at lists.r-forge.r-project.org
> Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 -    branches/ProbABEL-0.50/src
> 
> Dear all,
> 
> (I guess the previous version of this mail went to the commit email
> list, so here it is again for the devel list).
> 
> 
> Indeed: an impressive speed-up! Well done Maarten.
> 
> On 28-03-14 20:30, Maarten Kooyman wrote:
>> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people,
>> adjusted for sex and age (I did not run it in triplicate, but it gives an idea):
>>
>> format    v0.42    0.50 branch
>> FV           58             52
>> mldose       48             12
>>
>> All times are in seconds.
>>
>> As you can see, the filevector format is the part that slows down the
>> program. When profiling, reading from FV takes up 86% of the program's
>> total run time.
>>
> 
> 
> The current problem with reading from filevector is that the fv data is
> stored as floats (this is logical, as it means half the disk space usage
> compared to storing doubles; moreover, the imputed data is never more
> precise than a float anyway).
> However, internally ProbABEL uses doubles for calculations. This means a
> conversion from float to double must occur at some point.
> 
> Simply casting to double introduces imprecision. For example, casting a float
> 0.677 to double gives 0.67699998617172241.
> Therefore, in version 0.4.0 I changed this and used a string as an
> intermediate form, followed by strtod(). First I used stringstreams, but
> these turned out to be much too slow for our use case. Now snprintf() is
> used. For the above example the double value is 0.67700000000000005,
> much closer to what we would like to see. Using this two-step conversion
> means the output when using fv is equal to the output using txt data
> (and equal to using R), within float precision.
> 
> Using Maarten's 'strtod' will speed up this part as well, but the
> snprintf() call is still expensive.
> 
> Apart from this two-step conversion we may also be inefficient because
> the dosage/probability values are converted one array element at a
> time. Maybe we can gain something there: like Maarten did for the txt
> format, simply sending a whole 'line'/array to the conversion may help.
> 
> 
> 
> 
> Given that most people nowadays store their imputation results in chunks
> of chromosomes anyway (i.e. small(er) files), and that I think
> implementing the ability to read gzipped files is not difficult, it may
> be time to give mldose.gz files another chance for ProbABEL users. It
> would save them the conversion from mldose.gz to DatABEL format.
> Of course we can still support DatABEL files, but (depending on how fast
> reading from gzipped files is), our recommendation could change with the
> upcoming ProbABEL v0.5.0.
> 
> Any thoughts on this?
> 
> 
> Best,
> 
> Lennart.
> 
> 
> 
> 
> 
>> On 28-03-14 20:15, Yury Aulchenko wrote:
>>> 10 fold is good speed up. An order of magnitude :)
>>>
>>> Wonder how it compares now to the reading from plain text files?
>>>
>>> Y
>>>
>>> ----------------
>>> Sent from mobile device, please excuse possible typos
>>>
>>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote:
>>>>
>>>> Author: maartenk
>>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014)
>>>> New Revision: 1664
>>>>
>>>> Modified:
>>>>    branches/ProbABEL-0.50/src/gendata.cpp
>>>>    branches/ProbABEL-0.50/src/gendata.h
>>>> Log:
>>>> new implementation of reading in numbers of mldose file: this version
>>>> is about a 10(!) fold faster than in ProABEL 0.42
>>>>
>>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp
>>>> ===================================================================
>>>> --- branches/ProbABEL-0.50/src/gendata.cpp    2014-03-27 21:16:16 UTC (rev 1663)
>>>> +++ branches/ProbABEL-0.50/src/gendata.cpp    2014-03-28 19:12:41 UTC (rev 1664)
>>>> @@ -40,58 +40,69 @@
>>>> #endif
>>>> #include "utilities.h"
>>>>
>>>> -double mldose_strtod(const char *str_pointer) {
>>>> -    // This function is inspired on some answers found at stackoverflow :
>>>> -    // eg question 5678932
>>>> -    int sign = 0;
>>>> -    double result = 0;
>>>> -    //check if not a null pointer or NaN (right now checks only first character)
>>>> -//TODO: make catching of NaN more rigid
>>>> -    if (!*str_pointer | *str_pointer == 'N'){
>>>> -        return std::numeric_limits<double>::quiet_NaN();
>>>> +
>>>> +void gendata::mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers){
>>>> +    int j = 0;
>>>> +    //check if not a null pointer
>>>> +    if (!*all_numbers){
>>>> +        perror("Error while reading genetic data (expected pointer to char but found a null pointer)");
>>>> +                       exit(EXIT_FAILURE);
>>>>      }
>>>> -    //skip whitespace
>>>> -    while (*str_pointer == ' ')
>>>> +    while (j<amount_of_numbers)
>>>>      {
>>>> -        str_pointer++;
>>>> -    }
>>>> -    //set sign to -1 if negative: multiply by sign just before return
>>>> -    if (*str_pointer == '-')
>>>> -    {
>>>> -        str_pointer++;
>>>> -        sign = -1;
>>>> -    }
>>>> -    //read digits before dot
>>>> -    while (*str_pointer <= '9' && *str_pointer >= '0'){
>>>> -        result = result * 10 + (*str_pointer++ - '0');
>>>> -    }
>>>> -    //read digit after dot
>>>> -    if (*str_pointer == '.')
>>>> -    {
>>>> -        double decimal_counter = 1.0;
>>>> -        str_pointer++;
>>>> -        while (*str_pointer <= '9' && *str_pointer >= '0')
>>>> +        double result = 0;
>>>> +        //skip whitespace
>>>> +        while (*all_numbers == ' ')
>>>>          {
>>>> -            decimal_counter *= 0.1;
>>>> -            result += (*str_pointer++ - '0') * decimal_counter;
>>>> +            all_numbers++;
>>>>          }
>>>> +        //check NaN (right now checks only first character)
>>>> +        //TODO: make catching of NaN more rigid
>>>> +        if (*all_numbers == 'N')
>>>> +        {
>>>> +            result = std::numeric_limits<double>::quiet_NaN();
>>>> +            //skip other characters of NaN
>>>> +            while ((*all_numbers == 'a') | (*all_numbers == 'N'))
>>>> +            {
>>>> +                all_numbers++;
>>>> +            }
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            int sign = 0;
>>>> +            //set sign to -1 if negative: multiply by sign just before return
>>>> +            if (*all_numbers == '-')
>>>> +            {
>>>> +                all_numbers++;
>>>> +                sign = -1;
>>>> +            }
>>>> +            //read digits before dot
>>>> +            while (*all_numbers <= '9' && *all_numbers >= '0')
>>>> +            {
>>>> +                result = result * 10 + (*all_numbers++ - '0');
>>>> +            }
>>>> +            //read digit after dot
>>>> +            if (*all_numbers == '.')
>>>> +            {
>>>> +                double decimal_counter = 1.0;
>>>> +                all_numbers++;
>>>> +                while (*all_numbers <= '9' && *all_numbers >= '0')
>>>> +                {
>>>> +                    decimal_counter *= 0.1;
>>>> +                    result += (*all_numbers++ - '0') * decimal_counter;
>>>> +                }
>>>> +            }
>>>> +            //correct for negative number
>>>> +            if (sign == -1)
>>>> +            {
>>>> +                result = sign * result;
>>>> +            }
>>>> +        }
>>>> +        G.put(result, k, j);
>>>> +        j++;
>>>>      }
>>>> -    //str_pointer should be null since all characters are read.
>>>> -    if (*str_pointer){
>>>> -        perror("Error while reading genetic data (mldose_strtod)");
>>>> -                          exit(EXIT_FAILURE);
>>>> -    }
>>>> -    //correct for negative number
>>>> -    if (sign == -1){
>>>> -        return sign * result;
>>>> -    }else{
>>>> -        return result;
>>>> -    }
>>>> -
>>>> }
>>>>
>>>> -
>>>> -
>>>> void gendata::get_var(int var, double * data)
>>>> {
>>>>      // Read the genetic data for SNP 'var' and store in the array 'data'
>>>> @@ -246,7 +257,7 @@
>>>>                  size_t strpos = tmpstr.find("->");
>>>>                  if (strpos != string::npos)
>>>>                  {
>>>> -                    tmpid = tmpstr.substr(strpos+2, string::npos);
>>>> +                    tmpid = tmpstr.substr(strpos + 2, string::npos);
>>>>                  }
>>>>                  else
>>>>                  {
>>>> @@ -255,8 +266,8 @@
>>>>                  if (tmpid != idnames[k])
>>>>                  {
>>>>                      cerr << "phenotype file and dose or probability file "
>>>> -                         << "did not match at line " << i + 2 << " (" << tmpid
>>>> -                         << " != " << idnames[k] << ")" << endl;
>>>> +                            << "did not match at line " << i + 2 << " ("
>>>> +                            << tmpid << " != " << idnames[k] << ")" << endl;
>>>>                      infile.close();
>>>>                      exit(1);
>>>>                  }
>>>> @@ -267,47 +278,58 @@
>>>>                  infile >> tmpstr;
>>>>              }
>>>>
>>>> -            for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>>> +            int oldstyle = 0;
>>>> +            if (oldstyle == 1)
>>>>              {
>>>> -                if (infile.good())
>>>> +                for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>>>                  {
>>>> -                    infile >> inStr;
>>>> -                    // tmpstr contains the dosage/probability in
>>>> -                    // string form. Convert it to double (if tmpstr is
>>>> -                    // NaN it will be set to nan).
>>>> -                    double dosage;
>>>> -                    char *endptr;
>>>> -                    errno = 0;      // To distinguish success/failure
>>>> -                                    // after strtod()
>>>> +                    if (infile.good())
>>>> +                    {
>>>> +                        infile >> inStr;
>>>> +                        // tmpstr contains the dosage/probability in
>>>> +                        // string form. Convert it to double (if tmpstr is
>>>> +                        // NaN it will be set to nan).
>>>> +                        double dosage;
>>>> +                        char *endptr;
>>>> +                        errno = 0;      // To distinguish success/failure
>>>> +                                        // after strtod()
>>>>
>>>> -                    dosage = mldose_strtod(inStr);
>>>> -                    //dosage = strtod(tmpstr.c_str(), &endptr);
>>>> -//                    if ((errno == ERANGE &&
>>>> -//                         (dosage == HUGE_VALF || dosage == HUGE_VALL))
>>>> -//                        || (errno != 0 && dosage == 0)) {
>>>> -//                        perror("Error while reading genetic data (strtod)");
>>>> -//                        exit(EXIT_FAILURE);
>>>> -//                    }
>>>> +                        dosage = strtod(inStr, &endptr);
>>>> +                        if ((errno == ERANGE
>>>> +                                && (dosage == HUGE_VALF || dosage == HUGE_VALL))
>>>> +                                || (errno != 0 && dosage == 0))
>>>> +                        {
>>>> +                            perror("Error while reading genetic data (strtod)");
>>>> +                            exit(EXIT_FAILURE);
>>>> +                        }
>>>>
>>>> -                    if (endptr == tmpstr.c_str()) {
>>>> -                        cerr << "No digits were found while reading genetic data"
>>>> -                             << " (individual " << i + 1
>>>> -                             << ", position " << j + 1 << ")"
>>>> -                             << endl;
>>>> -                        exit(EXIT_FAILURE);
>>>> +                        if (endptr == tmpstr.c_str())
>>>> +                        {
>>>> +                            cerr
>>>> +                                    << "No digits were found while reading genetic data"
>>>> +                                    << " (individual " << i + 1 << ", position "
>>>> +                                    << j + 1 << ")" << endl;
>>>> +                            exit(EXIT_FAILURE);
>>>> +                        }
>>>> +                        /* If we got here, strtod() successfully parsed a number */
>>>> +                        G.put(dosage, k, j);
>>>>                      }
>>>> -
>>>> -                    /* If we got here, strtod() successfully parsed a number */
>>>> -                    G.put(dosage, k, j);
>>>> +                    else
>>>> +                    {
>>>> +                        std::cerr << "cannot read dose-file: " << fname
>>>> +                                << "check skipd and ngpreds parameters\n";
>>>> +                        infile.close();
>>>> +                        exit(1);
>>>> +                    }
>>>>                  }
>>>> -                else
>>>> -                {
>>>> -                    std::cerr << "cannot read dose-file: " << fname
>>>> -                              << "check skipd and ngpreds parameters\n";
>>>> -                    infile.close();
>>>> -                    exit(1);
>>>> -                }
>>>>              }
>>>> +            else
>>>> +            {
>>>> +                std::string all_numbers;
>>>> +                all_numbers.reserve(nsnps * ngpreds * 7);
>>>> +                std::getline(infile, all_numbers);
>>>> +                mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds);
>>>> +            }
>>>>              k++;
>>>>          }
>>>>          else
>>>>
>>>> Modified: branches/ProbABEL-0.50/src/gendata.h
>>>> ===================================================================
>>>> --- branches/ProbABEL-0.50/src/gendata.h    2014-03-27 21:16:16 UTC (rev 1663)
>>>> +++ branches/ProbABEL-0.50/src/gendata.h    2014-03-28 19:12:41 UTC (rev 1664)
>>>> @@ -44,7 +44,7 @@
>>>>      unsigned int nids;
>>>>      unsigned int ngpreds;
>>>>      gendata();
>>>> -    double convert(   char* source,  char** endPtr );
>>>> +    void mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers);
>>>>
>>>>      void re_gendata(char * fname, unsigned int insnps, unsigned int ingpreds,
>>>>              unsigned int npeople, unsigned int nmeasured,
>>>>
>>>> _______________________________________________
>>>> Genabel-commits mailing list
>>>> Genabel-commits at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>>>
>>
> 
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C. Karssen
> Utrecht
> The Netherlands
> 
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> 
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140401/b1baf293/attachment-0001.sig>

From lennart at karssen.org  Wed Apr  2 17:44:44 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 02 Apr 2014 17:44:44 +0200
Subject: [GenABEL-dev] Preparing for ProbABEL 0.4.3 release
In-Reply-To: <52F757FB.8070707@karssen.org>
References: <52F757FB.8070707@karssen.org>
Message-ID: <533C306C.30302@karssen.org>

Dear list,

For those of you who haven't noticed through other channels: I tagged
and released ProbABEL v0.4.3 yesterday. The release announcement
is available at http://www.genabel.org. The source code is available
from the ProbABEL page [2] and packages for Debian and Ubuntu have been
sent to the respective build services.

Now we can spend all our time on getting the next great release of
ProbABEL out. Maarten has done a lot of work on it; a few things still
need to be done, but we are definitely headed for a great release.


Thanks to all for your work on ProbABEL.

Best,

Lennart.

On 09-02-14 11:27, L.C. Karssen wrote:
> Dear list,
> 
> I am currently preparing the 0.4.3 release of ProbABEL. If you have any
> updates/fixes/etc. that you would like to have in this release, please
> let me know. I aim to do the release somewhere in the coming week.
> 
> 
> Best regards,
> 
> Lennart Karssen.
> 
> 


From lennart at karssen.org  Wed Apr  2 17:45:59 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 02 Apr 2014 17:45:59 +0200
Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference
In-Reply-To: <52EAA7D1.5080106@karssen.org>
References: <52EAA7D1.5080106@karssen.org>
Message-ID: <533C30B7.3060102@karssen.org>

Dear list,

Please find attached the poster I presented at the European Mathematical
Genetics Society in Cologne over the past two days.


Best,

Lennart.

On 30-01-14 20:28, L.C. Karssen wrote:
> Dear list,
> 
> I'm planning to go to the EMGM (European Mathematical Genetics Meeting)
> in Cologne in April. I'd like to present a poster there and wrote the
> abstract below.
> Please let me know any comments or suggestions as soon as possible, as the
> deadline for abstract submission is tomorrow (Fri 31 Jan).
> 
> Thank you very much,
> 
> Lennart.
> 
> 
> --------------8<----------------8<------------------8<-------------
>   Over the last year the GenABEL project has seen a considerable
>   number of improvements. These improvements consist not only of
>   updates to the existing packages of the GenABEL suite, but are also
>   manifest in the way the development process is being handled and
>   the way the packages are made available to the users.
> 
>   On our poster we will demonstrate the newly implemented features in
>   the various packages of the GenABEL suite. We also welcome a new
>   member to the GenABEL family: OmicABEL, a package for rapid
>   mixed-model based genome-wide association analysis of multiple
>   traits (think metabolomics, glycomics, etc.).
> 
>   Recently we started using the open source Jenkins Continuous
>   Integration server to help us release software of higher
>   quality. Jenkins is a framework that automatically runs several
>   tests (e.g. static code analysis, checks for memory leaks) for each
>   of our packages. It builds and tests each project after a new commit
>   in our version control system. This allows us to detect problems in
>   the code at an early stage, before they bug the user.
> 
>   After the GenABEL package, ProbABEL is the second package that is
>   available as a Debian package. This means that users of upcoming
>   Debian releases will be able to install ProbABEL with a simple click
>   of a button or a single command. Moreover, since many other Linux
>   distributions like Ubuntu and Linux Mint are derived from Debian,
>   users of these distributions automatically benefit as well.
> 
>   In the coming year more packages are expected to be added to the
>   GenABEL suite, alongside continued efforts to improve the existing
>   ones. Moreover, we plan to increase both the ease of installation and
>   the visibility of the GenABEL suite by adding more packages
>   into both the Debian and Red Hat Enterprise Linux repositories (as
>   well as derivatives like CentOS and Scientific Linux).
> --------------8<----------------8<------------------8<-------------
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: posterEMGM2014_GenABEL.pdf
Type: application/pdf
Size: 236741 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140402/89063b46/attachment-0001.pdf>

From yurii.aulchenko at gmail.com  Wed Apr  2 20:22:21 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 2 Apr 2014 20:22:21 +0200
Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference
In-Reply-To: <533C30B7.3060102@karssen.org>
References: <52EAA7D1.5080106@karssen.org> <533C30B7.3060102@karssen.org>
Message-ID: <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com>

European Math Genet MEETING :)


> On 02 Apr 2014, at 17:45, "L.C. Karssen" <lennart at karssen.org> wrote:
> 
> Dear list,
> 
> Please find attached the poster I presented at the European Mathematical
> Genetics Society in Cologne over the past two days.
> 
> 
> Best,
> 
> Lennart.
> 
> <posterEMGM2014_GenABEL.pdf>

From lennart at karssen.org  Fri Apr  4 15:46:08 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 04 Apr 2014 15:46:08 +0200
Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference
In-Reply-To: <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com>
References: <52EAA7D1.5080106@karssen.org> <533C30B7.3060102@karssen.org>
 <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com>
Message-ID: <533EB7A0.3050006@karssen.org>

Of course :-).


Lennart.


On 02-04-14 20:22, Yury Aulchenko wrote:
> European Math Genet MEETING :)
> 
> ----------------
> Sent from mobile device, please excuse possible typos
> 
>> On 02 Apr 2014, at 17:45, "L.C. Karssen" <lennart at karssen.org> wrote:
>>
>> Dear list,
>>
>> Please find attached the poster I presented at the European Mathematical
>> Genetics Society in Cologne over the past two days.
>>
>>
>> Best,
>>
>> Lennart.
>>
>>> On 30-01-14 20:28, L.C. Karssen wrote:
>>> Dear list,
>>>
>>> I'm planning to go to the EMGM (European Mathematical Genetics Meeting)
>>> in Cologne in April. I'd like to present a poster there and wrote the
>>> abstract below.
>>> Please let me know any comments or suggestions as soon possible as the
>>> deadline for abstract submission is tomorrow (Fri 31 Jan).
>>>
>>> Thank you very much,
>>>
>>> Lennart.
>>>
>>>
>>> --------------8<----------------8<------------------8<-------------
>>>  Over the last year the GenABEL project has seen a considerable
>>>  number of improvements. These improvements do not only consist of
>>>  updates to the existing packages of the GenABEL suite, but are also
>>>  manifest in the way the development process is being handled and
>>>  the way the packages are made available to the users.
>>>
>>>  On our poster we will demonstrate the newly implemented features in
>>>  the various packages of the GenABEL suite. We also welcome a new
>>>  member to the GenABEL family: OmicABEL, a package for rapid
>>>  mixed-model based genome-wide association analysis of multiple
>>>  traits (think metabolomics, glycomics, etc.).
>>>
>>>  Recently we started using the open source Jenkins Continuous
>>>  Integration server to help us release software of higher
>>>  quality. Jenkins is a framework that automatically runs several
>>>  tests (e.g. static code analysis, checks for memory leaks) for each
>>>  of our packages. It builds and tests each project after a new commit
>>>  in our version control system. This allows us to detect problems in
>>>  the code at an early stage, before they bug the user.
>>>
>>>  After the GenABEL package, ProbABEL is the second package that is
>>>  available as a Debian package. This means that users of upcoming
>>>  Debian releases will be able to install ProbABEL with a simple click
>>>  of a button or a single command. Moreover, since many other Linux
>>>  distributions like Ubuntu and Linux Mint are derived from Debian,
>>>  users of these distributions automatically benefit as well.
>>>
>>>  In the coming year more packages are expected to be added the
>>>  GenABEL suite as well as continued efforts to improve the existing
>>>  ones. Moreover, we plan to increase both the ease of installation as
>>>  well as the visibility of the GenABEL suite by adding more packages
>>>  into both the Debian and Red Hat Enterprise Linux repositories (as
>>>  well as derivatives like CentOS and Scientific Linux).
>>> --------------8<----------------8<------------------8<-------------
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>
>>
>> -- 
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C. Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>> <posterEMGM2014_GenABEL.pdf>

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140404/89a8eae3/attachment.sig>

From lennart at karssen.org  Mon Apr  7 08:59:30 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 07 Apr 2014 08:59:30 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1672 -
	branches/ProbABEL-0.50/checks/R-tests
In-Reply-To: <20140402205412.1A1CF186EE1@r-forge.r-project.org>
References: <20140402205412.1A1CF186EE1@r-forge.r-project.org>
Message-ID: <53424CD2.6060901@karssen.org>

Thanks Maarten! Nice fix. One step closer to a complete set of tests of
the ProbABEL functionality.


Lennart.

On 02-04-14 22:54, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-02 22:54:11 +0200 (Wed, 02 Apr 2014)
> New Revision: 1672
> 
> Modified:
>    branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R
> Log:
> One check was malfunctioning. This was caused by a combination of a value being hard-set and the switch to EIGEN for the Cholesky decomposition. I made this test succeed since mathematically it seems to work all right.
> 
> Modified: branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R
> ===================================================================
> --- branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R	2014-04-02 20:25:43 UTC (rev 1671)
> +++ branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R	2014-04-02 20:54:11 UTC (rev 1672)
> @@ -36,7 +36,11 @@
>  ## (SNP 6 in the info file). ProbABEL lists them all as 0.0, R lists
>  ## them as:
>  prob.dom.PA[6, 2:4] <- c(NaN, NaN, 0.0)
> +#for 2df model the last SNP is interchangeable: EIGEN calculates the beta for the other SNP than R. This causes the beta to have the wrong sign. This part of change the position of the snp beta(and swaps sign) and SE if beta and other SE are 0
> +if (sum(abs(prob.2df.PA[6, 2:3]))==0){
> +prob.2df.PA[6, 2:3] <-c(prob.2df.PA[6, 4]*-1,prob.2df.PA[6, 5])
>  prob.2df.PA[6, 4:5] <- c(NA, NA)
> +}
>  
>  ####
>  ## run analysis in R
> @@ -123,7 +127,8 @@
>  }
>  colnames(prob.2df.R) <- cols2df
>  rownames(prob.2df.R) <- NULL
> -stopifnot( all.equal(prob.2df.PA[1:5,], prob.2df.R[1:5,], tol=tol) )
> +
> +stopifnot( all.equal(prob.2df.PA, prob.2df.R, tol=tol) )
>  cat("2df\n")
>  
>  cat("\t\t\t\t\t\tOK\n")
> 
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org  Mon Apr  7 09:06:31 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 07 Apr 2014 09:06:31 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1671 - in
 branches/ProbABEL-0.50: examples src
In-Reply-To: <20140402202543.B9693186EE2@r-forge.r-project.org>
References: <20140402202543.B9693186EE2@r-forge.r-project.org>
Message-ID: <53424E77.5020400@karssen.org>

Hi Maarten,

On 02-04-14 22:25, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-02 22:25:43 +0200 (Wed, 02 Apr 2014)
> New Revision: 1671
> 
> Modified:
>    branches/ProbABEL-0.50/examples/mmscore.R
>    branches/ProbABEL-0.50/src/probabel
>    branches/ProbABEL-0.50/src/reg1.cpp
> Log:
> reg1.cpp : simplified code of mmscore(palinear) and fixed failing test_mms.sh check
> probabel: introduce a check that verifies the phenotype file exists before running pa* (prevents errors while running the script)
> mmscore.R: fixed small typo
> 
> Modified: branches/ProbABEL-0.50/examples/mmscore.R
> ===================================================================
> --- branches/ProbABEL-0.50/examples/mmscore.R	2014-04-02 15:57:31 UTC (rev 1670)
> +++ branches/ProbABEL-0.50/examples/mmscore.R	2014-04-02 20:25:43 UTC (rev 1671)
> @@ -88,5 +88,5 @@
>  ## 2) residuals of the phenotype, which will be the new phenotype that
>  ## ProbABEL will analyse.
>  
> -## Mow, go to ProbABEL and start analysis
> +## Now, go to ProbABEL and start analysis

Thanks :-).

>  
> 
> Modified: branches/ProbABEL-0.50/src/probabel
> ===================================================================
> --- branches/ProbABEL-0.50/src/probabel	2014-04-02 15:57:31 UTC (rev 1670)
> +++ branches/ProbABEL-0.50/src/probabel	2014-04-02 20:25:43 UTC (rev 1671)
> @@ -169,6 +169,11 @@
>  
>  
>  my $phename = $ARGV[5];
> +if (! -e $phename.".PHE"){
> +die "Phenotype file $phename.PHE does not exists. The phenotype file should be specified without the .PHE extension.\n";
> +}
> +
> +
>  # By default the output file prefix is the same as the name of the
>  # phenotype file (minus the .PHE extension and any paths)
>  use File::Basename;
> 
> Modified: branches/ProbABEL-0.50/src/reg1.cpp
> ===================================================================
> --- branches/ProbABEL-0.50/src/reg1.cpp	2014-04-02 15:57:31 UTC (rev 1670)
> +++ branches/ProbABEL-0.50/src/reg1.cpp	2014-04-02 20:25:43 UTC (rev 1671)
> @@ -313,35 +313,21 @@
>  void linear_reg::mmscore_regression(const mematrix<double>& X,
>          const masked_matrix& W_masked, LDLT<MatrixXd>& Ch) {
>  
> -
>      VectorXd Y = reg_data.Y.data.col(0);
> -    if (X.data.cols() == 3)
> -    {
> -        Matrix<double, Dynamic, 3> tXW = W_masked.masked_data->data * X.data;
> -        Matrix2d xWx = tXW.transpose() * X.data;
> -        Ch = LDLT<MatrixXd>(xWx);
> -        Vector3d beta_3f = Ch.solve(tXW.transpose() * Y);
> -        sigma2 = (Y - tXW * beta_3f).squaredNorm();
> -        beta.data = beta_3f;
> -    }
> -    else if (X.data.cols() == 2)
> -    {
> -        Matrix<double,  Dynamic,2> tXW =  W_masked.masked_data->data*X.data;
> -        Matrix2d xWx = tXW.transpose() * X.data;
> -        Ch = LDLT<MatrixXd> (xWx);
> -        Vector2d beta_2f = Ch.solve(tXW.transpose() * Y);
> -        sigma2 = (Y - tXW * beta_2f).squaredNorm();
> -        beta.data = beta_2f;
> -    }
> -    else
> -    {
> -        // next line is  5997000 flops
> -        MatrixXd tXW = X.data.transpose() * W_masked.masked_data->data;
> -        Ch = LDLT<MatrixXd>(tXW * X.data); // 17991 flops
> -        beta.data = Ch.solve(tXW * Y); //5997 flops
> -        //next line is: 1000+5000+3000= 9000 flops
> -        sigma2 = (Y - tXW.transpose() * beta.data).squaredNorm();
> -    }

Glad to see the if/else go! This is much cleaner (and apparently not
slower).
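For reference, the flop counts in the comments of the removed else-branch (5997000, 17991 and 5997) match the usual cost of a dense product, where an $a \times c$ matrix times a $c \times d$ matrix costs $a\,d\,(2c-1)$ flops, if one assumes $n = 1000$ samples and $p = 3$ columns in X. These sizes are inferred from the numbers; the commit does not state them.

$$
\begin{aligned}
X^\top W \;\; (p \times n)(n \times n) &: \; p\,n\,(2n-1) = 3 \cdot 1000 \cdot 1999 = 5\,997\,000 \\
(X^\top W)\,X \;\; (p \times n)(n \times p) &: \; p^2\,(2n-1) = 9 \cdot 1999 = 17\,991 \\
(X^\top W)\,Y \;\; (p \times n)(n \times 1) &: \; p\,(2n-1) = 3 \cdot 1999 = 5\,997
\end{aligned}
$$

Computing $W X$ instead, an $(n \times n)(n \times p)$ product, costs the same $p\,n\,(2n-1)$ flops, so the roughly twofold speedup mentioned in the new code comment presumably comes from better vectorization, as that comment says, rather than from doing fewer operations.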


> +    /*
> +     in ProbABEL <0.50 this calculation was performed like t(X)*W
> +     This changed to W*X since this is better vectorized since the left hand
> +     side has more rows: this introduces an additional transpose, but can be
> +     neglected compared to the speedup this brings(about a factor 2 for the
> +     palinear with 1 predictor)
> +     */
> +    MatrixXd tXW = W_masked.masked_data->data * X.data;

I think the variable naming should be more appropriate here: tXW sounds
like X^t * W, but you store W * X in that variable.


> +    MatrixXd xWx = tXW.transpose() * X.data;

Similarly here, I'm not sure how to interpret xWx. Since you calculate
(W*X)^t * X a name like WXtX seems more reasonable.

> +    Ch = LDLT<MatrixXd>(xWx);
> +    VectorXd beta_vec = Ch.solve(tXW.transpose() * Y);
> +    sigma2 = (Y - tXW * beta_vec).squaredNorm();
> +    beta.data = beta_vec;
> +
>  }


Thanks for the good work!

Lennart.

>  
>  void linear_reg::logLikelihood(const mematrix<double>& X) {
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org  Mon Apr  7 16:37:38 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 07 Apr 2014 16:37:38 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector
In-Reply-To: <20140407143639.30257186D32@r-forge.r-project.org>
References: <20140407143639.30257186D32@r-forge.r-project.org>
Message-ID: <5342B832.20207@karssen.org>

Pfff..... Finally got the filevector tag the way I wanted...


Lennart.

On 07-04-14 16:36, noreply at r-forge.r-project.org wrote:
> Author: lckarssen
> Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014)
> New Revision: 1679
> 
> Added:
>    tags/filevector/v.1.0.0/
> Log:
> Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674.
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From yurii.aulchenko at gmail.com  Mon Apr  7 19:38:50 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Mon, 7 Apr 2014 19:38:50 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector
In-Reply-To: <5342B832.20207@karssen.org>
References: <20140407143639.30257186D32@r-forge.r-project.org>
 <5342B832.20207@karssen.org>
Message-ID: <F266BD1C-46F1-477A-BA34-6276549532EC@gmail.com>

Yep, I was following the story with curiosity :)

----------------
Sent from mobile device, please excuse possible typos

> On 07 Apr 2014, at 16:37, "L.C. Karssen" <lennart at karssen.org> wrote:
> 
> Pfff..... Finally got the filevector tag the way I wanted...
> 
> 
> 
> Lennart.
> 
>> On 07-04-14 16:36, noreply at r-forge.r-project.org wrote:
>> Author: lckarssen
>> Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014)
>> New Revision: 1679
>> 
>> Added:
>>   tags/filevector/v.1.0.0/
>> Log:
>> Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674.
>> 

From kooyman at gmail.com  Mon Apr  7 19:51:54 2014
From: kooyman at gmail.com (Maarten Kooyman)
Date: Mon, 07 Apr 2014 19:51:54 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1671 - in
 branches/ProbABEL-0.50: examples src
In-Reply-To: <53424E77.5020400@karssen.org>
References: <20140402202543.B9693186EE2@r-forge.r-project.org>
 <53424E77.5020400@karssen.org>
Message-ID: <5342E5BA.3000509@gmail.com>

Hi Lennart,

On 07-04-14 09:06, L.C. Karssen wrote:
> Hi Maarten,
>
> On 02-04-14 22:25, noreply at r-forge.r-project.org wrote:
>> Author: maartenk
>> Date: 2014-04-02 22:25:43 +0200 (Wed, 02 Apr 2014)
>>
> Thanks :-).
>>   
>>
>> Modified: branches/ProbABEL-0.50/src/probabel
>> ===================================================================
>> --- branches/ProbABEL-0.50/src/probabel	2014-04-02 15:57:31 UTC (rev 1670)
>> +++ branches/ProbABEL-0.50/src/probabel	2014-04-02 20:25:43 UTC (rev 1671)
>> @@ -169,6 +169,11 @@
>>   
>>   
>>   my $phename = $ARGV[5];
>> +if (! -e $phename.".PHE"){
>> +die "Phenotype file $phename.PHE does not exists. The phenotype file should be specified without the .PHE extension.\n";
>> +}
>> +
>> +
>>   # By default the output file prefix is the same as the name of the
>>   # phenotype file (minus the .PHE extension and any paths)
>>   use File::Basename;
>>
>> Modified: branches/ProbABEL-0.50/src/reg1.cpp
>> ===================================================================
>> --- branches/ProbABEL-0.50/src/reg1.cpp	2014-04-02 15:57:31 UTC (rev 1670)
>> +++ branches/ProbABEL-0.50/src/reg1.cpp	2014-04-02 20:25:43 UTC (rev 1671)
>> @@ -313,35 +313,21 @@
>>   void linear_reg::mmscore_regression(const mematrix<double>& X,
>>           const masked_matrix& W_masked, LDLT<MatrixXd>& Ch) {
>>   
>> -
>>       VectorXd Y = reg_data.Y.data.col(0);
>> -    if (X.data.cols() == 3)
>> -    {
>> -        Matrix<double, Dynamic, 3> tXW = W_masked.masked_data->data * X.data;
>> -        Matrix2d xWx = tXW.transpose() * X.data;
>> -        Ch = LDLT<MatrixXd>(xWx);
>> -        Vector3d beta_3f = Ch.solve(tXW.transpose() * Y);
>> -        sigma2 = (Y - tXW * beta_3f).squaredNorm();
>> -        beta.data = beta_3f;
>> -    }
>> -    else if (X.data.cols() == 2)
>> -    {
>> -        Matrix<double,  Dynamic,2> tXW =  W_masked.masked_data->data*X.data;
>> -        Matrix2d xWx = tXW.transpose() * X.data;
>> -        Ch = LDLT<MatrixXd> (xWx);
>> -        Vector2d beta_2f = Ch.solve(tXW.transpose() * Y);
>> -        sigma2 = (Y - tXW * beta_2f).squaredNorm();
>> -        beta.data = beta_2f;
>> -    }
>> -    else
>> -    {
>> -        // next line is  5997000 flops
>> -        MatrixXd tXW = X.data.transpose() * W_masked.masked_data->data;
>> -        Ch = LDLT<MatrixXd>(tXW * X.data); // 17991 flops
>> -        beta.data = Ch.solve(tXW * Y); //5997 flops
>> -        //next line is: 1000+5000+3000= 9000 flops
>> -        sigma2 = (Y - tXW.transpose() * beta.data).squaredNorm();
>> -    }
> Glad to see the if/else go! This is much cleaner (and apparently not
> slower).
>
>
>> +    /*
>> +     in ProbABEL <0.50 this calculation was performed like t(X)*W
>> +     This changed to W*X since this is better vectorized since the left hand
>> +     side has more rows: this introduces an additional transpose, but can be
>> +     neglected compared to the speedup this brings(about a factor 2 for the
>> +     palinear with 1 predictor)
>> +     */
>> +    MatrixXd tXW = W_masked.masked_data->data * X.data;
> I think the variable naming should be more appropriate here: tXW sounds
> like X^t * W, but you store W * X in that variable.
Yep, you're right, it should be called tWX. We skip the transpose of W
since it is a symmetric matrix; however, mathematically it makes more
sense to name the variable after what we compute. The code shows what we
actually do. This might need some explanation in the form of comments.
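As a quick sanity check of the symmetry argument above, here is a minimal sketch in plain C++ (standard library only instead of Eigen/mematrix; the `Matrix` alias and all helper names are hypothetical, not ProbABEL code). It shows that for a symmetric W the product W * X equals (X^T * W)^T, which is why the code can compute W * X without ever transposing W.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major matrix; a stand-in for Eigen's MatrixXd in this sketch.
using Matrix = std::vector<std::vector<double>>;

// C = A * B for A (n x k) and B (k x m); assumes non-empty, conforming inputs.
Matrix multiply(const Matrix& A, const Matrix& B) {
    const std::size_t n = A.size(), k = B.size(), m = B[0].size();
    Matrix C(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t j = 0; j < m; ++j)
                C[i][j] += A[i][l] * B[l][j];
    return C;
}

Matrix transpose(const Matrix& A) {
    Matrix T(A[0].size(), std::vector<double>(A.size()));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j)
            T[j][i] = A[i][j];
    return T;
}

// Checks W(i,j) == W(j,i) for a square matrix.
bool is_symmetric(const Matrix& W) {
    for (std::size_t i = 0; i < W.size(); ++i)
        for (std::size_t j = 0; j < i; ++j)
            if (W[i][j] != W[j][i]) return false;
    return true;
}

bool approx_equal(const Matrix& A, const Matrix& B, double tol = 1e-12) {
    if (A.size() != B.size()) return false;
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i].size() != B[i].size()) return false;
        for (std::size_t j = 0; j < A[i].size(); ++j)
            if (std::fabs(A[i][j] - B[i][j]) > tol) return false;
    }
    return true;
}

// W * X, which equals (X^T * W)^T only because W is symmetric.
Matrix weighted_design(const Matrix& W, const Matrix& X) {
    assert(is_symmetric(W) && "W * X == (X^T W)^T requires a symmetric W");
    return multiply(W, X);
}
```

With a non-symmetric W the two expressions differ, so the identity (and the naming question) really does hinge on W being symmetric.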
>
>> +    MatrixXd xWx = tXW.transpose() * X.data;
> Similarly here, I'm not sure how to interpret xWx. Since you calculate
> (W*X)^t * X a name like WXtX seems more reasonable.

So this would become something like ttWXX??? Any better suggestions?
>> +    Ch = LDLT<MatrixXd>(xWx);
>> +    VectorXd beta_vec = Ch.solve(tXW.transpose() * Y);
>> +    sigma2 = (Y - tXW * beta_vec).squaredNorm();
>> +    beta.data = beta_vec;
>> +
>>   }
> Thanks for the good work!
>
> Lennart.
>
>>   
>>   void linear_reg::logLikelihood(const mematrix<double>& X) {
>>


From fabregat at aices.rwth-aachen.de  Mon Apr  7 20:08:52 2014
From: fabregat at aices.rwth-aachen.de (Diego Fabregat)
Date: Mon, 7 Apr 2014 20:08:52 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1671 - in
 branches/ProbABEL-0.50: examples src
In-Reply-To: <5342E5BA.3000509@gmail.com>
References: <20140402202543.B9693186EE2@r-forge.r-project.org>
 <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com>
Message-ID: <5342E9B4.3020007@aices.rwth-aachen.de>

Hi guys,

If I may...
>>> +    /*
>>> +     in ProbABEL <0.50 this calculation was performed like t(X)*W
>>> +     This changed to W*X since this is better vectorized since the 
>>> left hand
>>> +     side has more rows: this introduces an additional transpose, 
>>> but can be
>>> +     neglected compared to the speedup this brings(about a factor 2 
>>> for the
>>> +     palinear with 1 predictor)
>>> +     */
>>> +    MatrixXd tXW = W_masked.masked_data->data * X.data;
>> I think the variable naming should be more appropriate here: tXW sounds
>> like X^t * W, but you store W * X in that variable.
> Yep, you're right, it should be called tWX. We skip the transpose of W
> since it is a symmetric matrix; however, mathematically it makes more
> sense to name the variable after what we compute. The code shows what
> we actually do. This might need some explanation in the form of comments.
>>
>>> +    MatrixXd xWx = tXW.transpose() * X.data;
>> Similarly here, I'm not sure how to interpret xWx. Since you calculate
>> (W*X)^t * X a name like WXtX seems more reasonable.
>
> So this would become something like ttWXX??? Any better suggestions?
I don't know the context of the discussion, but what do you think about 
documenting the algorithm somewhere in the code (like at the top of the 
source file), giving simple names to the variables, and then just using 
those names instead of getting to a point where you have to juggle with 
cryptic variable names. For instance, in case you want to solve a 
least-squares problem  inv(X^T X) X^T y:

/*
  *  Algorithm for LSQ  [ b := inv(X^T X) X^T y ]
  *
  *  S := X^T X
  *  v := X^T y
  *  b := inv(S) v   (note that this should be solved as a linear
  *                   system, not by explicitly inverting S)
  */

And then you can simply use S, v, and b in the code.
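Following that suggestion, a minimal C++ sketch of the unweighted least-squares algorithm in the comment, using S, v and b as variable names and solving S b = v with a Cholesky factorization instead of forming inv(S). It uses the standard library only; the `Matrix`/`Vector` aliases and function names are hypothetical, not ProbABEL code, and X is assumed to have full column rank so that S is positive definite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // row-major, X is n x p

// S := X^T X  (p x p)
Matrix cross_product(const Matrix& X) {
    const std::size_t n = X.size(), p = X[0].size();
    Matrix S(p, Vector(p, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t a = 0; a < p; ++a)
            for (std::size_t c = 0; c < p; ++c)
                S[a][c] += X[i][a] * X[i][c];
    return S;
}

// v := X^T y  (length p)
Vector transpose_times(const Matrix& X, const Vector& y) {
    const std::size_t n = X.size(), p = X[0].size();
    Vector v(p, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t a = 0; a < p; ++a)
            v[a] += X[i][a] * y[i];
    return v;
}

// Solve S b = v via S = L L^T (Cholesky), then two triangular solves.
Vector cholesky_solve(const Matrix& S, const Vector& v) {
    const std::size_t p = S.size();
    Matrix L(p, Vector(p, 0.0));
    for (std::size_t j = 0; j < p; ++j) {
        double d = S[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) throw std::runtime_error("S is not positive definite");
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < p; ++i) {
            double s = S[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    Vector z(p);  // forward substitution: L z = v
    for (std::size_t i = 0; i < p; ++i) {
        double s = v[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i][k] * z[k];
        z[i] = s / L[i][i];
    }
    Vector b(p);  // back substitution: L^T b = z
    for (std::size_t i = p; i-- > 0;) {
        double s = z[i];
        for (std::size_t k = i + 1; k < p; ++k) s -= L[k][i] * b[k];
        b[i] = s / L[i][i];
    }
    return b;
}

// b := inv(X^T X) X^T y, computed without inverting anything.
Vector least_squares(const Matrix& X, const Vector& y) {
    assert(X.size() == y.size());
    const Matrix S = cross_product(X);
    const Vector v = transpose_times(X, y);
    return cholesky_solve(S, v);
}
```

For example, with X holding an intercept column and x = 0, 1, 2, 3, and y = 1 + 2x, `least_squares` returns b close to (1, 2). The mmscore code follows the same factorize-then-solve pattern with Eigen's LDLT.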

Best,
Diego

From lennart at karssen.org  Mon Apr  7 21:48:37 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 07 Apr 2014 21:48:37 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector
In-Reply-To: <F266BD1C-46F1-477A-BA34-6276549532EC@gmail.com>
References: <20140407143639.30257186D32@r-forge.r-project.org>
 <5342B832.20207@karssen.org> <F266BD1C-46F1-477A-BA34-6276549532EC@gmail.com>
Message-ID: <53430115.6020409@karssen.org>

:-) One more reason to consider switching to git. There I could have
gotten it right before pushing it to the public repo and bothering all
of you.


Lennart.

On 07-04-14 19:38, Yury Aulchenko wrote:
> Yep, I was following the story with curiosity :)
> 
> ----------------
> Sent from mobile device, please excuse possible typos
> 
>> On 07 Apr 2014, at 16:37, "L.C. Karssen" <lennart at karssen.org> wrote:
>>
>> Pfff..... Finally got the filevector tag the way I wanted...
>>
>>
>>
>> Lennart.
>>
>>> On 07-04-14 16:36, noreply at r-forge.r-project.org wrote:
>>> Author: lckarssen
>>> Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014)
>>> New Revision: 1679
>>>
>>> Added:
>>>   tags/filevector/v.1.0.0/
>>> Log:
>>> Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674.
>>>

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org  Mon Apr  7 21:52:43 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 07 Apr 2014 21:52:43 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1671 - in
 branches/ProbABEL-0.50: examples src
In-Reply-To: <5342E9B4.3020007@aices.rwth-aachen.de>
References: <20140402202543.B9693186EE2@r-forge.r-project.org>
 <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com>
 <5342E9B4.3020007@aices.rwth-aachen.de>
Message-ID: <5343020B.8060401@karssen.org>

Hi Diego,

On 07-04-14 20:08, Diego Fabregat wrote:
> Hi guys,
> 
> If I may...

Of course! More than welcome :-).

I like your suggestion a lot. It seems the cleanest one, with the
added advantage that it documents the algorithm explicitly in the code.


Thanks,

Lennart.

>>>> +    /*
>>>> +     in ProbABEL <0.50 this calculation was performed like t(X)*W
>>>> +     This changed to W*X since this is better vectorized since the
>>>> left hand
>>>> +     side has more rows: this introduces an additional transpose,
>>>> but can be
>>>> +     neglected compared to the speedup this brings(about a factor 2
>>>> for the
>>>> +     palinear with 1 predictor)
>>>> +     */
>>>> +    MatrixXd tXW = W_masked.masked_data->data * X.data;
>>> I think the variable naming should be more appropriate here: tXW sounds
>>> like X^t * W, but you store W * X in that variable.
>> Yep, you're right, it should be called tWX. We skip the transpose of W
>> since it is a symmetric matrix; however, mathematically it makes more
>> sense to name the variable after what we compute. The code shows what
>> we actually do. This might need some explanation in the form of comments.
>>>
>>>> +    MatrixXd xWx = tXW.transpose() * X.data;
>>> Similarly here, I'm not sure how to interpret xWx. Since you calculate
>>> (W*X)^t * X a name like WXtX seems more reasonable.
>>
>> So this would become something like ttWXX??? Any better suggestions?
> I don't know the context of the discussion, but what do you think about
> documenting the algorithm somewhere in the code (like at the top of the
> source file), giving simple names to the variables, and then just using
> those names instead of getting to a point where you have to juggle with
> cryptic variable names. For instance, in case you want to solve a
> least-squares problem  inv(X^T X) X^T y:
> 
> /*
>  *  Algorithm for LSQ  [ b := inv(X^T X) X^T y ]
>  *
>  *  S := X^T X
>  *  v := X^T y
>  *  b := inv(S) v   (note that this should be solved as a linear
>  *                   system, not by explicitly inverting S)
>  */
> 
> And then you can simply use S, v, and b in the code.
> 
> Best,
> Diego

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From yurii.aulchenko at gmail.com  Tue Apr  8 01:14:43 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Tue, 8 Apr 2014 01:14:43 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1671 - in
 branches/ProbABEL-0.50: examples src
In-Reply-To: <5343020B.8060401@karssen.org>
References: <20140402202543.B9693186EE2@r-forge.r-project.org>
 <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com>
 <5342E9B4.3020007@aices.rwth-aachen.de>
 <5343020B.8060401@karssen.org>
Message-ID: <CAHX9t6JE9PMY82rpVjTbeyufiYr0y6Mc_O1T9KCALTLBGCLeHQ@mail.gmail.com>

agree!


On Mon, Apr 7, 2014 at 9:52 PM, L.C. Karssen <lennart at karssen.org> wrote:

> Hi Diego,
>
> On 07-04-14 20:08, Diego Fabregat wrote:
> > Hi guys,
> >
> > If I may...
>
> Of course! More than welcome :-).
>
> I like your suggestion a lot. It seems the cleanest one, with the
> added advantage that it documents the algorithm explicitly in the code.
>
>
> Thanks,
>
> Lennart.
>
> >>>> +    /*
> >>>> +     in ProbABEL <0.50 this calculation was performed like t(X)*W
> >>>> +     This changed to W*X since this is better vectorized since the
> >>>> left hand
> >>>> +     side has more rows: this introduces an additional transpose,
> >>>> but can be
> >>>> +     neglected compared to the speedup this brings(about a factor 2
> >>>> for the
> >>>> +     palinear with 1 predictor)
> >>>> +     */
> >>>> +    MatrixXd tXW = W_masked.masked_data->data * X.data;
> >>> I think the variable naming should be more appropriate here: tXW sounds
> >>> like X^t * W, but you store W * X in that variable.
> >> Yep, you're right, it should be called tWX. We skip the transpose of W
> >> since it is a symmetric matrix; however, mathematically it makes more
> >> sense to name the variable after what we compute. The code shows what
> >> we actually do. This might need some explanation in the form of comments.
> >>>
> >>>> +    MatrixXd xWx = tXW.transpose() * X.data;
> >>> Similarly here, I'm not sure how to interpret xWx. Since you calculate
> >>> (W*X)^t * X a name like WXtX seems more reasonable.
> >>
> >> So this would become something like ttWXX??? Any better suggestions?
> > I don't know the context of the discussion, but what do you think about
> > documenting the algorithm somewhere in the code (like at the top of the
> > source file), giving simple names to the variables, and then just using
> > those names instead of getting to a point where you have to juggle with
> > cryptic variable names. For instance, in case you want to solve a
> > least-squares problem  inv(X^T X) X^T y:
> >
> > /*
> >  *  Algorithm for LSQ  [ b := inv(X^T X) X^T y ]
> >  *
> >  *  S := X^T X
> >  *  v := X^T y
> >  *  b := inv(S) v   (note that this should be solved as a linear
> >  *                   system, not by explicitly inverting S)
> >  */
> >
> > And then you can simply use S, v, and b in the code.
> >
> > Best,
> > Diego


-- 
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [
Twitter<http://twitter.com/YuriiAulchenko>] [
Blog <http://yurii-aulchenko.blogspot.nl/> ]

From yurii.aulchenko at gmail.com  Wed Apr  9 22:12:03 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 9 Apr 2014 22:12:03 +0200
Subject: [GenABEL-dev] Summer school in statistical omics
Message-ID: <1CB1368F-D306-4AE6-AC54-BAC9E7BD7551@gmail.com>

Dear All,

Sorry for the off-topic message, but I was wondering whether some of you (especially those involved in teaching and training) might be interested in a summer school we are organizing. It may be good for your students! The deadline is close, however. See below and also the link http://school.statisticalomics.org

Yurii 

Summer School in Statistical Omics 2014, to be held in Split, Croatia, from Aug 1 to 15, 2014. The deadline for applications is April 15, 2014. 

The School of Statistical Omics is a highly intensive course that aims to train a new generation of Statistical Omics scientists. The School consists of project-based training-through-research and a series of lectures designed to introduce students with a biological/biomedical/biochemical background to the statistical analysis of multiple omics datasets. With the development of high-throughput technologies in recent years, biology has become a data-rich field and, consequently, there is an increasing need for biologists trained in data analysis. During two weeks of work on cutting-edge scientific projects, participants will be introduced to highly relevant real-world problems in the fields of glycomics and genomics and will gain experience programming in the R language.
Selected participants will spend two weeks working on a project in small groups. Morning sessions are reserved for lectures that give an overview of the current state-of-the-art methods in the field and the theoretical background for the practical sessions held in the afternoons. The first two days are dedicated to an introduction to the field of Statistical Omics and to the selection of projects. All projects will be presented by the project leaders, and students will choose the project they wish to work on. Groups will be formed by balancing personal choices against an even distribution of participants across all projects. Each group will consist of up to 5 students, a project leader and a tutor. During the School, several lectures will be given by leading scientists in the field, discussing the latest advances and discoveries. The School will end with a small conference where students will present their work through posters and presentations.

Given the nature and scale of the projects, students are invited to stay in contact with the project leaders and to continue working on their projects after the School, potentially leading to scientific publications or qualification theses (e.g. MSci).

----------------
Sent from mobile device, please excuse possible typos

From yurii.aulchenko at gmail.com  Wed Apr  9 22:20:40 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Wed, 9 Apr 2014 22:20:40 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1664 -
	branches/ProbABEL-0.50/src
In-Reply-To: <5339E158.2040106@gmail.com>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org>
 <A2211A25-7C31-4261-808C-494665D91603@gmail.com>
 <5335CDCF.8090503@gmail.com> <5335F579.9070608@karssen.org>
 <DA597561-7CC2-45C9-8093-D60E59B3A134@gmail.com>
 <5339E158.2040106@gmail.com>
Message-ID: <CAHX9t6LLYWiMpL5esuqTG8aujBEDXz8HZRCKkFX_0+Go47h8jg@mail.gmail.com>

Absolutely agree. More than supportive! It would be really cool to have
all the different packages and functions we have working with different
types of data via a centralized API. That would be a tremendous help in the
development of new methods, and something that would really make the GenA
project attractive to other developers.

Yurii

On Monday, March 31, 2014, Maarten Kooyman <kooyman at gmail.com> wrote:

> Dear All,
>
> It might be useful to give the next-generation DatABEL an interface for
> IMPUTE2/SHAPEIT and MaCH/minimac. Having one library/package to read the
> data would improve usability for all projects. I'm not looking forward to
> converting my 1kG imputations into another format. Nobody (from a user's
> perspective) feels like saving the same hundreds of GB of data in multiple
> formats. (And that is a practical reason for choosing a program to work
> with, which might not be the same as the best program.)
>
> Centralizing these functions would also benefit method developers. They
> would not have to bother with writing yet another parser. Creating a
> reliable, fast and multi-format parser is boilerplate code, and this kind of
> code you do not want to bother with if you have a powerful new methodology
> in mind. That is why lots of scientific software is picky about input
> formats. There are of course some problems caused by the nature of the data formats, e.g. [1].
>
>
> Kind regards,
>
> Maarten
>
>
>
>
> [1] One problem is that the number of predictors differs between
> those formats. It varies between 1 and 3, where in the case of IMPUTE2/SHAPEIT
> the probabilities need not sum to one. MaCH/minimac output might be converted
> to 3 predictors, since its probabilities should add up to one.
>
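As a rough sketch of the conversion mentioned in footnote [1]: assuming a MaCH/minimac .mlprob file stores two probabilities per SNP (taken here to be P(A1A1) and P(A1A2); that column order is an assumption), the third genotype probability follows from the sum-to-one constraint. The name `mach_to_three_predictors` is hypothetical and not part of ProbABEL or DatABEL.

```cpp
#include <array>

// Hypothetical helper (not part of ProbABEL/DatABEL): expand the two
// probabilities stored per SNP in a MaCH/minimac .mlprob file into three
// genotype probabilities. Assumes the column order is P(A1A1), P(A1A2).
std::array<double, 3> mach_to_three_predictors(double p11, double p12)
{
    // The three genotype probabilities should sum to one, so the third
    // one is whatever is left over.
    double p22 = 1.0 - p11 - p12;
    // Rounding in the text file can push the remainder slightly below
    // zero; clamp it so the result is still a valid probability.
    if (p22 < 0.0)
        p22 = 0.0;
    return {p11, p12, p22};
}
```

IMPUTE2 output cannot be expanded this way, since, as the footnote says, its three probabilities need not sum to one.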
> On 31-03-14 20:46, Yury Aulchenko wrote:
>
> I personally find the fact that text outperforms binary disappointing
> (and, if you forget about the technical details, well, strange). On the
> other hand, this is probably good for users as it eliminates the need for
> conversion, especially if we could work with compressed files, and especially
> if we build an interface to work with other types of text output (e.g. IMPUTE2
> would be a candidate)...
>
> Yurii
>
> ----------------
> Sent from mobile device, please excuse possible typos
>
>  On 28 Mar 2014, at 23:19, "L.C. Karssen" <lennart at karssen.org> wrote:
>
> Dear all,
>
> (I guess the previous version of this mail went to the commit email
> list, so here it is again for the devel list).
>
>
> Indeed: an impressive speed-up! Well done Maarten.
>
>  On 28-03-14 20:30, Maarten Kooyman wrote:
> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people,
> adjusted for sex and age (I did not run it in triplicate, but it gives an idea):
>
> version   0.42   0.50_branch
> FV          58            52
> mldose      48            12
> All times are in seconds.
>
> As you can see, the filevector format is the part that slows down the
> program. When profiling, reading from FV takes up 86% of the total run
> time of the program.
>
>
> The current problem with reading from filevector is that the fv data is
> stored as floats (this is logical, as it means half the disk space usage
> compared to storing doubles; moreover, the imputed data is never more
> precise than a float anyway).
> However, internally ProbABEL uses doubles for calculations. This means a
> conversion from float to double must occur at some point.
>
> Simply casting to double carries over the float's imprecision. For example,
> casting the float 0.677 to double gives 0.67699998617172241.
> Therefore, with version 0.4.0 I changed this and used a string as an
> intermediate form, followed by strtod(). First I used stringstreams, but
> these turned out to be much too slow for our use case. Now snprintf() is
> used. For the above example the double value is 0.67700000000000005,
> much closer to what we would like to see. Using this two-step conversion
> means the output when using fv is equal to the output using txt data
> (and equal to using R), within float precision.
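The effect described above can be reproduced with a small sketch of the float to text to double round trip. The `%.*g` format with `FLT_DIG` (normally 6) significant digits is an assumption chosen for illustration; the thread does not say which format string ProbABEL actually passes to snprintf(). Any decimal with at most FLT_DIG significant digits survives a trip through a float, so printing with that many digits recovers the value as originally written, and strtod() then maps it to the nearest double.

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Convert a float to the double nearest to its short decimal form,
// e.g. 0.677f -> 0.677 rather than 0.67699998617172241.
// Assumption: FLT_DIG significant digits are enough to recover the value
// that was originally written in the text file.
double float_to_double_via_text(float f)
{
    char buf[32];
    // Print with FLT_DIG significant digits; %g drops trailing zeros,
    // so 0.677f is printed as "0.677".
    std::snprintf(buf, sizeof buf, "%.*g", FLT_DIG, static_cast<double>(f));
    // strtod() returns the double closest to the decimal string.
    return std::strtod(buf, nullptr);
}
```

For comparison, `static_cast<double>(0.677f)` keeps the float's error, while `float_to_double_via_text(0.677f)` compares equal to the double literal `0.677`.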
>
> Using Maarten's 'strtod' will speed up this part as well, but the
> snprintf() call is still expensive.
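One possible way to avoid snprintf() altogether, offered only as a sketch: if one assumes the values in the original text files never have more than a known number of decimal places (the `decimals` parameter below encodes that assumption; it is not something ProbABEL records), the float can be scaled, rounded to an integer and divided back. Both operands of the final division are exact and IEEE-754 division is correctly rounded, so the result is the double nearest to the intended decimal value, which is the same double strtod() would return for that decimal string. The function name is hypothetical.

```cpp
#include <cmath>

// Hypothetical alternative to the snprintf()/strtod() round trip.
// Assumption: the value was written with at most `decimals` digits after
// the decimal point (small values such as 3 or 4).
double float_to_double_rounded(float f, int decimals)
{
    double scale = 1.0;
    for (int i = 0; i < decimals; ++i)
        scale *= 10.0;  // powers of ten up to 1e22 are exact in a double
    // Scale up, snap to the nearest integer, then scale back down. Both
    // operands of the division are exact, so the quotient is the double
    // nearest to the intended decimal value.
    return std::round(static_cast<double>(f) * scale) / scale;
}
```

The trade-off is that any digits beyond `decimals` are silently discarded, so this only fits if the input precision is known in advance. NaN input stays NaN, because std::round() propagates it.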
>
> Apart from this two-step conversion, we may also be inefficient because
> the dosage/probability values are converted one array element at a
> time. Maybe we can gain something there, as Maarten did for the txt
> format: simply sending a whole 'line'/array to the conversion may help.
>
>
>
>
> Given that most people nowadays store their imputation results in chunks
> of chromosomes anyway (i.e. small(er) files), and the fact that I think
> implementing the ability to read gzipped files is not difficult, it may
> be time to give mldose.gz files another chance for ProbABEL users. It
> will save them the conversion from mldose.gz to DatABEL.
> Of course we can still support DatABEL files, but (depending on how fast
> reading from gzipped files is), our recommendation could change with the
> upcoming ProbABEL v0.5.0.
>
> Any thoughts on this?
>
>
> Best,
>
> Lennart.
>
>
>
>
>
>  On 28-03-14 20:15, Yury Aulchenko wrote:
> A 10-fold speed-up is good. An order of magnitude :)
>
> I wonder how it now compares to reading from plain text files?
>
> Y
>
> ----------------
> Sent from mobile device, please excuse possible typos
>
>  On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote:
>
> Author: maartenk
> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014)
> New Revision: 1664
>
> Modified:
>    branches/ProbABEL-0.50/src/gendata.cpp
>    branches/ProbABEL-0.50/src/gendata.h
> Log:
> new implementation of reading numbers from the mldose file: this version
> is about 10(!) times faster than in ProbABEL 0.42
>
> Modified: branches/ProbABEL-0.50/src/gendata.cpp
> ===================================================================
> --- branches/ProbABEL-0.50/src/gendata.cpp    2014-03-27 21:16:16 UTC (rev 1663)
> +++ branches/ProbABEL-0.50/src/gendata.cpp    2014-03-28 19:12:41 UTC
>
>

-- 
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter <http://twitter.com/YuriiAulchenko> ] [ Blog <http://yurii-aulchenko.blogspot.nl/> ]

From j.verkouteren at erasmusmc.nl  Mon Apr 14 16:19:47 2014
From: j.verkouteren at erasmusmc.nl (J.A.C. Verkouteren)
Date: Mon, 14 Apr 2014 14:19:47 +0000
Subject: [GenABEL-dev] Not receiving activation e-mail new account
	GenABEL.org
Message-ID: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl>

Dear developers,

I just tried to create a new account for your forum (username: JACV), but for some reason I have not received the activation e-mail. Could your message have been flagged as spam by Erasmus MC's Outlook?


Kind regards,


Joris A.C. Verkouteren MD


PhD student


Dermatology


[Erasmus MC]


P.O. Box 2040, 3000 CA Rotterdam, The Netherlands, internal postal address Gk-318


Visiting address: Burg. s' Jacobplein 51, 3015 CA Rotterdam, The Netherlands, room Gk-026 (Building Rochussenstraat)


E j.verkouteren at erasmusmc.nl<mailto:j.verkouteren at erasmusmc.nl> | T +31 10 703 89 51


www.erasmusmc.nl<http://www.erasmusmc.nl> | www.erasmusmc.nl/dermatologie<http://www.erasmusmc.nl/dermatologie>


Presence: Monday-Friday



From lennart at karssen.org  Mon Apr 14 21:36:44 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 14 Apr 2014 21:36:44 +0200
Subject: [GenABEL-dev] Not receiving activation e-mail new account
	GenABEL.org
In-Reply-To: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl>
References: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl>
Message-ID: <534C38CC.2080802@karssen.org>

Dear Joris,

Thanks for registering a forum account. I can't really say what happened
to the activation e-mail, but I've just activated your account.
Feel free to contact us again if it doesn't work out.

Welcome to our forum!

Best,

Lennart.


On 14-04-14 16:19, J.A.C. Verkouteren wrote:
> Dear developers,
> 
> I just tried to create a new account for your forum (username: JACV) but
> for some reason I do not receive the activation e-mail. Could your
> message be seen as spam by Erasmus MC Outlook?

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-


From lennart at karssen.org  Mon Apr 14 22:55:01 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 14 Apr 2014 22:55:01 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1689 - pkg/ProbABEL/src
In-Reply-To: <20140414201904.2F0F1186B65@r-forge.r-project.org>
References: <20140414201904.2F0F1186B65@r-forge.r-project.org>
Message-ID: <534C4B25.60908@karssen.org>

Hi Maarten,

Thanks for doing the cleaning :-)!


Lennart.

On 14-04-14 22:19, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-14 22:19:03 +0200 (Mon, 14 Apr 2014)
> New Revision: 1689
> 
> Modified:
>    pkg/ProbABEL/src/gendata.cpp
> Log:
> removed some old non-functional code
> 
> Modified: pkg/ProbABEL/src/gendata.cpp
> ===================================================================
> --- pkg/ProbABEL/src/gendata.cpp	2014-04-11 09:26:28 UTC (rev 1688)
> +++ pkg/ProbABEL/src/gendata.cpp	2014-04-14 20:19:03 UTC (rev 1689)
> @@ -253,7 +253,6 @@
>      }
>  
>      std::string tmpid, tmpstr;
> -    char inStr[8];
>  
>      int k = 0;
>      for (unsigned int i = 0; i < npeople; i++)
> @@ -290,58 +289,11 @@
>                  infile >> tmpstr;
>              }
>  
> -            int oldstyle = 0;
> -            if (oldstyle == 1)
> -            {
> -                for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
> -                {
> -                    if (infile.good())
> -                    {
> -                        infile >> inStr;
> -                        // tmpstr contains the dosage/probability in
> -                        // string form. Convert it to double (if tmpstr is
> -                        // NaN it will be set to nan).
> -                        double dosage;
> -                        char *endptr;
> -                        errno = 0;      // To distinguish success/failure
> -                                        // after strtod()
> +            std::string all_numbers;
> +            all_numbers.reserve(nsnps * ngpreds * 7);
> +            std::getline(infile, all_numbers);
> +            mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds);
>  
> -                        dosage = strtod(inStr, &endptr);
> -                        if ((errno == ERANGE
> -                                && (dosage == HUGE_VALF || dosage == HUGE_VALL))
> -                                || (errno != 0 && dosage == 0))
> -                        {
> -                            perror("Error while reading genetic data (strtod)");
> -                            exit(EXIT_FAILURE);
> -                        }
> -
> -                        if (endptr == tmpstr.c_str())
> -                        {
> -                            cerr
> -                                    << "No digits were found while reading genetic data"
> -                                    << " (individual " << i + 1 << ", position "
> -                                    << j + 1 << ")" << endl;
> -                            exit(EXIT_FAILURE);
> -                        }
> -                        /* If we got here, strtod() successfully parsed a number */
> -                        G.put(dosage, k, j);
> -                    }
> -                    else
> -                    {
> -                        std::cerr << "cannot read dose-file: " << fname
> -                                << "check skipd and ngpreds parameters\n";
> -                        infile.close();
> -                        exit(1);
> -                    }
> -                }
> -            }
> -            else
> -            {
> -                std::string all_numbers;
> -                all_numbers.reserve(nsnps * ngpreds * 7);
> -                std::getline(infile, all_numbers);
> -                mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds);
> -            }
>              k++;
>          }
>          else
> @@ -361,7 +313,6 @@
>  
>  }
>  
> -
>  // HERE NEED A NEW CONSTRUCTOR BASED ON DATABELBASECPP OBJECT
>  gendata::~gendata()
>  {
> 
> 
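The new code path above hands a whole line to mldose_line_to_matrix(), whose body is not shown in this diff. A minimal sketch of that kind of single-pass parsing, assuming whitespace-separated numbers, walks the line with strtod()'s end pointer instead of extracting one token at a time. `parse_numbers` is a hypothetical name, and it returns a std::vector rather than writing into ProbABEL's matrix.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch (not ProbABEL's mldose_line_to_matrix()): parse
// exactly n whitespace-separated numbers from one line in a single pass.
std::vector<double> parse_numbers(const char *line, std::size_t n)
{
    std::vector<double> values;
    values.reserve(n);
    const char *pos = line;
    for (std::size_t j = 0; j < n; ++j)
    {
        char *end = nullptr;
        errno = 0;  // so that ERANGE can be told apart from success
        // strtod() skips leading whitespace and also accepts "NaN".
        double v = std::strtod(pos, &end);
        if (end == pos)
            throw std::runtime_error("no number found at position "
                                     + std::to_string(j + 1));
        if (errno == ERANGE)
            throw std::runtime_error("value out of range at position "
                                     + std::to_string(j + 1));
        values.push_back(v);
        pos = end;  // continue right after the number just read
    }
    return values;
}
```

Comparing `end` with the pointer that was actually passed to strtod() also avoids a slip in the removed code. That code checked `endptr == tmpstr.c_str()` even though it parsed `inStr`.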



From yurii.aulchenko at gmail.com  Wed Apr 16 13:02:19 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Wed, 16 Apr 2014 13:02:19 +0200
Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on
	big-endian architectures
In-Reply-To: <20140416102331.E8B38187585@r-forge.r-project.org>
References: <20140416102331.E8B38187585@r-forge.r-project.org>
Message-ID: <-4562335105651951279@unknownmsgid>

Is this something for the bug tracker, the forum, or a mix?

----------------------
Yurii Aulchenko
(sent from mobile device)

> On Apr 16, 2014, at 12:23 PM, "genabel-bugs at r-forge.r-project.org" <genabel-bugs at r-forge.r-project.org> wrote:
>
> Bugs item #5299, was changed at 2014-01-24 09:38 by Jurica Stanojkovic
> You can respond by visiting:
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505
>
> Status: Open
> Priority: 2
> Submitted By: Lennart Karssen (lckarssen)
> Assigned to: Nobody (None)
> Summary: Filevector doesn't work on big-endian architectures
> Resolution: Accepted As Bug
> Operating System: All
> Severity: normal
> Hardware: Other
> Version: other
> Component: FileVector
> URL: https://buildd.debian.org/status/package.php?p=probabel
>
>
> Initial Comment:
> The Debian build logs for big-endian machines (see URL) show that the ProbABEL checks fail on machines with that architecture. Closer inspection reveals that the checks fail on the comparison between text and binary (filevector-format) input.
>
> Also see this discussion on the debian-mentor mailing list, especially Gert Wollny's posts: https://lists.debian.org/debian-mentors/2014/01/msg00326.html
> Wollny writes:
> "I dug around in the code and voila, e.g. in fvlib/frutil.cpp the
> function blockWriteOrRead uses fstream.read|write to do raw data IO and
> then in other parts of the code the data is just cast to the desired
> type without doing any checks of endianess let alone the needed
> conversions."
>
>
> Since I doubt that many people will use ProbABEL/DatABEL/filevector on other (big-endian) architectures there is no hurry in fixing this. Nevertheless it's worth having this bug visible and in the back of our minds.
>
> ----------------------------------------------------------------------
>
> Comment By: Jurica Stanojkovic (juricast)
> Date: 2014-04-16 12:23
>
> Message:
> Hello,
>
> I tried building the probabel package on big-endian MIPS.
> It looks like the inputfiles/*.fvd and inputfiles/*.fvi files were created on a little-endian machine and do not work on big-endian ones.
>
> I created them on big-endian MIPS and replaced the ones that came with the source package with the ones I had created.
> With the new files, the package built without errors.
>
> I used the following commands to create the files:
> library(GenABEL)
> library(DatABEL)
> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose")
> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE)
> mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose")
> mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)
>
> I am new to ProbABEL, GenABEL and DatABEL, so could someone please help me with the following questions:
>
> What is the best course of action for supporting probabel on big-endian systems?
> Should *.fvi and *.fvd files always be in little-endian format (in which case DatABEL needs to be changed to always create little-endian files)?
> Or can the *.fvd and *.fvi files be replaced with big-endian files for a big-endian build?
>
> Is it necessary to be able to use *.fvd and *.fvi files created on a system with different endianness?
>
> I am willing to work on adding big-endian support, and I would appreciate any help in determining the right course of action for resolving this problem.
>
> Regards,
> Jurica
>
> ----------------------------------------------------------------------
>
> Comment By: Lennart Karssen (lckarssen)
> Date: 2014-01-27 21:20
>
> Message:
> A suggestion by Andreas Tille on the debian-med list: It's good to keep in mind that in the near future architectures like arm(64) may become much more popular in genomics.
>
>
> ----------------------------------------------------------------------
>
> You can respond by visiting:
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505
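Regarding the first of Jurica's questions, one common approach is to define the on-disk byte order as little-endian and decode each value byte by byte, so that the same .fvd file reads identically on little- and big-endian hosts. This is sketched here as an assumption, not as the current behaviour of the filevector format. The sketch also assumes 32-bit IEEE-754 floats whose byte order matches that of 32-bit integers, which holds on the usual platforms. `read_le_float` and `write_le_float` are hypothetical names, not functions in fvlib.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not part of fvlib): store floats in a fixed
// little-endian byte order regardless of the host's endianness.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

float read_le_float(const unsigned char *p)
{
    // Assemble the bit pattern explicitly: byte 0 is least significant.
    std::uint32_t bits = static_cast<std::uint32_t>(p[0])
                       | (static_cast<std::uint32_t>(p[1]) << 8)
                       | (static_cast<std::uint32_t>(p[2]) << 16)
                       | (static_cast<std::uint32_t>(p[3]) << 24);
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits safely
    return f;
}

void write_le_float(float f, unsigned char *p)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // Emit the least significant byte first.
    p[0] = static_cast<unsigned char>(bits & 0xFFu);
    p[1] = static_cast<unsigned char>((bits >> 8) & 0xFFu);
    p[2] = static_cast<unsigned char>((bits >> 16) & 0xFFu);
    p[3] = static_cast<unsigned char>((bits >> 24) & 0xFFu);
}
```

With such a scheme, files written on either kind of machine would be interchangeable at the cost of a few shifts per value. Data values in existing files produced on little-endian machines would presumably be read unchanged, although the integer fields in the .fvi header would need the same treatment.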

From lennart at karssen.org  Wed Apr 16 14:29:27 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 16 Apr 2014 14:29:27 +0200
Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on
 big-endian architectures
In-Reply-To: <-4562335105651951279@unknownmsgid>
References: <20140416102331.E8B38187585@r-forge.r-project.org>
 <-4562335105651951279@unknownmsgid>
Message-ID: <534E77A7.5040005@karssen.org>

I would say this is for both the dev list and the bug tracker.


Lennart.

On 16-04-14 13:02, Yurii Aulchenko wrote:
> Is that something for bug tracker or forum or a mix?



From lennart at karssen.org  Wed Apr 16 15:11:49 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 16 Apr 2014 15:11:49 +0200
Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on
 big-endian architectures
In-Reply-To: <534E77A7.5040005@karssen.org>
References: <20140416102331.E8B38187585@r-forge.r-project.org>
 <-4562335105651951279@unknownmsgid> <534E77A7.5040005@karssen.org>
Message-ID: <534E8195.2040906@karssen.org>

Hmm, that was perhaps a bit too short a reply :-).

I'd suggest the following:
- Move the discussion to the dev list (and mention that in the bug tracker
as well)
- Once a course of action has been decided, we can post status updates in
the bug tracker.


Lennart.

On 16-04-14 14:29, L.C. Karssen wrote:
> I would say this is for the dev list + bug
> 
> 
> Lennart.
> 
> On 16-04-14 13:02, Yurii Aulchenko wrote:
>> Is that something for bug tracker or forum or a mix?
>>
>> ----------------------
>> Yurii Aulchenko
>> (sent from mobile device)
>>
>>> On Apr 16, 2014, at 12:23 PM, "genabel-bugs at r-forge.r-project.org" <genabel-bugs at r-forge.r-project.org> wrote:
>>>
>>> Bugs item #5299, was changed at 2014-01-24 09:38 by Jurica Stanojkovic
>>> You can respond by visiting:
>>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505
>>>
>>> Status: Open
>>> Priority: 2
>>> Submitted By: Lennart Karssen (lckarssen)
>>> Assigned to: Nobody (None)
>>> Summary: Filevector doesn't work on big-endian architectures
>>> Resolution: Accepted As Bug
>>> Operating System: All
>>> Severity: normal
>>> Hardware: Other
>>> Version: other
>>> Component: FileVector
>>> URL: https://buildd.debian.org/status/package.php?p=probabel
>>>
>>>
>>> Initial Comment:
>>> The Debian build logs for big-endian machines (see URL) show that the ProbABEL checks fail on machines with that architecture. Closer inspection reveals that the checks fail on the comparison between text and binary (filevector-format) input.
>>>
>>> Also see this discussion on the debian-mentor mailing list, especially Gert Wollny's posts: https://lists.debian.org/debian-mentors/2014/01/msg00326.html
>>> Wollny writes:
>>> "I dug around in the code and voila, e.g. in fvlib/frutil.cpp the
>>> function blockWriteOrRead uses fstream.read|write to do raw data IO and
>>> then in other parts of the code the data is just cast to the desired
>>> type without doing any checks of endianess let alone the needed
>>> conversions."
>>>
>>>
>>> Since I doubt that many people will use ProbABEL/DatABEL/filevector on other (big-endian) architectures there is no hurry in fixing this. Nevertheless it's worth having this bug visible and in the back of our minds.
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Comment By: Jurica Stanojkovic (juricast)
>>> Date: 2014-04-16 12:23
>>>
>>> Message:
>>> Hello,
>>>
>>> I have tried building package probabel on mips big endian.
>>> It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on little endian machine and are not working on big endian ones.
>>>
>>> I have tried to create them on big endian mips, and replace ones that came with source package with the ones that I have created.
>>> The package was built with new files without an error.
>>>
>>> I used following command to create files:
>>> library(GenABEL)
>>> library(DatABEL)
>>> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose")
>>> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE)
>>> mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose")
>>> mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)
>>>
>>> I am new to ProbABEL, GenABEL, DatABEL so could someone please help me with following questions:
>>>
>>> What is the best course of action for supporting probabel on big endian?
>>> Should *.fvi, *.fvd files allways be in little endian format (than DatABEL needs to be changed to always create little endian files)?
>>> Or can *.fvd, *.fvi files be replaced with big endian files for big endian build?
>>>
>>> Is it necessary to be able to use *.fvd *.fvi files created on a different endian system?
>>>
>>> I am willing to work on adding big endian support and I will appreciate any help in determining the right course of action in resolving this problem.
>>>
>>> Regards,
>>> Jurica
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Comment By: Lennart Karssen (lckarssen)
>>> Date: 2014-01-27 21:20
>>>
>>> Message:
>>> A suggestion by Andreas Tille on the debian-med list: It's good to keep in mind that in the near future architectures like arm(64) may become much more popular in genomics.
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> You can respond by visiting:
>>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
> 
> 
> 
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 



From lennart at karssen.org  Fri Apr 18 16:35:29 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 18 Apr 2014 16:35:29 +0200
Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL
Message-ID: <53513831.3040506@karssen.org>

Dear list,

In the past few months Maarten has made several speed improvements to
ProbABEL. Many of these speedups make use of the EIGEN library that was
first introduced into ProbABEL in v0.3.0.
After merging Maarten's branch with trunk (and after I independently
added more extensive checks in Jenkins) we found out that compilation
fails after configuring ProbABEL using
    ./configure --without-eigen
Fixing this is not trivial, so we hereby propose to remove
the --without-eigen option. This doesn't necessarily mean that all
mematrix code needs to be removed immediately, but by insisting on using
EIGEN we can at least start removing the old code.

Impact analysis for users and developers:
1) positive: a consistent (and faster) analysis experience for all
users: everybody will use EIGEN

2) positive: reduction of maintenance/development time because we no
longer need to maintain the non-EIGEN parts of the code.

3) possibly negative: we need to make a choice on whether we will
distribute EIGEN with the ProbABEL code, or whether we 'force' the user
to download the code themselves.


Point 3) is similar to the debate about libfilevector: do we go for a
simple user experience where all requirements are combined in the
distributed source code, or do we make use of the modularity of the code
and its dependencies and let people download and install the
dependencies themselves (or use packages provided by the OS).
In the upcoming release we also plan to include calculation of p-values
using the Boost libraries [0]. The same issue will arise there again.
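For reference, this is the kind of computation the Boost dependency would cover: Boost.Math provides `boost::math::chi_squared_distribution` together with `cdf(complement(dist, x))` for arbitrary degrees of freedom. For the common 1-df case the same upper-tail p-value can be obtained from the C++ standard library alone, as in this sketch (the function name is hypothetical, not ProbABEL code):

```cpp
#include <cassert>
#include <cmath>

// Upper-tail p-value P(X > x) for a chi-squared statistic with 1 degree of
// freedom. X = Z^2 with Z standard normal, so
// P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
double chisq1_upper_pvalue(double x) {
    assert(x >= 0.0);  // chi-squared statistics are non-negative
    return std::erfc(std::sqrt(x / 2.0));
}
```

Using `erfc` instead of `1 - erf(...)` avoids cancellation, so small p-values for large chi^2 statistics are not rounded to zero prematurely.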

Therefore, I would like to start/continue the discussion here on how to
proceed with external dependencies. I'm really looking forward to your
opinions. Below I've outlined several options I could think of on how to
go forward. Let me know what you think of them or if you have any other
ideas.


Thanks a lot,

Lennart.


Note 1: For ProbABEL we provide pre-compiled MS Windows binaries, so
that platform is not part of this discussion.
Note 2: EIGEN consists of header files only, so it does not need to be
compiled or installed as a library; there is nothing to link at build time or to load at run time.
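Because EIGEN is header-only, checking for it amounts to checking the include path. A compile-time sketch of that check, assuming a C++17 compiler (`__has_include`); the macro and function names are hypothetical:

```cpp
// Detect during preprocessing whether the Eigen headers are reachable on
// the include path (e.g. via -I/your/path/to/eigen). Since Eigen is
// header-only, there is no library to look for as well.
#if defined(__has_include)
#  if __has_include(<Eigen/Core>)
#    include <Eigen/Core>
#    define PROBABEL_SKETCH_HAVE_EIGEN 1
#  endif
#endif

#ifndef PROBABEL_SKETCH_HAVE_EIGEN
#  define PROBABEL_SKETCH_HAVE_EIGEN 0
// An EIGEN-only build could turn this into a hard error instead:
// #error "Eigen headers not found; pass the Eigen include path to the compiler"
#endif

constexpr bool eigen_headers_available() {
    return PROBABEL_SKETCH_HAVE_EIGEN == 1;
}
```

configure.ac would remain the place to report a missing EIGEN early; a guard like this only makes a compile failure explain itself.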

I see the following options:
a) include a copy of the EIGEN source code in the ProbABEL code base (in
SVN)
b) include a copy of the EIGEN source code in the official released
ProbABEL tar.gz.
c) don't include the EIGEN source code, but provide very clear
instructions on how to obtain EIGEN.
d) include a script that downloads and extracts the latest EIGEN and
mention that script in the installation instructions.
e) Automatic download and extraction of the EIGEN source code during the
./configure (or make) process of ProbABEL.

More details about these options:
a):
 - Licence-wise this seems possible as EIGEN is released under the MPL2.
But Q14 of http://www.mozilla.org/MPL/2.0/FAQ.html doesn't immediately
make clear to me what the requirements/repercussions are. More thorough
reading of the licence is probably required.
 - ProbABEL contains both GPL and LGPL licensed files (a complete
overview had to be made for the Debian package and can be found at [1]),
so I'm not overly happy to add yet another type of licence.
 - simple for the user; everything is included and compiles cleanly.
 - developers don't need to keep up with updates of EIGEN, so there is no
risk of incompatibility; we can keep the current EIGEN code in there forever
(as was done with parts of the code from the R survival package)
 - However, with a copy of the EIGEN code in SVN we don't benefit from
bug fixes and improvements in EIGEN.

b):
 - The same licence issues as in a) apply
 - simple for the user
 - developers will need to keep up with new EIGEN releases, but we
benefit from their improvements and bug fixes (unless we always
distribute EIGEN version 3.2.1, the current version).

c):
 - This is what we currently do. This allows
users/administrators/packagers to use EIGEN either by downloading and
extracting it themselves or by using OS-provided packages. Maybe we can
improve the documentation to make it even easier.
 - This requires more 'investment' from the user: they need to carefully
read the installation instructions AND download and extract EIGEN AND
add the path with extracted code to the ./configure
--with-eigen-include-path=/your/path/to/eigen option.

d):
 - This would be easy to do, but would require the user to have wget or
curl installed (are these available for all architectures?). Does that
make things better? The good thing is we can fix the extraction
directory so the ./configure --with-eigen-include option can be preset.
 - No hassle with licences
 - users/developers/packagers who want to use an OS-provided EIGEN
package can do so

e):
 - simple for the user
 - no hassle with licences
 - same dependency on wget or curl as d)
 - I'm not sure how to do that in configure.ac, but I think it can be done.
 - unless we add a --dont-download-eigen option to configure.ac,
users/developers/packagers who want to use OS-provided EIGEN packages
won't be happy.
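Options c) to e) all come down to the same question: which directory to pass to --with-eigen-include-path. A C++17 sketch of such a search, assuming a fixed extraction directory named `eigen` (as a download script under option d) could create) followed by common OS package locations such as /usr/include/eigen3, where Debian/Ubuntu install the headers. The directory list and function names are illustrative assumptions; a real check would live in configure.ac rather than in C++.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Return the first directory in `candidates` that contains Eigen/Core,
// i.e. a value suitable for --with-eigen-include-path.
std::optional<fs::path> find_eigen_include_dir(const std::vector<fs::path>& candidates) {
    for (const fs::path& dir : candidates) {
        std::error_code ec;  // unreadable or missing paths count as "not found"
        if (fs::exists(dir / "Eigen" / "Core", ec)) {
            return dir;
        }
    }
    return std::nullopt;
}

// Hypothetical search order: a preset extraction directory first,
// then typical OS package locations.
std::vector<fs::path> default_eigen_candidates() {
    return {"eigen", "/usr/include/eigen3", "/usr/local/include/eigen3"};
}
```

Checking the preset directory first and falling back to OS locations would keep option d) usable for packagers who prefer an OS-provided EIGEN.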


[0] http://www.boost.org/
[1] http://sources.debian.net/src/probabel/0.4.3-1/debian/copyright


From yurii.aulchenko at gmail.com  Sat Apr 19 11:44:53 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Sat, 19 Apr 2014 11:44:53 +0200
Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from
	ProbABEL
In-Reply-To: <53513831.3040506@karssen.org>
References: <53513831.3040506@karssen.org>
Message-ID: <1564471774009216793@unknownmsgid>

I am for going the EIGEN way (only). In general, the more we use standard
libraries, the better; the only concern here is how difficult it is
for users to do a proper installation. In the case of EIGEN this does not seem
to be a big problem. (But mind the experience we had with GSL and
MixABEL: it turned out to be non-installable for many users.)

----------------------
Yurii Aulchenko
(sent from mobile device)


From kooyman at gmail.com  Mon Apr 21 20:18:02 2014
From: kooyman at gmail.com (Maarten Kooyman)
Date: Mon, 21 Apr 2014 20:18:02 +0200
Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from
	ProbABEL
In-Reply-To: <53513831.3040506@karssen.org>
References: <53513831.3040506@karssen.org>
Message-ID: <535560DA.5060008@gmail.com>


On 18-04-14 16:35, L.C. Karssen wrote:
> a) include a copy of the EIGEN source code in the ProbABEL code base (in
> SVN)
I strongly oppose this option: we do not want to maintain this code,
and what would it be doing in our SVN?
> b) include a copy of the EIGEN source code in the official released
> ProbABEL tar.gz.
This seems to me the most foolproof way to distribute ProbABEL as
source code: you also control the versions of the dependencies, which is
handy compared to running into old versions of libraries. Those sometimes
result in faulty binaries or setups that don't compile. Licence-wise it
looks all right to me (however, I am not an OSS lawyer). The EIGEN source
files as provided on the website are about a megabyte: this should not be a
problem for distribution. If you look at the Boost library, licence-wise
it also seems fine; however, the download provided on their site is 60
megabytes: quite a download! We have to trim down this size one way or
another.

> c) don't include the EIGEN source code, but provide very clear
> instructions on how to obtain EIGEN.
People often don't read manuals. This also makes it harder for
inexperienced computer users and raises the bar for usage.

> d) include a script that downloads and extracts the latest EIGEN and
> mention that script in the installation instructions.

> e) Automatic download and extraction of the EIGEN source code during the
> ./configure (or make) process of ProbABEL.
Sounds nice, but right now I have problems downloading EIGEN from their
server. Maybe we should host the software ourselves. This still causes
size problems for downloading Boost. As a workflow, option e) is easier
than option d). However, this downloading can be error-prone since you are not
sure wget/curl is installed on the user's system. (It also needs a direct
internet connection, which is not always the case on some servers.)


Why don't we provide a statically linked executable? We have Jenkins in place
to perform the builds.


Kind regards,

Maarten


From lennart at karssen.org  Wed Apr 23 08:33:29 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 23 Apr 2014 08:33:29 +0200
Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from
	ProbABEL
In-Reply-To: <535560DA.5060008@gmail.com>
References: <53513831.3040506@karssen.org> <535560DA.5060008@gmail.com>
Message-ID: <53575EB9.5080800@karssen.org>

Great, it seems that we have a go-ahead for switching to an EIGEN-only
ProbABEL. I'll start with the removal of the relevant options in
configure.ac.

On 21-04-14 20:18, Maarten Kooyman wrote:
> 
> 
> On 18-04-14 16:35, L.C. Karssen wrote:
>> a) include a copy of the EIGEN source code in the ProbABEL code base (in
>> SVN)
> I strongly oppose this option: we do not want to maintain this code,
> and what would it be doing in our SVN?

I completely agree.

>> b) include a copy of the EIGEN source code in the official released
>> ProbABEL tar.gz.
> This seems to me the most foolproof way to distribute ProbABEL as
> source code: you also control the versions of the dependencies, which is
> handy compared to running into old versions of libraries. Those sometimes
> result in faulty binaries or setups that don't compile. Licence-wise it
> looks all right to me (however, I am not an OSS lawyer).

I still want to check that in more detail. I'll post my conclusions in
this thread.

> The EIGEN source files
> as provided on the website are about a megabyte: this should not be a
> problem for distribution. 

Indeed. Size is not an issue for EIGEN. For Boost you already noticed
the problem below.

> If you look at the Boost library, licence-wise
> it also seems fine,

I agree.

> however, the download provided on their site is 60
> megabytes: quite a download! We have to trim down this size one way or
> another.

Yes, that's my point. That's one more reason why I am not too happy with
the 'distribute with ProbABEL' options. On the other hand, we may decide
to have a different policy for EIGEN than for Boost.


> 
>> c) don't include the EIGEN source code, but provide very clear
>> instructions on how to obtain EIGEN.
> People often don't read manuals. This also makes it harder for
> inexperienced computer users and raises the bar for usage.

True, people don't read. On the other hand, how many 'inexperienced'
users do we have? Probably quite a few, but I'm quite sure they don't
know about the ./configure; make; make install steps either. Moreover,
in order to install it themselves (without root privileges), they need
to know about the ./configure --prefix option, which is also in the
documentation.
So, all in all, I'm not so sure 'inexperienced' users will be able to
successfully compile and install ProbABEL without at least some reading.
How about adding (a copy of) the necessary steps to the ProbABEL website
as well? That way users will find them when looking for the source.

> 
>> d) include a script that downloads and extracts the latest EIGEN and
>> mention that script in the installation instructions.
> 
>> e) Automatic download and extraction of the EIGEN source code during the
>> ./configure (or make) process of ProbABEL.
> Sounds nice, but right now I have problems downloading EIGEN from their
> server. Maybe we should host the software ourselves. 

Hmm, doesn't that (somewhat) contradict what you wrote under a) about
not wanting to host the code ourselves?

> This still causes
> size problems for downloading Boost. As a workflow, option e) is easier
> than option d). However, this downloading can be error-prone since you are not
> sure wget/curl is installed on the user's system. (It also needs a direct
> internet connection, which is not always the case on some servers.)

Yup.

> 
> 
> Why don't we provide a statically linked executable? We have Jenkins in place
> to perform the builds.
> 

That's a good suggestion (or actually two). We can certainly do that. I
should also try to get download statistics for source packages (and
later the statically linked binaries) from the web server. That will
help us to get a better idea of which is used.


Thanks for your input! Hoping to see input from others as well on this
matter.


Best,

Lennart.

> 
> Kind regards,
> 
> Maarten
> 


From lennart at karssen.org  Wed Apr 23 18:03:53 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 23 Apr 2014 18:03:53 +0200
Subject: [GenABEL-dev] ProbABEL v0.4.3 for Ubuntu 14.04 LTS uploaded to PPA
Message-ID: <5357E469.9020704@karssen.org>

Dear list,

This is to inform you that I have just uploaded the ProbABEL v0.4.3
packages for Ubuntu 14.04 LTS (which was released on April 17th) to the
GenABEL PPA [1]. The packages have been built successfully and can be
installed now.

Those who upgraded an older Ubuntu installation to 14.04 will need to
re-enable the PPA to receive updates.
Please post any questions related to installation from the PPA on our
forum [2].


Best regards,

Lennart.


[1] https://launchpad.net/~l.c.karssen/+archive/genabel-ppa
[2] http://forum.genabel.org/

From lennart at karssen.org  Thu Apr 24 11:45:28 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Thu, 24 Apr 2014 11:45:28 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1695 - pkg/ProbABEL/src
In-Reply-To: <20140423185426.CBEEB18736F@r-forge.r-project.org>
References: <20140423185426.CBEEB18736F@r-forge.r-project.org>
Message-ID: <5358DD38.2070302@karssen.org>

Wow, that's a lot of code removal!

Glad to see this clean-up. Thanks Maarten! Now we can slowly start
thinking of using Eigen directly instead of going through the
eigen_mematrix "wrapper". However, I don't think this has a high
priority now. Moreover, it is quite a large task, so it should definitely
be done in a branch and extensively tested.
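Judging from the coxph_data.cpp hunk in the commit, the `.data.data()` calls come from this wrapper layering: the mematrix member `data` is itself an Eigen matrix, and Eigen's `data()` returns a pointer to its contiguous coefficient storage, which is what a C routine like coxfit2 needs. A simplified sketch of that layering, with a std::vector standing in for the Eigen matrix so it runs without Eigen; `mematrix_sketch` and `sum_raw` are hypothetical names, not ProbABEL code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style routine standing in for coxfit2(): it only accepts
// raw pointers plus sizes, not C++ container types.
double sum_raw(const double* x, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

// Hypothetical, heavily simplified stand-in for the mematrix wrapper. The
// public member `data` is the storage object itself (an Eigen matrix in
// ProbABEL, a std::vector here), so the raw buffer is X.data.data().
template <class DT>
struct mematrix_sketch {
    int nrow;
    int ncol;
    std::vector<DT> data;  // column-major, like Eigen's default layout

    mematrix_sketch(int nr, int nc)
        : nrow(nr), ncol(nc), data(checked_size(nr, nc)) {}

    DT& operator()(int r, int c) {
        return data[static_cast<std::size_t>(c) * nrow + r];
    }

 private:
    // Mirrors the nr <= 0 / nc <= 0 checks of the old mematrix constructor.
    static std::size_t checked_size(int nr, int nc) {
        assert(nr > 0 && nc > 0);
        return static_cast<std::size_t>(nr) * static_cast<std::size_t>(nc);
    }
};
```

Using Eigen directly would remove the middle layer: call sites would then pass `X.data()` straight to the C routine.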


Lennart.

On 23-04-14 20:54, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-23 20:54:26 +0200 (Wed, 23 Apr 2014)
> New Revision: 1695
> 
> Removed:
>    pkg/ProbABEL/src/mematri1.h
>    pkg/ProbABEL/src/mematrix.h
> Modified:
>    pkg/ProbABEL/src/cholesky.cpp
>    pkg/ProbABEL/src/cholesky.h
>    pkg/ProbABEL/src/command_line_settings.cpp
>    pkg/ProbABEL/src/coxph_data.cpp
>    pkg/ProbABEL/src/coxph_data.h
>    pkg/ProbABEL/src/data.cpp
>    pkg/ProbABEL/src/gendata.cpp
>    pkg/ProbABEL/src/gendata.h
>    pkg/ProbABEL/src/main.cpp
>    pkg/ProbABEL/src/maskedmatrix.cpp
>    pkg/ProbABEL/src/maskedmatrix.h
>    pkg/ProbABEL/src/phedata.h
>    pkg/ProbABEL/src/reg1.cpp
>    pkg/ProbABEL/src/regdata.h
>    pkg/ProbABEL/src/testchol.cpp
>    pkg/ProbABEL/src/usage.cpp
> Log:
> Removed mematri1.h- and mematrix.h-specific code. This removes about 797 lines of code.
> 
> Modified: pkg/ProbABEL/src/cholesky.cpp
> ===================================================================
> --- pkg/ProbABEL/src/cholesky.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/cholesky.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -9,13 +9,8 @@
>  #include <cstdio>
>  #include <cstdlib>
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  
>  
>  /*  SCCS @(#)cholesky2.c    5.2 10/27/98
> 
> Modified: pkg/ProbABEL/src/cholesky.h
> ===================================================================
> --- pkg/ProbABEL/src/cholesky.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/cholesky.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -8,12 +8,8 @@
>  #ifndef CHOLESKY_H_
>  #define CHOLESKY_H_
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#endif
>  
>  int cholesky2_mm(mematrix<double> &matrix, double toler);
>  void chinv2_mm(mematrix<double> &matrix);
> 
> Modified: pkg/ProbABEL/src/command_line_settings.cpp
> ===================================================================
> --- pkg/ProbABEL/src/command_line_settings.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/command_line_settings.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -31,9 +31,7 @@
>  #include <iostream>
>  #include "usage.h"
>  #include "command_line_settings.h"
> -#if EIGEN
>  #include "eigen_mematrix.h"
> -#endif
>  
>  // config.h and fvlib/FileVector.h are included for the upper case variables
>  #if HAVE_CONFIG_H
> 
> Modified: pkg/ProbABEL/src/coxph_data.cpp
> ===================================================================
> --- pkg/ProbABEL/src/coxph_data.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/coxph_data.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -405,22 +405,14 @@
>  
>      // When using Eigen coxfit2 needs to be called in a slightly
>      // different way (i.e. the .data()-part needs to be added).
> -#if EIGEN
>      coxfit2(&maxiter, &cdata.nids, &X.nrow, cdata.stime.data.data(),
>              cdata.sstat.data.data(), X.data.data(), newoffset.data.data(),
>              cdata.weights.data.data(), cdata.strata.data.data(),
>              means.data.data(), beta.data.data(), u.data.data(),
>              imat.data.data(), loglik_int, &flag, work, &eps, &tol_chol,
>              &sctest);
> -#else
> -    coxfit2(&maxiter, &cdata.nids, &X.nrow, cdata.stime.data,
> -            cdata.sstat.data, X.data, newoffset.data,
> -            cdata.weights.data, cdata.strata.data,
> -            means.data, beta.data, u.data,
> -            imat.data, loglik_int, &flag, work, &eps, &tol_chol,
> -            &sctest);
> -#endif
>  
> +
>      niter = maxiter;
>  
>      // Check the results of the Cox fit; mirrored from the same checks
> @@ -449,7 +441,6 @@
>               << " setting beta and se to 'nan'\n";
>          setToZero = true;
>      } else {
> -#if EIGEN
>          VectorXd ueigen = u.data;
>          MatrixXd imateigen = imat.data;
>          VectorXd infs = ueigen.transpose() * imateigen;
> @@ -463,12 +454,7 @@
>  
>              setToZero = true;
>          }
> -#else
> -        cerr << "Warning for " << snpinfo.name[cursnp]
> -             << ": can't check for infinite betas."
> -             << " Please compile ProbABEL with Eigen support to fix this."
> -             << endl;
> -#endif
> +
>      }
>  
>      for (int i = 0; i < X.nrow; i++)
> 
> Modified: pkg/ProbABEL/src/coxph_data.h
> ===================================================================
> --- pkg/ProbABEL/src/coxph_data.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/coxph_data.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -29,13 +29,8 @@
>  #ifndef COXPH_DATA_H_
>  #define COXPH_DATA_H_
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  
>  #include "data.h"
>  #include "reg1.h"
> 
> Modified: pkg/ProbABEL/src/data.cpp
> ===================================================================
> --- pkg/ProbABEL/src/data.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/data.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -38,13 +38,8 @@
>  #include "gendata.h"
>  #include "data.h"
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  #include "utilities.h"
>  
>  
> 
> Modified: pkg/ProbABEL/src/gendata.cpp
> ===================================================================
> --- pkg/ProbABEL/src/gendata.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/gendata.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -31,13 +31,8 @@
>  #include <limits>
>  #include "gendata.h"
>  #include "fvlib/FileVector.h"
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  #include "utilities.h"
>  
>  
> 
> Modified: pkg/ProbABEL/src/gendata.h
> ===================================================================
> --- pkg/ProbABEL/src/gendata.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/gendata.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -31,13 +31,10 @@
>  #include <string>
>  #include "fvlib/FileVector.h"
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#endif
>  
> +
>  class gendata {
>   public:
>      unsigned int nsnps;
> 
> Modified: pkg/ProbABEL/src/main.cpp
> ===================================================================
> --- pkg/ProbABEL/src/main.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/main.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -67,15 +67,8 @@
>  
>  #include <ctime> //needed for timing loading non file vector format
>  
> -
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
> -
>  #include "maskedmatrix.h"
>  #include "data.h"
>  #include "reg1.h"
> 
> Modified: pkg/ProbABEL/src/maskedmatrix.cpp
> ===================================================================
> --- pkg/ProbABEL/src/maskedmatrix.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/maskedmatrix.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -30,13 +30,8 @@
>  
>  #include <algorithm>
>  #include "maskedmatrix.h"
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  
>  masked_matrix::masked_matrix()
>  {
> 
> Modified: pkg/ProbABEL/src/maskedmatrix.h
> ===================================================================
> --- pkg/ProbABEL/src/maskedmatrix.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/maskedmatrix.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -29,13 +29,8 @@
>  #ifndef MASKEDMATRIX_H_
>  #define MASKEDMATRIX_H_
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  
>  class masked_matrix {
>   public:
> 
> Deleted: pkg/ProbABEL/src/mematri1.h
> ===================================================================
> --- pkg/ProbABEL/src/mematri1.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/mematri1.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -1,636 +0,0 @@
> -/*
> - *
> - * Copyright (C) 2009--2014 Various members of the GenABEL team. See
> - * the SVN commit logs for more details.
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> - * as published by the Free Software Foundation; either version 2
> - * of the License, or (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
> - * MA  02110-1301, USA.
> - *
> - */
> -
> -
> -#ifndef MEMATRI1_H
> -#define MEMATRI1_H
> -
> -#include <cstdlib>
> -#include <string>
> -#include <cstdarg>
> -#include <cstdio>
> -
> -//
> -// constructors
> -//
> -
> -template<class DT>
> -mematrix<DT>::mematrix(int nr, int nc)
> -{
> -    if (nr <= 0)
> -    {
> -        fprintf(stderr, "mematrix(): nr <= 0\n");
> -        exit(1);
> -    }
> -    if (nc <= 0)
> -    {
> -        fprintf(stderr, "mematrix(): nc <= 0\n");
> -        exit(1);
> -    }
> -    nrow = nr;
> -    ncol = nc;
> -    nelements = nr * nc;
> -    data = new (nothrow) DT[ncol * nrow];
> -    if (!data)
> -    {
> -        fprintf(stderr, "mematrix(nr,nc): cannot allocate memory (%d,%d)\n",
> -                nrow, ncol);
> -        exit(1);
> -    }
> -}
> -
> -template<class DT>
> -mematrix<DT>::mematrix(const mematrix<DT> & M)
> -{
> -    ncol = M.ncol;
> -    nrow = M.nrow;
> -    nelements = M.nelements;
> -    data = new (nothrow) DT[M.ncol * M.nrow];
> -    if (!data)
> -    {
> -        fprintf(stderr,
> -                "mematrix const(mematrix): cannot allocate memory (%d,%d)\n",
> -                M.nrow, M.ncol);
> -        exit(1);
> -    }
> -    //	std::cerr << "mematrix const(mematrix): can allocate memory ("
> -    //            << M.nrow << "," << M.ncol << ")\n";
> -    for (int i = 0; i < M.ncol * M.nrow; i++)
> -        data[i] = M.data[i];
> -}
> -
> -//
> -// operators
> -//
> -template<class DT>
> -mematrix<DT> &mematrix<DT>::operator=(const mematrix<DT> &M)
> -{
> -    if (this != &M)
> -    {
> -        if (data != NULL)
> -            delete[] data;
> -        data = new (nothrow) DT[M.ncol * M.nrow];
> -        if (!data)
> -        {
> -            fprintf(stderr, "mematrix=: cannot allocate memory (%d,%d)\n",
> -                    M.nrow, M.ncol);
> -            delete[] data;
> -            exit(1);
> -        }
> -        ncol = M.ncol;
> -        nrow = M.nrow;
> -        nelements = M.nelements;
> -        for (int i = 0; i < M.ncol * M.nrow; i++)
> -        {
> -            data[i] = M.data[i];
> -        }
> -    }
> -    return *this;
> -}
> -
> -template<class DT>
> -DT &mematrix<DT>::operator[](int i)
> -{
> -    if (i < 0 || i >= (ncol * nrow))
> -    {
> -        fprintf(stderr, "mematrix[]: %d out of bounds (0,%d)\n", i,
> -                nrow * ncol - 1);
> -        exit(1);
> -    }
> -    return data[i];
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator+(DT toadd)
> -{
> -    mematrix<DT> temp(nrow, ncol);
> -    for (int i = 0; i < nelements; i++)
> -        temp.data[i] = data[i] + toadd;
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator+(mematrix<DT> &M)
> -{
> -    if (ncol != M.ncol || nrow != M.nrow)
> -    {
> -        fprintf(stderr,
> -                "mematrix+: matrices not equal in size (%d,%d) and (%d,%d)",
> -                nrow, ncol, M.nrow, M.ncol);
> -        exit(1);
> -    }
> -    mematrix<DT> temp(nrow, ncol);
> -    for (int i = 0; i < nelements; i++)
> -        temp.data[i] = data[i] + M.data[i];
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator-(DT toadd)
> -{
> -    mematrix<DT> temp(nrow, ncol);
> -    for (int i = 0; i < nelements; i++)
> -        temp.data[i] = data[i] - toadd;
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator-(mematrix<DT> &M)
> -{
> -    if (ncol != M.ncol || nrow != M.nrow)
> -    {
> -        fprintf(stderr,
> -                "mematrix-: matrices not equal in size (%d,%d) and (%d,%d)",
> -                nrow, ncol, M.nrow, M.ncol);
> -        exit(1);
> -    }
> -    mematrix<DT> temp(nrow, ncol);
> -    for (int i = 0; i < nelements; i++)
> -        temp.data[i] = data[i] - M.data[i];
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator*(DT toadd)
> -{
> -    // What about std::string instead of DT? Maksim.
> -    mematrix<DT> temp(nrow, ncol);
> -    for (int i = 0; i < nelements; i++)
> -        temp.data[i] = data[i] * toadd;
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator*(mematrix<DT> &M)
> -{
> -    if (ncol != M.nrow)
> -    {
> -        fprintf(stderr, "mematrix*: ncol != nrow (%d,%d) and (%d,%d)", nrow,
> -                ncol, M.nrow, M.ncol);
> -        exit(1);
> -    }
> -    mematrix<DT> temp(nrow, M.ncol);
> -    for (int j = 0; j < temp.nrow; j++)
> -    {
> -        for (int i = 0; i < temp.ncol; i++)
> -        {
> -            DT sum = 0;
> -            for (int j1 = 0; j1 < ncol; j1++)
> -                sum += data[j * ncol + j1] * M.data[j1 * M.ncol + i];
> -            temp[j * temp.ncol + i] = sum;
> -        }
> -    }
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> mematrix<DT>::operator*(mematrix<DT> *M)
> -{
> -    if (ncol != M->nrow)
> -    {
> -        fprintf(stderr, "mematrix*: ncol != nrow (%d,%d) and (%d,%d)", nrow,
> -                ncol, M->nrow, M->ncol);
> -        exit(1);
> -    }
> -    mematrix<DT> temp(nrow, M->ncol);
> -    for (int j = 0; j < temp.nrow; j++)
> -    {
> -        for (int i = 0; i < temp.ncol; i++)
> -        {
> -            DT sum = 0;
> -            for (int j1 = 0; j1 < ncol; j1++)
> -                sum += data[j * ncol + j1] * M->data[j1 * M->ncol + i];
> -            temp[j * temp.ncol + i] = sum;
> -        }
> -    }
> -    return temp;
> -}
> -
> -//
> -// operations
> -//
> -template<class DT>
> -void mematrix<DT>::reinit(int nr, int nc)
> -{
> -    if (nelements > 0)
> -        delete[] data;
> -    if (nr <= 0)
> -    {
> -        fprintf(stderr, "mematrix(): number of rows smaller then 1\n");
> -        exit(1);
> -    }
> -    if (nc <= 0)
> -    {
> -        fprintf(stderr, "mematrix(): number of columns smaller then 1\n");
> -        exit(1);
> -    }
> -    nrow = nr;
> -    ncol = nc;
> -    nelements = nr * nc;
> -    data = new (nothrow) DT[ncol * nrow];
> -    if (!data)
> -    {
> -        fprintf(stderr, "mematrix(nr,nc): cannot allocate memory (%d,%d)\n",
> -                nrow, ncol);
> -        exit(1);
> -    }
> -}
> -
> -template<class DT>
> -DT mematrix<DT>::get(int nr, int nc)
> -{
> -    if (nc < 0 || nc > ncol -1)
> -    {
> -        std::cerr << "mematrix::get: column out of range: " << nc + 1
> -                  << " not between (1," << ncol << ")\n" << std::flush;
> -        exit(1);
> -    }
> -    if (nr < 0 || nr > nrow -1)
> -    {
> -        std::cerr << "mematrix::get: row out of range: " << nr + 1
> -                  << " not between (1," << nrow << ")\n" << std::flush;
> -        exit(1);
> -    }
> -    DT temp = data[nr * ncol + nc];
> -    return temp;
> -}
> -
> -template<class DT>
> -void mematrix<DT>::put(DT value, int nr, int nc)
> -{
> -    if (nc < 0 || nc > ncol -1)
> -    {
> -        std::cerr << "mematrix::put: column out of range: " << nc + 1
> -                  << " not between (1," << ncol << ")\n" << std::flush;
> -        exit(1);
> -    }
> -    if (nr < 0 || nr > nrow -1)
> -    {
> -        std::cerr << "mematrix::put: row out of range: " << nr + 1
> -                  << " not between (1," << nrow << ")\n" << std::flush;
> -        exit(1);
> -    }
> -    data[nr * ncol + nc] = value;
> -}
> -
> -template<class DT>
> -DT mematrix<DT>::column_mean(int nc)
> -{
> -    if (nc >= ncol || nc < 0)
> -    {
> -        fprintf(stderr, "colmM bad column\n");
> -        exit(1);
> -    }
> -    DT out = 0.0;
> -    for (int i = 0; i < nrow; i++)
> -        out += DT(data[i * ncol + nc]);
> -    out /= DT(nrow);
> -    return out;
> -}
> -
> -template<class DT>
> -void mematrix<DT>::print(void)
> -{
> -    cout << "nrow=" << nrow << "; ncol=" << ncol << "; nelements=" << nelements
> -            << "\n";
> -    for (int i = 0; i < nrow; i++)
> -    {
> -        cout << "nr=" << i << ":\t";
> -        for (int j = 0; j < ncol; j++)
> -        {
> -            printf("%e\t", data[i * ncol + j]);
> -        }
> -        cout << "\n";
> -    }
> -}
> -
> -template<class DT>
> -void mematrix<DT>::delete_column(const int delcol)
> -{
> -    if (delcol > ncol || delcol < 0)
> -    {
> -        fprintf(stderr, "mematrix::delete_column: column out of range\n");
> -        exit(1);
> -    }
> -    mematrix<DT> temp = *this;
> -    if (nelements > 0)
> -        delete[] data;
> -    ncol--;
> -    nelements = ncol * nrow;
> -    data = new (nothrow) DT[ncol * nrow];
> -    if (!data)
> -    {
> -        fprintf(stderr,
> -                "mematrix::delete_column: cannot allocate memory (%d,%d)\n",
> -                nrow, ncol);
> -        delete[] data;
> -        exit(1);
> -    }
> -    int newcol = 0;
> -    for (int nr = 0; nr < temp.nrow; nr++)
> -    {
> -        newcol = 0;
> -        for (int nc = 0; nc < temp.ncol; nc++)
> -            if (nc != delcol)
> -                data[nr * ncol + (newcol++)] = temp[nr * temp.ncol + nc];
> -    }
> -}
> -
> -template<class DT>
> -void mematrix<DT>::delete_row(const int delrow)
> -{
> -    if (delrow > nrow || delrow < 0)
> -    {
> -        fprintf(stderr, "mematrix::delete_row: row out of range\n");
> -        exit(1);
> -    }
> -    mematrix<DT> temp = *this;
> -    if (nelements > 0)
> -        delete[] data;
> -    nrow--;
> -    nelements = ncol * nrow;
> -    data = new (nothrow) DT[ncol * nrow];
> -    if (!data)
> -    {
> -        fprintf(stderr,
> -                "mematrix::delete_row: cannot allocate memory (%d,%d)\n", nrow,
> -                ncol);
> -        delete[] data;
> -        exit(1);
> -    }
> -    int newrow = 0;
> -    for (int nc = 0; nc < temp.ncol; nc++)
> -    {
> -        newrow = 0;
> -        for (int nr = 0; nr < temp.nrow; nr++)
> -            if (nr != delrow)
> -                data[nr * ncol + (newrow++)] = temp[nr * temp.ncol + nc];
> -    }
> -}
> -
> -//
> -// other functions
> -//
> -template<class DT>
> -mematrix<DT> column_sum(mematrix<DT> &M)
> -{
> -    mematrix<DT> out;
> -    out.reinit(1, M.ncol);
> -    for (int j = 0; j < M.ncol; j++)
> -    {
> -        DT sum = 0;
> -        for (int i = 0; i < M.nrow; i++)
> -            sum = sum + DT(M.data[i * M.ncol + j]);
> -        out.put(sum, 0, j);
> -    }
> -    return out;
> -}
> -
> -template<class DT>
> -mematrix<DT> column_mean(mematrix<DT> &M)
> -{
> -    mematrix<DT> out;
> -    out.reinit(1, M.ncol);
> -    for (int j = 0; j < M.ncol; j++)
> -    {
> -        DT sum = 0;
> -        for (int i = 0; i < M.nrow; i++)
> -            sum = sum + DT(M.data[i * M.ncol + j]);
> -        sum /= DT(M.nrow);
> -        out.put(sum, 0, j);
> -    }
> -    return out;
> -}
> -
> -template<class DT>
> -mematrix<DT> transpose(mematrix<DT> &M)
> -{
> -    mematrix<DT> temp(M.ncol, M.nrow);
> -    for (int i = 0; i < temp.nrow; i++)
> -        for (int j = 0; j < temp.ncol; j++)
> -            temp.data[i * temp.ncol + j] = M.data[j * M.ncol + i];
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> reorder(mematrix<DT> &M, mematrix<int> order)
> -{
> -    if (M.nrow != order.nrow)
> -    {
> -        std::cerr << "reorder: M & order have different # of rows\n";
> -        exit(1);
> -    }
> -    mematrix<DT> temp(M.nrow, M.ncol);
> -    for (int i = 0; i < temp.nrow; i++)
> -        for (int j = 0; j < temp.ncol; j++)
> -            temp.data[order[i] * temp.ncol + j] = M.data[i * M.ncol + j];
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> productMatrDiag(mematrix<DT> &M, mematrix<DT> &D)
> -{
> -    //multiply all rows of M by value of first row of D
> -    if (M.ncol != D.nrow)
> -    {
> -        fprintf(stderr, "productMatrDiag: wrong dimenstions");
> -        exit(1);
> -    }
> -    mematrix<DT> temp(M.nrow, M.ncol);
> -    for (int i = 0; i < temp.nrow; i++){
> -        for (int j = 0; j < temp.ncol; j++){
> -            temp.data[i * temp.ncol + j] = M.data[i * M.ncol + j] * D.data[j];
> -    //			temp.put(M.get(i,j)*D.get(j,0),i,j);
> -        }
> -    }
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<double> todouble(mematrix<DT> &M)
> -{
> -    mematrix<double> temp(M.nrow, M.ncol);
> -    for (int i = 0; i < temp.nelements; i++)
> -        temp.data[i] = double(M.data[i]);
> -    return temp;
> -}
> -
> -template<class DT>
> -mematrix<DT> productXbySymM(mematrix<DT> &X, mematrix<DT> &M)
> -{
> -    if (M.ncol < 1 || M.nrow < 1 || X.ncol < 1 || X.nrow < 1)
> -    {
> -        fprintf(stderr,
> -                "productXbySymM: M.ncol<1 || M.nrow<1 || X.ncol<1 || X.nrow < 1\n");
> -        exit(1);
> -    }
> -    if (M.ncol != M.nrow)
> -    {
> -        fprintf(stderr, "productXbySymM: M.ncol != M.nrow\n");
> -        exit(1);
> -    }
> -    if (M.ncol != X.ncol)
> -    {
> -        fprintf(stderr, "productXbySymM: M.ncol != X.ncol\n");
> -        exit(1);
> -    }
> -    if (M.ncol != X.ncol)
> -    {
> -        fprintf(stderr, "productXbySymM: M.ncol != X.ncol\n");
> -        exit(1);
> -    }
> -
> -    mematrix<DT> out(X.nrow, X.ncol);
> -    int i, j, k;
> -
> -    double temp1, temp2, value1, value2; // not good should be of <DT>!
> -    for (k = 0; k < X.nrow; k++)
> -    {
> -        temp1 = 0.;
> -        for (i = 0; i < X.ncol; i++)
> -        {
> -            temp1 = X.get(k, i);
> -            temp2 = 0.;
> -            for (j = (i + 1); j < X.ncol; j++)
> -            {
> -                value1 = out.get(k, j) + temp1 * M.get(i, j);
> -                out.put(value1, k, j);
> -                temp2 += M.get(i, j) * X.get(k, j);
> -            }
> -            value2 = out.get(k, i) + temp2 + M.get(i, i) * X.get(k, i);
> -            out.put(value2, k, i);
> -        }
> -    }
> -
> -    return out;
> -}
> -
> -// written by Mike Dinolfo 12/98
> -// modified Yurii Aulchenko 2008-04-22
> -template<class DT>
> -mematrix<DT> invert(mematrix<DT> &M)
> -{
> -    if (M.ncol != M.nrow)
> -    {
> -        fprintf(stderr, "invert: only square matrices possible\n");
> -        exit(1);
> -    }
> -    if (M.ncol == 1)
> -    {
> -        mematrix<DT> temp(1, 1);
> -        temp[0] = 1. / M[0];
> -    }
> -    /*
> -     for (int i=0;i<M.ncol;i++)
> -     if (M.data[i*M.ncol+i]==0)
> -     {
> -     fprintf(stderr,"invert: zero elements in diagonal\n");
> -     mematrix<DT> temp = M;
> -     for (int i = 0; i < M.ncol; i++)
> -     for (int j = 0; j < M.ncol; j++)
> -     temp.put(NAN,i,j);
> -     return temp;
> -     //exit(1);
> -     }
> -     */
> -    int actualsize = M.ncol;
> -    int maxsize = M.ncol;
> -    mematrix<DT> temp = M;
> -    for (int i = 1; i < actualsize; i++)
> -        temp.data[i] /= temp.data[0]; // normalize row 0
> -    for (int i = 1; i < actualsize; i++)
> -    {
> -        for (int j = i; j < actualsize; j++)
> -        { // do a column of L
> -            DT sum = 0.0;
> -            for (int k = 0; k < i; k++)
> -                sum += temp.data[j * maxsize + k] * temp.data[k * maxsize + i];
> -            temp.data[j * maxsize + i] -= sum;
> -        }
> -        if (i == actualsize - 1)
> -            continue;
> -        for (int j = i + 1; j < actualsize; j++)
> -        { // do a row of U
> -            DT sum = 0.0;
> -            for (int k = 0; k < i; k++)
> -                sum += temp.data[i * maxsize + k] * temp.data[k * maxsize + j];
> -            temp.data[i * maxsize + j] = (temp.data[i * maxsize + j] - sum)
> -                    / temp.data[i * maxsize + i];
> -        }
> -    }
> -    for (int i = 0; i < actualsize; i++) // invert L
> -        for (int j = i; j < actualsize; j++)
> -        {
> -            DT x = 1.0;
> -            if (i != j)
> -            {
> -                x = 0.0;
> -                for (int k = i; k < j; k++)
> -                    x -= temp.data[j * maxsize + k]
> -                            * temp.data[k * maxsize + i];
> -            }
> -            temp.data[j * maxsize + i] = x / temp.data[j * maxsize + j];
> -        }
> -    for (int i = 0; i < actualsize; i++) // invert U
> -        for (int j = i; j < actualsize; j++)
> -        {
> -            if (i == j)
> -                continue;
> -            DT sum = 0.0;
> -            for (int k = i; k < j; k++)
> -                sum += temp.data[k * maxsize + j]
> -                        * ((i == k) ? 1.0 : temp.data[i * maxsize + k]);
> -            temp.data[i * maxsize + j] = -sum;
> -        }
> -    for (int i = 0; i < actualsize; i++) // final inversion
> -        for (int j = 0; j < actualsize; j++)
> -        {
> -            DT sum = 0.0;
> -            for (int k = ((i > j) ? i : j); k < actualsize; k++)
> -                sum += ((j == k) ? 1.0 : temp.data[j * maxsize + k])
> -                        * temp.data[k * maxsize + i];
> -            temp.data[j * maxsize + i] = sum;
> -        }
> -    return temp;
> -}
> -
> -//_________Maksim____________
> -template<class DT>
> -DT var(mematrix<DT> &M)
> -{
> -    DT sum = 0;
> -    for (int i = 0; i < M.nelements; i++)
> -    {
> -        sum += M.data[i];
> -    }
> -    DT mean = sum / M.nelements;
> -
> -    DT sum2 = 0;
> -    for (int i = 0; i < M.nelements; i++)
> -    {
> -        sum2 += pow(M.data[i] - mean, 2);
> -    }
> -
> -    return sum2 / (M.nelements - 1);
> -}
> -//_________Maksim____________
> -#endif /* MEMATRI1_H */
> 
> Deleted: pkg/ProbABEL/src/mematrix.h
> ===================================================================
> --- pkg/ProbABEL/src/mematrix.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/mematrix.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -1,82 +0,0 @@
> -/*
> - *
> - * Copyright (C) 2009--2014 Various members of the GenABEL team. See
> - * the SVN commit logs for more details.
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> - * as published by the Free Software Foundation; either version 2
> - * of the License, or (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
> - * MA  02110-1301, USA.
> - *
> - */
> -
> -
> -#ifndef __MEMATRIX_H__
> -#define __MEMATRIX_H__
> -#include <iostream>
> -using namespace std;
> -
> -template<class DT> class mematrix
> -{
> - public:
> -    int nrow;
> -    int ncol;
> -    int nelements;
> -    DT * data;
> -
> -    mematrix()
> -    {
> -        nrow = ncol = nelements = 0;
> -        data = NULL;
> -    }
> -    mematrix(int nr, int nc);
> -    mematrix(const mematrix &M);
> -    ~mematrix()
> -    {
> -        if (nelements > 0)
> -            delete[] data;
> -    }
> -
> -    mematrix & operator=(const mematrix &M);
> -    DT & operator[](int i);
> -    mematrix operator+(DT toadd);
> -    mematrix operator+(mematrix &M);
> -    mematrix operator-(DT toadd);
> -    mematrix operator-(mematrix &M);
> -    mematrix operator*(DT toadd);
> -    mematrix operator*(mematrix &M);
> -    mematrix operator*(mematrix *M);
> -
> -    void reinit(int nr, int nc);
> -
> -    unsigned int getnrow(void)
> -    {
> -        return nrow;
> -    }
> -    unsigned int getncol(void)
> -    {
> -        return ncol;
> -    }
> -    DT get(int nr, int nc);
> -    void put(DT value, int nr, int nc);
> -    DT column_mean(int nc);
> -    void print(void);
> -    void delete_column(const int delcol);
> -    void delete_row(const int delrow);
> -
> -};
> -
> -//	mematrix transpose(mematrix M);
> -//	mematrix invert(mematrix M);
> -
> -#endif
> 
> Modified: pkg/ProbABEL/src/phedata.h
> ===================================================================
> --- pkg/ProbABEL/src/phedata.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/phedata.h	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -29,13 +29,8 @@
>  #ifndef PHEDATA_H_
>  #define PHEDATA_H_
>  
> -#if EIGEN
>  #include "eigen_mematrix.h"
>  #include "eigen_mematrix.cpp"
> -#else
> -#include "mematrix.h"
> -#include "mematri1.h"
> -#endif
>  
>  class phedata {
>   public:
> 
> Modified: pkg/ProbABEL/src/reg1.cpp
> ===================================================================
> --- pkg/ProbABEL/src/reg1.cpp	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/reg1.cpp	2014-04-23 18:54:26 UTC (rev 1695)
> @@ -310,7 +310,6 @@
>      chi2_score = chi2[0];
>  }
>  
> -
>  void linear_reg::mmscore_regression(const mematrix<double>& X,
>          const masked_matrix& W_masked, LDLT<MatrixXd>& Ch) {
>      VectorXd Y = reg_data.Y.data.col(0);
> @@ -329,7 +328,6 @@
>      beta.data = beta_vec;
>  }
>  
> -
>  void linear_reg::logLikelihood(const mematrix<double>& X) {
>      /*
>       loglik = 0.;
> @@ -348,10 +346,8 @@
>      //cout << endl;
>      loglik = 0.;
>      double halfrecsig2 = .5 / sigma2;
> -#if EIGEN
>      //loglik -= halfrecsig2 * residuals[i] * residuals[i];
>  
> -
>      double intercept = beta.get(0, 0);
>      residuals.data = reg_data.Y.data.array() - intercept;
>      //matrix.
> @@ -364,17 +360,7 @@
>      //residuals[i] -= resid_sub;
>      loglik -= (residuals.data.array().square() * halfrecsig2).sum();
>      loglik -= static_cast<double>(reg_data.nids) * log(sqrt(sigma2));
> -#else
> -    for (int i = 0; i < reg_data.nids; i++)
> -     {
> -         double resid = reg_data.Y[i] - beta.get(0, 0); // intercept
> -         for (int j = 1; j < beta.nrow; j++){
> -             resid -= beta.get(j, 0) * X.get(i, j);
> -         }
> -         residuals[i] = resid;
> -         loglik -= halfrecsig2 * resid * resid;
> -     }
> -#endif
> +
>  }
>  
>  
> @@ -423,12 +409,8 @@
>  
>      double sigma2_internal;
>  
> -#if EIGEN
>  
>      LDLT <MatrixXd> Ch;
> -#else
> -    mematrix<double> tXX_i;
> -#endif
>      if (invvarmatrixin.length_of_mask != 0)
>      {
>          //retrieve masked data W
> @@ -440,26 +422,7 @@
>          //flops=mp(2n-1) (when n is big enough flops=mpn2)
>          //Oct 26, 2009
>  
> -#if EIGEN
>          mmscore_regression(X, invvarmatrixin, Ch);
> -#else
> -        // next line is  5997000 flops
> -        mematrix<double> tXW = transpose(X) * invvarmatrixin.masked_data;
> -        tXX_i = tXW * X;        // 17991 flops
> -        // use cholesky to invert
> -        cholesky2_mm(tXX_i, tol_chol);
> -        chinv2_mm(tXX_i);
> -        beta = tXX_i * (tXW * reg_data.Y);        // flops 15+5997
> -        // now compute residual variance
> -        sigma2 = 0.;
> -        //next line is: 1000+5000+= 6000 flops
> -        mematrix<double> sigma2_matrix = reg_data.Y - (transpose(tXW) * beta);
> -        for (int i = 0; i < sigma2_matrix.nrow; i++)
> -        {
> -            double val = sigma2_matrix.get(i, 0);
> -            sigma2 += val * val; // flops: 3000 (iterations counted)
> -        }
> -#endif
>          double N = X.nrow;
>          //sigma2_internal = sigma2 / (N - static_cast<double>(length_beta));
>          // Ugly fix to the fact that if we do mmscore, sigma2 is already
> @@ -470,7 +433,7 @@
>      }
>      else  // NO mm-score regression : normal least square regression
>      {
> -#if EIGEN
> +
>          int m = X.ncol;
>          MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView<Lower>().\
>                  rankUpdate(X.data.adjoint());
> @@ -478,23 +441,7 @@
>          beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data);
>          sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm();
>  
> -#else
> -        mematrix<double> tX = transpose(X);
> -        // use cholesky to invert
> -                tXX_i = tX * X;
> -                cholesky2_mm(tXX_i, tol_chol);
> -                chinv2_mm(tXX_i);
> -                beta = tXX_i * (tX * (reg_data.Y));
>  
> -        // now compute residual variance
> -        sigma2 = 0.;
> -        mematrix<double> sigma2_matrix = reg_data.Y - (X * beta);
> -        for (int i = 0; i < sigma2_matrix.nrow; i++)
> -        {
> -            double val = sigma2_matrix.get(i, 0);
> -            sigma2 += val * val;
> -        }
> -#endif
>          double N = static_cast<double>(X.nrow);
>          double P = static_cast<double>(length_beta);
>          sigma2_internal = sigma2 / (N - P);
> @@ -517,38 +464,19 @@
>      //cout << endl;
>      logLikelihood(X);
>  
> -#if EIGEN
>      MatrixXd tXX_inv = Ch.solve(MatrixXd(length_beta, length_beta).
>                                  Identity(length_beta, length_beta));
> -#endif
>  
>      mematrix<double> robust_sigma2(X.ncol, X.ncol);
>      if (robust)
>      {
> -#if EIGEN
>          MatrixXd Xresiduals = X.data.array().colwise()\
>              *residuals.data.col(0).array();
>          MatrixXd  XbyR = MatrixXd(X.ncol, X.ncol).setZero()\
>              .selfadjointView<Lower>().rankUpdate(Xresiduals.adjoint());
>          robust_sigma2.data = tXX_inv * XbyR * tXX_inv;
> -#else
> -
> -        mematrix<double> XbyR = X;
> -        for (int i = 0; i < X.nrow; i++){
> -            for (int j = 0; j < X.ncol; j++)
> -            {
> -                double tmpval = XbyR.get(i, j) * residuals[i];
> -                XbyR.put(tmpval, i, j);
> -            }
> -        }
> -        XbyR = transpose(XbyR) * XbyR;
> -        robust_sigma2 = tXX_i * XbyR;
> -        robust_sigma2 = robust_sigma2 * tXX_i;
> -
> -#endif
>      }
>      //cout << "estimate 0\n";
> -#if EIGEN
>      if (robust)
>      {
>          sebeta.data = robust_sigma2.data.diagonal().array().sqrt();
> @@ -578,63 +506,6 @@
>                              offset).diagonal().array();
>          }
>  
> -#else
> -
> -    //cout << "estimate 0\n";
> -    for (int i = 0; i < (length_beta); i++)
> -    {
> -        if (robust)
> -        {
> -            // cout << "estimate :robust\n";
> -            double value = sqrt(robust_sigma2.get(i, i));
> -            sebeta.put(value, i, 0);
> -            //Han Chen
> -            if (i > 0)
> -            {
> -                if (model == 0 && interaction != 0 && ngpreds == 2
> -                        && length_beta > 2)
> -                {
> -                    if (i > 1)
> -                    {
> -                        double covval = robust_sigma2.get(i, i - 2);
> -                        covariance.put(covval, i - 2, 0);
> -                    }
> -                }
> -                else
> -                {
> -                    double covval = robust_sigma2.get(i, i - 1);
> -                    covariance.put(covval, i - 1, 0);
> -                }
> -            }
> -            //Oct 26, 2009
> -        }
> -        else
> -        {
> -            // cout << "estimate :non-robust\n";
> -            double value = sqrt(sigma2_internal * tXX_i.get(i, i));
> -            sebeta.put(value, i, 0);
> -            //Han Chen
> -            if (i > 0)
> -            {
> -                if (model == 0 && interaction != 0 && ngpreds == 2
> -                        && length_beta > 2)
> -                {
> -                    if (i > 1)
> -                    {
> -                        double covval = sigma2_internal * tXX_i.get(i, i - 2);
> -                        covariance.put(covval, i - 2, 0);
> -                    }
> -                }
> -                else
> -                {
> -                    double covval = sigma2_internal * tXX_i.get(i, i - 1);
> -                    covariance.put(covval, i - 1, 0);
> -                }
> -            }
> -            //Oct 26, 2009
> -        }
> -    }
> -#endif
>  }
>  
>  
> 
> Modified: pkg/ProbABEL/src/regdata.h
> ===================================================================
> --- pkg/ProbABEL/src/regdata.h	2014-04-23 09:52:41 UTC (rev 1694)
> +++ pkg/ProbABEL/src/regdata.h	2014-04-23 18:54:26 UTC (rev 1695)
> [TRUNCATED]
> 
> To get the complete diff run:
>     svnlook diff /svnroot/genabel -r 1695
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-


From Jurica.Stanojkovic at rt-rk.com  Thu Apr 24 15:52:42 2014
From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic)
Date: Thu, 24 Apr 2014 15:52:42 +0200
Subject: [GenABEL-dev] probabel big endian support
Message-ID: <896-53591700-f-3be4eec0@227853676>

Dear list,

I have tried building the ProbABEL package on big-endian MIPS.
It looks like the inputfiles/*.fvd and inputfiles/*.fvi files were created on a little-endian machine and do not work on big-endian ones.
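Which byte order a host uses can be checked at run time. The following minimal C++ sketch looks at how a known 32-bit value is laid out in memory. The function name `host_is_little_endian` is hypothetical and is not part of ProbABEL, DatABEL or filevector.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if the host stores the least significant
// byte of a multi-byte integer at the lowest address (little-endian, e.g.
// x86), false if it stores the most significant byte first (big-endian).
bool host_is_little_endian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy out the raw memory layout
    // Only the two common byte orders are expected here.
    assert(bytes[0] == 0x04 || bytes[0] == 0x01);
    return bytes[0] == 0x04;                   // lowest byte first => little-endian
}
```

On x86 this returns true; on a big-endian MIPS host it would return false.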

I tried creating them on big-endian MIPS and replaced the ones that came with the source package with the ones I created.
With the new files, the package built without errors.

I used the following commands to create the files:
library(GenABEL)
library(DatABEL)
fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose")
fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE)
mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose")
mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)

I am new to ProbABEL, GenABEL and DatABEL, so could someone please help me with the following questions:

What is the best course of action for supporting ProbABEL on big-endian systems?
Should the *.fvi and *.fvd files always be in little-endian format (in which case DatABEL would need to be changed to always create little-endian files)?
Or can the *.fvd and *.fvi files be replaced with big-endian files for a big-endian build?

Is it necessary to be able to use *.fvd/*.fvi files created on a system with different endianness?

I am willing to work on adding big-endian support and would appreciate any help in determining the right course of action for resolving this problem.

Regards,
Jurica

From lennart at karssen.org  Sat Apr 26 22:17:38 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sat, 26 Apr 2014 22:17:38 +0200
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <896-53591700-f-3be4eec0@227853676>
References: <896-53591700-f-3be4eec0@227853676>
Message-ID: <535C1462.9090502@karssen.org>

Dear Jurica,

On 24-04-14 15:52, Jurica Stanojkovic wrote:
> Dear list,
> 
> I have tried building the ProbABEL package on big-endian MIPS.

That is great to hear! As far as I know, none of the current developers
have access to such a machine.

> It looks like the inputfiles/*.fvd and inputfiles/*.fvi files were created
> on a little-endian machine and do not work on big-endian ones.

That is correct, we found that out as well.

> 
> I tried creating them on big-endian MIPS and replaced the ones that
> came with the source package with the ones I created.
> With the new files, the package built without errors.

That is good news. So GenABEL and DatABEL work on big-endian machines.

> 
> I used the following commands to create the files:
> library(GenABEL)
> library(DatABEL)
> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose",
> mlinfo="./checks/inputfiles/test.mlinfo",
> outfile="./checks/inputfiles/test.dose")
> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob",
> mlinfo="./checks/inputfiles/test.mlinfo",
> outfile="./checks/inputfiles/test.prob", isprob=TRUE)
> mmdose <-
> mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose",
> mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
> outfile="./checks/inputfiles/mmscore_gen.dose")
> mmprob <-
> mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob",
> mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
> outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)
> 
> I am new to ProbABEL, GenABEL and DatABEL, so could someone please help me
> with the following questions:
> 
> What is the best course of action for supporting ProbABEL on big-endian
> systems?
> Should the *.fvi and *.fvd files always be in little-endian format (in which
> case DatABEL would need to be changed to always create little-endian files)?
> Or can the *.fvd and *.fvi files be replaced with big-endian files for a
> big-endian build?

I would say that ideally the files should only need to be created once and
then be usable on all systems, especially since these files are usually
large and converting from text format to .fvi/.fvd takes quite a while.

This, however, would require diving into the filevector and the DatABEL
code (filevector or libfilevector is the name of the 'backend' code in
which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use
that code when dealing with .fvi/.fvd files). I don't have very much
experience with either code base, but could probably have a look and
give you some pointers.

> 
> Is it necessary to be able to use *.fvd/*.fvi files created on a system
> with different endianness?

On the other hand, how often will people transfer these files to
machines of different architectures?

Jurica, can you tell us a bit more about why you are using a MIPS
machine for your work with ProbABEL? And do you think it would be a
common task to move these files between machines with different
architectures at your site?

Maybe a converter from big to little endian and vice versa would be the
easiest solution? I guess such a conversion can be done rather quickly. The
downside would be that it (at least temporarily) requires double the
disk space.
Such a converter could be part of the fvutils and/or of DatABEL, for
example.
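At its core, the converter suggested above only has to reverse the byte order of each fixed-size element. A minimal sketch, assuming (hypothetically) that the data block is a flat array of equally sized elements; the actual .fvi/.fvd layout and element types are defined in filevector and are not modelled here, and `swap_bytes_inplace` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical converter core: reverse the byte order of every element of
// `elem_size` bytes in `buf`. Byte reversal is its own inverse, so the same
// routine converts big-endian data to little-endian and vice versa.
// Trailing bytes that do not form a whole element are left untouched.
void swap_bytes_inplace(std::vector<unsigned char>& buf, std::size_t elem_size) {
    assert(elem_size > 0 && "element size must be positive");
    if (elem_size == 0) return;  // avoid an endless loop when NDEBUG is set
    for (std::size_t off = 0; off + elem_size <= buf.size(); off += elem_size) {
        auto first = buf.begin() + static_cast<std::ptrdiff_t>(off);
        std::reverse(first, first + static_cast<std::ptrdiff_t>(elem_size));
    }
}
```

Because one routine covers both directions, the same tool could convert either way. Applied chunk by chunk to a file opened for in-place rewriting, it would also avoid needing a second full copy on disk, at the price of having no intact original if the conversion is interrupted.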

> 
> I am willing to work on adding big-endian support and I would appreciate
> any help in determining the right course of action for resolving this
> problem.

Thank you for your time and willingness to help! It is very much
appreciated. We're a small group of developers, but we'll try to help as
much as we can.


Best,

Lennart.

> 
> Regards,
> Jurica
> 
> 
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 213 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140426/9796f129/attachment.sig>

From alvaro.frank at rwth-aachen.de  Sun Apr 27 05:29:41 2014
From: alvaro.frank at rwth-aachen.de (Frank, Alvaro Jesus)
Date: Sun, 27 Apr 2014 03:29:41 +0000
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <535C1462.9090502@karssen.org>
References: <896-53591700-f-3be4eec0@227853676>, <535C1462.9090502@karssen.org>
Message-ID: <244CF001646FF74FB34F372310A332C57AFBF2@MBX2.rwth-ad.de>

Hi all,

Would it not be better practice to handle this at load time, e.g. using the functions described here: http://man7.org/linux/man-pages/man3/endian.3.html
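Handling byte order at load time typically means fixing a single on-disk byte order and converting on every read and write, so that a file created on one machine is usable on any other. The sketch below assumes little-endian as the on-disk order, 8-byte IEEE 754 doubles as the element type, and a host that stores doubles in the same byte order as 64-bit integers (as on common platforms such as x86 and MIPS). `read_le_double` and `write_le_double` are hypothetical names, not part of filevector. Assembling the value with shifts avoids any #ifdef on the host's endianness; on glibc, `le64toh()` and `htole64()` from `<endian.h>` (the man page linked above) do the integer part of the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical loader helper: interpret 8 bytes stored in little-endian
// order as an IEEE 754 double. Building the integer with shifts makes the
// result independent of the host's byte order.
double read_le_double(const unsigned char* p) {
    assert(p != nullptr);
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) {
        bits = (bits << 8) | static_cast<std::uint64_t>(p[i]);  // p[7] is most significant
    }
    static_assert(sizeof(double) == sizeof(bits), "expects a 64-bit double");
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Counterpart for writing: always emit the bytes in little-endian order.
void write_le_double(double value, unsigned char* p) {
    assert(p != nullptr);
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i) {
        p[i] = static_cast<unsigned char>(bits >> (8 * i));  // least significant byte first
    }
}
```

The cost is a few shifts per value, which is likely small compared to the disk I/O for files of this size.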

Just a remark.

-Alvaro


From lennart at karssen.org  Sun Apr 27 21:48:07 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sun, 27 Apr 2014 21:48:07 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1698 - pkg/ProbABEL/src
In-Reply-To: <20140424185052.45B8018749C@r-forge.r-project.org>
References: <20140424185052.45B8018749C@r-forge.r-project.org>
Message-ID: <535D5EF7.4080100@karssen.org>

Thanks for splitting this into separate functions, Maarten.

Could you try to add some basic Doxygen documentation for these
functions? That would be a great help towards a well-documented code
base for ProbABEL (even though the function names already explain a lot).
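As an illustration of the kind of Doxygen comment meant here, a sketch on a hypothetical standalone version of the standard-error part of PlainSEandCovariance(). It uses std::vector instead of Eigen/mematrix so that it compiles on its own; `plain_standard_errors` is not an existing ProbABEL function. The @brief, @param, @return and @pre tags are standard Doxygen commands.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

/**
 * @brief Compute plain (non-robust) standard errors of regression coefficients.
 *
 * Hypothetical standalone counterpart of the sebeta part of
 * linear_reg::PlainSEandCovariance(); it uses std::vector instead of
 * Eigen/mematrix so that it compiles on its own.
 *
 * @param sigma2_internal Residual variance estimate, RSS / (N - P).
 * @param tXX_inv_diag    Diagonal of (X^T X)^{-1}, one entry per coefficient.
 * @return Standard errors, se_i = sqrt(sigma2_internal * tXX_inv_diag[i]).
 * @pre sigma2_internal >= 0 and all entries of tXX_inv_diag are >= 0.
 */
std::vector<double> plain_standard_errors(double sigma2_internal,
                                          const std::vector<double>& tXX_inv_diag) {
    assert(sigma2_internal >= 0.0);
    std::vector<double> se(tXX_inv_diag.size());
    for (std::size_t i = 0; i < se.size(); ++i) {
        se[i] = std::sqrt(sigma2_internal * tXX_inv_diag[i]);
    }
    return se;
}
```

The same pattern (a one-line @brief, one @param per argument, an @return stating the formula) would carry over to the declarations of LeastSquaredRegression() and RobustSEandCovariance() in reg1.h.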


Lennart.

On 24-04-14 20:50, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-24 20:50:51 +0200 (Thu, 24 Apr 2014)
> New Revision: 1698
> 
> Modified:
>    pkg/ProbABEL/src/reg1.cpp
>    pkg/ProbABEL/src/reg1.h
> Log:
> -refactored linear_reg::estimate a bit to make it more readable.
> 
> Modified: pkg/ProbABEL/src/reg1.cpp
> ===================================================================
> --- pkg/ProbABEL/src/reg1.cpp	2014-04-24 16:50:53 UTC (rev 1697)
> +++ pkg/ProbABEL/src/reg1.cpp	2014-04-24 18:50:51 UTC (rev 1698)
> @@ -327,6 +327,14 @@
>      sigma2 = (Y - tXW * beta_vec).squaredNorm();
>      beta.data = beta_vec;
>  }
> +void linear_reg::LeastSquaredRegression(mematrix<double> X,LDLT<MatrixXd>& Ch) {
> +    int m = X.ncol;
> +    MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView<Lower>().rankUpdate(
> +            X.data.adjoint());
> +    Ch = LDLT < MatrixXd > (txx.selfadjointView<Lower>());
> +    beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data);
> +    sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm();
> +}
>  
>  void linear_reg::logLikelihood(const mematrix<double>& X) {
>      /*
> @@ -364,6 +372,27 @@
>  }
>  
>  
> +
> +void linear_reg::RobustSEandCovariance(mematrix<double> X, mematrix<double> robust_sigma2,
> +        MatrixXd tXX_inv, int offset) {
> +    MatrixXd Xresiduals = X.data.array().colwise()
> +            * residuals.data.col(0).array();
> +    MatrixXd XbyR =
> +            MatrixXd(X.ncol, X.ncol).setZero().selfadjointView<Lower>().rankUpdate(
> +                    Xresiduals.adjoint());
> +    robust_sigma2.data = tXX_inv * XbyR * tXX_inv;
> +    sebeta.data = robust_sigma2.data.diagonal().array().sqrt();
> +    covariance.data =
> +            robust_sigma2.data.bottomLeftCorner(offset, offset).diagonal();
> +}
> +
> +void linear_reg::PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv,
> +        int offset) {
> +    sebeta.data = (sigma2_internal * tXX_inv.diagonal().array()).sqrt();
> +    covariance.data = sigma2_internal
> +            * tXX_inv.bottomLeftCorner(offset, offset).diagonal().array();
> +}
> +
>  void linear_reg::estimate(int verbose, double tol_chol,
>          int model, int interaction, int ngpreds, masked_matrix& invvarmatrixin,
>          int robust, int nullmodel) {
> @@ -415,13 +444,6 @@
>      {
>          //retrieve masked data W
>          invvarmatrixin.update_mask(reg_data.masked_data);
> -
> -        // This regression is Weighted Least Square: used for mmscore :
> -        // FLOPS count are calculated for 3*1000 matrix as follow:
> -        //C=AB (m X n matrix A and n x P matrix B)
> -        //flops=mp(2n-1) (when n is big enough flops=mpn2)
> -        //Oct 26, 2009
> -
>          mmscore_regression(X, invvarmatrixin, Ch);
>          double N = X.nrow;
>          //sigma2_internal = sigma2 / (N - static_cast<double>(length_beta));
> @@ -434,14 +456,7 @@
>      else  // NO mm-score regression : normal least square regression
>      {
>  
> -        int m = X.ncol;
> -        MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView<Lower>().\
> -                rankUpdate(X.data.adjoint());
> -        Ch = LDLT <MatrixXd>(txx.selfadjointView<Lower>());
> -        beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data);
> -        sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm();
> -
> -
> +        LeastSquaredRegression(X,Ch);
>          double N = static_cast<double>(X.nrow);
>          double P = static_cast<double>(length_beta);
>          sigma2_internal = sigma2 / (N - P);
> @@ -468,43 +483,22 @@
>                                  Identity(length_beta, length_beta));
>  
>      mematrix<double> robust_sigma2(X.ncol, X.ncol);
> -    if (robust)
> -    {
> -        MatrixXd Xresiduals = X.data.array().colwise()\
> -            *residuals.data.col(0).array();
> -        MatrixXd  XbyR = MatrixXd(X.ncol, X.ncol).setZero()\
> -            .selfadjointView<Lower>().rankUpdate(Xresiduals.adjoint());
> -        robust_sigma2.data = tXX_inv * XbyR * tXX_inv;
> -    }
> -    //cout << "estimate 0\n";
> -    if (robust)
> -    {
> -        sebeta.data = robust_sigma2.data.diagonal().array().sqrt();
> -    }
> -    else
> -    {
> -        sebeta.data =
> -                (sigma2_internal
> -                        * tXX_inv.diagonal().array()).sqrt();
> -    }
> -    int offset = X.ncol- 1;
> -    //if additive and interaction and 2 predictors and more then 2 betas
>  
> -    if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){
> -        offset = X.ncol - 2;
> -    }
>  
> +    int offset = X.ncol- 1;
> +     //if additive and interaction and 2 predictors and more then 2 betas
> +     if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){
> +         offset = X.ncol - 2;
> +     }
> +
>      if (robust)
>      {
> -        covariance.data = robust_sigma2.data.bottomLeftCorner(
> -                offset, offset).diagonal();
> +        RobustSEandCovariance(X, robust_sigma2, tXX_inv, offset);
>      }
>      else
>      {
> -            covariance.data = sigma2_internal
> -                    * tXX_inv.bottomLeftCorner(offset,
> -                            offset).diagonal().array();
> -        }
> +        PlainSEandCovariance(sigma2_internal, tXX_inv, offset);
> +    }
>  
>  }
>  
> 
> Modified: pkg/ProbABEL/src/reg1.h
> ===================================================================
> --- pkg/ProbABEL/src/reg1.h	2014-04-24 16:50:53 UTC (rev 1697)
> +++ pkg/ProbABEL/src/reg1.h	2014-04-24 18:50:51 UTC (rev 1698)
> @@ -104,6 +104,11 @@
>      void mmscore_regression(const mematrix<double>& X,
>              const masked_matrix& W_masked, LDLT<MatrixXd>& Ch);
>      void logLikelihood(const mematrix<double>& X);
> +    void LeastSquaredRegression(mematrix<double> X,LDLT<MatrixXd>& Ch);
> +    void RobustSEandCovariance(mematrix<double> X, mematrix<double> robust_sigma2,
> +            MatrixXd tXX_inv, int offset);
> +    void PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv,
> +            int offset);
>  };
>  
>  class logistic_reg: public base_reg {
> 
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
> 
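For readers of the refactor, the quantities computed in the ordinary least-squares branch can be written out as below. This is read off the Eigen expressions in the diff above, with X the N x P design matrix (P = length_beta), Y the phenotype vector, r_i the entries of `residuals`, k the `offset`, and 0-based indices; it is a summary, not separate ProbABEL documentation. The robust branch corresponds to the heteroskedasticity-consistent "sandwich" estimator in its HC0 (White) form.

```latex
\begin{aligned}
\hat\beta &= (X^\top X)^{-1} X^\top Y, \qquad
\sigma^2 = \lVert Y - X\hat\beta \rVert^2, \qquad
\sigma^2_{\text{int}} = \frac{\sigma^2}{N - P},\\
V_{\text{plain}} &= \sigma^2_{\text{int}}\,(X^\top X)^{-1},\\
V_{\text{robust}} &= (X^\top X)^{-1}\, X^\top \operatorname{diag}(r_1^2,\dots,r_N^2)\, X\,(X^\top X)^{-1},\\
\operatorname{se}(\hat\beta_j) &= \sqrt{V_{jj}}, \qquad
\text{covariance}_j = V_{P-k+j,\;j}, \quad j = 0,\dots,k-1 .
\end{aligned}
```

With the default k = P - 1, `covariance` therefore holds the entries directly below the main diagonal, i.e. Cov(beta_{j+1}, beta_j).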


From lennart at karssen.org  Sun Apr 27 22:30:31 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sun, 27 Apr 2014 22:30:31 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1700 - pkg/ProbABEL/src
In-Reply-To: <20140427090149.0A4651874B2@r-forge.r-project.org>
References: <20140427090149.0A4651874B2@r-forge.r-project.org>
Message-ID: <535D68E7.5000509@karssen.org>

Hi Maarten,

More clean ups. Great!

Some comments below.

On 27-04-14 11:01, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-27 11:01:42 +0200 (Sun, 27 Apr 2014)
> New Revision: 1700
> 
> Removed:
>    pkg/ProbABEL/src/cholesky.cpp
>    pkg/ProbABEL/src/cholesky.h
> Modified:
>    pkg/ProbABEL/src/Makefile.am
>    pkg/ProbABEL/src/fvlib
>    pkg/ProbABEL/src/reg1.cpp
>    pkg/ProbABEL/src/reg1.h
> Log:
> -removed dependency of reg1.* on cholesky.* since this is now done with EIGEN (removes about 150 lines of code from our codebase)
> -removed cholesky.h and cholesky.cpp 
> -added some consts to functions in reg1.*
> -removed some whitespace in reg1.cpp

Happy with the consts!

> 
> Modified: pkg/ProbABEL/src/fvlib
> ===================================================================
> --- pkg/ProbABEL/src/fvlib	2014-04-25 06:26:38 UTC (rev 1699)
> +++ pkg/ProbABEL/src/fvlib	2014-04-27 09:01:42 UTC (rev 1700)
> @@ -1 +1 @@
> -link ../../../tags/filevector/v.1.0.0/fvlib
> \ No newline at end of file
> +link include/filevector/fvlib
> \ No newline at end of file

This is strange. You seem to have changed the fvlib symlink to point to a
place that doesn't exist in the SVN tree. Probably a local thing.
Can you revert this and point the symlink back to the v.1.0.0 tag of fvlib?

> 
> Modified: pkg/ProbABEL/src/reg1.cpp
> ===================================================================
> --- pkg/ProbABEL/src/reg1.cpp	2014-04-25 06:26:38 UTC (rev 1699)
> +++ pkg/ProbABEL/src/reg1.cpp	2014-04-27 09:01:42 UTC (rev 1700)
> @@ -275,10 +275,12 @@
>              reg_data.is_interaction_excluded, false, nullmodel);
>      beta.reinit(X.ncol, 1);
>      sebeta.reinit(X.ncol, 1);
> +    int length_beta=X.ncol;

Could you please add spaces around the = sign, according to the coding
standards?


>      double N = static_cast<double>(resid.nrow);
>      mematrix<double> tX = transpose(X);
> -    if (invvarmatrix.length_of_mask != 0)
> +    if (invvarmatrix.length_of_mask != 0){
>          tX = tX * invvarmatrix.masked_data;
> +    }
>  
>      mematrix<double> u = tX * resid;
>      mematrix<double> v = tX * X;
> @@ -287,12 +289,16 @@
>      csum = csum * (1. / N);
>      v = v - csum;
>      // use cholesky to invert
> -    mematrix<double> v_i = v;
> -    cholesky2_mm(v_i, tol_chol);
> -    chinv2_mm(v_i);
> +
> +    LDLT <MatrixXd> Ch = LDLT < MatrixXd > (v.data.selfadjointView<Lower>());

I get the feeling here that you added too many spaces in this case :-).
The < and > here are not operators.


Thanks,

Lennart.


>      // before was
>      // mematrix<double> v_i = invert(v);
> -    beta = v_i * u;
> +    beta.data = Ch.solve(v.data.adjoint() * u.data);
> +    //TODO(maartenk): set size of v_i directly or remove mematrix class
> +    mematrix<double> v_i = v;
> +    v_i.data = Ch.solve(MatrixXd(length_beta, length_beta).
> +                                    Identity(length_beta, length_beta));
> +
>      double sr = 0.;
>      double srr = 0.;
>      for (int i = 0; i < resid.nrow; i++)
> @@ -327,7 +333,7 @@
>      sigma2 = (Y - tXW * beta_vec).squaredNorm();
>      beta.data = beta_vec;
>  }
> -void linear_reg::LeastSquaredRegression(mematrix<double> X,LDLT<MatrixXd>& Ch) {
> +void linear_reg::LeastSquaredRegression(const mematrix<double>& X, LDLT<MatrixXd>& Ch) {
>      int m = X.ncol;
>      MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView<Lower>().rankUpdate(
>              X.data.adjoint());
> @@ -368,12 +374,11 @@
>      //residuals[i] -= resid_sub;
>      loglik -= (residuals.data.array().square() * halfrecsig2).sum();
>      loglik -= static_cast<double>(reg_data.nids) * log(sqrt(sigma2));
> -
>  }
>  
>  
>  
> -void linear_reg::RobustSEandCovariance(mematrix<double> X, mematrix<double> robust_sigma2,
> +void linear_reg::RobustSEandCovariance(const mematrix<double> &X, mematrix<double> robust_sigma2,
>          MatrixXd tXX_inv, int offset) {
>      MatrixXd Xresiduals = X.data.array().colwise()
>              * residuals.data.col(0).array();
> @@ -386,7 +391,7 @@
>              robust_sigma2.data.bottomLeftCorner(offset, offset).diagonal();
>  }
>  
> -void linear_reg::PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv,
> +void linear_reg::PlainSEandCovariance(double sigma2_internal,const MatrixXd &tXX_inv,
>          int offset) {
>      sebeta.data = (sigma2_internal * tXX_inv.diagonal().array()).sqrt();
>      covariance.data = sigma2_internal
> @@ -438,7 +443,6 @@
>  
>      double sigma2_internal;
>  
> -
>      LDLT <MatrixXd> Ch;
>      if (invvarmatrixin.length_of_mask != 0)
>      {
> @@ -481,10 +485,8 @@
>  
>      MatrixXd tXX_inv = Ch.solve(MatrixXd(length_beta, length_beta).
>                                  Identity(length_beta, length_beta));
> -
>      mematrix<double> robust_sigma2(X.ncol, X.ncol);
>  
> -
>      int offset = X.ncol- 1;
>       //if additive and interaction and 2 predictors and more then 2 betas
>       if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){
> @@ -499,10 +501,8 @@
>      {
>          PlainSEandCovariance(sigma2_internal, tXX_inv, offset);
>      }
> -
>  }
>  
> -
>  void linear_reg::score(mematrix<double>& resid,
>          double tol_chol, int model, int interaction, int ngpreds,
>          const masked_matrix& invvarmatrix, int nullmodel) {
> @@ -511,7 +511,6 @@
>              invvarmatrix, nullmodel = 0);
>  }
>  
> -
>  logistic_reg::logistic_reg(regdata& rdatain) {
>      reg_data = rdatain.get_unmasked_data();
>      int length_beta = reg_data.X.ncol;
> 
> Modified: pkg/ProbABEL/src/reg1.h
> ===================================================================
> --- pkg/ProbABEL/src/reg1.h	2014-04-25 06:26:38 UTC (rev 1699)
> +++ pkg/ProbABEL/src/reg1.h	2014-04-27 09:01:42 UTC (rev 1700)
> @@ -50,11 +50,9 @@
>  #ifndef REG1_H_
>  #define REG1_H_
>  #include <cmath>
> -#include "cholesky.h"
>  #include "regdata.h"
>  #include "maskedmatrix.h"
>  
> -
>  mematrix<double> apply_model(mematrix<double>& X, int model, int interaction,
>          int ngpreds, bool is_interaction_excluded, bool iscox = false,
>          int nullmodel = 0);
> @@ -99,11 +97,10 @@
>                              const masked_matrix& W_masked,
>                              LDLT<MatrixXd>& Ch);
>      void logLikelihood(const mematrix<double>& X);
> -    void LeastSquaredRegression(mematrix<double> X, LDLT<MatrixXd>& Ch);
> -    void RobustSEandCovariance(mematrix<double> X,
> -                               mematrix<double> robust_sigma2,
> -                               MatrixXd tXX_inv, int offset);
> -    void PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv,
> +    void LeastSquaredRegression(const mematrix<double> & X,LDLT<MatrixXd>& Ch);
> +    void RobustSEandCovariance(const mematrix<double> & X,
> +            mematrix <double> robust_sigma2, MatrixXd tXX_inv, int offset);
> +    void PlainSEandCovariance(double sigma2_internal, const MatrixXd & tXX_inv,
>                                int offset);
>  };
>  
> 
> 
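r1700 swaps the hand-written cholesky2_mm()/chinv2_mm() inversion for Eigen's LDLT solver. As an illustration of what such a solve does, the sketch below implements an unpivoted LDL^T factorisation for a small dense symmetric positive-definite matrix (row-major std::vector, reading only the lower triangle). It is a teaching sketch under those assumptions, not Eigen's implementation, which additionally pivots; `ldlt_solve` is a hypothetical name. Written out this way, one detail of the score() hunk is easy to check: `Ch.solve(v.data.adjoint() * u.data)` computes v^{-1} v^T u, which collapses to u whenever v is symmetric, whereas the old `beta = v_i * u` corresponds to `Ch.solve(u.data)`. That line may be worth a second look.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve A x = b for a symmetric positive-definite n x n matrix A (row-major)
// via an unpivoted LDL^T factorisation: A = L D L^T with L unit lower
// triangular and D diagonal. Only the lower triangle of A is read.
std::vector<double> ldlt_solve(const std::vector<double>& A,
                               const std::vector<double>& b) {
    const std::size_t n = b.size();
    assert(A.size() == n * n);
    std::vector<double> L(n * n, 0.0), D(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // D_j = A_jj - sum_{k<j} L_jk^2 D_k
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j * n + k] * L[j * n + k] * D[k];
        D[j] = d;
        L[j * n + j] = 1.0;
        // L_ij = (A_ij - sum_{k<j} L_ik L_jk D_k) / D_j for i > j
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k] * D[k];
            L[i * n + j] = s / d;
        }
    }
    std::vector<double> x(b);
    // Forward substitution: L y = b.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < i; ++k) x[i] -= L[i * n + k] * x[k];
    // Diagonal scaling: D z = y.
    for (std::size_t i = 0; i < n; ++i) x[i] /= D[i];
    // Back substitution: L^T x = z.
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t k = i + 1; k < n; ++k) x[i] -= L[k * n + i] * x[k];
    return x;
}
```

Solving against the factorisation rather than forming the inverse explicitly is the usual reason for preferring this route; where the full inverse is genuinely needed (as for tXX_inv), solving against the identity matrix, as the diff does, gives it column by column.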


From kooyman at gmail.com  Sun Apr 27 23:45:43 2014
From: kooyman at gmail.com (Maarten Kooyman)
Date: Sun, 27 Apr 2014 23:45:43 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1702 - pkg/ProbABEL/src
In-Reply-To: <20140427214409.C9E6D184AD5@r-forge.r-project.org>
References: <20140427214409.C9E6D184AD5@r-forge.r-project.org>
Message-ID: <535D7A87.5040908@gmail.com>

Thanks!

On 27-04-14 23:44, noreply at r-forge.r-project.org wrote:
> Author: lckarssen
> Date: 2014-04-27 23:44:09 +0200 (Sun, 27 Apr 2014)
> New Revision: 1702
>
> Modified:
>     pkg/ProbABEL/src/fvlib
> Log:
> Fixing the ProbABEL symlink to the v1.0.0 tag of filevector. This reverts the change introduced in r1700.
>
>
> Modified: pkg/ProbABEL/src/fvlib
> ===================================================================
> --- pkg/ProbABEL/src/fvlib	2014-04-27 20:48:40 UTC (rev 1701)
> +++ pkg/ProbABEL/src/fvlib	2014-04-27 21:44:09 UTC (rev 1702)
> @@ -1 +1 @@
> -link include/filevector/fvlib
> \ No newline at end of file
> +link ../../../tags/filevector/v.1.0.0/fvlib
> \ No newline at end of file
>


From lennart at karssen.org  Mon Apr 28 16:46:47 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 28 Apr 2014 16:46:47 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src
In-Reply-To: <535E422F.4080402@gmail.com>
References: <20140428094937.65E8B186FC6@r-forge.r-project.org>
 <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com>
Message-ID: <535E69D7.1050005@karssen.org>

Hi Maarten,

As you have probably noticed, I am subscribed to the commit list and I
try to review all commits and, where needed, provide critical comments.
If you have time, you are of course completely free to do the same with
my commits. I learn from that as well, and it hopefully shows that these
kinds of reviews are the normal course of affairs.


Regards,

Lennart.

On 28-04-14 13:57, Maarten Kooyman wrote:
> On 28-04-14 12:03, L.C. Karssen wrote:
>> Hi Maarten,
>>
>> That's interesting. I assume you did this in response to bug #5658?
> Yes.
>> The change you made is only for ASCII input files, right?
> Yes.
> 
>> Any idea how this is treated in GenABEL's mach2databel() and
>> impute2databel()?
> I have no idea. Maybe check the speed of reading those formats
> and switch them to the strategy used in the trunk of ProbABEL. (Those
> conversions are generally only done once per dataset, so it is not high
> on the priority list.)
>> I assume the NAs are converted to IEEE 754-compatible NaN there, but I'm
>> not sure. If that is the case, then this would fix that bug, right?
> Assumption is the mother of all... But if you're sure, it is fixed.
>>
>>
>> Lennart.
> 
> Kind regards,
> 
> Maarten
> 


From lennart at karssen.org  Mon Apr 28 16:55:23 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 28 Apr 2014 16:55:23 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src
In-Reply-To: <535E69D7.1050005@karssen.org>
References: <20140428094937.65E8B186FC6@r-forge.r-project.org>
 <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com>
 <535E69D7.1050005@karssen.org>
Message-ID: <535E6BDB.3000206@karssen.org>

Dear non-Dutch speaking list members,

Here's a short translation of the previous e-mail for those who don't
speak Dutch :-).


Dear Maarten (and others of course),

As you must have noticed, I started to review commits. Please feel free
to review my commits as well. I will learn from those reviews too, and
hopefully they show that this is normal procedure (from which I don't
want to be exempt).


Best,

Lennart.


From kooyman at gmail.com  Mon Apr 28 20:39:26 2014
From: kooyman at gmail.com (Maarten Kooyman)
Date: Mon, 28 Apr 2014 20:39:26 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src
In-Reply-To: <535E6BDB.3000206@karssen.org>
References: <20140428094937.65E8B186FC6@r-forge.r-project.org>
 <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com>
 <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org>
Message-ID: <535EA05E.40201@gmail.com>

Dear all,

I think GitHub would be easier to use for code review.

Please have a look at this example to get an impression:
https://github.com/jquery/jquery/pull/1241/files

I think we should reconsider our version control system: the current
system does not meet today's usability standards. Bug tracking and
branching are quite cumbersome. Please have a look at github.com to get
an impression of what is possible.

Kind regards,

Maarten


On 28-04-14 16:55, L.C. Karssen wrote:
> Dear non-Dutch speaking list members,
>
> Here's a short translation of the previous e-mail for those who don't
> speak Dutch :-).
>
>
> Dear Maarten (and others of course),
>
> As you must have noticed, I started to review commits. Please feel free
> to review my commits as well. I will learn from those reviews as well
> and hopefully these reviews indicate that this is normal procedure (from
> which I don't want to be exempt).
>
>
> Best,
>
> Lennart.
>
> On 28-04-14 16:46, L.C. Karssen wrote:
>> Hoi Maarten,
>>
>> Zoals je wel hebt gemerkt ben ik geabonneerd op de commit list en
>> probeer ik alle commits te reviewen en waar nodig van kritisch
>> commentaar te voorzien.
>> Als je tijd hebt staat het je natuurlijk volledig vrij om dat ook bij
>> mijn commits te doen. Daar leer ik ook weer van en het geeft hopelijk
>> aan dat er dit soort reviews de normale gang van zaken zijn.
>>
>>
>>
>> Groeten,
>>
>> Lennart.
>>
>> On 28-04-14 13:57, Maarten Kooyman wrote:
>>> On 28-04-14 12:03, L.C. Karssen wrote:
>>>> Hi Maarten,
>>>>
>>>> That's interesting. I assume you did this in response to bug #5658?
>>> Yes.
>>>> The change you made is only for ASCII input files, right?
>>> Yes.
>>>
>>>> Any idea how
>>>> this is treated in GenABEL's mach2databel() and impute2databel()?
>>> I have no idea. Maybe check how fast those formats are read and apply
>>> the strategy used in the ProbABEL trunk. (These conversions are
>>> generally done only once per dataset, so this is not high on the
>>> priority list.)
>>>> I assume the NAs are converted to IEEE 754-compatible NaNs there, but
>>>> I'm not sure. If that is the case, then this would fix that bug, right?
>>> Assumption is the mother of all... But if you're sure, it is fixed.
>>>>
>>>> Lennart.
>>> Kind regards,
>>>
>>> Maarten
>>> _______________________________________________
>>> Genabel-commits mailing list
>>> Genabel-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>>
>>
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
>
>


From lennart at karssen.org  Mon Apr 28 22:09:49 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Mon, 28 Apr 2014 22:09:49 +0200
Subject: [GenABEL-dev] Proposal to move to Github (was: Re:
 [Genabel-commits] r1705 - pkg/ProbABEL/src)
In-Reply-To: <535EA05E.40201@gmail.com>
References: <20140428094937.65E8B186FC6@r-forge.r-project.org>
 <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com>
 <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org>
 <535EA05E.40201@gmail.com>
Message-ID: <535EB58D.6010900@karssen.org>

Dear Maarten, dear all,

Moving to github... Hmm... That is quite a decision, so I've renamed the
subject to better reflect the discussion. I've also dropped the older
e-mails from the bottom of the thread.

First off, are there any people that have experience with git and/or
github? I've got some git experience (still learning), but no real
experience with github.

I agree with Maarten that SVN is showing its age. As he indicates, things
like branching are much easier in git. Moreover, since I travel
regularly, being able to work without an internet connection is a plus.

On the other hand, moving to git (whether GitHub or elsewhere) means
leaving R-forge, which is our well-known infrastructure. Furthermore,
I expect such a move to take quite some time: all bugs, feature
requests, etc. would have to be moved. If we decide to move we should
plan well and not rush. And the current developers will need to learn
git if they don't already know how to use it.

One thing I think we should definitely do is migrate slowly, package by
package. Given that Maarten is positive about such a move and that I am
undecided but not against it, it seems logical that ProbABEL is the
first package to try such a migration.


Looking forward to your comments!


Lennart.


On 28-04-14 20:39, Maarten Kooyman wrote:
> Dear all,
>
> I think GitHub is easier to use for code review.
>
> Please have a look to get an impression:
> https://github.com/jquery/jquery/pull/1241/files
>
> I think we should reconsider our version control system: the current
> system does not meet current usability standards. Bug tracking and
> branching are quite hard to use. Please have a look at github.com to
> get an impression of what is possible.
>
> Kind regards,
>
> Maarten
>
>

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-


From Jurica.Stanojkovic at rt-rk.com  Tue Apr 29 17:05:43 2014
From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic)
Date: Tue, 29 Apr 2014 17:05:43 +0200
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <535C1462.9090502@karssen.org>
Message-ID: <1897-535fc000-21-6a994800@159572789>


Dear Karssen,

>> What is the best course of action for supporting ProbABEL on big endian?
>> Should *.fvi, *.fvd files always be in little-endian format (then
>> DatABEL needs to be changed to always create little-endian files)?
>> Or can *.fvd, *.fvi files be replaced with big-endian files for a
>> big-endian build?

>I would say that ideally the files need only to be created once and then
>usable on all systems. Especially since these files are usually large
>and converting from text format to .fvi/.fvd takes quite a while.

If I had to change some values in the text files, would I have to regenerate the .fvd/.fvi files?
Do ProbABEL users have to change those files often?
If we byte-swap every value in the .fvd/.fvi file on the fly, would that also be time-consuming?
I understand that users would then not need to wait for the files to be regenerated on big-endian machines,
but would the same task (run) take longer on a big-endian machine than on a little-endian one?
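On the cost of converting on the fly: if the files were defined to always be little-endian (one of the options raised above), a reader could decode values byte by byte without ever checking the host's byte order. The following is a minimal sketch, assuming 8-byte IEEE 754 doubles stored in little-endian order; the function names are hypothetical and not part of filevector. The extra work is one linear pass over data already in memory, which is plausibly small next to disk I/O, but that would need benchmarking on a real big-endian board.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assemble a 64-bit value from 8 bytes stored in little-endian order.
// Shifts operate on values rather than on memory layout, so the result is
// the same on little- and big-endian hosts; no host check is needed.
inline std::uint64_t load_le_u64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];  // p[7] (most significant) ends up in the top byte
    }
    return v;
}

// Reinterpret the assembled bits as an IEEE 754 double. std::memcpy is the
// portable way to do this without violating strict aliasing.
inline double load_le_double(const unsigned char* p) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double assumed");
    const std::uint64_t bits = load_le_u64(p);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Decode n consecutive little-endian doubles from a raw buffer: a single
// linear pass that touches each byte once.
inline void decode_le_doubles(const unsigned char* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = load_le_double(src + 8 * i);
    }
}
```

With this approach the same file and the same code work on both architectures; the trade-off is that little-endian hosts also run the decode loop instead of reading values directly.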

>This, however, would require diving into the filevector and the DatABEL
>code (filevector or libfilevector is the name of the 'backend' code in
>which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use
>that code when dealing with .fvi/.fvd files). I don't have very much
>experience with either code base, but could probably have a look and
>give you some pointers.

I tried to work around this and got some results, but I did not manage
to find every place in the code where an endian swap is needed.
I am currently busy with other work, but I will look at this again soon.

>Jurica, can you tell us a bit more about why you are using a MIPS
>machine for your work with ProbABEL? And do you think it would be a
>common task to move these files between machines with different
>architectures at your site?

I work on supporting mips/mipsel for Debian sid.
I have access to mips and mipsel boards and can help with big-endian support.
But I do not use ProbABEL actively.

>Maybe a converter from big to little and vice versa would be the easiest
>solution? I guess such a conversion can be done rather quick. The
>downside would be that it (at least temporarily) requires double the
>disk space.
>Such a converter could be part of the fvutils and/or of DatABEL, for
>example.

Maybe this could be a good solution, presuming that it would be faster than just converting from text to filevector format? I will have to look more closely at how data are converted and written from text to .fvd/.fvi in order to be able to convert them to a different endianness.
There is also the option to always create .fvd/.fvi files in both byte orders,
or to create a universal file that contains the data in both byte orders.
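At its core, a big-to-little (and back) converter of the kind discussed here would reverse the bytes of every stored value. The following is a minimal sketch, assuming the data section has already been read into memory and that the element size (4 for float, 8 for double) is known, presumably from the file header. The function name is hypothetical; this is not fvutils or DatABEL code. A real converter would also have to treat the header fields separately and would probably stream the data in chunks to limit memory use.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reverse the byte order of every fixed-size element in a raw data buffer.
// elem_size is the size in bytes of one stored value (e.g. 4 for float,
// 8 for double). The same call converts big- to little-endian and back,
// because reversing the bytes twice restores the original.
void swap_buffer_endianness(std::vector<unsigned char>& buf, std::size_t elem_size) {
    if (elem_size == 0 || buf.size() % elem_size != 0) {
        throw std::invalid_argument("buffer size is not a multiple of elem_size");
    }
    for (std::size_t off = 0; off < buf.size(); off += elem_size) {
        auto first = buf.begin() + static_cast<std::ptrdiff_t>(off);
        std::reverse(first, first + static_cast<std::ptrdiff_t>(elem_size));
    }
}
```

Because the operation is its own inverse, one tool could serve both directions, and it only needs the element size, not the numeric type itself.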

Regards,
Jurica

-------- Original Message --------
Subject: Re: [GenABEL-dev] probabel big endian support
Date: Saturday, April 26, 2014 22:17 CEST
From: "L.C. Karssen" <lennart at karssen.org>
To: genabel-devel at lists.r-forge.r-project.org
References: <896-53591700-f-3be4eec0 at 227853676>
Dear Jurica,

On 24-04-14 15:52, Jurica Stanojkovic wrote:
> Dear list,
>
> I have tried building package probabel on mips big endian.

That is great to hear! As far as I know, none of the current developers
have access to such a machine.

> It looks like inputfiles/*.fvd and inputfiles/*.fvi were created on a
> little-endian machine and do not work on big-endian ones.

That is correct, we found out

>
> I tried to create them on big-endian MIPS and replaced the ones that
> came with the source package with the ones that I created.
> The package then built with the new files without an error.

That is good news. So GenABEL and DatABEL work on big-endian machines.

>
> I used the following commands to create the files:
> library(GenABEL)
> library(DatABEL)
> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose",
> mlinfo="./checks/inputfiles/test.mlinfo",
> outfile="./checks/inputfiles/test.dose")
> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob",
> mlinfo="./checks/inputfiles/test.mlinfo",
> outfile="./checks/inputfiles/test.prob", isprob=TRUE)
> mmdose <-
> mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose",
> mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
> outfile="./checks/inputfiles/mmscore_gen.dose")
> mmprob <-
> mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob",
> mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
> outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)
>
> I am new to ProbABEL, GenABEL, DatABEL so could someone please help me
> with following questions:
>
> What is the best course of action for supporting ProbABEL on big endian?
> Should *.fvi, *.fvd files always be in little-endian format (then
> DatABEL needs to be changed to always create little-endian files)?
> Or can *.fvd, *.fvi files be replaced with big-endian files for a
> big-endian build?

I would say that ideally the files need only to be created once and then
usable on all systems. Especially since these files are usually large
and converting from text format to .fvi/.fvd takes quite a while.

This, however, would require diving into the filevector and the DatABEL
code (filevector or libfilevector is the name of the 'backend' code in
which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use
that code when dealing with .fvi/.fvd files). I don't have very much
experience with either code base, but could probably have a look and
give you some pointers.

>
> Is it necessary to be able to use *.fvd *.fvi files created on a
> different endian system?

On the other hand, how often will people transfer these files to
machines of different architectures?

Jurica, can you tell us a bit more about why you are using a MIPS
machine for your work with ProbABEL? And do you think it would be a
common task to move these files between machines with different
architectures at your site?

Maybe a converter from big to little and vice versa would be the easiest
solution? I guess such a conversion can be done rather quickly. The
downside would be that it (at least temporarily) requires double the
disk space.
Such a converter could be part of the fvutils and/or of DatABEL, for
example.

>
> I am willing to work on adding big endian support and I will appreciate
> any help in determining the right course of action in resolving this
> problem.

Thank you for your time and willingness to help! It is very much
appreciated. We're a small group of developers, but we'll try to help as
much as we can.


Best,

Lennart.

>
> Regards,
> Jurica
>
>
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>


From Jurica.Stanojkovic at rt-rk.com  Tue Apr 29 17:12:11 2014
From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic)
Date: Tue, 29 Apr 2014 17:12:11 +0200
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <244CF001646FF74FB34F372310A332C57AFBF2@MBX2.rwth-ad.de>
Message-ID: <4061-535fc180-39-66293300@143043581>


Hi Alvaro,

>Hi all,
>
>would it not be better practice to handle this on load, i.e: using this: http://man7.org/linux/man-pages/man3/endian.3.html
>
>Just a remark.
>
>-Alvaro

I have tried that approach; it is OK for the fileHeader, but there are data in the *.fvi and *.fvd files that are float or double.
For those we need a byte-swap for float and double.
I had some results with this, but I did not find every place in the source where a byte-swap is needed.
I was not sure that it is enough to just byte-swap data on read, since blockWriteOrRead could also be used for writing.
During the read process, data are read with file.read as char* and then cast to other types.
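The <endian.h> functions Alvaro linked (htobe32(), le64toh() and friends) operate on unsigned integers, which is why they suffice for the header but not directly for float and double values. The following is a minimal sketch of the usual workaround: copy the raw bytes into an integer of the same width, reverse them there, and only then reinterpret the bits as a floating-point value. That way a byte-reversed pattern is never held in a float or double variable, since some FPUs may alter the bits of a signalling NaN, which a reversed pattern can resemble. The swap helpers are written out by hand for portability; all function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t reverse_bytes32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8) | ((x & 0xFF000000u) >> 24);
}

// Reverse the byte order of a 64-bit value: reverse each 32-bit half and
// exchange the halves.
inline std::uint64_t reverse_bytes64(std::uint64_t x) {
    const std::uint64_t lo = reverse_bytes32(static_cast<std::uint32_t>(x));
    const std::uint64_t hi = reverse_bytes32(static_cast<std::uint32_t>(x >> 32));
    return (lo << 32) | hi;
}

// Read a float whose 4 bytes are stored in the opposite byte order from the
// host. The reversal happens on the integer bits; only the corrected bit
// pattern is ever placed in a float.
inline float read_swapped_float(const unsigned char* p) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");
    std::uint32_t bits;
    std::memcpy(&bits, p, sizeof bits);
    bits = reverse_bytes32(bits);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Same for an 8-byte double.
inline double read_swapped_double(const unsigned char* p) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double assumed");
    std::uint64_t bits;
    std::memcpy(&bits, p, sizeof bits);
    bits = reverse_bytes64(bits);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

Since the data arrive as char* anyway, helpers like these could be applied right where the raw bytes are turned into typed values.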

Regards,
Jurica

-------- Original Message --------
Subject: Re: [GenABEL-dev] probabel big endian support
Date: Sunday, April 27, 2014 05:29 CEST
From: "Frank, Alvaro Jesus" <alvaro.frank at rwth-aachen.de>
To: "L.C. Karssen" <lennart at karssen.org>,"genabel-devel at lists.r-forge.r-project.org"<genabel-devel at lists.r-forge.r-project.org>
References: <896-53591700-f-3be4eec0 at 227853676>, <535C1462.9090502 at karssen.org>


Hi all,

would it not be better practice to handle this on load, i.e: using this: http://man7.org/linux/man-pages/man3/endian.3.html

Just a remark.

-Alvaro

From yurii.aulchenko at gmail.com  Wed Apr 30 15:13:55 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 30 Apr 2014 20:13:55 +0700
Subject: [GenABEL-dev] Proposal to move to Github (was: Re:
	[Genabel-commits] r1705 - pkg/ProbABEL/src)
In-Reply-To: <535EB58D.6010900@karssen.org>
References: <20140428094937.65E8B186FC6@r-forge.r-project.org>
 <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com>
 <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org>
 <535EA05E.40201@gmail.com> <535EB58D.6010900@karssen.org>
Message-ID: <D15BF5F4-757D-434E-94E7-2DB47FD9B43D@gmail.com>


> On 29 Apr 2014, at 03:09, "L.C. Karssen" <lennart at karssen.org> wrote:
> 
> Dear Maarten, dear all,
> 
> Moving to github... Hmm... That is quite a decision, so I've renamed the
> subject to better reflect the discussion. I've also dropped the older
> e-mails from the bottom of the thread.
> 
> First off, are there any people that have experience with git and/or
> github? I've got some git experience (still learning), but no real
> experience with github.

I have some experience and would be comfortable with either

> 
> I agree with Maarten that SVN is showing its age. As he indicates, things
> like branching are much easier in git. Moreover, since I travel
> regularly, being able to work without an internet connection is a plus.
> 
> On the other hand, moving to git (whether GitHub or elsewhere) means
> leaving R-forge, which is our well-known infrastructure. Furthermore,
> I expect such a move to take quite some time: all bugs, feature
> requests, etc. would have to be moved. If we decide to move we should
> plan well and not rush. And the current developers will need to learn
> git if they don't already know how to use it.

Move the code first and keep the tracker for a while? Can we 'close' the tracker later and put links to the new location on the old pages?

> 
> One thing I think we should definitely do is migrate slowly, package by
> package. Given that Maarten is positive about such a move and that I am
> undecided but not against it, it seems logical that ProbABEL is the
> first package to try such a migration.

Totally agree. If Maarten is positive about git(hub), I have nothing against it. But we do need to plan carefully and do everything possible to avoid affecting end users negatively.

Yurii

> 
> 
> Looking forward to your comments!
> 
> 
> Lennart.
> 
> 
>> On 28-04-14 20:39, Maarten Kooyman wrote:
>> Dear all,
>> 
>> I think GitHub is easier to use for code review.
>>
>> Please have a look to get an impression:
>> https://github.com/jquery/jquery/pull/1241/files
>>
>> I think we should reconsider our version control system: the current
>> system does not meet current usability standards. Bug tracking and
>> branching are quite hard to use. Please have a look at github.com to
>> get an impression of what is possible.
>> 
>> Kind regards,
>> 
>> Maarten
> 

From yurii.aulchenko at gmail.com  Wed Apr 30 15:25:26 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 30 Apr 2014 20:25:26 +0700
Subject: [GenABEL-dev] [genabel-Bugs][5658] Missing genetic data cannot
	be coded as NA or N as mentioned in the manual
In-Reply-To: <20140427224716.EB8E51851A7@r-forge.r-project.org>
References: <20140427224716.EB8E51851A7@r-forge.r-project.org>
Message-ID: <FD2130A9-AC1F-4180-96EE-147173FD0653@gmail.com>

Agree, fix the manual :)

----------------
Sent from mobile device, please excuse possible typos

> On 28 Apr 2014, at 05:47, <genabel-bugs at r-forge.r-project.org> wrote:
> 
> Bugs item #5658, was opened at 2014-04-28 00:46 by Lennart Karssen
> You can respond by visiting: 
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5658&group_id=505
> 
> Status: Open
> Priority: 3
> Submitted By: Lennart Karssen (lckarssen)
> Assigned to: Nobody (None)
> Summary: Missing genetic data cannot be coded as NA or N as mentioned in the manual 
> Resolution: Accepted As Bug
> Operating System: All
> Severity: normal
> Hardware: All
> Version: PA v0.4.3
> Component: ProbABEL
> URL: http://forum.genabel.org/viewtopic.php?f=10&amp;t=871
> 
> 
> Initial Comment:
> Thanks to user jal on the forum for reporting this bug. From his post:
> 
> There are missing genotypes in my dosage file, with missing values coded as "NA". My palogist run aborts with the error message:
>    Reading genotype data... No digits were found while reading genetic data (individual 5, position 1)
> where "individual 5, position 1" is the location where the first missing value "NA" appears. The ProbABEL manual says missing values can be coded as "NA", "NaN" or "N", but it seems "NA" and "N" do not work in my case.
> 
> 
> We use the standard C function strtod() to convert the genetic data from text to numbers. I did a quick check and strtod() only accepts "NaN", "NAN", "nan", no "NA" or "N". My guess is that the 'nan' variations are the only ones defined for floating point numbers (IEEE 754). 
> 
> We should decide whether we want to change the manual or change the code. Changing the manual has my preference, because changing the code would make reading data slower.
> 
> 
> ----------------------------------------------------------------------
> 
> You can respond by visiting: 
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5658&group_id=505

From lennart at karssen.org  Wed Apr 30 18:04:00 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 30 Apr 2014 18:04:00 +0200
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <4061-535fc180-39-66293300@143043581>
References: <4061-535fc180-39-66293300@143043581>
Message-ID: <53611EF0.70608@karssen.org>

Dear Alvaro, Jurica,

On 29-04-14 17:12, Jurica Stanojkovic wrote:
> Hi Alvaro,
> 
>>Hi all,
>>
>>would it not be better practice to handle this on load, i.e: using
> this: http://man7.org/linux/man-pages/man3/endian.3.html
>>
>>Just a remark.
>>
>>-Alvaro
> 
> I have tried that approach; it is OK for the fileHeader, but there are data
> in the *.fvi and *.fvd files that are float or double.

That is correct. As far as I know, filevector was developed to be data
type agnostic (at least for the standard data types like int, float and
double).

> For that we need a byte-swap for float and double.
> I had some results with this, but I did not find every one place in
> source where byte-swap is needed.
> I was not sure that is enough to just byte-swap data on read,
> blockWriteOrRead could be also used for writing.
> During the read process data is read with file.read like char* and then
> cast to other values.

I noticed that too, and actually, I don't really understand why, because
the type of the data is stored in the header of a filevector file as
well (see fvlib/const.h for the types and fvlib/frutil.h for the
definition of the header).
I wasn't part of the filevector development, so I don't know the exact
considerations at that time.

In ProbABEL (gendata.cpp) the filevector data are read using the
ReadVariableAs() function (fvlib/AbstractMatrix.h), which performs the
cast.
I haven't checked, but maybe there's a better function in fvlib for
reading the data into ProbABEL.
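Since the element type is recorded in the file header, the char*-to-value conversion, and any byte swap, could in principle live in a single function instead of being spread over the code. The following is a minimal sketch of that idea, converting everything to double. The type codes, names and signatures are hypothetical: they are not the actual fvlib/const.h values or the ReadVariableAs() API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// HYPOTHETICAL type codes for illustration; the real codes are defined in
// fvlib/const.h and are not reproduced here.
enum class StoredType { Int32, Float32, Float64 };

inline std::size_t stored_size(StoredType t) {
    switch (t) {
        case StoredType::Int32:   return 4;
        case StoredType::Float32: return 4;
        case StoredType::Float64: return 8;
    }
    throw std::invalid_argument("unknown stored type");
}

// Convert one stored element to double. If the file's byte order differs
// from the host's (swap == true), the bytes are reversed into a local copy
// first, so this is the only place that needs to know about endianness.
inline double element_as_double(const unsigned char* p, StoredType t, bool swap) {
    const std::size_t n = stored_size(t);
    unsigned char tmp[8];
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = swap ? p[n - 1 - i] : p[i];
    }
    switch (t) {
        case StoredType::Int32: {
            std::int32_t v;
            std::memcpy(&v, tmp, sizeof v);
            return static_cast<double>(v);
        }
        case StoredType::Float32: {
            float v;
            std::memcpy(&v, tmp, sizeof v);
            return static_cast<double>(v);
        }
        case StoredType::Float64: {
            double v;
            std::memcpy(&v, tmp, sizeof v);
            return v;
        }
    }
    throw std::invalid_argument("unknown stored type");
}

// Read n consecutive elements of type t from a raw buffer into doubles.
inline std::vector<double> values_as_double(const unsigned char* data, std::size_t n,
                                            StoredType t, bool swap) {
    const std::size_t size = stored_size(t);
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(element_as_double(data + i * size, t, swap));
    }
    return out;
}
```

The swap flag would be set once per file, for example by comparing a byte-order marker in the header with the host; that marker is itself an assumption, as the thread does not say whether filevector headers record byte order.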


Best,

Lennart.


> 
> Regards,
> Jurica
> 
> -------- Original Message --------
> Subject: Re: [GenABEL-dev] probabel big endian support
> Date: Sunday, April 27, 2014 05:29 CEST
> From: "Frank, Alvaro Jesus" <alvaro.frank at rwth-aachen.de>
> To: "L.C. Karssen"
> <lennart at karssen.org>,"genabel-devel at lists.r-forge.r-project.org"<genabel-devel at lists.r-forge.r-project.org>
> References: <896-53591700-f-3be4eec0 at 227853676>,
> <535C1462.9090502 at karssen.org>
> 
> 
>  
>> Hi all,
>>
>> would it not be better practice to handle this on load, i.e: using
>> this: http://man7.org/linux/man-pages/man3/endian.3.html
>>
>> Just a remark.
>>
>> -Alvaro
> 
>  

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
