From lennart at karssen.org Tue Apr 1 09:15:34 2014 From: lennart at karssen.org (L.C. Karssen) Date: Tue, 01 Apr 2014 09:15:34 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src In-Reply-To: <244CF001646FF74FB34F372310A332C57AD8B4@MBX2.rwth-ad.de> References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com>, <5335F579.9070608@karssen.org>, <244CF001646FF74FB34F372310A332C57AD899@MBX2.rwth-ad.de> <244CF001646FF74FB34F372310A332C57AD8B4@MBX2.rwth-ad.de> Message-ID: <533A6796.4030004@karssen.org> Hi Alvaro, Thanks for joining in. Much appreciated! On 31-03-14 23:07, Frank, Alvaro Jesus wrote: > Perhaps what I mentioned earlier cannot be done: > > http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html Ah, Goldberg comes up again. I read the paper several years ago, but it sounds like it's time to reread it. > > Since a double 0.677 can't be represented any other way than 0.67699998617172241, for example. Indeed, I think that's what we've hit here. It would be great if this could be converted 'cheaply' into something that more closely resembles 0.67700000000 (cheaper than snprintf()). Alternatively, we need to identify where in the calculations we have the biggest loss of precision. As I wrote in my answer to Yurii, our input is not likely to have more than 4 significant digits. I'm definitely willing to print only 4 significant digits in the output, but at the moment I see differences in the third digit (of the chi^2 values) when looking at the example data. Best, Lennart. > ________________________________________ > From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of Frank, Alvaro Jesus [alvaro.frank at rwth-aachen.de] > Sent: Monday, March 31, 2014 10:48 PM > To: L.C. 
Karssen; genabel-devel at lists.r-forge.r-project.org > Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src > > Dear all, > > How about, instead of going float>text>double, just using a binary mask after casting with errors? > 0.67699998617172241 > 0.67700000000000000 with a mask on every number? > This image would tell you which bits need to be set to zero: > http://cnx.org/content/m32770/latest/graphics1.png > > Masking is super fast if it's C/C++. > > This may not be portable, though. But the way floating point numbers are stored should be generic (IEEE) anyway. > > -Alvaro > > > ________________________________________ > From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C. Karssen [lennart at karssen.org] > Sent: Friday, March 28, 2014 11:19 PM > To: genabel-devel at lists.r-forge.r-project.org > Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src > > Dear all, > > (I guess the previous version of this mail went to the commit email > list, so here it is again for the devel list). > > > Indeed: an impressive speed-up! Well done Maarten. > > On 28-03-14 20:30, Maarten Kooyman wrote: >> I tested speed of ProbABEL on a dataset 33815 snp / 3485 people adjusted >> for sex and age (I did not run it in triplicate but it gives an idea) >> >> version 0.42 0.50_branch >> FV 58 52 >> mldose 48 12 >> all times are in seconds. >> >> As you can see the filevector format is the part that slows down the >> program. When profiling the reading from FV takes up 86% of all the time >> the program takes. >> > > > The current problem with reading from filevector is that the fv data is > stored in floats (this is logical as it means half the disk space usage > compared to storing doubles, moreover, the imputed data is never more > precise than a float anyway). > However, internally ProbABEL uses doubles for calculations. 
This means > conversion from float to double must occur at some point. > > Simply casting to double gives imprecision. For example casting a float > 0.677 to double gives: 0.67699998617172241 > Therefore, with version 0.4.0 I changed this and used a string as > intermediate form, followed by strtod(). First I used stringstreams, but > these turn out to be much too slow for our use case. Now snprintf() is > used. For the above example the double value is: 0.67700000000000005, > much closer to what we would like to see. Using this two-step conversion > means the output when using fv is equal to the output using txt data > (and equal to using R), within float precision. > > Using Maarten's 'strtod' will speed up this part as well, but the > snprintf() call is still expensive. > > Apart from this two-step conversion we may also be inefficient because > the dosage/probability values are converted one array element at a > time. Maybe we can gain something there, like Maarten did for the txt > format: simply sending a whole 'line'/array to the conversion may help. > > > > > Given that most people nowadays store their imputation results in chunks > of chromosomes anyway (i.e. small(er) files), and the fact that I think > implementing the ability to read gzipped files is not difficult, it may > be time to give mldose.gz files another chance for ProbABEL users. It > will save them the conversion from mldose.gz to DatABEL. > Of course we can still support DatABEL files, but (depending on how fast > reading from gzipped files is), our recommendation could change with the > upcoming ProbABEL v0.5.0. > > Any thoughts on this? > > > Best, > > Lennart. > > > > > >> On 28-03-14 20:15, Yury Aulchenko wrote: >>> 10 fold is good speed up. An order of magnitude :) >>> >>> Wonder how it compares now to the reading from plain text files? 
>>> >>> Y >>> >>> ---------------- >>> Sent from mobile device, please excuse possible typos >>> >>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote: >>>> >>>> Author: maartenk >>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014) >>>> New Revision: 1664 >>>> >>>> Modified: >>>> branches/ProbABEL-0.50/src/gendata.cpp >>>> branches/ProbABEL-0.50/src/gendata.h >>>> Log: >>>> new implementation of reading in numbers of mldose file: this version >>>> is about a 10(!) fold faster than in ProABEL 0.42 >>>> >>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp >>>> =================================================================== >>>> --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC >>>> (rev 1663) >>>> +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC >>>> (rev 1664) >>>> @@ -40,58 +40,69 @@ >>>> #endif >>>> #include "utilities.h" >>>> >>>> -double mldose_strtod(const char *str_pointer) { >>>> - // This function is inspired on some answers found at >>>> stackoverflow : >>>> - // eg question 5678932 >>>> - int sign = 0; >>>> - double result = 0; >>>> - //check if not a null pointer or NaN (right now checks only >>>> first character) >>>> -//TODO: make catching of NaN more rigid >>>> - if (!*str_pointer | *str_pointer == 'N'){ >>>> - return std::numeric_limits<double>::quiet_NaN(); >>>> + >>>> +void gendata::mldose_line_to_matrix(int k,const char >>>> *all_numbers,int amount_of_numbers){ >>>> + int j = 0; >>>> + //check if not a null pointer >>>> + if (!*all_numbers){ >>>> + perror("Error while reading genetic data (expected pointer >>>> to char but found a null pointer)"); >>>> + exit(EXIT_FAILURE); >>>> } >>>> - //skip whitespace >>>> - while (*str_pointer == ' ') >>>> + while (j < amount_of_numbers) >>>> { >>>> - str_pointer++; >>>> - } >>>> - //set sign to -1 if negative: multiply by sign just before return >>>> - if (*str_pointer == '-') >>>> - { >>>> - str_pointer++; >>>> - sign = -1; >>>> - } >>>> - //read digits 
before dot >>>> - while 
(*str_pointer <= '9' && *str_pointer >= '0'){ >>>> - result = result * 10 + (*str_pointer++ - '0'); >>>> - } >>>> - //read digit after dot >>>> - if (*str_pointer == '.') >>>> - { >>>> - double decimal_counter = 1.0; >>>> - str_pointer++; >>>> - while (*str_pointer <= '9' && *str_pointer >= '0') >>>> + double result = 0; >>>> + //skip whitespace >>>> + while (*all_numbers == ' ') >>>> { >>>> - decimal_counter *= 0.1; >>>> - result += (*str_pointer++ - '0') * decimal_counter; >>>> + all_numbers++; >>>> } >>>> + //check NaN (right now checks only first character) >>>> + //TODO: make catching of NaN more rigid >>>> + if (*all_numbers == 'N') >>>> + { >>>> + result = std::numeric_limits<double>::quiet_NaN(); >>>> + //skip other characters of NaN >>>> + while ((*all_numbers == 'a') | (*all_numbers == 'N')) >>>> + { >>>> + all_numbers++; >>>> + } >>>> + } >>>> + else >>>> + { >>>> + int sign = 0; >>>> + //set sign to -1 if negative: multiply by sign just >>>> before return >>>> + if (*all_numbers == '-') >>>> + { >>>> + all_numbers++; >>>> + sign = -1; >>>> + } >>>> + //read digits before dot >>>> + while (*all_numbers <= '9' && *all_numbers >= '0') >>>> + { >>>> + result = result * 10 + (*all_numbers++ - '0'); >>>> + } >>>> + //read digit after dot >>>> + if (*all_numbers == '.') >>>> + { >>>> + double decimal_counter = 1.0; >>>> + all_numbers++; >>>> + while (*all_numbers <= '9' && *all_numbers >= '0') >>>> + { >>>> + decimal_counter *= 0.1; >>>> + result += (*all_numbers++ - '0') * decimal_counter; >>>> + } >>>> + } >>>> + //correct for negative number >>>> + if (sign == -1) >>>> + { >>>> + result = sign * result; >>>> + } >>>> + } >>>> + G.put(result, k, j); >>>> + j++; >>>> } >>>> - //str_pointer should be null since all characters are read. 
>>>> - if (*str_pointer){ >>>> - perror("Error while reading genetic data (mldose_strtod)"); >>>> - exit(EXIT_FAILURE); >>>> - } >>>> - //correct for negative number >>>> - if (sign == -1){ >>>> - return sign * result; >>>> - }else{ >>>> - return result; >>>> - } >>>> - >>>> } >>>> >>>> - >>>> - >>>> void gendata::get_var(int var, double * data) >>>> { >>>> // Read the genetic data for SNP 'var' and store in the array >>>> 'data' >>>> @@ -246,7 +257,7 @@ >>>> size_t strpos = tmpstr.find("->"); >>>> if (strpos != string::npos) >>>> { >>>> - tmpid = tmpstr.substr(strpos+2, string::npos); >>>> + tmpid = tmpstr.substr(strpos + 2, string::npos); >>>> } >>>> else >>>> { >>>> @@ -255,8 +266,8 @@ >>>> if (tmpid != idnames[k]) >>>> { >>>> cerr << "phenotype file and dose or probability >>>> file " >>>> - << "did not match at line " << i + 2 << " >>>> (" << tmpid >>>> - << " != " << idnames[k] << ")" << endl; >>>> + << "did not match at line " << i + 2 << >>>> " (" >>>> + << tmpid << " != " << idnames[k] << ")" >>>> << endl; >>>> infile.close(); >>>> exit(1); >>>> } >>>> @@ -267,47 +278,58 @@ >>>> infile >> tmpstr; >>>> } >>>> >>>> - for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>>> + int oldstyle = 0; >>>> + if (oldstyle == 1) >>>> { >>>> - if (infile.good()) >>>> + for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>>> { >>>> - infile >> inStr; >>>> - // tmpstr contains the dosage/probability in >>>> - // string form. Convert it to double (if tmpstr is >>>> - // NaN it will be set to nan). >>>> - double dosage; >>>> - char *endptr; >>>> - errno = 0; // To distinguish success/failure >>>> - // after strtod() >>>> + if (infile.good()) >>>> + { >>>> + infile >> inStr; >>>> + // tmpstr contains the dosage/probability in >>>> + // string form. Convert it to double (if >>>> tmpstr is >>>> + // NaN it will be set to nan). 
>>>> + double dosage; >>>> + char *endptr; >>>> + errno = 0; // To distinguish >>>> success/failure >>>> + // after strtod() >>>> >>>> - dosage = mldose_strtod(inStr); >>>> - //dosage = strtod(tmpstr.c_str(), &endptr); >>>> -// if ((errno == ERANGE && >>>> -// (dosage == HUGE_VALF || dosage == >>>> HUGE_VALL)) >>>> -// || (errno != 0 && dosage == 0)) { >>>> -// perror("Error while reading genetic data >>>> (strtod)"); >>>> -// exit(EXIT_FAILURE); >>>> -// } >>>> + dosage = strtod(inStr, &endptr); >>>> + if ((errno == ERANGE >>>> + && (dosage == HUGE_VALF || dosage == >>>> HUGE_VALL)) >>>> + || (errno != 0 && dosage == 0)) >>>> + { >>>> + perror("Error while reading genetic data >>>> (strtod)"); >>>> + exit(EXIT_FAILURE); >>>> + } >>>> >>>> - if (endptr == tmpstr.c_str()) { >>>> - cerr << "No digits were found while reading >>>> genetic data" >>>> - << " (individual " << i + 1 >>>> - << ", position " << j + 1 << ")" >>>> - << endl; >>>> - exit(EXIT_FAILURE); >>>> + if (endptr == tmpstr.c_str()) >>>> + { >>>> + cerr >>>> + << "No digits were found while >>>> reading genetic data" >>>> + << " (individual " << i + 1 << >>>> ", position " >>>> + << j + 1 << ")" << endl; >>>> + exit(EXIT_FAILURE); >>>> + } >>>> + /* If we got here, strtod() successfully >>>> parsed a number */ >>>> + G.put(dosage, k, j); >>>> } >>>> - >>>> - /* If we got here, strtod() successfully parsed >>>> a number */ >>>> - G.put(dosage, k, j); >>>> + else >>>> + { >>>> + std::cerr << "cannot read dose-file: " << fname >>>> + << "check skipd and ngpreds >>>> parameters\n"; >>>> + infile.close(); >>>> + exit(1); >>>> + } >>>> } >>>> - else >>>> - { >>>> - std::cerr << "cannot read dose-file: " << fname >>>> - << "check skipd and ngpreds >>>> parameters\n"; >>>> - infile.close(); >>>> - exit(1); >>>> - } >>>> } >>>> + else >>>> + { >>>> + std::string all_numbers; >>>> + all_numbers.reserve(nsnps * ngpreds * 7); >>>> + std::getline(infile, all_numbers); >>>> + mldose_line_to_matrix(k, 
all_numbers.c_str(), nsnps >>>> * ngpreds); >>>> + } >>>> k++; >>>> } >>>> else >>>> >>>> Modified: branches/ProbABEL-0.50/src/gendata.h >>>> =================================================================== >>>> --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC >>>> (rev 1663) >>>> +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC >>>> (rev 1664) >>>> @@ -44,7 +44,7 @@ >>>> unsigned int nids; >>>> unsigned int ngpreds; >>>> gendata(); >>>> - double convert( char* source, char** endPtr ); >>>> + void mldose_line_to_matrix(int k,const char *all_numbers,int >>>> amount_of_numbers); >>>> >>>> void re_gendata(char * fname, unsigned int insnps, unsigned int >>>> ingpreds, >>>> unsigned int npeople, unsigned int nmeasured, >>>> >>>> _______________________________________________ >>>> Genabel-commits mailing list >>>> Genabel-commits at lists.r-forge.r-project.org >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>>> >>> _______________________________________________ >>> Genabel-commits mailing list >>> Genabel-commits at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>> >> >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Wed Apr 2 17:44:44 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 02 Apr 2014 17:44:44 +0200 Subject: [GenABEL-dev] Preparing for ProbABEL 0.4.3 release In-Reply-To: <52F757FB.8070707@karssen.org> References: <52F757FB.8070707@karssen.org> Message-ID: <533C306C.30302@karssen.org> Dear list, For those of you who haven't noticed through other channels: I have tagged and released ProbABEL v0.4.3 yesterday. The release announcement is available at http://www.genabel.org. The source code is available from the ProbABEL page [2] and packages for Debian and Ubuntu have been sent to the respective build services. Now we can spend all our time on getting the next great release of ProbABEL out. Maarten has done a lot of work on it, a few things still need to be done, but we are definitely headed for a great release. Thanks to all for your work on ProbABEL. Best, Lennart. On 09-02-14 11:27, L.C. Karssen wrote: > Dear list, > > I am currently preparing the 0.4.3 release of ProbABEL. If you have any > updates/fixes/etc. that you would like to have in this release, please > let me know. I aim to do the release somewhere in the coming week. > > > Best regards, > > Lennart Karssen. > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Wed Apr 2 17:45:59 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 02 Apr 2014 17:45:59 +0200 Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference In-Reply-To: <52EAA7D1.5080106@karssen.org> References: <52EAA7D1.5080106@karssen.org> Message-ID: <533C30B7.3060102@karssen.org> Dear list, Please find attached the poster I presented at the European Mathematical Genetics Society in Cologne over the past two days. Best, Lennart. On 30-01-14 20:28, L.C. Karssen wrote: > Dear list, > > I'm planning to go to the EMGM (European Mathematical Genetics Meeting) > in Cologne in April. I'd like to present a poster there and wrote the > abstract below. > Please let me know any comments or suggestions as soon as possible as the > deadline for abstract submission is tomorrow (Fri 31 Jan). > > Thank you very much, > > Lennart. > > > --------------8<----------------8<------------------8<------------- > Over the last year the GenABEL project has seen a considerable > number of improvements. These improvements do not only consist of > updates to the existing packages of the GenABEL suite, but are also > manifest in the way the development process is being handled and > the way the packages are made available to the users. > > On our poster we will demonstrate the newly implemented features in > the various packages of the GenABEL suite. We also welcome a new > member to the GenABEL family: OmicABEL, a package for rapid > mixed-model based genome-wide association analysis of multiple > traits (think metabolomics, glycomics, etc.). 
> > Recently we started using the open source Jenkins Continuous > Integration server to help us release software of higher > quality. Jenkins is a framework that automatically runs several > tests (e.g. static code analysis, checks for memory leaks) for each > of our packages. It builds and tests each project after a new commit > in our version control system. This allows us to detect problems in > the code at an early stage, before they bug the user. > > After the GenABEL package, ProbABEL is the second package that is > available as a Debian package. This means that users of upcoming > Debian releases will be able to install ProbABEL with a simple click > of a button or a single command. Moreover, since many other Linux > distributions like Ubuntu and Linux Mint are derived from Debian, > users of these distributions automatically benefit as well. > > In the coming year more packages are expected to be added to the > GenABEL suite as well as continued efforts to improve the existing > ones. Moreover, we plan to increase both the ease of installation as > well as the visibility of the GenABEL suite by adding more packages > into both the Debian and Red Hat Enterprise Linux repositories (as > well as derivatives like CentOS and Scientific Linux). > --------------8<----------------8<------------------8<------------- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: posterEMGM2014_GenABEL.pdf Type: application/pdf Size: 236741 bytes Desc: not available URL:
From yurii.aulchenko at gmail.com Wed Apr 2 20:22:21 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Wed, 2 Apr 2014 20:22:21 +0200 Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference In-Reply-To: <533C30B7.3060102@karssen.org> References: <52EAA7D1.5080106@karssen.org> <533C30B7.3060102@karssen.org> Message-ID: <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com> European Math Genet MEETING :) ---------------- Sent from mobile device, please excuse possible typos > On 02 Apr 2014, at 17:45, "L.C. Karssen" wrote: > > Dear list, > > Please find attached the poster I presented at the European Mathematical > Genetics Society in Cologne over the past two days. > > > Best, > > Lennart.
From lennart at karssen.org Fri Apr 4 15:46:08 2014 From: lennart at karssen.org (L.C. Karssen) Date: Fri, 04 Apr 2014 15:46:08 +0200 Subject: [GenABEL-dev] Abstract for the EMGM 2014 conference In-Reply-To: <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com> References: <52EAA7D1.5080106@karssen.org> <533C30B7.3060102@karssen.org> <5A15B6FC-DA1F-45EB-ABCE-C6937C0E2B59@gmail.com> Message-ID: <533EB7A0.3050006@karssen.org> Of course :-). Lennart. On 02-04-14 20:22, Yury Aulchenko wrote: > European Math Genet MEETING :)
From lennart at karssen.org Mon Apr 7 08:59:30 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 07 Apr 2014 08:59:30 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1672 - branches/ProbABEL-0.50/checks/R-tests In-Reply-To: <20140402205412.1A1CF186EE1@r-forge.r-project.org> References: <20140402205412.1A1CF186EE1@r-forge.r-project.org> Message-ID: <53424CD2.6060901@karssen.org> Thanks Maarten! Nice fix. One step closer to a complete set of tests of the ProbABEL functionality. Lennart. On 02-04-14 22:54, noreply at r-forge.r-project.org wrote: > Author: maartenk > Date: 2014-04-02 22:54:11 +0200 (Wed, 02 Apr 2014) > New Revision: 1672 > > Modified: > branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R > Log: > One check was malfunctioning. 
This was caused by combination of being hard set to a value and change to EIGEN for cholesky decomposition. I made this test succeed since mathematically it seems to work alright > > Modified: branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R > =================================================================== > --- branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R 2014-04-02 20:25:43 UTC (rev 1671) > +++ branches/ProbABEL-0.50/checks/R-tests/run_models_in_R_palinear.R 2014-04-02 20:54:11 UTC (rev 1672) > @@ -36,7 +36,11 @@ > ## (SNP 6 in the info file). ProbABEL lists them all as 0.0, R lists > ## them as: > prob.dom.PA[6, 2:4] <- c(NaN, NaN, 0.0) > +#for 2df model the last SNP is interchangeable: EIGEN calculates the beta for the other SNP than R. This causes the beta to have the wrong sign. This part of change the position of the snp beta(and swaps sign) and SE if beta and other SE are 0 > +if (sum(abs(prob.2df.PA[6, 2:3]))==0){ > +prob.2df.PA[6, 2:3] <-c(prob.2df.PA[6, 4]*-1,prob.2df.PA[6, 5]) > prob.2df.PA[6, 4:5] <- c(NA, NA) > +} > > #### > ## run analysis in R > @@ -123,7 +127,8 @@ > } > colnames(prob.2df.R) <- cols2df > rownames(prob.2df.R) <- NULL > -stopifnot( all.equal(prob.2df.PA[1:5,], prob.2df.R[1:5,], tol=tol) ) > + > +stopifnot( all.equal(prob.2df.PA, prob.2df.R, tol=tol) ) > cat("2df\n") > > cat("\t\t\t\t\t\tOK\n") > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Mon Apr 7 09:06:31 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 07 Apr 2014 09:06:31 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1671 - in branches/ProbABEL-0.50: examples src In-Reply-To: <20140402202543.B9693186EE2@r-forge.r-project.org> References: <20140402202543.B9693186EE2@r-forge.r-project.org> Message-ID: <53424E77.5020400@karssen.org> Hi Maarten, On 02-04-14 22:25, noreply at r-forge.r-project.org wrote: > Author: maartenk > Date: 2014-04-02 22:25:43 +0200 (Wed, 02 Apr 2014) > New Revision: 1671 > > Modified: > branches/ProbABEL-0.50/examples/mmscore.R > branches/ProbABEL-0.50/src/probabel > branches/ProbABEL-0.50/src/reg1.cpp > Log: > reg1.cpp : simplified code of mmscore(palinear) and fixed failing test_mms.sh check > probabel: introduce check that verifies phenotype file does exists before running pa* (prevents errors while running the script) > mmscore.R: fixed small typo > > Modified: branches/ProbABEL-0.50/examples/mmscore.R > =================================================================== > --- branches/ProbABEL-0.50/examples/mmscore.R 2014-04-02 15:57:31 UTC (rev 1670) > +++ branches/ProbABEL-0.50/examples/mmscore.R 2014-04-02 20:25:43 UTC (rev 1671) > @@ -88,5 +88,5 @@ > ## 2) residuals of the phenotype, which will be the new phenotype that > ## ProbABEL will analyse. > > -## Mow, go to ProbABEL and start analysis > +## Now, go to ProbABEL and start analysis Thanks :-). > > > Modified: branches/ProbABEL-0.50/src/probabel > =================================================================== > --- branches/ProbABEL-0.50/src/probabel 2014-04-02 15:57:31 UTC (rev 1670) > +++ branches/ProbABEL-0.50/src/probabel 2014-04-02 20:25:43 UTC (rev 1671) > @@ -169,6 +169,11 @@ > > > my $phename = $ARGV[5]; > +if (! -e $phename.".PHE"){ > +die "Phenotype file $phename.PHE does not exists. 
The phenotype file should be specified without the .PHE extension.\n"; > +} > + > + > # By default the output file prefix is the same as the name of the > # phenotype file (minus the .PHE extension and any paths) > use File::Basename; > > Modified: branches/ProbABEL-0.50/src/reg1.cpp > =================================================================== > --- branches/ProbABEL-0.50/src/reg1.cpp 2014-04-02 15:57:31 UTC (rev 1670) > +++ branches/ProbABEL-0.50/src/reg1.cpp 2014-04-02 20:25:43 UTC (rev 1671) > @@ -313,35 +313,21 @@ > void linear_reg::mmscore_regression(const mematrix& X, > const masked_matrix& W_masked, LDLT& Ch) { > > - > VectorXd Y = reg_data.Y.data.col(0); > - if (X.data.cols() == 3) > - { > - Matrix tXW = W_masked.masked_data->data * X.data; > - Matrix2d xWx = tXW.transpose() * X.data; > - Ch = LDLT(xWx); > - Vector3d beta_3f = Ch.solve(tXW.transpose() * Y); > - sigma2 = (Y - tXW * beta_3f).squaredNorm(); > - beta.data = beta_3f; > - } > - else if (X.data.cols() == 2) > - { > - Matrix tXW = W_masked.masked_data->data*X.data; > - Matrix2d xWx = tXW.transpose() * X.data; > - Ch = LDLT (xWx); > - Vector2d beta_2f = Ch.solve(tXW.transpose() * Y); > - sigma2 = (Y - tXW * beta_2f).squaredNorm(); > - beta.data = beta_2f; > - } > - else > - { > - // next line is 5997000 flops > - MatrixXd tXW = X.data.transpose() * W_masked.masked_data->data; > - Ch = LDLT(tXW * X.data); // 17991 flops > - beta.data = Ch.solve(tXW * Y); //5997 flops > - //next line is: 1000+5000+3000= 9000 flops > - sigma2 = (Y - tXW.transpose() * beta.data).squaredNorm(); > - } Glad to see the if/else go! This is much cleaner (and apparently not slower). 
> + /* > + in ProbABEL <0.50 this calculation was performed like t(X)*W > + This changed to W*X since this is better vectorized since the left hand > + side has more rows: this introduces an additional transpose, but can be > + neglected compared to the speedup this brings(about a factor 2 for the > + palinear with 1 predictor) > + */ > + MatrixXd tXW = W_masked.masked_data->data * X.data; I think the variable naming should be more appropriate here: tXW sounds like X^t * W, but you store W * X in that variable. > + MatrixXd xWx = tXW.transpose() * X.data; Similarly here, I'm not sure how to interpret xWx. Since you calculate (W*X)^t * X, a name like WXtX seems more reasonable. > + Ch = LDLT(xWx); > + VectorXd beta_vec = Ch.solve(tXW.transpose() * Y); > + sigma2 = (Y - tXW * beta_vec).squaredNorm(); > + beta.data = beta_vec; > + > } Thanks for the good work! Lennart. > > void linear_reg::logLikelihood(const mematrix& X) { > From lennart at karssen.org Mon Apr 7 16:37:38 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 07 Apr 2014 16:37:38 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector In-Reply-To: <20140407143639.30257186D32@r-forge.r-project.org> References: <20140407143639.30257186D32@r-forge.r-project.org> Message-ID: <5342B832.20207@karssen.org> Pfff..... Finally got the filevector tag the way I wanted... Lennart.
On 07-04-14 16:36, noreply at r-forge.r-project.org wrote: > Author: lckarssen > Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014) > New Revision: 1679 > > Added: > tags/filevector/v.1.0.0/ > Log: > Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674. From yurii.aulchenko at gmail.com Mon Apr 7 19:38:50 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 7 Apr 2014 19:38:50 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector In-Reply-To: <5342B832.20207@karssen.org> References: <20140407143639.30257186D32@r-forge.r-project.org> <5342B832.20207@karssen.org> Message-ID: Yep, I was following the story with curiosity :) ---------------- Sent from mobile device, please excuse possible typos > On 07 Apr 2014, at 16:37, "L.C. Karssen" wrote: > > Pfff..... Finally got the filevector tag the way I wanted... > > > > Lennart. > >> On 07-04-14 16:36, noreply at r-forge.r-project.org wrote: >> Author: lckarssen >> Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014) >> New Revision: 1679 >> >> Added: >> tags/filevector/v.1.0.0/ >> Log: >> Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674.
>> >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From kooyman at gmail.com Mon Apr 7 19:51:54 2014 From: kooyman at gmail.com (Maarten Kooyman) Date: Mon, 07 Apr 2014 19:51:54 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1671 - in branches/ProbABEL-0.50: examples src In-Reply-To: <53424E77.5020400@karssen.org> References: <20140402202543.B9693186EE2@r-forge.r-project.org> <53424E77.5020400@karssen.org> Message-ID: <5342E5BA.3000509@gmail.com> Hi Lennart, On 07-04-14 09:06, L.C. Karssen wrote: > Hi Maarten, > > On 02-04-14 22:25, noreply at r-forge.r-project.org wrote: >> Author: maartenk >> Date: 2014-04-02 22:25:43 +0200 (Wed, 02 Apr 2014) >> > Thanks :-). >> >> >> Modified: branches/ProbABEL-0.50/src/probabel >> =================================================================== >> --- branches/ProbABEL-0.50/src/probabel 2014-04-02 15:57:31 UTC (rev 1670) >> +++ branches/ProbABEL-0.50/src/probabel 2014-04-02 20:25:43 UTC (rev 1671) >> @@ -169,6 +169,11 @@ >> >> >> my $phename = $ARGV[5]; >> +if (! -e $phename.".PHE"){ >> +die "Phenotype file $phename.PHE does not exists. 
The phenotype file should be specified without the .PHE extension.\n"; >> +} >> + >> + >> # By default the output file prefix is the same as the name of the >> # phenotype file (minus the .PHE extension and any paths) >> use File::Basename; >> >> Modified: branches/ProbABEL-0.50/src/reg1.cpp >> =================================================================== >> --- branches/ProbABEL-0.50/src/reg1.cpp 2014-04-02 15:57:31 UTC (rev 1670) >> +++ branches/ProbABEL-0.50/src/reg1.cpp 2014-04-02 20:25:43 UTC (rev 1671) >> @@ -313,35 +313,21 @@ >> void linear_reg::mmscore_regression(const mematrix& X, >> const masked_matrix& W_masked, LDLT& Ch) { >> >> - >> VectorXd Y = reg_data.Y.data.col(0); >> - if (X.data.cols() == 3) >> - { >> - Matrix tXW = W_masked.masked_data->data * X.data; >> - Matrix2d xWx = tXW.transpose() * X.data; >> - Ch = LDLT(xWx); >> - Vector3d beta_3f = Ch.solve(tXW.transpose() * Y); >> - sigma2 = (Y - tXW * beta_3f).squaredNorm(); >> - beta.data = beta_3f; >> - } >> - else if (X.data.cols() == 2) >> - { >> - Matrix tXW = W_masked.masked_data->data*X.data; >> - Matrix2d xWx = tXW.transpose() * X.data; >> - Ch = LDLT (xWx); >> - Vector2d beta_2f = Ch.solve(tXW.transpose() * Y); >> - sigma2 = (Y - tXW * beta_2f).squaredNorm(); >> - beta.data = beta_2f; >> - } >> - else >> - { >> - // next line is 5997000 flops >> - MatrixXd tXW = X.data.transpose() * W_masked.masked_data->data; >> - Ch = LDLT(tXW * X.data); // 17991 flops >> - beta.data = Ch.solve(tXW * Y); //5997 flops >> - //next line is: 1000+5000+3000= 9000 flops >> - sigma2 = (Y - tXW.transpose() * beta.data).squaredNorm(); >> - } > Glad to see the if/else go! This is much cleaner (and apparently not > slower). 
> > >> + /* >> + in ProbABEL <0.50 this calculation was performed like t(X)*W >> + This changed to W*X since this is better vectorized since the left hand >> + side has more rows: this introduces an additional transpose, but can be >> + neglected compared to the speedup this brings(about a factor 2 for the >> + palinear with 1 predictor) >> + */ >> + MatrixXd tXW = W_masked.masked_data->data * X.data; > I think the variable naming should be more apropriate here: tXW sounds > like X^t * W, but you store W * X in that variable. Yep, you're right, it should be called tWX. We skip the transpose of W since it is a symmetric matrix; in terms of the mathematics, however, it makes sense to name the variable after what we achieve. You can read in the code what we actually do. This might need some explanation in the form of comments. > >> + MatrixXd xWx = tXW.transpose() * X.data; > Similarly here, I'm not sure how to interpret xWx. Since you calculate > (W*X)^t * X a name like WXtX seems more reasonable. So this will be something like ttWXX ??? Any other good solution? >> + Ch = LDLT(xWx); >> + VectorXd beta_vec = Ch.solve(tXW.transpose() * Y); >> + sigma2 = (Y - tXW * beta_vec).squaredNorm(); >> + beta.data = beta_vec; >> + >> } > Thanks for the good work! > > Lennart. > >> >> void linear_reg::logLikelihood(const mematrix& X) { >> From fabregat at aices.rwth-aachen.de Mon Apr 7 20:08:52 2014 From: fabregat at aices.rwth-aachen.de (Diego Fabregat) Date: Mon, 7 Apr 2014 20:08:52 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1671 - in branches/ProbABEL-0.50: examples src In-Reply-To: <5342E5BA.3000509@gmail.com> References: <20140402202543.B9693186EE2@r-forge.r-project.org> <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com> Message-ID: <5342E9B4.3020007@aices.rwth-aachen.de> Hi guys, If I may...
>>> + /* >>> + in ProbABEL <0.50 this calculation was performed like t(X)*W >>> + This changed to W*X since this is better vectorized since the >>> left hand >>> + side has more rows: this introduces an additional transpose, >>> but can be >>> + neglected compared to the speedup this brings(about a factor 2 >>> for the >>> + palinear with 1 predictor) >>> + */ >>> + MatrixXd tXW = W_masked.masked_data->data * X.data; >> I think the variable naming should be more apropriate here: tXW sounds >> like X^t * W, but you store W * X in that variable. > Yepp, your right it should be called tWX. We skip the transpose of W > since it is a symmetric matrix: however in terms of mathematics it > makes sense to call what we achieve. You can read in the code what we > do. This might need some explanation in form of comments. >> >>> + MatrixXd xWx = tXW.transpose() * X.data; >> Similarly here, I'm not sure how to interpret xWx. Since you calculate >> (W*X)^t * X a name like WXtX seems more reasonable. > > So this will be something like ttWXX ??? Any other good solution? I don't know the context of the discussion, but what do you think about documenting the algorithm somewhere in the code (like at the top of the source file), giving simple names to the variables, and then just using those names instead of getting to a point where you have to juggle with cryptic variable names? For instance, in case you want to solve a least-squares problem inv(X^T X) X^T y: /* * Algorithm for LSQ [ b := inv(X^T X) X^T y ] * * S := X^T X * v := X^T y * b := inv(S) v (notice that this should be solved as a linear system, not explicitly inverting S) */ And then you can simply use S, v, and b in the code. Best, Diego From lennart at karssen.org Mon Apr 7 21:48:37 2014 From: lennart at karssen.org (L.C.
Karssen) Date: Mon, 07 Apr 2014 21:48:37 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1679 - tags/filevector In-Reply-To: References: <20140407143639.30257186D32@r-forge.r-project.org> <5342B832.20207@karssen.org> Message-ID: <53430115.6020409@karssen.org> :-) One more reason to consider switching to git. There I could have gotten it right before pushing it to the public repo and bothering all of you. Lennart. On 07-04-14 19:38, Yury Aulchenko wrote: > Yep, I was following the story with curiosity :) > > ---------------- > Sent from mobile device, please excuse possible typos > >> On 07 Apr 2014, at 16:37, "L.C. Karssen" wrote: >> >> Pfff..... Finally got the filevector tag the way I wanted... >> >> >> >> Lennart. >> >>> On 07-04-14 16:36, noreply at r-forge.r-project.org wrote: >>> Author: lckarssen >>> Date: 2014-04-07 16:36:38 +0200 (Mon, 07 Apr 2014) >>> New Revision: 1679 >>> >>> Added: >>> tags/filevector/v.1.0.0/ >>> Log: >>> Tagging release v1.0.0 of filevector (libs and utils), based on SVN r1674. >>> >>> _______________________________________________ >>> Genabel-commits mailing list >>> Genabel-commits at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> >> -- >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >> L.C. Karssen >> Utrecht >> The Netherlands >> >> lennart at karssen.org >> http://blog.karssen.org >> GPG key ID: A88F554A >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Mon Apr 7 21:52:43 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 07 Apr 2014 21:52:43 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1671 - in branches/ProbABEL-0.50: examples src In-Reply-To: <5342E9B4.3020007@aices.rwth-aachen.de> References: <20140402202543.B9693186EE2@r-forge.r-project.org> <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com> <5342E9B4.3020007@aices.rwth-aachen.de> Message-ID: <5343020B.8060401@karssen.org> Hi Diego, On 07-04-14 20:08, Diego Fabregat wrote: > Hi guys, > > If I may... Of course! More than welcome :-). I like your suggestion a lot. It seems the cleanest one, with the added advantage that it documents the algorithm explicitly in the code. Thanks, Lennart. >>>> + /* >>>> + in ProbABEL <0.50 this calculation was performed like t(X)*W >>>> + This changed to W*X since this is better vectorized since the >>>> left hand >>>> + side has more rows: this introduces an additional transpose, >>>> but can be >>>> + neglected compared to the speedup this brings(about a factor 2 >>>> for the >>>> + palinear with 1 predictor) >>>> + */ >>>> + MatrixXd tXW = W_masked.masked_data->data * X.data; >>> I think the variable naming should be more apropriate here: tXW sounds >>> like X^t * W, but you store W * X in that variable. >> Yepp, your right it should be called tWX. We skip the transpose of W >> since it is a symmetric matrix: however in terms of mathematics it >> makes sense to call what we achieve. You can read in the code what we >> do. This might need some explanation in form of comments. >>> >>>> + MatrixXd xWx = tXW.transpose() * X.data; >>> Similarly here, I'm not sure how to interpret xWx. Since you calculate >>> (W*X)^t * X a name like WXtX seems more reasonable. >> >> So this will be something like ttWXX ??? Any other good solution?
> I don't know the context of the discussion, but what do you think about > documenting the algorithm somewhere in the code (like at the top of the > source file), giving simple names to the variables, and then just using > those names instead of getting to a point where you have to juggle with > cryptic variable names. For instance, in case you want to solve a > least-squares problem inv(X^T X) X^T y: > > /* > * Algorithm for LSQ [ b := inv(X^T X) X^T y ] > * > * S := X^T X > * v := X^T y > * b := inv(S) y (notice that this should be solved as a linear > system, not explicitly inverting S) > */ > > And then you can simply use S, v, and b in the code. > > Best, > Diego > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Tue Apr 8 01:14:43 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Tue, 8 Apr 2014 01:14:43 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1671 - in branches/ProbABEL-0.50: examples src In-Reply-To: <5343020B.8060401@karssen.org> References: <20140402202543.B9693186EE2@r-forge.r-project.org> <53424E77.5020400@karssen.org> <5342E5BA.3000509@gmail.com> <5342E9B4.3020007@aices.rwth-aachen.de> <5343020B.8060401@karssen.org> Message-ID: agree! On Mon, Apr 7, 2014 at 9:52 PM, L.C. Karssen wrote: > Hi Diego, > > On 07-04-14 20:08, Diego Fabregat wrote: > > Hi guys, > > > > If I may... > > Of course! More than welcome :-). > > I like your suggestion a lot. 
It's seems the cleanest one, with the > added advantage that it documents the algorithm explicitly in the code. > > > Thanks, > > Lennart. > > >>>> + /* > >>>> + in ProbABEL <0.50 this calculation was performed like t(X)*W > >>>> + This changed to W*X since this is better vectorized since the > >>>> left hand > >>>> + side has more rows: this introduces an additional transpose, > >>>> but can be > >>>> + neglected compared to the speedup this brings(about a factor 2 > >>>> for the > >>>> + palinear with 1 predictor) > >>>> + */ > >>>> + MatrixXd tXW = W_masked.masked_data->data * X.data; > >>> I think the variable naming should be more apropriate here: tXW sounds > >>> like X^t * W, but you store W * X in that variable. > >> Yepp, your right it should be called tWX. We skip the transpose of W > >> since it is a symmetric matrix: however in terms of mathematics it > >> makes sense to call what we achieve. You can read in the code what we > >> do. This might need some explanation in form of comments. > >>> > >>>> + MatrixXd xWx = tXW.transpose() * X.data; > >>> Similarly here, I'm not sure how to interpret xWx. Since you calculate > >>> (W*X)^t * X a name like WXtX seems more reasonable. > >> > >> So this will be something like ttWXX ??? Any other good solution? > > I don't know the context of the discussion, but what do you think about > > documenting the algorithm somewhere in the code (like at the top of the > > source file), giving simple names to the variables, and then just using > > those names instead of getting to a point where you have to juggle with > > cryptic variable names. For instance, in case you want to solve a > > least-squares problem inv(X^T X) X^T y: > > > > /* > > * Algorithm for LSQ [ b := inv(X^T X) X^T y ] > > * > > * S := X^T X > > * v := X^T y > > * b := inv(S) y (notice that this should be solved as a linear > > system, not explicitly inverting S) > > */ > > > > And then you can simply use S, v, and b in the code. 
> > > > Best, > > Diego -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From yurii.aulchenko at gmail.com Wed Apr 9 22:12:03 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Wed, 9 Apr 2014 22:12:03 +0200 Subject: [GenABEL-dev] Summer school in statistical omics Message-ID: <1CB1368F-D306-4AE6-AC54-BAC9E7BD7551@gmail.com> Dear All, Sorry for the off-topic post, but I was wondering whether some of you (especially those involved in teaching and training) may be interested in information about a summer school we are organizing. It may be good for your students! The deadline is close, however. See below and also the link http://school.statisticalomics.org Yurii Summer School in Statistical Omics 2014, to be held in Split, Croatia, from Aug 1 to 15, 2014. The deadline for applications is April 15, 2014. The School of Statistical Omics aims to train a new generation of omics scientists. The School consists of project-based training-through-research and a series of lectures designed to introduce students with a biological/biomedical/biochemical background to statistical analyses of multiple omics datasets.
With the development of high-throughput technologies in recent years, biology has become a data-rich field and, consequently, there is an increasing need for biologists trained in data analysis. The aim of this highly intensive School is to train a new generation of Statistical Omics scientists. In two weeks of work on cutting-edge scientific projects, participants will be introduced to highly relevant real-world problems related to the fields of glycomics and genomics and gain experience of programming in the R programming language. Selected participants will spend two weeks working on a project in small groups of up to 5 students. Morning sessions are planned for lectures that are intended to give an overview of the current state-of-the-art methods used in the field and the corresponding theoretical background for the practical sessions that will be held in the afternoons. The first two days are dedicated to an introduction to the field of Statistical Omics and the selection of projects. All projects will be presented by project leaders and students will choose the project they wish to work on. The groups will be formed by balancing personal choices against an equal distribution of participants over all projects. Each group will consist of up to 5 students, a project leader and a tutor. During the School, several lectures will be given by leading scientists in the field, discussing the latest advances and discoveries. The School will end with a small conference where students will present their work through posters and presentations. Given the nature and magnitude of the projects, students are invited to stay in contact with project leaders and to continue to work on projects after the School, potentially leading to scientific publications or qualification works (e.g. MSci). ---------------- Sent from mobile device, please excuse possible typos -------------- next part -------------- An HTML attachment was scrubbed...
URL: From yurii.aulchenko at gmail.com Wed Apr 9 22:20:40 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 9 Apr 2014 22:20:40 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src In-Reply-To: <5339E158.2040106@gmail.com> References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com> <5335F579.9070608@karssen.org> <5339E158.2040106@gmail.com> Message-ID: Absolutely agree. More than supportive! It would be absolutely cool to have all these different packages and functions working with different types of data via a centralized API. That would be a tremendous help in the development of new methods, something which would really make the GenA project attractive for other developers. Yurii On Monday, March 31, 2014, Maarten Kooyman wrote: > Dear All, > > It might be useful to make a next-generation DatABEL with an interface for > IMPUTE2/SHAPEIT and mach/minimac. Having one library/package to read the > data would help all projects in usability. I am not keen on > converting my 1kg imputations into another format. Nobody (from a user's perspective) > feels like saving the same hundreds of GB of data in multiple formats. (And > that is a practical reason for choosing a program to work with, and might > not be the same as the best program) > > Centralizing these functions would also benefit method developers. They would > not have to bother with writing another parser. Creating a reliable, fast > and multi-format parser is boilerplate code and this kind of code you do > not want to bother with if you have a new powerful methodology in mind. > That is why lots of scientific software is picky about input formats. There are > of course some problems caused by the nature of the data formats, e.g. [1]. > > > Kind regards, > > Maarten > > > > > [1] One problem is that the number of predictors differs between > those formats. It varies between 1 and 3, where in the case of IMPUTE2/SHAPEIT > the probabilities do not sum to one.
mach/minimac might be converted to 3 > predictors since it should[1] add to one. > > On 31-03-14 20:46, Yury Aulchenko wrote: > > I personally find the fact that text outperforms binary disappointing > (and, if you forget about technical details - well, strange). On the other > hand this is probably good for user as it eradicates the need to do > conversion. Especially if we could work with compressed files. Especially > if we build interface to work with other type of text outputs (e.g. IMPUTE2 > would be a candidate)... > > Yurii > > ---------------- > Sent from mobile device, please excuse possible typos > > On 28 Mar 2014, at 23:19, "L.C. Karssen" wrote: > > Dear all, > > (I guess the previous version of this mail went to the commit email > list, so here it is again for the devel list). > > > Indeed: an impressive speed-up! Well done Maarten. > > On 28-03-14 20:30, Maarten Kooyman wrote: > I tested speed of ProbABEL on a dataset 33815 snp / 3485 people adjusted > for sex and age (I did not run it in triplet but gives an idea) > > version 0.42 0.50_branch > FV 58 52 > mldose 48 12 > all times ate in seconds. > > As you can see the filevector format in the part that slows down the > program. When profiling the reading from FV takes up 86% of all the time > the program takes. > > > The current problem with reading from filevector is that the fv dat ais > stored in floats (this is logical as it means half the disk space usage > compared to storing doubles, moreover, the imputed data is never more > precise than a float anyway). > However, internally ProbABEL uses doubles for calculations. This means > conversion from float to double must occur at some point. > > Simply casting to double gives impression. For example casting a float > 0.677 to double gives: 0.67699998617172241 > Therefore, with version 0.4.0 I changed this and used a string as > intermediate form, followed by strtod(). 
First I used stringstreams, but > these turn out to be much too slow for our use case. Now snprintf() is > used. For the above example the double value is: 0.67700000000000005, > much closer to what we would like to see. Using this two-step conversion > means the output when using fv is equal to the output using txt data > (and equal to using R), within float precision. > > Using Maarten's 'strtod' will speed up this part as well, but the > snprintf() call is still expensive. > > Apart from this two-step conversion we may also be inefficient because > the dosage/probability values are converted one array element at the > time. Maybe we can gain something there, like Maarten did for the txt > format and simply sending a whole 'line'/array to the conversion may help. > > > > > Given that most people nowadays store their imputation results in chunks > of chromosomes anyway (i.e. small(er) files), and the fact that I think > implementing the ability to read gziped files is not difficult, it may > be time to give mldose.gz files another chance for ProbABEL users. It > will save them the conversion from mldose.gz to DatABEL. > Of course we can still support DatABEL files, but (depending on how fast > reading from gzipped files is), our recommendation could change with the > upcoming ProbABEL v0.5.0. > > Any thoughts on this? > > > Best, > > Lennart. > > > > > > On 28-03-14 20:15, Yury Aulchenko wrote: > 10 fold is good speed up. An order of magnitude :) > > Wonder how it compares now to the reading from plain text files? > > Y > > ---------------- > Sent from mobile device, please excuse possible typos > > On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote: > > Author: maartenk > Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014) > New Revision: 1664 > > Modified: > branches/ProbABEL-0.50/src/gendata.cpp > branches/ProbABEL-0.50/src/gendata.h > Log: > new implementation of reading in numbers of mldose file: this version > is about a 10(!) 
fold faster than in ProABEL 0.42 > > Modified: branches/ProbABEL-0.50/src/gendata.cpp > =================================================================== > --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC > (rev 1663) > +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC< > > -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.verkouteren at erasmusmc.nl Mon Apr 14 16:19:47 2014 From: j.verkouteren at erasmusmc.nl (J.A.C. Verkouteren) Date: Mon, 14 Apr 2014 14:19:47 +0000 Subject: [GenABEL-dev] Not receiving activiation e-mail new account GenABEL.org Message-ID: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl> Dear developers, I just tried to create a new account for your forum (username: JACV) but for some reason I do not receive the activation e-mail. Could your message be seen as spam by Erasmus MC Outlook? Kind regards, Joris A.C. Verkouteren MD PhD student Dermatology [Erasmus MC] P.O. Box 2040, 3000 CA Rotterdam, The Netherlands, internal postal address Gk-318 Visiting address: Burg. s' Jacobplein 51, 3015 CA Rotterdam, The Netherlands, room Gk-026 (Building Rochussenstraat) E j.verkouteren at erasmusmc.nl | T +31 10 703 89 51 www.erasmusmc.nl | www.erasmusmc.nl/dermatologie Presence: Monday-Friday -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 5102 bytes Desc: image001.gif URL: From lennart at karssen.org Mon Apr 14 21:36:44 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 14 Apr 2014 21:36:44 +0200 Subject: [GenABEL-dev] Not receiving activiation e-mail new account GenABEL.org In-Reply-To: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl> References: <1D02C6FBF773D44CA9D3AEA63BD658A8328F7F72@EXCH-HE04.erasmusmc.nl> Message-ID: <534C38CC.2080802@karssen.org> Dear Joris, Thanks for registering a forum account. I can't really say what happened to the activation e-mail, but I've just activated your account. Feel free to contact us again if it doesn't work out. Welcome on our forum! Best, Lennart. On 14-04-14 16:19, J.A.C. Verkouteren wrote: > Dear developers, > > I just tried to create a new account for your forum (username: JACV) but > for some reason I do not receive the activation e-mail. Could your > message be seen as spam by Erasmus MC Outlook? > > > > Kind regards, > > > *Joris A.C. Verkouteren MD* > > > /PhD student/ > Dermatology > Erasmus MC > > P.O. Box 2040, 3000 CA Rotterdam, The Netherlands, internal postal address Gk-318 > Visiting address: Burg. s' Jacobplein 51, 3015 CA Rotterdam, The Netherlands, room Gk-026 (Building Rochussenstraat) > E j.verkouteren at erasmusmc.nl | > T +31 10 703 89 51 > www.erasmusmc.nl | > www.erasmusmc.nl/dermatologie > > Presence: Monday-Friday > > > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Mon Apr 14 22:55:01 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 14 Apr 2014 22:55:01 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1689 - pkg/ProbABEL/src In-Reply-To: <20140414201904.2F0F1186B65@r-forge.r-project.org> References: <20140414201904.2F0F1186B65@r-forge.r-project.org> Message-ID: <534C4B25.60908@karssen.org> Hi Maarten, Thanks for doing the cleaning :-)! Lennart. On 14-04-14 22:19, noreply at r-forge.r-project.org wrote: > Author: maartenk > Date: 2014-04-14 22:19:03 +0200 (Mon, 14 Apr 2014) > New Revision: 1689 > > Modified: > pkg/ProbABEL/src/gendata.cpp > Log: > removed some old non functional code > > Modified: pkg/ProbABEL/src/gendata.cpp > =================================================================== > --- pkg/ProbABEL/src/gendata.cpp 2014-04-11 09:26:28 UTC (rev 1688) > +++ pkg/ProbABEL/src/gendata.cpp 2014-04-14 20:19:03 UTC (rev 1689) > @@ -253,7 +253,6 @@ > } > > std::string tmpid, tmpstr; > - char inStr[8]; > > int k = 0; > for (unsigned int i = 0; i < npeople; i++) > @@ -290,58 +289,11 @@ > infile >> tmpstr; > } > > - int oldstyle = 0; > - if (oldstyle == 1) > - { > - for (unsigned int j = 0; j < (nsnps * ngpreds); j++) > - { > - if (infile.good()) > - { > - infile >> inStr; > - // tmpstr contains the dosage/probability in > - // string form. Convert it to double (if tmpstr is > - // NaN it will be set to nan). 
> - double dosage; > - char *endptr; > - errno = 0; // To distinguish success/failure > - // after strtod() > + std::string all_numbers; > + all_numbers.reserve(nsnps * ngpreds * 7); > + std::getline(infile, all_numbers); > + mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds); > > - dosage = strtod(inStr, &endptr); > - if ((errno == ERANGE > - && (dosage == HUGE_VALF || dosage == HUGE_VALL)) > - || (errno != 0 && dosage == 0)) > - { > - perror("Error while reading genetic data (strtod)"); > - exit(EXIT_FAILURE); > - } > - > - if (endptr == tmpstr.c_str()) > - { > - cerr > - << "No digits were found while reading genetic data" > - << " (individual " << i + 1 << ", position " > - << j + 1 << ")" << endl; > - exit(EXIT_FAILURE); > - } > - /* If we got here, strtod() successfully parsed a number */ > - G.put(dosage, k, j); > - } > - else > - { > - std::cerr << "cannot read dose-file: " << fname > - << "check skipd and ngpreds parameters\n"; > - infile.close(); > - exit(1); > - } > - } > - } > - else > - { > - std::string all_numbers; > - all_numbers.reserve(nsnps * ngpreds * 7); > - std::getline(infile, all_numbers); > - mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds); > - } > k++; > } > else > @@ -361,7 +313,6 @@ > > } > > - > // HERE NEED A NEW CONSTRUCTOR BASED ON DATABELBASECPP OBJECT > gendata::~gendata() > { > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Wed Apr 16 13:02:19 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 16 Apr 2014 13:02:19 +0200 Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on big-endian architectures In-Reply-To: <20140416102331.E8B38187585@r-forge.r-project.org> References: <20140416102331.E8B38187585@r-forge.r-project.org> Message-ID: <-4562335105651951279@unknownmsgid> Is that something for bug tracker or forum or a mix? ---------------------- Yurii Aulchenko (sent from mobile device) > On Apr 16, 2014, at 12:23 PM, "genabel-bugs at r-forge.r-project.org" wrote: > > Bugs item #5299, was changed at 2014-01-24 09:38 by Jurica Stanojkovic > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 > > Status: Open > Priority: 2 > Submitted By: Lennart Karssen (lckarssen) > Assigned to: Nobody (None) > Summary: Filevector doesn't work on big-endian architectures > Resolution: Accepted As Bug > Operating System: All > Severity: normal > Hardware: Other > Version: other > Component: FileVector > URL: https://buildd.debian.org/status/package.php?p=probabel > > > Initial Comment: > The Debian build logs for big-endian machines (see URL) show that the ProbABEL checks fail on machines with that architecture. Closer inspection reveals that the checks fail on the comparison between text and binary (filevector-format) input. > > Also see this discussion on the debian-mentor mailing list, especially Gert Wollny's posts: https://lists.debian.org/debian-mentors/2014/01/msg00326.html > Wollny writes: > "I dug around in the code and voila, e.g. 
in fvlib/frutil.cpp the > function blockWriteOrRead uses fstream.read|write to do raw data IO and > then in other parts of the code the data is just cast to the desired > type without doing any checks of endianess let alone the needed > conversions." > > > Since I doubt that many people will use ProbABEL/DatABEL/filevector on other (big-endian) architectures there is no hurry in fixing this. Nevertheless it's worth having this bug visible and in the back of our minds. > > ---------------------------------------------------------------------- > > Comment By: Jurica Stanojkovic (juricast) > Date: 2014-04-16 12:23 > > Message: > Hello, > > I have tried building package probabel on mips big endian. > It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on little endian machine and are not working on big endian ones. > > I have tried to create them on big endian mips, and replace ones that came with source package with the ones that I have created. > The package was built with new files without an error. > > I used following command to create files: > library(GenABEL) > library(DatABEL) > fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose") > fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE) > mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose") > mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) > > I am new to ProbABEL, GenABEL, DatABEL so could someone please help me with following questions: > > What is the best course of action for supporting probabel on big endian? 
> Should *.fvi, *.fvd files allways be in little endian format (than DatABEL needs to be changed to always create little endian files)? > Or can *.fvd, *.fvi files be replaced with big endian files for big endian build? > > Is it necessary to be able to use *.fvd *.fvi files created on a different endian system? > > I am willing to work on adding big endian support and I will appreciate any help in determining the right course of action in resolving this problem. > > Regards, > Jurica > > ---------------------------------------------------------------------- > > Comment By: Lennart Karssen (lckarssen) > Date: 2014-01-27 21:20 > > Message: > A suggestion by Andreas Tille on the debian-med list: It's good to keep in mind that in the near future architectures like arm(64) may become much more popular in genomics. > > > ---------------------------------------------------------------------- > > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 From lennart at karssen.org Wed Apr 16 14:29:27 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 16 Apr 2014 14:29:27 +0200 Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on big-endian architectures In-Reply-To: <-4562335105651951279@unknownmsgid> References: <20140416102331.E8B38187585@r-forge.r-project.org> <-4562335105651951279@unknownmsgid> Message-ID: <534E77A7.5040005@karssen.org> I would say this is for the dev list + bug Lennart. On 16-04-14 13:02, Yurii Aulchenko wrote: > Is that something for bug tracker or forum or a mix? 
> > ---------------------- > Yurii Aulchenko > (sent from mobile device) > >> On Apr 16, 2014, at 12:23 PM, "genabel-bugs at r-forge.r-project.org" wrote: >> >> Bugs item #5299, was changed at 2014-01-24 09:38 by Jurica Stanojkovic >> You can respond by visiting: >> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 >> >> Status: Open >> Priority: 2 >> Submitted By: Lennart Karssen (lckarssen) >> Assigned to: Nobody (None) >> Summary: Filevector doesn't work on big-endian architectures >> Resolution: Accepted As Bug >> Operating System: All >> Severity: normal >> Hardware: Other >> Version: other >> Component: FileVector >> URL: https://buildd.debian.org/status/package.php?p=probabel >> >> >> Initial Comment: >> The Debian build logs for big-endian machines (see URL) show that the ProbABEL checks fail on machines with that architecture. Closer inspection reveals that the checks fail on the comparison between text and binary (filevector-format) input. >> >> Also see this discussion on the debian-mentor mailing list, especially Gert Wollny's posts: https://lists.debian.org/debian-mentors/2014/01/msg00326.html >> Wollny writes: >> "I dug around in the code and voila, e.g. in fvlib/frutil.cpp the >> function blockWriteOrRead uses fstream.read|write to do raw data IO and >> then in other parts of the code the data is just cast to the desired >> type without doing any checks of endianess let alone the needed >> conversions." >> >> >> Since I doubt that many people will use ProbABEL/DatABEL/filevector on other (big-endian) architectures there is no hurry in fixing this. Nevertheless it's worth having this bug visible and in the back of our minds. >> >> ---------------------------------------------------------------------- >> >> Comment By: Jurica Stanojkovic (juricast) >> Date: 2014-04-16 12:23 >> >> Message: >> Hello, >> >> I have tried building package probabel on mips big endian. 
>> It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on little endian machine and are not working on big endian ones. >> >> I have tried to create them on big endian mips, and replace ones that came with source package with the ones that I have created. >> The package was built with new files without an error. >> >> I used following command to create files: >> library(GenABEL) >> library(DatABEL) >> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose") >> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE) >> mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose") >> mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) >> >> I am new to ProbABEL, GenABEL, DatABEL so could someone please help me with following questions: >> >> What is the best course of action for supporting probabel on big endian? >> Should *.fvi, *.fvd files allways be in little endian format (than DatABEL needs to be changed to always create little endian files)? >> Or can *.fvd, *.fvi files be replaced with big endian files for big endian build? >> >> Is it necessary to be able to use *.fvd *.fvi files created on a different endian system? >> >> I am willing to work on adding big endian support and I will appreciate any help in determining the right course of action in resolving this problem. 
>> >> Regards, >> Jurica >> >> ---------------------------------------------------------------------- >> >> Comment By: Lennart Karssen (lckarssen) >> Date: 2014-01-27 21:20 >> >> Message: >> A suggestion by Andreas Tille on the debian-med list: It's good to keep in mind that in the near future architectures like arm(64) may become much more popular in genomics. >> >> >> ---------------------------------------------------------------------- >> >> You can respond by visiting: >> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Wed Apr 16 15:11:49 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 16 Apr 2014 15:11:49 +0200 Subject: [GenABEL-dev] [genabel-Bugs][5299] Filevector doesn't work on big-endian architectures In-Reply-To: <534E77A7.5040005@karssen.org> References: <20140416102331.E8B38187585@r-forge.r-project.org> <-4562335105651951279@unknownmsgid> <534E77A7.5040005@karssen.org> Message-ID: <534E8195.2040906@karssen.org> Hmm that was maybe a bit too short a reaction :-). I'd suggest the following: - Move the discussion to the dev list (mention that in the bug tracker as well) - Once a course of action has been decided we can do status updates in the bug tracker. Lennart. On 16-04-14 14:29, L.C. Karssen wrote: > I would say this is for the dev list + bug > > > Lennart. 
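For the byte-order problem in bug #5299, one possible direction is to define the on-disk order and decode it explicitly instead of casting the raw bytes. The sketch below assumes that the values in a .fvd file are 4-byte little-endian IEEE 754 floats; the thread has not decided on a byte order, so this is an assumption, and read_le_float is a hypothetical helper, not a function from fvlib.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of fvlib): decode a 4-byte little-endian
// IEEE 754 float from a raw buffer, independent of the host byte order.
// Assumes float is 32 bits wide and shares the byte order of uint32_t
// on the host, which holds on common platforms.
float read_le_float(const unsigned char *bytes)
{
    // Assemble the bit pattern with shifts; shifts work on values, not on
    // memory layout, so the result is the same on big- and little-endian
    // hosts.
    std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                       | (static_cast<std::uint32_t>(bytes[1]) << 8)
                       | (static_cast<std::uint32_t>(bytes[2]) << 16)
                       | (static_cast<std::uint32_t>(bytes[3]) << 24);
    float value;
    // memcpy avoids the aliasing problems of a pointer cast.
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

With a fixed on-disk order, files written on a little-endian machine would stay readable on a big-endian one, at the cost of a per-value decode on big-endian hosts.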
> > On 16-04-14 13:02, Yurii Aulchenko wrote: >> Is that something for bug tracker or forum or a mix? >> >> ---------------------- >> Yurii Aulchenko >> (sent from mobile device) >> >>> On Apr 16, 2014, at 12:23 PM, "genabel-bugs at r-forge.r-project.org" wrote: >>> >>> Bugs item #5299, was changed at 2014-01-24 09:38 by Jurica Stanojkovic >>> You can respond by visiting: >>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 >>> >>> Status: Open >>> Priority: 2 >>> Submitted By: Lennart Karssen (lckarssen) >>> Assigned to: Nobody (None) >>> Summary: Filevector doesn't work on big-endian architectures >>> Resolution: Accepted As Bug >>> Operating System: All >>> Severity: normal >>> Hardware: Other >>> Version: other >>> Component: FileVector >>> URL: https://buildd.debian.org/status/package.php?p=probabel >>> >>> >>> Initial Comment: >>> The Debian build logs for big-endian machines (see URL) show that the ProbABEL checks fail on machines with that architecture. Closer inspection reveals that the checks fail on the comparison between text and binary (filevector-format) input. >>> >>> Also see this discussion on the debian-mentor mailing list, especially Gert Wollny's posts: https://lists.debian.org/debian-mentors/2014/01/msg00326.html >>> Wollny writes: >>> "I dug around in the code and voila, e.g. in fvlib/frutil.cpp the >>> function blockWriteOrRead uses fstream.read|write to do raw data IO and >>> then in other parts of the code the data is just cast to the desired >>> type without doing any checks of endianess let alone the needed >>> conversions." >>> >>> >>> Since I doubt that many people will use ProbABEL/DatABEL/filevector on other (big-endian) architectures there is no hurry in fixing this. Nevertheless it's worth having this bug visible and in the back of our minds. 
>>> >>> ---------------------------------------------------------------------- >>> >>> Comment By: Jurica Stanojkovic (juricast) >>> Date: 2014-04-16 12:23 >>> >>> Message: >>> Hello, >>> >>> I have tried building package probabel on mips big endian. >>> It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on little endian machine and are not working on big endian ones. >>> >>> I have tried to create them on big endian mips, and replace ones that came with source package with the ones that I have created. >>> The package was built with new files without an error. >>> >>> I used following command to create files: >>> library(GenABEL) >>> library(DatABEL) >>> fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.dose") >>> fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", mlinfo="./checks/inputfiles/test.mlinfo", outfile="./checks/inputfiles/test.prob", isprob=TRUE) >>> mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.dose") >>> mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) >>> >>> I am new to ProbABEL, GenABEL, DatABEL so could someone please help me with following questions: >>> >>> What is the best course of action for supporting probabel on big endian? >>> Should *.fvi, *.fvd files allways be in little endian format (than DatABEL needs to be changed to always create little endian files)? >>> Or can *.fvd, *.fvi files be replaced with big endian files for big endian build? >>> >>> Is it necessary to be able to use *.fvd *.fvi files created on a different endian system? 
>>> >>> I am willing to work on adding big endian support and I will appreciate any help in determining the right course of action in resolving this problem. >>> >>> Regards, >>> Jurica >>> >>> ---------------------------------------------------------------------- >>> >>> Comment By: Lennart Karssen (lckarssen) >>> Date: 2014-01-27 21:20 >>> >>> Message: >>> A suggestion by Andreas Tille on the debian-med list: It's good to keep in mind that in the near future architectures like arm(64) may become much more popular in genomics. >>> >>> >>> ---------------------------------------------------------------------- >>> >>> You can respond by visiting: >>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5299&group_id=505 >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Fri Apr 18 16:35:29 2014 From: lennart at karssen.org (L.C. Karssen) Date: Fri, 18 Apr 2014 16:35:29 +0200 Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL Message-ID: <53513831.3040506@karssen.org> Dear list, In the past few months Maarten has made several speed improvements to ProbABEL. 
Many of these speedups make use of the EIGEN library that was first introduced into ProbABEL in v0.3.0. After merging Maarten's branch with trunk (and after I independently added more extensive checks in Jenkins) we found out compilation after configuring ProbABEL using ./configure --without-eigen fails. Fixing this is not trivial, so we are hereby proposing to remove the --without-eigen option. This doesn't necessarily mean that all mematrix code needs to be removed immediately, but by insisting on using EIGEN we can at least start removing the old code. Impact analysis for users and developers: 1) positive: consistent (and faster) analysis speed experience for all users: everybody will use EIGEN 2) positive: reduction of maintenance/development time because we no longer need to maintain the non-EIGEN parts of the code. 3) possibly negative: we need to make a choice on whether we will distribute EIGEN with the ProbABEL code, or whether we 'force' the user to download the code themselves. Point 3) is similar to the debate about libfilevector: do we go for a simple user experience where all requirements are combined in the distributed source code, or do we make use of the modularity of the code and its dependencies and let people download and install the dependencies themselves (or use packages provided by the OS). In the upcoming release we also plan to include calculation of p-values using the Boost libraries [0]. The same issue will arise there again. Therefore, I would like to start/continue the discussion here on how to proceed with external dependencies. I'm really looking forward to your opinions. Below I've outlined several options I could think of on how to go forward. Let me know what you think of them or if you have any other ideas. Thanks a lot, Lennart. Note 1: For ProbABEL we provide pre-compiled MS Windows binaries, so that platform is not part of this discussion. 
Note 2: EIGEN consists of header files only, no compilation is needed to use EIGEN (either at compile time or at run time). I see the following options: a) include a copy of the EIGEN source code in the ProbABEL code base (in SVN) b) include a copy of the EIGEN source code in the official released ProbABEL tar.gz. c) don't include the EIGEN source code, but provide very clear instructions on how to obtain EIGEN. d) include a script that downloads and extracts the latest EIGEN and mention that script in the installation instructions. e) Automatic download and extraction of the EIGEN source code during the ./configure (or make) process of ProbABEL. More details about these options: a): - Licence-wise this seems possible as EIGEN is released under the MPL2. But Q14 of http://www.mozilla.org/MPL/2.0/FAQ.html doesn't immediately make clear to me what the requirements/repercussions are. More thorough reading of the licence is probably required. - ProbABEL contains both GPL and LGPL licensed files (a complete overview had to be made for the Debian package and can be found at [1]), so I'm not overly happy to add yet another type of licence. - simple for the user; everything is in and compiles cleanly. - developers don't need to keep up with updates of EIGEN, so no incompatibility; we can keep the current EIGEN code in there forever (like was done with parts of the code from the R survival package) - However, with a copy of the EIGEN code in SVN we don't benefit from bug fixes and improvements in EIGEN. b): - The same licence issues as in a) apply - simple for the user - developers will need to keep up with new EIGEN releases, but we benefit from their improvements and bug fixes (unless we always distribute with the EIGEN version 3.2.1 (the current version). c): - This is what we currently do. This allows users/administrators/packages to use EIGEN either by downloading and extracting it themselves or use OS-provided packages. 
Maybe we can improve the documentation to make it even easier. - This requires more 'investment' from the user: they need to carefully read the installation instructions AND download and extract EIGEN AND add the path with extracted code to the ./configure --with-eigen-include-path=/your/path/to/eigen option. d): - This would be easy to do, but would require the user to have wget or curl installed (are these available for all architectures?). Does that make things better? The good thing is we can fix the extraction directory so the ./configure --with-eigen-include option can be preset. - No hassle with licences - users/developers/packagers who want to use an OS-provided EIGEN package can do so e): - simple for the user - no hassle with licences - same dependency on wget or curl as d) - I'm not sure how to do that in configure.ac, but I think it can be done. - unless we add an --dont-download-eigen option to configure.ac users/developers/packagers who want to use OS-provided EIGEN packages won't be happy. [0] http://www.boost.org/ [1] http://sources.debian.net/src/probabel/0.4.3-1/debian/copyright -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Sat Apr 19 11:44:53 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Sat, 19 Apr 2014 11:44:53 +0200 Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL In-Reply-To: <53513831.3040506@karssen.org> References: <53513831.3040506@karssen.org> Message-ID: <1564471774009216793@unknownmsgid> I am for going eigen way (only). 
In general, the more we use standard libraries, the better; the only concern here is how difficult it is for the user to do a proper installation. In the case of EIGEN this does not seem to be a big problem. (But mind the experience we had with GSL and MixABEL: this turned out to be non-installable for many users.) ---------------------- Yurii Aulchenko (sent from mobile device) > On Apr 18, 2014, at 4:35 PM, "L.C. Karssen" wrote: > > Dear list, > > In the past few months Maarten has made several speed improvements to > ProbABEL. Many of these speedups make use of the EIGEN library that was > first introduced into ProbABEL in v0.3.0. > After merging Maarten's branch with trunk (and after I independently > added more extensive checks in Jenkins) we found out compilation after > configuring ProbABEL using > ./configure --without-eigen > fails. Fixing this is not trivial, so we are hereby proposing to remove > the --without-eigen option. This doesn't necessarily mean that all > mematrix code needs to be removed immediately, but by insisting on using > EIGEN we can at least start removing the old code. > > Impact analysis for users and developers: > 1) positive: consistent (and faster) analysis speed experience for all > users: everybody will use EIGEN > > 2) positive: reduction of maintenance/development time because we no > longer need to maintain the non-EIGEN parts of the code. > > 3) possibly negative: we need to make a choice on whether we will > distribute EIGEN with the ProbABEL code, or whether we 'force' the user > to download the code themselves. > > > Point 3) is similar to the debate about libfilevector: do we go for a > simple user experience where all requirements are combined in the > distributed source code, or do we make use of the modularity of the code > and its dependencies and let people download and install the > dependencies themselves (or use packages provided by the OS).
> In the upcoming release we also plan to include calculation of p-values > using the Boost libraries [0]. The same issue will arise there again. > > Therefore, I would like to start/continue the discussion here on how to > proceed with external dependencies. I'm really looking forward to your > opinions. Below I've outlined several options I could think of on how to > go forward. Let me know what you think of them or if you have any other > ideas. > > > Thanks a lot, > > Lennart. > > > > Note 1: For ProbABEL we provide pre-compiled MS Windows binaries, so > that platform is not part of this discussion. > Note 2: EIGEN consists of header files only, no compilation is needed to > use EIGEN (either at compile time or at run time). > > I see the following options: > a) include a copy of the EIGEN source code in the ProbABEL code base (in > SVN) > b) include a copy of the EIGEN source code in the official released > ProbABEL tar.gz. > c) don't include the EIGEN source code, but provide very clear > instructions on how to obtain EIGEN. > d) include a script that downloads and extracts the latest EIGEN and > mention that script in the installation instructions. > e) Automatic download and extraction of the EIGEN source code during the > ./configure (or make) process of ProbABEL. > > More details about these options: > a): > - Licence-wise this seems possible as EIGEN is released under the MPL2. > But Q14 of http://www.mozilla.org/MPL/2.0/FAQ.html doesn't immediately > make clear to me what the requirements/repercussions are. More thorough > reading of the licence is probably required. > - ProbABEL contains both GPL and LGPL licensed files (a complete > overview had to be made for the Debian package and can be found at [1]), > so I'm not overly happy to add yet another type of licence. > - simple for the user; everything is in and compiles cleanly. 
> - developers don't need to keep up with updates of EIGEN, so no > incompatibility; we can keep the current EIGEN code in there forever > (like was done with parts of the code from the R survival package) > - However, with a copy of the EIGEN code in SVN we don't benefit from > bug fixes and improvements in EIGEN. > > b): > - The same licence issues as in a) apply > - simple for the user > - developers will need to keep up with new EIGEN releases, but we > benefit from their improvements and bug fixes (unless we always > distribute with the EIGEN version 3.2.1 (the current version). > > c): > - This is what we currently do. This allows > users/administrators/packages to use EIGEN either by downloading and > extracting it themselves or use OS-provided packages. Maybe we can > improve the documentation to make it even easier. > - This requires more 'investment' from the user: they need to carefully > read the installation instructions AND download and extract EIGEN AND > add the path with extracted code to the ./configure > --with-eigen-include-path=/your/path/to/eigen option. > > d): > - This would be easy to do, but would require the user to have wget or > curl installed (are these available for all architectures?). Does that > make things better? The good thing is we can fix the extraction > directory so the ./configure --with-eigen-include option can be preset. > - No hassle with licences > - users/developers/packagers who want to use an OS-provided EIGEN > package can do so > > e): > - simple for the user > - no hassle with licences > - same dependency on wget or curl as d) > - I'm not sure how to do that in configure.ac, but I think it can be done. > - unless we add an --dont-download-eigen option to configure.ac > users/developers/packagers who want to use OS-provided EIGEN packages > won't be happy. 
> > > > > [0] http://www.boost.org/ > [1] http://sources.debian.net/src/probabel/0.4.3-1/debian/copyright > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From kooyman at gmail.com Mon Apr 21 20:18:02 2014 From: kooyman at gmail.com (Maarten Kooyman) Date: Mon, 21 Apr 2014 20:18:02 +0200 Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL In-Reply-To: <53513831.3040506@karssen.org> References: <53513831.3040506@karssen.org> Message-ID: <535560DA.5060008@gmail.com> On 18-04-14 16:35, L.C. Karssen wrote: > a) include a copy of the EIGEN source code in the ProbABEL code base (in > SVN) I strongly oppose this option: we do not want to maintain this code, and what would it be doing in our SVN? > b) include a copy of the EIGEN source code in the official released > ProbABEL tar.gz. This seems to me the most foolproof way to distribute ProbABEL as source code: you also control the versions of the dependencies, which can be handy compared to running into old versions of libraries. Those sometimes result in faulty binaries or setups that do not compile. Licence-wise it looks all right to me (however, I am not an OSS lawyer). The EIGEN source files as provided on the website are about a megabyte: this should not be a problem for distribution. Licence-wise the Boost library also seems fine; however, the download provided on their site is 60 megabytes: quite a download! We will have to trim down this size one way or another. > c) don't include the EIGEN source code, but provide very clear > instructions on how to obtain EIGEN. Reading manuals is often not done.
This also makes it harder for inexperienced computer users and raises the bar for usage. > d) include a script that downloads and extracts the latest EIGEN and > mention that script in the installation instructions. > e) Automatic download and extraction of the EIGEN source code during the > ./configure (or make) process of ProbABEL. Sounds nice, but right now I have problems downloading EIGEN from their server. Maybe we should host the software ourselves. This still leaves the size problem for downloading Boost. As a workflow, option e) is easier than option d). However, the downloading can be fragile, since you cannot be sure wget/curl is installed on the user's system. (It also needs a direct internet connection, which is not always available on some servers.) Why do we not provide a statically linked executable? We have Jenkins in place to perform the builds. Kind regards, Maarten From lennart at karssen.org Wed Apr 23 08:33:29 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 23 Apr 2014 08:33:29 +0200 Subject: [GenABEL-dev] Proposal to remove non-EIGEN code paths from ProbABEL In-Reply-To: <535560DA.5060008@gmail.com> References: <53513831.3040506@karssen.org> <535560DA.5060008@gmail.com> Message-ID: <53575EB9.5080800@karssen.org> Great, it seems that we have a go-ahead for switching to an EIGEN-only ProbABEL. I'll start with the removal of the relevant options in configure.ac. On 21-04-14 20:18, Maarten Kooyman wrote: > > > On 18-04-14 16:35, L.C. Karssen wrote: >> a) include a copy of the EIGEN source code in the ProbABEL code base (in >> SVN) > I strongly oppose to this option: we do not want to maintain this code > and what should it do in our SVN? I completely agree. >> b) include a copy of the EIGEN source code in the official released >> ProbABEL tar.gz. > This seems to me as the most foolproof way to distribute ProbABEL as > code: you control also the versions of dependencies which can be handy > compared to run into old versions of libraries.
This results sometime > in faulty binaries or non compiling set ups . Licence wise it looks > all-right to me (however, I am not a OSS lawyer). I still want to check that in more detail. I'll post my conclusions in this thread. > The EIGEN source files > as provided on the website are about a megabyte: this should not be a > problem for distribution. Indeed. Size is not an issue for EIGEN. For Boost you already noticed the problem below. > If you look at the boost library licence wise > it seem also fine, I agree. > however the download provided as on there site is 60 > megabyte: quite a download! We have to trim down this size one way or an > other. Yes, that's my point. That's one more reason why I am not too happy with the 'distribute with ProbABEL' options. On the other hand, we may decide to have a different policy for EIGEN than for Boost. > >> c) don't include the EIGEN source code, but provide very clear >> instructions on how to obtain EIGEN. > Reading manuals is often not done. Also this makes it harder for > inexperience computer user and rises the bar for usage. True, people don't read. On the other hand, how many 'inexperienced' users do we have? Probably quite a few, but I'm quite sure they don't know about the ./configure; make; make install steps either. Moreover, in order to install it themselves (without root privileges), they need to know about the ./configure --prefix option, which is also in the documentation. So, all in all, I'm not so sure 'inexperienced' users will be able to successfully compile and install ProbABEL without at least some reading. How about adding (a copy of) the necessary steps to the ProbABEL website as well? That way users will find them when looking for the source. > >> d) include a script that downloads and extracts the latest EIGEN and >> mention that script in the installation instructions. > >> e) Automatic download and extraction of the EIGEN source code during the >> ./configure (or make) process of ProbABEL.
> Sounds nice but right now I have problems to download EIGEN from there > server. Maybe we should host the software ourself. Hmm, doesn't that (somewhat) contradict what you wrote under a) about not wanting to host the code ourselves? > This still causes > size problems for downloading boost. Option E is as a workflow easier > then options D. However, this downloading can be buggy since you not > sure wget/curl is installed on the users system. (This needs also direct > internet connection to the WWW and this not always the cause on some > servers) Yup. > > > Why do we not provide a statically executable? We have Jenkins in place > to perform the builds. > That's a good suggestion (or actually two). We can certainly do that. I should also try to get download statistics for source packages (and later the statically linked binaries) from the web server. That will help us to get a better idea of which is used. Thanks for your input! Hoping to see input from others as well on this matter. Best, Lennart. > > Kind regards, > > Maarten > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 213 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Wed Apr 23 18:03:53 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Wed, 23 Apr 2014 18:03:53 +0200 Subject: [GenABEL-dev] ProbABEL v0.4.3 for Ubuntu 14.04 LTS uploaded to PPA Message-ID: <5357E469.9020704@karssen.org> Dear list, This is to inform you that I have just uploaded the ProbABEL v0.4.3 packages for Ubuntu 14.04 LTS (which was released on April 17th) to the GenABEL PPA [1]. The packages have been built successfully and can be installed now. Those who upgraded an older Ubuntu installation to 14.04 will need to re-enable the PPA to receive updates. Please post any questions related to installation from the PPA on our forum [2]. Best regards, Lennart. [1] https://launchpad.net/~l.c.karssen/+archive/genabel-ppa [2] http://forum.genabel.org/ -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 213 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Thu Apr 24 11:45:28 2014 From: lennart at karssen.org (L.C. Karssen) Date: Thu, 24 Apr 2014 11:45:28 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1695 - pkg/ProbABEL/src In-Reply-To: <20140423185426.CBEEB18736F@r-forge.r-project.org> References: <20140423185426.CBEEB18736F@r-forge.r-project.org> Message-ID: <5358DD38.2070302@karssen.org> Wow, that's a lot of code removal! Glad to see this clean-up. Thanks Maarten! Now we can slowly start thinking of using Eigen directly instead of going through the eigen_mematrix "wrapper". However, I don't think this has a high priority now. Moreover, it is quite a large task, so definitely should be done in a branch and extensively tested. Lennart. 
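To make the remark about the "wrapper" a bit more concrete, here is a minimal sketch of the pattern. It is not ProbABEL code: wrapped_matrix, sum_raw, sum_via_wrapper and sum_direct are hypothetical names, and std::vector stands in for the Eigen matrix type so that the example compiles without Eigen installed. The only thing taken from the commit below is the shape of the call: the wrapper exposes its underlying container as a public member called data, which is why the raw pointer handed to a C-style routine ends up spelled .data.data().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style routine (in the spirit of coxfit2()):
// it only sees a raw pointer and a length.
static double sum_raw(const double *values, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

// Hypothetical, heavily simplified wrapper class. The public member `data`
// holds the underlying container; std::vector is used here instead of an
// Eigen matrix (an assumption made to keep the sketch self-contained).
template <class DT>
class wrapped_matrix
{
 public:
    int nrow;
    int ncol;
    std::vector<DT> data;

    wrapped_matrix(int nr, int nc)
        : nrow(nr), ncol(nc), data(static_cast<std::size_t>(nr) * nc, DT())
    {
    }

    // Row-major element access, as in the old mematrix::put().
    void put(DT value, int nr, int nc)
    {
        data[static_cast<std::size_t>(nr) * ncol + nc] = value;
    }
};

// Going through the wrapper: first the member, then the container's data().
double sum_via_wrapper(const wrapped_matrix<double> &M)
{
    return sum_raw(M.data.data(), M.data.size());
}

// Using the container directly: one level of indirection less.
double sum_direct(const std::vector<double> &values)
{
    return sum_raw(values.data(), values.size());
}
```

Both functions give the same result for the same contents; the difference is only in how much code sits between the caller and the underlying storage, which is roughly what dropping the wrapper would buy us.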
On 23-04-14 20:54, noreply at r-forge.r-project.org wrote: > Author: maartenk > Date: 2014-04-23 20:54:26 +0200 (Wed, 23 Apr 2014) > New Revision: 1695 > > Removed: > pkg/ProbABEL/src/mematri1.h > pkg/ProbABEL/src/mematrix.h > Modified: > pkg/ProbABEL/src/cholesky.cpp > pkg/ProbABEL/src/cholesky.h > pkg/ProbABEL/src/command_line_settings.cpp > pkg/ProbABEL/src/coxph_data.cpp > pkg/ProbABEL/src/coxph_data.h > pkg/ProbABEL/src/data.cpp > pkg/ProbABEL/src/gendata.cpp > pkg/ProbABEL/src/gendata.h > pkg/ProbABEL/src/main.cpp > pkg/ProbABEL/src/maskedmatrix.cpp > pkg/ProbABEL/src/maskedmatrix.h > pkg/ProbABEL/src/phedata.h > pkg/ProbABEL/src/reg1.cpp > pkg/ProbABEL/src/regdata.h > pkg/ProbABEL/src/testchol.cpp > pkg/ProbABEL/src/usage.cpp > Log: > Removed mematri1.h mematrix.h specific code. This remove about 797 lines of code. > > Modified: pkg/ProbABEL/src/cholesky.cpp > =================================================================== > --- pkg/ProbABEL/src/cholesky.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/cholesky.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -9,13 +9,8 @@ > #include > #include > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > > > /* SCCS @(#)cholesky2.c 5.2 10/27/98 > > Modified: pkg/ProbABEL/src/cholesky.h > =================================================================== > --- pkg/ProbABEL/src/cholesky.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/cholesky.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -8,12 +8,8 @@ > #ifndef CHOLESKY_H_ > #define CHOLESKY_H_ > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#endif > > int cholesky2_mm(mematrix &matrix, double toler); > void chinv2_mm(mematrix &matrix); > > Modified: pkg/ProbABEL/src/command_line_settings.cpp > =================================================================== > --- 
pkg/ProbABEL/src/command_line_settings.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/command_line_settings.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -31,9 +31,7 @@ > #include > #include "usage.h" > #include "command_line_settings.h" > -#if EIGEN > #include "eigen_mematrix.h" > -#endif > > // config.h and fvlib/FileVector.h are included for the upper case variables > #if HAVE_CONFIG_H > > Modified: pkg/ProbABEL/src/coxph_data.cpp > =================================================================== > --- pkg/ProbABEL/src/coxph_data.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/coxph_data.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -405,22 +405,14 @@ > > // When using Eigen coxfit2 needs to be called in a slightly > // different way (i.e. the .data()-part needs to be added). > -#if EIGEN > coxfit2(&maxiter, &cdata.nids, &X.nrow, cdata.stime.data.data(), > cdata.sstat.data.data(), X.data.data(), newoffset.data.data(), > cdata.weights.data.data(), cdata.strata.data.data(), > means.data.data(), beta.data.data(), u.data.data(), > imat.data.data(), loglik_int, &flag, work, &eps, &tol_chol, > &sctest); > -#else > - coxfit2(&maxiter, &cdata.nids, &X.nrow, cdata.stime.data, > - cdata.sstat.data, X.data, newoffset.data, > - cdata.weights.data, cdata.strata.data, > - means.data, beta.data, u.data, > - imat.data, loglik_int, &flag, work, &eps, &tol_chol, > - &sctest); > -#endif > > + > niter = maxiter; > > // Check the results of the Cox fit; mirrored from the same checks > @@ -449,7 +441,6 @@ > << " setting beta and se to 'nan'\n"; > setToZero = true; > } else { > -#if EIGEN > VectorXd ueigen = u.data; > MatrixXd imateigen = imat.data; > VectorXd infs = ueigen.transpose() * imateigen; > @@ -463,12 +454,7 @@ > > setToZero = true; > } > -#else > - cerr << "Warning for " << snpinfo.name[cursnp] > - << ": can't check for infinite betas." > - << " Please compile ProbABEL with Eigen support to fix this." 
> - << endl; > -#endif > + > } > > for (int i = 0; i < X.nrow; i++) > > Modified: pkg/ProbABEL/src/coxph_data.h > =================================================================== > --- pkg/ProbABEL/src/coxph_data.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/coxph_data.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -29,13 +29,8 @@ > #ifndef COXPH_DATA_H_ > #define COXPH_DATA_H_ > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > > #include "data.h" > #include "reg1.h" > > Modified: pkg/ProbABEL/src/data.cpp > =================================================================== > --- pkg/ProbABEL/src/data.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/data.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -38,13 +38,8 @@ > #include "gendata.h" > #include "data.h" > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > #include "utilities.h" > > > > Modified: pkg/ProbABEL/src/gendata.cpp > =================================================================== > --- pkg/ProbABEL/src/gendata.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/gendata.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -31,13 +31,8 @@ > #include > #include "gendata.h" > #include "fvlib/FileVector.h" > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > #include "utilities.h" > > > > Modified: pkg/ProbABEL/src/gendata.h > =================================================================== > --- pkg/ProbABEL/src/gendata.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/gendata.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -31,13 +31,10 @@ > #include > #include "fvlib/FileVector.h" > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" 
> -#endif > > + > class gendata { > public: > unsigned int nsnps; > > Modified: pkg/ProbABEL/src/main.cpp > =================================================================== > --- pkg/ProbABEL/src/main.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/main.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -67,15 +67,8 @@ > > #include //needed for timing loading non file vector format > > - > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > - > #include "maskedmatrix.h" > #include "data.h" > #include "reg1.h" > > Modified: pkg/ProbABEL/src/maskedmatrix.cpp > =================================================================== > --- pkg/ProbABEL/src/maskedmatrix.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/maskedmatrix.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -30,13 +30,8 @@ > > #include > #include "maskedmatrix.h" > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > > masked_matrix::masked_matrix() > { > > Modified: pkg/ProbABEL/src/maskedmatrix.h > =================================================================== > --- pkg/ProbABEL/src/maskedmatrix.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/maskedmatrix.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -29,13 +29,8 @@ > #ifndef MASKEDMATRIX_H_ > #define MASKEDMATRIX_H_ > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > > class masked_matrix { > public: > > Deleted: pkg/ProbABEL/src/mematri1.h > =================================================================== > --- pkg/ProbABEL/src/mematri1.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/mematri1.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -1,636 +0,0 @@ > -/* > - * > - * Copyright (C) 2009--2014 Various members of the GenABEL team. 
See > - * the SVN commit logs for more details. > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public License > - * as published by the Free Software Foundation; either version 2 > - * of the License, or (at your option) any later version. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, > - * MA 02110-1301, USA. > - * > - */ > - > - > -#ifndef MEMATRI1_H > -#define MEMATRI1_H > - > -#include > -#include > -#include > -#include > - > -// > -// constructors > -// > - > -template > -mematrix
::mematrix(int nr, int nc) > -{ > - if (nr <= 0) > - { > - fprintf(stderr, "mematrix(): nr <= 0\n"); > - exit(1); > - } > - if (nc <= 0) > - { > - fprintf(stderr, "mematrix(): nc <= 0\n"); > - exit(1); > - } > - nrow = nr; > - ncol = nc; > - nelements = nr * nc; > - data = new (nothrow) DT[ncol * nrow]; > - if (!data) > - { > - fprintf(stderr, "mematrix(nr,nc): cannot allocate memory (%d,%d)\n", > - nrow, ncol); > - exit(1); > - } > -} > - > -template > -mematrix
::mematrix(const mematrix
& M) > -{ > - ncol = M.ncol; > - nrow = M.nrow; > - nelements = M.nelements; > - data = new (nothrow) DT[M.ncol * M.nrow]; > - if (!data) > - { > - fprintf(stderr, > - "mematrix const(mematrix): cannot allocate memory (%d,%d)\n", > - M.nrow, M.ncol); > - exit(1); > - } > - // std::cerr << "mematrix const(mematrix): can allocate memory (" > - // << M.nrow << "," << M.ncol << ")\n"; > - for (int i = 0; i < M.ncol * M.nrow; i++) > - data[i] = M.data[i]; > -} > - > -// > -// operators > -// > -template > -mematrix
&mematrix
::operator=(const mematrix
&M) > -{ > - if (this != &M) > - { > - if (data != NULL) > - delete[] data; > - data = new (nothrow) DT[M.ncol * M.nrow]; > - if (!data) > - { > - fprintf(stderr, "mematrix=: cannot allocate memory (%d,%d)\n", > - M.nrow, M.ncol); > - delete[] data; > - exit(1); > - } > - ncol = M.ncol; > - nrow = M.nrow; > - nelements = M.nelements; > - for (int i = 0; i < M.ncol * M.nrow; i++) > - { > - data[i] = M.data[i]; > - } > - } > - return *this; > -} > - > -template > -DT &mematrix
::operator[](int i) > -{ > - if (i < 0 || i >= (ncol * nrow)) > - { > - fprintf(stderr, "mematrix[]: %d out of bounds (0,%d)\n", i, > - nrow * ncol - 1); > - exit(1); > - } > - return data[i]; > -} > - > -template > -mematrix
mematrix
::operator+(DT toadd) > -{ > - mematrix
temp(nrow, ncol); > - for (int i = 0; i < nelements; i++) > - temp.data[i] = data[i] + toadd; > - return temp; > -} > - > -template > -mematrix
mematrix
::operator+(mematrix
&M) > -{ > - if (ncol != M.ncol || nrow != M.nrow) > - { > - fprintf(stderr, > - "mematrix+: matrices not equal in size (%d,%d) and (%d,%d)", > - nrow, ncol, M.nrow, M.ncol); > - exit(1); > - } > - mematrix
temp(nrow, ncol); > - for (int i = 0; i < nelements; i++) > - temp.data[i] = data[i] + M.data[i]; > - return temp; > -} > - > -template > -mematrix
mematrix
::operator-(DT toadd) > -{ > - mematrix
temp(nrow, ncol); > - for (int i = 0; i < nelements; i++) > - temp.data[i] = data[i] - toadd; > - return temp; > -} > - > -template > -mematrix
mematrix
::operator-(mematrix
&M) > -{ > - if (ncol != M.ncol || nrow != M.nrow) > - { > - fprintf(stderr, > - "mematrix-: matrices not equal in size (%d,%d) and (%d,%d)", > - nrow, ncol, M.nrow, M.ncol); > - exit(1); > - } > - mematrix
temp(nrow, ncol); > - for (int i = 0; i < nelements; i++) > - temp.data[i] = data[i] - M.data[i]; > - return temp; > -} > - > -template > -mematrix
mematrix
::operator*(DT toadd) > -{ > - // A che naschet std::string vmesto DT? Maksim. > - mematrix
temp(nrow, ncol); > - for (int i = 0; i < nelements; i++) > - temp.data[i] = data[i] * toadd; > - return temp; > -} > - > -template > -mematrix
mematrix
::operator*(mematrix
&M) > -{ > - if (ncol != M.nrow) > - { > - fprintf(stderr, "mematrix*: ncol != nrow (%d,%d) and (%d,%d)", nrow, > - ncol, M.nrow, M.ncol); > - exit(1); > - } > - mematrix
temp(nrow, M.ncol); > - for (int j = 0; j < temp.nrow; j++) > - { > - for (int i = 0; i < temp.ncol; i++) > - { > - DT sum = 0; > - for (int j1 = 0; j1 < ncol; j1++) > - sum += data[j * ncol + j1] * M.data[j1 * M.ncol + i]; > - temp[j * temp.ncol + i] = sum; > - } > - } > - return temp; > -} > - > -template > -mematrix
mematrix
::operator*(mematrix
*M) > -{ > - if (ncol != M->nrow) > - { > - fprintf(stderr, "mematrix*: ncol != nrow (%d,%d) and (%d,%d)", nrow, > - ncol, M->nrow, M->ncol); > - exit(1); > - } > - mematrix
temp(nrow, M->ncol); > - for (int j = 0; j < temp.nrow; j++) > - { > - for (int i = 0; i < temp.ncol; i++) > - { > - DT sum = 0; > - for (int j1 = 0; j1 < ncol; j1++) > - sum += data[j * ncol + j1] * M->data[j1 * M->ncol + i]; > - temp[j * temp.ncol + i] = sum; > - } > - } > - return temp; > -} > - > -// > -// operations > -// > -template > -void mematrix
::reinit(int nr, int nc) > -{ > - if (nelements > 0) > - delete[] data; > - if (nr <= 0) > - { > - fprintf(stderr, "mematrix(): number of rows smaller then 1\n"); > - exit(1); > - } > - if (nc <= 0) > - { > - fprintf(stderr, "mematrix(): number of columns smaller then 1\n"); > - exit(1); > - } > - nrow = nr; > - ncol = nc; > - nelements = nr * nc; > - data = new (nothrow) DT[ncol * nrow]; > - if (!data) > - { > - fprintf(stderr, "mematrix(nr,nc): cannot allocate memory (%d,%d)\n", > - nrow, ncol); > - exit(1); > - } > -} > - > -template > -DT mematrix
::get(int nr, int nc) > -{ > - if (nc < 0 || nc > ncol -1) > - { > - std::cerr << "mematrix::get: column out of range: " << nc + 1 > - << " not between (1," << ncol << ")\n" << std::flush; > - exit(1); > - } > - if (nr < 0 || nr > nrow -1) > - { > - std::cerr << "mematrix::get: row out of range: " << nr + 1 > - << " not between (1," << nrow << ")\n" << std::flush; > - exit(1); > - } > - DT temp = data[nr * ncol + nc]; > - return temp; > -} > - > -template > -void mematrix
::put(DT value, int nr, int nc) > -{ > - if (nc < 0 || nc > ncol -1) > - { > - std::cerr << "mematrix::put: column out of range: " << nc + 1 > - << " not between (1," << ncol << ")\n" << std::flush; > - exit(1); > - } > - if (nr < 0 || nr > nrow -1) > - { > - std::cerr << "mematrix::put: row out of range: " << nr + 1 > - << " not between (1," << nrow << ")\n" << std::flush; > - exit(1); > - } > - data[nr * ncol + nc] = value; > -} > - > -template > -DT mematrix
::column_mean(int nc) > -{ > - if (nc >= ncol || nc < 0) > - { > - fprintf(stderr, "colmM bad column\n"); > - exit(1); > - } > - DT out = 0.0; > - for (int i = 0; i < nrow; i++) > - out += DT(data[i * ncol + nc]); > - out /= DT(nrow); > - return out; > -} > - > -template > -void mematrix
::print(void) > -{ > - cout << "nrow=" << nrow << "; ncol=" << ncol << "; nelements=" << nelements > - << "\n"; > - for (int i = 0; i < nrow; i++) > - { > - cout << "nr=" << i << ":\t"; > - for (int j = 0; j < ncol; j++) > - { > - printf("%e\t", data[i * ncol + j]); > - } > - cout << "\n"; > - } > -} > - > -template > -void mematrix
::delete_column(const int delcol) > -{ > - if (delcol > ncol || delcol < 0) > - { > - fprintf(stderr, "mematrix::delete_column: column out of range\n"); > - exit(1); > - } > - mematrix
temp = *this; > - if (nelements > 0) > - delete[] data; > - ncol--; > - nelements = ncol * nrow; > - data = new (nothrow) DT[ncol * nrow]; > - if (!data) > - { > - fprintf(stderr, > - "mematrix::delete_column: cannot allocate memory (%d,%d)\n", > - nrow, ncol); > - delete[] data; > - exit(1); > - } > - int newcol = 0; > - for (int nr = 0; nr < temp.nrow; nr++) > - { > - newcol = 0; > - for (int nc = 0; nc < temp.ncol; nc++) > - if (nc != delcol) > - data[nr * ncol + (newcol++)] = temp[nr * temp.ncol + nc]; > - } > -} > - > -template > -void mematrix
::delete_row(const int delrow) > -{ > - if (delrow > nrow || delrow < 0) > - { > - fprintf(stderr, "mematrix::delete_row: row out of range\n"); > - exit(1); > - } > - mematrix
temp = *this; > - if (nelements > 0) > - delete[] data; > - nrow--; > - nelements = ncol * nrow; > - data = new (nothrow) DT[ncol * nrow]; > - if (!data) > - { > - fprintf(stderr, > - "mematrix::delete_row: cannot allocate memory (%d,%d)\n", nrow, > - ncol); > - delete[] data; > - exit(1); > - } > - int newrow = 0; > - for (int nc = 0; nc < temp.ncol; nc++) > - { > - newrow = 0; > - for (int nr = 0; nr < temp.nrow; nr++) > - if (nr != delrow) > - data[nr * ncol + (newrow++)] = temp[nr * temp.ncol + nc]; > - } > -} > - > -// > -// other functions > -// > -template > -mematrix
column_sum(mematrix
&M) > -{ > - mematrix
out; > - out.reinit(1, M.ncol); > - for (int j = 0; j < M.ncol; j++) > - { > - DT sum = 0; > - for (int i = 0; i < M.nrow; i++) > - sum = sum + DT(M.data[i * M.ncol + j]); > - out.put(sum, 0, j); > - } > - return out; > -} > - > -template > -mematrix
column_mean(mematrix
&M) > -{ > - mematrix
out; > - out.reinit(1, M.ncol); > - for (int j = 0; j < M.ncol; j++) > - { > - DT sum = 0; > - for (int i = 0; i < M.nrow; i++) > - sum = sum + DT(M.data[i * M.ncol + j]); > - sum /= DT(M.nrow); > - out.put(sum, 0, j); > - } > - return out; > -} > - > -template > -mematrix
transpose(mematrix
&M) > -{ > - mematrix
temp(M.ncol, M.nrow); > - for (int i = 0; i < temp.nrow; i++) > - for (int j = 0; j < temp.ncol; j++) > - temp.data[i * temp.ncol + j] = M.data[j * M.ncol + i]; > - return temp; > -} > - > -template > -mematrix
reorder(mematrix
&M, mematrix order) > -{ > - if (M.nrow != order.nrow) > - { > - std::cerr << "reorder: M & order have different # of rows\n"; > - exit(1); > - } > - mematrix
temp(M.nrow, M.ncol); > - for (int i = 0; i < temp.nrow; i++) > - for (int j = 0; j < temp.ncol; j++) > - temp.data[order[i] * temp.ncol + j] = M.data[i * M.ncol + j]; > - return temp; > -} > - > -template > -mematrix
productMatrDiag(mematrix
&M, mematrix
&D) > -{ > - //multiply all rows of M by value of first row of D > - if (M.ncol != D.nrow) > - { > - fprintf(stderr, "productMatrDiag: wrong dimenstions"); > - exit(1); > - } > - mematrix
temp(M.nrow, M.ncol); > - for (int i = 0; i < temp.nrow; i++){ > - for (int j = 0; j < temp.ncol; j++){ > - temp.data[i * temp.ncol + j] = M.data[i * M.ncol + j] * D.data[j]; > - // temp.put(M.get(i,j)*D.get(j,0),i,j); > - } > - } > - return temp; > -} > - > -template > -mematrix todouble(mematrix
&M) > -{ > - mematrix temp(M.nrow, M.ncol); > - for (int i = 0; i < temp.nelements; i++) > - temp.data[i] = double(M.data[i]); > - return temp; > -} > - > -template > -mematrix
productXbySymM(mematrix
&X, mematrix
&M) > -{ > - if (M.ncol < 1 || M.nrow < 1 || X.ncol < 1 || X.nrow < 1) > - { > - fprintf(stderr, > - "productXbySymM: M.ncol<1 || M.nrow<1 || X.ncol<1 || X.nrow < 1\n"); > - exit(1); > - } > - if (M.ncol != M.nrow) > - { > - fprintf(stderr, "productXbySymM: M.ncol != M.nrow\n"); > - exit(1); > - } > - if (M.ncol != X.ncol) > - { > - fprintf(stderr, "productXbySymM: M.ncol != X.ncol\n"); > - exit(1); > - } > - if (M.ncol != X.ncol) > - { > - fprintf(stderr, "productXbySymM: M.ncol != X.ncol\n"); > - exit(1); > - } > - > - mematrix
out(X.nrow, X.ncol); > - int i, j, k; > - > - double temp1, temp2, value1, value2; // not good should be of
! > - for (k = 0; k < X.nrow; k++) > - { > - temp1 = 0.; > - for (i = 0; i < X.ncol; i++) > - { > - temp1 = X.get(k, i); > - temp2 = 0.; > - for (j = (i + 1); j < X.ncol; j++) > - { > - value1 = out.get(k, j) + temp1 * M.get(i, j); > - out.put(value1, k, j); > - temp2 += M.get(i, j) * X.get(k, j); > - } > - value2 = out.get(k, i) + temp2 + M.get(i, i) * X.get(k, i); > - out.put(value2, k, i); > - } > - } > - > - return out; > -} > - > -// written by Mike Dinolfo 12/98 > -// modified Yurii Aulchenko 2008-04-22 > -template > -mematrix
invert(mematrix
&M) > -{ > - if (M.ncol != M.nrow) > - { > - fprintf(stderr, "invert: only square matrices possible\n"); > - exit(1); > - } > - if (M.ncol == 1) > - { > - mematrix
temp(1, 1); > - temp[0] = 1. / M[0]; > - } > - /* > - for (int i=0;i - if (M.data[i*M.ncol+i]==0) > - { > - fprintf(stderr,"invert: zero elements in diagonal\n"); > - mematrix
temp = M; > - for (int i = 0; i < M.ncol; i++) > - for (int j = 0; j < M.ncol; j++) > - temp.put(NAN,i,j); > - return temp; > - //exit(1); > - } > - */ > - int actualsize = M.ncol; > - int maxsize = M.ncol; > - mematrix
temp = M; > - for (int i = 1; i < actualsize; i++) > - temp.data[i] /= temp.data[0]; // normalize row 0 > - for (int i = 1; i < actualsize; i++) > - { > - for (int j = i; j < actualsize; j++) > - { // do a column of L > - DT sum = 0.0; > - for (int k = 0; k < i; k++) > - sum += temp.data[j * maxsize + k] * temp.data[k * maxsize + i]; > - temp.data[j * maxsize + i] -= sum; > - } > - if (i == actualsize - 1) > - continue; > - for (int j = i + 1; j < actualsize; j++) > - { // do a row of U > - DT sum = 0.0; > - for (int k = 0; k < i; k++) > - sum += temp.data[i * maxsize + k] * temp.data[k * maxsize + j]; > - temp.data[i * maxsize + j] = (temp.data[i * maxsize + j] - sum) > - / temp.data[i * maxsize + i]; > - } > - } > - for (int i = 0; i < actualsize; i++) // invert L > - for (int j = i; j < actualsize; j++) > - { > - DT x = 1.0; > - if (i != j) > - { > - x = 0.0; > - for (int k = i; k < j; k++) > - x -= temp.data[j * maxsize + k] > - * temp.data[k * maxsize + i]; > - } > - temp.data[j * maxsize + i] = x / temp.data[j * maxsize + j]; > - } > - for (int i = 0; i < actualsize; i++) // invert U > - for (int j = i; j < actualsize; j++) > - { > - if (i == j) > - continue; > - DT sum = 0.0; > - for (int k = i; k < j; k++) > - sum += temp.data[k * maxsize + j] > - * ((i == k) ? 1.0 : temp.data[i * maxsize + k]); > - temp.data[i * maxsize + j] = -sum; > - } > - for (int i = 0; i < actualsize; i++) // final inversion > - for (int j = 0; j < actualsize; j++) > - { > - DT sum = 0.0; > - for (int k = ((i > j) ? i : j); k < actualsize; k++) > - sum += ((j == k) ? 1.0 : temp.data[j * maxsize + k]) > - * temp.data[k * maxsize + i]; > - temp.data[j * maxsize + i] = sum; > - } > - return temp; > -} > - > -//_________Maksim____________ > -template > -DT var(mematrix
&M) > -{ > - DT sum = 0; > - for (int i = 0; i < M.nelements; i++) > - { > - sum += M.data[i]; > - } > - DT mean = sum / M.nelements; > - > - DT sum2 = 0; > - for (int i = 0; i < M.nelements; i++) > - { > - sum2 += pow(M.data[i] - mean, 2); > - } > - > - return sum2 / (M.nelements - 1); > -} > -//_________Maksim____________ > -#endif /* MEMATRI1_H */ > > Deleted: pkg/ProbABEL/src/mematrix.h > =================================================================== > --- pkg/ProbABEL/src/mematrix.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/mematrix.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -1,82 +0,0 @@ > -/* > - * > - * Copyright (C) 2009--2014 Various members of the GenABEL team. See > - * the SVN commit logs for more details. > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public License > - * as published by the Free Software Foundation; either version 2 > - * of the License, or (at your option) any later version. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - * > - * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, > - * MA 02110-1301, USA. 
> - * > - */ > - > - > -#ifndef __MEMATRIX_H__ > -#define __MEMATRIX_H__ > -#include > -using namespace std; > - > -template class mematrix > -{ > - public: > - int nrow; > - int ncol; > - int nelements; > - DT * data; > - > - mematrix() > - { > - nrow = ncol = nelements = 0; > - data = NULL; > - } > - mematrix(int nr, int nc); > - mematrix(const mematrix &M); > - ~mematrix() > - { > - if (nelements > 0) > - delete[] data; > - } > - > - mematrix & operator=(const mematrix &M); > - DT & operator[](int i); > - mematrix operator+(DT toadd); > - mematrix operator+(mematrix &M); > - mematrix operator-(DT toadd); > - mematrix operator-(mematrix &M); > - mematrix operator*(DT toadd); > - mematrix operator*(mematrix &M); > - mematrix operator*(mematrix *M); > - > - void reinit(int nr, int nc); > - > - unsigned int getnrow(void) > - { > - return nrow; > - } > - unsigned int getncol(void) > - { > - return ncol; > - } > - DT get(int nr, int nc); > - void put(DT value, int nr, int nc); > - DT column_mean(int nc); > - void print(void); > - void delete_column(const int delcol); > - void delete_row(const int delrow); > - > -}; > - > -// mematrix transpose(mematrix M); > -// mematrix invert(mematrix M); > - > -#endif > > Modified: pkg/ProbABEL/src/phedata.h > =================================================================== > --- pkg/ProbABEL/src/phedata.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/phedata.h 2014-04-23 18:54:26 UTC (rev 1695) > @@ -29,13 +29,8 @@ > #ifndef PHEDATA_H_ > #define PHEDATA_H_ > > -#if EIGEN > #include "eigen_mematrix.h" > #include "eigen_mematrix.cpp" > -#else > -#include "mematrix.h" > -#include "mematri1.h" > -#endif > > class phedata { > public: > > Modified: pkg/ProbABEL/src/reg1.cpp > =================================================================== > --- pkg/ProbABEL/src/reg1.cpp 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/reg1.cpp 2014-04-23 18:54:26 UTC (rev 1695) > @@ -310,7 +310,6 @@ > chi2_score = chi2[0]; > 
} > > - > void linear_reg::mmscore_regression(const mematrix& X, > const masked_matrix& W_masked, LDLT& Ch) { > VectorXd Y = reg_data.Y.data.col(0); > @@ -329,7 +328,6 @@ > beta.data = beta_vec; > } > > - > void linear_reg::logLikelihood(const mematrix& X) { > /* > loglik = 0.; > @@ -348,10 +346,8 @@ > //cout << endl; > loglik = 0.; > double halfrecsig2 = .5 / sigma2; > -#if EIGEN > //loglik -= halfrecsig2 * residuals[i] * residuals[i]; > > - > double intercept = beta.get(0, 0); > residuals.data = reg_data.Y.data.array() - intercept; > //matrix. > @@ -364,17 +360,7 @@ > //residuals[i] -= resid_sub; > loglik -= (residuals.data.array().square() * halfrecsig2).sum(); > loglik -= static_cast(reg_data.nids) * log(sqrt(sigma2)); > -#else > - for (int i = 0; i < reg_data.nids; i++) > - { > - double resid = reg_data.Y[i] - beta.get(0, 0); // intercept > - for (int j = 1; j < beta.nrow; j++){ > - resid -= beta.get(j, 0) * X.get(i, j); > - } > - residuals[i] = resid; > - loglik -= halfrecsig2 * resid * resid; > - } > -#endif > + > } > > > @@ -423,12 +409,8 @@ > > double sigma2_internal; > > -#if EIGEN > > LDLT Ch; > -#else > - mematrix tXX_i; > -#endif > if (invvarmatrixin.length_of_mask != 0) > { > //retrieve masked data W > @@ -440,26 +422,7 @@ > //flops=mp(2n-1) (when n is big enough flops=mpn2) > //Oct 26, 2009 > > -#if EIGEN > mmscore_regression(X, invvarmatrixin, Ch); > -#else > - // next line is 5997000 flops > - mematrix tXW = transpose(X) * invvarmatrixin.masked_data; > - tXX_i = tXW * X; // 17991 flops > - // use cholesky to invert > - cholesky2_mm(tXX_i, tol_chol); > - chinv2_mm(tXX_i); > - beta = tXX_i * (tXW * reg_data.Y); // flops 15+5997 > - // now compute residual variance > - sigma2 = 0.; > - //next line is: 1000+5000+= 6000 flops > - mematrix sigma2_matrix = reg_data.Y - (transpose(tXW) * beta); > - for (int i = 0; i < sigma2_matrix.nrow; i++) > - { > - double val = sigma2_matrix.get(i, 0); > - sigma2 += val * val; // flops: 3000 (iterations counted) > - } 
> -#endif > double N = X.nrow; > //sigma2_internal = sigma2 / (N - static_cast(length_beta)); > // Ugly fix to the fact that if we do mmscore, sigma2 is already > @@ -470,7 +433,7 @@ > } > else // NO mm-score regression : normal least square regression > { > -#if EIGEN > + > int m = X.ncol; > MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView().\ > rankUpdate(X.data.adjoint()); > @@ -478,23 +441,7 @@ > beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data); > sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm(); > > -#else > - mematrix tX = transpose(X); > - // use cholesky to invert > - tXX_i = tX * X; > - cholesky2_mm(tXX_i, tol_chol); > - chinv2_mm(tXX_i); > - beta = tXX_i * (tX * (reg_data.Y)); > > - // now compute residual variance > - sigma2 = 0.; > - mematrix sigma2_matrix = reg_data.Y - (X * beta); > - for (int i = 0; i < sigma2_matrix.nrow; i++) > - { > - double val = sigma2_matrix.get(i, 0); > - sigma2 += val * val; > - } > -#endif > double N = static_cast(X.nrow); > double P = static_cast(length_beta); > sigma2_internal = sigma2 / (N - P); > @@ -517,38 +464,19 @@ > //cout << endl; > logLikelihood(X); > > -#if EIGEN > MatrixXd tXX_inv = Ch.solve(MatrixXd(length_beta, length_beta). 
> Identity(length_beta, length_beta)); > -#endif > > mematrix robust_sigma2(X.ncol, X.ncol); > if (robust) > { > -#if EIGEN > MatrixXd Xresiduals = X.data.array().colwise()\ > *residuals.data.col(0).array(); > MatrixXd XbyR = MatrixXd(X.ncol, X.ncol).setZero()\ > .selfadjointView().rankUpdate(Xresiduals.adjoint()); > robust_sigma2.data = tXX_inv * XbyR * tXX_inv; > -#else > - > - mematrix XbyR = X; > - for (int i = 0; i < X.nrow; i++){ > - for (int j = 0; j < X.ncol; j++) > - { > - double tmpval = XbyR.get(i, j) * residuals[i]; > - XbyR.put(tmpval, i, j); > - } > - } > - XbyR = transpose(XbyR) * XbyR; > - robust_sigma2 = tXX_i * XbyR; > - robust_sigma2 = robust_sigma2 * tXX_i; > - > -#endif > } > //cout << "estimate 0\n"; > -#if EIGEN > if (robust) > { > sebeta.data = robust_sigma2.data.diagonal().array().sqrt(); > @@ -578,63 +506,6 @@ > offset).diagonal().array(); > } > > -#else > - > - //cout << "estimate 0\n"; > - for (int i = 0; i < (length_beta); i++) > - { > - if (robust) > - { > - // cout << "estimate :robust\n"; > - double value = sqrt(robust_sigma2.get(i, i)); > - sebeta.put(value, i, 0); > - //Han Chen > - if (i > 0) > - { > - if (model == 0 && interaction != 0 && ngpreds == 2 > - && length_beta > 2) > - { > - if (i > 1) > - { > - double covval = robust_sigma2.get(i, i - 2); > - covariance.put(covval, i - 2, 0); > - } > - } > - else > - { > - double covval = robust_sigma2.get(i, i - 1); > - covariance.put(covval, i - 1, 0); > - } > - } > - //Oct 26, 2009 > - } > - else > - { > - // cout << "estimate :non-robust\n"; > - double value = sqrt(sigma2_internal * tXX_i.get(i, i)); > - sebeta.put(value, i, 0); > - //Han Chen > - if (i > 0) > - { > - if (model == 0 && interaction != 0 && ngpreds == 2 > - && length_beta > 2) > - { > - if (i > 1) > - { > - double covval = sigma2_internal * tXX_i.get(i, i - 2); > - covariance.put(covval, i - 2, 0); > - } > - } > - else > - { > - double covval = sigma2_internal * tXX_i.get(i, i - 1); > - covariance.put(covval, i - 1, 
0); > - } > - } > - //Oct 26, 2009 > - } > - } > -#endif > } > > > > Modified: pkg/ProbABEL/src/regdata.h > =================================================================== > --- pkg/ProbABEL/src/regdata.h 2014-04-23 09:52:41 UTC (rev 1694) > +++ pkg/ProbABEL/src/regdata.h 2014-04-23 18:54:26 UTC (rev 1695) > [TRUNCATED] > > To get the complete diff run: > svnlook diff /svnroot/genabel -r 1695 > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 213 bytes Desc: OpenPGP digital signature URL: From Jurica.Stanojkovic at rt-rk.com Thu Apr 24 15:52:42 2014 From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic) Date: Thu, 24 Apr 2014 15:52:42 +0200 Subject: [GenABEL-dev] probabel big endian support Message-ID: <896-53591700-f-3be4eec0@227853676> Dear list, I have tried building package probabel on mips big endian. It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on little endian machine and are not working on big endian ones. I have tried to create them on big endian mips, and replace ones that came with source package with the ones that I have created. The package was built with new files without an error. 
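A minimal sketch of why files written on one byte order are misread on the other, assuming 4-byte IEEE 754 float elements stored as raw bytes. The function names `read_le_float` and `swap_bytes4` are hypothetical, chosen for illustration only; they are not part of filevector, DatABEL or ProbABEL. It also assumes that the host uses the same byte order for floats and integers, which holds on common platforms:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: decode a 32-bit float that was stored in
// little-endian byte order. The bytes are assembled with shifts, so the
// result does not depend on the byte order of the machine running this code.
float read_le_float(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                       | (static_cast<std::uint32_t>(bytes[1]) << 8)
                       | (static_cast<std::uint32_t>(bytes[2]) << 16)
                       | (static_cast<std::uint32_t>(bytes[3]) << 24);
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof(value));  // reinterpret the bit pattern
    return value;
}

// Hypothetical helper: reverse the byte order of one 4-byte element in
// place, which is what a big <-> little converter would do per element.
void swap_bytes4(unsigned char* bytes) {
    assert(bytes != nullptr);
    std::swap(bytes[0], bytes[3]);
    std::swap(bytes[1], bytes[2]);
}
```

For example, the value 1.0f has the bit pattern 0x3f800000, so a little-endian writer stores the bytes 00 00 80 3f and a big-endian writer stores 3f 80 00 00. A reader that copies those four bytes straight into a float only gets 1.0f back if its own byte order matches that of the writer.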
I used the following commands to create the files:

library(GenABEL)
library(DatABEL)
fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose",
                       mlinfo="./checks/inputfiles/test.mlinfo",
                       outfile="./checks/inputfiles/test.dose")
fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob",
                       mlinfo="./checks/inputfiles/test.mlinfo",
                       outfile="./checks/inputfiles/test.prob", isprob=TRUE)
mmdose <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose",
                       mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
                       outfile="./checks/inputfiles/mmscore_gen.dose")
mmprob <- mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob",
                       mlinfo="./checks/inputfiles/mmscore_gen.mlinfo",
                       outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE)

I am new to ProbABEL, GenABEL and DatABEL, so could someone please help me with the following questions:

What is the best course of action for supporting ProbABEL on big endian?
Should *.fvi, *.fvd files always be in little endian format (then DatABEL needs to be changed to always create little endian files)?
Or can *.fvd, *.fvi files be replaced with big endian files for a big endian build?

Is it necessary to be able to use *.fvd *.fvi files created on a different endian system?

I am willing to work on adding big endian support and I will appreciate any help in determining the right course of action in resolving this problem.

Regards,
Jurica

From lennart at karssen.org Sat Apr 26 22:17:38 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sat, 26 Apr 2014 22:17:38 +0200
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <896-53591700-f-3be4eec0@227853676>
References: <896-53591700-f-3be4eec0@227853676>
Message-ID: <535C1462.9090502@karssen.org>

Dear Jurica,

On 24-04-14 15:52, Jurica Stanojkovic wrote:
> Dear list,
>
> I have tried building package probabel on mips big endian.

That is great to hear!
As far as I know, none of the current developers have access to such a machine. > It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on > little endian machine and are not working on big endian ones. That is correct, we found out > > I have tried to create them on big endian mips, and replace ones that > came with source package with the ones that I have created. > The package was built with new files without an error. That is good news. So GenABEL and DatABEL work on big-endian machines. > > I used following command to create files: > library(GenABEL) > library(DatABEL) > fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.dose") > fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.prob", isprob=TRUE) > mmdose <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.dose") > mmprob <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) > > I am new to ProbABEL, GenABEL, DatABEL so could someone please help me > with following questions: > > What is the best course of action for supporting probabel on big endian? > Should *.fvi, *.fvd files allways be in little endian format (than > DatABEL needs to be changed to always create little endian files)? > Or can *.fvd, *.fvi files be replaced with big endian files for big > endian build? I would say that ideally the files need only to be created once and then usable on all systems. Especially since these files are usually large and converting from text format to .fvi/.fvd takes quite a while. 
This, however, would require diving into the filevector and the DatABEL code (filevector or libfilevector is the name of the 'backend' code in which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use that code when dealing with .fvi/.fvd files). I don't have very much experience with either code base, but could probably have a look and give you some pointers.

> Is it necessary to be able to use *.fvd *.fvi files created on a
> different endian system?

On the other hand, how often will people transfer these files to machines of different architectures? Jurica, can you tell us a bit more about why you are using a MIPS machine for your work with ProbABEL? And do you think it would be a common task to move these files between machines with different architectures at your site?

Maybe a converter from big to little endian and vice versa would be the easiest solution? I guess such a conversion can be done rather quickly. The downside would be that it (at least temporarily) requires double the disk space. Such a converter could be part of the fvutils and/or of DatABEL, for example.

> I am willing to work on adding big endian support and I will appreciate
> any help in determining the right course of action in resolving this
> problem.

Thank you for your time and willingness to help! It is very much appreciated. We're a small group of developers, but we'll try to help as much as we can.

Best,

Lennart.

> Regards,
> Jurica

From alvaro.frank at rwth-aachen.de Sun Apr 27 05:29:41 2014
From: alvaro.frank at rwth-aachen.de (Frank, Alvaro Jesus)
Date: Sun, 27 Apr 2014 03:29:41 +0000
Subject: [GenABEL-dev] probabel big endian support
In-Reply-To: <535C1462.9090502@karssen.org>
References: <896-53591700-f-3be4eec0@227853676>, <535C1462.9090502@karssen.org>
Message-ID: <244CF001646FF74FB34F372310A332C57AFBF2@MBX2.rwth-ad.de>

Hi all,

Would it not be better practice to handle this on load, e.g. by using the byte-order conversion functions described here:
http://man7.org/linux/man-pages/man3/endian.3.html

Just a remark.

-Alvaro

From lennart at karssen.org Sun Apr 27 21:48:07 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sun, 27 Apr 2014 21:48:07 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1698 - pkg/ProbABEL/src
In-Reply-To: <20140424185052.45B8018749C@r-forge.r-project.org>
References: <20140424185052.45B8018749C@r-forge.r-project.org>
Message-ID: <535D5EF7.4080100@karssen.org>

Thanks for splitting this into separate functions, Maarten. Could you try to add some basic Doxygen documentation for these functions? That would be a great help (even though the names of the functions already explain a lot) towards getting a well-documented code base for ProbABEL.

Lennart.
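As an illustration of what basic Doxygen documentation could look like, here is a minimal sketch on a stand-alone function. The function `plain_standard_errors` is hypothetical and uses `std::vector` instead of the Eigen types in reg1.cpp; it only mirrors the standard-error formula (square root of the residual variance times the diagonal of the inverse of X'X) so that the comment block has something concrete to describe:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

/**
 * \brief Compute plain (non-robust) standard errors of regression
 * coefficients.
 *
 * Each standard error is the square root of the residual variance
 * multiplied by the matching diagonal element of \f$(X^T X)^{-1}\f$.
 *
 * \param sigma2 Residual variance estimate; must be non-negative.
 * \param inv_diag Diagonal elements of the inverse of \f$X^T X\f$.
 * \return Vector with one standard error per coefficient.
 */
std::vector<double> plain_standard_errors(double sigma2,
                                          const std::vector<double>& inv_diag) {
    assert(sigma2 >= 0.0);
    std::vector<double> se(inv_diag.size());
    for (std::size_t i = 0; i < inv_diag.size(); ++i) {
        se[i] = std::sqrt(sigma2 * inv_diag[i]);
    }
    return se;
}
```

The `\brief`, `\param` and `\return` commands are the ones Doxygen picks up for the short description, the argument list and the return value; the same layout would apply to the member functions in reg1.h.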
On 24-04-14 20:50, noreply at r-forge.r-project.org wrote: > Author: maartenk > Date: 2014-04-24 20:50:51 +0200 (Thu, 24 Apr 2014) > New Revision: 1698 > > Modified: > pkg/ProbABEL/src/reg1.cpp > pkg/ProbABEL/src/reg1.h > Log: > -refactored linear_reg::estimate a bit to make it more readable. > > Modified: pkg/ProbABEL/src/reg1.cpp > =================================================================== > --- pkg/ProbABEL/src/reg1.cpp 2014-04-24 16:50:53 UTC (rev 1697) > +++ pkg/ProbABEL/src/reg1.cpp 2014-04-24 18:50:51 UTC (rev 1698) > @@ -327,6 +327,14 @@ > sigma2 = (Y - tXW * beta_vec).squaredNorm(); > beta.data = beta_vec; > } > +void linear_reg::LeastSquaredRegression(mematrix X,LDLT& Ch) { > + int m = X.ncol; > + MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView().rankUpdate( > + X.data.adjoint()); > + Ch = LDLT < MatrixXd > (txx.selfadjointView()); > + beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data); > + sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm(); > +} > > void linear_reg::logLikelihood(const mematrix& X) { > /* > @@ -364,6 +372,27 @@ > } > > > + > +void linear_reg::RobustSEandCovariance(mematrix X, mematrix robust_sigma2, > + MatrixXd tXX_inv, int offset) { > + MatrixXd Xresiduals = X.data.array().colwise() > + * residuals.data.col(0).array(); > + MatrixXd XbyR = > + MatrixXd(X.ncol, X.ncol).setZero().selfadjointView().rankUpdate( > + Xresiduals.adjoint()); > + robust_sigma2.data = tXX_inv * XbyR * tXX_inv; > + sebeta.data = robust_sigma2.data.diagonal().array().sqrt(); > + covariance.data = > + robust_sigma2.data.bottomLeftCorner(offset, offset).diagonal(); > +} > + > +void linear_reg::PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv, > + int offset) { > + sebeta.data = (sigma2_internal * tXX_inv.diagonal().array()).sqrt(); > + covariance.data = sigma2_internal > + * tXX_inv.bottomLeftCorner(offset, offset).diagonal().array(); > +} > + > void linear_reg::estimate(int verbose, double tol_chol, > int model, int 
interaction, int ngpreds, masked_matrix& invvarmatrixin, > int robust, int nullmodel) { > @@ -415,13 +444,6 @@ > { > //retrieve masked data W > invvarmatrixin.update_mask(reg_data.masked_data); > - > - // This regression is Weighted Least Square: used for mmscore : > - // FLOPS count are calculated for 3*1000 matrix as follow: > - //C=AB (m X n matrix A and n x P matrix B) > - //flops=mp(2n-1) (when n is big enough flops=mpn2) > - //Oct 26, 2009 > - > mmscore_regression(X, invvarmatrixin, Ch); > double N = X.nrow; > //sigma2_internal = sigma2 / (N - static_cast(length_beta)); > @@ -434,14 +456,7 @@ > else // NO mm-score regression : normal least square regression > { > > - int m = X.ncol; > - MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView().\ > - rankUpdate(X.data.adjoint()); > - Ch = LDLT (txx.selfadjointView()); > - beta.data = Ch.solve(X.data.adjoint() * reg_data.Y.data); > - sigma2 = (reg_data.Y.data - (X.data * beta.data)).squaredNorm(); > - > - > + LeastSquaredRegression(X,Ch); > double N = static_cast(X.nrow); > double P = static_cast(length_beta); > sigma2_internal = sigma2 / (N - P); > @@ -468,43 +483,22 @@ > Identity(length_beta, length_beta)); > > mematrix robust_sigma2(X.ncol, X.ncol); > - if (robust) > - { > - MatrixXd Xresiduals = X.data.array().colwise()\ > - *residuals.data.col(0).array(); > - MatrixXd XbyR = MatrixXd(X.ncol, X.ncol).setZero()\ > - .selfadjointView().rankUpdate(Xresiduals.adjoint()); > - robust_sigma2.data = tXX_inv * XbyR * tXX_inv; > - } > - //cout << "estimate 0\n"; > - if (robust) > - { > - sebeta.data = robust_sigma2.data.diagonal().array().sqrt(); > - } > - else > - { > - sebeta.data = > - (sigma2_internal > - * tXX_inv.diagonal().array()).sqrt(); > - } > - int offset = X.ncol- 1; > - //if additive and interaction and 2 predictors and more then 2 betas > > - if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){ > - offset = X.ncol - 2; > - } > > + int offset = X.ncol- 1; > + //if additive and 
interaction and 2 predictors and more then 2 betas > + if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){ > + offset = X.ncol - 2; > + } > + > if (robust) > { > - covariance.data = robust_sigma2.data.bottomLeftCorner( > - offset, offset).diagonal(); > + RobustSEandCovariance(X, robust_sigma2, tXX_inv, offset); > } > else > { > - covariance.data = sigma2_internal > - * tXX_inv.bottomLeftCorner(offset, > - offset).diagonal().array(); > - } > + PlainSEandCovariance(sigma2_internal, tXX_inv, offset); > + } > > } > > > Modified: pkg/ProbABEL/src/reg1.h > =================================================================== > --- pkg/ProbABEL/src/reg1.h 2014-04-24 16:50:53 UTC (rev 1697) > +++ pkg/ProbABEL/src/reg1.h 2014-04-24 18:50:51 UTC (rev 1698) > @@ -104,6 +104,11 @@ > void mmscore_regression(const mematrix& X, > const masked_matrix& W_masked, LDLT& Ch); > void logLikelihood(const mematrix& X); > + void LeastSquaredRegression(mematrix X,LDLT& Ch); > + void RobustSEandCovariance(mematrix X, mematrix robust_sigma2, > + MatrixXd tXX_inv, int offset); > + void PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv, > + int offset); > }; > > class logistic_reg: public base_reg { > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 213 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Sun Apr 27 22:30:31 2014 From: lennart at karssen.org (L.C. 
Karssen)
Date: Sun, 27 Apr 2014 22:30:31 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1700 - pkg/ProbABEL/src
In-Reply-To: <20140427090149.0A4651874B2@r-forge.r-project.org>
References: <20140427090149.0A4651874B2@r-forge.r-project.org>
Message-ID: <535D68E7.5000509@karssen.org>

Hi Maarten,

More clean-ups. Great! Some comments below.

On 27-04-14 11:01, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-04-27 11:01:42 +0200 (Sun, 27 Apr 2014)
> New Revision: 1700
>
> Removed:
>    pkg/ProbABEL/src/cholesky.cpp
>    pkg/ProbABEL/src/cholesky.h
> Modified:
>    pkg/ProbABEL/src/Makefile.am
>    pkg/ProbABEL/src/fvlib
>    pkg/ProbABEL/src/reg1.cpp
>    pkg/ProbABEL/src/reg1.h
> Log:
> -removed dependency of reg1.* on cholesky.* since this is now done with EIGEN (remove about 150 lines of code from our codebase)
> -removed cholesky.h and cholesky.cpp
> -added some consts to functions in reg1.*
> -removed some whitespace in reg1.cpp

Happy with the consts!

>
> Modified: pkg/ProbABEL/src/fvlib
> ===================================================================
> --- pkg/ProbABEL/src/fvlib 2014-04-25 06:26:38 UTC (rev 1699)
> +++ pkg/ProbABEL/src/fvlib 2014-04-27 09:01:42 UTC (rev 1700)
> @@ -1 +1 @@
> -link ../../../tags/filevector/v.1.0.0/fvlib
> \ No newline at end of file
> +link include/filevector/fvlib
> \ No newline at end of file

This is strange. You seem to have replaced the symlink for fvlib with one that points to a place that doesn't exist in the SVN tree. Probably a local thing. Can you revert this and point the symlink back to the v.1.0.0 tag of fvlib?
> > Modified: pkg/ProbABEL/src/reg1.cpp > =================================================================== > --- pkg/ProbABEL/src/reg1.cpp 2014-04-25 06:26:38 UTC (rev 1699) > +++ pkg/ProbABEL/src/reg1.cpp 2014-04-27 09:01:42 UTC (rev 1700) > @@ -275,10 +275,12 @@ > reg_data.is_interaction_excluded, false, nullmodel); > beta.reinit(X.ncol, 1); > sebeta.reinit(X.ncol, 1); > + int length_beta=X.ncol; Could you please add spaces around the = sign, according to the coding standards? > double N = static_cast(resid.nrow); > mematrix tX = transpose(X); > - if (invvarmatrix.length_of_mask != 0) > + if (invvarmatrix.length_of_mask != 0){ > tX = tX * invvarmatrix.masked_data; > + } > > mematrix u = tX * resid; > mematrix v = tX * X; > @@ -287,12 +289,16 @@ > csum = csum * (1. / N); > v = v - csum; > // use cholesky to invert > - mematrix v_i = v; > - cholesky2_mm(v_i, tol_chol); > - chinv2_mm(v_i); > + > + LDLT Ch = LDLT < MatrixXd > (v.data.selfadjointView()); I get the feeling here that you added too many spaces in this case :-). The < and > here are not operators. Thanks, Lennart. > // before was > // mematrix v_i = invert(v); > - beta = v_i * u; > + beta.data = Ch.solve(v.data.adjoint() * u.data); > + //TODO(maartenk): set size of v_i directly or remove mematrix class > + mematrix v_i = v; > + v_i.data = Ch.solve(MatrixXd(length_beta, length_beta). 
> + Identity(length_beta, length_beta)); > + > double sr = 0.; > double srr = 0.; > for (int i = 0; i < resid.nrow; i++) > @@ -327,7 +333,7 @@ > sigma2 = (Y - tXW * beta_vec).squaredNorm(); > beta.data = beta_vec; > } > -void linear_reg::LeastSquaredRegression(mematrix X,LDLT& Ch) { > +void linear_reg::LeastSquaredRegression(const mematrix& X, LDLT& Ch) { > int m = X.ncol; > MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView().rankUpdate( > X.data.adjoint()); > @@ -368,12 +374,11 @@ > //residuals[i] -= resid_sub; > loglik -= (residuals.data.array().square() * halfrecsig2).sum(); > loglik -= static_cast(reg_data.nids) * log(sqrt(sigma2)); > - > } > > > > -void linear_reg::RobustSEandCovariance(mematrix X, mematrix robust_sigma2, > +void linear_reg::RobustSEandCovariance(const mematrix &X, mematrix robust_sigma2, > MatrixXd tXX_inv, int offset) { > MatrixXd Xresiduals = X.data.array().colwise() > * residuals.data.col(0).array(); > @@ -386,7 +391,7 @@ > robust_sigma2.data.bottomLeftCorner(offset, offset).diagonal(); > } > > -void linear_reg::PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv, > +void linear_reg::PlainSEandCovariance(double sigma2_internal,const MatrixXd &tXX_inv, > int offset) { > sebeta.data = (sigma2_internal * tXX_inv.diagonal().array()).sqrt(); > covariance.data = sigma2_internal > @@ -438,7 +443,6 @@ > > double sigma2_internal; > > - > LDLT Ch; > if (invvarmatrixin.length_of_mask != 0) > { > @@ -481,10 +485,8 @@ > > MatrixXd tXX_inv = Ch.solve(MatrixXd(length_beta, length_beta). 
> Identity(length_beta, length_beta)); > - > mematrix robust_sigma2(X.ncol, X.ncol); > > - > int offset = X.ncol- 1; > //if additive and interaction and 2 predictors and more then 2 betas > if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){ > @@ -499,10 +501,8 @@ > { > PlainSEandCovariance(sigma2_internal, tXX_inv, offset); > } > - > } > > - > void linear_reg::score(mematrix& resid, > double tol_chol, int model, int interaction, int ngpreds, > const masked_matrix& invvarmatrix, int nullmodel) { > @@ -511,7 +511,6 @@ > invvarmatrix, nullmodel = 0); > } > > - > logistic_reg::logistic_reg(regdata& rdatain) { > reg_data = rdatain.get_unmasked_data(); > int length_beta = reg_data.X.ncol; > > Modified: pkg/ProbABEL/src/reg1.h > =================================================================== > --- pkg/ProbABEL/src/reg1.h 2014-04-25 06:26:38 UTC (rev 1699) > +++ pkg/ProbABEL/src/reg1.h 2014-04-27 09:01:42 UTC (rev 1700) > @@ -50,11 +50,9 @@ > #ifndef REG1_H_ > #define REG1_H_ > #include > -#include "cholesky.h" > #include "regdata.h" > #include "maskedmatrix.h" > > - > mematrix apply_model(mematrix& X, int model, int interaction, > int ngpreds, bool is_interaction_excluded, bool iscox = false, > int nullmodel = 0); > @@ -99,11 +97,10 @@ > const masked_matrix& W_masked, > LDLT& Ch); > void logLikelihood(const mematrix& X); > - void LeastSquaredRegression(mematrix X, LDLT& Ch); > - void RobustSEandCovariance(mematrix X, > - mematrix robust_sigma2, > - MatrixXd tXX_inv, int offset); > - void PlainSEandCovariance(double sigma2_internal, MatrixXd tXX_inv, > + void LeastSquaredRegression(const mematrix & X,LDLT& Ch); > + void RobustSEandCovariance(const mematrix & X, > + mematrix robust_sigma2, MatrixXd tXX_inv, int offset); > + void PlainSEandCovariance(double sigma2_internal, const MatrixXd & tXX_inv, > int offset); > }; > > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org 
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From kooyman at gmail.com Sun Apr 27 23:45:43 2014 From: kooyman at gmail.com (Maarten Kooyman) Date: Sun, 27 Apr 2014 23:45:43 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1702 - pkg/ProbABEL/src In-Reply-To: <20140427214409.C9E6D184AD5@r-forge.r-project.org> References: <20140427214409.C9E6D184AD5@r-forge.r-project.org> Message-ID: <535D7A87.5040908@gmail.com> Thanks! On 27-04-14 23:44, noreply at r-forge.r-project.org wrote: > Author: lckarssen > Date: 2014-04-27 23:44:09 +0200 (Sun, 27 Apr 2014) > New Revision: 1702 > > Modified: > pkg/ProbABEL/src/fvlib > Log: > Fixing the ProbABEL symlink to the v1.0.0 tag of filevector. This reverts the change introduced in r1700. > > > Modified: pkg/ProbABEL/src/fvlib > =================================================================== > --- pkg/ProbABEL/src/fvlib 2014-04-27 20:48:40 UTC (rev 1701) > +++ pkg/ProbABEL/src/fvlib 2014-04-27 21:44:09 UTC (rev 1702) > @@ -1 +1 @@ > -link include/filevector/fvlib > \ No newline at end of file > +link ../../../tags/filevector/v.1.0.0/fvlib > \ No newline at end of file From lennart at karssen.org Mon Apr 28 16:46:47 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 28 Apr 2014 16:46:47 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src In-Reply-To: <535E422F.4080402@gmail.com> References: <20140428094937.65E8B186FC6@r-forge.r-project.org> <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com> Message-ID: <535E69D7.1050005@karssen.org> [Translated from the Dutch original.] Hi Maarten, As you have probably noticed, I am subscribed to the commit list and try to review all commits and, where needed, provide critical comments. If you have time, you are of course completely free to do the same with my commits. I learn from that as well, and it hopefully shows that this kind of review is the normal course of business. Regards, Lennart. On 28-04-14 13:57, Maarten Kooyman wrote: > On 28-04-14 12:03, L.C. Karssen wrote: >> Hi Maarten, >> >> That's interesting. I assume you did this in response to bug #5658? > Yes. >> The change you made is only for ASCII input files, right? > Yes. > >> Any idea how >> this is treated in GenABEL's mach2databel() and impute2databel()? > I do not have an idea. Maybe check out the speed of reading those formats > and convert the strategy used in the trunk of ProbABEL. (Those are > generally only done once per dataset so it is not high on the priority > list.) >> I >> assume the NAs are converted to IEEE754 compatible NaN there, but I'm >> not sure. If that is the case, then this would fix that bug, right? > Assumption is the mother of all... But if you're sure, it is fixed. >> >> >> Lennart. > > Kind regards, > > Maarten > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From lennart at karssen.org Mon Apr 28 16:55:23 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 28 Apr 2014 16:55:23 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src In-Reply-To: <535E69D7.1050005@karssen.org> References: <20140428094937.65E8B186FC6@r-forge.r-project.org> <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com> <535E69D7.1050005@karssen.org> Message-ID: <535E6BDB.3000206@karssen.org> Dear non-Dutch speaking list members, Here's a short translation of the previous e-mail for those who don't speak Dutch :-). Dear Maarten (and others of course), As you must have noticed, I started to review commits. Please feel free to review my commits as well. I will learn from those reviews as well and hopefully these reviews indicate that this is normal procedure (from which I don't want to be exempt). Best, Lennart. On 28-04-14 16:46, L.C. Karssen wrote: > [Translated from the Dutch original.] > Hi Maarten, > > As you have probably noticed, I am subscribed to the commit list and > try to review all commits and, where needed, provide critical > comments. > If you have time, you are of course completely free to do the same with > my commits. I learn from that as well, and it hopefully shows > that this kind of review is the normal course of business. > > > > Regards, > > Lennart. > > On 28-04-14 13:57, Maarten Kooyman wrote: >> On 28-04-14 12:03, L.C. Karssen wrote: >>> Hi Maarten, >>> >>> That's interesting. I assume you did this in response to bug #5658? >> Yes. >>> The change you made is only for ASCII input files, right? >> Yes. 
>> >>> Any idea how >>> this is treated in GenABEL's mach2databel() and impute2databel()? >> I do not have an idea. Maybe check out the speed of reading those format >> and convert the strategy used in the trunk of ProABEL.(Those are >> generally only done once per dataset so it is not high on the priority >> list.) >>> I >>> assume the NAs are converted to IEEE754 compatible NaN there, but I'm >>> not sure. If that is the case, then this would fix that bug, right? >> Assumption is the mother of all... But if your sure, it is fixed. >>> >>> >>> Lennart. >> >> Kind regards, >> >> Maarten >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
From kooyman at gmail.com Mon Apr 28 20:39:26 2014 From: kooyman at gmail.com (Maarten Kooyman) Date: Mon, 28 Apr 2014 20:39:26 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1705 - pkg/ProbABEL/src In-Reply-To: <535E6BDB.3000206@karssen.org> References: <20140428094937.65E8B186FC6@r-forge.r-project.org> <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com> <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org> Message-ID: <535EA05E.40201@gmail.com> Dear all, I think GitHub makes code review easier. Please have a look at this pull request to get an impression: https://github.com/jquery/jquery/pull/1241/files I think we should reconsider our version control system: the current one is not up to today's usability standards. Bug tracking and branching are quite hard to use. Please have a look at github.com to get an impression of what is possible. Kind regards, Maarten On 28-04-14 16:55, L.C. Karssen wrote: > Dear non-Dutch speaking list members, > > Here's a short translation of the previous e-mail for those who don't > speak Dutch :-). > > > Dear Maarten (and others of course), > > As you must have noticed, I started to review commits. Please feel free > to review my commits as well. I will learn from those reviews as well > and hopefully these reviews indicate that this is normal procedure (from > which I don't want to be exempt). > > > Best, > > Lennart. > > On 28-04-14 16:46, L.C. Karssen wrote: >> [Translated from the Dutch original.] >> Hi Maarten, >> >> As you have probably noticed, I am subscribed to the commit list and >> try to review all commits and, where needed, provide critical >> comments. >> If you have time, you are of course completely free to do the same with >> my commits. I learn from that as well, and it hopefully shows >> that this kind of review is the normal course of business. >> >> >> >> Regards, >> >> Lennart. 
>> >> On 28-04-14 13:57, Maarten Kooyman wrote: >>> On 28-04-14 12:03, L.C. Karssen wrote: >>>> Hi Maarten, >>>> >>>> That's interesting. I assume you did this in response to bug #5658? >>> Yes. >>>> The change you made is only for ASCII input files, right? >>> Yes. >>> >>>> Any idea how >>>> this is treated in GenABEL's mach2databel() and impute2databel()? >>> I do not have an idea. Maybe check out the speed of reading those format >>> and convert the strategy used in the trunk of ProABEL.(Those are >>> generally only done once per dataset so it is not high on the priority >>> list.) >>>> I >>>> assume the NAs are converted to IEEE754 compatible NaN there, but I'm >>>> not sure. If that is the case, then this would fix that bug, right? >>> Assumption is the mother of all... But if your sure, it is fixed. >>>> >>>> Lennart. >>> Kind regards, >>> >>> Maarten >>> _______________________________________________ >>> Genabel-commits mailing list >>> Genabel-commits at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Mon Apr 28 22:09:49 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 28 Apr 2014 22:09:49 +0200 Subject: [GenABEL-dev] Proposal to move to Github (was: Re: [Genabel-commits] r1705 - pkg/ProbABEL/src) In-Reply-To: <535EA05E.40201@gmail.com> References: <20140428094937.65E8B186FC6@r-forge.r-project.org> <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com> <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org> <535EA05E.40201@gmail.com> Message-ID: <535EB58D.6010900@karssen.org> Dear Maarten, dear all, Moving to github... Hmm... That is quite a decision, so I've renamed the subject to better reflect the discussion. I've also dropped the older e-mails from the bottom of the thread. First off, are there any people that have experience with git and/or github? I've got some git experience (still learning), but no real experience with github. I agree with Maarten that SVN is showing its age. As he indicates things like branching are much easier in git. Moreover, since I'm travelling regularly being able to work without internet connection is a pro. On the other hand, moving to git (whether github or elsewhere) means leaving R-forge, which is our well-known infrastructure. Furthermore, such a move operation will cost quite some time, I guess. Moving all bugs, features, etc... If we decide to move we should plan well and not rush. And then the current developers will need to learn git if they don't already know how to use it. One thing I think we should definitely do is migrate slowly, package by package. Given that Maarten is positive about such a move and that I am in a bit of limbo but not fully against, it seems logical that ProbABEL is the first package to try such a migration. Looking forward to your comments! Lennart. 
On 28-04-14 20:39, Maarten Kooyman wrote: > Dear all, > > I think GitHub makes code review easier. Please have a look at this > pull request to get an impression: > https://github.com/jquery/jquery/pull/1241/files > > I think we should reconsider our version control system: the > current one is not up to today's usability standards. Bug tracking and > branching are quite hard to use. Please have a look at > github.com to get an impression of what is possible. > > Kind regards, > > Maarten > > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From Jurica.Stanojkovic at rt-rk.com Tue Apr 29 17:05:43 2014 From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic) Date: Tue, 29 Apr 2014 17:05:43 +0200 Subject: [GenABEL-dev] probabel big endian support In-Reply-To: <535C1462.9090502@karssen.org> Message-ID: <1897-535fc000-21-6a994800@159572789> Dear Karssen, >> What is the best course of action for supporting probabel on big endian? >> Should *.fvi, *.fvd files always be in little endian format (then >> DatABEL needs to be changed to always create little endian files)? >> Or can *.fvd, *.fvi files be replaced with big endian files for big >> endian build? >I would say that ideally the files need only to be created once and then >usable on all systems. Especially since these files are usually large >and converting from text format to .fvi/.fvd takes quite a while. If I had to change some values in the text format, would I have to generate the fvd/fvi files again? Does one have to change those files often when working with ProbABEL? If we byte-swap every value in the fvd/fvi file at run time, would that also be time consuming? 
I understand that the user then does not need to wait for the files to be generated again on big endian, but would the same task (run) take longer on a big-endian machine than on a little-endian one? >This, however, would require diving into the filevector and the DatABEL >code (filevector or libfilevector is the name of the 'backend' code in >which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use >that code when dealing with .fvi/.fvd files). I don't have very much >experience with either code base, but could probably have a look and >give you some pointers. I tried to work around this and got some results, but I did not manage to find every place in the code where an endian swap is needed. I am currently busy with other work, but I will soon look at this again. >Jurica, can you tell us a bit more about why you are using a MIPS >machine for your work with ProbABEL? And do you think it would be a >common task to move these files between machines with different >architectures at your site? I work on supporting mips/mipsel for Debian sid. I have access to mips and mipsel boards and can help with big-endian support. But I do not use ProbABEL actively. >Maybe a converter from big to little and vice versa would be the easiest >solution? I guess such a conversion can be done rather quick. The >downside would be that it (at least temporarily) requires double the >disk space. >Such a converter could be part of the fvutils and/or of DatABEL, for >example. Maybe this could be a good solution, presuming that this would be faster than just converting from text to fileVector format? I will have to look closer at how data is converted and written from text to fvd/fvi in order to be able to convert it to a different endianness. There is also the option to always create the fvd/fvi files in both endian formats, or to create some universal file that has the data in both endiannesses inside. 
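The byte-swap on load discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names (hostIsLittleEndian, swapElementsInPlace, decodeFloats) are hypothetical and are not part of filevector, DatABEL or ProbABEL, and the fileIsLittleEndian flag stands in for whatever way the byte order of a file would be recorded or detected. The swap works on the raw char buffer before any cast, so it handles float and double alike.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: detect the byte order of the machine we run on.
bool hostIsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

// Hypothetical helper: reverse the bytes of every element in a raw buffer.
// elemSize is 4 for float and 8 for double; the buffer is swapped in place,
// so no second copy of the data is needed.
void swapElementsInPlace(char* buffer, std::size_t nElements,
                         std::size_t elemSize) {
    assert(elemSize > 0);
    for (std::size_t i = 0; i < nElements; ++i) {
        char* first = buffer + i * elemSize;
        std::reverse(first, first + elemSize);
    }
}

// Hypothetical helper: turn raw bytes read from a file into floats,
// swapping only when the file's byte order differs from the host's.
std::vector<float> decodeFloats(std::vector<char> raw,
                                bool fileIsLittleEndian) {
    const std::size_t n = raw.size() / sizeof(float);
    std::vector<float> values(n);
    if (n == 0) {
        return values;
    }
    if (fileIsLittleEndian != hostIsLittleEndian()) {
        swapElementsInPlace(raw.data(), n, sizeof(float));
    }
    // Copy the bytes instead of casting the char pointer to float*.
    std::memcpy(values.data(), raw.data(), n * sizeof(float));
    return values;
}
```

The swap is a single linear pass over data that has just been read from disk, so its cost is probably small compared to the read itself, but that would need to be measured on real hardware. Note that swapping a buffer twice gives back the original bytes, which is also what a standalone big-to-little converter would rely on.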
Regards, Jurica -------- Original Message -------- Subject: Re: [GenABEL-dev] probabel big endian support Date: Saturday, April 26, 2014 22:17 CEST From: "L.C. Karssen" To: genabel-devel at lists.r-forge.r-project.org References: <896-53591700-f-3be4eec0 at 227853676> Dear Jurica, On 24-04-14 15:52, Jurica Stanojkovic wrote: > Dear list, > > I have tried building package probabel on mips big endian. That is great to hear! As far as I know, none of the current developers have access to such a machine. > It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on > little endian machine and are not working on big endian ones. That is correct, we found out > > I have tried to create them on big endian mips, and replace ones that > came with source package with the ones that I have created. > The package was built with new files without an error. That is good news. So GenABEL and DatABEL work on big-endian machines. > > I used following command to create files: > library(GenABEL) > library(DatABEL) > fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.dose") > fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.prob", isprob=TRUE) > mmdose <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.dose") > mmprob <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) > > I am new to ProbABEL, GenABEL, DatABEL so could someone please help me > with following questions: > > What is the best course of action for supporting probabel on big endian? 
> Should *.fvi, *.fvd files allways be in little endian format (than > DatABEL needs to be changed to always create little endian files)? > Or can *.fvd, *.fvi files be replaced with big endian files for big > endian build? I would say that ideally the files need only to be created once and then usable on all systems. Especially since these files are usually large and converting from text format to .fvi/.fvd takes quite a while. This, however, would require diving into the filevector and the DatABEL code (filevector or libfilevector is the name of the 'backend' code in which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use that code when dealing with .fvi/.fvd files). I don't have very much experience with either code base, but could probably have a look and give you some pointers. > > Is it necessary to be able to use *.fvd *.fvi files created on a > different endian system? On the other hand, how often will people transfer these files to machines of different architectures? Jurica, can you tell us a bit more about why you are using a MIPS machine for your work with ProbABEL? And do you think it would be a common task to move these files between machines with different architectures at your site? Maybe a converter from big to little and vice versa would be the easiest solution? I guess such a conversion can be done rather quick. The downside would be that it (at least temporarily) requires double the disk space. Such a converter could be part of the fvutils and/or of DatABEL, for example. > > I am willing to work on adding big endian support and I will appreciate> any help in determining the right course of action in resolving this > problem. Thank you for your time and willingness to help! It is very much appreciated. We're a small group of developers, but we'll try to help as much as we can. Best, Lennart. 
> > Regards, > Jurica > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From Jurica.Stanojkovic at rt-rk.com Tue Apr 29 17:12:11 2014 From: Jurica.Stanojkovic at rt-rk.com (Jurica Stanojkovic) Date: Tue, 29 Apr 2014 17:12:11 +0200 Subject: [GenABEL-dev] probabel big endian support In-Reply-To: <244CF001646FF74FB34F372310A332C57AFBF2@MBX2.rwth-ad.de> Message-ID: <4061-535fc180-39-66293300@143043581> Hi Alvaro, >Hi all, > >would it not be better practice to handle this on load, i.e: using this: http://man7.org/linux/man-pages/man3/endian.3.html > >Just a remark. > >-Alvaro I have tried that approach. It is OK for the fileHeader, but the data in the *.fvi and *.fvd files is float and can also be double, so we need a byte-swap for float and double. I had some results with this, but I did not find every place in the source where a byte-swap is needed. I was not sure that it is enough to byte-swap the data on read only, since blockWriteOrRead could also be used for writing. During the read process the data is read with file.read as char* and then cast to other types. Regards, Jurica -------- Original Message -------- Subject: Re: [GenABEL-dev] probabel big endian support Date: Sunday, April 27, 2014 05:29 CEST From: "Frank, Alvaro Jesus" To: "L.C. Karssen" ,"genabel-devel at lists.r-forge.r-project.org" References: <896-53591700-f-3be4eec0 at 227853676>, <535C1462.9090502 at karssen.org> Hi all, would it not be better practice to handle this on load, i.e: using this: http://man7.org/linux/man-pages/man3/endian.3.html Just a remark. 
-Alvaro ________________________________________ From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C. Karssen [lennart at karssen.org] Sent: Saturday, April 26, 2014 10:17 PM To: genabel-devel at lists.r-forge.r-project.org Subject: Re: [GenABEL-dev] probabel big endian support Dear Jurica, On 24-04-14 15:52, Jurica Stanojkovic wrote: > Dear list, > > I have tried building package probabel on mips big endian. That is great to hear! As far as I know, none of the current developers have access to such a machine. > It looks like that inputfiles/*.fvd and inputfiles/*.fvi are created on > little endian machine and are not working on big endian ones. That is correct, we found out > > I have tried to create them on big endian mips, and replace ones that > came with source package with the ones that I have created. > The package was built with new files without an error. That is good news. So GenABEL and DatABEL work on big-endian machines. > > I used following command to create files: > library(GenABEL) > library(DatABEL) > fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.dose") > fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", > mlinfo="./checks/inputfiles/test.mlinfo", > outfile="./checks/inputfiles/test.prob", isprob=TRUE) > mmdose <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.dose") > mmprob <- > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", > outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) > > I am new to ProbABEL, GenABEL, DatABEL so could someone please help me > with following questions: > > What is the best course of action for supporting probabel on big endian? 
> Should *.fvi, *.fvd files allways be in little endian format (than > DatABEL needs to be changed to always create little endian files)? > Or can *.fvd, *.fvi files be replaced with big endian files for big > endian build? I would say that ideally the files need only to be created once and then usable on all systems. Especially since these files are usually large and converting from text format to .fvi/.fvd takes quite a while. This, however, would require diving into the filevector and the DatABEL code (filevector or libfilevector is the name of the 'backend' code in which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use that code when dealing with .fvi/.fvd files). I don't have very much experience with either code base, but could probably have a look and give you some pointers. > > Is it necessary to be able to use *.fvd *.fvi files created on a > different endian system? On the other hand, how often will people transfer these files to machines of different architectures? Jurica, can you tell us a bit more about why you are using a MIPS machine for your work with ProbABEL? And do you think it would be a common task to move these files between machines with different architectures at your site? Maybe a converter from big to little and vice versa would be the easiest solution? I guess such a conversion can be done rather quick. The downside would be that it (at least temporarily) requires double the disk space. Such a converter could be part of the fvutils and/or of DatABEL, for example. > > I am willing to work on adding big endian support and I will appreciate > any help in determining the right course of action in resolving this > problem. Thank you for your time and willingness to help! It is very much appreciated. We're a small group of developers, but we'll try to help as much as we can. Best, Lennart. 
> > Regards, > Jurica > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From yurii.aulchenko at gmail.com Wed Apr 30 15:13:55 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Wed, 30 Apr 2014 20:13:55 +0700 Subject: [GenABEL-dev] Proposal to move to Github (was: Re: [Genabel-commits] r1705 - pkg/ProbABEL/src) In-Reply-To: <535EB58D.6010900@karssen.org> References: <20140428094937.65E8B186FC6@r-forge.r-project.org> <535E2774.6030606@karssen.org> <535E422F.4080402@gmail.com> <535E69D7.1050005@karssen.org> <535E6BDB.3000206@karssen.org> <535EA05E.40201@gmail.com> <535EB58D.6010900@karssen.org> Message-ID: > On 29 Apr 2014, at 03:09, "L.C. Karssen" wrote: > > Dear Maarten, dear all, > > Moving to github... Hmm... That is quite a decision, so I've renamed the > subject to better reflect the discussion. I've also dropped the older > e-mails from the bottom of the thread. > > First off, are there any people that have experience with git and/or > github? I've got some git experience (still learning), but no real > experience with github. I have some experience and would be comfortable with either. > > I agree with Maarten that SVN is showing its age. As he indicates things > like branching are much easier in git. Moreover, since I'm travelling > regularly being able to work without internet connection is a pro. 
> > On the other hand, moving to git (whether github or elsewhere) means > leaving R-forge, which is our well-known infrastructure. Furthermore, > such a move operation will cost quite some time, I guess. Moving all > bugs, features, etc... If we decide to move we should plan well and not > rush. And then the current developers will need to learn git if they > don't already know how to use it. Moving code first and keep tracker for a while? Can we 'close' tracker later and provide the link to new things on old pages? > > One thing I think we should definitely do is migrate slowly, package by > package. Given that Maarten is positive about such a move and that I am > in a bit of limbo but not fully against, it seems logical that ProbABEL > is the first package to try such a migration. Totally agree. If Maarten is positive about git(hub) I have nothing against. But we do need to plan carefully and make everything possible so as not to affect (in a bad way) the end user. Yurii > > > Looking forward to your comments! > > > Lennart. > > >> On 28-04-14 20:39, Maarten Kooyman wrote: >> Dear all, >> >> I think it is easier to use for code review github: >> >> Please check to get a impression >> :https://github.com/jquery/jquery/pull/1241/files >> >> I think we should reconsider an other the software version system: the >> current system is not up to date to current usability. Bug tracking and >> branching is quite hard in terms of usability. Please have a look at >> github.com to get a impression what is possible. >> >> Kind regards, >> >> Maarten > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From yurii.aulchenko at gmail.com Wed Apr 30 15:25:26 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Wed, 30 Apr 2014 20:25:26 +0700 Subject: [GenABEL-dev] [genabel-Bugs][5658] Missing genetic data cannot be coded as NA or N as mentioned in the manual In-Reply-To: <20140427224716.EB8E51851A7@r-forge.r-project.org> References: <20140427224716.EB8E51851A7@r-forge.r-project.org> Message-ID: Agree, fix the manual :) ---------------- Sent from mobile device, please excuse possible typos > On 28 Apr 2014, at 05:47, wrote: > > Bugs item #5658, was opened at 2014-04-28 00:46 by Lennart Karssen > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5658&group_id=505 > > Status: Open > Priority: 3 > Submitted By: Lennart Karssen (lckarssen) > Assigned to: Nobody (None) > Summary: Missing genetic data cannot be coded as NA or N as mentioned in the manual > Resolution: Accepted As Bug > Operating System: All > Severity: normal > Hardware: All > Version: PA v0.4.3 > Component: ProbABEL > URL: http://forum.genabel.org/viewtopic.php?f=10&t=871 > > > Initial Comment: > Thanks to user jal on the forum for reporting this bug. From his post: > > There are missing genotypes in my dosage file, while missing values were coded as "NA". My palogist run is aborted with error message: > Reading genotype data... No digits were found while reading genetic data (individual 5, position 1) > where "individual 5, position 1" is the location where the first missing value "NA" appears. 
The ProbABEL manual says missing values can be coded as "NA", "NaN" or "N", but it seems "NA" and "N" do not work in my case. > > > We use the standard C function strtod() to convert the genetic data from text to numbers. I did a quick check and strtod() only accepts "NaN", "NAN", "nan", not "NA" or "N". My guess is that the 'nan' variations are the only ones defined for floating point numbers (IEEE 754). > > We should decide whether we want to change the manual or change the code. Changing the manual has my preference, because changing the code would make reading data slower. > > > ---------------------------------------------------------------------- > > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5658&group_id=505 From lennart at karssen.org Wed Apr 30 18:04:00 2014 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 30 Apr 2014 18:04:00 +0200 Subject: [GenABEL-dev] probabel big endian support In-Reply-To: <4061-535fc180-39-66293300@143043581> References: <4061-535fc180-39-66293300@143043581> Message-ID: <53611EF0.70608@karssen.org> Dear Alvaro, Jurica, On 29-04-14 17:12, Jurica Stanojkovic wrote: > Hi Alvaro, > >>Hi all, >> >>would it not be better practice to handle this on load, i.e. using > this: http://man7.org/linux/man-pages/man3/endian.3.html >> >>Just a remark. >> >>-Alvaro > > I have tried that approach, it is OK for fileHeader, but there is data > in *fvi, *fvd files that is float and can be double. That is correct. As far as I know, filevector was developed to be data type agnostic (at least for the standard data types like int, float and double). > For that we need a byte-swap for float and double. > I had some results with this, but I did not find every place in the > source where a byte-swap is needed. > I was not sure whether it is enough to just byte-swap data on read, > blockWriteOrRead could also be used for writing. 
> During the read process data is read with file.read like char* and then > cast to other values. I noticed that too, and actually, I don't really understand why, because the type of the data is stored in the header of a filevector file as well (see fvlib/const.h for the types and fvlib/frutil.h for the definition of the header). I wasn't part of the filevector development, so I don't know the exact considerations at that time. In ProbABEL (gendata.cpp) the filevector data are read using the ReadVariableAs() function (fvlib/AbstractMatrix.h), which performs the cast. I haven't checked, but maybe there's a better function in fvlib for reading the data into ProbABEL. Best, Lennart. > > Regard, > Jurica > > -------- Original Message -------- > Subject: Re: [GenABEL-dev] probabel big endian support > Date: Sunday, April 27, 2014 05:29 CEST > From: "Frank, Alvaro Jesus" > To: "L.C. Karssen" > ,"genabel-devel at lists.r-forge.r-project.org" > References: <896-53591700-f-3be4eec0 at 227853676>, > <535C1462.9090502 at karssen.org> > > > >> Hi all, >> >> would it not be better practice to handle this on load, i.e: using >> this: http://man7.org/linux/man-pages/man3/endian.3.html >> >> Just a remark. >> >> -Alvaro >> ________________________________________ >> From: genabel-devel-bounces at lists.r-forge.r-project.org >> [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C. >> Karssen [lennart at karssen.org] >> Sent: Saturday, April 26, 2014 10:17 PM >> To: genabel-devel at lists.r-forge.r-project.org >> Subject: Re: [GenABEL-dev] probabel big endian support >> >> Dear Jurica, >> >> On 24-04-14 15:52, Jurica Stanojkovic wrote: >> > Dear list, >> > >> > I have tried building package probabel on mips big endian. >> >> That is great to hear! As far as I know, none of the current developers >> have access to such a machine. 
>> >> > It looks like inputfiles/*.fvd and inputfiles/*.fvi are created on a >> > little endian machine and are not working on big endian ones. >> >> That is correct, we found out >> >> > >> > I have tried to create them on big endian mips, and replace the ones that >> > came with the source package with the ones that I have created. >> > The package was built with the new files without an error. >> >> That is good news. So GenABEL and DatABEL work on big-endian machines. >> >> > >> > I used the following commands to create the files: >> > library(GenABEL) >> > library(DatABEL) >> > fvdose <- mach2databel(imputedg="./checks/inputfiles/test.mldose", >> > mlinfo="./checks/inputfiles/test.mlinfo", >> > outfile="./checks/inputfiles/test.dose") >> > fvprob <- mach2databel(imputedg="./checks/inputfiles/test.mlprob", >> > mlinfo="./checks/inputfiles/test.mlinfo", >> > outfile="./checks/inputfiles/test.prob", isprob=TRUE) >> > mmdose <- >> > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mldose", >> > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", >> > outfile="./checks/inputfiles/mmscore_gen.dose") >> > mmprob <- >> > mach2databel(imputedg="./checks/inputfiles/mmscore_gen.mlprob", >> > mlinfo="./checks/inputfiles/mmscore_gen.mlinfo", >> > outfile="./checks/inputfiles/mmscore_gen.prob", isprob=TRUE) >> > >> > I am new to ProbABEL, GenABEL, DatABEL so could someone please help me >> > with the following questions: >> > >> > What is the best course of action for supporting probabel on big endian? >> > Should *.fvi, *.fvd files always be in little endian format (then >> > DatABEL needs to be changed to always create little endian files)? >> > Or can *.fvd, *.fvi files be replaced with big endian files for a big >> > endian build? >> >> I would say that ideally the files need to be created only once and are then >> usable on all systems. Especially since these files are usually large >> and converting from text format to .fvi/.fvd takes quite a while. 
>> >> This, however, would require diving into the filevector and the DatABEL >> code (filevector or libfilevector is the name of the 'backend' code in >> which the .fvd/.fvi files are 'defined'; both DatABEL and ProbABEL use >> that code when dealing with .fvi/.fvd files). I don't have very much >> experience with either code base, but could probably have a look and >> give you some pointers. >> >> > >> > Is it necessary to be able to use *.fvd *.fvi files created on a >> > different endian system? >> >> On the other hand, how often will people transfer these files to >> machines of different architectures? >> >> Jurica, can you tell us a bit more about why you are using a MIPS >> machine for your work with ProbABEL? And do you think it would be a >> common task to move these files between machines with different >> architectures at your site? >> >> Maybe a converter from big to little endian and vice versa would be the easiest >> solution? I guess such a conversion can be done rather quickly. The >> downside would be that it (at least temporarily) requires double the >> disk space. >> Such a converter could be part of the fvutils and/or of DatABEL, for >> example. >> >> > >> > I am willing to work on adding big endian support and I will appreciate >> > any help in determining the right course of action in resolving this >> > problem. >> >> Thank you for your time and willingness to help! It is very much >> appreciated. We're a small group of developers, but we'll try to help as >> much as we can. >> >> >> Best, >> >> Lennart. >> >> > >> > Regards, >> > Jurica >> > >> > >> > _______________________________________________ >> > genabel-devel mailing list >> > genabel-devel at lists.r-forge.r-project.org >> > >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > >> >> -- >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >> L.C. 
Karssen >> Utrecht >> The Netherlands >> >> lennart at karssen.org >> http://blog.karssen.org >> GPG key ID: A88F554A >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
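The "byte-swap on read" idea from this thread could look roughly like the C++ sketch below. It rests on two assumptions that the thread does not confirm: that the .fvd payload is stored in little-endian order (as on the machines where the test files were created), and that float and double use the IEEE 754 layout on both machines. The helper names host_is_little_endian() and from_little_endian() are hypothetical; they are not part of filevector or ProbABEL.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: true when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: decode sizeof(T) raw bytes that were written in
// little-endian order (an assumption about the file format) into a host value.
// The bytes are reversed in a plain byte buffer *before* they are copied into
// T, so a swapped bit pattern is never held in a floating point variable.
template <typename T>
T from_little_endian(const char* raw) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain data types such as int, float and double");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, raw, sizeof(T));
    if (!host_is_little_endian()) {
        std::reverse(bytes, bytes + sizeof(T));
    }
    T value;
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```

Because the swap works on raw bytes and is templated on the element type, the same routine would cover int, float and double, which fits the remark that filevector is meant to be data type agnostic. A buffer read with file.read() as char* could then be decoded element by element, for example from_little_endian<float>(buffer + i * sizeof(float)). Writing would need the mirror operation, which is not shown here.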