From yurii.aulchenko at gmail.com Sat Mar 1 15:37:23 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Sat, 1 Mar 2014 21:37:23 +0700 Subject: [GenABEL-dev] Error wil testing ProbABEL 0.4.2 In-Reply-To: <5310FDD8.6000705@gmail.com> References: <5310DAA3.4020405@mail.nih.gov> <5310FDD8.6000705@gmail.com> Message-ID: <9A24D048-1967-405B-9588-F462CE33663F@gmail.com> Hi Maarten, I think you did not press "reply to all"; hence this message only went to the list, not the original sender (unless he is subscribed, which I doubt), hence no answer.., :) Yurii ---------------- Sent from mobile device, please excuse possible typos > On 01 Mar 2014, at 04:21, Maarten Kooyman wrote: > > Dear Jean, > > Can you please send the output of ./configure and make. This helps to recreate the error and inspect the source of the problem. > > Kind regards, > > Maarten > >> On 28-02-14 19:51, Jean Mao wrote: >> Hi, >> >> Thank you very much for providing this software to scientific community. We have installed it in our cluster for NIH scientists. >> >> Recently, while trying to update to 0.4.2 version, I ran into error message when testing the installation. Attached is the log file. Any help will be appreciated. Thank you. >> >> Jean Mao >> Helix Staff >> CIT, NIH >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Mon Mar 3 09:52:08 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 03 Mar 2014 09:52:08 +0100 Subject: [GenABEL-dev] Error wil testing ProbABEL 0.4.2 In-Reply-To: <5310DAA3.4020405@mail.nih.gov> References: <5310DAA3.4020405@mail.nih.gov> Message-ID: <531442B8.5090105@karssen.org> Dear Jean, Thank you for your interest in ProbABEL. From the output you sent us, it seems that one of the checks fails in which the output of various regressions in R is compared to ProbABEL output. (The XFAIL part is an expected failure, so only the check for pacoxph fails). In order to further diagnose the problem, we need some more information. Which Linux distribution are you using? Which version of R is installed on your machine? Could you send us the config.log file that was created when you ran ./configure? One last thing that comes to mind: ./configure tests for the existence of R, but it doesn't check whether you have the 'survival' R package installed. I guess that is the problem here. Best, Lennart. On 28-02-14 19:51, Jean Mao wrote: > Hi, > > Thank you very much for providing this software to scientific community. > We have installed it in our cluster for NIH scientists. > > Recently, while trying to update to 0.4.2 version, I ran into error > message when testing the installation. Attached is the log file. Any > help will be appreciated. Thank you. > > Jean Mao > Helix Staff > CIT, NIH > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From ldimitro at wakehealth.edu Sun Mar 2 01:10:12 2014 From: ldimitro at wakehealth.edu (Latchezar (Lucho) Dimitrov) Date: Sun, 2 Mar 2014 00:10:12 +0000 Subject: [GenABEL-dev] make check 2 PASS 4 FAIL Message-ID: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu> Dear genABEL developers, I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on ORACLE Solaris 10 x86 but when I ran make check I got the subj. results. I went and manually ran one of the failing checks: run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ "pacoxph check: dose vs. prob" -I SNP diff: two filename arguments required pacoxph check: dose vs. prob FAILED Then I manually compared the two fails: diff coxph_dose_add.out.txt coxph_prob_add.out.txt 1c1 < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add chi2_SNP --- > name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 Finally, I compared the two files w/o their first lines and they are the same. Any help highly appreciated. Please find the log file attached. Thank you very much, Latchezar (Lucho) "Speaking w/ computers" Dimitrov Analyst/Programmer IV, Center for Genomics and Personalized Medicine Research Wake Forest University School of Medicine fax: (336)713-7566 Medical Center Blvd. work: (336)713-7137 Winston-Salem, NC 27157 -- A computer lets you make more mistakes faster than any invention in human history -- with the possible exceptions of handguns and tequila. --Mitch Ratliffe, "Technology Review" -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test-suite.log Type: application/octet-stream Size: 2999 bytes Desc: test-suite.log URL: From lennart at karssen.org Mon Mar 3 10:13:15 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 03 Mar 2014 10:13:15 +0100 Subject: [GenABEL-dev] make check 2 PASS 4 FAIL In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu> References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu> Message-ID: <531447AB.6090405@karssen.org> Dear Lucho, Thanks for your interest in ProbABEL. I think you are one of the (very?) few users using ProbABEL on Solaris, so we are very interested in your feedback. Could you send us the config.log file created when running ./configure? That may give us some more hints on how your system is configured. My first hunch is that the diff utility in Solaris has some different options from the GNU version. When comparing the outputs from dosage inputs with probability input files the checks use the -I option to ignore the header line. Does your version of diff have that option? My knowledge of Solaris is a bit rusty, but I seem to remember that some of the GNU tools are available (or at least in principle installable) on Solaris. I think they are then prefixed with a g. Do you have gdiff on your system (maybe in /usr/bin/ or /usr/sfw/bin/)? Best regards, Lennart Karssen. On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote: > Dear genABEL developers, > > I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on ORACLE Solaris 10 x86 but when I ran > > make check > > I got the subj. results. I went and manually ran one of the failing check: > > run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ > "pacoxph check: dose vs. prob" -I SNP > diff: two filename arguments required > pacoxph check: dose vs. 
prob FAILED > > Then I manually compared the two fails: > > diff coxph_dose_add.out.txt coxph_prob_add.out.txt > 1c1 > < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add chi2_SNP > --- >> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 > > Finally, I compared the two files w/o their first lines and they are the same. > > > Any help highly appreciated. Please find the log file attached > > > Thank you very much, > Latchezar (Lucho) "Speaking w/ computers" Dimitrov > > Analyst/Programmer IV, > Center for Genomics and Personalized Medicine Research > Wake Forest University School of Medicine fax: (336)713-7566 > Medical Center Blvd. work: (336)713-7137 > Winston-Salem, NC 27157 > > -- A computer lets you make more mistakes faster than any invention in human history -- > with the possible exceptions of handguns and tequila. > --Mitch Ratliffe, "Technology Review" > > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Mon Mar 3 13:12:25 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 03 Mar 2014 13:12:25 +0100 Subject: [GenABEL-dev] Suggested change of default weight in GenABEL/ibs() In-Reply-To: <-317756408431031488@unknownmsgid> References: <768CB804-C3EF-4E6A-B524-937A1B54331C@gmail.com> <-317756408431031488@unknownmsgid> Message-ID: <531471A9.7060109@karssen.org> Dear all, On 28-02-14 14:15, Yurii Aulchenko wrote: > In principle I agree this is good idea, rarely "no" (default) option is > used; but I am always worried to change defaults as this may destroy > people pipelines. How about issuing a warning? I see two options: 1) Keep the current default and issue a warning when the user doesn't specify a weight. 2) Change the default and when ibs() is run and the weight argument is not equal to 'freq', issue a warning saying that the default has changed to freq. Although I haven't used ibs() a lot, I think option 2 is best as it sets a sane default. Another thing to think about: when changing defaults, what do we do with the version number. If we only change the minor number (the 8 in 1.8-0), people will think not much has changed and this change in default behaviour may go unnoticed. Lennart. > > We need third opinion :) > > Best, > Y > > ---------------------- > Yurii Aulchenko > (sent from mobile device) > > On Feb 28, 2014, at 3:45 PM, Xia Shen > wrote: > >> Hi, >> >> I suggest to change the default weight argument in the GenABEL/ibs() >> function to be set to "freq" instead of "no". I have colleagues >> constantly forget to set to "freq" and produce, what I would call, >> "wrong" kinship matrix. 
>> >> *Xia Shen* >> PhD >> >> Division of Computational Genetics >> Department of Clinical Science >> *Swedish University of Agricultural Sciences* >> Uppsala, Sweden >> >> www.shen.se >> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Mon Mar 3 13:16:37 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Mon, 3 Mar 2014 16:16:37 +0400 Subject: [GenABEL-dev] Suggested change of default weight in GenABEL/ibs() In-Reply-To: <531471A9.7060109@karssen.org> References: <768CB804-C3EF-4E6A-B524-937A1B54331C@gmail.com> <-317756408431031488@unknownmsgid> <531471A9.7060109@karssen.org> Message-ID: <4828030313405700840@unknownmsgid> I am voting for option 2; as for the version number - no strong opinion. Xia, would you be willing to summarize this discussion as a feature request, or to provide a patch? Best wishes, Yurii ---------------------- Yurii Aulchenko (sent from mobile device) > On Mar 3, 2014, at 4:12 PM, "L.C. Karssen" wrote: > > Dear all, > >> On 28-02-14 14:15, Yurii Aulchenko wrote: >> In principle I agree this is good idea, rarely "no" (default) option is >> used; but I am always worried to change defaults as this may destroy >> people pipelines. 
> > How about issuing a warning? I see two options: > 1) Keep the current default and issue a warning when the user doesn't > specify a weight. > 2) Change the default and when ibs() is run and the weight argument is > not equal to 'freq', issue a warning saying that the default has changed > to freq. > > Although I haven't used ibs() a lot, I think option 2 is best as it sets > a sane default. > > Another thing to think about: when changing defaults, what do we do with > the version number. If we only change the minor number (the 8 in 1.8-0), > people will think not much has changed and this change in default > behaviour may go unnoticed. > > > Lennart. > >> >> We need third opinion :) >> >> Best, >> Y >> >> ---------------------- >> Yurii Aulchenko >> (sent from mobile device) >> >> On Feb 28, 2014, at 3:45 PM, Xia Shen > > wrote: >> >>> Hi, >>> >>> I suggest to change the default weight argument in the GenABEL/ibs() >>> function to be set to "freq" instead of "no". I have colleagues >>> constantly forget to set to "freq" and produce, what I would call, >>> "wrong" kinship matrix. >>> >>> *Xia Shen* >>> PhD >>> >>> Division of Computational Genetics >>> Department of Clinical Science >>> *Swedish University of Agricultural Sciences* >>> Uppsala, Sweden >>> >>> www.shen.se >>> >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From maoj at mail.nih.gov Mon Mar 3 18:34:28 2014 From: maoj at mail.nih.gov (Jean Mao) Date: Mon, 03 Mar 2014 12:34:28 -0500 Subject: [GenABEL-dev] Error wil testing ProbABEL 0.4.2 In-Reply-To: <531442B8.5090105@karssen.org> References: <5310DAA3.4020405@mail.nih.gov> <531442B8.5090105@karssen.org> Message-ID: <5314BD24.30807@mail.nih.gov> Hi Lennart, Thank you very much for your reply. Please see attached log files. I think 3.0.2 is the version of R I used. Jean On 3/3/2014 3:52 AM, L.C. Karssen wrote: > Dear Jean, > > Thank you for you interest in ProbABEL. From the output you sent us, it > seems that one of the checks fails in which the output of various > regressions in R are compared to ProbABEL output. (The XFAIL part is an > expected failure, so only the check for pacoxph fails). > > In order to further diagnose the problem, we need some more information. > Which Linux distribution are you using? Which version of R is installed > on your machine? > > Could you sent us the config.log file that was created when you ran > ./configure? > > > One last thing that comes to mind: ./configure tests for the existence > of R, but it doesn't check whether you have the 'survival' R package > installed. I guess that is the problem here. > > > Best, > > Lennart. > > On 28-02-14 19:51, Jean Mao wrote: >> Hi, >> >> Thank you very much for providing this software to scientific community. >> We have installed it in our cluster for NIH scientists. >> >> Recently, while trying to update to 0.4.2 version, I ran into error >> message when testing the installation. Attached is the log file. 
Any >> help will be appreciated. Thank you. >> >> Jean Mao >> Helix Staff >> CIT, NIH >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> -------------- next part -------------- Making all in src make[1]: Entering directory `/usr/local/apps/probabel/0.4.2/src' make all-am make[2]: Entering directory `/usr/local/apps/probabel/0.4.2/src' CXX palinear-data.o CXX palinear-gendata.o CXX palinear-command_line_settings.o CXX palinear-usage.o CXX palinear-main.o CXX palinear-utilities.o CXX palinear-phedata.o CXX palinear-cholesky.o CXX palinear-regdata.o CXX palinear-maskedmatrix.o CXX palinear-reg1.o CXX palinear-main_functions_dump.o CXX palinear-AbstractMatrix.o CXX palinear-CastUtils.o CXX palinear-convert_util.o CXX palinear-FileVector.o CXX palinear-FilteredMatrix.o CXX palinear-frutil.o CXX palinear-Logger.o CXX palinear-RealHandlerWrapper.o CXX palinear-ReusableFileHandle.o CXX palinear-Transposer.o CXXLD palinear CXX palogist-data.o CXX palogist-gendata.o CXX palogist-command_line_settings.o CXX palogist-usage.o CXX palogist-main.o CXX palogist-utilities.o CXX palogist-phedata.o CXX palogist-cholesky.o CXX palogist-regdata.o CXX palogist-maskedmatrix.o CXX palogist-reg1.o CXX palogist-main_functions_dump.o CXX palogist-AbstractMatrix.o CXX palogist-CastUtils.o CXX palogist-convert_util.o CXX palogist-FileVector.o CXX palogist-FilteredMatrix.o CXX palogist-frutil.o CXX palogist-Logger.o CXX palogist-RealHandlerWrapper.o CXX palogist-ReusableFileHandle.o CXX palogist-Transposer.o CXXLD palogist CC pacoxph-coxfit2.o CC pacoxph-chinv2.o CC pacoxph-cholesky2.o CC pacoxph-chsolve2.o CC pacoxph-dmatrix.o CXX pacoxph-data.o CXX pacoxph-gendata.o CXX pacoxph-command_line_settings.o CXX pacoxph-usage.o CXX pacoxph-main.o CXX pacoxph-utilities.o CXX pacoxph-phedata.o CXX pacoxph-cholesky.o CXX pacoxph-regdata.o 
CXX pacoxph-maskedmatrix.o CXX pacoxph-reg1.o CXX pacoxph-main_functions_dump.o CXX pacoxph-AbstractMatrix.o CXX pacoxph-CastUtils.o CXX pacoxph-convert_util.o CXX pacoxph-FileVector.o CXX pacoxph-FilteredMatrix.o CXX pacoxph-frutil.o CXX pacoxph-Logger.o CXX pacoxph-RealHandlerWrapper.o CXX pacoxph-ReusableFileHandle.o CXX pacoxph-Transposer.o CXX pacoxph-coxph_data.o CXXLD pacoxph make[2]: Leaving directory `/usr/local/apps/probabel/0.4.2/src' make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2/src' Making all in doc make[1]: Entering directory `/usr/local/apps/probabel/0.4.2/doc' === Making ProbABEL_manual.aux file === pdflatex ./ProbABEL_manual.tex This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) %&-line parsing enabled. entering extended mode (./ProbABEL_manual.tex LaTeX2e <2005/12/01> Babel and hyphenation patterns for english, usenglishmax, dumylang, noh yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish, french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish, turkish, ukenglish, ukrainian, loaded. (/usr/share/texmf/tex/latex/base/article.cls Document Class: article 2005/09/16 v1.4f Standard LaTeX document class (/usr/share/texmf/tex/latex/base/size12.clo)) (/usr/share/texmf/tex/latex/tools/verbatim.sty) (/usr/share/texmf/tex/latex/ltxmisc/titleref.sty) (/usr/share/texmf/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. 
(/usr/share/texmf/tex/latex/amsmath/amstext.sty (/usr/share/texmf/tex/latex/amsmath/amsgen.sty)) (/usr/share/texmf/tex/latex/amsmath/amsbsy.sty) (/usr/share/texmf/tex/latex/amsmath/amsopn.sty)) (/usr/share/texmf/tex/latex/base/makeidx.sty) (/usr/share/texmf/tex/latex/xcolor/xcolor.sty (/usr/share/texmf/tex/latex/config/color.cfg) (/usr/share/texmf/tex/latex/pdftex-def/pdftex.def) (/usr/share/texmf/tex/latex/graphics/dvipsnam.def)) (/usr/share/texmf/tex/latex/hyperref/hyperref.sty (/usr/share/texmf/tex/latex/graphics/keyval.sty) (/usr/share/texmf/tex/latex/hyperref/pd1enc.def) (/usr/share/texmf/tex/latex/config/hyperref.cfg) (/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty) Implicit mode ON; LaTeX internals redefined (/usr/share/texmf/tex/latex/ltxmisc/url.sty)) *hyperref using driver hpdftex* (/usr/share/texmf/tex/latex/hyperref/hpdftex.def) (/usr/share/texmf/tex/latex/oberdiek/hypcap.sty) Writing index file ProbABEL_manual.idx No file ProbABEL_manual.aux. (/usr/share/texmf/tex/latex/hyperref/nameref.sty (/usr/share/texmf/tex/latex/oberdiek/refcount.sty)) Overfull \hbox (17.1011pt too wide) in paragraph at lines 49--49 [][] [1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] Overfull \hbox (1.47583pt too wide) in paragraph at lines 113--118 \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 takes three files as in-put: a file c on-tain-ing SNP in-for-ma-tion (e.g. the [2] [3] LaTeX Warning: Reference `sec:runanalysis' on page 4 undefined on input line 19 1. 
Overfull \hbox (12.66078pt too wide) in paragraph at lines 215--222 []\OT1/cmr/m/n/12 In the case of lin-ear or lo-gis-tic re-gres-sion (pro-grams \OT1/cmtt/m/n/12 palinear \OT1/cmr/m/n/12 and \OT1/cmtt/m/n/12 palogist\OT1/cmr /m/n/12 , Overfull \hbox (14.5443pt too wide) in paragraph at lines 215--222 \OT1/cmr/m/n/12 lin-ear re-gres-sion anal-y-sis fol-low here (also to be found in \OT1/cmtt/m/n/12 examples/height.txt\OT1/cmr/m/n/12 ) [4] [5] [6] Overfull \hbox (5.17282pt too wide) in paragraph at lines 292--296 []\OT1/cmr/m/n/12 There are in to-tal 11 com-mand line op-tions you can spec-if y to the \OT1/cmtt/m/n/12 ProbABEL Overfull \hbox (5.2002pt too wide) in paragraph at lines 299--299 []\OT1/cmtt/m/n/12 (C) Yurii Aulchenko, Lennart C. Karssen, Maksim Struchalin, EMCR[] Overfull \hbox (23.7252pt too wide) in paragraph at lines 310--310 [] \OT1/cmtt/m/n/12 --chrom : [optional] chromosome (to be passed to output)[] Overfull \hbox (104.00024pt too wide) in paragraph at lines 311--311 [] \OT1/cmtt/m/n/12 --out : [optional] output file name (default is regression.out.txt)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 312--312 [] \OT1/cmtt/m/n/12 --skipd : [optional] how many columns to skip in the predictor[] Overfull \hbox (54.60022pt too wide) in paragraph at lines 314--314 [] \OT1/cmtt/m/n/12 --ntraits : [optional] how many traits are analysed (default 1)[] Overfull \hbox (36.07521pt too wide) in paragraph at lines 315--315 [] \OT1/cmtt/m/n/12 --ngpreds : [optional] how many predictor columns p er marker[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 316--316 [] \OT1/cmtt/m/n/12 (default 1 = MLDOSE; else use 2 for ML PROB)[] [7] Overfull \hbox (97.82524pt too wide) in paragraph at lines 317--317 [] \OT1/cmtt/m/n/12 --separat : [optional] character to separate fields (default is space)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 320--320 [] \OT1/cmtt/m/n/12 --allcov : report estimates for all covariates (la rge 
outputs!)[] Overfull \hbox (295.42534pt too wide) in paragraph at lines 321--321 [] \OT1/cmtt/m/n/12 --interaction: Which covariate to use for interacti on with SNP analysis (default is no interaction, 0)[] Overfull \hbox (388.05038pt too wide) in paragraph at lines 322--322 [] \OT1/cmtt/m/n/12 --interaction_only: like previous but without covar iate acting in interaction with SNP (default is no interaction, 0)[] Overfull \hbox (758.55057pt too wide) in paragraph at lines 323--323 [] \OT1/cmtt/m/n/12 --mmscore : score test in samples of related indivi duals. File with inverse of variance-covariance matrix (for palinear) or invers e correlation (for palogist) as input parameter[] Overfull \hbox (122.52525pt too wide) in paragraph at lines 324--324 [] \OT1/cmtt/m/n/12 --robust : report robust (aka sandwich, aka Hubert -White) standard errors[] LaTeX Warning: Reference `ssec:dosein' on page 8 undefined on input line 342. LaTeX Warning: Reference `ssec:phenoin' on page 8 undefined on input line 344. LaTeX Warning: Reference `ssec:infoin' on page 8 undefined on input line 346. Overfull \hbox (18.38104pt too wide) in paragraph at lines 340--347 []\OT1/cmr/m/n/12 These op-tions are \OT1/cmtt/m/n/12 --dose \OT1/cmr/m/n/12 (o r \OT1/cmtt/m/n/12 -d\OT1/cmr/m/n/12 ), spec-i-fy-ing the ge-nomic pre-dic-tor/ MLDOSE Overfull \hbox (11.3752pt too wide) in paragraph at lines 351--351 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. txt \[] Overfull \hbox (42.25021pt too wide) in paragraph at lines 359--359 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palogist -p logist_ data.txt \[] Overfull \hbox (29.9002pt too wide) in paragraph at lines 365--365 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_da ta.txt \[] [8] Overfull \hbox (11.3752pt too wide) in paragraph at lines 377--377 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. 
txt \[] Overfull \hbox (5.2002pt too wide) in paragraph at lines 378--378 [] \OT1/cmtt/m/n/12 -d test.mlprob -i test.mlin fo \[] LaTeX Warning: Reference `sec:methodology' on page 9 undefined on input line 39 5. Overfull \hbox (42.25021pt too wide) in paragraph at lines 414--414 [] \OT1/cmtt/m/n/12 h2.obj$InvSigma * h2.obj$h2an$estimate[length(h2.obj$h2an$ estimate)][] [9] [10] Overfull \hbox (2.98222pt too wide) in paragraph at lines 512--516 \OT1/cmr/m/n/12 After in-stalling \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 you can find the \OT1/cmtt/m/n/12 prepare[]data.R \OT1/cmr/m/n/12 file in the \OT1 /cmtt/m/n/12 scripts [11] [12] Overfull \hbox (8.42549pt too wide) in paragraph at lines 614--616 []\OT1/cmr/m/n/12 Secondly, the Wald test can be used; for that the in-verse va riance-covariance [13] [14] LaTeX Warning: Reference `eq:expectation' on page 15 undefined on input line 71 0. LaTeX Warning: Reference `kinship' on page 15 undefined on input line 722. [15] [16] [17] No file ProbABEL_manual.ind. [18] (./ProbABEL_manual.aux) LaTeX Warning: There were undefined references. LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right. ) (see the transcript file for additional information) Output written on ProbABEL_manual.pdf (18 pages, 180685 bytes). Transcript written on ProbABEL_manual.log. === Making PDF === pdflatex ./ProbABEL_manual.tex This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) %&-line parsing enabled. 
entering extended mode (./ProbABEL_manual.tex LaTeX2e <2005/12/01> Babel and hyphenation patterns for english, usenglishmax, dumylang, noh yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish, french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish, turkish, ukenglish, ukrainian, loaded. (/usr/share/texmf/tex/latex/base/article.cls Document Class: article 2005/09/16 v1.4f Standard LaTeX document class (/usr/share/texmf/tex/latex/base/size12.clo)) (/usr/share/texmf/tex/latex/tools/verbatim.sty) (/usr/share/texmf/tex/latex/ltxmisc/titleref.sty) (/usr/share/texmf/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texmf/tex/latex/amsmath/amstext.sty (/usr/share/texmf/tex/latex/amsmath/amsgen.sty)) (/usr/share/texmf/tex/latex/amsmath/amsbsy.sty) (/usr/share/texmf/tex/latex/amsmath/amsopn.sty)) (/usr/share/texmf/tex/latex/base/makeidx.sty) (/usr/share/texmf/tex/latex/xcolor/xcolor.sty (/usr/share/texmf/tex/latex/config/color.cfg) (/usr/share/texmf/tex/latex/pdftex-def/pdftex.def) (/usr/share/texmf/tex/latex/graphics/dvipsnam.def)) (/usr/share/texmf/tex/latex/hyperref/hyperref.sty (/usr/share/texmf/tex/latex/graphics/keyval.sty) (/usr/share/texmf/tex/latex/hyperref/pd1enc.def) (/usr/share/texmf/tex/latex/config/hyperref.cfg) (/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty) Implicit mode ON; LaTeX internals redefined (/usr/share/texmf/tex/latex/ltxmisc/url.sty)) *hyperref using driver hpdftex* (/usr/share/texmf/tex/latex/hyperref/hpdftex.def) (/usr/share/texmf/tex/latex/oberdiek/hypcap.sty) Writing index file ProbABEL_manual.idx (./ProbABEL_manual.aux) (/usr/share/texmf/tex/latex/hyperref/nameref.sty 
(/usr/share/texmf/tex/latex/oberdiek/refcount.sty)) (./ProbABEL_manual.out) (./ProbABEL_manual.out) Overfull \hbox (17.1011pt too wide) in paragraph at lines 49--49 [][] (./ProbABEL_manual.toc [1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}]) [2] Overfull \hbox (1.47583pt too wide) in paragraph at lines 113--118 \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 takes three files as in-put: a file c on-tain-ing SNP in-for-ma-tion (e.g. the [3] [4] Overfull \hbox (12.66078pt too wide) in paragraph at lines 215--222 []\OT1/cmr/m/n/12 In the case of lin-ear or lo-gis-tic re-gres-sion (pro-grams \OT1/cmtt/m/n/12 palinear \OT1/cmr/m/n/12 and \OT1/cmtt/m/n/12 palogist\OT1/cmr /m/n/12 , Overfull \hbox (14.5443pt too wide) in paragraph at lines 215--222 \OT1/cmr/m/n/12 lin-ear re-gres-sion anal-y-sis fol-low here (also to be found in \OT1/cmtt/m/n/12 examples/height.txt\OT1/cmr/m/n/12 ) [5] [6] [7] Overfull \hbox (5.17282pt too wide) in paragraph at lines 292--296 []\OT1/cmr/m/n/12 There are in to-tal 11 com-mand line op-tions you can spec-if y to the \OT1/cmtt/m/n/12 ProbABEL Overfull \hbox (5.2002pt too wide) in paragraph at lines 299--299 []\OT1/cmtt/m/n/12 (C) Yurii Aulchenko, Lennart C. 
Karssen, Maksim Struchalin, EMCR[] Overfull \hbox (23.7252pt too wide) in paragraph at lines 310--310 [] \OT1/cmtt/m/n/12 --chrom : [optional] chromosome (to be passed to output)[] Overfull \hbox (104.00024pt too wide) in paragraph at lines 311--311 [] \OT1/cmtt/m/n/12 --out : [optional] output file name (default is regression.out.txt)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 312--312 [] \OT1/cmtt/m/n/12 --skipd : [optional] how many columns to skip in the predictor[] Overfull \hbox (54.60022pt too wide) in paragraph at lines 314--314 [] \OT1/cmtt/m/n/12 --ntraits : [optional] how many traits are analysed (default 1)[] Overfull \hbox (36.07521pt too wide) in paragraph at lines 315--315 [] \OT1/cmtt/m/n/12 --ngpreds : [optional] how many predictor columns p er marker[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 316--316 [] \OT1/cmtt/m/n/12 (default 1 = MLDOSE; else use 2 for ML PROB)[] Overfull \hbox (97.82524pt too wide) in paragraph at lines 317--317 [] \OT1/cmtt/m/n/12 --separat : [optional] character to separate fields (default is space)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 320--320 [] \OT1/cmtt/m/n/12 --allcov : report estimates for all covariates (la rge outputs!)[] Overfull \hbox (295.42534pt too wide) in paragraph at lines 321--321 [] \OT1/cmtt/m/n/12 --interaction: Which covariate to use for interacti on with SNP analysis (default is no interaction, 0)[] Overfull \hbox (388.05038pt too wide) in paragraph at lines 322--322 [] \OT1/cmtt/m/n/12 --interaction_only: like previous but without covar iate acting in interaction with SNP (default is no interaction, 0)[] Overfull \hbox (758.55057pt too wide) in paragraph at lines 323--323 [] \OT1/cmtt/m/n/12 --mmscore : score test in samples of related indivi duals. 
File with inverse of variance-covariance matrix (for palinear) or invers e correlation (for palogist) as input parameter[] Overfull \hbox (122.52525pt too wide) in paragraph at lines 324--324 [] \OT1/cmtt/m/n/12 --robust : report robust (aka sandwich, aka Hubert -White) standard errors[] [8] Overfull \hbox (18.38104pt too wide) in paragraph at lines 340--347 []\OT1/cmr/m/n/12 These op-tions are \OT1/cmtt/m/n/12 --dose \OT1/cmr/m/n/12 (o r \OT1/cmtt/m/n/12 -d\OT1/cmr/m/n/12 ), spec-i-fy-ing the ge-nomic pre-dic-tor/ MLDOSE Overfull \hbox (11.3752pt too wide) in paragraph at lines 351--351 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. txt \[] Overfull \hbox (42.25021pt too wide) in paragraph at lines 359--359 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palogist -p logist_ data.txt \[] Overfull \hbox (29.9002pt too wide) in paragraph at lines 365--365 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_da ta.txt \[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 377--377 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. txt \[] Overfull \hbox (5.2002pt too wide) in paragraph at lines 378--378 [] \OT1/cmtt/m/n/12 -d test.mlprob -i test.mlin fo \[] [9] Overfull \hbox (42.25021pt too wide) in paragraph at lines 414--414 [] \OT1/cmtt/m/n/12 h2.obj$InvSigma * h2.obj$h2an$estimate[length(h2.obj$h2an$ estimate)][] [10] [11] Overfull \hbox (2.98222pt too wide) in paragraph at lines 512--516 \OT1/cmr/m/n/12 After in-stalling \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 you can find the \OT1/cmtt/m/n/12 prepare[]data.R \OT1/cmr/m/n/12 file in the \OT1 /cmtt/m/n/12 scripts [12] [13] Overfull \hbox (8.42549pt too wide) in paragraph at lines 614--616 []\OT1/cmr/m/n/12 Secondly, the Wald test can be used; for that the in-verse va riance-covariance [14] [15] [16] [17] [18] No file ProbABEL_manual.ind. 
[19] (./ProbABEL_manual.aux) LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right. ) (see the transcript file for additional information) Output written on ProbABEL_manual.pdf (19 pages, 193943 bytes). Transcript written on ProbABEL_manual.log. LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right. ** Re-running LaTeX ** This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) %&-line parsing enabled. entering extended mode (./ProbABEL_manual.tex LaTeX2e <2005/12/01> Babel and hyphenation patterns for english, usenglishmax, dumylang, noh yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish, french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish, turkish, ukenglish, ukrainian, loaded. (/usr/share/texmf/tex/latex/base/article.cls Document Class: article 2005/09/16 v1.4f Standard LaTeX document class (/usr/share/texmf/tex/latex/base/size12.clo)) (/usr/share/texmf/tex/latex/tools/verbatim.sty) (/usr/share/texmf/tex/latex/ltxmisc/titleref.sty) (/usr/share/texmf/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. 
(/usr/share/texmf/tex/latex/amsmath/amstext.sty (/usr/share/texmf/tex/latex/amsmath/amsgen.sty)) (/usr/share/texmf/tex/latex/amsmath/amsbsy.sty) (/usr/share/texmf/tex/latex/amsmath/amsopn.sty)) (/usr/share/texmf/tex/latex/base/makeidx.sty) (/usr/share/texmf/tex/latex/xcolor/xcolor.sty (/usr/share/texmf/tex/latex/config/color.cfg) (/usr/share/texmf/tex/latex/pdftex-def/pdftex.def) (/usr/share/texmf/tex/latex/graphics/dvipsnam.def)) (/usr/share/texmf/tex/latex/hyperref/hyperref.sty (/usr/share/texmf/tex/latex/graphics/keyval.sty) (/usr/share/texmf/tex/latex/hyperref/pd1enc.def) (/usr/share/texmf/tex/latex/config/hyperref.cfg) (/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty) Implicit mode ON; LaTeX internals redefined (/usr/share/texmf/tex/latex/ltxmisc/url.sty)) *hyperref using driver hpdftex* (/usr/share/texmf/tex/latex/hyperref/hpdftex.def) (/usr/share/texmf/tex/latex/oberdiek/hypcap.sty) Writing index file ProbABEL_manual.idx (./ProbABEL_manual.aux) (/usr/share/texmf/tex/latex/hyperref/nameref.sty (/usr/share/texmf/tex/latex/oberdiek/refcount.sty)) (./ProbABEL_manual.out) (./ProbABEL_manual.out) Overfull \hbox (17.1011pt too wide) in paragraph at lines 49--49 [][] (./ProbABEL_manual.toc [1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}]) [2] Overfull \hbox (1.47583pt too wide) in paragraph at lines 113--118 \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 takes three files as in-put: a file c on-tain-ing SNP in-for-ma-tion (e.g. 
the [3] [4] Overfull \hbox (12.66078pt too wide) in paragraph at lines 215--222 []\OT1/cmr/m/n/12 In the case of lin-ear or lo-gis-tic re-gres-sion (pro-grams \OT1/cmtt/m/n/12 palinear \OT1/cmr/m/n/12 and \OT1/cmtt/m/n/12 palogist\OT1/cmr /m/n/12 , Overfull \hbox (14.5443pt too wide) in paragraph at lines 215--222 \OT1/cmr/m/n/12 lin-ear re-gres-sion anal-y-sis fol-low here (also to be found in \OT1/cmtt/m/n/12 examples/height.txt\OT1/cmr/m/n/12 ) [5] [6] [7] Overfull \hbox (5.17282pt too wide) in paragraph at lines 292--296 []\OT1/cmr/m/n/12 There are in to-tal 11 com-mand line op-tions you can spec-if y to the \OT1/cmtt/m/n/12 ProbABEL Overfull \hbox (5.2002pt too wide) in paragraph at lines 299--299 []\OT1/cmtt/m/n/12 (C) Yurii Aulchenko, Lennart C. Karssen, Maksim Struchalin, EMCR[] Overfull \hbox (23.7252pt too wide) in paragraph at lines 310--310 [] \OT1/cmtt/m/n/12 --chrom : [optional] chromosome (to be passed to output)[] Overfull \hbox (104.00024pt too wide) in paragraph at lines 311--311 [] \OT1/cmtt/m/n/12 --out : [optional] output file name (default is regression.out.txt)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 312--312 [] \OT1/cmtt/m/n/12 --skipd : [optional] how many columns to skip in the predictor[] Overfull \hbox (54.60022pt too wide) in paragraph at lines 314--314 [] \OT1/cmtt/m/n/12 --ntraits : [optional] how many traits are analysed (default 1)[] Overfull \hbox (36.07521pt too wide) in paragraph at lines 315--315 [] \OT1/cmtt/m/n/12 --ngpreds : [optional] how many predictor columns p er marker[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 316--316 [] \OT1/cmtt/m/n/12 (default 1 = MLDOSE; else use 2 for ML PROB)[] Overfull \hbox (97.82524pt too wide) in paragraph at lines 317--317 [] \OT1/cmtt/m/n/12 --separat : [optional] character to separate fields (default is space)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 320--320 [] \OT1/cmtt/m/n/12 --allcov : report estimates for all covariates (la rge 
outputs!)[] Overfull \hbox (295.42534pt too wide) in paragraph at lines 321--321 [] \OT1/cmtt/m/n/12 --interaction: Which covariate to use for interacti on with SNP analysis (default is no interaction, 0)[] Overfull \hbox (388.05038pt too wide) in paragraph at lines 322--322 [] \OT1/cmtt/m/n/12 --interaction_only: like previous but without covar iate acting in interaction with SNP (default is no interaction, 0)[] Overfull \hbox (758.55057pt too wide) in paragraph at lines 323--323 [] \OT1/cmtt/m/n/12 --mmscore : score test in samples of related indivi duals. File with inverse of variance-covariance matrix (for palinear) or invers e correlation (for palogist) as input parameter[] Overfull \hbox (122.52525pt too wide) in paragraph at lines 324--324 [] \OT1/cmtt/m/n/12 --robust : report robust (aka sandwich, aka Hubert -White) standard errors[] [8] Overfull \hbox (18.38104pt too wide) in paragraph at lines 340--347 []\OT1/cmr/m/n/12 These op-tions are \OT1/cmtt/m/n/12 --dose \OT1/cmr/m/n/12 (o r \OT1/cmtt/m/n/12 -d\OT1/cmr/m/n/12 ), spec-i-fy-ing the ge-nomic pre-dic-tor/ MLDOSE Overfull \hbox (11.3752pt too wide) in paragraph at lines 351--351 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. txt \[] Overfull \hbox (42.25021pt too wide) in paragraph at lines 359--359 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palogist -p logist_ data.txt \[] Overfull \hbox (29.9002pt too wide) in paragraph at lines 365--365 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_da ta.txt \[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 377--377 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. 
txt \[] Overfull \hbox (5.2002pt too wide) in paragraph at lines 378--378 [] \OT1/cmtt/m/n/12 -d test.mlprob -i test.mlin fo \[] [9] Overfull \hbox (42.25021pt too wide) in paragraph at lines 414--414 [] \OT1/cmtt/m/n/12 h2.obj$InvSigma * h2.obj$h2an$estimate[length(h2.obj$h2an$ estimate)][] [10] [11] Overfull \hbox (2.98222pt too wide) in paragraph at lines 512--516 \OT1/cmr/m/n/12 After in-stalling \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 you can find the \OT1/cmtt/m/n/12 prepare[]data.R \OT1/cmr/m/n/12 file in the \OT1 /cmtt/m/n/12 scripts [12] [13] Overfull \hbox (8.42549pt too wide) in paragraph at lines 614--616 []\OT1/cmr/m/n/12 Secondly, the Wald test can be used; for that the in-verse va riance-covariance [14] [15] [16] [17] [18] No file ProbABEL_manual.ind. [19] (./ProbABEL_manual.aux) ) (see the transcript file for additional information) Output written on ProbABEL_manual.pdf (19 pages, 193790 bytes). Transcript written on ProbABEL_manual.log. === Making index === makeindex ProbABEL_manual === Making final PDF === pdflatex ./ProbABEL_manual.tex This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) %&-line parsing enabled. entering extended mode (./ProbABEL_manual.tex LaTeX2e <2005/12/01> Babel and hyphenation patterns for english, usenglishmax, dumylang, noh yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish, french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish, turkish, ukenglish, ukrainian, loaded. 
(/usr/share/texmf/tex/latex/base/article.cls Document Class: article 2005/09/16 v1.4f Standard LaTeX document class (/usr/share/texmf/tex/latex/base/size12.clo)) (/usr/share/texmf/tex/latex/tools/verbatim.sty) (/usr/share/texmf/tex/latex/ltxmisc/titleref.sty) (/usr/share/texmf/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texmf/tex/latex/amsmath/amstext.sty (/usr/share/texmf/tex/latex/amsmath/amsgen.sty)) (/usr/share/texmf/tex/latex/amsmath/amsbsy.sty) (/usr/share/texmf/tex/latex/amsmath/amsopn.sty)) (/usr/share/texmf/tex/latex/base/makeidx.sty) (/usr/share/texmf/tex/latex/xcolor/xcolor.sty (/usr/share/texmf/tex/latex/config/color.cfg) (/usr/share/texmf/tex/latex/pdftex-def/pdftex.def) (/usr/share/texmf/tex/latex/graphics/dvipsnam.def)) (/usr/share/texmf/tex/latex/hyperref/hyperref.sty (/usr/share/texmf/tex/latex/graphics/keyval.sty) (/usr/share/texmf/tex/latex/hyperref/pd1enc.def) (/usr/share/texmf/tex/latex/config/hyperref.cfg) (/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty) Implicit mode ON; LaTeX internals redefined (/usr/share/texmf/tex/latex/ltxmisc/url.sty)) *hyperref using driver hpdftex* (/usr/share/texmf/tex/latex/hyperref/hpdftex.def) (/usr/share/texmf/tex/latex/oberdiek/hypcap.sty) Writing index file ProbABEL_manual.idx (./ProbABEL_manual.aux) (/usr/share/texmf/tex/latex/hyperref/nameref.sty (/usr/share/texmf/tex/latex/oberdiek/refcount.sty)) (./ProbABEL_manual.out) (./ProbABEL_manual.out) Overfull \hbox (17.1011pt too wide) in paragraph at lines 49--49 [][] (./ProbABEL_manual.toc [1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}]) [2] Overfull \hbox (1.47583pt too wide) in paragraph at lines 113--118 \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 takes three files as in-put: a file c on-tain-ing SNP in-for-ma-tion (e.g. 
the [3] [4] Overfull \hbox (12.66078pt too wide) in paragraph at lines 215--222 []\OT1/cmr/m/n/12 In the case of lin-ear or lo-gis-tic re-gres-sion (pro-grams \OT1/cmtt/m/n/12 palinear \OT1/cmr/m/n/12 and \OT1/cmtt/m/n/12 palogist\OT1/cmr /m/n/12 , Overfull \hbox (14.5443pt too wide) in paragraph at lines 215--222 \OT1/cmr/m/n/12 lin-ear re-gres-sion anal-y-sis fol-low here (also to be found in \OT1/cmtt/m/n/12 examples/height.txt\OT1/cmr/m/n/12 ) [5] [6] [7] Overfull \hbox (5.17282pt too wide) in paragraph at lines 292--296 []\OT1/cmr/m/n/12 There are in to-tal 11 com-mand line op-tions you can spec-if y to the \OT1/cmtt/m/n/12 ProbABEL Overfull \hbox (5.2002pt too wide) in paragraph at lines 299--299 []\OT1/cmtt/m/n/12 (C) Yurii Aulchenko, Lennart C. Karssen, Maksim Struchalin, EMCR[] Overfull \hbox (23.7252pt too wide) in paragraph at lines 310--310 [] \OT1/cmtt/m/n/12 --chrom : [optional] chromosome (to be passed to output)[] Overfull \hbox (104.00024pt too wide) in paragraph at lines 311--311 [] \OT1/cmtt/m/n/12 --out : [optional] output file name (default is regression.out.txt)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 312--312 [] \OT1/cmtt/m/n/12 --skipd : [optional] how many columns to skip in the predictor[] Overfull \hbox (54.60022pt too wide) in paragraph at lines 314--314 [] \OT1/cmtt/m/n/12 --ntraits : [optional] how many traits are analysed (default 1)[] Overfull \hbox (36.07521pt too wide) in paragraph at lines 315--315 [] \OT1/cmtt/m/n/12 --ngpreds : [optional] how many predictor columns p er marker[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 316--316 [] \OT1/cmtt/m/n/12 (default 1 = MLDOSE; else use 2 for ML PROB)[] Overfull \hbox (97.82524pt too wide) in paragraph at lines 317--317 [] \OT1/cmtt/m/n/12 --separat : [optional] character to separate fields (default is space)[] Overfull \hbox (60.77522pt too wide) in paragraph at lines 320--320 [] \OT1/cmtt/m/n/12 --allcov : report estimates for all covariates (la rge 
outputs!)[] Overfull \hbox (295.42534pt too wide) in paragraph at lines 321--321 [] \OT1/cmtt/m/n/12 --interaction: Which covariate to use for interacti on with SNP analysis (default is no interaction, 0)[] Overfull \hbox (388.05038pt too wide) in paragraph at lines 322--322 [] \OT1/cmtt/m/n/12 --interaction_only: like previous but without covar iate acting in interaction with SNP (default is no interaction, 0)[] Overfull \hbox (758.55057pt too wide) in paragraph at lines 323--323 [] \OT1/cmtt/m/n/12 --mmscore : score test in samples of related indivi duals. File with inverse of variance-covariance matrix (for palinear) or invers e correlation (for palogist) as input parameter[] Overfull \hbox (122.52525pt too wide) in paragraph at lines 324--324 [] \OT1/cmtt/m/n/12 --robust : report robust (aka sandwich, aka Hubert -White) standard errors[] [8] Overfull \hbox (18.38104pt too wide) in paragraph at lines 340--347 []\OT1/cmr/m/n/12 These op-tions are \OT1/cmtt/m/n/12 --dose \OT1/cmr/m/n/12 (o r \OT1/cmtt/m/n/12 -d\OT1/cmr/m/n/12 ), spec-i-fy-ing the ge-nomic pre-dic-tor/ MLDOSE Overfull \hbox (11.3752pt too wide) in paragraph at lines 351--351 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. txt \[] Overfull \hbox (42.25021pt too wide) in paragraph at lines 359--359 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palogist -p logist_ data.txt \[] Overfull \hbox (29.9002pt too wide) in paragraph at lines 365--365 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_da ta.txt \[] Overfull \hbox (11.3752pt too wide) in paragraph at lines 377--377 []\OT1/cmtt/m/n/12 user at server:~/ProbABEL/examples/$ ../bin/palinear -p height. 
txt \[] Overfull \hbox (5.2002pt too wide) in paragraph at lines 378--378 [] \OT1/cmtt/m/n/12 -d test.mlprob -i test.mlin fo \[] [9] Overfull \hbox (42.25021pt too wide) in paragraph at lines 414--414 [] \OT1/cmtt/m/n/12 h2.obj$InvSigma * h2.obj$h2an$estimate[length(h2.obj$h2an$ estimate)][] [10] [11] Overfull \hbox (2.98222pt too wide) in paragraph at lines 512--516 \OT1/cmr/m/n/12 After in-stalling \OT1/cmtt/m/n/12 ProbABEL \OT1/cmr/m/n/12 you can find the \OT1/cmtt/m/n/12 prepare[]data.R \OT1/cmr/m/n/12 file in the \OT1 /cmtt/m/n/12 scripts [12] [13] Overfull \hbox (8.42549pt too wide) in paragraph at lines 614--616 []\OT1/cmr/m/n/12 Secondly, the Wald test can be used; for that the in-verse va riance-covariance [14] [15] [16] [17] [18] (./ProbABEL_manual.ind [19] [20]) (./ProbABEL_manual.aux) ) (see the transcript file for additional information) Output written on ProbABEL_manual.pdf (20 pages, 195221 bytes). Transcript written on ProbABEL_manual.log. rm ProbABEL_manual.aux make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2/doc' Making all in checks make[1]: Entering directory `/usr/local/apps/probabel/0.4.2/checks' make[1]: Nothing to be done for `all'. make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2/checks' Making all in examples make[1]: Entering directory `/usr/local/apps/probabel/0.4.2/examples' make[1]: Nothing to be done for `all'. make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2/examples' Making all in checks/R-tests make[1]: Entering directory `/usr/local/apps/probabel/0.4.2/checks/R-tests' make[1]: Nothing to be done for `all'. make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2/checks/R-tests' make[1]: Entering directory `/usr/local/apps/probabel/0.4.2' make[1]: Nothing to be done for `all-am'. make[1]: Leaving directory `/usr/local/apps/probabel/0.4.2' -------------- next part -------------- A non-text attachment was scrubbed... 
Name: config.status
Type: application/octet-stream
Size: 34684 bytes
Desc: not available
URL:
-------------- next part --------------
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by ProbABEL configure 0.4.2, which was
generated by GNU Autoconf 2.69. Invocation command line was

  $ ./configure --prefix=/usr/local/apps/probabel/0.4.2 --without-eigen

## --------- ##
## Platform. ##
## --------- ##

hostname = biowulf.nih.gov
uname -m = x86_64
uname -r = 2.6.18-371.3.1.el5
uname -s = Linux
uname -v = #1 SMP Mon Nov 11 03:23:58 EST 2013

/usr/bin/uname -p = unknown
/bin/uname -X = unknown

/bin/arch = x86_64
/usr/bin/arch -k = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo = unknown
/bin/machine = unknown
/usr/bin/oslevel = unknown
/bin/universe = unknown

PATH: /usr/local/apps/augustus/2.5.5/scripts
PATH: /usr/local/apps/augustus/2.5.5/bin
PATH: /usr/local/pbs/bin
PATH: /usr/local/bin
PATH: /usr/local/mysql/bin
PATH: /usr/X11R6/bin
PATH: /usr/local/jdk/bin
PATH: /usr/lib64/qt-3.3/bin
PATH: /usr/kerberos/bin
PATH: /usr/local/bin
PATH: /bin
PATH: /usr/bin
PATH: /sbin
PATH: /usr/sbin
PATH: /usr/local/sbin
PATH: /usr/local/etc
PATH: /sbin
PATH: /usr/sbin
PATH: /usr/local/sbin
PATH: /usr/local/etc
PATH: /home/maoj/bin
PATH: /sbin
PATH: /usr/sbin
PATH: /usr/local/sbin
PATH: /usr/local/etc

## ----------- ##
## Core tests.
## ## ----------- ## configure:2341: checking for a BSD-compatible install configure:2409: result: /usr/bin/install -c configure:2420: checking whether build environment is sane configure:2475: result: yes configure:2626: checking for a thread-safe mkdir -p configure:2665: result: /bin/mkdir -p configure:2672: checking for gawk configure:2688: found /bin/gawk configure:2699: result: gawk configure:2710: checking whether make sets $(MAKE) configure:2732: result: yes configure:2761: checking whether make supports nested variables configure:2778: result: yes configure:2873: checking whether make supports nested variables configure:2890: result: yes configure:2910: checking whether to enable maintainer-specific portions of Makefiles configure:2919: result: yes configure:2982: checking for gcc configure:2998: found /usr/bin/gcc configure:3009: result: gcc configure:3238: checking for C compiler version configure:3247: gcc --version >&5 gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. configure:3258: $? = 0 configure:3247: gcc -v >&5 Using built-in specs. Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --disable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux Thread model: posix gcc version 4.1.2 20080704 (Red Hat 4.1.2-54) configure:3258: $? = 0 configure:3247: gcc -V >&5 gcc: '-V' option must have argument configure:3258: $? 
= 1 configure:3247: gcc -qversion >&5 gcc: unrecognized option '-qversion' gcc: no input files configure:3258: $? = 1 configure:3278: checking whether the C compiler works configure:3300: gcc conftest.c >&5 configure:3304: $? = 0 configure:3352: result: yes configure:3355: checking for C compiler default output file name configure:3357: result: a.out configure:3363: checking for suffix of executables configure:3370: gcc -o conftest conftest.c >&5 configure:3374: $? = 0 configure:3396: result: configure:3418: checking whether we are cross compiling configure:3426: gcc -o conftest conftest.c >&5 configure:3430: $? = 0 configure:3437: ./conftest configure:3441: $? = 0 configure:3456: result: no configure:3461: checking for suffix of object files configure:3483: gcc -c conftest.c >&5 configure:3487: $? = 0 configure:3508: result: o configure:3512: checking whether we are using the GNU C compiler configure:3531: gcc -c conftest.c >&5 configure:3531: $? = 0 configure:3540: result: yes configure:3549: checking whether gcc accepts -g configure:3569: gcc -c -g conftest.c >&5 configure:3569: $? = 0 configure:3610: result: yes configure:3627: checking for gcc option to accept ISO C89 configure:3690: gcc -c -g -O2 conftest.c >&5 configure:3690: $? = 0 configure:3703: result: none needed configure:3734: checking for style of include used by make configure:3762: result: GNU configure:3788: checking dependency style of gcc configure:3899: result: gcc3 configure:3915: checking whether gcc and cc understand -c and -o together configure:3946: gcc -c conftest.c -o conftest2.o >&5 configure:3950: $? = 0 configure:3956: gcc -c conftest.c -o conftest2.o >&5 configure:3960: $? = 0 configure:3971: cc -c conftest.c >&5 configure:3975: $? = 0 configure:3983: cc -c conftest.c -o conftest2.o >&5 configure:3987: $? = 0 configure:3993: cc -c conftest.c -o conftest2.o >&5 configure:3997: $? 
= 0 configure:4015: result: yes configure:4041: checking whether ln -s works configure:4045: result: yes configure:4120: checking for g++ configure:4136: found /usr/bin/g++ configure:4147: result: g++ configure:4174: checking for C++ compiler version configure:4183: g++ --version >&5 g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. configure:4194: $? = 0 configure:4183: g++ -v >&5 Using built-in specs. Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --disable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux Thread model: posix gcc version 4.1.2 20080704 (Red Hat 4.1.2-54) configure:4194: $? = 0 configure:4183: g++ -V >&5 g++: '-V' option must have argument configure:4194: $? = 1 configure:4183: g++ -qversion >&5 g++: unrecognized option '-qversion' g++: no input files configure:4194: $? = 1 configure:4198: checking whether we are using the GNU C++ compiler configure:4217: g++ -c -g -O2 -Wall conftest.cpp >&5 configure:4217: $? = 0 configure:4226: result: yes configure:4235: checking whether g++ accepts -g configure:4255: g++ -c -g -Wall conftest.cpp >&5 configure:4255: $? = 0 configure:4296: result: yes configure:4321: checking dependency style of g++ configure:4432: result: gcc3 configure:4471: checking how to run the C++ preprocessor configure:4498: g++ -E -Wall conftest.cpp configure:4498: $? 
= 0 configure:4512: g++ -E -Wall conftest.cpp conftest.cpp:11:28: error: ac_nonexistent.h: No such file or directory configure:4512: $? = 1 configure: failed program was: | /* confdefs.h */ | #define PACKAGE_NAME "ProbABEL" | #define PACKAGE_TARNAME "probabel" | #define PACKAGE_VERSION "0.4.2" | #define PACKAGE_STRING "ProbABEL 0.4.2" | #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" | #define PACKAGE_URL "" | #define PACKAGE "probabel" | #define VERSION "0.4.2" | /* end confdefs.h. */ | #include configure:4537: result: g++ -E configure:4557: g++ -E -Wall conftest.cpp configure:4557: $? = 0 configure:4571: g++ -E -Wall conftest.cpp conftest.cpp:11:28: error: ac_nonexistent.h: No such file or directory configure:4571: $? = 1 configure: failed program was: | /* confdefs.h */ | #define PACKAGE_NAME "ProbABEL" | #define PACKAGE_TARNAME "probabel" | #define PACKAGE_VERSION "0.4.2" | #define PACKAGE_STRING "ProbABEL 0.4.2" | #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" | #define PACKAGE_URL "" | #define PACKAGE "probabel" | #define VERSION "0.4.2" | /* end confdefs.h. */ | #include configure:4600: checking for grep that handles long lines and -e configure:4658: result: /bin/grep configure:4663: checking for egrep configure:4725: result: /bin/grep -E configure:4730: checking for ANSI C header files configure:4750: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4750: $? = 0 configure:4823: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4823: $? = 0 configure:4823: ./conftest configure:4823: $? = 0 configure:4834: result: yes configure:4847: checking for sys/types.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for sys/stat.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? 
= 0 configure:4847: result: yes configure:4847: checking for stdlib.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for string.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for memory.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for strings.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for inttypes.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for stdint.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4847: checking for unistd.h configure:4847: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4847: $? = 0 configure:4847: result: yes configure:4859: checking for size_t configure:4859: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4859: $? = 0 configure:4859: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 conftest.cpp: In function 'int main()': conftest.cpp:57: error: expected primary-expression before ')' token configure:4859: $? 
= 1 configure: failed program was: | /* confdefs.h */ | #define PACKAGE_NAME "ProbABEL" | #define PACKAGE_TARNAME "probabel" | #define PACKAGE_VERSION "0.4.2" | #define PACKAGE_STRING "ProbABEL 0.4.2" | #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" | #define PACKAGE_URL "" | #define PACKAGE "probabel" | #define VERSION "0.4.2" | #define STDC_HEADERS 1 | #define HAVE_SYS_TYPES_H 1 | #define HAVE_SYS_STAT_H 1 | #define HAVE_STDLIB_H 1 | #define HAVE_STRING_H 1 | #define HAVE_MEMORY_H 1 | #define HAVE_STRINGS_H 1 | #define HAVE_INTTYPES_H 1 | #define HAVE_STDINT_H 1 | #define HAVE_UNISTD_H 1 | /* end confdefs.h. */ | #include | #ifdef HAVE_SYS_TYPES_H | # include | #endif | #ifdef HAVE_SYS_STAT_H | # include | #endif | #ifdef STDC_HEADERS | # include | # include | #else | # ifdef HAVE_STDLIB_H | # include | # endif | #endif | #ifdef HAVE_STRING_H | # if !defined STDC_HEADERS && defined HAVE_MEMORY_H | # include | # endif | # include | #endif | #ifdef HAVE_STRINGS_H | # include | #endif | #ifdef HAVE_INTTYPES_H | # include | #endif | #ifdef HAVE_STDINT_H | # include | #endif | #ifdef HAVE_UNISTD_H | # include | #endif | int | main () | { | if (sizeof ((size_t))) | return 0; | ; | return 0; | } configure:4859: result: yes configure:4872: checking for working alloca.h configure:4889: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4889: $? = 0 configure:4897: result: yes configure:4905: checking for alloca configure:4942: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:4942: $? = 0 configure:4950: result: yes configure:5061: checking float.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking float.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? 
= 0 configure:5061: result: yes configure:5061: checking for float.h configure:5061: result: yes configure:5061: checking for inttypes.h configure:5061: result: yes configure:5061: checking libintl.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking libintl.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for libintl.h configure:5061: result: yes configure:5061: checking limits.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking limits.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for limits.h configure:5061: result: yes configure:5061: checking stddef.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking stddef.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for stddef.h configure:5061: result: yes configure:5061: checking for stdint.h configure:5061: result: yes configure:5061: checking for stdlib.h configure:5061: result: yes configure:5061: checking for string.h configure:5061: result: yes configure:5061: checking sys/param.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking sys/param.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for sys/param.h configure:5061: result: yes configure:5061: checking wchar.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? 
= 0 configure:5061: result: yes configure:5061: checking wchar.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for wchar.h configure:5061: result: yes configure:5061: checking wctype.h usability configure:5061: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking wctype.h presence configure:5061: g++ -E -Wall conftest.cpp configure:5061: $? = 0 configure:5061: result: yes configure:5061: checking for wctype.h configure:5061: result: yes configure:5118: not using Eigen for linear algebra configure:5132: checking for stdbool.h that conforms to C99 configure:5199: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5199: $? = 0 configure:5206: result: yes configure:5208: checking for _Bool configure:5208: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 conftest.cpp: In function 'int main()': conftest.cpp:70: error: '_Bool' was not declared in this scope configure:5208: $? 
= 1 configure: failed program was: | /* confdefs.h */ | #define PACKAGE_NAME "ProbABEL" | #define PACKAGE_TARNAME "probabel" | #define PACKAGE_VERSION "0.4.2" | #define PACKAGE_STRING "ProbABEL 0.4.2" | #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" | #define PACKAGE_URL "" | #define PACKAGE "probabel" | #define VERSION "0.4.2" | #define STDC_HEADERS 1 | #define HAVE_SYS_TYPES_H 1 | #define HAVE_SYS_STAT_H 1 | #define HAVE_STDLIB_H 1 | #define HAVE_STRING_H 1 | #define HAVE_MEMORY_H 1 | #define HAVE_STRINGS_H 1 | #define HAVE_INTTYPES_H 1 | #define HAVE_STDINT_H 1 | #define HAVE_UNISTD_H 1 | #define HAVE_ALLOCA_H 1 | #define HAVE_ALLOCA 1 | #define HAVE_FLOAT_H 1 | #define HAVE_INTTYPES_H 1 | #define HAVE_LIBINTL_H 1 | #define HAVE_LIMITS_H 1 | #define HAVE_STDDEF_H 1 | #define HAVE_STDINT_H 1 | #define HAVE_STDLIB_H 1 | #define HAVE_STRING_H 1 | #define HAVE_SYS_PARAM_H 1 | #define HAVE_WCHAR_H 1 | #define HAVE_WCTYPE_H 1 | /* end confdefs.h. */ | #include | #ifdef HAVE_SYS_TYPES_H | # include | #endif | #ifdef HAVE_SYS_STAT_H | # include | #endif | #ifdef STDC_HEADERS | # include | # include | #else | # ifdef HAVE_STDLIB_H | # include | # endif | #endif | #ifdef HAVE_STRING_H | # if !defined STDC_HEADERS && defined HAVE_MEMORY_H | # include | # endif | # include | #endif | #ifdef HAVE_STRINGS_H | # include | #endif | #ifdef HAVE_INTTYPES_H | # include | #endif | #ifdef HAVE_STDINT_H | # include | #endif | #ifdef HAVE_UNISTD_H | # include | #endif | int | main () | { | if (sizeof (_Bool)) | return 0; | ; | return 0; | } configure:5208: result: no configure:5225: checking for inline configure:5241: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5241: $? = 0 configure:5249: result: inline configure:5267: checking for off_t configure:5267: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5267: $? 
= 0 configure:5267: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 conftest.cpp: In function 'int main()': conftest.cpp:71: error: expected primary-expression before ')' token configure:5267: $? = 1 configure: failed program was: | /* confdefs.h */ | #define PACKAGE_NAME "ProbABEL" | #define PACKAGE_TARNAME "probabel" | #define PACKAGE_VERSION "0.4.2" | #define PACKAGE_STRING "ProbABEL 0.4.2" | #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" | #define PACKAGE_URL "" | #define PACKAGE "probabel" | #define VERSION "0.4.2" | #define STDC_HEADERS 1 | #define HAVE_SYS_TYPES_H 1 | #define HAVE_SYS_STAT_H 1 | #define HAVE_STDLIB_H 1 | #define HAVE_STRING_H 1 | #define HAVE_MEMORY_H 1 | #define HAVE_STRINGS_H 1 | #define HAVE_INTTYPES_H 1 | #define HAVE_STDINT_H 1 | #define HAVE_UNISTD_H 1 | #define HAVE_ALLOCA_H 1 | #define HAVE_ALLOCA 1 | #define HAVE_FLOAT_H 1 | #define HAVE_INTTYPES_H 1 | #define HAVE_LIBINTL_H 1 | #define HAVE_LIMITS_H 1 | #define HAVE_STDDEF_H 1 | #define HAVE_STDINT_H 1 | #define HAVE_STDLIB_H 1 | #define HAVE_STRING_H 1 | #define HAVE_SYS_PARAM_H 1 | #define HAVE_WCHAR_H 1 | #define HAVE_WCTYPE_H 1 | #define HAVE_STDBOOL_H 1 | /* end confdefs.h. 
*/ | #include | #ifdef HAVE_SYS_TYPES_H | # include | #endif | #ifdef HAVE_SYS_STAT_H | # include | #endif | #ifdef STDC_HEADERS | # include | # include | #else | # ifdef HAVE_STDLIB_H | # include | # endif | #endif | #ifdef HAVE_STRING_H | # if !defined STDC_HEADERS && defined HAVE_MEMORY_H | # include | # endif | # include | #endif | #ifdef HAVE_STRINGS_H | # include | #endif | #ifdef HAVE_INTTYPES_H | # include | #endif | #ifdef HAVE_STDINT_H | # include | #endif | #ifdef HAVE_UNISTD_H | # include | #endif | int | main () | { | if (sizeof ((off_t))) | return 0; | ; | return 0; | } configure:5267: result: yes configure:5278: checking for size_t configure:5278: result: yes configure:5291: checking for error_at_line configure:5307: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5307: $? = 0 configure:5315: result: yes configure:5331: checking for pow configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? = 0 configure:5331: result: yes configure:5331: checking for putenv configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? = 0 configure:5331: result: yes configure:5331: checking for sqrt configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? = 0 configure:5331: result: yes configure:5331: checking for strdup configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? = 0 configure:5331: result: yes configure:5331: checking for strncasecmp configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? = 0 configure:5331: result: yes configure:5331: checking for floor configure:5331: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5331: $? 
= 0 configure:5331: result: yes configure:5349: checking for special C compiler options needed for large files configure:5394: result: no configure:5400: checking for _FILE_OFFSET_BITS value needed for large files configure:5425: g++ -c -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5425: $? = 0 configure:5457: result: no configure:5545: checking for _LARGEFILE_SOURCE value needed for large files configure:5564: g++ -o conftest -g -O2 -D_NOT_R_FILEVECTOR -Wall conftest.cpp >&5 configure:5564: $? = 0 configure:5592: result: no configure:5616: checking for a sed that does not truncate output configure:5680: result: /usr/local/bin/sed configure:5691: checking for gawk configure:5718: result: gawk configure:5733: checking for pdflatex configure:5763: result: no configure:5769: WARNING: Unable to create PDF version of the user manual configure:5784: checking for R configure:5800: found /usr/local/bin/R configure:5811: result: R configure:5843: building of palinear is enabled configure:5864: building of palogist is enabled configure:5886: building of pacoxph is enabled configure:6037: checking that generated files are newer than configure configure:6043: result: done configure:6102: creating ./config.status ## ---------------------- ## ## Running config.status. ## ## ---------------------- ## This file was extended by ProbABEL config.status 0.4.2, which was generated by GNU Autoconf 2.69. Invocation command line was CONFIG_FILES = CONFIG_HEADERS = CONFIG_LINKS = CONFIG_COMMANDS = $ ./config.status on biowulf.nih.gov config.status:906: creating Makefile config.status:906: creating src/Makefile config.status:906: creating doc/Makefile config.status:906: creating examples/Makefile config.status:906: creating checks/Makefile config.status:906: creating checks/R-tests/Makefile config.status:906: creating src/config.h config.status:1087: src/config.h is unchanged config.status:1135: executing depfiles commands ## ---------------- ## ## Cache variables. 
## ## ---------------- ## ac_cv_c_compiler_gnu=yes ac_cv_c_inline=inline ac_cv_cxx_compiler_gnu=yes ac_cv_env_CCC_set= ac_cv_env_CCC_value= ac_cv_env_CC_set= ac_cv_env_CC_value= ac_cv_env_CFLAGS_set= ac_cv_env_CFLAGS_value= ac_cv_env_CPPFLAGS_set= ac_cv_env_CPPFLAGS_value= ac_cv_env_CXXCPP_set= ac_cv_env_CXXCPP_value= ac_cv_env_CXXFLAGS_set= ac_cv_env_CXXFLAGS_value= ac_cv_env_CXX_set= ac_cv_env_CXX_value= ac_cv_env_LDFLAGS_set= ac_cv_env_LDFLAGS_value= ac_cv_env_LIBS_set= ac_cv_env_LIBS_value= ac_cv_env_build_alias_set= ac_cv_env_build_alias_value= ac_cv_env_host_alias_set= ac_cv_env_host_alias_value= ac_cv_env_target_alias_set= ac_cv_env_target_alias_value= ac_cv_func_alloca_works=yes ac_cv_func_floor=yes ac_cv_func_pow=yes ac_cv_func_putenv=yes ac_cv_func_sqrt=yes ac_cv_func_strdup=yes ac_cv_func_strncasecmp=yes ac_cv_header_float_h=yes ac_cv_header_inttypes_h=yes ac_cv_header_libintl_h=yes ac_cv_header_limits_h=yes ac_cv_header_memory_h=yes ac_cv_header_stdbool_h=yes ac_cv_header_stdc=yes ac_cv_header_stddef_h=yes ac_cv_header_stdint_h=yes ac_cv_header_stdlib_h=yes ac_cv_header_string_h=yes ac_cv_header_strings_h=yes ac_cv_header_sys_param_h=yes ac_cv_header_sys_stat_h=yes ac_cv_header_sys_types_h=yes ac_cv_header_unistd_h=yes ac_cv_header_wchar_h=yes ac_cv_header_wctype_h=yes ac_cv_lib_error_at_line=yes ac_cv_objext=o ac_cv_path_EGREP='/bin/grep -E' ac_cv_path_GREP=/bin/grep ac_cv_path_SED=/usr/local/bin/sed ac_cv_path_install='/usr/bin/install -c' ac_cv_path_mkdir=/bin/mkdir ac_cv_prog_AWK=gawk ac_cv_prog_CXXCPP='g++ -E' ac_cv_prog_R=R ac_cv_prog_ac_ct_CC=gcc ac_cv_prog_ac_ct_CXX=g++ ac_cv_prog_cc_c89= ac_cv_prog_cc_g=yes ac_cv_prog_cc_gcc_c_o=yes ac_cv_prog_cxx_g=yes ac_cv_prog_make_make_set=yes ac_cv_sys_file_offset_bits=no ac_cv_sys_largefile_CC=no ac_cv_sys_largefile_source=no ac_cv_type__Bool=no ac_cv_type_off_t=yes ac_cv_type_size_t=yes ac_cv_working_alloca_h=yes am_cv_CC_dependencies_compiler_type=gcc3 am_cv_CXX_dependencies_compiler_type=gcc3 
am_cv_make_support_nested_variables=yes ## ----------------- ## ## Output variables. ## ## ----------------- ## ACLOCAL='${SHELL} /usr/local/apps/probabel/0.4.2/missing aclocal-1.13' ALLOCA='' AMDEPBACKSLASH='\' AMDEP_FALSE='#' AMDEP_TRUE='' AMTAR='$${TAR-tar}' AM_BACKSLASH='\' AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)' AM_DEFAULT_VERBOSITY='0' AM_V='$(V)' AUTOCONF='${SHELL} /usr/local/apps/probabel/0.4.2/missing autoconf' AUTOHEADER='${SHELL} /usr/local/apps/probabel/0.4.2/missing autoheader' AUTOMAKE='${SHELL} /usr/local/apps/probabel/0.4.2/missing automake-1.13' AWK='gawk' BUILD_extractsnp_FALSE='' BUILD_extractsnp_TRUE='#' BUILD_pacoxph_FALSE='#' BUILD_pacoxph_TRUE='' BUILD_palinear_FALSE='#' BUILD_palinear_TRUE='' BUILD_palogist_FALSE='#' BUILD_palogist_TRUE='' CC='gcc' CCDEPMODE='depmode=gcc3' CFLAGS='-g -O2' CPPFLAGS='-Wall' CXX='g++' CXXCPP='g++ -E' CXXDEPMODE='depmode=gcc3' CXXFLAGS='-g -O2 -D_NOT_R_FILEVECTOR' CYGPATH_W='echo' DEFS='-DHAVE_CONFIG_H' DEPDIR='.deps' ECHO_C='' ECHO_N='-n' ECHO_T='' EGREP='/bin/grep -E' EXEEXT='' GREP='/bin/grep' HAVE_PDFLATEX_FALSE='' HAVE_PDFLATEX_TRUE='#' HAVE_R_FALSE='#' HAVE_R_TRUE='' INSTALL_DATA='${INSTALL} -m 644' INSTALL_PROGRAM='${INSTALL}' INSTALL_SCRIPT='${INSTALL}' INSTALL_STRIP_PROGRAM='$(install_sh) -c -s' LDFLAGS='' LIBOBJS='' LIBS='' LN_S='ln -s' LTLIBOBJS='' MAINT='' MAINTAINER_MODE_FALSE='#' MAINTAINER_MODE_TRUE='' MAKEINFO='${SHELL} /usr/local/apps/probabel/0.4.2/missing makeinfo' MKDIR_P='/bin/mkdir -p' OBJEXT='o' PACKAGE='probabel' PACKAGE_BUGREPORT='genabel-devel at r-forge.wu-wien.ac.at' PACKAGE_NAME='ProbABEL' PACKAGE_STRING='ProbABEL 0.4.2' PACKAGE_TARNAME='probabel' PACKAGE_URL='' PACKAGE_VERSION='0.4.2' PATH_SEPARATOR=':' PDFLATEX='' R='R' SED='/usr/local/bin/sed' SET_MAKE='' SHELL='/bin/sh' STRIP='' VERSION='0.4.2' WITH_EIGEN_FALSE='' WITH_EIGEN_TRUE='#' ac_ct_CC='gcc' ac_ct_CXX='g++' am__EXEEXT_FALSE='' am__EXEEXT_TRUE='#' am__fastdepCC_FALSE='#' am__fastdepCC_TRUE='' am__fastdepCXX_FALSE='#' 
am__fastdepCXX_TRUE='' am__include='include' am__isrc='' am__leading_dot='.' am__nodep='_no' am__quote='' am__tar='$${TAR-tar} chof - "$$tardir"' am__untar='$${TAR-tar} xf -' bindir='${exec_prefix}/bin' build_alias='' datadir='${datarootdir}' datarootdir='${prefix}/share' docdir='${datarootdir}/doc/${PACKAGE_TARNAME}' dvidir='${docdir}' exec_prefix='${prefix}' host_alias='' htmldir='${docdir}' includedir='${prefix}/include' infodir='${datarootdir}/info' install_sh='${SHELL} /usr/local/apps/probabel/0.4.2/install-sh' libdir='${exec_prefix}/lib' libexecdir='${exec_prefix}/libexec' localedir='${datarootdir}/locale' localstatedir='${prefix}/var' mandir='${datarootdir}/man' mkdir_p='$(MKDIR_P)' oldincludedir='/usr/include' pdfdir='${docdir}' prefix='/usr/local/apps/probabel/0.4.2' program_transform_name='s,x,x,' psdir='${docdir}' sbindir='${exec_prefix}/sbin' sharedstatedir='${prefix}/com' sysconfdir='${prefix}/etc' target_alias='' ## ----------- ## ## confdefs.h. ## ## ----------- ## /* confdefs.h */ #define PACKAGE_NAME "ProbABEL" #define PACKAGE_TARNAME "probabel" #define PACKAGE_VERSION "0.4.2" #define PACKAGE_STRING "ProbABEL 0.4.2" #define PACKAGE_BUGREPORT "genabel-devel at r-forge.wu-wien.ac.at" #define PACKAGE_URL "" #define PACKAGE "probabel" #define VERSION "0.4.2" #define STDC_HEADERS 1 #define HAVE_SYS_TYPES_H 1 #define HAVE_SYS_STAT_H 1 #define HAVE_STDLIB_H 1 #define HAVE_STRING_H 1 #define HAVE_MEMORY_H 1 #define HAVE_STRINGS_H 1 #define HAVE_INTTYPES_H 1 #define HAVE_STDINT_H 1 #define HAVE_UNISTD_H 1 #define HAVE_ALLOCA_H 1 #define HAVE_ALLOCA 1 #define HAVE_FLOAT_H 1 #define HAVE_INTTYPES_H 1 #define HAVE_LIBINTL_H 1 #define HAVE_LIMITS_H 1 #define HAVE_STDDEF_H 1 #define HAVE_STDINT_H 1 #define HAVE_STDLIB_H 1 #define HAVE_STRING_H 1 #define HAVE_SYS_PARAM_H 1 #define HAVE_WCHAR_H 1 #define HAVE_WCTYPE_H 1 #define HAVE_STDBOOL_H 1 #define HAVE_POW 1 #define HAVE_PUTENV 1 #define HAVE_SQRT 1 #define HAVE_STRDUP 1 #define HAVE_STRNCASECMP 1 #define 
HAVE_FLOOR 1
#define HAVE_FSEEKO 1

configure: exit 0

From ldimitro at wakehealth.edu Mon Mar 3 21:38:01 2014
From: ldimitro at wakehealth.edu (Latchezar (Lucho) Dimitrov)
Date: Mon, 3 Mar 2014 20:38:01 +0000
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <531447AB.6090405@karssen.org>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
Message-ID: <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>

Dear Lennart,

Yeah, I know. Solaris is not for the faint of heart ;-) however it is
rewarding! Anyway, I have built some of the gnu utils on solaris but
diff is not amongst them. GCC is though. I looked more carefully at the
diff's (pun intended) between gnu(linux) diff and the one I have and
figured it is -I option missing in mine. I changed run_diff my way to
skip first line in all comparisons and it worked. Now that I confirmed
it is 'make check' issue and not the build itself one I am happy.

BTW, I couldn't find a quick nice replacement for your -I option to
share so it might be a good idea at least to mention the case and the
requirement for diff to support -I for 'make check' to work properly.
I'd also suggest changing

    if diff "$file1" "$file2" $args; then

to the canonical

    if diff $args "$file1" "$file2" ; then

which actually quickly showed me where the problems was and might be
helpful to ones who do not read the readme files :-))

Thank you very much,
Lucho

PS. I may decide to make a module gnu in my solaris systems ;-)))

> -----Original Message-----
> From: L.C. Karssen [mailto:lennart at karssen.org]
> Sent: Monday, March 03, 2014 4:13 AM
> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
>
> Dear Lucho,
>
> Thanks for your interest in ProbABEL. I think you are one of the
> (very?) few users using ProbABEL on Solaris, so we are very interested
> in your feedback.
> > Could you send us the config.log file created when running ./configure? > That may give us some more hints on how your system is configured. > > My first hunch is that the diff utility in Solaris has some different > options from the GNU version. When comparing the outputs from dosage > inputs with probability input files the checks use the -I option to > ignore the header line. Does your version of diff have that option? > > My knowledge of Solaris is a bit rusty, but I seem to remember that > some of the GNU tools are available (or at least in principle > installable) on Solaris. I think they are then prefixed with a g. Do > you have gdiff on your system (maybe in /usr/bin/ or /usr/sfw/bin/)? > > > Best regards, > > Lennart Karssen. > > On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote: > > Dear genABEL developers, > > > > I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on ORACLE > > Solaris 10 x86 but when I ran > > > > make check > > > > I got the subj. results. I went and manually ran one of the failing > check: > > > > run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ > > "pacoxph check: dose vs. prob" -I SNP > > diff: two filename arguments required > > pacoxph check: dose vs. prob > FAILED > > > > Then I manually compared the two fails: > > > > diff coxph_dose_add.out.txt coxph_prob_add.out.txt > > 1c1 > > < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > > position beta_SNP_add sebeta_SNP_add chi2_SNP > > --- > >> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > >> position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 > > > > Finally, I compared the two files w/o their first lines and they are > the same. > > > > > > Any help highly appreciated. 
Please find the log file attached > > > > > > Thank you very much, > > Latchezar (Lucho) "Speaking w/ computers" Dimitrov > > > > Analyst/Programmer IV, > > Center for Genomics and Personalized Medicine Research > > Wake Forest University School of Medicine fax: (336)713-7566 > > Medical Center Blvd. work: (336)713-7137 > > Winston-Salem, NC 27157 > > > > -- A computer lets you make more mistakes faster than any invention > in human history -- > > with the possible exceptions of handguns and tequila. > > --Mitch Ratliffe, > "Technology Review" > > > > > > > > > > _______________________________________________ > > genabel-devel mailing list > > genabel-devel at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel- > d > > evel > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- From lennart at karssen.org Mon Mar 3 22:16:16 2014 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 03 Mar 2014 22:16:16 +0100 Subject: [GenABEL-dev] make check 2 PASS 4 FAIL In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu> References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu> <531447AB.6090405@karssen.org> <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu> Message-ID: <5314F120.2030805@karssen.org> Dear Lucho, On 03-03-14 21:38, Latchezar (Lucho) Dimitrov wrote: > Dear Lennart, > > Yeah, I know. Solaris is not for the faint of heart ;-) however it > is rewarding! Anyway, I have built some of the gnu utils on solaris but > diff is not amongst them. GCC is though. I looked more carefully at the > diff's (pun intended) between gnu(linux) diff and the one I have and > figured it is -I option missing in mine. 
> I changed run_diff my way to
> skip first line in all comparisons and it worked. Now that I confirmed
> it is 'make check' issue and not the build itself one I am happy.

Glad to hear it all worked out! From your remark below it seems that you
modified run_diff a bit in order to ignore the first line. If so, would
you consider sending a patch? This would bring Solaris support (and
portability in general) one step closer.

>
> BTW, I couldn't find a quick nice replacement for your -I option to
> share so it might be a good idea at least to mention the case and the
> requirement for diff to support -I for 'make check' to work properly.

I'll see if I can find a way to let autoconf figure out if a certain
option is accepted by a command. It seems that the
AC_PATH_PROGS_FEATURE_CHECK macro can do this. That may be helpful for
other cases as well.

> I'd also suggest changing
>
> if diff "$file1" "$file2" $args; then
>
> to the canonical
>
> if diff $args "$file1" "$file2" ; then
>
> which actually quickly showed me where the problems was and might be
> helpful to ones who do not read the readme files :-))
>

Done, thanks for the suggestion. It's in SVN r1603.


Thanks,

Lennart.

>
> Thank you very much,
> Lucho
>
> PS. I may decide to make a module gnu in my solaris systems ;-)))
>
>
>> -----Original Message-----
>> From: L.C. Karssen [mailto:lennart at karssen.org]
>> Sent: Monday, March 03, 2014 4:13 AM
>> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
>> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
>>
>> Dear Lucho,
>>
>> Thanks for your interest in ProbABEL. I think you are one of the
>> (very?) few users using ProbABEL on Solaris, so we are very interested
>> in your feedback.
>>
>> Could you send us the config.log file created when running ./configure?
>> That may give us some more hints on how your system is configured.
>>
>> My first hunch is that the diff utility in Solaris has some different
>> options from the GNU version.
When comparing the outputs from dosage >> inputs with probability input files the checks use the -I option to >> ignore the header line. Does your version of diff have that option? >> >> My knowledge of Solaris is a bit rusty, but I seem to remember that >> some of the GNU tools are available (or at least in principle >> installable) on Solaris. I think they are then prefixed with a g. Do >> you have gdiff on your system (maybe in /usr/bin/ or /usr/sfw/bin/)? >> >> >> Best regards, >> >> Lennart Karssen. >> >> On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote: >>> Dear genABEL developers, >>> >>> I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on ORACLE >>> Solaris 10 x86 but when I ran >>> >>> make check >>> >>> I got the subj. results. I went and manually ran one of the failing >> check: >>> >>> run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ >>> "pacoxph check: dose vs. prob" -I SNP >>> diff: two filename arguments required >>> pacoxph check: dose vs. prob >> FAILED >>> >>> Then I manually compared the two fails: >>> >>> diff coxph_dose_add.out.txt coxph_prob_add.out.txt >>> 1c1 >>> < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom >>> position beta_SNP_add sebeta_SNP_add chi2_SNP >>> --- >>>> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom >>>> position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 >>> >>> Finally, I compared the two files w/o their first lines and they are >> the same. >>> >>> >>> Any help highly appreciated. Please find the log file attached >>> >>> >>> Thank you very much, >>> Latchezar (Lucho) "Speaking w/ computers" Dimitrov >>> >>> Analyst/Programmer IV, >>> Center for Genomics and Personalized Medicine Research >>> Wake Forest University School of Medicine fax: (336)713-7566 >>> Medical Center Blvd. 
work: (336)713-7137 >>> Winston-Salem, NC 27157 >>> >>> -- A computer lets you make more mistakes faster than any invention >> in human history -- >>> with the possible exceptions of handguns and tequila. >>> --Mitch Ratliffe, >> "Technology Review" >>> >>> >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel- >> d >>> evel >>> >> >> -- >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >> L.C. Karssen >> Utrecht >> The Netherlands >> >> lennart at karssen.org >> http://blog.karssen.org >> GPG key ID: A88F554A >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From ldimitro at wakehealth.edu Mon Mar 3 23:38:53 2014 From: ldimitro at wakehealth.edu (Latchezar (Lucho) Dimitrov) Date: Mon, 3 Mar 2014 22:38:53 +0000 Subject: [GenABEL-dev] make check 2 PASS 4 FAIL In-Reply-To: <5314F120.2030805@karssen.org> References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu> <531447AB.6090405@karssen.org> <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu> <5314F120.2030805@karssen.org> Message-ID: <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu> Dear Lennart, As I said (or thought I had) it was ugly (quick & dirty) but worked as a proof. I used, e.g., 'tail -n +2 $file1 >f1' to make a copy of each of the two files with first line removed and then 'diff'. 
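The header-skipping workaround described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the actual run_diff from the ProbABEL checks: the function name compare_skip_header and the temporary file names are hypothetical, and it assumes a tail that accepts -n +2, as the one used above did.

```shell
# Hypothetical helper (not ProbABEL's run_diff): compare two files while
# ignoring the first (header) line of each, without relying on diff -I.
compare_skip_header () {
    csh_file1=$1
    csh_file2=$2
    # Temporary copies of both files with the header line removed.
    csh_tmp1="${TMPDIR:-/tmp}/csh_a.$$"
    csh_tmp2="${TMPDIR:-/tmp}/csh_b.$$"
    tail -n +2 "$csh_file1" > "$csh_tmp1"
    tail -n +2 "$csh_file2" > "$csh_tmp2"
    # Plain diff with options (none here) before the file names.
    if diff "$csh_tmp1" "$csh_tmp2" > /dev/null; then
        csh_status=0
    else
        csh_status=1
    fi
    rm -f "$csh_tmp1" "$csh_tmp2"
    return $csh_status
}
```

Called as compare_skip_header coxph_dose_add.out.txt coxph_prob_add.out.txt, it returns 0 when the two files agree apart from their first lines. Like the workaround it illustrates, it drops the first line unconditionally, so it does not reproduce the pattern matching that -I does in GNU diff.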
The ugliest part is it is the same for all diff's, i.e., it does not
take -I into account at all just blindly compares the two files w/o the
first line. But it worked for me as a proof my build is correct.

Sorry I do not have a nice solution. If I come across something I'll
let you know.

Thanks,
Lucho

> -----Original Message-----
> From: L.C. Karssen [mailto:lennart at karssen.org]
> Sent: Monday, March 03, 2014 4:16 PM
> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
>
> Dear Lucho,
>
>
> On 03-03-14 21:38, Latchezar (Lucho) Dimitrov wrote:
> > Dear Lennart,
> >
> > Yeah, I know. Solaris is not for the faint of heart ;-) however it is
> > rewarding! Anyway, I have built some of the gnu utils on solaris but
> > diff is not amongst them. GCC is though. I looked more carefully at
> > the diff's (pun intended) between gnu(linux) diff and the one I have
> > and figured it is -I option missing in mine. I changed run_diff my way
> > to skip first line in all comparisons and it worked. Now that I
> > confirmed it is 'make check' issue and not the build itself one I am
> > happy.
>
> Glad to hear it all worked out! From your remark below it seems that
> you modified run_diff a bit in order to ignore the first line. If so,
> would you consider sending a patch? This would bring Solaris support
> (and portability in general) one step closer.
>
> >
> > BTW, I couldn't find a quick nice replacement for your -I option to
> > share so it might be a good idea at least to mention the case and the
> > requirement for diff to support -I for 'make check' to work properly.
>
> I'll see if I can find a way to let autoconf figure out if a certain
> option is accepted by a command. It seems that the
> AC_PATH_PROGS_FEATURE_CHECK macro can do this. That may be helpful for
> other cases as well.
> > > I'd also suggest changing > > > > if diff "$file1" "$file2" $args; then > > > > to the canonical > > > > if diff $args "$file1" "$file2" ; then > > > > which actually quickly showed me where the problems was and might be > > helpful to ones who do not read the readme files :-)) > > > > Done, thanks for the suggestion. It's in SVN r1603. > > > Thanks, > > Lennart. > > > > > Thank you very much, > > Lucho > > > > PS. I may decide to make a module gnu in my solaris systems ;-))) > > > > > >> -----Original Message----- > >> From: L.C. Karssen [mailto:lennart at karssen.org] > >> Sent: Monday, March 03, 2014 4:13 AM > >> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu- > wien.ac.at' > >> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL > >> > >> Dear Lucho, > >> > >> Thanks for your interest in ProbABEL. I think you are one of the > >> (very?) few users using ProbABEL on Solaris, so we are very > >> interested in your feedback. > >> > >> Could you send us the config.log file created when running > ./configure? > >> That may give us some more hints on how your system is configured. > >> > >> My first hunch is that the diff utility in Solaris has some > different > >> options from the GNU version. When comparing the outputs from dosage > >> inputs with probability input files the checks use the -I option to > >> ignore the header line. Does your version of diff have that option? > >> > >> My knowledge of Solaris is a bit rusty, but I seem to remember that > >> some of the GNU tools are available (or at least in principle > >> installable) on Solaris. I think they are then prefixed with a g. Do > >> you have gdiff on your system (maybe in /usr/bin/ or /usr/sfw/bin/)? > >> > >> > >> Best regards, > >> > >> Lennart Karssen. 
> >> > >> On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote: > >>> Dear genABEL developers, > >>> > >>> I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on > ORACLE > >>> Solaris 10 x86 but when I ran > >>> > >>> make check > >>> > >>> I got the subj. results. I went and manually ran one of the failing > >> check: > >>> > >>> run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ > >>> "pacoxph check: dose vs. prob" -I SNP > >>> diff: two filename arguments required pacoxph check: dose vs. prob > >> FAILED > >>> > >>> Then I manually compared the two fails: > >>> > >>> diff coxph_dose_add.out.txt coxph_prob_add.out.txt > >>> 1c1 > >>> < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > >>> position beta_SNP_add sebeta_SNP_add chi2_SNP > >>> --- > >>>> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > >>>> position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 > >>> > >>> Finally, I compared the two files w/o their first lines and they > are > >> the same. > >>> > >>> > >>> Any help highly appreciated. Please find the log file attached > >>> > >>> > >>> Thank you very much, > >>> Latchezar (Lucho) "Speaking w/ computers" Dimitrov > >>> > >>> Analyst/Programmer IV, > >>> Center for Genomics and Personalized Medicine Research > >>> Wake Forest University School of Medicine fax: (336)713-7566 > >>> Medical Center Blvd. work: (336)713-7137 > >>> Winston-Salem, NC 27157 > >>> > >>> -- A computer lets you make more mistakes faster than any invention > >> in human history -- > >>> with the possible exceptions of handguns and tequila. > >>> --Mitch Ratliffe, > >> "Technology Review" > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> genabel-devel mailing list > >>> genabel-devel at lists.r-forge.r-project.org > >>> https://lists.r-forge.r-project.org/cgi- > bin/mailman/listinfo/genabel- > >> d > >>> evel > >>> > >> > >> -- > >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > >> L.C. 
Karssen
> >> Utrecht
> >> The Netherlands
> >>
> >> lennart at karssen.org
> >> http://blog.karssen.org
> >> GPG key ID: A88F554A
> >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> >
> >
>
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From ldimitro at wakehealth.edu Tue Mar 4 01:14:43 2014
From: ldimitro at wakehealth.edu (Latchezar (Lucho) Dimitrov)
Date: Tue, 4 Mar 2014 00:14:43 +0000
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>
 <5314F120.2030805@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
Message-ID: <3CB7E549BE1AA14BB108ABCB4637B02519A1C2DB@exchdb6.medctr.ad.wfubmc.edu>

Dear Lennart,

I had some time to spend and looked at possible solutions. Unfortunately
the more I look the more I realize:

1. why there is no -I in solaris diff ;-)) It's ugly kludge by itself
2. I am afraid the idea of having run_diff creates more problems than it
   solves. Sorry.

If I am to implement it now I'd just:

1. have only the parts that should be identical in verified_results/
2. use appropriate diff directly in test_*.sh files - it's just
   replacing a proc. call with the actual one-liner like replacing, for
   example:

run_diff linear_base_add.out.txt \
    linear_ngp2_add.out.txt \
    "QT check: dose vs. prob (additive model)" -I SNP

if tail -n +2 linear_ngp2_add.out.txt |diff linear_base_add.out.txt_1st_line_removed - ; then
    echo -e "${name}${blanks:${#name}} OK"
else
    echo -e "${name}${blanks:${#name}} FAILED"
#    exit 1 # replace this as appropriate
fi

This way it will be way more flexible and more importantly system
independent - no autoconf and all that stuff. There are many variations
of the approach but the idea should be clear.

More important question to me to resolve/answer, though, is "Do really
otherwise the same columns have to have different names in different
files?". I'd rather put the differences in the file names and keep the
columns the same. But ... this may be just me ;-))

Best regards,
Lucho

> -----Original Message-----
> From: Latchezar (Lucho) Dimitrov
> Sent: Monday, March 03, 2014 5:39 PM
> To: 'L.C. Karssen'; 'genabel-devel at r-forge.wu-wien.ac.at'
> Subject: RE: [GenABEL-dev] make check 2 PASS 4 FAIL
>
> Dear Lennart,
>
> As I said (or thought I had) it was ugly (quick & dirty) but worked as
> a proof. I used, e.g., 'tail -n +2 $file1 >f1' to make a copy of each
> of the two files with first line removed and then 'diff'. The ugliest
> part is it is the same for all diff's, i.e., it does not take -I into
> account at all just blindly compares the two files w/o the first line.
> But it worked for me as a proof my build is correct.
>
> Sorry I do not have a nice solution. If I come across something I'll
> let you know.
>
> Thanks,
> Lucho
>
> > -----Original Message-----
> > From: L.C. Karssen [mailto:lennart at karssen.org]
> > Sent: Monday, March 03, 2014 4:16 PM
> > To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
> > Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
> >
> > Dear Lucho,
> >
> >
> > On 03-03-14 21:38, Latchezar (Lucho) Dimitrov wrote:
> > > Dear Lennart,
> > >
> > > Yeah, I know. Solaris is not for the faint of heart ;-) however it is
> > > rewarding!
> > > Anyway, I have built some of the gnu utils on solaris but
> > > diff is not amongst them. GCC is though. I looked more carefully at
> > > the diff's (pun intended) between gnu(linux) diff and the one I have
> > > and figured it is -I option missing in mine. I changed run_diff my way
> > > to skip first line in all comparisons and it worked. Now that I
> > > confirmed it is 'make check' issue and not the build itself one I am
> > > happy.
> >
> > Glad to hear it all worked out! From your remark below it seems that
> > you modified run_diff a bit in order to ignore the first line. If so,
> > would you consider sending a patch? This would bring Solaris support
> > (and portability in general) one step closer.
> >
> > >
> > > BTW, I couldn't find a quick nice replacement for your -I option to
> > > share so it might be a good idea at least to mention the case and the
> > > requirement for diff to support -I for 'make check' to work properly.
> >
> > I'll see if I can find a way to let autoconf figure out if a certain
> > option is accepted by a command. It seems that the
> > AC_PATH_PROGS_FEATURE_CHECK macro can do this. That may be helpful for
> > other cases as well.
> >
> > > I'd also suggest changing
> > >
> > > if diff "$file1" "$file2" $args; then
> > >
> > > to the canonical
> > >
> > > if diff $args "$file1" "$file2" ; then
> > >
> > > which actually quickly showed me where the problems was and might be
> > > helpful to ones who do not read the readme files :-))
> > >
> >
> > Done, thanks for the suggestion. It's in SVN r1603.
> >
> >
> > Thanks,
> >
> > Lennart.
> >
> > >
> > > Thank you very much,
> > > Lucho
> > >
> > > PS. I may decide to make a module gnu in my solaris systems ;-)))
> > >
> > >
> > >> -----Original Message-----
> > >> From: L.C.
Karssen [mailto:lennart at karssen.org] > > >> Sent: Monday, March 03, 2014 4:13 AM > > >> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu- > > wien.ac.at' > > >> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL > > >> > > >> Dear Lucho, > > >> > > >> Thanks for your interest in ProbABEL. I think you are one of the > > >> (very?) few users using ProbABEL on Solaris, so we are very > > >> interested in your feedback. > > >> > > >> Could you send us the config.log file created when running > > ./configure? > > >> That may give us some more hints on how your system is configured. > > >> > > >> My first hunch is that the diff utility in Solaris has some > > different > > >> options from the GNU version. When comparing the outputs from > dosage > > >> inputs with probability input files the checks use the -I option > to > > >> ignore the header line. Does your version of diff have that > option? > > >> > > >> My knowledge of Solaris is a bit rusty, but I seem to remember > that > > >> some of the GNU tools are available (or at least in principle > > >> installable) on Solaris. I think they are then prefixed with a g. > Do > > >> you have gdiff on your system (maybe in /usr/bin/ or > /usr/sfw/bin/)? > > >> > > >> > > >> Best regards, > > >> > > >> Lennart Karssen. > > >> > > >> On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote: > > >>> Dear genABEL developers, > > >>> > > >>> I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on > > ORACLE > > >>> Solaris 10 x86 but when I ran > > >>> > > >>> make check > > >>> > > >>> I got the subj. results. I went and manually ran one of the > failing > > >> check: > > >>> > > >>> run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \ > > >>> "pacoxph check: dose vs. prob" -I SNP > > >>> diff: two filename arguments required pacoxph check: dose vs. 
> prob > > >> FAILED > > >>> > > >>> Then I manually compared the two fails: > > >>> > > >>> diff coxph_dose_add.out.txt coxph_prob_add.out.txt > > >>> 1c1 > > >>> < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > > >>> position beta_SNP_add sebeta_SNP_add chi2_SNP > > >>> --- > > >>>> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom > > >>>> position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1 > > >>> > > >>> Finally, I compared the two files w/o their first lines and they > > are > > >> the same. > > >>> > > >>> > > >>> Any help highly appreciated. Please find the log file attached > > >>> > > >>> > > >>> Thank you very much, > > >>> Latchezar (Lucho) "Speaking w/ computers" Dimitrov > > >>> > > >>> Analyst/Programmer IV, > > >>> Center for Genomics and Personalized Medicine Research > > >>> Wake Forest University School of Medicine fax: (336)713- > 7566 > > >>> Medical Center Blvd. work: (336)713- > 7137 > > >>> Winston-Salem, NC 27157 > > >>> > > >>> -- A computer lets you make more mistakes faster than any > invention > > >> in human history -- > > >>> with the possible exceptions of handguns and tequila. > > >>> --Mitch Ratliffe, > > >> "Technology Review" > > >>> > > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> genabel-devel mailing list > > >>> genabel-devel at lists.r-forge.r-project.org > > >>> https://lists.r-forge.r-project.org/cgi- > > bin/mailman/listinfo/genabel- > > >> d > > >>> evel > > >>> > > >> > > >> -- > > >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > > >> L.C. Karssen > > >> Utrecht > > >> The Netherlands > > >> > > >> lennart at karssen.org > > >> http://blog.karssen.org > > >> GPG key ID: A88F554A > > >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > > > > > > > > > -- > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > > L.C. 
Karssen
> > Utrecht
> > The Netherlands
> >
> > lennart at karssen.org
> > http://blog.karssen.org
> > GPG key ID: A88F554A
> > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org Tue Mar 4 12:34:31 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Tue, 04 Mar 2014 12:34:31 +0100
Subject: [GenABEL-dev] Download statistics for the GenABEL PPA
Message-ID: <5315BA47.6090307@karssen.org>

Dear list,

I thought I'd make your day by posting some download statistics of the ProbABEL packages in the Ubuntu GenABEL PPA [1]. The list is ordered by Ubuntu code name. This is how to 'decode' the code names to year.month:

precise: 12.04 (this is a Long Term Support release)
quantal: 12.10
raring: 13.04
saucy: 13.10 (the current release)

The columns are: package name, package version, nr of downloads

====== Stats for Ubuntu precise ======
== Current package versions: ==
probabel           0.4.2-2ubuntu1~precise1      16
probabel-examples  0.4.2-2ubuntu1~precise1       0
== Superseded versions: ==
probabel           0.4.1-0ubuntu1~precise1      25
probabel           0.4.0-0ubuntu1~precise1       5
probabel           0.3.0-0ubuntu1~precise1      27
probabel           0.2.2-0ubuntu2~precise1       8
probabel           0.2.0.99-0ubuntu1~precise1    2
probabel           0.2.0-1                       2
probabel           0.1.99-1                      1

====== Stats for Ubuntu quantal ======
== Current package versions: ==
probabel           0.4.1-0ubuntu1~quantal1       4
== Superseded versions: ==
probabel           0.4.0-0ubuntu1~quantal1       0
probabel           0.3.0-0ubuntu1~quantal2       5

====== Stats for Ubuntu raring ======
== Current package versions: ==
probabel           0.4.1-0ubuntu1~raring1        4
== Superseded versions: ==
probabel           0.4.0-0ubuntu1~raring1        1

====== Stats for Ubuntu saucy ======
== Current package versions: ==
probabel           0.4.2-2ubuntu1~saucy1         3
probabel-examples  0.4.2-2ubuntu1~saucy1         1
== Superseded versions: ==

Some observations:
- It seems (as is to be expected for a server) that most people use
  Ubuntu 12.04 LTS.
- Since ProbABEL 0.4.2 I split the examples into a separate package, and
  the main package does not depend on it.
- Version 0.4.0 was only briefly online before being superseded by
  0.4.1. I guess that at least three of the 5 downloads came from the
  servers I administered at Erasmus MC.
- There are 10 downloads for Ubuntu releases before 12.04, but I guess
  these are mostly mine, trying the PPA out. Anyway, I hope no one is
  using these older versions any more.
- In fact, at least one of each downloaded package is probably mine. I
  always test on the LTS release and the current release.
- Note that you can't simply add up these numbers to get the total
  number of installed sites because once the PPA is added to the system,
  it will automatically update the package whenever the sysadmin
  installs general updates.

Debian uses the so-called popularity-contest package to measure how many people have a package installed (assuming they have the popcon package installed). Results for that are still disappointing [2], but that is probably because our packages are not in a Debian stable release yet.


Best,

Lennart.

[1] https://launchpad.net/~l.c.karssen/+archive/genabel-ppa
[2] http://qa.debian.org/popcon.php?package=probabel

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Wed Mar 5 15:00:44 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 05 Mar 2014 15:00:44 +0100
Subject: [GenABEL-dev] Suggested change of default weight in GenABEL/ibs()
In-Reply-To:
References: <768CB804-C3EF-4E6A-B524-937A1B54331C@gmail.com>
 <-317756408431031488@unknownmsgid> <531471A9.7060109@karssen.org>
 <4828030313405700840@unknownmsgid>
Message-ID: <53172E0C.40609@karssen.org>

Hi Xia,

Glad to hear you want to take on the task of writing a patch.

On 05-03-14 13:53, Xia Shen wrote:
> How shall I provide a patch? Do I simply revise the ibs() code and submit to you the ibs.R file?

That's the basic idea. Technical instructions can be found at
http://genabel.r-forge.r-project.org/tutHowToPatch.html.

In steps:
- Check out the current code from SVN (our version control system)
- Change the code until you think the bug is fixed
- Test the changed function
- Send the changes to the original file(s) as a patch to the mailing list.
- We will review the code, and if approved, will commit it to SVN.

When writing code, please follow our coding style guidelines at
http://genabel.r-forge.r-project.org/codingstyle.html

If you need any help, let us know.


Lennart.

>
> Xia
>
> On 03 Mar 2014, at 13:16, Yurii Aulchenko wrote:
>
>> I am voting for option 2; as for the version number - no strong opinion.
>>
>> Xia, would you be willing to summarize this discussion as a feature
>> request, or to provide a patch?
>>
>> Best wishes,
>> Yurii
>>
>> ----------------------
>> Yurii Aulchenko
>> (sent from mobile device)
>>
>>> On Mar 3, 2014, at 4:12 PM, "L.C. Karssen" wrote:
>>>
>>> Dear all,
>>>
>>>> On 28-02-14 14:15, Yurii Aulchenko wrote:
>>>> In principle I agree this is good idea, rarely "no" (default) option is
>>>> used; but I am always worried to change defaults as this may destroy
>>>> people pipelines.
>>>
>>> How about issuing a warning? I see two options:
>>> 1) Keep the current default and issue a warning when the user doesn't
>>> specify a weight.
>>> 2) Change the default and when ibs() is run and the weight argument is
>>> not equal to 'freq', issue a warning saying that the default has changed
>>> to freq.
>>>
>>> Although I haven't used ibs() a lot, I think option 2 is best as it sets
>>> a sane default.
>>>
>>> Another thing to think about: when changing defaults, what do we do with
>>> the version number. If we only change the minor number (the 8 in 1.8-0),
>>> people will think not much has changed and this change in default
>>> behaviour may go unnoticed.
>>>
>>>
>>> Lennart.
>>>
>>>>
>>>> We need third opinion :)
>>>>
>>>> Best,
>>>> Y
>>>>
>>>> ----------------------
>>>> Yurii Aulchenko
>>>> (sent from mobile device)
>>>>
>>>> On Feb 28, 2014, at 3:45 PM, Xia Shen wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I suggest to change the default weight argument in the GenABEL/ibs()
>>>>> function to be set to "freq" instead of "no". I have colleagues
>>>>> constantly forget to set to "freq" and produce, what I would call,
>>>>> "wrong" kinship matrix.
>>>>>
>>>>> *Xia Shen*
>>>>> PhD
>>>>>
>>>>> Division of Computational Genetics
>>>>> Department of Clinical Science
>>>>> *Swedish University of Agricultural Sciences*
>>>>> Uppsala, Sweden
>>>>>
>>>>> www.shen.se
>>>>>
>>>>> _______________________________________________
>>>>> genabel-devel mailing list
>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> genabel-devel mailing list
>>>> genabel-devel at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>
>>>
>>> --
>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>> L.C. Karssen
>>> Utrecht
>>> The Netherlands
>>>
>>> lennart at karssen.org
>>> http://blog.karssen.org
>>> GPG key ID: A88F554A
>>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C.
Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From yurii.aulchenko at gmail.com Wed Mar 5 15:03:02 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Wed, 5 Mar 2014 18:03:02 +0400
Subject: [GenABEL-dev] Suggested change of default weight in GenABEL/ibs()
In-Reply-To:
References: <768CB804-C3EF-4E6A-B524-937A1B54331C@gmail.com>
 <-317756408431031488@unknownmsgid> <531471A9.7060109@karssen.org>
 <4828030313405700840@unknownmsgid>
Message-ID: <-6112768375721639515@unknownmsgid>

Please have a look at http://genabel.r-forge.r-project.org

----------------------
Yurii Aulchenko
(sent from mobile device)

> On Mar 5, 2014, at 4:53 PM, Xia Shen wrote:
>
> How shall I provide a patch? Do I simply revise the ibs() code and submit to you the ibs.R file?
>
> Xia

From lennart at karssen.org Wed Mar 5 15:05:44 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 05 Mar 2014 15:05:44 +0100
Subject: [GenABEL-dev] Error wil testing ProbABEL 0.4.2
In-Reply-To: <5314BD24.30807@mail.nih.gov>
References: <5310DAA3.4020405@mail.nih.gov> <531442B8.5090105@karssen.org>
 <5314BD24.30807@mail.nih.gov>
Message-ID: <53172F38.3060605@karssen.org>

Hi Jean,

On 03-03-14 18:34, Jean Mao wrote:
> Hi Lennart,
>
> Thank you very much for your reply. Please see attached log files. I
> think 3.0.2 is the version of R I used.

Thanks for the information. Can you verify whether you have the R package 'survival' installed or not? In case you're not familiar with R, here are the steps:
- Start R from the Linux command line by typing: R
- At the R prompt, type: library("survival")
  If these steps produce no warnings or errors, you have the package
  installed.
- Exit R by typing: q()


Thanks,

Lennart.

>
> Jean

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Wed Mar 5 15:16:55 2014
From: lennart at karssen.org (L.C.
Karssen)
Date: Wed, 05 Mar 2014 15:16:55 +0100
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>
 <5314F120.2030805@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
Message-ID: <531731D7.5010405@karssen.org>

Dear Lucho,

On 03-03-14 23:38, Latchezar (Lucho) Dimitrov wrote:
> Dear Lennart,
>
> As I said (or thought I had) it was ugly (quick & dirty) but worked
> as a proof.

You did mention that. I was hoping it wasn't as ugly as you said :-).

> I used, e.g., 'tail -n +2 $file1 >f1' to make a copy of each of
> the two files with first line removed and then 'diff'. The ugliest part
> is it is the same for all diff's, i.e., it does not take -I into account
> at all just blindly compares the two files w/o the first line. But it
> worked for me as a proof my build is correct.

Not too bad a solution. Excluding the headers will not be too much of a problem, although the current implementation has the advantage that it excludes the headers only when they are expected to be different.

>
> Sorry I do not have a nice solution. If I come across something I'll
> let you know.

Thanks a lot,

Lennart.

>
> Thanks,
> Lucho

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Wed Mar 5 15:40:58 2014
From: lennart at karssen.org (L.C.
Karssen)
Date: Wed, 05 Mar 2014 15:40:58 +0100
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1C2DB@exchdb6.medctr.ad.wfubmc.edu>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>
 <5314F120.2030805@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2DB@exchdb6.medctr.ad.wfubmc.edu>
Message-ID: <5317377A.3040508@karssen.org>

Hi Lucho,

On 04-03-14 01:14, Latchezar (Lucho) Dimitrov wrote:
> Dear Lennart,
>
> I had some time to spend and looked at possible solutions.

Thanks for that!

> Unfortunately the more I look the more I realize:
>
> 1. why there is no -I in solaris diff ;-)) It's ugly kludge by itself
>
> 2. I am afraid the idea of having run_diff creates more problems than it
> solves. Sorry.
>
> If I am to implement it now I'd just:
>
> 1. have only the parts that should be identical in verified_results/
> 2. use appropriate diff directly in test_*.sh files - it's just
> replacing a proc. call with the actual one-liner
>
> like replacing, for example:
>
> run_diff linear_base_add.out.txt \
>          linear_ngp2_add.out.txt \
>          "QT check: dose vs. prob (additive model)" -I SNP
>
> with
>
> if tail -n +2 linear_ngp2_add.out.txt | diff linear_base_add.out.txt_1st_line_removed - ; then
>     echo -e "${name}${blanks:${#name}} OK"
> else
>     echo -e "${name}${blanks:${#name}} FAILED"
>     # exit 1 # replace this as appropriate
> fi
>
> This way it will be way more flexible and more importantly system
> independent - no autoconf and all that stuff.

Autoconf has some nice features, but definitely isn't for the faint-of-heart :-).

>
> There are many variations of the approach but the idea should be clear.
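
One such variation, as a minimal sketch only: it assumes nothing beyond a `tail` that accepts `-n +2` and a `diff` that reads standard input via `-`, and it does not need `diff -I`. The function name `compare_skip_header` and the temporary file path are hypothetical; they are not part of ProbABEL's test scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ProbABEL test suite): compare two
# files while ignoring the first (header) line of each, without diff -I.
compare_skip_header() {
    file1=$1
    file2=$2
    # Assumed scratch location; adjust to taste.
    tmp="${TMPDIR:-/tmp}/skiphdr.$$"

    # Write a copy of the first file without its header line.
    tail -n +2 "$file1" > "$tmp" || return 2

    # Strip the header of the second file on the fly and compare;
    # the pipeline's exit status is that of diff.
    tail -n +2 "$file2" | diff "$tmp" - > /dev/null
    status=$?

    rm -f "$tmp"
    return $status
}
```

In a check script this could stand in for a `run_diff ... -I SNP` call, for instance `compare_skip_header linear_base_add.out.txt linear_ngp2_add.out.txt && echo OK`. Unlike `-I SNP`, it drops the first line unconditionally, so it would also hide a header difference that was not expected.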
> More important question to me to resolve/answer, though, is "Do really > otherwise the same columns have to have different names in different > files?". I'd rather put the differences in the file names and keep the > columns the same. That's a good suggestion! One of those things you overlook if you're used to it for so long. Indeed the -I option is only used when comparing output from dosage data to output from probability data (and then only for the additive model). It makes sense to change the headers, make the "dosage headers" equal to the "probability headers". Especially since the "probability header" contains an explicit reference to the SNP that is used as reference for calculating beta. I'll start up a conversation on the mailing list specifically for this and add a bug report as well (unless you're willing to do so, of course!). > > > But ... this may be just me ;-)) Always good to have a fresh pair of eyes looking at the code. Best, Lennart. > > > Best regards, > Lucho > >> -----Original Message----- >> From: Latchezar (Lucho) Dimitrov >> Sent: Monday, March 03, 2014 5:39 PM >> To: 'L.C. Karssen'; 'genabel-devel at r-forge.wu-wien.ac.at' >> Subject: RE: [GenABEL-dev] make check 2 PASS 4 FAIL >> >> Dear Lennart, >> >> As I said (or thought I had) it was ugly (quick & dirty) but worked as >> a proof. I used, e.g., 'tail -n +2 $file1 >f1' to make a copy of each >> of the two files with first line removed and then 'diff'. The ugliest >> part is it is the same for all diff's, i.e., it does not take -I into >> account at all just blindly compares the two files w/o the first line. >> But it worked for me as a proof my build is correct. >> >> Sorry I do not have a nice solution. If I come across something I'll >> let you know. >> >> Thanks, >> Lucho >> >>> -----Original Message----- >>> From: L.C. 
Karssen [mailto:lennart at karssen.org] >>> Sent: Monday, March 03, 2014 4:16 PM >>> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at' >>> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL >>> >>> Dear Lucho, >>> >>> >>> On 03-03-14 21:38, Latchezar (Lucho) Dimitrov wrote: >>>> Dear Lennart, >>>> >>>> Yeah, I know. Solaris is not for the faint of heart ;-) however it >> is >>>> rewarding! Anyway, I have built some of the gnu utils on solaris >> but >>>> diff is not amongst them. GCC is though. I looked more carefully at >>>> the diff's (pun intended) between gnu(linux) diff and the one I >> have >>>> and figured it is -I option missing in mine. I changed run_diff my >>> way >>>> to skip first line in all comparisons and it worked. Now that I >>>> confirmed it is 'make check' issue and not the build itself one I >> am >>> happy. >>> >>> Glad to hear it all worked out! From your remark below it seems that >>> you modified run_diff a bit in order to ignor ethe first line. If so, >>> would you consider sending a patch? Ths would bring Solaris support >>> (and portability in general) one step closer. >>> >>>> >>>> BTW, I couldn't find a quick nice replacement for your -I option to >>>> share so it might be a good idea at least to mention the case and >> the >>>> requirement for diff to support -I for 'make check' to work >> properly. >>> >>> I'll see if I can find a way to let autoconf figure out if a certain >>> option is accepted by a command. It seems that the >>> AC_PATH_PROGS_FEATURE_CHECK macro can do this. That may be helpful >> for >>> other cases as well. >>> >>>> I'd also suggest changing >>>> >>>> if diff "$file1" "$file2" $args; then >>>> >>>> to the canonical >>>> >>>> if diff $args "$file1" "$file2" ; then >>>> >>>> which actually quickly showed me where the problems was and might >> be >>>> helpful to ones who do not read the readme files :-)) >>>> >>> >>> Done, thanks for the suggestion. It's in SVN r1603. 
>>>
>>>
>>> Thanks,
>>>
>>> Lennart.
>>>
>>>>
>>>> Thank you very much,
>>>> Lucho
>>>>
>>>> PS. I may decide to make a module gnu in my solaris systems ;-)))
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: L.C. Karssen [mailto:lennart at karssen.org]
>>>>> Sent: Monday, March 03, 2014 4:13 AM
>>>>> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
>>>>> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
>>>>>
>>>>> Dear Lucho,
>>>>>
>>>>> Thanks for your interest in ProbABEL. I think you are one of the
>>>>> (very?) few users using ProbABEL on Solaris, so we are very
>>>>> interested in your feedback.
>>>>>
>>>>> Could you send us the config.log file created when running ./configure?
>>>>> That may give us some more hints on how your system is configured.
>>>>>
>>>>> My first hunch is that the diff utility in Solaris has some different
>>>>> options from the GNU version. When comparing the outputs from dosage
>>>>> inputs with probability input files the checks use the -I option to
>>>>> ignore the header line. Does your version of diff have that option?
>>>>>
>>>>> My knowledge of Solaris is a bit rusty, but I seem to remember that
>>>>> some of the GNU tools are available (or at least in principle
>>>>> installable) on Solaris. I think they are then prefixed with a g. Do
>>>>> you have gdiff on your system (maybe in /usr/bin/ or /usr/sfw/bin/)?
>>>>>
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Lennart Karssen.
>>>>>
>>>>> On 02-03-14 01:10, Latchezar (Lucho) Dimitrov wrote:
>>>>>> Dear genABEL developers,
>>>>>>
>>>>>> I have successfully built probABEL v.0.4.2 using gcc-4.1.1 on ORACLE
>>>>>> Solaris 10 x86 but when I ran
>>>>>>
>>>>>> make check
>>>>>>
>>>>>> I got the subj. results. I went and manually ran one of the failing
>>>>>> checks:
>>>>>>
>>>>>> run_diff coxph_dose_add.out.txt coxph_prob_add.out.txt \
>>>>>>     "pacoxph check: dose vs. prob" -I SNP
>>>>>> diff: two filename arguments required
>>>>>> pacoxph check: dose vs. prob FAILED
>>>>>>
>>>>>> Then I manually compared the two files:
>>>>>>
>>>>>> diff coxph_dose_add.out.txt coxph_prob_add.out.txt
>>>>>> 1c1
>>>>>> < name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add chi2_SNP
>>>>>> ---
>>>>>> > name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1
>>>>>>
>>>>>> Finally, I compared the two files w/o their first lines and they are
>>>>>> the same.
>>>>>>
>>>>>>
>>>>>> Any help highly appreciated. Please find the log file attached
>>>>>>
>>>>>>
>>>>>> Thank you very much,
>>>>>> Latchezar (Lucho) "Speaking w/ computers" Dimitrov
>>>>>>
>>>>>> Analyst/Programmer IV,
>>>>>> Center for Genomics and Personalized Medicine Research
>>>>>> Wake Forest University School of Medicine    fax: (336)713-7566
>>>>>> Medical Center Blvd.                        work: (336)713-7137
>>>>>> Winston-Salem, NC 27157
>>>>>>
>>>>>> -- A computer lets you make more mistakes faster than any invention
>>>>>> in human history --
>>>>>> with the possible exceptions of handguns and tequila.
>>>>>> --Mitch Ratliffe, "Technology Review"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> genabel-devel mailing list
>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>
>>>>>
>>>>> --
>>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>>>> L.C. Karssen
>>>>> Utrecht
>>>>> The Netherlands
>>>>>
>>>>> lennart at karssen.org
>>>>> http://blog.karssen.org
>>>>> GPG key ID: A88F554A
>>>>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>>>
>>>>
>>>
>>> --
>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>> L.C.
Karssen
>>> Utrecht
>>> The Netherlands
>>>
>>> lennart at karssen.org
>>> http://blog.karssen.org
>>> GPG key ID: A88F554A
>>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>
>

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From ldimitro at wakehealth.edu Wed Mar 5 17:37:52 2014
From: ldimitro at wakehealth.edu (Latchezar (Lucho) Dimitrov)
Date: Wed, 5 Mar 2014 16:37:52 +0000
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <5317377A.3040508@karssen.org>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>
 <5314F120.2030805@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2DB@exchdb6.medctr.ad.wfubmc.edu>
 <5317377A.3040508@karssen.org>
Message-ID: <3CB7E549BE1AA14BB108ABCB4637B02519A1F538@exchdb6.medctr.ad.wfubmc.edu>

Hi Lennart,

First of all thank you for considering my thoughts.

Secondly I'll leave the bug report to you ;-) since I/we are just
considering using some of the functionality *ABEL provides and I
seemingly have to subscribe or something in order to file a bug report.
This (subscription) is something I will do if we get more involved. For
now I am happy to be somewhat helpful and more importantly to have the
software built successfully ;-))

Thanks again,
Lucho

> -----Original Message-----
> From: L.C. Karssen [mailto:lennart at karssen.org]
> Sent: Wednesday, March 05, 2014 9:41 AM
> To: Latchezar (Lucho) Dimitrov; 'genabel-devel at r-forge.wu-wien.ac.at'
> Subject: Re: [GenABEL-dev] make check 2 PASS 4 FAIL
>
> Hi Lucho,
>
> On 04-03-14 01:14, Latchezar (Lucho) Dimitrov wrote:
> > Dear Lennart,
> >
> > I had some time to spend and looked at possible solutions.
>
> Thanks for that!
>
> > Unfortunately the more I look the more I realize:
> >
> > 1. why there is no -I in solaris diff ;-)) It's ugly kludge by itself
> >
> > 2. I am afraid the idea of having run_diff creates more problems than
> > it solves. Sorry.
> >
> > If I am to implement it now I'd just:
> >
> > 1. have only the parts that should be identical in verified_results/
> > 2. use appropriate diff directly in test_*.sh files - it's just
> > replacing a proc. call with the actual one-liner
> >
> > like replacing, for example:
> >
> > run_diff linear_base_add.out.txt \
> >     linear_ngp2_add.out.txt \
> >     "QT check: dose vs. prob (additive model)" -I SNP
> >
> > if tail -n +2 linear_ngp2_add.out.txt |diff linear_base_add.out.txt_1st_line_removed - ; then
> >     echo -e "${name}${blanks:${#name}} OK"
> > else
> >     echo -e "${name}${blanks:${#name}} FAILED"
> >     # exit 1 # replace this as appropriate
> > fi
> >
> > This way it will be way more flexible and more importantly system
> > independent - no autoconf and all that stuff.
>
> Autoconf has some nice features, but definitely isn't for the
> faint-of-heart :-).
>
> >
> > There are many variations of the approach but the idea should be clear.
> > More important question to me to resolve/answer, though, is "Do really
> > otherwise the same columns have to have different names in different
> > files?". I'd rather put the differences in the file names and keep the
> > columns the same.
>
> That's a good suggestion! One of those things you overlook if you're
> used to it for so long.
> Indeed the -I option is only used when
> comparing output from dosage data to output from probability data (and
> then only for the additive model). It makes sense to change the
> headers, make the "dosage headers" equal to the "probability headers".
> Especially since the "probability header" contains an explicit
> reference to the SNP that is used as reference for calculating beta.
>
> I'll start up a conversation on the mailing list specifically for this
> and add a bug report as well (unless you're willing to do so, of
> course!).
>
> > But ... this may be just me ;-))
>
> Always good to have a fresh pair of eyes looking at the code.
>
>
> Best,
>
> Lennart.
>
> > Best regards,
> > Lucho
> >
> >> -----Original Message-----
> >> From: Latchezar (Lucho) Dimitrov
> >> Sent: Monday, March 03, 2014 5:39 PM
> >> To: 'L.C. Karssen'; 'genabel-devel at r-forge.wu-wien.ac.at'
> >> Subject: RE: [GenABEL-dev] make check 2 PASS 4 FAIL
> >>
> >> Dear Lennart,
> >>
> >> As I said (or thought I had) it was ugly (quick & dirty) but worked
> >> as a proof. I used, e.g., 'tail -n +2 $file1 >f1' to make a copy of
> >> each of the two files with first line removed and then 'diff'. The
> >> ugliest part is it is the same for all diff's, i.e., it does not take
> >> -I into account at all just blindly compares the two files w/o the
> >> first line.
> >> But it worked for me as a proof my build is correct.
> >>
> >> Sorry I do not have a nice solution. If I come across something I'll
> >> let you know.
> >>
> >> Thanks,
> >> Lucho

From shenxia911 at gmail.com Wed Mar 5 13:53:39 2014
From: shenxia911 at gmail.com (Xia Shen)
Date: Wed, 5 Mar 2014 13:53:39 +0100
Subject: [GenABEL-dev] Suggested change of default weight in GenABEL/ibs()
In-Reply-To: <4828030313405700840@unknownmsgid>
References: <768CB804-C3EF-4E6A-B524-937A1B54331C@gmail.com>
 <-317756408431031488@unknownmsgid>
 <531471A9.7060109@karssen.org>
 <4828030313405700840@unknownmsgid>
Message-ID:

How shall I provide a patch? Do I simply revise the ibs() code and submit to you the ibs.R file?

Xia

On 03 Mar 2014, at 13:16, Yurii Aulchenko wrote:

> I am voting for option 2; as for the version number - no strong opinion.
>
> Xia, would you be willing to summarize this discussion as a feature
> request, or to provide a patch?
>
> Best wishes,
> Yurii
>
> ----------------------
> Yurii Aulchenko
> (sent from mobile device)
>
>> On Mar 3, 2014, at 4:12 PM, "L.C. Karssen" wrote:
>>
>> Dear all,
>>
>>> On 28-02-14 14:15, Yurii Aulchenko wrote:
>>> In principle I agree this is good idea, rarely "no" (default) option is
>>> used; but I am always worried to change defaults as this may destroy
>>> people pipelines.
>>
>> How about issuing a warning? I see two options:
>> 1) Keep the current default and issue a warning when the user doesn't
>> specify a weight.
>> 2) Change the default and when ibs() is run and the weight argument is
>> not equal to 'freq', issue a warning saying that the default has changed
>> to freq.
>>
>> Although I haven't used ibs() a lot, I think option 2 is best as it sets
>> a sane default.
>>
>> Another thing to think about: when changing defaults, what do we do with
>> the version number. If we only change the minor number (the 8 in 1.8-0),
>> people will think not much has changed and this change in default
>> behaviour may go unnoticed.
>>
>>
>> Lennart.
>>
>>>
>>> We need third opinion :)
>>>
>>> Best,
>>> Y
>>>
>>> ----------------------
>>> Yurii Aulchenko
>>> (sent from mobile device)
>>>
>>> On Feb 28, 2014, at 3:45 PM, Xia Shen wrote:
>>>
>>>> Hi,
>>>>
>>>> I suggest to change the default weight argument in the GenABEL/ibs()
>>>> function to be set to "freq" instead of "no". I have colleagues
>>>> constantly forget to set to "freq" and produce, what I would call,
>>>> "wrong" kinship matrix.
>>>>
>>>> *Xia Shen*
>>>> PhD
>>>>
>>>> Division of Computational Genetics
>>>> Department of Clinical Science
>>>> *Swedish University of Agricultural Sciences*
>>>> Uppsala, Sweden
>>>>
>>>> www.shen.se
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> genabel-devel mailing list
>>>> genabel-devel at lists.r-forge.r-project.org
>>>>
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>
>>
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C. Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel

From lennart at karssen.org Wed Mar 5 22:26:09 2014
From: lennart at karssen.org (L.C.
Karssen)
Date: Wed, 05 Mar 2014 22:26:09 +0100
Subject: [GenABEL-dev] make check 2 PASS 4 FAIL
In-Reply-To: <3CB7E549BE1AA14BB108ABCB4637B02519A1F538@exchdb6.medctr.ad.wfubmc.edu>
References: <3CB7E549BE1AA14BB108ABCB4637B02519A1AE90@exchdb6.medctr.ad.wfubmc.edu>
 <531447AB.6090405@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C25B@exchdb6.medctr.ad.wfubmc.edu>
 <5314F120.2030805@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2BC@exchdb6.medctr.ad.wfubmc.edu>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1C2DB@exchdb6.medctr.ad.wfubmc.edu>
 <5317377A.3040508@karssen.org>
 <3CB7E549BE1AA14BB108ABCB4637B02519A1F538@exchdb6.medctr.ad.wfubmc.edu>
Message-ID: <53179671.9080704@karssen.org>

Hi Lucho,

On 05-03-14 17:37, Latchezar (Lucho) Dimitrov wrote:
> Hi Lennart,
>
> First of all thank you for considering my thoughts.

Most welcome!

>
> Secondly I'll leave the bug report to you ;-) since I/we are just
> considering using some of the functionality *ABEL provides and I
> seemingly have to subscribe or something in order to file a bug report.
> This (subscription) is something I will do if we get more involved. For
> now I am happy to be somewhat helpful and more importantly to have the
> software built successfully ;-))

Fair enough :-). Hope you'll enjoy working with the ABELs. If not, you
know how to find us.

Best,

Lennart.

P.S. The bug has been filed: Bug #5409,
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5409&group_id=505&atid=2058

>
> Thanks again,
> Lucho

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Wed Mar 5 22:31:06 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 05 Mar 2014 22:31:06 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1635 - in pkg/OmicABELnoMM: . tests
In-Reply-To: <20140305195410.110E8186CE5@r-forge.r-project.org>
References: <20140305195410.110E8186CE5@r-forge.r-project.org>
Message-ID: <5317979A.2010401@karssen.org>

Hi Alvaro,

On 05-03-14 20:54, noreply at r-forge.r-project.org wrote:
> Author: afrank
> Date: 2014-03-05 20:54:09 +0100 (Wed, 05 Mar 2014)
> New Revision: 1635
>
> Added:
>    pkg/OmicABELnoMM/tests/
>    pkg/OmicABELnoMM/tests/Makefile
>    pkg/OmicABELnoMM/tests/test.cpp
> Log:
>    Added base tests for validity under tests folder. This should be passed by any build of the solver.

Good news to see you've developed tests and put them in a separate
directory. I'll put it on my todo list to add them to automake so they
will be run when the user runs 'make check'. I won't have time this week
or the next, so if you'd like to do this yourself, please go ahead! See
ProbABEL/checks/Makefile.am for example code.

Thanks,

Lennart.
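
As a rough idea of what a script-based check for 'make check' could look
like, here is a minimal sketch of a wrapper around a single test binary such
as the test.out built by the Makefile in this commit. It is an illustration
only: the function name run_test_binary and the log file argument are
hypothetical and not part of OmicABELnoMM or ProbABEL. The exit statuses
follow the convention of Automake's test harness, where 0 means pass, 77
means skipped and other non-zero values mean failure.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of OmicABELnoMM): run one test binary
# and report a one-line verdict, using Automake-style exit statuses.
run_test_binary() {
    test_bin=$1
    log_file=$2

    # Exit status 77 tells Automake's harness the test was skipped.
    if [ ! -x "$test_bin" ]; then
        echo "SKIP: $test_bin not found"
        return 77
    fi

    # Send the (possibly very verbose) output of the binary to a log
    # file and print only the verdict on standard output.
    if "$test_bin" > "$log_file" 2>&1; then
        echo "PASS: $test_bin"
        return 0
    else
        echo "FAIL: $test_bin (see $log_file)"
        return 1
    fi
}
```

A call such as `run_test_binary ./test.out test.log` would then be the last
command of the check script, so that its exit status is what the harness sees.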
> > Added: pkg/OmicABELnoMM/tests/Makefile > =================================================================== > --- pkg/OmicABELnoMM/tests/Makefile (rev 0) > +++ pkg/OmicABELnoMM/tests/Makefile 2014-03-05 19:54:09 UTC (rev 1635) > @@ -0,0 +1,2 @@ > +normal: > + g++ *.cpp ../src/Algorithm.cpp ../src/Utility.cpp ../src/AIOwrapper.cpp -o test.out -L/usr/lib/openblas-base -lpthread -lopenblas -llapacke -fopenmp -O2 -g -Wall -pedantic -Wunused-result -Wmaybe-uninitialized -Wformat -g > \ No newline at end of file > > > Property changes on: pkg/OmicABELnoMM/tests/Makefile > ___________________________________________________________________ > Added: svn:executable > + * > > Added: pkg/OmicABELnoMM/tests/test.cpp > =================================================================== > --- pkg/OmicABELnoMM/tests/test.cpp (rev 0) > +++ pkg/OmicABELnoMM/tests/test.cpp 2014-03-05 19:54:09 UTC (rev 1635) > @@ -0,0 +1,85 @@ > +//export MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 OMP_NESTED=true > +#include > +#include > + > + > + > +#include "../src/Definitions.h" > +#include "../src/Algorithm.h" > + > + > + > +int main(int argc, char *argv[] ) > +{ > + struct Settings params; > + > + params.ForceCheck = true; > + > + > + //!default params > + params.r = 1; > + params.threads = 1; > + > + omp_set_num_threads(params.threads); > + blas_set_num_threads(params.threads); > + > + Algorithm alg; > + > + params.use_fake_files = true; > + int iters = 10; > + int max_threads = 2; > + > + > + for (int th = 0; th < max_threads; th++) > + { > + params.threads = th; > + > + params.n=10; params.l=4; params.r=1; > + params.t=16; params.tb=1; params.m=16; params.mb=1; > + > + struct Outputs out = {0}; > + for (int i = 0; i < iters; i++) > + { > + alg.solve(params, out, P_NEQ_B_OPT_MD); > + } > + cout << endl; > + /******************************/ > + params.n=10; params.l=4; params.r=2; > + params.t=16; params.tb=4; params.m=16; params.mb=4; > + > + > + for (int i = 0; i < iters; i++) > + { > + 
alg.solve(params, out, P_NEQ_B_OPT_MD);
> +        }
> +        cout << endl;
> +        /******************************/
> +        params.n=10; params.l=4; params.r=2;
> +        params.t=16; params.tb=5; params.m=16; params.mb=3;
> +
> +
> +        for (int i = 0; i < iters; i++)
> +        {
> +            alg.solve(params, out, P_NEQ_B_OPT_MD);
> +        }
> +        cout << endl;
> +        /******************************/
> +        params.n=10; params.l=4; params.r=2;
> +        params.t=4; params.tb=4; params.m=4; params.mb=4;
> +
> +
> +        for (int i = 0; i < iters; i++)
> +        {
> +            alg.solve(params, out, P_NEQ_B_OPT_MD);
> +        }
> +        cout << endl;
> +
> +    }
> +
> +    cout << "\nTest finished succesfully\n";
> +
> +
> +
> +
> +    return 0;
> +}
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits

From alvaro.frank at rwth-aachen.de Wed Mar 5 22:44:46 2014
From: alvaro.frank at rwth-aachen.de (Frank, Alvaro Jesus)
Date: Wed, 5 Mar 2014 21:44:46 +0000
Subject: [GenABEL-dev] [Genabel-commits] r1635 - in pkg/OmicABELnoMM: . tests
In-Reply-To: <5317979A.2010401@karssen.org>
References: <20140305195410.110E8186CE5@r-forge.r-project.org>, <5317979A.2010401@karssen.org>
Message-ID: <244CF001646FF74FB34F372310A332C57AA973@MBX2.rwth-ad.de>

Hi Lennart,

I will take a look at it. At the moment those are base tests and more comprehensive ones should follow soon once the base one is integrated into the automake.
I still need to rework a verbosity problem: when performing hundreds of tests, too much output won't give meaningful information. I'll keep you posted.

Alvaro

From lennart at karssen.org Wed Mar 5 23:10:57 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 05 Mar 2014 23:10:57 +0100
Subject: [GenABEL-dev] Inconsistent headers for ProbABEL outputs (dose vs. probs)
Message-ID: <5317A0F1.8010403@karssen.org>

Dear list,

For those who haven't followed the discussion on this list I had with Lucho Dimitrov [1], here's a summary:

While debugging the problem that ProbABEL's make check gives errors on Solaris we found out that this is due to the use of the -I option to the diff command, which is present in GNU's diff, but not in the Solaris version.

The -I option is used to ignore the header line when comparing the output for the additive model calculated with dosage data as input vs. using probability data as input.

Lucho wondered why the column headers are different in the first place. A good point, IMHO.
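Until the headers are harmonised, the header line can also be skipped without GNU diff's -I option, by dropping the first line of each file before comparing them. A minimal sketch, assuming plain-text output files with exactly one header line. The function name diff_skip_header is made up for illustration and is not taken from ProbABEL's check scripts:

```shell
#!/bin/sh
# diff_skip_header FILE1 FILE2
# Compare two result files while ignoring the first (header) line of each.
# Hypothetical helper; uses only sed and plain diff, no GNU extensions.
diff_skip_header() {
    tmp1=${TMPDIR:-/tmp}/diff_skip_header.$$.1
    tmp2=${TMPDIR:-/tmp}/diff_skip_header.$$.2
    # "sed 1d" deletes the first line and prints the rest unchanged.
    sed 1d "$1" > "$tmp1"
    sed 1d "$2" > "$tmp2"
    status=0
    diff "$tmp1" "$tmp2" || status=$?
    rm -f "$tmp1" "$tmp2"
    # Same convention as diff: 0 = identical, 1 = different, >1 = trouble.
    return "$status"
}
```

This treats any difference in the first line as acceptable, which is stricter about position than -I (that option ignores matching lines wherever they occur) but sufficient when only the header differs.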
The header for dosage-based output is:

name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add chi2_SNP

For probability-based output, the header is:

name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1

Why don't we harmonise these two headers? I would suggest going with the second header:
- it clearly indicates which allele is used as reference when calculating beta
- it's consistent with the other probability-based output headers.

Pros:
- more consistent output
- simpler checks
- compatibility with other OSes (e.g. Solaris)

Cons:
- Change of output format may disturb current pipelines (so definitely something for a major increase in version number)

What do you think? Any other ideas, pros, cons?
For now I've filed a bug for this issue [2].

Thanks for thinking along,

Lennart.

[1] http://lists.r-forge.r-project.org/pipermail/genabel-devel/2014-March/000993.html
[2] https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5409&group_id=505&atid=2058

From yurii.aulchenko at gmail.com Wed Mar 12 09:26:27 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 12 Mar 2014 09:26:27 +0100
Subject: [GenABEL-dev] Inconsistent headers for ProbABEL outputs (dose vs.
probs)
In-Reply-To: <5317A0F1.8010403@karssen.org>
References: <5317A0F1.8010403@karssen.org>
Message-ID:

I am for unification plus a clearer format, so format 2: the name beta_SNP_addA1 indicates that A1 is the effect allele, and hence A2 is the reference allele, if I understand correctly (need to double check, as usual).

"chi2_SNP_A1" is weird, though, as chi2 does not relate to the specific allele used (it would be the same if we swap the reference; it is only Z whose sign is sensitive to ref/eff alleles).

Indeed it may disturb pipelines, so we need to make that very clear etc.

best wishes,
Yurii

From yurii.aulchenko at gmail.com Sat Mar 15 14:58:26 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Sat, 15 Mar 2014 14:58:26 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1643 - pkg/DatABEL/man
In-Reply-To: <20140315102108.633B4186D8E@r-forge.r-project.org>
References: <20140315102108.633B4186D8E@r-forge.r-project.org>
Message-ID: <1947174170152473749@unknownmsgid>

Re: "needs to be checked": "R CMD check --as-cran" may be useful; it does check whether the docs are in good shape.

Y

----------------------
Yurii Aulchenko
(sent from mobile device)

> On Mar 15, 2014, at 11:21 AM, "noreply at r-forge.r-project.org" wrote: > > Author: lckarssen > Date: 2014-03-15 11:21:08 +0100 (Sat, 15 Mar 2014) > New Revision: 1643 > > Modified: > pkg/DatABEL/man/DatABEL-package.Rd > pkg/DatABEL/man/apply2dfo.Rd > pkg/DatABEL/man/databel.Rd > pkg/DatABEL/man/databel2matrix.Rd > pkg/DatABEL/man/databel2text.Rd > pkg/DatABEL/man/extract_text_file_columns.Rd > pkg/DatABEL/man/get_temporary_file_name.Rd > pkg/DatABEL/man/make_empty_fvf.Rd > pkg/DatABEL/man/matrix2databel.Rd >
pkg/DatABEL/man/process_lm_output.Rd > pkg/DatABEL/man/text2databel.Rd > Log: > Updated manual files for DatABEL after running Roxygen2. Only layout changes. > I excluded databel-class.Rd, because it was missing a lot of \alias{} lines and had some other changes. Needs to be checked. > > > Modified: pkg/DatABEL/man/DatABEL-package.Rd > =================================================================== > --- pkg/DatABEL/man/DatABEL-package.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/DatABEL-package.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -4,20 +4,20 @@ > \alias{DatABEL-package} > \title{DatABEL package for fast consecutive access to large out-of-RAM stored matrices} > \description{ > - A package interfacing FILEVECTOR C++ library for storage > - of and fast consecutive access to large data matrices in > - out-of-RAM disk mode with regulated cache size. Columns > - of matrix are accessible very quickly. > +A package interfacing FILEVECTOR C++ library for storage of > +and fast consecutive access to large data matrices in > +out-of-RAM disk mode with regulated cache size. Columns of > +matrix are accessible very quickly. 
> } > \author{ > - Yurii Aulchenko (R code), Stepan Yakovenko (R and C++ > - code), Andrey Chernyh (C++ code) > +Yurii Aulchenko (R code), Stepan Yakovenko (R and C++ > +code), Andrey Chernyh (C++ code) > } > \seealso{ > - \code{\link{apply2dfo}}, \code{\link{databel2matrix}}, > - \code{\link{databel2text}}, > - \code{\link{extract_text_file_columns}}, > - \code{\link{matrix2databel}}, \code{\link{text2databel}}, > - \code{\linkS4class{databel}} > +\code{\link{apply2dfo}}, \code{\link{databel2matrix}}, > +\code{\link{databel2text}}, > +\code{\link{extract_text_file_columns}}, > +\code{\link{matrix2databel}}, \code{\link{text2databel}}, > +\code{\linkS4class{databel}} > } > > > Modified: pkg/DatABEL/man/apply2dfo.Rd > =================================================================== > --- pkg/DatABEL/man/apply2dfo.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/apply2dfo.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,22 +2,21 @@ > \alias{apply2dfo} > \title{applies a function to 'databel' object} > \usage{ > - apply2dfo(..., dfodata, anFUN = "lm", MAR = 2, procFUN, > - outclass = "matrix", outfile, type = "DOUBLE", > - transpose = TRUE) > +apply2dfo(..., dfodata, anFUN = "lm", MAR = 2, procFUN, > + outclass = "matrix", outfile, type = "DOUBLE", transpose = TRUE) > } > \arguments{ > \item{dfodata}{'databel' object which is iterated over} > > \item{anFUN}{user-defined analysis function} > > - \item{MAR}{which margin to iterate over (default = 2, > - usually these are the 'columns' used to store SNP data)} > + \item{MAR}{which margin to iteracte over (default = 2, > + usually these are 'columns' used to store SNP data)} > > \item{procFUN}{function to process the output and present > that as a fixed-number-of-columns matrix or fixed-length > - vector. Can be missing if one of the standard functions listed > - below is used. Pre-defined processors included are > + vector. Can be missing if standard functions listed below > + are used. 
Pre-defined processors included are > "process_lm_output" (can process functions "lm", "glm", > "coxph") and "process_simple_output" (process output from > "sum", "prod", "sum_not_NA" [no. non-missing obs], > @@ -26,9 +25,9 @@ > \item{outclass}{output to ("matrix" or "databel")} > > \item{outfile}{if output class is "databel", the > - generated object is bound to the outfile} > + generated object is bond to the outfile} > > - \item{type}{if output class is "databel", what data type > + \item{type}{if output class is "databel", what data tyoe > to use for storage} > > \item{transpose}{whether to transpose the output} > @@ -36,47 +35,43 @@ > \item{...}{arguments passed to the anFUN} > } > \value{ > - A matrix (or 'databel'-matrix) containing results of > - applying the function > +A matrix (or 'databel'-matrix) containing results of > +applying the function > } > \description{ > - An iterator applying a user-defined function to an object > - of 'databel-class'. > +An iterator applying a user-defined function to an object > +of 'databel-class' object > } > \examples{ > -a <- matrix(rnorm(50), 10, 5) > -rownames(a) <- paste("id", 1:10, sep="") > -colnames(a) <- paste("snp", 1:5, sep="") > -b <- as(a, "databel") > -apply(a, FUN="sum", MAR=2) > -apply2dfo(SNP, dfodata=b, anFUN="sum") > -tA <- apply2dfo(SNP, dfodata=b, anFUN="sum", > - outclass="databel", outfile="tmpA") > +a <- matrix(rnorm(50),10,5) > +rownames(a) <- paste("id",1:10,sep="") > +colnames(a) <- paste("snp",1:5,sep="") > +b <- as(a,"databel") > +apply(a,FUN="sum",MAR=2) > +apply2dfo(SNP,dfodata=b,anFUN="sum") > +tA <- apply2dfo(SNP,dfodata=b,anFUN="sum",outclass="databel",outfile="tmpA") > tA > -as(tA, "matrix") > -apply2dfo(SNP, dfodata=b, anFUN="sum", transpose=FALSE) > -tB <- apply2dfo(SNP, dfodata=b, anFUN="sum", transpose=FALSE, > - outclass="databel", outfile="tmpB") > +as(tA,"matrix") > +apply2dfo(SNP,dfodata=b,anFUN="sum",transpose=FALSE) > +tB <- 
apply2dfo(SNP,dfodata=b,anFUN="sum",transpose=FALSE,outclass="databel",outfile="tmpB") > tB > -as(tB, "matrix") > +as(tB,"matrix") > > -sex <- 1*(runif(10) > .5) > -trait <- rnorm(10) + sex + as(b[, 2], "vector") + as(b[, 2], "vector") * sex * 5 > -apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm") > -tC <- apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", > - outclass="databel", outfile="tmpC") > +sex <- 1*(runif(10)>.5) > +trait <- rnorm(10)+sex+as(b[,2],"vector")+as(b[,2],"vector")*sex*5 > +apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm") > +tC <- apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",outclass="databel",outfile="tmpC") > tC > -as(tC, "matrix") > -apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", transpose=FALSE) > -tD <- apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", transpose=FALSE, > - outclass="databel", outfile="tmpD") > +as(tC,"matrix") > +apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",transpose=FALSE) > +tD <- apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",transpose=FALSE,outclass="databel",outfile="tmpD") > tD > -as(tD, "matrix") > -rm(tA, tB, tC, tD) > +as(tD,"matrix") > +rm(tA,tB,tC,tD) > gc() > unlink("tmp*") > } > \author{ > - Yurii Aulchenko > +Yurii Aulchenko > } > > > Modified: pkg/DatABEL/man/databel.Rd > =================================================================== > --- pkg/DatABEL/man/databel.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,7 +2,7 @@ > \alias{databel} > \title{initiates databel object} > \usage{ > - databel(baseobject, cachesizeMb = 64, readonly = TRUE) > +databel(baseobject, cachesizeMb = 64, readonly = TRUE) > } > \arguments{ > \item{baseobject}{name of the file or > @@ -13,10 +13,10 @@ > \item{readonly}{readonly flag} > } > \description{ > - this is a simple wrapper for "new" function creating > - databel object > +this is a simple wrapper for "new" function creating > +databel object > } > \author{ > - Yurii Aulchenko > +Yurii Aulchenko > } > > > Modified: 
pkg/DatABEL/man/databel2matrix.Rd > =================================================================== > --- pkg/DatABEL/man/databel2matrix.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/databel2matrix.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,7 +2,7 @@ > \alias{databel2matrix} > \title{converts 'databel' to matrix} > \usage{ > - databel2matrix(from, rows, cols) > +databel2matrix(from, rows, cols) > } > \arguments{ > \item{from}{'databel' matrix} > @@ -12,15 +12,15 @@ > \item{cols}{which columns to include} > } > \value{ > - object of \code{\linkS4class{matrix}} class > +object of \code{\linkS4class{matrix}} class > } > \description{ > - Converts a \code{\linkS4class{databel}} object to a > - regular R matrix. This is the procedure used by the "as" > - converting to DatABEL objects, in which case a temporary > - file name is created. > +Converts a \code{\linkS4class{databel}} object to a regular > +R matrix. This is the procedure used by the "as" converting > +to DatABEL objects, in which case a temporary file name is > +created. 
> } > \author{ > - Stepan Yakovenko > +Stepan Yakovenko > } > > > Modified: pkg/DatABEL/man/databel2text.Rd > =================================================================== > --- pkg/DatABEL/man/databel2text.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/databel2text.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,8 +2,8 @@ > \alias{databel2text} > \title{Exports DatABEL object to a text file} > \usage{ > - databel2text(databel, file, NAString = "NA", > - row.names = TRUE, col.names = TRUE, transpose = FALSE) > +databel2text(databel, file, NAString = "NA", row.names = TRUE, > + col.names = TRUE, transpose = FALSE) > } > \arguments{ > \item{databel}{DatABEL object} > @@ -19,9 +19,9 @@ > \item{transpose}{whether the matrix should be transposed} > } > \description{ > - Exports DatABEL object to a text file > +Exports DatABEL object to a text file > } > \author{ > - Stepan Yakovenko > +Stepan Yakovenko > } > > > Modified: pkg/DatABEL/man/extract_text_file_columns.Rd > =================================================================== > --- pkg/DatABEL/man/extract_text_file_columns.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/extract_text_file_columns.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,7 +2,7 @@ > \alias{extract_text_file_columns} > \title{extracts columns from text file} > \usage{ > - extract_text_file_columns(file, whichcols) > +extract_text_file_columns(file, whichcols) > } > \arguments{ > \item{file}{file name} > @@ -10,11 +10,11 @@ > \item{whichcols}{which columns to extract} > } > \value{ > - matrix of strings with values from that columns > +matrix of strings with values from that columns > } > \description{ > - Extracts a column from text file to a matrix. If in a > - particular file line the number of columns is less then a > - column specified, returns last column! > +Extracts a column from text file to a matrix. 
If in a > +particular file line the number of columns is less then a > +column specified, returns last column! > } > > > Modified: pkg/DatABEL/man/get_temporary_file_name.Rd > =================================================================== > --- pkg/DatABEL/man/get_temporary_file_name.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/get_temporary_file_name.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,7 +2,7 @@ > \alias{get_temporary_file_name} > \title{generates temporary file name} > \usage{ > - get_temporary_file_name(path = ".", withFVext = TRUE) > +get_temporary_file_name(path = ".", withFVext = TRUE) > } > \arguments{ > \item{path}{path to directory where the temporary file > @@ -12,6 +12,6 @@ > of *FVD and *FVI files too} > } > \description{ > - function to generate temporary file name > +function to generate temporary file name > } > > > Modified: pkg/DatABEL/man/make_empty_fvf.Rd > =================================================================== > --- pkg/DatABEL/man/make_empty_fvf.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/make_empty_fvf.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,8 +2,8 @@ > \alias{make_empty_fvf} > \title{makes empty filevector object} > \usage{ > - make_empty_fvf(name, nvariables, nobservations, > - type = "DOUBLE", cachesizeMb = 64, readonly = FALSE) > +make_empty_fvf(name, nvariables, nobservations, type = "DOUBLE", > + cachesizeMb = 64, readonly = FALSE) > } > \arguments{ > \item{name}{name fo the file to be assoiated with new > @@ -24,10 +24,10 @@ > mode} > } > \value{ > - databel object; also file is created in file system > +databel object; also file is created in file system > } > \description{ > - function to generate empty filevector object (and disk > - files) > +function to generate empty filevector object (and disk > +files) > } > > > Modified: pkg/DatABEL/man/matrix2databel.Rd > =================================================================== > --- pkg/DatABEL/man/matrix2databel.Rd 
2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/matrix2databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,8 +2,8 @@ > \alias{matrix2databel} > \title{converts matrix to 'databel'} > \usage{ > - matrix2databel(from, filename, cachesizeMb = 64, > - type = "DOUBLE", readonly = FALSE) > +matrix2databel(from, filename, cachesizeMb = 64, type = "DOUBLE", > + readonly = FALSE) > } > \arguments{ > \item{from}{R matrix} > @@ -21,15 +21,15 @@ > only mode} > } > \value{ > - object of class \code{\linkS4class{databel}} > +object of class \code{\linkS4class{databel}} > } > \description{ > - Converts regular R matrix to \code{\linkS4class{databel}} > - object. This is the procedure used by "as" converting to > - DatABEL objects, in which case a temporary file name is > - created > +Converts regular R matrix to \code{\linkS4class{databel}} > +object. This is the procedure used by "as" converting to > +DatABEL objects, in which case a temporary file name is > +created > } > \author{ > - Yurii Aulchenko > +Yurii Aulchenko > } > > > Modified: pkg/DatABEL/man/process_lm_output.Rd > =================================================================== > --- pkg/DatABEL/man/process_lm_output.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/process_lm_output.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -5,7 +5,7 @@ > \alias{sum_not_NA} > \title{'apply2dfo'-associated functions} > \usage{ > - process_lm_output(lmo,verbosity=2) > +process_lm_output(lmo,verbosity=2) > } > \arguments{ > \item{lmo}{object returned by analysis with "lm", "glm", > @@ -14,14 +14,14 @@ > \item{verbosity}{verbosity} > } > \description{ > - A number of functions used in conjunction with > - 'apply2dfo'. Standardly supported apply2dfo's anFUN > - analysis functions include 'lm', 'glm', 'coxph', 'sum', > - 'prod', "sum_not_NA" (no. non-missing obs), and "sum_NA" > - (no. missing obs.). 
Pre-defined processing functions > - include "process_lm_output" (can process functions "lm", > - "glm", "coxph") and "process_simple_output" (process > - output from "sum", "prod", "sum_not_NA", "sum_NA") > +A number of functions used in conjunction with 'apply2dfo'. > +Standardly supported apply2dfo's anFUN analysis functions > +include 'lm', 'glm', 'coxph', 'sum', 'prod', "sum_not_NA" > +(no. non-missing obs), and "sum_NA" (no. missing obs.). > +Pre-defined processing functions include > +"process_lm_output" (can process functions "lm", "glm", > +"coxph") and "process_simple_output" (process output from > +"sum", "prod", "sum_not_NA", "sum_NA") > } > \examples{ > a <- matrix(rnorm(50),10,5) > @@ -37,6 +37,6 @@ > apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",procFUN="process_lm_output") > } > \seealso{ > - \link{apply2dfo} > +\link{apply2dfo} > } > > > Modified: pkg/DatABEL/man/text2databel.Rd > =================================================================== > --- pkg/DatABEL/man/text2databel.Rd 2014-03-15 10:12:34 UTC (rev 1642) > +++ pkg/DatABEL/man/text2databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) > @@ -2,16 +2,16 @@ > \alias{text2databel} > \title{converts text file to filevector format} > \usage{ > - text2databel(infile, outfile, colnames, rownames, > - skipcols, skiprows, transpose = FALSE, > - R_matrix = FALSE, type = "DOUBLE", cachesizeMb = 64, > - readonly = TRUE, naString = "NA") > +text2databel(infile, outfile, colnames, rownames, skipcols, skiprows, > + transpose = FALSE, R_matrix = FALSE, type = "DOUBLE", > + cachesizeMb = 64, readonly = TRUE, naString = "NA", > + unlinkTmpTransposeFiles = TRUE) > } > \arguments{ > \item{infile}{input text file name} > > \item{outfile}{output filevector file name; if missing, > - it is set to infile + ".filevector"} > + it is set to infile+".filevector"} > > \item{colnames}{where are the column names stored? 
If > missing, no column names; if integer, this denotes the > @@ -48,33 +48,38 @@ > > \item{naString}{the string used for missing data > (default: NA)} > + > + \item{unlinkTmpTransposeFiles}{Boolean to indicate > + whether the intermediate "_fvtmp.fvi/d" files should be > + deleted. Default: TRUE. These intermediate files are > + generated while transposing the filevector files.} > } > \value{ > - The converted file is stored in the file system, a > - \link{databel-class} object connection to the file is > - returned. > +The converted file is stored in the file system, a > +\link{databel-class} object connection to the file is > +returned. > } > \description{ > - The file provides the data to be converted to filevector > - format. The file may provide the data only (no row and > - column names) in which case col/row names may be left > - empty or provided in separate files (in which case it is > - assumed that names are provided only for the imported > - columns/rows -- see skip-options). There is an option to > - skip a number of first ros and columns. The row and > - column names may also be provided in the file itself, in > - which case one needs to tell the row/column number > - providing column/row names. Unless option "R_matrix" is > - set to TRUE, it is asumed that the number of columns is > - always the same acorss the file. If above option is > - provided, it is assumed that both column and row names > - are provided in the file, and the first line contains one > - column less than other lines (such is the case with files > - produced from R using the function > - \code{write.table(..., col.names=TRUE, row.names=TRUE)}. > +The file provides the data to be converted to filevector > +format. The file may provide the data only (no row and > +column names) in which case col/row names may be left empty > +or provided in separate files (in which case it is assumed > +that names are provided only for the imported columns/rows > +-- see skip-options). 
There is an option to skip a number > +of first ros and columns. The row and column names may also > +be provided in the file itself, in which case one needs to > +tell the row/column number providing column/row names. > +Unless option "R_matrix" is set to TRUE, it is asumed that > +the number of columns is always the same acorss the file. > +If above option is provided, it is assumed that both column > +and row names are provided in the file, and the first line > +contains one column less than other lines (such is the case > +with files produced from R using the function > +\code{write.table(...,col.names=TRUE,row.names=TRUE)}. > } > \examples{ > -cat("this is an example which you can run if you can write to the file system\\n") > +cat("this is an example which you can run if you can write to the > +file system\\n") > > \dontrun{ > > @@ -82,19 +87,19 @@ > NC <- 5 > NR <- 10 > data <- matrix(rnorm(NC*NR),ncol=NC,nrow=NR) > -rownames(data) <- paste("r", 1:NR, sep="") > -colnames(data) <- paste("c", 1:NC, sep="") > +rownames(data) <- paste("r",1:NR,sep="") > +colnames(data) <- paste("c",1:NC,sep="") > data > > # create text files > -write.table(data, file="test_matrix_dimnames.dat", > - row.names=TRUE, col.names=TRUE, quote=FALSE) > -write.table(data, file="test_matrix_colnames.dat", > - row.names=FALSE, col.names=TRUE, quote=FALSE) > -write.table(data, file="test_matrix_rownames.dat", > - row.names=TRUE, col.names=FALSE, quote=FALSE) > -write.table(data, file="test_matrix_NOnames.dat", > - row.names=FALSE, col.names=FALSE, quote=FALSE) > +write.table(data, file="test_matrix_dimnames.dat", row.names=TRUE, > + col.names=TRUE, quote=FALSE) > +write.table(data, file="test_matrix_colnames.dat", row.names=FALSE, > + col.names=TRUE, quote=FALSE) > +write.table(data, file="test_matrix_rownames.dat", row.names=TRUE, > + col.names=FALSE, quote=FALSE) > +write.table(data, file="test_matrix_NOnames.dat", row.names=FALSE, > + col.names=FALSE, quote=FALSE) > write(colnames(data), 
file="test_matrix.colnames") > write(rownames(data), file="test_matrix.rownames") > > @@ -107,25 +112,28 @@ > > # convert text two filevector format > > -text2databel(infile="test_matrix_NOnames.dat", outfile="test_matrix_NOnames.fvf", > +text2databel(infile="test_matrix_NOnames.dat", > + outfile="test_matrix_NOnames.fvf", > colnames="test_matrix.colnames", > rownames="test_matrix.rownames") > x <- databel("test_matrix_NOnames.fvf") > if (!identical(data, as(x, "matrix"))) stop("not identical data") > > -text2databel(infile="test_matrix_NOnames.dat", outfile="test_matrix_NOnames_T.fvf", > +text2databel(infile="test_matrix_NOnames.dat", > + outfile="test_matrix_NOnames_T.fvf", > colnames="test_matrix.colnames", > - rownames="test_matrix.rownames", > - transpose=TRUE) > + rownames="test_matrix.rownames", transpose=TRUE) > x <- databel("test_matrix_NOnames_T.fvf") > if (!identical(data, t(as(x, "matrix")))) stop("not identical data") > > -text2databel(infile="test_matrix_rownames.dat", outfile="test_matrix_rownames.fvf", > +text2databel(infile="test_matrix_rownames.dat", > + outfile="test_matrix_rownames.fvf", > rownames=1, colnames="test_matrix.colnames") > x <- databel("test_matrix_rownames.fvf") > if (!identical(data, as(x, "matrix"))) stop("not identical data") > > -text2databel(infile="test_matrix_colnames.dat", outfile="test_matrix_colnames.fvf", > +text2databel(infile="test_matrix_colnames.dat", > + outfile="test_matrix_colnames.fvf", > colnames=1, rownames="test_matrix.rownames") > x <- databel("test_matrix_colnames.fvf") > if (!identical(data, as(x, "matrix"))) stop("not identical data") > @@ -144,7 +152,8 @@ > write.table(newmat, file="test_matrix_strange.dat", > col.names=FALSE, row.names=FALSE, quote=FALSE) > > -text2databel(infile="test_matrix_strange.dat", outfile="test_matrix_strange.fvf", > +text2databel(infile="test_matrix_strange.dat", > + outfile="test_matrix_strange.fvf", > colnames=2, rownames=3) > x <- databel("test_matrix_strange.fvf") > if 
(!identical(data, as(x, "matrix"))) stop("not identical data")
> @@ -152,5 +161,6 @@
> }
> }
> \author{
> - Yurii Aulchenko
> +Yurii Aulchenko
> }
> +
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits

From lennart at karssen.org Sat Mar 15 17:43:30 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sat, 15 Mar 2014 17:43:30 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1643 - pkg/DatABEL/man
In-Reply-To: <1947174170152473749@unknownmsgid>
References: <20140315102108.633B4186D8E@r-forge.r-project.org> <1947174170152473749@unknownmsgid>
Message-ID: <53248332.9000905@karssen.org>

Will do! I updated the makedistrib_DatABEL.sh script some time ago to add that check as well.

With that databel-class.Rd I noticed that roxygen2 had removed a lot of things from the .Rd file. So it's something that I need to check in more detail.

Also, roxygen2 has reduced the NAMESPACE file to one line:

exportClasses(databel)

Another thing that I need to check before attempting an upload.


Best,

Lennart.

On 15-03-14 14:58, Yurii Aulchenko wrote: > Re: "needs to be checked" > > check --as-cran > > May be useful - they do check if docs are in good shape > > Y > > ---------------------- > Yurii Aulchenko > (sent from mobile device) > >> On Mar 15, 2014, at 11:21 AM, "noreply at r-forge.r-project.org" wrote: >> >> Author: lckarssen >> Date: 2014-03-15 11:21:08 +0100 (Sat, 15 Mar 2014) >> New Revision: 1643 >> >> Modified: >> pkg/DatABEL/man/DatABEL-package.Rd >> pkg/DatABEL/man/apply2dfo.Rd >> pkg/DatABEL/man/databel.Rd >> pkg/DatABEL/man/databel2matrix.Rd >> pkg/DatABEL/man/databel2text.Rd >> pkg/DatABEL/man/extract_text_file_columns.Rd >> pkg/DatABEL/man/get_temporary_file_name.Rd >> pkg/DatABEL/man/make_empty_fvf.Rd >> pkg/DatABEL/man/matrix2databel.Rd >> pkg/DatABEL/man/process_lm_output.Rd >> pkg/DatABEL/man/text2databel.Rd >> Log: >> Updated manual files for DatABEL after running Roxygen2. Only layout changes. >> I excluded databel-class.Rd, because it was missing a lot of \alias{} lines and had some other changes. Needs to be checked. >> >> >> Modified: pkg/DatABEL/man/DatABEL-package.Rd >> =================================================================== >> --- pkg/DatABEL/man/DatABEL-package.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/DatABEL-package.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -4,20 +4,20 @@ >> \alias{DatABEL-package} >> \title{DatABEL package for fast consecutive access to large out-of-RAM stored matrices} >> \description{ >> - A package interfacing FILEVECTOR C++ library for storage >> - of and fast consecutive access to large data matrices in >> - out-of-RAM disk mode with regulated cache size. Columns >> - of matrix are accessible very quickly. >> +A package interfacing FILEVECTOR C++ library for storage of >> +and fast consecutive access to large data matrices in >> +out-of-RAM disk mode with regulated cache size. Columns of >> +matrix are accessible very quickly. 
>> } >> \author{ >> - Yurii Aulchenko (R code), Stepan Yakovenko (R and C++ >> - code), Andrey Chernyh (C++ code) >> +Yurii Aulchenko (R code), Stepan Yakovenko (R and C++ >> +code), Andrey Chernyh (C++ code) >> } >> \seealso{ >> - \code{\link{apply2dfo}}, \code{\link{databel2matrix}}, >> - \code{\link{databel2text}}, >> - \code{\link{extract_text_file_columns}}, >> - \code{\link{matrix2databel}}, \code{\link{text2databel}}, >> - \code{\linkS4class{databel}} >> +\code{\link{apply2dfo}}, \code{\link{databel2matrix}}, >> +\code{\link{databel2text}}, >> +\code{\link{extract_text_file_columns}}, >> +\code{\link{matrix2databel}}, \code{\link{text2databel}}, >> +\code{\linkS4class{databel}} >> } >> >> >> Modified: pkg/DatABEL/man/apply2dfo.Rd >> =================================================================== >> --- pkg/DatABEL/man/apply2dfo.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/apply2dfo.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,22 +2,21 @@ >> \alias{apply2dfo} >> \title{applies a function to 'databel' object} >> \usage{ >> - apply2dfo(..., dfodata, anFUN = "lm", MAR = 2, procFUN, >> - outclass = "matrix", outfile, type = "DOUBLE", >> - transpose = TRUE) >> +apply2dfo(..., dfodata, anFUN = "lm", MAR = 2, procFUN, >> + outclass = "matrix", outfile, type = "DOUBLE", transpose = TRUE) >> } >> \arguments{ >> \item{dfodata}{'databel' object which is iterated over} >> >> \item{anFUN}{user-defined analysis function} >> >> - \item{MAR}{which margin to iterate over (default = 2, >> - usually these are the 'columns' used to store SNP data)} >> + \item{MAR}{which margin to iteracte over (default = 2, >> + usually these are 'columns' used to store SNP data)} >> >> \item{procFUN}{function to process the output and present >> that as a fixed-number-of-columns matrix or fixed-length >> - vector. Can be missing if one of the standard functions listed >> - below is used. Pre-defined processors included are >> + vector. 
Can be missing if standard functions listed below >> + are used. Pre-defined processors included are >> "process_lm_output" (can process functions "lm", "glm", >> "coxph") and "process_simple_output" (process output from >> "sum", "prod", "sum_not_NA" [no. non-missing obs], >> @@ -26,9 +25,9 @@ >> \item{outclass}{output to ("matrix" or "databel")} >> >> \item{outfile}{if output class is "databel", the >> - generated object is bound to the outfile} >> + generated object is bond to the outfile} >> >> - \item{type}{if output class is "databel", what data type >> + \item{type}{if output class is "databel", what data tyoe >> to use for storage} >> >> \item{transpose}{whether to transpose the output} >> @@ -36,47 +35,43 @@ >> \item{...}{arguments passed to the anFUN} >> } >> \value{ >> - A matrix (or 'databel'-matrix) containing results of >> - applying the function >> +A matrix (or 'databel'-matrix) containing results of >> +applying the function >> } >> \description{ >> - An iterator applying a user-defined function to an object >> - of 'databel-class'. 
>> +An iterator applying a user-defined function to an object >> +of 'databel-class' object >> } >> \examples{ >> -a <- matrix(rnorm(50), 10, 5) >> -rownames(a) <- paste("id", 1:10, sep="") >> -colnames(a) <- paste("snp", 1:5, sep="") >> -b <- as(a, "databel") >> -apply(a, FUN="sum", MAR=2) >> -apply2dfo(SNP, dfodata=b, anFUN="sum") >> -tA <- apply2dfo(SNP, dfodata=b, anFUN="sum", >> - outclass="databel", outfile="tmpA") >> +a <- matrix(rnorm(50),10,5) >> +rownames(a) <- paste("id",1:10,sep="") >> +colnames(a) <- paste("snp",1:5,sep="") >> +b <- as(a,"databel") >> +apply(a,FUN="sum",MAR=2) >> +apply2dfo(SNP,dfodata=b,anFUN="sum") >> +tA <- apply2dfo(SNP,dfodata=b,anFUN="sum",outclass="databel",outfile="tmpA") >> tA >> -as(tA, "matrix") >> -apply2dfo(SNP, dfodata=b, anFUN="sum", transpose=FALSE) >> -tB <- apply2dfo(SNP, dfodata=b, anFUN="sum", transpose=FALSE, >> - outclass="databel", outfile="tmpB") >> +as(tA,"matrix") >> +apply2dfo(SNP,dfodata=b,anFUN="sum",transpose=FALSE) >> +tB <- apply2dfo(SNP,dfodata=b,anFUN="sum",transpose=FALSE,outclass="databel",outfile="tmpB") >> tB >> -as(tB, "matrix") >> +as(tB,"matrix") >> >> -sex <- 1*(runif(10) > .5) >> -trait <- rnorm(10) + sex + as(b[, 2], "vector") + as(b[, 2], "vector") * sex * 5 >> -apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm") >> -tC <- apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", >> - outclass="databel", outfile="tmpC") >> +sex <- 1*(runif(10)>.5) >> +trait <- rnorm(10)+sex+as(b[,2],"vector")+as(b[,2],"vector")*sex*5 >> +apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm") >> +tC <- apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",outclass="databel",outfile="tmpC") >> tC >> -as(tC, "matrix") >> -apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", transpose=FALSE) >> -tD <- apply2dfo(trait~SNP*sex, dfodata=b, anFUN="lm", transpose=FALSE, >> - outclass="databel", outfile="tmpD") >> +as(tC,"matrix") >> +apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",transpose=FALSE) >> +tD <- 
apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",transpose=FALSE,outclass="databel",outfile="tmpD") >> tD >> -as(tD, "matrix") >> -rm(tA, tB, tC, tD) >> +as(tD,"matrix") >> +rm(tA,tB,tC,tD) >> gc() >> unlink("tmp*") >> } >> \author{ >> - Yurii Aulchenko >> +Yurii Aulchenko >> } >> >> >> Modified: pkg/DatABEL/man/databel.Rd >> =================================================================== >> --- pkg/DatABEL/man/databel.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,7 +2,7 @@ >> \alias{databel} >> \title{initiates databel object} >> \usage{ >> - databel(baseobject, cachesizeMb = 64, readonly = TRUE) >> +databel(baseobject, cachesizeMb = 64, readonly = TRUE) >> } >> \arguments{ >> \item{baseobject}{name of the file or >> @@ -13,10 +13,10 @@ >> \item{readonly}{readonly flag} >> } >> \description{ >> - this is a simple wrapper for "new" function creating >> - databel object >> +this is a simple wrapper for "new" function creating >> +databel object >> } >> \author{ >> - Yurii Aulchenko >> +Yurii Aulchenko >> } >> >> >> Modified: pkg/DatABEL/man/databel2matrix.Rd >> =================================================================== >> --- pkg/DatABEL/man/databel2matrix.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/databel2matrix.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,7 +2,7 @@ >> \alias{databel2matrix} >> \title{converts 'databel' to matrix} >> \usage{ >> - databel2matrix(from, rows, cols) >> +databel2matrix(from, rows, cols) >> } >> \arguments{ >> \item{from}{'databel' matrix} >> @@ -12,15 +12,15 @@ >> \item{cols}{which columns to include} >> } >> \value{ >> - object of \code{\linkS4class{matrix}} class >> +object of \code{\linkS4class{matrix}} class >> } >> \description{ >> - Converts a \code{\linkS4class{databel}} object to a >> - regular R matrix. This is the procedure used by the "as" >> - converting to DatABEL objects, in which case a temporary >> - file name is created. 
>> +Converts a \code{\linkS4class{databel}} object to a regular >> +R matrix. This is the procedure used by the "as" converting >> +to DatABEL objects, in which case a temporary file name is >> +created. >> } >> \author{ >> - Stepan Yakovenko >> +Stepan Yakovenko >> } >> >> >> Modified: pkg/DatABEL/man/databel2text.Rd >> =================================================================== >> --- pkg/DatABEL/man/databel2text.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/databel2text.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,8 +2,8 @@ >> \alias{databel2text} >> \title{Exports DatABEL object to a text file} >> \usage{ >> - databel2text(databel, file, NAString = "NA", >> - row.names = TRUE, col.names = TRUE, transpose = FALSE) >> +databel2text(databel, file, NAString = "NA", row.names = TRUE, >> + col.names = TRUE, transpose = FALSE) >> } >> \arguments{ >> \item{databel}{DatABEL object} >> @@ -19,9 +19,9 @@ >> \item{transpose}{whether the matrix should be transposed} >> } >> \description{ >> - Exports DatABEL object to a text file >> +Exports DatABEL object to a text file >> } >> \author{ >> - Stepan Yakovenko >> +Stepan Yakovenko >> } >> >> >> Modified: pkg/DatABEL/man/extract_text_file_columns.Rd >> =================================================================== >> --- pkg/DatABEL/man/extract_text_file_columns.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/extract_text_file_columns.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,7 +2,7 @@ >> \alias{extract_text_file_columns} >> \title{extracts columns from text file} >> \usage{ >> - extract_text_file_columns(file, whichcols) >> +extract_text_file_columns(file, whichcols) >> } >> \arguments{ >> \item{file}{file name} >> @@ -10,11 +10,11 @@ >> \item{whichcols}{which columns to extract} >> } >> \value{ >> - matrix of strings with values from that columns >> +matrix of strings with values from that columns >> } >> \description{ >> - Extracts a column from text file to a matrix. 
If in a >> - particular file line the number of columns is less then a >> - column specified, returns last column! >> +Extracts a column from text file to a matrix. If in a >> +particular file line the number of columns is less then a >> +column specified, returns last column! >> } >> >> >> Modified: pkg/DatABEL/man/get_temporary_file_name.Rd >> =================================================================== >> --- pkg/DatABEL/man/get_temporary_file_name.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/get_temporary_file_name.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,7 +2,7 @@ >> \alias{get_temporary_file_name} >> \title{generates temporary file name} >> \usage{ >> - get_temporary_file_name(path = ".", withFVext = TRUE) >> +get_temporary_file_name(path = ".", withFVext = TRUE) >> } >> \arguments{ >> \item{path}{path to directory where the temporary file >> @@ -12,6 +12,6 @@ >> of *FVD and *FVI files too} >> } >> \description{ >> - function to generate temporary file name >> +function to generate temporary file name >> } >> >> >> Modified: pkg/DatABEL/man/make_empty_fvf.Rd >> =================================================================== >> --- pkg/DatABEL/man/make_empty_fvf.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/make_empty_fvf.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,8 +2,8 @@ >> \alias{make_empty_fvf} >> \title{makes empty filevector object} >> \usage{ >> - make_empty_fvf(name, nvariables, nobservations, >> - type = "DOUBLE", cachesizeMb = 64, readonly = FALSE) >> +make_empty_fvf(name, nvariables, nobservations, type = "DOUBLE", >> + cachesizeMb = 64, readonly = FALSE) >> } >> \arguments{ >> \item{name}{name fo the file to be assoiated with new >> @@ -24,10 +24,10 @@ >> mode} >> } >> \value{ >> - databel object; also file is created in file system >> +databel object; also file is created in file system >> } >> \description{ >> - function to generate empty filevector object (and disk >> - files) >> +function to 
generate empty filevector object (and disk >> +files) >> } >> >> >> Modified: pkg/DatABEL/man/matrix2databel.Rd >> =================================================================== >> --- pkg/DatABEL/man/matrix2databel.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/matrix2databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,8 +2,8 @@ >> \alias{matrix2databel} >> \title{converts matrix to 'databel'} >> \usage{ >> - matrix2databel(from, filename, cachesizeMb = 64, >> - type = "DOUBLE", readonly = FALSE) >> +matrix2databel(from, filename, cachesizeMb = 64, type = "DOUBLE", >> + readonly = FALSE) >> } >> \arguments{ >> \item{from}{R matrix} >> @@ -21,15 +21,15 @@ >> only mode} >> } >> \value{ >> - object of class \code{\linkS4class{databel}} >> +object of class \code{\linkS4class{databel}} >> } >> \description{ >> - Converts regular R matrix to \code{\linkS4class{databel}} >> - object. This is the procedure used by "as" converting to >> - DatABEL objects, in which case a temporary file name is >> - created >> +Converts regular R matrix to \code{\linkS4class{databel}} >> +object. This is the procedure used by "as" converting to >> +DatABEL objects, in which case a temporary file name is >> +created >> } >> \author{ >> - Yurii Aulchenko >> +Yurii Aulchenko >> } >> >> >> Modified: pkg/DatABEL/man/process_lm_output.Rd >> =================================================================== >> --- pkg/DatABEL/man/process_lm_output.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/process_lm_output.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -5,7 +5,7 @@ >> \alias{sum_not_NA} >> \title{'apply2dfo'-associated functions} >> \usage{ >> - process_lm_output(lmo,verbosity=2) >> +process_lm_output(lmo,verbosity=2) >> } >> \arguments{ >> \item{lmo}{object returned by analysis with "lm", "glm", >> @@ -14,14 +14,14 @@ >> \item{verbosity}{verbosity} >> } >> \description{ >> - A number of functions used in conjunction with >> - 'apply2dfo'. 
Standardly supported apply2dfo's anFUN >> - analysis functions include 'lm', 'glm', 'coxph', 'sum', >> - 'prod', "sum_not_NA" (no. non-missing obs), and "sum_NA" >> - (no. missing obs.). Pre-defined processing functions >> - include "process_lm_output" (can process functions "lm", >> - "glm", "coxph") and "process_simple_output" (process >> - output from "sum", "prod", "sum_not_NA", "sum_NA") >> +A number of functions used in conjunction with 'apply2dfo'. >> +Standardly supported apply2dfo's anFUN analysis functions >> +include 'lm', 'glm', 'coxph', 'sum', 'prod', "sum_not_NA" >> +(no. non-missing obs), and "sum_NA" (no. missing obs.). >> +Pre-defined processing functions include >> +"process_lm_output" (can process functions "lm", "glm", >> +"coxph") and "process_simple_output" (process output from >> +"sum", "prod", "sum_not_NA", "sum_NA") >> } >> \examples{ >> a <- matrix(rnorm(50),10,5) >> @@ -37,6 +37,6 @@ >> apply2dfo(trait~SNP*sex,dfodata=b,anFUN="lm",procFUN="process_lm_output") >> } >> \seealso{ >> - \link{apply2dfo} >> +\link{apply2dfo} >> } >> >> >> Modified: pkg/DatABEL/man/text2databel.Rd >> =================================================================== >> --- pkg/DatABEL/man/text2databel.Rd 2014-03-15 10:12:34 UTC (rev 1642) >> +++ pkg/DatABEL/man/text2databel.Rd 2014-03-15 10:21:08 UTC (rev 1643) >> @@ -2,16 +2,16 @@ >> \alias{text2databel} >> \title{converts text file to filevector format} >> \usage{ >> - text2databel(infile, outfile, colnames, rownames, >> - skipcols, skiprows, transpose = FALSE, >> - R_matrix = FALSE, type = "DOUBLE", cachesizeMb = 64, >> - readonly = TRUE, naString = "NA") >> +text2databel(infile, outfile, colnames, rownames, skipcols, skiprows, >> + transpose = FALSE, R_matrix = FALSE, type = "DOUBLE", >> + cachesizeMb = 64, readonly = TRUE, naString = "NA", >> + unlinkTmpTransposeFiles = TRUE) >> } >> \arguments{ >> \item{infile}{input text file name} >> >> \item{outfile}{output filevector file name; if missing, >> - it is 
set to infile + ".filevector"} >> + it is set to infile+".filevector"} >> >> \item{colnames}{where are the column names stored? If >> missing, no column names; if integer, this denotes the >> @@ -48,33 +48,38 @@ >> >> \item{naString}{the string used for missing data >> (default: NA)} >> + >> + \item{unlinkTmpTransposeFiles}{Boolean to indicate >> + whether the intermediate "_fvtmp.fvi/d" files should be >> + deleted. Default: TRUE. These intermediate files are >> + generated while transposing the filevector files.} >> } >> \value{ >> - The converted file is stored in the file system, a >> - \link{databel-class} object connection to the file is >> - returned. >> +The converted file is stored in the file system, a >> +\link{databel-class} object connection to the file is >> +returned. >> } >> \description{ >> - The file provides the data to be converted to filevector >> - format. The file may provide the data only (no row and >> - column names) in which case col/row names may be left >> - empty or provided in separate files (in which case it is >> - assumed that names are provided only for the imported >> - columns/rows -- see skip-options). There is an option to >> - skip a number of first ros and columns. The row and >> - column names may also be provided in the file itself, in >> - which case one needs to tell the row/column number >> - providing column/row names. Unless option "R_matrix" is >> - set to TRUE, it is asumed that the number of columns is >> - always the same acorss the file. If above option is >> - provided, it is assumed that both column and row names >> - are provided in the file, and the first line contains one >> - column less than other lines (such is the case with files >> - produced from R using the function >> - \code{write.table(..., col.names=TRUE, row.names=TRUE)}. >> +The file provides the data to be converted to filevector >> +format. 
The file may provide the data only (no row and >> +column names) in which case col/row names may be left empty >> +or provided in separate files (in which case it is assumed >> +that names are provided only for the imported columns/rows >> +-- see skip-options). There is an option to skip a number >> +of first ros and columns. The row and column names may also >> +be provided in the file itself, in which case one needs to >> +tell the row/column number providing column/row names. >> +Unless option "R_matrix" is set to TRUE, it is asumed that >> +the number of columns is always the same acorss the file. >> +If above option is provided, it is assumed that both column >> +and row names are provided in the file, and the first line >> +contains one column less than other lines (such is the case >> +with files produced from R using the function >> +\code{write.table(...,col.names=TRUE,row.names=TRUE)}. >> } >> \examples{ >> -cat("this is an example which you can run if you can write to the file system\\n") >> +cat("this is an example which you can run if you can write to the >> +file system\\n") >> >> \dontrun{ >> >> @@ -82,19 +87,19 @@ >> NC <- 5 >> NR <- 10 >> data <- matrix(rnorm(NC*NR),ncol=NC,nrow=NR) >> -rownames(data) <- paste("r", 1:NR, sep="") >> -colnames(data) <- paste("c", 1:NC, sep="") >> +rownames(data) <- paste("r",1:NR,sep="") >> +colnames(data) <- paste("c",1:NC,sep="") >> data >> >> # create text files >> -write.table(data, file="test_matrix_dimnames.dat", >> - row.names=TRUE, col.names=TRUE, quote=FALSE) >> -write.table(data, file="test_matrix_colnames.dat", >> - row.names=FALSE, col.names=TRUE, quote=FALSE) >> -write.table(data, file="test_matrix_rownames.dat", >> - row.names=TRUE, col.names=FALSE, quote=FALSE) >> -write.table(data, file="test_matrix_NOnames.dat", >> - row.names=FALSE, col.names=FALSE, quote=FALSE) >> +write.table(data, file="test_matrix_dimnames.dat", row.names=TRUE, >> + col.names=TRUE, quote=FALSE) >> +write.table(data, 
file="test_matrix_colnames.dat", row.names=FALSE, >> + col.names=TRUE, quote=FALSE) >> +write.table(data, file="test_matrix_rownames.dat", row.names=TRUE, >> + col.names=FALSE, quote=FALSE) >> +write.table(data, file="test_matrix_NOnames.dat", row.names=FALSE, >> + col.names=FALSE, quote=FALSE) >> write(colnames(data), file="test_matrix.colnames") >> write(rownames(data), file="test_matrix.rownames") >> >> @@ -107,25 +112,28 @@ >> >> # convert text two filevector format >> >> -text2databel(infile="test_matrix_NOnames.dat", outfile="test_matrix_NOnames.fvf", >> +text2databel(infile="test_matrix_NOnames.dat", >> + outfile="test_matrix_NOnames.fvf", >> colnames="test_matrix.colnames", >> rownames="test_matrix.rownames") >> x <- databel("test_matrix_NOnames.fvf") >> if (!identical(data, as(x, "matrix"))) stop("not identical data") >> >> -text2databel(infile="test_matrix_NOnames.dat", outfile="test_matrix_NOnames_T.fvf", >> +text2databel(infile="test_matrix_NOnames.dat", >> + outfile="test_matrix_NOnames_T.fvf", >> colnames="test_matrix.colnames", >> - rownames="test_matrix.rownames", >> - transpose=TRUE) >> + rownames="test_matrix.rownames", transpose=TRUE) >> x <- databel("test_matrix_NOnames_T.fvf") >> if (!identical(data, t(as(x, "matrix")))) stop("not identical data") >> >> -text2databel(infile="test_matrix_rownames.dat", outfile="test_matrix_rownames.fvf", >> +text2databel(infile="test_matrix_rownames.dat", >> + outfile="test_matrix_rownames.fvf", >> rownames=1, colnames="test_matrix.colnames") >> x <- databel("test_matrix_rownames.fvf") >> if (!identical(data, as(x, "matrix"))) stop("not identical data") >> >> -text2databel(infile="test_matrix_colnames.dat", outfile="test_matrix_colnames.fvf", >> +text2databel(infile="test_matrix_colnames.dat", >> + outfile="test_matrix_colnames.fvf", >> colnames=1, rownames="test_matrix.rownames") >> x <- databel("test_matrix_colnames.fvf") >> if (!identical(data, as(x, "matrix"))) stop("not identical data") >> @@ -144,7 +152,8 
@@
>> write.table(newmat, file="test_matrix_strange.dat",
>> col.names=FALSE, row.names=FALSE, quote=FALSE)
>>
>> -text2databel(infile="test_matrix_strange.dat", outfile="test_matrix_strange.fvf",
>> +text2databel(infile="test_matrix_strange.dat",
>> + outfile="test_matrix_strange.fvf",
>> colnames=2, rownames=3)
>> x <- databel("test_matrix_strange.fvf")
>> if (!identical(data, as(x, "matrix"))) stop("not identical data")
>> @@ -152,5 +161,6 @@
>> }
>> }
>> \author{
>> - Yurii Aulchenko
>> +Yurii Aulchenko
>> }
>> +
>>
>> _______________________________________________
>> Genabel-commits mailing list
>> Genabel-commits at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org Sat Mar 15 17:45:34 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Sat, 15 Mar 2014 17:45:34 +0100
Subject: [GenABEL-dev] Inconsistent headers for ProbABEL outputs (dose vs. probs)
In-Reply-To:
References: <5317A0F1.8010403@karssen.org>
Message-ID: <532483AE.1030107@karssen.org>

Dear Yurii, dear all,

On 12-03-14 09:26, Yury Aulchenko wrote:
> I am for unification + more clear format (name beta_SNP_addA1
> indicates that A1 is effect, and hence A2 is the reference if I
> understand correctly; need to double check as usual) allele (so, format 2)

Glad to hear that you agree.

>
> "chi2_SNP_A1" is weird though as chi2 does not relate to specific allele used (would be the same if we swap the reference; it is only Z whose sign is sensitive to ref/eff alleles)

Good point! I'll need to look into that.

>
> indeed it may disturb pipelines, so we need to make that very clear etc.

Something for v0.5, not for the upcoming 0.4.3 release.

Thanks for thinking along.

Lennart.

>
> best wishes,
> Yurii
>
> On Mar 5, 2014, at 23:10, L.C. Karssen wrote:
>
>> Dear list,
>>
>> For those who haven't followed the discussion on this list I had with
>> Lucho Dimitrov [1], here's a summary:
>>
>> While debugging the problem that ProbABEL's make check gives errors on
>> Solaris we found out that this is due to the use of the -I option to the
>> diff command, which is present in GNU's diff, but not in the Solaris
>> version.
>>
>> The -I option is used to ignore the header line when comparing the
>> output for the additive model calculated with dosage data as input vs.
>> using probability data as input.
>>
>> Lucho wondered why the column headers are different in the first place.
>> A good point, IMHO. The header for dosage-based output is:
>>
>> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position
>> beta_SNP_add sebeta_SNP_add chi2_SNP
>>
>> For probability-based output, the header is:
>>
>> name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position
>> beta_SNP_addA1 sebeta_SNP_addA1 chi2_SNP_A1
>>
>> Why don't we harmonise these two headers? I would suggest going with the
>> second header:
>> - it clearly indicates which allele is used as reference when
>> calculating beta
>> - it's consistent with the other probability-based output headers.
>>
>> Pros:
>> - more consistent output
>> - simpler checks
>> - compatibility with other OSes (e.g. Solaris)
>>
>> Cons:
>> - Change of output format may disturb current pipelines (so definitely
>> something for a major increase in version number)
>>
>>
>> What do you think?
Any other ideas, pros, cons?
>> For now I've filed a bug for this issue [2]
>>
>> Thanks for thinking along,
>>
>> Lennart.
>>
>>
>> [1]
>> http://lists.r-forge.r-project.org/pipermail/genabel-devel/2014-March/000993.html
>> [2]
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5409&group_id=505&atid=2058
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C. Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From yurii.aulchenko at gmail.com Wed Mar 19 23:57:35 2014
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Wed, 19 Mar 2014 23:57:35 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1600 - pkg/OmicABEL/src
In-Reply-To: <20140207173225.6826A186CEB@r-forge.r-project.org>
References: <20140207173225.6826A186CEB@r-forge.r-project.org>
Message-ID:

Diego, what is the "p" (n^2 -> np) in the Log message?

On Fri, Feb 7, 2014 at 6:32 PM, wrote:

> Author: dfabregat
> Date: 2014-02-07 18:32:24 +0100 (Fri, 07 Feb 2014)
> New Revision: 1600
>
> Modified:
> pkg/OmicABEL/src/REML.c
> Log:
> Performance improvement for REML estimation.
> Reusing precomputed data to replace an expensive
> n^2 gemv for a cheaper n*p gemv.
>
>
> Modified: pkg/OmicABEL/src/REML.c
> ===================================================================
> --- pkg/OmicABEL/src/REML.c 2014-02-06 21:29:08 UTC (rev 1599)
> +++ pkg/OmicABEL/src/REML.c 2014-02-07 17:32:24 UTC (rev 1600)
> @@ -194,10 +194,15 @@
> // loglik = a + b
> // a -> log(det(M))
> // b -> YmXB' inv(M) YmXB
> - dgemv_(TRANS,
> + /*dgemv_(TRANS,
> &n, &n,
> &ONE, Z, &n, YmXB, &iONE,
> - &ZERO, ZtY_upd, &iONE);
> + &ZERO, ZtY_upd, &iONE);*/
> + memcpy( ZtY_upd, ZtY, n * sizeof(double) );
> + dgemv_( NO_TRANS,
> + &n, &wXL,
> + &MINUS_ONE, ZtX, &n, beta, &iONE,
> + &ONE, ZtY_upd, &iONE );
> // YmXB' * inv( M ) * YmXB
> *loglik = 0.0;
> for (i = 0; i < n; i++ )
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>

-- 
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn ] [ Twitter ] [ Blog ]

From fabregat at aices.rwth-aachen.de Thu Mar 20 00:02:32 2014
From: fabregat at aices.rwth-aachen.de (Diego Fabregat)
Date: Thu, 20 Mar 2014 00:02:32 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1600 - pkg/OmicABEL/src
In-Reply-To:
References: <20140207173225.6826A186CEB@r-forge.r-project.org>
Message-ID: <532A2208.8050001@aices.rwth-aachen.de>

From the log, it looks like by p I mean the width of XL (i.e., intercept + covariates).

On 03/19/2014 11:57 PM, Yurii Aulchenko wrote:
> Diego, what is the "p" (n^2 -> np) in the Log message?
>
>
> On Fri, Feb 7, 2014 at 6:32 PM,
> wrote:
>
> Author: dfabregat
> Date: 2014-02-07 18:32:24 +0100 (Fri, 07 Feb 2014)
> New Revision: 1600
>
> Modified:
> pkg/OmicABEL/src/REML.c
> Log:
> Performance improvement for REML estimation.
> Reusing precomputed data to replace an expensive
> n^2 gemv for a cheaper n*p gemv.
> > > Modified: pkg/OmicABEL/src/REML.c > =================================================================== > --- pkg/OmicABEL/src/REML.c 2014-02-06 21:29:08 UTC (rev 1599) > +++ pkg/OmicABEL/src/REML.c 2014-02-07 17:32:24 UTC (rev 1600) > @@ -194,10 +194,15 @@ > // loglik = a + b > // a -> log(det(M)) > // b -> YmXB' inv(M) YmXB > - dgemv_(TRANS, > + /*dgemv_(TRANS, > &n, &n, > &ONE, Z, &n, YmXB, &iONE, > - &ZERO, ZtY_upd, &iONE); > + &ZERO, ZtY_upd, &iONE);*/ > + memcpy( ZtY_upd, ZtY, n * sizeof(double) ); > + dgemv_( NO_TRANS, > + &n, &wXL, > + &MINUS_ONE, ZtX, &n, beta, &iONE, > + &ONE, ZtY_upd, &iONE ); > // YmXB' * inv( M ) * YmXB > *loglik = 0.0; > for (i = 0; i < n; i++ ) > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > > > > > -- > ----------------------------------------------------- > Yurii S. Aulchenko > > [ LinkedIn ] [ Twitter > ] [ Blog > ] > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From yurii.aulchenko at gmail.com Thu Mar 20 00:13:45 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Thu, 20 Mar 2014 00:13:45 +0100 Subject: [GenABEL-dev] [Genabel-commits] r1600 - pkg/OmicABEL/src In-Reply-To: <532A2208.8050001@aices.rwth-aachen.de> References: <20140207173225.6826A186CEB@r-forge.r-project.org> <532A2208.8050001@aices.rwth-aachen.de> Message-ID: wow! from n^2 to n*p is not a trivial speedup then - assuming n is the sample size :) Y On Mar 20, 2014, at 00:02, Diego Fabregat wrote: > From the log, it looks like by p I mean the width of XL (i.e., intercept + covariates). 
> > On 03/19/2014 11:57 PM, Yurii Aulchenko wrote: >> Deiego, what is the "p" (n^2 -> np) in the Log message? >> >> >> On Fri, Feb 7, 2014 at 6:32 PM, wrote: >> Author: dfabregat >> Date: 2014-02-07 18:32:24 +0100 (Fri, 07 Feb 2014) >> New Revision: 1600 >> >> Modified: >> pkg/OmicABEL/src/REML.c >> Log: >> Performance improvement for REML estimation. >> Reusing precomputed data to replace an expensive >> n^2 gemv for a cheaper n*p gemv. >> >> >> Modified: pkg/OmicABEL/src/REML.c >> =================================================================== >> --- pkg/OmicABEL/src/REML.c 2014-02-06 21:29:08 UTC (rev 1599) >> +++ pkg/OmicABEL/src/REML.c 2014-02-07 17:32:24 UTC (rev 1600) >> @@ -194,10 +194,15 @@ >> // loglik = a + b >> // a -> log(det(M)) >> // b -> YmXB' inv(M) YmXB >> - dgemv_(TRANS, >> + /*dgemv_(TRANS, >> &n, &n, >> &ONE, Z, &n, YmXB, &iONE, >> - &ZERO, ZtY_upd, &iONE); >> + &ZERO, ZtY_upd, &iONE);*/ >> + memcpy( ZtY_upd, ZtY, n * sizeof(double) ); >> + dgemv_( NO_TRANS, >> + &n, &wXL, >> + &MINUS_ONE, ZtX, &n, beta, &iONE, >> + &ONE, ZtY_upd, &iONE ); >> // YmXB' * inv( M ) * YmXB >> *loglik = 0.0; >> for (i = 0; i < n; i++ ) >> >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> >> >> >> -- >> ----------------------------------------------------- >> Yurii S. Aulchenko >> >> [ LinkedIn ] [ Twitter ] [ Blog ] >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Fri Mar 21 11:54:07 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Fri, 21 Mar 2014 11:54:07 +0100 Subject: [GenABEL-dev] Design: Versions for filevector library, change symlinks In-Reply-To: <52F77EF9.5010504@karssen.org> References: <52F77EF9.5010504@karssen.org> Message-ID: <532C1A4F.4020209@karssen.org> Dear list, As I didn't get any negative responses on my proposal to start versioning the filevector code and tagging it in SVN I will proceed to do so. I'd like to give the current filevector code version number 1.0.0 because it's been in production for several years and I assume most bugs have been ironed out by now. Any objections? Let me know! Thanks, Lennart. On 09-02-14 14:13, L.C. Karssen wrote: > Dear list, > > As you may know I have been playing with the idea of creating filevector > (more specifically the fvlib directory) a separate library (see the > discussions at [1-3]). I also created a separate SVN branch to play > around with that idea. Basically this would mean that packages like > ProbABEL, DatABEL and VariABEL would get a dependency in the form of > what I call 'libfilevector'. Especially for the R packages in the > GenABEL suite this may not be very user-friendly because many of them > would not have admin rights (I assume) and therefore cannot simply > install operating system packages that contain libfilevector. Obviously > such a change to libfilevector would be a major one. > > Apart from the discussions on the mailing list this topic came up again > a few weeks ago in a discussion with Maksim and Yurii. We discussed many > of the pros and cons (mostly revolving around user-friendliness vs. > technical advantages) and in the end we came up with the following: > > Let's start to put versions on the filevector code. So far filevector > has never been released separately (always as part of the ABEL > packages). 
By putting a version number on the fv code and by making tags > of those releases in SVN we can solve several issues: > - It can be made clear which *ABEL package depends on which version of > the filevector code > - Improvements in the filevector code do not immediately affect the > 'trunk' of the other packages > - This allows packages like ProbABEL to depend on a separate > libfilevector, whereas R packages can create a symlink to the tagged > code instead of filevector trunk (as is the present situation) > > A final advantage may be that it becomes easier to treat filevector as > one of various file-format backends to the *ABELs. But this is something > I'll elaborate on in another e-mail to this list. > > > We would like to hear your opinions about this versioning of filevector > and tagging of these official filevector releases, as well as on the use > of symlinks to these tags by other packages (this may not be ideal, but > it solves the user-friendliness issue, while allowing the creation and > release of libfilevector). > > > Hope to hear from you! > > Lennart. > > > > [1]http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-July/000714.html > [2] > http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-November/000779.html > [2] > http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-November/000797.html > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Fri Mar 21 13:07:55 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Fri, 21 Mar 2014 13:07:55 +0100 Subject: [GenABEL-dev] Design: Versions for filevector library, change symlinks In-Reply-To: <532C1A4F.4020209@karssen.org> References: <52F77EF9.5010504@karssen.org> <532C1A4F.4020209@karssen.org> Message-ID: no objections! On Fri, Mar 21, 2014 at 11:54 AM, L.C. Karssen wrote: > Dear list, > > As I didn't get any negative responses on my proposal to start > versioning the filevector code and tagging it in SVN I will proceed to > do so. > > I'd like to give the current filevector code version number 1.0.0 > because it's been in production for several years and I assume most bugs > have been ironed out by now. > > Any objections? Let me know! > > > Thanks, > > Lennart. > > On 09-02-14 14:13, L.C. Karssen wrote: > > Dear list, > > > > As you may know I have been playing with the idea of creating filevector > > (more specifically the fvlib directory) a separate library (see the > > discussions at [1-3]). I also created a separate SVN branch to play > > around with that idea. Basically this would mean that packages like > > ProbABEL, DatABEL and VariABEL would get a dependency in the form of > > what I call 'libfilevector'. Especially for the R packages in the > > GenABEL suite this may not be very user-friendly because many of them > > would not have admin rights (I assume) and therefore cannot simply > > install operating system packages that contain libfilevector. Obviously > > such a change to libfilevector would be a major one. > > > > Apart from the discussions on the mailing list this topic came up again > > a few weeks ago in a discussion with Maksim and Yurii. We discussed many > > of the pros and cons (mostly revolving around user-friendliness vs. 
> > technical advantages) and in the end we came up with the following: > > > > Let's start to put versions on the filevector code. So far filevector > > has never been released separately (always as part of the ABEL > > packages). By putting a version number on the fv code and by making tags > > of those releases in SVN we can solve several issues: > > - It can be made clear which *ABEL package depends on which version of > > the filevector code > > - Improvements in the filevector code do not immediately affect the > > 'trunk' of the other packages > > - This allows packages like ProbABEL to depend on a separate > > libfilevector, whereas R packages can create a symlink to the tagged > > code instead of filevector trunk (as is the present situation) > > > > A final advantage may be that it becomes easier to treat filevector as > > one of various file-format backends to the *ABELs. But this is something > > I'll elaborate on in another e-mail to this list. > > > > > > We would like to hear your opinions about this versioning of filevector > > and tagging of these official filevector releases, as well as on the use > > of symlinks to these tags by other packages (this may not be ideal, but > > it solves the user-friendliness issue, while allowing the creation and > > release of libfilevector). > > > > > > Hope to hear from you! > > > > Lennart. > > > > > > > > [1] > http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-July/000714.html > > [2] > > > http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-November/000779.html > > [2] > > > http://lists.r-forge.r-project.org/pipermail/genabel-devel/2013-November/000797.html > > > > > > > > _______________________________________________ > > genabel-devel mailing list > > genabel-devel at lists.r-forge.r-project.org > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From yurii.aulchenko at gmail.com Mon Mar 24 09:46:47 2014 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Mon, 24 Mar 2014 09:46:47 +0100 Subject: [GenABEL-dev] [Genabel-commits] r1658 - pkg/ProbABEL/src In-Reply-To: <20140321225720.7DB691862D4@r-forge.r-project.org> References: <20140321225720.7DB691862D4@r-forge.r-project.org> Message-ID: <-7899124546465247655@unknownmsgid> Sum of logs = log of prods The latter is in principle dangerous practice when we multiply probabilities - the product is getting close to zero very fast potentially leading to numerical problems (of which machine zero is the smallest because easily detected) Yurii ---------------------- Yurii Aulchenko (sent from mobile device) > On Mar 21, 2014, at 11:57 PM, "noreply at r-forge.r-project.org" wrote: > > Author: lckarssen > Date: 2014-03-21 23:57:20 +0100 (Fri, 21 Mar 2014) > New Revision: 1658 > > Modified: > pkg/ProbABEL/src/reg1.cpp > Log: > Speed-ups in ProbABEL's logistic regression. > Profiling showed lots of (expensive) calls to exp() and log(). I got rid of ~ 1/3 of the exp() calls by saving the result in a variable and reusing it in the calculation of exp(mu) / ( 1+exp(mu) ). 
> The number of log() calls was reduced even more by using the fact that sum_i( log(x_i) ) = log( prod_i(x_i) ) > > In total this gives roughly a speed up of 30% -- 40% when reading txt dosage files and roughly 20% -- 30% when using filevector files (measured for dosage data). > > > Modified: pkg/ProbABEL/src/reg1.cpp > =================================================================== > --- pkg/ProbABEL/src/reg1.cpp 2014-03-21 21:05:44 UTC (rev 1657) > +++ pkg/ProbABEL/src/reg1.cpp 2014-03-21 22:57:20 UTC (rev 1658) > @@ -719,7 +719,8 @@ > double emu = eMu.get(i, 0); > double value = emu; > double zval; > - value = exp(value) / (1. + exp(value)); > + double expval = exp(value); > + value = expval / (1. + expval); > residuals[i] = (rdata.Y).get(i, 0) - value; > eMu.put(value, i, 0); > W.put(value * (1. - value), i, 0); > @@ -778,11 +779,24 @@ > beta.print(); > } > // std::cout << "beta:\n"; beta.print(); > - // compute likelihood > + > + // Compute the likelihood. The following commented code gives > + // the 'easy to understand' algorithm. The code that's > + // actually used is mathematically equivalent (remember: > + // log(a*b) = log(a)+log(b)), but faster because log() is > + // relatively expensive. > + // for (int i = 0; i < eMu.nrow; i++) { > + // loglik += rdata.Y[i] * eMu_us[i] - log(1. + > + // exp(eMu_us[i])); > + // } > prevlik = loglik; > loglik = 0.; > - for (int i = 0; i < eMu.nrow; i++) > - loglik += rdata.Y[i] * eMu_us[i] - log(1. + exp(eMu_us[i])); > + double logterm = 1; > + for (int i = 0; i < eMu.nrow; i++) { > + loglik += rdata.Y[i] * eMu_us[i]; > + logterm *= 1. + exp(eMu_us[i]); > + } > + loglik += - log(logterm); > > delta = fabs(1. 
- (prevlik / loglik)); > niter++; > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits From lennart at karssen.org Thu Mar 27 10:36:03 2014 From: lennart at karssen.org (L.C. Karssen) Date: Thu, 27 Mar 2014 10:36:03 +0100 Subject: [GenABEL-dev] [Genabel-commits] r1658 - pkg/ProbABEL/src In-Reply-To: <-7899124546465247655@unknownmsgid> References: <20140321225720.7DB691862D4@r-forge.r-project.org> <-7899124546465247655@unknownmsgid> Message-ID: <5333F103.40507@karssen.org> Hi Yurii, Ouch! Good catch! That's what you get when coding too late at night ;-). Thanks a lot, Lennart. On 24-03-14 09:46, Yurii Aulchenko wrote: > Sum of logs = log of prods > > The latter is in principle dangerous practice when we multiply > probabilities - the product is getting close to zero very fast > potentially leading to numerical problems (of which machine zero is > the smallest because easily detected) > > Yurii > > ---------------------- > Yurii Aulchenko > (sent from mobile device) > >> On Mar 21, 2014, at 11:57 PM, "noreply at r-forge.r-project.org" wrote: >> >> Author: lckarssen >> Date: 2014-03-21 23:57:20 +0100 (Fri, 21 Mar 2014) >> New Revision: 1658 >> >> Modified: >> pkg/ProbABEL/src/reg1.cpp >> Log: >> Speed-ups in ProbABEL's logistic regression. >> Profiling showed lots of (expensive) calls to exp() and log(). I got rid of ~ 1/3 of the exp() calls by saving the result in a variable and reusing it in the calculation of exp(mu) / ( 1+exp(mu) ). >> The number of log() calls was reduced even more by using the fact that sum_i( log(x_i) ) = log( prod_i(x_i) ) >> >> In total this gives roughly a speed up of 30% -- 40% when reading txt dosage files and roughly 20% -- 30% when using filevector files (measured for dosage data). 
>> >> >> Modified: pkg/ProbABEL/src/reg1.cpp >> =================================================================== >> --- pkg/ProbABEL/src/reg1.cpp 2014-03-21 21:05:44 UTC (rev 1657) >> +++ pkg/ProbABEL/src/reg1.cpp 2014-03-21 22:57:20 UTC (rev 1658) >> @@ -719,7 +719,8 @@ >> double emu = eMu.get(i, 0); >> double value = emu; >> double zval; >> - value = exp(value) / (1. + exp(value)); >> + double expval = exp(value); >> + value = expval / (1. + expval); >> residuals[i] = (rdata.Y).get(i, 0) - value; >> eMu.put(value, i, 0); >> W.put(value * (1. - value), i, 0); >> @@ -778,11 +779,24 @@ >> beta.print(); >> } >> // std::cout << "beta:\n"; beta.print(); >> - // compute likelihood >> + >> + // Compute the likelihood. The following commented code gives >> + // the 'easy to understand' algorithm. The code that's >> + // actually used is mathematically equivalent (remember: >> + // log(a*b) = log(a)+log(b)), but faster because log() is >> + // relatively expensive. >> + // for (int i = 0; i < eMu.nrow; i++) { >> + // loglik += rdata.Y[i] * eMu_us[i] - log(1. + >> + // exp(eMu_us[i])); >> + // } >> prevlik = loglik; >> loglik = 0.; >> - for (int i = 0; i < eMu.nrow; i++) >> - loglik += rdata.Y[i] * eMu_us[i] - log(1. + exp(eMu_us[i])); >> + double logterm = 1; >> + for (int i = 0; i < eMu.nrow; i++) { >> + loglik += rdata.Y[i] * eMu_us[i]; >> + logterm *= 1. + exp(eMu_us[i]); >> + } >> + loglik += - log(logterm); >> >> delta = fabs(1. - (prevlik / loglik)); >> niter++; >> >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Fri Mar 28 20:15:09 2014 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Fri, 28 Mar 2014 20:15:09 +0100 Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src In-Reply-To: <20140328191241.F38E6185FBC@r-forge.r-project.org> References: <20140328191241.F38E6185FBC@r-forge.r-project.org> Message-ID: 10 fold is good speed up. An order of magnitude :) Wonder how it compares now to the reading from plain text files? Y ---------------- Sent from mobile device, please excuse possible typos > On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote: > > Author: maartenk > Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014) > New Revision: 1664 > > Modified: > branches/ProbABEL-0.50/src/gendata.cpp > branches/ProbABEL-0.50/src/gendata.h > Log: > new implementation of reading in numbers of mldose file: this version is about a 10(!) 
fold faster than in ProbABEL 0.42 > > Modified: branches/ProbABEL-0.50/src/gendata.cpp > =================================================================== > --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC (rev 1663) > +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC (rev 1664) > @@ -40,58 +40,69 @@ > #endif > #include "utilities.h" > > -double mldose_strtod(const char *str_pointer) { > - // This function is inspired on some answers found at stackoverflow : > - // eg question 5678932 > - int sign = 0; > - double result = 0; > - //check if not a null pointer or NaN (right now checks only first character) > -//TODO: make catching of NaN more rigid > - if (!*str_pointer | *str_pointer == 'N'){ > - return std::numeric_limits<double>::quiet_NaN(); > + > +void gendata::mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers){ > + int j = 0; > + //check if not a null pointer > + if (!*all_numbers){ > + perror("Error while reading genetic data (expected pointer to char but found a null pointer)"); > + exit(EXIT_FAILURE); > } > - //skip whitespace > - while (*str_pointer == ' ') > + while (j<amount_of_numbers) > { > - str_pointer++; > - } > - //set sign to -1 if negative: multiply by sign just before return > - if (*str_pointer == '-') > - { > - str_pointer++; > - sign = -1; > - } > - //read digits before dot > - while (*str_pointer <= '9' && *str_pointer >= '0'){ > - result = result * 10 + (*str_pointer++ - '0'); > - } > - //read digit after dot > - if (*str_pointer == '.') > - { > - double decimal_counter = 1.0; > - str_pointer++; > - while (*str_pointer <= '9' && *str_pointer >= '0') > + double result = 0; > + //skip whitespace > + while (*all_numbers == ' ') > { > - decimal_counter *= 0.1; > - result += (*str_pointer++ - '0') * decimal_counter; > + all_numbers++; > } > + //check NaN (right now checks only first character) > + //TODO: make catching of NaN more rigid > + if (*all_numbers == 'N') > + { > + result = 
std::numeric_limits<double>::quiet_NaN(); > + //skip other characters of NaN > + while ((*all_numbers == 'a') | (*all_numbers == 'N')) > + { > + all_numbers++; > + } > + } > + else > + { > + int sign = 0; > + //set sign to -1 if negative: multiply by sign just before return > + if (*all_numbers == '-') > + { > + all_numbers++; > + sign = -1; > + } > + //read digits before dot > + while (*all_numbers <= '9' && *all_numbers >= '0') > + { > + result = result * 10 + (*all_numbers++ - '0'); > + } > + //read digit after dot > + if (*all_numbers == '.') > + { > + double decimal_counter = 1.0; > + all_numbers++; > + while (*all_numbers <= '9' && *all_numbers >= '0') > + { > + decimal_counter *= 0.1; > + result += (*all_numbers++ - '0') * decimal_counter; > + } > + } > + //correct for negative number > + if (sign == -1) > + { > + result = sign * result; > + } > + } > + G.put(result, k, j); > + j++; > } > - //str_pointer should be null since all characters are read. > - if (*str_pointer){ > - perror("Error while reading genetic data (mldose_strtod)"); > - exit(EXIT_FAILURE); > - } > - //correct for negative number > - if (sign == -1){ > - return sign * result; > - }else{ > - return result; > - } > - > } > > - > - > void gendata::get_var(int var, double * data) > { > // Read the genetic data for SNP 'var' and store in the array 'data' > @@ -246,7 +257,7 @@ > size_t strpos = tmpstr.find("->"); > if (strpos != string::npos) > { > - tmpid = tmpstr.substr(strpos+2, string::npos); > + tmpid = tmpstr.substr(strpos + 2, string::npos); > } > else > { > @@ -255,8 +266,8 @@ > if (tmpid != idnames[k]) > { > cerr << "phenotype file and dose or probability file " > - << "did not match at line " < - << " != " << idnames[k] << ")" << endl; > + << "did not match at line " < + << tmpid << " != " << idnames[k] << ")" << endl; > infile.close(); > exit(1); > } > @@ -267,47 +278,58 @@ > infile >> tmpstr; > } > > - for (unsigned int j = 0; j < (nsnps * ngpreds); j++) > + int oldstyle = 0; > + if (oldstyle 
== 1) > { > - if (infile.good()) > + for (unsigned int j = 0; j < (nsnps * ngpreds); j++) > { > - infile >> inStr; > - // tmpstr contains the dosage/probability in > - // string form. Convert it to double (if tmpstr is > - // NaN it will be set to nan). > - double dosage; > - char *endptr; > - errno = 0; // To distinguish success/failure > - // after strtod() > + if (infile.good()) > + { > + infile >> inStr; > + // tmpstr contains the dosage/probability in > + // string form. Convert it to double (if tmpstr is > + // NaN it will be set to nan). > + double dosage; > + char *endptr; > + errno = 0; // To distinguish success/failure > + // after strtod() > > - dosage = mldose_strtod(inStr); > - //dosage = strtod(tmpstr.c_str(), &endptr); > -// if ((errno == ERANGE && > -// (dosage == HUGE_VALF || dosage == HUGE_VALL)) > -// || (errno != 0 && dosage == 0)) { > -// perror("Error while reading genetic data (strtod)"); > -// exit(EXIT_FAILURE); > -// } > + dosage = strtod(inStr, &endptr); > + if ((errno == ERANGE > + && (dosage == HUGE_VALF || dosage == HUGE_VALL)) > + || (errno != 0 && dosage == 0)) > + { > + perror("Error while reading genetic data (strtod)"); > + exit(EXIT_FAILURE); > + } > > - if (endptr == tmpstr.c_str()) { > - cerr << "No digits were found while reading genetic data" > - << " (individual " < - << ", position " << j + 1 << ")" > - << endl; > - exit(EXIT_FAILURE); > + if (endptr == tmpstr.c_str()) > + { > + cerr > + << "No digits were found while reading genetic data" > + << " (individual " < + << j + 1 << ")" << endl; > + exit(EXIT_FAILURE); > + } > + /* If we got here, strtod() successfully parsed a number */ > + G.put(dosage, k, j); > } > - > - /* If we got here, strtod() successfully parsed a number */ > - G.put(dosage, k, j); > + else > + { > + std::cerr << "cannot read dose-file: " << fname > + << "check skipd and ngpreds parameters\n"; > + infile.close(); > + exit(1); > + } > } > - else > - { > - std::cerr << "cannot read dose-file: " << fname > 
- << "check skipd and ngpreds parameters\n"; > - infile.close(); > - exit(1); > - } > } > + else > + { > + std::string all_numbers; > + all_numbers.reserve(nsnps * ngpreds * 7); > + std::getline(infile, all_numbers); > + mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds); > + } > k++; > } > else > > Modified: branches/ProbABEL-0.50/src/gendata.h > =================================================================== > --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC (rev 1663) > +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC (rev 1664) > @@ -44,7 +44,7 @@ > unsigned int nids; > unsigned int ngpreds; > gendata(); > - double convert( char* source, char** endPtr ); > + void mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers); > > void re_gendata(char * fname, unsigned int insnps, unsigned int ingpreds, > unsigned int npeople, unsigned int nmeasured, > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits

From lennart at karssen.org Fri Mar 28 23:19:37 2014
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 28 Mar 2014 23:19:37 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src
In-Reply-To: <5335CDCF.8090503@gmail.com>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com>
Message-ID: <5335F579.9070608@karssen.org>

Dear all,

(I guess the previous version of this mail went to the commit email
list, so here it is again for the devel list).

Indeed: an impressive speed-up! Well done Maarten.

On 28-03-14 20:30, Maarten Kooyman wrote:
> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people,
> adjusted for sex and age (I did not run it in triplicate, but it gives
> an idea).
>
> version   0.42   0.50_branch
> FV        58     52
> mldose    48     12
>
> All times are in seconds.
>
> As you can see, the filevector format is the part that slows down the
> program. When profiling, the reading from FV takes up 86% of all the time
> the program takes.
>

The current problem with reading from filevector is that the fv data is
stored as floats (this is logical, as it means half the disk space usage
compared to storing doubles; moreover, the imputed data is never more
precise than a float anyway). However, internally ProbABEL uses doubles
for calculations. This means a conversion from float to double must occur
at some point. Simply casting to double gives imprecision. For example,
casting the float 0.677 to double gives: 0.67699998617172241

Therefore, with version 0.4.0 I changed this and used a string as
intermediate form, followed by strtod(). First I used stringstreams, but
these turned out to be much too slow for our use case. Now snprintf() is
used. For the above example the double value is 0.67700000000000005,
much closer to what we would like to see. Using this two-step conversion
means the output when using fv is equal to the output using txt data (and
equal to using R), within float precision. Using Maarten's 'strtod' will
speed up this part as well, but the snprintf() call is still expensive.

Apart from this two-step conversion we may also be inefficient because
the dosage/probability values are converted one array element at a time.
Maybe we can gain something there: as Maarten did for the txt format,
simply sending a whole 'line'/array to the conversion may help.

Given that most people nowadays store their imputation results in chunks
of chromosomes anyway (i.e. small(er) files), and the fact that I think
implementing the ability to read gzipped files is not difficult, it may be
time to give mldose.gz files another chance for ProbABEL users. It will
save them the conversion from mldose.gz to DatABEL.
Of course we can still support DatABEL files, but (depending on how fast reading from gzipped files is), our recommendation could change with the upcoming ProbABEL v0.5.0.

Any thoughts on this?

Best,

Lennart.

> On 28-03-14 20:15, Yury Aulchenko wrote:
>> 10 fold is good speed up. An order of magnitude :)
>>
>> Wonder how it compares now to the reading from plain text files?
>>
>> Y
>>
>> ----------------
>> Sent from mobile device, please excuse possible typos
>>
>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote:
>>>
>>> Author: maartenk
>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014)
>>> New Revision: 1664
>>>
>>> Modified:
>>> branches/ProbABEL-0.50/src/gendata.cpp
>>> branches/ProbABEL-0.50/src/gendata.h
>>> Log:
>>> new implementation of reading in numbers of mldose file: this version
>>> is about a 10(!) fold faster than in ProABEL 0.42
>>>
>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp
>>> ===================================================================
>>> --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC
>>> (rev 1663)
>>> +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC
>>> (rev 1664)
>>> @@ -40,58 +40,69 @@
>>> #endif
>>> #include "utilities.h"
>>>
>>> -double mldose_strtod(const char *str_pointer) {
>>> - // This function is inspired on some answers found at
>>> stackoverflow :
>>> - // eg question 5678932
>>> - int sign = 0;
>>> - double result = 0;
>>> - //check if not a null pointer or NaN (right now checks only
>>> first character)
>>> -//TODO: make catching of NaN more rigid
>>> - if (!*str_pointer | *str_pointer == 'N'){
>>> - return std::numeric_limits<double>::quiet_NaN();
>>> +
>>> +void gendata::mldose_line_to_matrix(int k,const char
>>> *all_numbers,int amount_of_numbers){
>>> + int j = 0;
>>> + //check if not a null pointer
>>> + if (!*all_numbers){
>>> + perror("Error while reading genetic data (expected pointer
>>> to char but found a null pointer)");
>>> + exit(EXIT_FAILURE);
>>> }
>>> - //skip whitespace
>>> - while (*str_pointer == ' ')
>>> + while (j<amount_of_numbers)
>>> {
>>> - str_pointer++;
>>> - }
>>> - //set sign to -1 if negative: multiply by sign just before return
>>> - if (*str_pointer == '-')
>>> - {
>>> - str_pointer++;
>>> - sign = -1;
>>> - }
>>> - //read digits before dot
>>> - while (*str_pointer <= '9' && *str_pointer >= '0'){
>>> - result = result * 10 + (*str_pointer++ - '0');
>>> - }
>>> - //read digit after dot
>>> - if (*str_pointer == '.')
>>> - {
>>> - double decimal_counter = 1.0;
>>> - str_pointer++;
>>> - while (*str_pointer <= '9' && *str_pointer >= '0')
>>> + double result = 0;
>>> + //skip whitespace
>>> + while (*all_numbers == ' ')
>>> {
>>> - decimal_counter *= 0.1;
>>> - result += (*str_pointer++ - '0') * decimal_counter;
>>> + all_numbers++;
>>> }
>>> + //check NaN (right now checks only first character)
>>> + //TODO: make catching of NaN more rigid
>>> + if (*all_numbers == 'N')
>>> + {
>>> + result = std::numeric_limits<double>::quiet_NaN();
>>> + //skip other characters of NaN
>>> + while ((*all_numbers == 'a') | (*all_numbers == 'N'))
>>> + {
>>> + all_numbers++;
>>> + }
>>> + }
>>> + else
>>> + {
>>> + int sign = 0;
>>> + //set sign to -1 if negative: multiply by sign just
>>> before return
>>> + if (*all_numbers == '-')
>>> + {
>>> + all_numbers++;
>>> + sign = -1;
>>> + }
>>> + //read digits before dot
>>> + while (*all_numbers <= '9' && *all_numbers >= '0')
>>> + {
>>> + result = result * 10 + (*all_numbers++ - '0');
>>> + }
>>> + //read digit after dot
>>> + if (*all_numbers == '.')
>>> + {
>>> + double decimal_counter = 1.0;
>>> + all_numbers++;
>>> + while (*all_numbers <= '9' && *all_numbers >= '0')
>>> + {
>>> + decimal_counter *= 0.1;
>>> + result += (*all_numbers++ - '0') * decimal_counter;
>>> + }
>>> + }
>>> + //correct for negative number
>>> + if (sign == -1)
>>> + {
>>> + result = sign * result;
>>> + }
>>> + }
>>> + G.put(result, k, j);
>>> + j++;
>>> }
>>> - //str_pointer should be null since all characters are read.
>>> - if (*str_pointer){
>>> - perror("Error while reading genetic data (mldose_strtod)");
>>> - exit(EXIT_FAILURE);
>>> - }
>>> - //correct for negative number
>>> - if (sign == -1){
>>> - return sign * result;
>>> - }else{
>>> - return result;
>>> - }
>>> -
>>> }
>>>
>>> -
>>> -
>>> void gendata::get_var(int var, double * data)
>>> {
>>> // Read the genetic data for SNP 'var' and store in the array
>>> 'data'
>>> @@ -246,7 +257,7 @@
>>> size_t strpos = tmpstr.find("->");
>>> if (strpos != string::npos)
>>> {
>>> - tmpid = tmpstr.substr(strpos+2, string::npos);
>>> + tmpid = tmpstr.substr(strpos + 2, string::npos);
>>> }
>>> else
>>> {
>>> @@ -255,8 +266,8 @@
>>> if (tmpid != idnames[k])
>>> {
>>> cerr << "phenotype file and dose or probability
>>> file "
>>> - << "did not match at line " <>> (" << tmpid
>>> - << " != " << idnames[k] << ")" << endl;
>>> + << "did not match at line " <>> " ("
>>> + << tmpid << " != " << idnames[k] << ")"
>>> << endl;
>>> infile.close();
>>> exit(1);
>>> }
>>> @@ -267,47 +278,58 @@
>>> infile >> tmpstr;
>>> }
>>>
>>> - for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>> + int oldstyle = 0;
>>> + if (oldstyle == 1)
>>> {
>>> - if (infile.good())
>>> + for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>> {
>>> - infile >> inStr;
>>> - // tmpstr contains the dosage/probability in
>>> - // string form. Convert it to double (if tmpstr is
>>> - // NaN it will be set to nan).
>>> - double dosage;
>>> - char *endptr;
>>> - errno = 0; // To distinguish success/failure
>>> - // after strtod()
>>> + if (infile.good())
>>> + {
>>> + infile >> inStr;
>>> + // tmpstr contains the dosage/probability in
>>> + // string form. Convert it to double (if
>>> tmpstr is
>>> + // NaN it will be set to nan).
>>> + double dosage;
>>> + char *endptr;
>>> + errno = 0; // To distinguish
>>> success/failure
>>> + // after strtod()
>>>
>>> - dosage = mldose_strtod(inStr);
>>> - //dosage = strtod(tmpstr.c_str(), &endptr);
>>> -// if ((errno == ERANGE &&
>>> -// (dosage == HUGE_VALF || dosage ==
>>> HUGE_VALL))
>>> -// || (errno != 0 && dosage == 0)) {
>>> -// perror("Error while reading genetic data
>>> (strtod)");
>>> -// exit(EXIT_FAILURE);
>>> -// }
>>> + dosage = strtod(inStr, &endptr);
>>> + if ((errno == ERANGE
>>> + && (dosage == HUGE_VALF || dosage ==
>>> HUGE_VALL))
>>> + || (errno != 0 && dosage == 0))
>>> + {
>>> + perror("Error while reading genetic data
>>> (strtod)");
>>> + exit(EXIT_FAILURE);
>>> + }
>>>
>>> - if (endptr == tmpstr.c_str()) {
>>> - cerr << "No digits were found while reading
>>> genetic data"
>>> - << " (individual " <>> - << ", position " << j + 1 << ")"
>>> - << endl;
>>> - exit(EXIT_FAILURE);
>>> + if (endptr == tmpstr.c_str())
>>> + {
>>> + cerr
>>> + << "No digits were found while
>>> reading genetic data"
>>> + << " (individual " <>> ", position "
>>> + << j + 1 << ")" << endl;
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + /* If we got here, strtod() successfully
>>> parsed a number */
>>> + G.put(dosage, k, j);
>>> }
>>> -
>>> - /* If we got here, strtod() successfully parsed
>>> a number */
>>> - G.put(dosage, k, j);
>>> + else
>>> + {
>>> + std::cerr << "cannot read dose-file: " << fname
>>> + << "check skipd and ngpreds
>>> parameters\n";
>>> + infile.close();
>>> + exit(1);
>>> + }
>>> }
>>> - else
>>> - {
>>> - std::cerr << "cannot read dose-file: " << fname
>>> - << "check skipd and ngpreds
>>> parameters\n";
>>> - infile.close();
>>> - exit(1);
>>> - }
>>> }
>>> + else
>>> + {
>>> + std::string all_numbers;
>>> + all_numbers.reserve(nsnps * ngpreds * 7);
>>> + std::getline(infile, all_numbers);
>>> + mldose_line_to_matrix(k, all_numbers.c_str(), nsnps
>>> * ngpreds);
>>> + }
>>> k++;
>>> }
>>> else
>>>
>>> Modified: branches/ProbABEL-0.50/src/gendata.h
>>> ===================================================================
>>> --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC
>>> (rev 1663)
>>> +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC
>>> (rev 1664)
>>> @@ -44,7 +44,7 @@
>>> unsigned int nids;
>>> unsigned int ngpreds;
>>> gendata();
>>> - double convert( char* source, char** endPtr );
>>> + void mldose_line_to_matrix(int k,const char *all_numbers,int
>>> amount_of_numbers);
>>>
>>> void re_gendata(char * fname, unsigned int insnps, unsigned int
>>> ingpreds,
>>> unsigned int npeople, unsigned int nmeasured,
>>>
>>> _______________________________________________
>>> Genabel-commits mailing list
>>> Genabel-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>>
>> _______________________________________________
>> Genabel-commits mailing list
>> Genabel-commits at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: 
From l.c.karssen at gmail.com Sun Mar 30 23:12:05 2014
From: l.c.karssen at gmail.com (L.C. Karssen)
Date: Sun, 30 Mar 2014 23:12:05 +0200
Subject: [GenABEL-dev] Benchmarks: Speed of ProbABEL
Message-ID: <533888A5.8030804@gmail.com>

Dear list,

I've started doing some benchmarks for ProbABEL.
In the attached figures you can see how much time (in seconds) an analysis took for a given version of ProbABEL. The only difference between the figures is the limits of the vertical axis.

Notes:
- Each analysis was run 3 times, and the average is shown (as well as error bars for the std. dev., but these are hardly visible).
- v0.4.3 means SVN trunk.
- v0.5.0 means SVN branch probabel-0.50, checked out last Saturday, after Maarten's 10x speedup of reading text files.
- The reduction in mmscore run time at v0.3.0 is because of the use of the EIGEN lib.
- Cox regression shows zero run time before v0.4.0 because only since that version does that module work without manual intervention.
- The increase in run time for filevector data at v0.4.0 is attributable to the fact that since then the fv data is no longer simply cast from float to double, but converted via a string, followed by strtod(). This is very time consuming, especially in v0.4.0/0.4.1 when C++ streams were used.

Description of the benchmarks:
- CPU: Core i7
- RAM: 12 GB (was never a limiting factor)
- Disk: SSD
- All benchmarks were run 3 times, consecutively (i.e. only a single core was used).
- The EIGEN lib was used when compiling each version.

So, two conclusions:
- We must always benchmark our code before release; it would have caught the increase in run time at v0.4.0.
- ProbABEL v0.5.0 will be a great release!

Best,

Lennart.

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

l.c.karssen at gmail.com
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: versions_all_noylim.pdf
Type: application/pdf
Size: 46421 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
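
The float-to-double conversion mentioned in the benchmark notes above (a plain cast versus a round trip through a string followed by strtod()) can be sketched roughly as follows. This is only an illustration under stated assumptions, not ProbABEL's actual code: the function names are hypothetical, and the "%.*g" format with FLT_DIG significant digits is an assumed choice for the intermediate string.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Plain widening cast: the double holds exactly the float's binary value,
// so the extra decimal digits expose the float's rounding error.
// (Hypothetical helper name, for illustration only.)
double float_to_double_cast(float value)
{
    return static_cast<double>(value);
}

// Two-step conversion: print the float with FLT_DIG significant digits,
// then parse that decimal text as a double with strtod().
// The "%.*g" format is an assumption, not necessarily what ProbABEL uses.
double float_to_double_via_string(float value)
{
    char buffer[32];
    int written = std::snprintf(buffer, sizeof(buffer), "%.*g", FLT_DIG,
                                static_cast<double>(value));
    // snprintf() must have succeeded and must not have truncated the text.
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::strtod(buffer, nullptr);
}
```

For the float 0.677f, the cast keeps the float's rounding error in the additional double digits, whereas the string round trip yields the double nearest to the decimal 0.677. The price is one snprintf() and one strtod() call per value, which adds up quickly when it is done for every dosage in a large file.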
Name: versions_all_ylim.pdf
Type: application/pdf
Size: 49057 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: versions_all_ylim2.pdf
Type: application/pdf
Size: 47947 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: 
From yurii.aulchenko at gmail.com Mon Mar 31 20:46:55 2014
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Mon, 31 Mar 2014 20:46:55 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src
In-Reply-To: <5335F579.9070608@karssen.org>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com> <5335F579.9070608@karssen.org>
Message-ID: 

I personally find the fact that text outperforms binary disappointing (and, if you forget about the technical details, well, strange).

On the other hand, this is probably good for the user, as it eliminates the need to do a conversion. Especially if we could work with compressed files. Especially if we build an interface to work with other types of text output (e.g. IMPUTE2 would be a candidate)...

Yurii

----------------
Sent from mobile device, please excuse possible typos

> On 28 Mar 2014, at 23:19, "L.C. Karssen" wrote:
>
> Dear all,
>
> (I guess the previous version of this mail went to the commit email
> list, so here it is again for the devel list).
>
>
> Indeed: an impressive speed-up! Well done Maarten.
>
>> On 28-03-14 20:30, Maarten Kooyman wrote:
>> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people,
>> adjusted for sex and age (I did not run it in triplicate, but it gives an idea):
>>
>> version   0.42   0.50_branch
>> FV          58            52
>> mldose      48            12
>> All times are in seconds.
>>
>> As you can see, the filevector format is the part that slows down the
>> program.
>> When profiling, reading from FV takes up 86% of all the time
>> the program takes.
>
>
> The current problem with reading from filevector is that the fv data is
> stored in floats (this is logical as it means half the disk space usage
> compared to storing doubles; moreover, the imputed data is never more
> precise than a float anyway).
> However, internally ProbABEL uses doubles for calculations. This means
> conversion from float to double must occur at some point.
>
> Simply casting to double gives imprecision. For example, casting a float
> 0.677 to double gives: 0.67699998617172241
> Therefore, with version 0.4.0 I changed this and used a string as
> intermediate form, followed by strtod(). First I used stringstreams, but
> these turn out to be much too slow for our use case. Now snprintf() is
> used. For the above example the double value is: 0.67700000000000005,
> much closer to what we would like to see. Using this two-step conversion
> means the output when using fv is equal to the output using txt data
> (and equal to using R), within float precision.
>
> Using Maarten's 'strtod' will speed up this part as well, but the
> snprintf() call is still expensive.
>
> Apart from this two-step conversion we may also be inefficient because
> the dosage/probability values are converted one array element at a
> time. Maybe we can gain something there, like Maarten did for the txt
> format: simply sending a whole 'line'/array to the conversion may help.
>
>
>
>
> Given that most people nowadays store their imputation results in chunks
> of chromosomes anyway (i.e. small(er) files), and the fact that I think
> implementing the ability to read gzipped files is not difficult, it may
> be time to give mldose.gz files another chance for ProbABEL users. It
> will save them the conversion from mldose.gz to DatABEL.
> Of course we can still support DatABEL files, but (depending on how fast
> reading from gzipped files is), our recommendation could change with the
> upcoming ProbABEL v0.5.0.
>
> Any thoughts on this?
>
>
> Best,
>
> Lennart.
>
>>> On 28-03-14 20:15, Yury Aulchenko wrote:
>>> 10 fold is good speed up. An order of magnitude :)
>>>
>>> Wonder how it compares now to the reading from plain text files?
>>>
>>> Y
>
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C.
Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
From alvaro.frank at rwth-aachen.de Mon Mar 31 22:48:11 2014
From: alvaro.frank at rwth-aachen.de (Frank, Alvaro Jesus)
Date: Mon, 31 Mar 2014 20:48:11 +0000
Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src
In-Reply-To: <5335F579.9070608@karssen.org>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com>,<5335F579.9070608@karssen.org>
Message-ID: <244CF001646FF74FB34F372310A332C57AD899@MBX2.rwth-ad.de>

Dear all,

How about, instead of going float > text > double, just applying a binary mask after the imprecise cast?

0.67699998617172241 > 0.67700000000000000

with a mask on every number? This image would tell you which bits need to be set to zero:

http://cnx.org/content/m32770/latest/graphics1.png

Masking is super fast in C/C++. This may not be portable, though, but the way floating point numbers are stored should be generic (IEEE) anyway.

-Alvaro
________________________________________
From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C. Karssen [lennart at karssen.org]
Sent: Friday, March 28, 2014 11:19 PM
To: genabel-devel at lists.r-forge.r-project.org
Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src

Dear all,

(I guess the previous version of this mail went to the commit email list, so here it is again for the devel list).

Indeed: an impressive speed-up! Well done Maarten.
On 28-03-14 20:30, Maarten Kooyman wrote:
> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people,
> adjusted for sex and age (I did not run it in triplicate, but it gives an idea):
>
> version   0.42   0.50_branch
> FV          58            52
> mldose      48            12
> All times are in seconds.
>
> As you can see, the filevector format is the part that slows down the
> program. When profiling, reading from FV takes up 86% of all the time
> the program takes.
>

The current problem with reading from filevector is that the fv data is stored in floats (this is logical as it means half the disk space usage compared to storing doubles; moreover, the imputed data is never more precise than a float anyway). However, internally ProbABEL uses doubles for calculations. This means conversion from float to double must occur at some point.

Simply casting to double gives imprecision. For example, casting a float 0.677 to double gives: 0.67699998617172241
Therefore, with version 0.4.0 I changed this and used a string as intermediate form, followed by strtod(). First I used stringstreams, but these turn out to be much too slow for our use case. Now snprintf() is used. For the above example the double value is: 0.67700000000000005, much closer to what we would like to see. Using this two-step conversion means the output when using fv is equal to the output using txt data (and equal to using R), within float precision.

Using Maarten's 'strtod' will speed up this part as well, but the snprintf() call is still expensive.

Apart from this two-step conversion we may also be inefficient because the dosage/probability values are converted one array element at a time. Maybe we can gain something there, like Maarten did for the txt format: simply sending a whole 'line'/array to the conversion may help.

Given that most people nowadays store their imputation results in chunks of chromosomes anyway (i.e.
small(er) files), and the fact that I think implementing the ability to read gzipped files is not difficult, it may be time to give mldose.gz files another chance for ProbABEL users. It will save them the conversion from mldose.gz to DatABEL.

Of course we can still support DatABEL files, but (depending on how fast reading from gzipped files is), our recommendation could change with the upcoming ProbABEL v0.5.0.

Any thoughts on this?

Best,

Lennart.

> On 28-03-14 20:15, Yury Aulchenko wrote:
>> 10 fold is good speed up. An order of magnitude :)
>>
>> Wonder how it compares now to the reading from plain text files?
>>
>> Y

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C.
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From alvaro.frank at rwth-aachen.de Mon Mar 31 23:07:20 2014
From: alvaro.frank at rwth-aachen.de (Frank, Alvaro Jesus)
Date: Mon, 31 Mar 2014 21:07:20 +0000
Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src
In-Reply-To: <244CF001646FF74FB34F372310A332C57AD899@MBX2.rwth-ad.de>
References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com>, <5335F579.9070608@karssen.org>, <244CF001646FF74FB34F372310A332C57AD899@MBX2.rwth-ad.de>
Message-ID: <244CF001646FF74FB34F372310A332C57AD8B4@MBX2.rwth-ad.de>

Perhaps what I mentioned earlier cannot be done: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

A float 0.677 cast to double can't be represented any other way than 0.67699998617172241, for example.

________________________________________
From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of Frank, Alvaro Jesus [alvaro.frank at rwth-aachen.de]
Sent: Monday, March 31, 2014 10:48 PM
To: L.C. Karssen; genabel-devel at lists.r-forge.r-project.org
Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src

Dear all,

How about, instead of going float > text > double, just applying a binary mask after the (imprecise) cast? 0.67699998617172241 > 0.67700000000000000 with a mask on every number? This image would tell you which bits need to be set to zero: http://cnx.org/content/m32770/latest/graphics1.png

Masking is super fast in C/C++. This may not be portable, though, but the way floating point numbers are stored should be generic (IEEE) anyway.

-Alvaro

________________________________________
From: genabel-devel-bounces at lists.r-forge.r-project.org [genabel-devel-bounces at lists.r-forge.r-project.org] on behalf of L.C.
Karssen [lennart at karssen.org]
Sent: Friday, March 28, 2014 11:19 PM
To: genabel-devel at lists.r-forge.r-project.org
Subject: Re: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src

Dear all,

(I guess the previous version of this mail went to the commit email list, so here it is again for the devel list).

Indeed: an impressive speed-up! Well done Maarten.

On 28-03-14 20:30, Maarten Kooyman wrote:
> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people, adjusted
> for sex and age (I did not run it in triplicate, but it gives an idea)
>
> version   0.42   0.50_branch
> FV         58     52
> mldose     48     12
> all times are in seconds.
>
> As you can see, the filevector format is the part that slows down the
> program. When profiling, the reading from FV takes up 86% of all the time
> the program takes.
>

The current problem with reading from filevector is that the fv data is stored in floats (this is logical as it means half the disk space usage compared to storing doubles; moreover, the imputed data is never more precise than a float anyway). However, internally ProbABEL uses doubles for calculations. This means conversion from float to double must occur at some point.

Simply casting to double gives imprecision. For example, casting a float 0.677 to double gives: 0.67699998617172241
Therefore, with version 0.4.0 I changed this and used a string as intermediate form, followed by strtod(). First I used stringstreams, but these turned out to be much too slow for our use case. Now snprintf() is used. For the above example the double value is: 0.67700000000000005, much closer to what we would like to see. Using this two-step conversion means the output when using fv is equal to the output using txt data (and equal to using R), within float precision.

Using Maarten's 'strtod' will speed up this part as well, but the snprintf() call is still expensive.
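For readers following along, here is a minimal sketch of the two conversions discussed above: the plain cast versus the snprintf()/strtod() round trip. It is not the ProbABEL code. The function names are hypothetical, and using FLT_DIG as the print precision is an assumption made for illustration; the precision actually used in ProbABEL is not stated in this thread.

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Plain cast: the double keeps the float's exact binary value, so the
// float's rounding error becomes visible in the extra digits of the double.
double float_to_double_cast(float value)
{
    return static_cast<double>(value);
}

// Two-step conversion: print the float with FLT_DIG significant digits
// (assumed precision, see above), then parse that text as a double.
double float_to_double_via_text(float value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%.*g", FLT_DIG,
                  static_cast<double>(value));
    return std::strtod(buffer, nullptr);
}
```

With 0.677f as input, the first function returns a double that still carries the float's rounding error, while the second returns the same double as the literal 0.677. The price is one formatting call and one parsing call per value, which is the cost mentioned above.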
Apart from this two-step conversion we may also be inefficient because the dosage/probability values are converted one array element at a time. Maybe we can gain something there: as Maarten did for the txt format, simply sending a whole 'line'/array to the conversion may help.

Given that most people nowadays store their imputation results in chunks of chromosomes anyway (i.e. small(er) files), and the fact that I think implementing the ability to read gzipped files is not difficult, it may be time to give mldose.gz files another chance for ProbABEL users. It will save them the conversion from mldose.gz to DatABEL.
Of course we can still support DatABEL files, but (depending on how fast reading from gzipped files is), our recommendation could change with the upcoming ProbABEL v0.5.0.

Any thoughts on this?

Best,

Lennart.

> On 28-03-14 20:15, Yury Aulchenko wrote:
>> 10 fold is good speed up. An order of magnitude :)
>>
>> Wonder how it compares now to the reading from plain text files?
>>
>> Y
>>
>> ----------------
>> Sent from mobile device, please excuse possible typos
>>
>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote:
>>>
>>> Author: maartenk
>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014)
>>> New Revision: 1664
>>>
>>> Modified:
>>> branches/ProbABEL-0.50/src/gendata.cpp
>>> branches/ProbABEL-0.50/src/gendata.h
>>> Log:
>>> new implementation of reading in numbers of mldose file: this version
>>> is about a 10(!)
fold faster than in ProABEL 0.42
>>>
>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp
>>> ===================================================================
>>> --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC (rev 1663)
>>> +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC (rev 1664)
>>> @@ -40,58 +40,69 @@
>>> #endif
>>> #include "utilities.h"
>>>
>>> -double mldose_strtod(const char *str_pointer) {
>>> - // This function is inspired on some answers found at stackoverflow :
>>> - // eg question 5678932
>>> - int sign = 0;
>>> - double result = 0;
>>> - //check if not a null pointer or NaN (right now checks only first character)
>>> -//TODO: make catching of NaN more rigid
>>> - if (!*str_pointer | *str_pointer == 'N'){
>>> - return std::numeric_limits<double>::quiet_NaN();
>>> +
>>> +void gendata::mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers){
>>> + int j = 0;
>>> + //check if not a null pointer
>>> + if (!*all_numbers){
>>> + perror("Error while reading genetic data (expected pointer to char but found a null pointer)");
>>> + exit(EXIT_FAILURE);
>>> }
>>> - //skip whitespace
>>> - while (*str_pointer == ' ')
>>> + while (j<amount_of_numbers)
>>> {
>>> - str_pointer++;
>>> - }
>>> - //set sign to -1 if negative: multiply by sign just before return
>>> - if (*str_pointer == '-')
>>> - {
>>> - str_pointer++;
>>> - sign = -1;
>>> - }
>>> - //read digits before dot
>>> - while (*str_pointer <= '9' && *str_pointer >= '0'){
>>> - result = result * 10 + (*str_pointer++ - '0');
>>> - }
>>> - //read digit after dot
>>> - if (*str_pointer == '.')
>>> - {
>>> - double decimal_counter = 1.0;
>>> - str_pointer++;
>>> - while (*str_pointer <= '9' && *str_pointer >= '0')
>>> + double result = 0;
>>> + //skip whitespace
>>> + while (*all_numbers == ' ')
>>> {
>>> - decimal_counter *= 0.1;
>>> - result += (*str_pointer++ - '0') * decimal_counter;
>>> + all_numbers++;
>>> }
>>> + //check NaN (right now checks only first character)
>>> + //TODO: make catching of NaN more rigid
>>> + if (*all_numbers == 'N')
>>> + {
>>> + result = std::numeric_limits<double>::quiet_NaN();
>>> + //skip other characters of NaN
>>> + while ((*all_numbers == 'a') | (*all_numbers == 'N'))
>>> + {
>>> + all_numbers++;
>>> + }
>>> + }
>>> + else
>>> + {
>>> + int sign = 0;
>>> + //set sign to -1 if negative: multiply by sign just before return
>>> + if (*all_numbers == '-')
>>> + {
>>> + all_numbers++;
>>> + sign = -1;
>>> + }
>>> + //read digits before dot
>>> + while (*all_numbers <= '9' && *all_numbers >= '0')
>>> + {
>>> + result = result * 10 + (*all_numbers++ - '0');
>>> + }
>>> + //read digit after dot
>>> + if (*all_numbers == '.')
>>> + {
>>> + double decimal_counter = 1.0;
>>> + all_numbers++;
>>> + while (*all_numbers <= '9' && *all_numbers >= '0')
>>> + {
>>> + decimal_counter *= 0.1;
>>> + result += (*all_numbers++ - '0') * decimal_counter;
>>> + }
>>> + }
>>> + //correct for negative number
>>> + if (sign == -1)
>>> + {
>>> + result = sign * result;
>>> + }
>>> + }
>>> + G.put(result, k, j);
>>> + j++;
>>> }
>>> - //str_pointer should be null since all characters are read.
>>> - if (*str_pointer){ >>> - perror("Error while reading genetic data (mldose_strtod)"); >>> - exit(EXIT_FAILURE); >>> - } >>> - //correct for negative number >>> - if (sign == -1){ >>> - return sign * result; >>> - }else{ >>> - return result; >>> - } >>> - >>> } >>> >>> - >>> - >>> void gendata::get_var(int var, double * data) >>> { >>> // Read the genetic data for SNP 'var' and store in the array >>> 'data' >>> @@ -246,7 +257,7 @@ >>> size_t strpos = tmpstr.find("->"); >>> if (strpos != string::npos) >>> { >>> - tmpid = tmpstr.substr(strpos+2, string::npos); >>> + tmpid = tmpstr.substr(strpos + 2, string::npos); >>> } >>> else >>> { >>> @@ -255,8 +266,8 @@ >>> if (tmpid != idnames[k]) >>> { >>> cerr << "phenotype file and dose or probability >>> file " >>> - << "did not match at line " <>> (" << tmpid >>> - << " != " << idnames[k] << ")" << endl; >>> + << "did not match at line " <>> " (" >>> + << tmpid << " != " << idnames[k] << ")" >>> << endl; >>> infile.close(); >>> exit(1); >>> } >>> @@ -267,47 +278,58 @@ >>> infile >> tmpstr; >>> } >>> >>> - for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>> + int oldstyle = 0; >>> + if (oldstyle == 1) >>> { >>> - if (infile.good()) >>> + for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>> { >>> - infile >> inStr; >>> - // tmpstr contains the dosage/probability in >>> - // string form. Convert it to double (if tmpstr is >>> - // NaN it will be set to nan). >>> - double dosage; >>> - char *endptr; >>> - errno = 0; // To distinguish success/failure >>> - // after strtod() >>> + if (infile.good()) >>> + { >>> + infile >> inStr; >>> + // tmpstr contains the dosage/probability in >>> + // string form. Convert it to double (if >>> tmpstr is >>> + // NaN it will be set to nan). 
>>> + double dosage; >>> + char *endptr; >>> + errno = 0; // To distinguish >>> success/failure >>> + // after strtod() >>> >>> - dosage = mldose_strtod(inStr); >>> - //dosage = strtod(tmpstr.c_str(), &endptr); >>> -// if ((errno == ERANGE && >>> -// (dosage == HUGE_VALF || dosage == >>> HUGE_VALL)) >>> -// || (errno != 0 && dosage == 0)) { >>> -// perror("Error while reading genetic data >>> (strtod)"); >>> -// exit(EXIT_FAILURE); >>> -// } >>> + dosage = strtod(inStr, &endptr); >>> + if ((errno == ERANGE >>> + && (dosage == HUGE_VALF || dosage == >>> HUGE_VALL)) >>> + || (errno != 0 && dosage == 0)) >>> + { >>> + perror("Error while reading genetic data >>> (strtod)"); >>> + exit(EXIT_FAILURE); >>> + } >>> >>> - if (endptr == tmpstr.c_str()) { >>> - cerr << "No digits were found while reading >>> genetic data" >>> - << " (individual " <>> - << ", position " << j + 1 << ")" >>> - << endl; >>> - exit(EXIT_FAILURE); >>> + if (endptr == tmpstr.c_str()) >>> + { >>> + cerr >>> + << "No digits were found while >>> reading genetic data" >>> + << " (individual " <>> ", position " >>> + << j + 1 << ")" << endl; >>> + exit(EXIT_FAILURE); >>> + } >>> + /* If we got here, strtod() successfully >>> parsed a number */ >>> + G.put(dosage, k, j); >>> } >>> - >>> - /* If we got here, strtod() successfully parsed >>> a number */ >>> - G.put(dosage, k, j); >>> + else >>> + { >>> + std::cerr << "cannot read dose-file: " << fname >>> + << "check skipd and ngpreds >>> parameters\n"; >>> + infile.close(); >>> + exit(1); >>> + } >>> } >>> - else >>> - { >>> - std::cerr << "cannot read dose-file: " << fname >>> - << "check skipd and ngpreds >>> parameters\n"; >>> - infile.close(); >>> - exit(1); >>> - } >>> } >>> + else >>> + { >>> + std::string all_numbers; >>> + all_numbers.reserve(nsnps * ngpreds * 7); >>> + std::getline(infile, all_numbers); >>> + mldose_line_to_matrix(k, all_numbers.c_str(), nsnps >>> * ngpreds); >>> + } >>> k++; >>> } >>> else >>> >>> Modified: 
branches/ProbABEL-0.50/src/gendata.h >>> =================================================================== >>> --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC >>> (rev 1663) >>> +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC >>> (rev 1664) >>> @@ -44,7 +44,7 @@ >>> unsigned int nids; >>> unsigned int ngpreds; >>> gendata(); >>> - double convert( char* source, char** endPtr ); >>> + void mldose_line_to_matrix(int k,const char *all_numbers,int >>> amount_of_numbers); >>> >>> void re_gendata(char * fname, unsigned int insnps, unsigned int >>> ingpreds, >>> unsigned int npeople, unsigned int nmeasured, >>> >>> _______________________________________________ >>> Genabel-commits mailing list >>> Genabel-commits at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>> >> _______________________________________________ >> Genabel-commits mailing list >> Genabel-commits at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> > > _______________________________________________ > Genabel-commits mailing list > Genabel-commits at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- _______________________________________________ genabel-devel mailing list genabel-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From lennart at karssen.org Mon Mar 31 23:28:56 2014 From: lennart at karssen.org (L.C. 
Karssen) Date: Mon, 31 Mar 2014 23:28:56 +0200 Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src In-Reply-To: References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com> <5335F579.9070608@karssen.org> Message-ID: <5339DE18.8050304@karssen.org> Hi Yurii, On 31-03-14 20:46, Yury Aulchenko wrote: > I personally find the fact that text outperforms binary disappointing Me too... One way out of the dilemma could be if we truncate the output to e.g. 4 significant digits. I don't think the output of mach/minimac will be more precise. I gave it a try, but it seems that the LRT-based chi^2 values don't play along nicely: they differ at the 1e-3 or 1e-4 level in the example data. My hunch is that a careful look at how we calculate the LRT may solve this, as subtraction of two values is a likely candidate for loss of precision. > (and, if you forget about technical details - well, strange). On the > other hand this is probably good for user as it eradicates the need > to do conversion. Especially if we could work with compressed files. That's my hope as well. Of course I haven't tested the impact of reading zipped files yet. Maybe it'll put us back in square one. > Especially if we build interface to work with other type of text > outputs (e.g. IMPUTE2 would be a candidate)... Indeed. Lennart. > > Yurii > > ---------------- > Sent from mobile device, please excuse possible typos > >> On 28 Mar 2014, at 23:19, "L.C. Karssen" wrote: >> >> Dear all, >> >> (I guess the previous version of this mail went to the commit email >> list, so here it is again for the devel list). >> >> >> Indeed: an impressive speed-up! Well done Maarten. >> >>> On 28-03-14 20:30, Maarten Kooyman wrote: >>> I tested speed of ProbABEL on a dataset 33815 snp / 3485 people adjusted >>> for sex and age (I did not run it in triplet but gives an idea) >>> >>> version 0.42 0.50_branch >>> FV 58 52 >>> mldose 48 12 >>> all times ate in seconds. 
>>> >>> As you can see the filevector format in the part that slows down the >>> program. When profiling the reading from FV takes up 86% of all the time >>> the program takes. >> >> >> The current problem with reading from filevector is that the fv dat ais >> stored in floats (this is logical as it means half the disk space usage >> compared to storing doubles, moreover, the imputed data is never more >> precise than a float anyway). >> However, internally ProbABEL uses doubles for calculations. This means >> conversion from float to double must occur at some point. >> >> Simply casting to double gives impression. For example casting a float >> 0.677 to double gives: 0.67699998617172241 >> Therefore, with version 0.4.0 I changed this and used a string as >> intermediate form, followed by strtod(). First I used stringstreams, but >> these turn out to be much too slow for our use case. Now snprintf() is >> used. For the above example the double value is: 0.67700000000000005, >> much closer to what we would like to see. Using this two-step conversion >> means the output when using fv is equal to the output using txt data >> (and equal to using R), within float precision. >> >> Using Maarten's 'strtod' will speed up this part as well, but the >> snprintf() call is still expensive. >> >> Apart from this two-step conversion we may also be inefficient because >> the dosage/probability values are converted one array element at the >> time. Maybe we can gain something there, like Maarten did for the txt >> format and simply sending a whole 'line'/array to the conversion may help. >> >> >> >> >> Given that most people nowadays store their imputation results in chunks >> of chromosomes anyway (i.e. small(er) files), and the fact that I think >> implementing the ability to read gziped files is not difficult, it may >> be time to give mldose.gz files another chance for ProbABEL users. It >> will save them the conversion from mldose.gz to DatABEL. 
>> Of course we can still support DatABEL files, but (depending on how fast >> reading from gzipped files is), our recommendation could change with the >> upcoming ProbABEL v0.5.0. >> >> Any thoughts on this? >> >> >> Best, >> >> Lennart. >> >> >> >> >> >>>> On 28-03-14 20:15, Yury Aulchenko wrote: >>>> 10 fold is good speed up. An order of magnitude :) >>>> >>>> Wonder how it compares now to the reading from plain text files? >>>> >>>> Y >>>> >>>> ---------------- >>>> Sent from mobile device, please excuse possible typos >>>> >>>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote: >>>>> >>>>> Author: maartenk >>>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014) >>>>> New Revision: 1664 >>>>> >>>>> Modified: >>>>> branches/ProbABEL-0.50/src/gendata.cpp >>>>> branches/ProbABEL-0.50/src/gendata.h >>>>> Log: >>>>> new implementation of reading in numbers of mldose file: this version >>>>> is about a 10(!) fold faster than in ProABEL 0.42 >>>>> >>>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp >>>>> =================================================================== >>>>> --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC >>>>> (rev 1663) >>>>> +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC >>>>> (rev 1664) >>>>> @@ -40,58 +40,69 @@ >>>>> #endif >>>>> #include "utilities.h" >>>>> >>>>> -double mldose_strtod(const char *str_pointer) { >>>>> - // This function is inspired on some answers found at >>>>> stackoverflow : >>>>> - // eg question 5678932 >>>>> - int sign = 0; >>>>> - double result = 0; >>>>> - //check if not a null pointer or NaN (right now checks only >>>>> first character) >>>>> -//TODO: make catching of NaN more rigid >>>>> - if (!*str_pointer | *str_pointer == 'N'){ >>>>> - return std::numeric_limits::quiet_NaN(); >>>>> + >>>>> +void gendata::mldose_line_to_matrix(int k,const char >>>>> *all_numbers,int amount_of_numbers){ >>>>> + int j = 0; >>>>> + //check if not a null pointer >>>>> + if 
(!*all_numbers){ >>>>> + perror("Error while reading genetic data (expected pointer >>>>> to char but found a null pointer)"); >>>>> + exit(EXIT_FAILURE); >>>>> } >>>>> - //skip whitespace >>>>> - while (*str_pointer == ' ') >>>>> + while (j>>>> { >>>>> - str_pointer++; >>>>> - } >>>>> - //set sign to -1 if negative: multiply by sign just before return >>>>> - if (*str_pointer == '-') >>>>> - { >>>>> - str_pointer++; >>>>> - sign = -1; >>>>> - } >>>>> - //read digits before dot >>>>> - while (*str_pointer <= '9' && *str_pointer >= '0'){ >>>>> - result = result * 10 + (*str_pointer++ - '0'); >>>>> - } >>>>> - //read digit after dot >>>>> - if (*str_pointer == '.') >>>>> - { >>>>> - double decimal_counter = 1.0; >>>>> - str_pointer++; >>>>> - while (*str_pointer <= '9' && *str_pointer >= '0') >>>>> + double result = 0; >>>>> + //skip whitespace >>>>> + while (*all_numbers == ' ') >>>>> { >>>>> - decimal_counter *= 0.1; >>>>> - result += (*str_pointer++ - '0') * decimal_counter; >>>>> + all_numbers++; >>>>> } >>>>> + //check NaN (right now checks only first character) >>>>> + //TODO: make catching of NaN more rigid >>>>> + if (*all_numbers == 'N') >>>>> + { >>>>> + result = std::numeric_limits::quiet_NaN(); >>>>> + //skip other characters of NaN >>>>> + while ((*all_numbers == 'a') | (*all_numbers == 'N')) >>>>> + { >>>>> + all_numbers++; >>>>> + } >>>>> + } >>>>> + else >>>>> + { >>>>> + int sign = 0; >>>>> + //set sign to -1 if negative: multiply by sign just >>>>> before return >>>>> + if (*all_numbers == '-') >>>>> + { >>>>> + all_numbers++; >>>>> + sign = -1; >>>>> + } >>>>> + //read digits before dot >>>>> + while (*all_numbers <= '9' && *all_numbers >= '0') >>>>> + { >>>>> + result = result * 10 + (*all_numbers++ - '0'); >>>>> + } >>>>> + //read digit after dot >>>>> + if (*all_numbers == '.') >>>>> + { >>>>> + double decimal_counter = 1.0; >>>>> + all_numbers++; >>>>> + while (*all_numbers <= '9' && *all_numbers >= '0') >>>>> + { >>>>> + decimal_counter *= 
0.1; >>>>> + result += (*all_numbers++ - '0') * decimal_counter; >>>>> + } >>>>> + } >>>>> + //correct for negative number >>>>> + if (sign == -1) >>>>> + { >>>>> + result = sign * result; >>>>> + } >>>>> + } >>>>> + G.put(result, k, j); >>>>> + j++; >>>>> } >>>>> - //str_pointer should be null since all characters are read. >>>>> - if (*str_pointer){ >>>>> - perror("Error while reading genetic data (mldose_strtod)"); >>>>> - exit(EXIT_FAILURE); >>>>> - } >>>>> - //correct for negative number >>>>> - if (sign == -1){ >>>>> - return sign * result; >>>>> - }else{ >>>>> - return result; >>>>> - } >>>>> - >>>>> } >>>>> >>>>> - >>>>> - >>>>> void gendata::get_var(int var, double * data) >>>>> { >>>>> // Read the genetic data for SNP 'var' and store in the array >>>>> 'data' >>>>> @@ -246,7 +257,7 @@ >>>>> size_t strpos = tmpstr.find("->"); >>>>> if (strpos != string::npos) >>>>> { >>>>> - tmpid = tmpstr.substr(strpos+2, string::npos); >>>>> + tmpid = tmpstr.substr(strpos + 2, string::npos); >>>>> } >>>>> else >>>>> { >>>>> @@ -255,8 +266,8 @@ >>>>> if (tmpid != idnames[k]) >>>>> { >>>>> cerr << "phenotype file and dose or probability >>>>> file " >>>>> - << "did not match at line " <>>>> (" << tmpid >>>>> - << " != " << idnames[k] << ")" << endl; >>>>> + << "did not match at line " <>>>> " (" >>>>> + << tmpid << " != " << idnames[k] << ")" >>>>> << endl; >>>>> infile.close(); >>>>> exit(1); >>>>> } >>>>> @@ -267,47 +278,58 @@ >>>>> infile >> tmpstr; >>>>> } >>>>> >>>>> - for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>>>> + int oldstyle = 0; >>>>> + if (oldstyle == 1) >>>>> { >>>>> - if (infile.good()) >>>>> + for (unsigned int j = 0; j < (nsnps * ngpreds); j++) >>>>> { >>>>> - infile >> inStr; >>>>> - // tmpstr contains the dosage/probability in >>>>> - // string form. Convert it to double (if tmpstr is >>>>> - // NaN it will be set to nan). 
>>>>> - double dosage; >>>>> - char *endptr; >>>>> - errno = 0; // To distinguish success/failure >>>>> - // after strtod() >>>>> + if (infile.good()) >>>>> + { >>>>> + infile >> inStr; >>>>> + // tmpstr contains the dosage/probability in >>>>> + // string form. Convert it to double (if >>>>> tmpstr is >>>>> + // NaN it will be set to nan). >>>>> + double dosage; >>>>> + char *endptr; >>>>> + errno = 0; // To distinguish >>>>> success/failure >>>>> + // after strtod() >>>>> >>>>> - dosage = mldose_strtod(inStr); >>>>> - //dosage = strtod(tmpstr.c_str(), &endptr); >>>>> -// if ((errno == ERANGE && >>>>> -// (dosage == HUGE_VALF || dosage == >>>>> HUGE_VALL)) >>>>> -// || (errno != 0 && dosage == 0)) { >>>>> -// perror("Error while reading genetic data >>>>> (strtod)"); >>>>> -// exit(EXIT_FAILURE); >>>>> -// } >>>>> + dosage = strtod(inStr, &endptr); >>>>> + if ((errno == ERANGE >>>>> + && (dosage == HUGE_VALF || dosage == >>>>> HUGE_VALL)) >>>>> + || (errno != 0 && dosage == 0)) >>>>> + { >>>>> + perror("Error while reading genetic data >>>>> (strtod)"); >>>>> + exit(EXIT_FAILURE); >>>>> + } >>>>> >>>>> - if (endptr == tmpstr.c_str()) { >>>>> - cerr << "No digits were found while reading >>>>> genetic data" >>>>> - << " (individual " <>>>> - << ", position " << j + 1 << ")" >>>>> - << endl; >>>>> - exit(EXIT_FAILURE); >>>>> + if (endptr == tmpstr.c_str()) >>>>> + { >>>>> + cerr >>>>> + << "No digits were found while >>>>> reading genetic data" >>>>> + << " (individual " <>>>> ", position " >>>>> + << j + 1 << ")" << endl; >>>>> + exit(EXIT_FAILURE); >>>>> + } >>>>> + /* If we got here, strtod() successfully >>>>> parsed a number */ >>>>> + G.put(dosage, k, j); >>>>> } >>>>> - >>>>> - /* If we got here, strtod() successfully parsed >>>>> a number */ >>>>> - G.put(dosage, k, j); >>>>> + else >>>>> + { >>>>> + std::cerr << "cannot read dose-file: " << fname >>>>> + << "check skipd and ngpreds >>>>> parameters\n"; >>>>> + infile.close(); >>>>> + exit(1); >>>>> + } 
>>>>> } >>>>> - else >>>>> - { >>>>> - std::cerr << "cannot read dose-file: " << fname >>>>> - << "check skipd and ngpreds >>>>> parameters\n"; >>>>> - infile.close(); >>>>> - exit(1); >>>>> - } >>>>> } >>>>> + else >>>>> + { >>>>> + std::string all_numbers; >>>>> + all_numbers.reserve(nsnps * ngpreds * 7); >>>>> + std::getline(infile, all_numbers); >>>>> + mldose_line_to_matrix(k, all_numbers.c_str(), nsnps >>>>> * ngpreds); >>>>> + } >>>>> k++; >>>>> } >>>>> else >>>>> >>>>> Modified: branches/ProbABEL-0.50/src/gendata.h >>>>> =================================================================== >>>>> --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC >>>>> (rev 1663) >>>>> +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC >>>>> (rev 1664) >>>>> @@ -44,7 +44,7 @@ >>>>> unsigned int nids; >>>>> unsigned int ngpreds; >>>>> gendata(); >>>>> - double convert( char* source, char** endPtr ); >>>>> + void mldose_line_to_matrix(int k,const char *all_numbers,int >>>>> amount_of_numbers); >>>>> >>>>> void re_gendata(char * fname, unsigned int insnps, unsigned int >>>>> ingpreds, >>>>> unsigned int npeople, unsigned int nmeasured, >>>>> >>>>> _______________________________________________ >>>>> Genabel-commits mailing list >>>>> Genabel-commits at lists.r-forge.r-project.org >>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>>> _______________________________________________ >>>> Genabel-commits mailing list >>>> Genabel-commits at lists.r-forge.r-project.org >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >>> >>> _______________________________________________ >>> Genabel-commits mailing list >>> Genabel-commits at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits >> >> -- >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >> L.C. 
Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From kooyman at gmail.com Mon Mar 31 23:42:48 2014
From: kooyman at gmail.com (Maarten Kooyman)
Date: Mon, 31 Mar 2014 23:42:48 +0200
Subject: [GenABEL-dev] [Genabel-commits] r1664 - branches/ProbABEL-0.50/src
In-Reply-To:
References: <20140328191241.F38E6185FBC@r-forge.r-project.org> <5335CDCF.8090503@gmail.com> <5335F579.9070608@karssen.org>
Message-ID: <5339E158.2040106@gmail.com>

Dear All,

It might be useful to make a next-generation DatABEL with an interface for IMPUTE2/SHAPEIT and mach/minimac. Having one library/package to read the data would help the usability of all projects. I am not keen on converting my 1kg imputations into another format. Nobody (from a user's perspective) feels like saving the same hundreds of GB of data in multiple formats. (And that is a practical reason for choosing a program to work with, which might not be the same as the best program.)

Centralizing these functions would also benefit method developers: they would not have to bother with writing another parser. Creating a reliable, fast and multi-format parser is boilerplate code, and this is the kind of code you do not want to bother with if you have a new powerful methodology in mind.
That is why lots of scientific software is picky about input format.

There are of course some problems caused by the nature of the data formats, e.g. [1].

Kind regards,

Maarten

[1] One problem is that there is a different number of predictors in those formats. It varies between 1 and 3, where in case of IMPUTE2/SHAPEIT the probabilities do not sum to one. mach/minimac might be converted to 3 predictors since the probabilities should add up to one.

On 31-03-14 20:46, Yury Aulchenko wrote:
> I personally find the fact that text outperforms binary disappointing (and, if you forget about technical details - well, strange). On the other hand this is probably good for user as it eradicates the need to do conversion. Especially if we could work with compressed files. Especially if we build interface to work with other type of text outputs (e.g. IMPUTE2 would be a candidate)...
>
> Yurii
>
> ----------------
> Sent from mobile device, please excuse possible typos
>
>> On 28 Mar 2014, at 23:19, "L.C. Karssen" wrote:
>>
>> Dear all,
>>
>> (I guess the previous version of this mail went to the commit email
>> list, so here it is again for the devel list).
>>
>>
>> Indeed: an impressive speed-up! Well done Maarten.
>>
>>> On 28-03-14 20:30, Maarten Kooyman wrote:
>>> I tested the speed of ProbABEL on a dataset of 33815 SNPs / 3485 people, adjusted
>>> for sex and age (I did not run it in triplicate, but it gives an idea)
>>>
>>> version   0.42   0.50_branch
>>> FV         58     52
>>> mldose     48     12
>>> all times are in seconds.
>>>
>>> As you can see, the filevector format is the part that slows down the
>>> program. When profiling, the reading from FV takes up 86% of all the time
>>> the program takes.
>>
>> The current problem with reading from filevector is that the fv data is
>> stored in floats (this is logical as it means half the disk space usage
>> compared to storing doubles; moreover, the imputed data is never more
>> precise than a float anyway).
>> However, internally ProbABEL uses doubles for calculations.
This means
>> conversion from float to double must occur at some point.
>>
>> Simply casting to double gives imprecision. For example casting a float
>> 0.677 to double gives: 0.67699998617172241
>> Therefore, with version 0.4.0 I changed this and used a string as
>> intermediate form, followed by strtod(). First I used stringstreams, but
>> these turn out to be much too slow for our use case. Now snprintf() is
>> used. For the above example the double value is: 0.67700000000000005,
>> much closer to what we would like to see. Using this two-step conversion
>> means the output when using fv is equal to the output using txt data
>> (and equal to using R), within float precision.
>>
>> Using Maarten's 'strtod' will speed up this part as well, but the
>> snprintf() call is still expensive.
>>
>> Apart from this two-step conversion we may also be inefficient because
>> the dosage/probability values are converted one array element at a
>> time. Maybe we can gain something there, like Maarten did for the txt
>> format and simply sending a whole 'line'/array to the conversion may help.
>>
>>
>> Given that most people nowadays store their imputation results in chunks
>> of chromosomes anyway (i.e. small(er) files), and the fact that I think
>> implementing the ability to read gzipped files is not difficult, it may
>> be time to give mldose.gz files another chance for ProbABEL users. It
>> will save them the conversion from mldose.gz to DatABEL.
>> Of course we can still support DatABEL files, but (depending on how fast
>> reading from gzipped files is), our recommendation could change with the
>> upcoming ProbABEL v0.5.0.
>>
>> Any thoughts on this?
>>
>>
>> Best,
>>
>> Lennart.
>>
>>
>>>> On 28-03-14 20:15, Yury Aulchenko wrote:
>>>> 10 fold is good speed up. An order of magnitude :)
>>>>
>>>> Wonder how it compares now to the reading from plain text files?
>>>>
>>>> Y
>>>>
>>>> ----------------
>>>> Sent from mobile device, please excuse possible typos
>>>>
>>>>> On 28 Mar 2014, at 20:12, noreply at r-forge.r-project.org wrote:
>>>>>
>>>>> Author: maartenk
>>>>> Date: 2014-03-28 20:12:41 +0100 (Fri, 28 Mar 2014)
>>>>> New Revision: 1664
>>>>>
>>>>> Modified:
>>>>>   branches/ProbABEL-0.50/src/gendata.cpp
>>>>>   branches/ProbABEL-0.50/src/gendata.h
>>>>> Log:
>>>>> new implementation of reading in numbers of mldose file: this version
>>>>> is about a 10(!) fold faster than in ProABEL 0.42
>>>>>
>>>>> Modified: branches/ProbABEL-0.50/src/gendata.cpp
>>>>> ===================================================================
>>>>> --- branches/ProbABEL-0.50/src/gendata.cpp 2014-03-27 21:16:16 UTC (rev 1663)
>>>>> +++ branches/ProbABEL-0.50/src/gendata.cpp 2014-03-28 19:12:41 UTC (rev 1664)
>>>>> @@ -40,58 +40,69 @@
>>>>>  #endif
>>>>>  #include "utilities.h"
>>>>>
>>>>> -double mldose_strtod(const char *str_pointer) {
>>>>> -    // This function is inspired on some answers found at stackoverflow :
>>>>> -    // eg question 5678932
>>>>> -    int sign = 0;
>>>>> -    double result = 0;
>>>>> -    //check if not a null pointer or NaN (right now checks only first character)
>>>>> -//TODO: make catching of NaN more rigid
>>>>> -    if (!*str_pointer | *str_pointer == 'N'){
>>>>> -        return std::numeric_limits<double>::quiet_NaN();
>>>>> +
>>>>> +void gendata::mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers){
>>>>> +    int j = 0;
>>>>> +    //check if not a null pointer
>>>>> +    if (!*all_numbers){
>>>>> +        perror("Error while reading genetic data (expected pointer to char but found a null pointer)");
>>>>> +        exit(EXIT_FAILURE);
>>>>>      }
>>>>> -    //skip whitespace
>>>>> -    while (*str_pointer == ' ')
>>>>> +    while (j<amount_of_numbers)
>>>>>      {
>>>>> -        str_pointer++;
>>>>> -    }
>>>>> -    //set sign to -1 if negative: multiply by sign just before return
>>>>> -    if (*str_pointer == '-')
>>>>> -    {
>>>>> -        str_pointer++;
>>>>> -        sign =
-1;
>>>>> -    }
>>>>> -    //read digits before dot
>>>>> -    while (*str_pointer <= '9' && *str_pointer >= '0'){
>>>>> -        result = result * 10 + (*str_pointer++ - '0');
>>>>> -    }
>>>>> -    //read digit after dot
>>>>> -    if (*str_pointer == '.')
>>>>> -    {
>>>>> -        double decimal_counter = 1.0;
>>>>> -        str_pointer++;
>>>>> -        while (*str_pointer <= '9' && *str_pointer >= '0')
>>>>> +        double result = 0;
>>>>> +        //skip whitespace
>>>>> +        while (*all_numbers == ' ')
>>>>>          {
>>>>> -            decimal_counter *= 0.1;
>>>>> -            result += (*str_pointer++ - '0') * decimal_counter;
>>>>> +            all_numbers++;
>>>>>          }
>>>>> +        //check NaN (right now checks only first character)
>>>>> +        //TODO: make catching of NaN more rigid
>>>>> +        if (*all_numbers == 'N')
>>>>> +        {
>>>>> +            result = std::numeric_limits<double>::quiet_NaN();
>>>>> +            //skip other characters of NaN
>>>>> +            while ((*all_numbers == 'a') | (*all_numbers == 'N'))
>>>>> +            {
>>>>> +                all_numbers++;
>>>>> +            }
>>>>> +        }
>>>>> +        else
>>>>> +        {
>>>>> +            int sign = 0;
>>>>> +            //set sign to -1 if negative: multiply by sign just before return
>>>>> +            if (*all_numbers == '-')
>>>>> +            {
>>>>> +                all_numbers++;
>>>>> +                sign = -1;
>>>>> +            }
>>>>> +            //read digits before dot
>>>>> +            while (*all_numbers <= '9' && *all_numbers >= '0')
>>>>> +            {
>>>>> +                result = result * 10 + (*all_numbers++ - '0');
>>>>> +            }
>>>>> +            //read digit after dot
>>>>> +            if (*all_numbers == '.')
>>>>> +            {
>>>>> +                double decimal_counter = 1.0;
>>>>> +                all_numbers++;
>>>>> +                while (*all_numbers <= '9' && *all_numbers >= '0')
>>>>> +                {
>>>>> +                    decimal_counter *= 0.1;
>>>>> +                    result += (*all_numbers++ - '0') * decimal_counter;
>>>>> +                }
>>>>> +            }
>>>>> +            //correct for negative number
>>>>> +            if (sign == -1)
>>>>> +            {
>>>>> +                result = sign * result;
>>>>> +            }
>>>>> +        }
>>>>> +        G.put(result, k, j);
>>>>> +        j++;
>>>>>      }
>>>>> -    //str_pointer should be null since all characters are read.
>>>>> -    if (*str_pointer){
>>>>> -        perror("Error while reading genetic data (mldose_strtod)");
>>>>> -        exit(EXIT_FAILURE);
>>>>> -    }
>>>>> -    //correct for negative number
>>>>> -    if (sign == -1){
>>>>> -        return sign * result;
>>>>> -    }else{
>>>>> -        return result;
>>>>> -    }
>>>>> -
>>>>>  }
>>>>>
>>>>> -
>>>>> -
>>>>>  void gendata::get_var(int var, double * data)
>>>>>  {
>>>>>      // Read the genetic data for SNP 'var' and store in the array 'data'
>>>>> @@ -246,7 +257,7 @@
>>>>>                  size_t strpos = tmpstr.find("->");
>>>>>                  if (strpos != string::npos)
>>>>>                  {
>>>>> -                    tmpid = tmpstr.substr(strpos+2, string::npos);
>>>>> +                    tmpid = tmpstr.substr(strpos + 2, string::npos);
>>>>>                  }
>>>>>                  else
>>>>>                  {
>>>>> @@ -255,8 +266,8 @@
>>>>>                  if (tmpid != idnames[k])
>>>>>                  {
>>>>>                      cerr << "phenotype file and dose or probability file "
>>>>> -                        << "did not match at line " <>>>> (" << tmpid
>>>>> -                        << " != " << idnames[k] << ")" << endl;
>>>>> +                            << "did not match at line " <>>>> " ("
>>>>> +                            << tmpid << " != " << idnames[k] << ")" << endl;
>>>>>                      infile.close();
>>>>>                      exit(1);
>>>>>                  }
>>>>> @@ -267,47 +278,58 @@
>>>>>                  infile >> tmpstr;
>>>>>              }
>>>>>
>>>>> -            for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>>>> +            int oldstyle = 0;
>>>>> +            if (oldstyle == 1)
>>>>>              {
>>>>> -                if (infile.good())
>>>>> +                for (unsigned int j = 0; j < (nsnps * ngpreds); j++)
>>>>>                  {
>>>>> -                    infile >> inStr;
>>>>> -                    // tmpstr contains the dosage/probability in
>>>>> -                    // string form. Convert it to double (if tmpstr is
>>>>> -                    // NaN it will be set to nan).
>>>>> -                    double dosage;
>>>>> -                    char *endptr;
>>>>> -                    errno = 0;      // To distinguish success/failure
>>>>> -                                    // after strtod()
>>>>> +                    if (infile.good())
>>>>> +                    {
>>>>> +                        infile >> inStr;
>>>>> +                        // tmpstr contains the dosage/probability in
>>>>> +                        // string form. Convert it to double (if tmpstr is
>>>>> +                        // NaN it will be set to nan).
>>>>> +                        double dosage;
>>>>> +                        char *endptr;
>>>>> +                        errno = 0;      // To distinguish success/failure
>>>>> +                                        // after strtod()
>>>>>
>>>>> -                    dosage = mldose_strtod(inStr);
>>>>> -                    //dosage = strtod(tmpstr.c_str(), &endptr);
>>>>> -//                    if ((errno == ERANGE &&
>>>>> -//                         (dosage == HUGE_VALF || dosage == HUGE_VALL))
>>>>> -//                        || (errno != 0 && dosage == 0)) {
>>>>> -//                        perror("Error while reading genetic data (strtod)");
>>>>> -//                        exit(EXIT_FAILURE);
>>>>> -//                    }
>>>>> +                        dosage = strtod(inStr, &endptr);
>>>>> +                        if ((errno == ERANGE
>>>>> +                                && (dosage == HUGE_VALF || dosage == HUGE_VALL))
>>>>> +                                || (errno != 0 && dosage == 0))
>>>>> +                        {
>>>>> +                            perror("Error while reading genetic data (strtod)");
>>>>> +                            exit(EXIT_FAILURE);
>>>>> +                        }
>>>>>
>>>>> -                    if (endptr == tmpstr.c_str()) {
>>>>> -                        cerr << "No digits were found while reading genetic data"
>>>>> -                             << " (individual " <>>>> - << ", position " << j + 1 << ")"
>>>>> -                             << endl;
>>>>> -                        exit(EXIT_FAILURE);
>>>>> +                        if (endptr == tmpstr.c_str())
>>>>> +                        {
>>>>> +                            cerr
>>>>> +                                    << "No digits were found while reading genetic data"
>>>>> +                                    << " (individual " <>>>> ", position"
>>>>> +                                    << j + 1 << ")" << endl;
>>>>> +                            exit(EXIT_FAILURE);
>>>>> +                        }
>>>>> +                        /* If we got here, strtod() successfully parsed a number */
>>>>> +                        G.put(dosage, k, j);
>>>>>                      }
>>>>> -
>>>>> -                    /* If we got here, strtod() successfully parsed a number */
>>>>> -                    G.put(dosage, k, j);
>>>>> +                    else
>>>>> +                    {
>>>>> +                        std::cerr << "cannot read dose-file: " << fname
>>>>> +                                << "check skipd and ngpreds parameters\n";
>>>>> +                        infile.close();
>>>>> +                        exit(1);
>>>>> +                    }
>>>>>                  }
>>>>> -                else
>>>>> -                {
>>>>> -                    std::cerr << "cannot read dose-file: " << fname
>>>>> -                              << "check skipd and ngpreds parameters\n";
>>>>> -                    infile.close();
>>>>> -                    exit(1);
>>>>> -                }
>>>>>              }
>>>>> +            else
>>>>> +            {
>>>>> +                std::string all_numbers;
>>>>> +                all_numbers.reserve(nsnps * ngpreds * 7);
>>>>> +                std::getline(infile,
all_numbers);
>>>>> +                mldose_line_to_matrix(k, all_numbers.c_str(), nsnps * ngpreds);
>>>>> +            }
>>>>>              k++;
>>>>>          }
>>>>>          else
>>>>>
>>>>> Modified: branches/ProbABEL-0.50/src/gendata.h
>>>>> ===================================================================
>>>>> --- branches/ProbABEL-0.50/src/gendata.h 2014-03-27 21:16:16 UTC (rev 1663)
>>>>> +++ branches/ProbABEL-0.50/src/gendata.h 2014-03-28 19:12:41 UTC (rev 1664)
>>>>> @@ -44,7 +44,7 @@
>>>>>      unsigned int nids;
>>>>>      unsigned int ngpreds;
>>>>>      gendata();
>>>>> -    double convert( char* source, char** endPtr );
>>>>> +    void mldose_line_to_matrix(int k,const char *all_numbers,int amount_of_numbers);
>>>>>
>>>>>      void re_gendata(char * fname, unsigned int insnps, unsigned int ingpreds,
>>>>>              unsigned int npeople, unsigned int nmeasured,
>>>>>
>>>>> _______________________________________________
>>>>> Genabel-commits mailing list
>>>>> Genabel-commits at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>>> _______________________________________________
>>>> Genabel-commits mailing list
>>>> Genabel-commits at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>>> _______________________________________________
>>> Genabel-commits mailing list
>>> Genabel-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C.
Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
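
The float-to-double conversion discussed in the quoted thread (a plain cast versus going through a string with snprintf() and strtod()) can be sketched roughly as follows. This is only an illustration, not the code in the ProbABEL sources: the function names, the buffer size and the choice of 6 significant digits are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not ProbABEL's actual code): widen a float to a
// double by way of its decimal text form instead of a plain cast.
double float_to_double_via_text(float value)
{
    char buffer[32];
    // Assumption: 6 significant digits are enough for imputed
    // dosages/probabilities that were stored as floats.
    std::snprintf(buffer, sizeof(buffer), "%.6g", value);
    return std::strtod(buffer, nullptr);
}

// Self-check that mirrors the 0.677 example from the thread.
void check_conversion_example()
{
    const float stored = 0.677f;
    // A plain cast carries the float's rounding error into the double.
    assert(static_cast<double>(stored) != 0.677);
    // Going through text yields the double closest to decimal 0.677.
    assert(float_to_double_via_text(stored) == 0.677);
}
```

As noted in the thread, the snprintf() step is the expensive part, so a conversion like this trades speed for output that agrees with what is read from text files.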