From yurii.aulchenko at gmail.com Mon Nov 4 15:59:23 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 4 Nov 2013 15:59:23 +0100 Subject: [GenABEL-dev] [genabel-Bugs][4919] Too small reading buffers for long alleles in mach info and legend files In-Reply-To: <20131104101329.D0203186288@r-forge.r-project.org> References: <20131104101329.D0203186288@r-forge.r-project.org> Message-ID: <564D74E9-7B75-4976-B3A9-E2A545DB6D42@gmail.com> It is really great to see three people contributing to the fix of this bug! - bravi, Daniel, Xia, Lennart! Yurii On Nov 4, 2013, at 11:13 AM, wrote: > Bugs item #4919, was changed at 2013-09-19 12:41 by Lennart Karssen > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=4919&group_id=505 > >> Status: Closed > Priority: 3 > Submitted By: Daniel Taliun (dtaliun) > Assigned to: Lennart Karssen (lckarssen) > Summary: Too small reading buffers for long alleles in mach info and legend files >> Resolution: Fixed > Operating System: All > Severity: trivial > Hardware: All > Version: PA v0.4.1 > Component: ProbABEL > URL: > > > Initial Comment: > 1) Segmentation fault when trying to read mach info file that contains alleles longer than 100 characters. > The issue is too small buffer allocated in mlinfo::mlinfo(char * filename, char * mapname) : only 100 characters (data.cpp, line 72) > > 2) The same problem is when reading map file (if --map option is specified). The buffer of 1000 characters is not enough and as a result position is always equal to -999 (data.cpp, line 139). > > Generally insertion/deletion alleles can be extremely long. But buffer of 1MB (1048576 characters) should be enough and cost nothing. > > ---------------------------------------------------------------------- > >> Comment By: Lennart Karssen (lckarssen) > Date: 2013-11-04 11:13 > > Message: > Fixed in SVN r.1360 with the patch provided by Daniel Taliun (patch #4936). > > > Thanks Daniel! 
> Thanks to Xia Shen for testing the fix. > > ---------------------------------------------------------------------- > > Comment By: Lennart Karssen (lckarssen) > Date: 2013-09-19 13:04 > > Message: > Thanks for the report Daniel, and for the suggested fix. > > ---------------------------------------------------------------------- > > You can respond by visiting: > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=4919&group_id=505 From yurii.aulchenko at gmail.com Wed Nov 6 11:35:53 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 6 Nov 2013 11:35:53 +0100 Subject: [GenABEL-dev] Fwd: CRAN package VariABEL In-Reply-To: <527A11CE.9040506@stats.ox.ac.uk> References: <527A11CE.9040506@stats.ox.ac.uk> Message-ID: FYI Maxim, let us know if you are going to maintain it Y ---------- Forwarded message ---------- From: Prof Brian Ripley Date: Wed, Nov 6, 2013 at 10:54 AM Subject: CRAN package VariABEL To: Yurii Aulchenko , "CRAN at R-project.org" < CRAN at r-project.org> The message to Maintainer: Maksim Struchalin bounced, and apparently he left there in May. We will need to remove the package from CRAN unless we have a valid maintainer: so if you know where he is, please ask him urgently to update the package with a new contact address. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yurii.aulchenko at gmail.com Wed Nov 6 11:36:27 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 6 Nov 2013 11:36:27 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package GenABEL In-Reply-To: <20131106092037.2E13F20ABB@gannet.stats.ox.ac.uk> References: <20131106092037.2E13F20ABB@gannet.stats.ox.ac.uk> Message-ID: FYI ---------- Forwarded message ---------- From: Prof Brian Ripley Date: Wed, Nov 6, 2013 at 10:20 AM Subject: C++ which is not valid C++11 in CRAN package GenABEL To: yurii at bionet.nsc.ru We now have compilers (gcc 4.8.2, clang with libcxx headers) with fairly complete C++11 support (which can be selected by -std=c++11). At least one of these is showing compilation errors on your package, which were in many cases warnings under earlier versions of g++ (provided -Wall was used: see 'Writing R Extensions'). People are pressing that this become the default where supported. You can see the compilation logs at e.g. http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html and http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . Please submit an update to CRAN (following the CRAN policies) at http://cran.r-project.org/web/packages/policies.html. Do NOT reply to this email to submit an update! There may be other issues that need fixing: see the CRAN check logs at http://cran.r-project.org/web/checks/check_results_NAME.html, replacing NAME by the name of your package. -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yurii.aulchenko at gmail.com Wed Nov 6 11:37:19 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 6 Nov 2013 11:37:19 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package DatABEL In-Reply-To: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> References: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> Message-ID: FYI I expect this is the same code (filevector?) as that in previous FYI causing complains ---------- Forwarded message ---------- From: Prof Brian Ripley Date: Wed, Nov 6, 2013 at 10:20 AM Subject: C++ which is not valid C++11 in CRAN package DatABEL To: yurii at bionet.nsc.ru We now have compilers (gcc 4.8.2, clang with libcxx headers) with fairly complete C++11 support (which can be selected by -std=c++11). At least one of these is showing compilation errors on your package, which were in many cases warnings under earlier versions of g++ (provided -Wall was used: see 'Writing R Extensions'). People are pressing that this become the default where supported. You can see the compilation logs at e.g. http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html and http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . Please submit an update to CRAN (following the CRAN policies) at http://cran.r-project.org/web/packages/policies.html. Do NOT reply to this email to submit an update! There may be other issues that need fixing: see the CRAN check logs at http://cran.r-project.org/web/checks/check_results_NAME.html, replacing NAME by the name of your package. -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Wed Nov 6 14:22:12 2013 From: lennart at karssen.org (L.C. 
Karssen) Date: Wed, 06 Nov 2013 14:22:12 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package DatABEL In-Reply-To: References: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> Message-ID: <527A4284.7090002@karssen.org> I added -Wall -std=c++11 to the Makefile in pkg/filevector/. This resulted in an error (about an ambiguous isnan() function definition) The fix in SVN r.1361 fixes that error. Hopefully that's enough. If not, let me know. Lennart. On 11/06/2013 11:37 AM, Yurii Aulchenko wrote: > FYI > > I expect this is the same code (filevector?) as that in previous FYI > causing complains > > ---------- Forwarded message ---------- > From: *Prof Brian Ripley* > > Date: Wed, Nov 6, 2013 at 10:20 AM > Subject: C++ which is not valid C++11 in CRAN package DatABEL > To: yurii at bionet.nsc.ru > > > We now have compilers (gcc 4.8.2, clang with libcxx headers) with > fairly complete C++11 support (which can be selected by -std=c++11). > At least one of these is showing compilation errors on your package, > which were in many cases warnings under earlier versions of g++ > (provided -Wall was used: see 'Writing R Extensions'). People are > pressing that this become the default where supported. > > You can see the compilation logs at e.g. > http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html > and > http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . > > > Please submit an update to CRAN (following the CRAN policies) at > http://cran.r-project.org/web/packages/policies.html. Do NOT reply to > this email to submit an update! > > There may be other issues that need fixing: see the CRAN check logs at > http://cran.r-project.org/web/checks/check_results_NAME.html, > replacing NAME by the name of your package. > > > > -- > ----------------------------------------------------- > Yurii S. 
Aulchenko > > [ LinkedIn ] [ Twitter > ] [ Blog > ] > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Wed Nov 6 18:46:19 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Wed, 6 Nov 2013 18:46:19 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package DatABEL In-Reply-To: <527A4284.7090002@karssen.org> References: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> <527A4284.7090002@karssen.org> Message-ID: <-3903376667723708899@unknownmsgid> Thank you very much! - time for genabel resubmission... Dread it a bit as other request was to put all the data in separate package... ---------------------- Yurii Aulchenko (sent from mobile device) > On Nov 6, 2013, at 14:22, "L.C. Karssen" wrote: > > I added -Wall -std=c++11 to the Makefile in pkg/filevector/. This > resulted in an error (about an ambiguous isnan() function definition) > The fix in SVN r.1361 fixes that error. > > Hopefully that's enough. If not, let me know. > > > Lennart. > >> On 11/06/2013 11:37 AM, Yurii Aulchenko wrote: >> FYI >> >> I expect this is the same code (filevector?) 
as that in previous FYI >> causing complains >> >> ---------- Forwarded message ---------- >> From: *Prof Brian Ripley* > > >> Date: Wed, Nov 6, 2013 at 10:20 AM >> Subject: C++ which is not valid C++11 in CRAN package DatABEL >> To: yurii at bionet.nsc.ru >> >> >> We now have compilers (gcc 4.8.2, clang with libcxx headers) with >> fairly complete C++11 support (which can be selected by -std=c++11). >> At least one of these is showing compilation errors on your package, >> which were in many cases warnings under earlier versions of g++ >> (provided -Wall was used: see 'Writing R Extensions'). People are >> pressing that this become the default where supported. >> >> You can see the compilation logs at e.g. >> http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html >> and >> http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . >> >> >> Please submit an update to CRAN (following the CRAN policies) at >> http://cran.r-project.org/web/packages/policies.html. Do NOT reply to >> this email to submit an update! >> >> There may be other issues that need fixing: see the CRAN check logs at >> http://cran.r-project.org/web/checks/check_results_NAME.html, >> replacing NAME by the name of your package. >> >> >> >> -- >> ----------------------------------------------------- >> Yurii S. Aulchenko >> >> [ LinkedIn ] [ Twitter >> ] [ Blog >> ] >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From lennart at karssen.org Wed Nov 6 23:27:11 2013 From: lennart at karssen.org (L.C. Karssen) Date: Wed, 06 Nov 2013 23:27:11 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package DatABEL In-Reply-To: <-3903376667723708899@unknownmsgid> References: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> <527A4284.7090002@karssen.org> <-3903376667723708899@unknownmsgid> Message-ID: <527AC23F.8020103@karssen.org> In the mean time I did a second commit, which fixed the same isnan() error in ITERlib. Now GenABEL compiles without warnings on my system (unless you add -Wall -pedantic to the CFLAGS, but those shouldn't be show stoppers). Hmm... Splitting the data in a separate package. What was the reasoning behind that? OK, it would make for a lighter main package, but for the examples to work, we still need them. And the GenABEL tar.gz is only ~5MB (the actual data directory is ~2.4MB), that should be ok for modern standards, right? The Writing R Extensions manual talks about very large datasets, but mentions 2GB in that respect. We already download the files needed for creating the tutorial separately... Lennart. On 06-11-13 18:46, Yurii Aulchenko wrote: > Thank you very much! - time for genabel resubmission... Dread it a bit > as other request was to put all the data in separate package... > > ---------------------- > Yurii Aulchenko > (sent from mobile device) > >> On Nov 6, 2013, at 14:22, "L.C. Karssen" wrote: >> >> I added -Wall -std=c++11 to the Makefile in pkg/filevector/. 
This >> resulted in an error (about an ambiguous isnan() function definition) >> The fix in SVN r.1361 fixes that error. >> >> Hopefully that's enough. If not, let me know. >> >> >> Lennart. >> >>> On 11/06/2013 11:37 AM, Yurii Aulchenko wrote: >>> FYI >>> >>> I expect this is the same code (filevector?) as that in previous FYI >>> causing complains >>> >>> ---------- Forwarded message ---------- >>> From: *Prof Brian Ripley* >> > >>> Date: Wed, Nov 6, 2013 at 10:20 AM >>> Subject: C++ which is not valid C++11 in CRAN package DatABEL >>> To: yurii at bionet.nsc.ru >>> >>> >>> We now have compilers (gcc 4.8.2, clang with libcxx headers) with >>> fairly complete C++11 support (which can be selected by -std=c++11). >>> At least one of these is showing compilation errors on your package, >>> which were in many cases warnings under earlier versions of g++ >>> (provided -Wall was used: see 'Writing R Extensions'). People are >>> pressing that this become the default where supported. >>> >>> You can see the compilation logs at e.g. >>> http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html >>> and >>> http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . >>> >>> >>> Please submit an update to CRAN (following the CRAN policies) at >>> http://cran.r-project.org/web/packages/policies.html. Do NOT reply to >>> this email to submit an update! >>> >>> There may be other issues that need fixing: see the CRAN check logs at >>> http://cran.r-project.org/web/checks/check_results_NAME.html, >>> replacing NAME by the name of your package. >>> >>> >>> >>> -- >>> ----------------------------------------------------- >>> Yurii S. 
Aulchenko
>>>
>>> [ LinkedIn ] [ Twitter
>>> ] [ Blog
>>> ]
>>>
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C. Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel

--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------

From maksim.struchalin at gmail.com Thu Nov 7 07:51:39 2013
From: maksim.struchalin at gmail.com (Maksim Struchalin)
Date: Thu, 07 Nov 2013 13:51:39 +0700
Subject: [GenABEL-dev] Fwd: CRAN package VariABEL
In-Reply-To:
References: <527A11CE.9040506@stats.ox.ac.uk>
Message-ID: <527B387B.4020203@gmail.com>

Hi Yurii

Thanks for the info. I am going to maintain the package. I fixed a bug
reported previously on VariABEL and submitted the package to CRAN.
best, Maksim On 06/11/2013 17:35, Yurii Aulchenko wrote: > FYI > > Maxim, let us know if you are going to maintain it > > Y > > ---------- Forwarded message ---------- > From: *Prof Brian Ripley* > > Date: Wed, Nov 6, 2013 at 10:54 AM > Subject: CRAN package VariABEL > To: Yurii Aulchenko >, "CRAN at R-project.org" > > > > > The message to > > Maintainer: Maksim Struchalin > > > bounced, and apparently he left there in May. We will need to remove > the package from CRAN unless we have a valid maintainer: so if you > know where he is, please ask him urgently to update the package with a > new contact address. > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > > University of Oxford, Tel: +44 1865 272861 > (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > > -- > ----------------------------------------------------- > Yurii S. Aulchenko > > [ LinkedIn ] [ Twitter > ] [ Blog > ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Thu Nov 7 12:01:59 2013 From: lennart at karssen.org (L.C. Karssen) Date: Thu, 07 Nov 2013 12:01:59 +0100 Subject: [GenABEL-dev] Fwd: CRAN package VariABEL In-Reply-To: <527B387B.4020203@gmail.com> References: <527A11CE.9040506@stats.ox.ac.uk> <527B387B.4020203@gmail.com> Message-ID: <527B7327.202@karssen.org> Hi Maksim, Thanks for fixing bug #4916! Best, Lennart. On 11/07/2013 07:51 AM, Maksim Struchalin wrote: > Hi Yurii > > Thanks for info. I am going to maintain the package. I fixed a bug > reported previously on VariABLE and submited the package to CRAN. 
> > best, > Maksim > > On 06/11/2013 17:35, Yurii Aulchenko wrote: >> FYI >> >> Maxim, let us know if you are going to maintain it >> >> Y >> >> ---------- Forwarded message ---------- >> From: *Prof Brian Ripley* > > >> Date: Wed, Nov 6, 2013 at 10:54 AM >> Subject: CRAN package VariABEL >> To: Yurii Aulchenko > >, "CRAN at R-project.org" >> > >> >> >> The message to >> >> Maintainer: Maksim Struchalin > > >> >> bounced, and apparently he left there in May. We will need to remove >> the package from CRAN unless we have a valid maintainer: so if you >> know where he is, please ask him urgently to update the package with a >> new contact address. >> >> -- >> Brian D. Ripley, ripley at stats.ox.ac.uk >> >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ >> >> University of Oxford, Tel: +44 1865 272861 >> (self) >> 1 South Parks Road, +44 1865 272866 >> (PA) >> Oxford OX1 3TG, UK Fax: +44 1865 272595 >> >> >> >> >> -- >> ----------------------------------------------------- >> Yurii S. Aulchenko >> >> [ LinkedIn ] [ Twitter >> ] [ Blog >> ] > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Thu Nov 7 16:34:12 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Thu, 7 Nov 2013 16:34:12 +0100 Subject: [GenABEL-dev] [genabel-Bugs][5065] polygenic() runs into numerical error when changing the way of input In-Reply-To: <20131107144931.1A3F91862C5@r-forge.r-project.org> References: <20131107144931.1A3F91862C5@r-forge.r-project.org> Message-ID: Thank you, Xia, for submitting this bug report! Y On Thu, Nov 7, 2013 at 3:49 PM, wrote: > Bugs item #5065, was opened at 2013-11-07 15:49 by Xia Shen > You can respond by visiting: > > https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5065&group_id=505 > > Status: Open > Priority: 3 > Submitted By: Xia Shen (shenxia) > Assigned to: Nobody (None) > Summary: polygenic() runs into numerical error when changing the way of > input > Resolution: Awaiting Response > Operating System: Linux (64bit) > Severity: major > Hardware: None > Version: v1.7-6 > Component: GenABEL > URL: > > > Initial Comment: > I had: > > > ... > 0 -Inf > > Error in nlm(llFUN, p = parsave, y = y, desmat = desmat, relmat = relmat, > : > non-finite value supplied by 'nlm' > > But: > > > ... > > ****************************************** > *** GOOD convergence indicated by FGLS *** > ****************************************** > > Why simply changing the way of input gave different outcome?? By the way, > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yurii.aulchenko at gmail.com Thu Nov 7 21:30:34 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Thu, 7 Nov 2013 21:30:34 +0100 Subject: [GenABEL-dev] Fwd: C++ which is not valid C++11 in CRAN package DatABEL In-Reply-To: <527AC23F.8020103@karssen.org> References: <20131106092037.2A7D520A37@gannet.stats.ox.ac.uk> <527A4284.7090002@karssen.org> <-3903376667723708899@unknownmsgid> <527AC23F.8020103@karssen.org> Message-ID: Re: splitting. No idea. They (CRAN) asked it. Can imagine this is related to CRAN infrastructure (remember they have 1000s of packages). Y On Wed, Nov 6, 2013 at 11:27 PM, L.C. Karssen wrote: > In the mean time I did a second commit, which fixed the same isnan() > error in ITERlib. > Now GenABEL compiles without warnings on my system (unless you add -Wall > -pedantic to the CFLAGS, but those shouldn't be show stoppers). > > > Hmm... Splitting the data in a separate package. What was the reasoning > behind that? OK, it would make for a lighter main package, but for the > examples to work, we still need them. And the GenABEL tar.gz is only > ~5MB (the actual data directory is ~2.4MB), that should be ok for modern > standards, right? The Writing R Extensions manual talks about very large > datasets, but mentions 2GB in that respect. > We already download the files needed for creating the tutorial > separately... > > > Lennart. > > On 06-11-13 18:46, Yurii Aulchenko wrote: > > Thank you very much! - time for genabel resubmission... Dread it a bit > > as other request was to put all the data in separate package... > > > > ---------------------- > > Yurii Aulchenko > > (sent from mobile device) > > > >> On Nov 6, 2013, at 14:22, "L.C. Karssen" wrote: > >> > >> I added -Wall -std=c++11 to the Makefile in pkg/filevector/. This > >> resulted in an error (about an ambiguous isnan() function definition) > >> The fix in SVN r.1361 fixes that error. > >> > >> Hopefully that's enough. If not, let me know. > >> > >> > >> Lennart. 
> >> > >>> On 11/06/2013 11:37 AM, Yurii Aulchenko wrote: > >>> FYI > >>> > >>> I expect this is the same code (filevector?) as that in previous FYI > >>> causing complains > >>> > >>> ---------- Forwarded message ---------- > >>> From: *Prof Brian Ripley* >>> > > >>> Date: Wed, Nov 6, 2013 at 10:20 AM > >>> Subject: C++ which is not valid C++11 in CRAN package DatABEL > >>> To: yurii at bionet.nsc.ru > >>> > >>> > >>> We now have compilers (gcc 4.8.2, clang with libcxx headers) with > >>> fairly complete C++11 support (which can be selected by -std=c++11). > >>> At least one of these is showing compilation errors on your package, > >>> which were in many cases warnings under earlier versions of g++ > >>> (provided -Wall was used: see 'Writing R Extensions'). People are > >>> pressing that this become the default where supported. > >>> > >>> You can see the compilation logs at e.g. > >>> > http://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/BiasedUrn-00check.html > >>> and > >>> http://www.stats.ox.ac.uk/pub/bdr/memtests/ASAN/BiasedUrn.log . > >>> > >>> > >>> Please submit an update to CRAN (following the CRAN policies) at > >>> http://cran.r-project.org/web/packages/policies.html. Do NOT reply to > >>> this email to submit an update! > >>> > >>> There may be other issues that need fixing: see the CRAN check logs at > >>> http://cran.r-project.org/web/checks/check_results_NAME.html, > >>> replacing NAME by the name of your package. > >>> > >>> > >>> > >>> -- > >>> ----------------------------------------------------- > >>> Yurii S. Aulchenko > >>> > >>> [ LinkedIn ] [ Twitter > >>> ] [ Blog > >>> ] > >>> > >>> > >>> _______________________________________________ > >>> genabel-devel mailing list > >>> genabel-devel at lists.r-forge.r-project.org > >>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > >> > >> -- > >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > >> L.C. 
Karssen
> >> Utrecht
> >> The Netherlands
> >>
> >> lennart at karssen.org
> >> http://blog.karssen.org
> >> GPG key ID: A88F554A
> >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> >>
> >> _______________________________________________
> >> genabel-devel mailing list
> >> genabel-devel at lists.r-forge.r-project.org
> >>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
> --
> -----------------------------------------------------------------
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
>
> Please do not send me Word or PowerPoint files!
> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
> ------------------------------------------------------------------
>
>

--
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn ] [ Twitter] [ Blog ]

From yurii.aulchenko at gmail.com Fri Nov 8 11:14:07 2013
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Fri, 8 Nov 2013 11:14:07 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1370 - pkg/VariABEL/src
In-Reply-To: <20131107165459.BAEAC186231@r-forge.r-project.org>
References: <20131107165459.BAEAC186231@r-forge.r-project.org>
Message-ID:

This actually links to a big design issue: how do we arrange the code so
that we can use it from different packages? For the moment the solution is
that we put a symlink to the "common" code in the repo. At the
distribution-build stage, we check the code out (so that, by following the
link, the "shared" code is exported from a single location) and build the
package.

In a regular world one would use libraries for such a situation. But we are
in the R world. At useR! this summer I was talking to Hadley Wickham, and he
was sure a solution is possible.
Y

On Thursday, November 7, 2013, wrote:

> Author: maksim
> Date: 2013-11-07 17:54:59 +0100 (Thu, 07 Nov 2013)
> New Revision: 1370
>
> Modified:
> pkg/VariABEL/src/DAlib
> pkg/VariABEL/src/fvlib
> Log:
> Returned the links back. Why r-forge does not recognize it during bulding
> process and can not build the package itslef?
>
> Modified: pkg/VariABEL/src/DAlib
> ===================================================================
> --- pkg/VariABEL/src/DAlib 2013-11-07 15:11:39 UTC (rev 1369)
> +++ pkg/VariABEL/src/DAlib 2013-11-07 16:54:59 UTC (rev 1370)
> @@ -1 +1 @@
> -link genabel/pkg/DatABEL/src/DAlib/
> \ No newline at end of file
> +link ../../DatABEL/src/DAlib/
> \ No newline at end of file
>
> Modified: pkg/VariABEL/src/fvlib
> ===================================================================
> --- pkg/VariABEL/src/fvlib 2013-11-07 15:11:39 UTC (rev 1369)
> +++ pkg/VariABEL/src/fvlib 2013-11-07 16:54:59 UTC (rev 1370)
> @@ -1 +1 @@
> -link genabel/pkg/filevector/fvlib
> \ No newline at end of file
> +link ../../filevector/fvlib
> \ No newline at end of file
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>

--
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn ] [ Twitter] [ Blog ]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From yurii.aulchenko at gmail.com Sun Nov 10 09:23:16 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Sun, 10 Nov 2013 09:23:16 +0100 Subject: [GenABEL-dev] Fwd: CRAN packages with C++11 errors on isnan or isinf In-Reply-To: <527F418A.3070308@stats.ox.ac.uk> References: <527F418A.3070308@stats.ox.ac.uk> Message-ID: fyi ---------- Forwarded message ---------- From: Prof Brian Ripley Date: Sun, Nov 10, 2013 at 9:19 AM Subject: CRAN packages with C++11 errors on isnan or isinf To: Yurii Aulchenko , Daniel Taliun < Daniel.Taliun at eurac.edu>, Cedric E Ginestet , Philip Johnson , Franck Picard < franck.picard at univ-lyon1.fr>, The Minh Luong Cc: CRAN This concerns packages DatABEL GWAtoolbox GenABEL LDExplorer NetworkAnalysis adaptivetau cghseg postCP for which we have reported errors on g++ 4.8.x (and clang++ using g++ headers showing on the CRAN check pages) like CastUtils.cpp: In function 'bool checkNan(void*, int)': CastUtils.cpp:189:38: error: call of overloaded 'isnan(double&)' is ambiguous return isnan(*(double*) data); isnan is a macro in C99 included in . isnan is a function in C++11 in , overloaded for float, double and long double and maybe more types. In tracking down a similar issue we found it was due to including both and . That package did so explicitly, but it is easy for this to be done by other headers you include. A remedy was to use std::isnan if you want the C++11 version and ::isnan if you want the C99 version. But note that C++98 (the default standard) includes neither, so the most portable thing to do is to use R's ISNAN via header (and for safety include that header after all the system ones). isinf is analogous, with R_FINITE available. -- Brian D. 
Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

--
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn ] [ Twitter] [ Blog ]

From lennart at karssen.org Tue Nov 12 00:14:30 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Tue, 12 Nov 2013 00:14:30 +0100
Subject: [GenABEL-dev] Fwd: CRAN packages with C++11 errors on isnan or isinf
In-Reply-To:
References: <527F418A.3070308@stats.ox.ac.uk>
Message-ID: <528164D6.5050105@karssen.org>

Aha, a somewhat more detailed explanation of the fixes I implemented
earlier. I opted for std::isnan() from C++11.

Thanks for sharing this info,

Lennart.

On 11/10/2013 09:23 AM, Yurii Aulchenko wrote:
> fyi
>
> ---------- Forwarded message ----------
> From: *Prof Brian Ripley*
> Date: Sun, Nov 10, 2013 at 9:19 AM
> Subject: CRAN packages with C++11 errors on isnan or isinf
> To: Yurii Aulchenko, Daniel Taliun, Cedric E Ginestet, Philip Johnson,
> Franck Picard, The Minh Luong
> Cc: CRAN
>
>
> This concerns packages
>
> DatABEL GWAtoolbox GenABEL LDExplorer NetworkAnalysis adaptivetau cghseg
> postCP
>
> for which we have reported errors on g++ 4.8.x (and clang++ using g++
> headers showing on the CRAN check pages) like
>
> CastUtils.cpp: In function 'bool checkNan(void*, int)':
> CastUtils.cpp:189:38: error: call of overloaded 'isnan(double&)' is
> ambiguous
>     return isnan(*(double*) data);
>
> isnan is a macro in C99 included in .
>
> isnan is a function in C++11 in , overloaded for float, double
> and long double and maybe more types.
>
> In tracking down a similar issue we found it was due to including both
> and . That package did so explicitly, but it is easy
> for this to be done by other headers you include.
> > A remedy was to use std::isnan if you want the C++11 version and ::isnan > if you want the C99 version. But note that C++98 (the default standard) > includes neither, so the most portable thing to do is to use R's ISNAN > via header (and for safety include that header after all the > system ones). > > > isinf is analogous, with R_FINITE available. > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~__ripley/ > > University of Oxford, Tel: +44 1865 272861 > (self) > 1 South Parks Road, +44 1865 272866 > (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > > -- > ----------------------------------------------------- > Yurii S. Aulchenko > > [ LinkedIn ] [ Twitter > ] [ Blog > ] > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From m.v.struchalin at mail.ru Thu Nov 14 04:16:14 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Thu, 14 Nov 2013 10:16:14 +0700 Subject: [GenABEL-dev] possible incompatibility problems with function "impute2databel" Message-ID: <5284407E.5070801@mail.ru> Dear All, I fixed an error in function "impute2databel" generated by "R CMD check --as-cran" (Revision 1388). This function passed RUnit check but I have some concern that it might not work on mac/win. I will keep my eye on this function for a while. If you see any incompatibility problems, please report it to me. 
best,
Maksim

From m.v.struchalin at mail.ru Thu Nov 14 18:06:53 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 15 Nov 2013 00:06:53 +0700
Subject: [GenABEL-dev] moving filevector and iterator to DatABEL
Message-ID: <5285032D.5020404@mail.ru>

Hi All,

I would like to start discussing a plan for moving the filevector and iterator libraries into DatABEL. In the current version of GenABEL, the code of iterator and filevector is copied into the body of the package and compiled separately from DatABEL (as if DatABEL did not exist at all). This causes some trouble (e.g. for impute2databel). I think we can separate this code from GenABEL, and it will be quite easy.

I tried to do it, but I ran into some problems. As far as I can see, iterator is called from three GenABEL functions: impute2databel, qtscore and summary.snp.data. Two months ago Lennart changed impute2databel so that it calls iterator from DatABEL. I see that it works more or less well now (at least on my computer). But qtscore and summary.snp.data use a special version of iterator which is implemented in GenABEL itself (called iteratorGA). I compared the iterators from DatABEL and GenABEL and they are quite different. So, as I see it, the main problem in moving iterator to DatABEL lies only in these two functions, which should be changed so that they use the iterator implemented in DatABEL.

Another issue is the compatibility of impute2databel. As I understand it, the current implementation of impute2databel (which calls iterator from DatABEL) is not compatible with mac/win. Perhaps the problem is the absence of an R_init_DatABEL.cpp file in DatABEL (see the manual "Writing R Extensions", section "Registering native routines"). This file contains some code which tells R about the functions in DatABEL (the function iterator in our case) which are going to be used from other packages. We only need to implement it in DatABEL (I did it and it works for me).
I do not know how these changes might affect other *ABEL packages: I see that MixABEL and TestABEL use their own iterators. But I expect that the changes will not touch them.

So, I propose:
1) Run a test on mac/win and see whether R_init_DatABEL.cpp solves the compatibility issue.
2) Change qtscore and summary.snp.data so that they call the iterator which is implemented in DatABEL.
3) Test everything and see that it works.

It seems easy to do. The only difficulty I see is changing qtscore and summary.snp.data. For me it would be routine work, with 95% of the time spent on understanding what is written there. If some of you can advise me here, it would speed everything up (for example: what was the reason for creating iteratorGA?)

What do you think about the plan? Do you see any pitfalls? Would you like to contribute :-)?

Small remark: The name iterator is already used in the C++ STL (the compiler told me so when I was implementing R_init_DatABEL.cpp in DatABEL). So, we need to change it to something like "iteratorDA".

best,
Maksim

From m.v.struchalin at mail.ru Thu Nov 14 22:38:52 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 15 Nov 2013 04:38:52 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
Message-ID: <528542EC.1030006@mail.ru>

In this email, I propose a new approach which reduces the total size of the data from 8Mb to 2Mb, which in turn reduces the entire GenABEL size from 12Mb to 6Mb.

"R CMD check --as-cran" reports that the following sub-directories are too large: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the last GenABEL submission to CRAN, the maintainers suggested creating a new package called GenABELdata and moving all the data there. I went through the data and found that:
1) The "exdata" directory can be compressed by gzip and reduced from 5.8Mb -> 1.1Mb.
   - There is a function gunzip() in the R.utils library which can decompress the files. It works on any OS.
   - Moreover: the native R function read.table() can read gzip files without decompression.
   - Even more: it looks like the biggest file "srgenos.dat" was used only once, a long time ago, for generating "srdta.RData", and now it is just sitting there and eating space needlessly.
2) We can delete some files from the "data" directory. The deleted files will be generated on the user's computer based on the files from exdata. It can be done during INSTALLATION (a line in Makefile?) or on the first load (run the function .onAttach() in R/zzz.R). It will reduce the total size of the "data" directory from 2.3Mb to 800Kb.

Any objections/suggestions?

best,
Maksim

From m.v.struchalin at mail.ru Fri Nov 15 05:53:35 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 15 Nov 2013 11:53:35 +0700
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
Message-ID: <5285A8CF.2030402@mail.ru>

An easy way to write a function for converting a plink format file to a GenABEL format file:

Use plink's support for 'plug-in' functions (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us to write a simple R script (myscript.R) which is called by plink (plink --file mydata --R myscript.R). plink reads the file mydata (which is in plink format) and iteratively, SNP by SNP, transfers all the data to the script myscript.R. This script contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO variable) and store it in *flv format by calling DatABEL functions.

The whole process of conversion will look like this:

1) The user asks GenA to convert a plink file to a GenA file.
2) GenA checks whether plink is installed.
If it is not installed, then GenA goes to the plink site and downloads/installs it itself (using the R function "download.file" from the "utils" package).
3) GenA runs a simple line: system('plink --file mydata --R myscript.R')
4) The Rplink function (from myscript.R) gets every SNP and stores it in *flv format. This function creates an flv file and then opens and closes it for saving every single SNP.
5) Work is done.

The only issue is how fast the conversion will run: how much time does it take to open a filevector file, store one SNP and close it? I cannot find a DatABEL R function for adding a SNP to an flv file. Is there a C DatABEL function which can do it?

best,
Maksim

From kooyman at gmail.com Fri Nov 15 10:17:58 2013
From: kooyman at gmail.com (Maarten Kooyman)
Date: Fri, 15 Nov 2013 10:17:58 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <5285A8CF.2030402@mail.ru>
References: <5285A8CF.2030402@mail.ru>
Message-ID:

Hi Maksim,

This sounds like a user-friendly addition for converting Plink to DatABEL format. I think opening the same file for each SNP is not the way to go. If you have data from a 1M array and you want to convert it, there will be 1 million open and close operations sent to the machine. I do not have any experience with this number of open and close calls, but it sounds to me like a nice way to thrash your system. I could not find a lot of information about the load it puts on the system, but here is a small piece of text about it:
http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/General_optimization_techniques/Input/Output#Open_files

A small warning about checking whether plink exists: there is also a utility from the PuTTY suite (used for ssh under Windows) called plink. On Debian systems the plink executable is called "p-link" by default.
Kind regards, Maarten On Fri, Nov 15, 2013 at 5:53 AM, Maksim Struchalin wrote: > An easy way to write a function for conversion a plink format file to a > GenABEL format file: > > Use plink support of 'plug-in' functions (http://pngu.mgh.harvard.edu/~ > purcell/plink/rfunc.shtml). This allows us to write a simple R script > (myscript.R) which is called by plink (plink --file mydata --R myscript.R). > plink reads the file mydata (which is in plink format) and iteratively, SNP > by SNP, trasfer all the data to a script myscript.R. This script contains a > function Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO > variable) and store it in a *flv format through calling DatABEL functions. > > The whole process of conversion will look like this: > > 1) User asks GenA convert plink file to GenA file > 2) GenA looks weather the plink is installed. If it is not installed, then > GenA goes to a plink site and download/install it itself (use an R function > "download.file" from "utils" package) > 3) GenA run a simple line: system('plink --file mydata --R myscript.R') > 4) Rplink function (from myscript.R) gets every SNP and stote it in *flv > format. This function creates an flv file and then open and close it for > saving every single SNP. > 5) Work is Done > > The only issue is how fast the converssion will run: how much time does it > take to open a filvector file, store one SNP and close it? I can not find a > DatABEL R function for adding SNP to a flv file. Is there a C DatABEL > function which can do it? > > best, > Maksim > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Fri Nov 15 16:04:49 2013 From: lennart at karssen.org (L.C. 
Karssen)
Date: Fri, 15 Nov 2013 16:04:49 +0100
Subject: [GenABEL-dev] possible incompatibility problems with function "impute2databel"
In-Reply-To: <5284407E.5070801@mail.ru>
References: <5284407E.5070801@mail.ru>
Message-ID: <52863811.8070309@karssen.org>

Hi Maksim,

I had noticed the change from r1320 to r1321 as well and it was still on my todo list (somewhere at the bottom) to check if it was OK.

Thanks for taking care of this,

Lennart.

On 14-11-13 04:16, Maksim Struchalin wrote:
> Dear All,
>
> I fixed an error in function "impute2databel" generated by "R CMD check
> --as-cran" (Revision 1388). This function passed RUnit check but I have
> some concern that it might not work on mac/win. I will keep my eye on
> this function for a while. If you see any incompatibility problems,
> please report it to me.
>
> best,
> Maksi

--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------

From lennart at karssen.org Fri Nov 15 17:21:18 2013
From: lennart at karssen.org (L.C.
Karssen)
Date: Fri, 15 Nov 2013 17:21:18 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <528542EC.1030006@mail.ru>
References: <528542EC.1030006@mail.ru>
Message-ID: <528649FE.7090501@karssen.org>

Hi Maksim,

On 14-11-13 22:38, Maksim Struchalin wrote:
> In this email, I propose a new approach which allows to reduce total
> size of data from 8Mb to 2Mb that reduce the entire GenABEL size from
> 12Mb to 6Mb.

I guess you mean B (bytes) instead of b (bits) here :-).

>
> "R CMD check --as-cran" reports that the following sub-directories have
> too big size: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
> last GenABEL submission to CRAN, the maintainers suggested to create a
> new package called GenABELdata and move all the data there. I run
> through the data and found that:
> 1) "exdata" directory can be compressed by gzip and reduced from 5.8Mb
> -> 1.1Mb.
> - There is a function gunzip() from library R.utils which can
> decompress the files. It works on any OS.
> - Moreover: the native R function read.table() can read gzip files
> without decompression.
> - Even more: it looks like that the biggest file "srgenos.dat" is
> used only once a long time ago for generating "srdta.RData" and now it
> is just sitting there and eating space needlessly.

Sounds like a waste of space!

> 2) We can delete some files from the "data" directory. The deleted files
> will be generated on the user computer based on the files from exdata.
> It can be done during INSTALLATION (a line in Makefile?) or on the first
> load through (run function .onAttach() in R/zzz.R).

This sounds like a perfectly acceptable option.

> It will reduce
> total size of "data" directory from 2.3Mb to 800Kb.

Fantastic! If no one has other objections I say: go ahead.

Best,

Lennart.

>
> Any objections/suggestions?
>
> best,
> Maksim
>

--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------

From lennart at karssen.org Fri Nov 15 17:39:12 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 15 Nov 2013 17:39:12 +0100
Subject: [GenABEL-dev] moving filevector and iterator to DatABEL
In-Reply-To: <5285032D.5020404@mail.ru>
References: <5285032D.5020404@mail.ru>
Message-ID: <52864E30.7030009@karssen.org>

Hi Maksim,

On 14-11-13 18:06, Maksim Struchalin wrote:
> Hi All,
>
> I would like to start discussing a plan of moving filevector and
> iterator libraries in DatABEL.

I'm not sure if I understand you completely, but filevector is also used by ProbABEL and OmicABEL. DatABEL is built on top of filevector. So my idea is to make filevector a 'real' library (a .so file) that can be used by all of these projects. The result will be the same as (what I think is) your idea: no more symbolic links to the filevector directory in SVN. This way, we can just tell the user to install the fv library (just as he/she needs to install e.g. a BLAS library, the GSL or some database lib). To make the system administrator's life easier we can package the fv lib in a .deb and an .rpm package, for example.
For ProbABEL we already have this kind of dependency on the Eigen library, and a dependency on the Boost libraries will (likely) be added in the future.
> At the current version of GenABEL, the
> code of iterator and filevector is copied to the body of package and
> compiled separately from DatABEL (like DatABEL does not exist at all).
> It makes some troubles (e.g. for impute2databel). I think we can
> separate this code from GenABEL and it will be quite easy.

Sounds like a good idea. Proper design.

>
> I tried to do it, but I got in some problems. As I see, iterator is
> called from three GenABEL functions: impute2databel, qtscore and
> summary.snp.data. Two months ago Lennart changed impute2databel in a way
> that it calls iterator from DatABEL.

I did this because of CRAN errors. IIRC it was some sort of namespace problem.

> I see that it works more or less
> good now (at least on my computer). But qtscore and summary.snp.data use
> a special version of iterator which is implemented in GenABEL itself
> (called iteratorGA). I compared iterators from DatABEL and GenABEL and
> they are quite different. So, as I see, the main problem of moving
> iterator to DatABEL is in these two function only which should be
> changed in a way that they use iterator implemented in DatABEL.

I haven't looked at the code, but it sounds like a reasonable, good plan.

>
> Another issue is compatibility of impute2databel. As I understand, the
> current implementation of impute2databel (which call iterator from
> DatABEL) is not compatible with mac/win.

Are we sure about that? Yurii, can you confirm this? I could check on Windows.

> Perhaps the problem is in
> absence of R_init_DatABEL.cpp file in DatABEL (manual "Writing R
> Extensions, Registering native routines"). This file contains some code
> which tells R about the functions in DatABEL (a function iterator in our
> case) which are gonna be used from other packages. We only need to
> implement it in DatABEL (I did it and it works for me).

Do you mean it works for you on platforms other than Linux?
>
> I do not know how these changes might affect other *ABEL packages: I see
> that MixABEL, TestABEL use its own iterators. But, I expect that it does
> not touch them.
>
> So, I propose:
> 1) Make a test on mac/win and see how R_init_DatABEL.cpp solve the
> compatibility issue.

OK.

> 2) Change qtscore and summary.snp.data in a way that it calls iterator
> which is implemented in DatABEL.

Does anybody know what the differences between the different iterator implementations are? And/or why these exist?

> 3) Test everything and see that it works.

I agree, always a good plan :-).

>
>
> It seems easy to do. The only difficulty which I see is changing qtscore
> and summary.snp.data. For myself, I see a routine work with spending 95%
> time for understanding what is written there. If some of you can consult
> me here, it would accelerate everything (for example: what was the
> reason for creating iteratorGA?)

I'm sorry, as you can see from my comment above, I don't know either.

>
> What do you think about the plan? Do you see any pitfalls? Would like to
> contribute :-)?

Given my suggestion/plan/wish to have filevector as a proper (not R related) library, I think it would be good to see how other R packages handle dependencies on external libraries, and how they interface to them. This should be OK, as I'm sure many packages do this. I think it is enough to add
SystemRequirements: libfilevector
to the DESCRIPTION file.

With respect to GenABEL depending on a shared library provided by DatABEL, I'm a bit worried by the following section in the R extensions manual:
http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Linking-to-other-packages:
"It is not in general possible to link a DLL in package packA to a DLL provided by package packB". Although section 5.8.1 seems to provide a possibility: "It is possible to link a shared object in package packA to a library provided by package packB under limited circumstances on a Unix-alike OS."
A bit of discussion on this can be found on stackoverflow.com: http://stackoverflow.com/questions/12328156/r-package-that-links-to-external-c-library > > Small remark: The name iterator is already used in stl C++ library > (compliler told it to me when I was implementing R_init_DatABEL.cpp in > DatABEL). So, we need to change it to something like "iteratorDA". I completely agree. Just 'iterator' is too confusing with the C++ notion of iterator. Thanks for digging through all this, Lennart. > > best, > Maksim > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -- ----------------------------------------------------------------- L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org Stuur mij aub geen Word of Powerpoint bestanden! Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html ------------------------------------------------------------------ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From lennart at karssen.org Fri Nov 15 17:48:24 2013 From: lennart at karssen.org (L.C. Karssen) Date: Fri, 15 Nov 2013 17:48:24 +0100 Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file In-Reply-To: <5285A8CF.2030402@mail.ru> References: <5285A8CF.2030402@mail.ru> Message-ID: <52865058.9020105@karssen.org> Hi Maksim, On 15-11-13 05:53, Maksim Struchalin wrote: > An easy way to write a function for conversion a plink format file to a > GenABEL format file: > > Use plink support of 'plug-in' functions Nice find. I didn't know that existed. > (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us > to write a simple R script (myscript.R) which is called by plink (plink > --file mydata --R myscript.R). 
plink reads the file mydata (which is in
> plink format) and iteratively, SNP by SNP, transfer all the data to a
> script myscript.R. This script contains a function
> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
> variable) and store it in a *flv format through calling DatABEL functions.
>
> The whole process of conversion will look like this:
>
> 1) User asks GenA convert plink file to GenA file
> 2) GenA looks whether the plink is installed. If it is not installed,
> then GenA goes to a plink site and download/install it itself (use an R
> function "download.file" from "utils" package)
> 3) GenA run a simple line: system('plink --file mydata --R myscript.R')
> 4) Rplink function (from myscript.R) gets every SNP and store it in *flv
> format. This function creates an flv file and then open and close it for
> saving every single SNP.
> 5) Work is Done

I'm not sure how portable it is to download and run plink. Also, the plink page says: Currently, there is only support for R-plugins for Linux-based and Mac OS PLINK distributions.

>
> The only issue is how fast the conversion will run: how much time does
> it take to open a filevector file, store one SNP and close it? I can not
> find a DatABEL R function for adding SNP to a flv file. Is there a C
> DatABEL function which can do it?

Wouldn't it be easier/possible to use plink to export to text (.csv) and then use filevector's txt2fvf binary (of course this could be done from R using system())?

I'm also wondering if going per SNP is really necessary. If I understand it correctly the R script (myscript.R) has to have a function called:
Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
where GENO is the matrix of genotypes. So we could write that into a DatABEL file at once. Of course you may want to do this per chromosome to reduce memory consumption (not sure how plink/R would handle large data sets).

I agree completely with Maarten that opening a filevector file for each SNP will be an I/O killer.
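
To make the I/O concern above concrete, here is a minimal C++ sketch that contrasts reopening an output file for every SNP with opening it once and writing all SNPs through the same stream. It deliberately does not use the filevector or DatABEL API: the flat binary layout, the type name Snp and the functions write_reopen_each(), write_open_once() and file_size() are all made up for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical record: one SNP's genotypes, one byte per individual.
using Snp = std::vector<char>;

// Variant A: reopen the file in append mode for every single SNP.
// Returns the number of times the file was opened.
std::size_t write_reopen_each(const std::string& path,
                              const std::vector<Snp>& snps) {
    { std::ofstream create(path, std::ios::binary | std::ios::trunc); }
    std::size_t opens = 1;  // the create/truncate step above
    for (const Snp& s : snps) {
        std::ofstream out(path, std::ios::binary | std::ios::app);
        ++opens;
        out.write(s.data(), static_cast<std::streamsize>(s.size()));
    }   // 'out' is closed at the end of each iteration
    return opens;
}

// Variant B: open the file once and write all SNPs through one stream.
// Returns the number of times the file was opened.
std::size_t write_open_once(const std::string& path,
                            const std::vector<Snp>& snps) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (const Snp& s : snps) {
        out.write(s.data(), static_cast<std::streamsize>(s.size()));
    }
    return 1;
}

// Helper: size of a file in bytes, or -1 if it cannot be opened.
long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1L;
    std::streamoff n = in.tellg();
    return static_cast<long>(n);
}
```

Both variants produce the same bytes on disk; the difference is that variant A performs one open/close pair per SNP (a million for a 1M array), while variant B performs a single one. Timing the two on real data would show whether the difference matters for a given file system.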
Lennart. > > best, > Maksim > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -- ----------------------------------------------------------------- L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org Stuur mij aub geen Word of Powerpoint bestanden! Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html ------------------------------------------------------------------ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Fri Nov 15 23:45:32 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Fri, 15 Nov 2013 23:45:32 +0100 Subject: [GenABEL-dev] Fwd: CRAN package GenABEL In-Reply-To: <52861316.3060003@stats.ox.ac.uk> References: <5285F67C.7070908@stats.ox.ac.uk> <52861316.3060003@stats.ox.ac.uk> Message-ID: FYI ---------- Forwarded message ---------- From: Prof Brian Ripley Date: Fri, Nov 15, 2013 at 1:27 PM Subject: CRAN package GenABEL To: Yurii Aulchenko Cc: CRAN Re the errors shown by clang/libc++: http://www.cplusplus.com/reference/fstream/ifstream/open/ shows how to check an open worked on an ifstream (and ofstream is very similar). The isnan errors shown with g++'s C++11 headers should be using R's ISNAN. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yurii.aulchenko at gmail.com Mon Nov 18 12:10:01 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 12:10:01 +0100 Subject: [GenABEL-dev] Fwd: CRAN packages GenABEL and DatABEL References: <5287285F.5000704@stats.ox.ac.uk> Message-ID: <24BA0C7E-39F9-4E72-AF35-66F19742129F@gmail.com> FYI Begin forwarded message: > From: Prof Brian Ripley > Subject: CRAN packages GenABEL and DatABEL > Date: November 16, 2013 at 09:10:07 AM GMT+1 > To: Yurii Aulchenko , CRAN > > I have corrected the versions on CRAN so they now compile with clang and g++ 4.8.1 in C++98 and C++11 modes. > > It proved tricky to deal with isnan (which is not part of C++98). Only ISNAN worked. ::isnan should work, but does not on the Windows toolchain, and std::isnan (correctly) does not work on the Solaris C++98 compiler. > > Please update your master sources. > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 -------------- next part -------------- An HTML attachment was scrubbed... URL: From yurii.aulchenko at gmail.com Mon Nov 18 12:11:10 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 12:11:10 +0100 Subject: [GenABEL-dev] CRAN packages GenABEL and DatABEL In-Reply-To: <5287285F.5000704@stats.ox.ac.uk> References: <5287285F.5000704@stats.ox.ac.uk> Message-ID: Dear Prof. Ripley, Thank you very much! - we are working on the updated version addressing a number of other issues as well and hope to submit within soon. Yurii On Nov 16, 2013, at 09:10 AM, Prof Brian Ripley wrote: > I have corrected the versions on CRAN so they now compile with clang and g++ 4.8.1 in C++98 and C++11 modes. > > It proved tricky to deal with isnan (which is not part of C++98). Only ISNAN worked. 
::isnan should work, but does not on the Windows toolchain, and std::isnan (correctly) does not work on the Solaris C++98 compiler. > > Please update your master sources. > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 From yurii.aulchenko at gmail.com Mon Nov 18 12:48:55 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 12:48:55 +0100 Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file In-Reply-To: <52865058.9020105@karssen.org> References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> Message-ID: I would say that in principle DatABEL::text2databel is the "natural" way to go from text-files to DatABEL-files The problem is that 'regular' text input may be allele by allele, not genotype by genotype... (e.g. data are in format "A G", or "A/G", not "0" or "1" or "2"). Y On Nov 15, 2013, at 17:48 PM, L.C. Karssen wrote: > Hi Maksim, > > On 15-11-13 05:53, Maksim Struchalin wrote: >> An easy way to write a function for conversion a plink format file to a >> GenABEL format file: >> >> Use plink support of 'plug-in' functions > > Nice find. I didn't know that existed. > >> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us >> to write a simple R script (myscript.R) which is called by plink (plink >> --file mydata --R myscript.R). plink reads the file mydata (which is in >> plink format) and iteratively, SNP by SNP, trasfer all the data to a >> script myscript.R. This script contains a function >> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO >> variable) and store it in a *flv format through calling DatABEL functions. 
>> >> The whole process of conversion will look like this: >> >> 1) User asks GenA convert plink file to GenA file >> 2) GenA looks weather the plink is installed. If it is not installed, >> then GenA goes to a plink site and download/install it itself (use an R >> function "download.file" from "utils" package) >> 3) GenA run a simple line: system('plink --file mydata --R myscript.R') >> 4) Rplink function (from myscript.R) gets every SNP and stote it in *flv >> format. This function creates an flv file and then open and close it for >> saving every single SNP. >> 5) Work is Done > > I'm not sure how portable it is to download and run plink. Also, the > plink page says: Currently, there is only support for R-plugins for > Linux-based and Mac OS PLINK distributions. > >> >> The only issue is how fast the converssion will run: how much time does >> it take to open a filvector file, store one SNP and close it? I can not >> find a DatABEL R function for adding SNP to a flv file. Is there a C >> DatABEL function which can do it? > > Wouldn't it be easier/possible to use plink to export to text (.csv) and > then use filevector's txt2fvf binary (of course this could be done from > R using system())? > > I'm also wondering if going per SNP is really necessary. If I understand > it correctly the R script (myscript.R) has to have a function called: > Rplink <- function(PHENO,GENO,CLUSTER,COVAR) > where GENO is the matrix of genotypes. So we could write that into a > DatABEL file at once. Of course you may want to do this per chromosome > to reduce memory consumption (not sure how plink/R would handle large > data sets). > > I agree completely with Maarten that opening a filevector file for each > SNP will be an I/O killer. > > > Lennart. 
> >> >> best, >> Maksim >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > -- > ----------------------------------------------------------------- > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > > Stuur mij aub geen Word of Powerpoint bestanden! > Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html > ------------------------------------------------------------------ > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From yurii.aulchenko at gmail.com Mon Nov 18 12:54:16 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 12:54:16 +0100 Subject: [GenABEL-dev] new approach for data storage in GenABEL package In-Reply-To: <528649FE.7090501@karssen.org> References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> Message-ID: On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote: > Hi Maksim, > > On 14-11-13 22:38, Maksim Struchalin wrote: >> In this email, I propose a new approach which reduces the total >> size of the data from 8Mb to 2Mb, which reduces the entire GenABEL size from >> 12Mb to 6Mb. > > I guess you mean B (bytes) instead of b (bits) here :-). > >> >> "R CMD check --as-cran" reports that the following sub-directories are >> too large: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the >> last GenABEL submission to CRAN, the maintainers suggested creating a >> new package called GenABELdata and moving all the data there. I ran >> through the data and found that: >> 1) "exdata" directory can be compressed by gzip and reduced from 5.8Mb >> -> 1.1Mb.
>> - There is a function gunzip() from the R.utils package which can >> decompress the files. It works on any OS. >> - Moreover: the native R function read.table() can read gzip files >> without decompression. >> - Even more: it looks like the biggest file "srgenos.dat" was >> used only once a long time ago for generating "srdta.RData" and now it >> is just sitting there and eating space needlessly. > > Sounds like a waste of space! > >> 2) We can delete some files from the "data" directory. The deleted files >> will be generated on the user's computer based on the files from exdata. >> It can be done during INSTALLATION (a line in Makefile?) or on the first >> load (run function .onAttach() in R/zzz.R). > > This sounds like a perfectly acceptable option. I suggest this is done in the "example" which makes use of this data, NOT in the INSTALL etc. - we should make things as "robust" as possible and interfere as little as possible with the usual workflow (which is very much system-specific, in that we will need to test on all platforms) > >> It will reduce >> total size of "data" directory from 2.3Mb to 800Kb. > > Fantastic! If no one has other objections I say: go ahead. > > > Best, > > Lennart. > > >> >> Any objections/suggestions? >> >> best, >> Maksim >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > -- > ----------------------------------------------------------------- > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > > Stuur mij aub geen Word of Powerpoint bestanden!
> Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html > ------------------------------------------------------------------ > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From yurii.aulchenko at gmail.com Mon Nov 18 12:56:15 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 12:56:15 +0100 Subject: [GenABEL-dev] possible incompatibility problems with function "impute2databel" In-Reply-To: <52863811.8070309@karssen.org> References: <5284407E.5070801@mail.ru> <52863811.8070309@karssen.org> Message-ID: <24A1119F-8D56-4A24-A472-A761A8FEC184@gmail.com> Would be good to solve the issues with GenA (warnings, data-issues) asap - I think that CRAN people may be unhappy - and rightly so - with the fact they now have to fix our code (see latest forward). Y On Nov 15, 2013, at 16:04 PM, L.C. Karssen wrote: > Hi Maksim, > > I had noticed the change from r1320 to r1321 as well and it was still on > my todo list (somewhere at the bottom) to check if it was OK. > > > Thanks for taking care of this, > > Lennart. > > > On 14-11-13 04:16, Maksim Struchalin wrote: >> Dear All, >> >> I fixed an error in function "impute2databel" generated by "R CMD check >> --as-cran" (Revision 1388). This function passed RUnit check but I have >> some concern that it might not work on mac/win. I will keep my eye on >> this function for a while. If you see any incompatibility problems, >> please report it to me. >> >> best, >> Maksi >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > -- > ----------------------------------------------------------------- > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > > Stuur mij aub geen Word of Powerpoint bestanden! > Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html > ------------------------------------------------------------------ > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From lennart at karssen.org Mon Nov 18 14:20:16 2013 From: lennart at karssen.org (L.C. Karssen) Date: Mon, 18 Nov 2013 14:20:16 +0100 Subject: [GenABEL-dev] Jenkins continuous integration server available Message-ID: <528A1410.8010802@karssen.org> Dear all, In order to keep an eye on the build stability of the various C/C++ packages in the GenABEL suite, I've taken some time in the past weeks to set up a Jenkins server [1] at http://www.karssen.org/jenkins/ Jenkins checks out the source code from R-forge, runs several static code analysis tools and tries to compile the code. Any errors or warnings that pop up are listed. This way we can make sure that the code is always in a proper (compilable) state and also provides hints on how to improve the code. Maarten Kooyman piloted the use of Jenkins at my previous employer. Many of the things we learned from that setup have been incorporated in this Jenkins install. Thanks a lot Maarten! The ProbABEL project is configured in most detail with checks of the C/C++ style, violations of the Google coding standards and Valgrind checks for memory leaks. More checks still need to be added for ProbABEL and other projects. Consider this a work in progress. Have a look around and let me know if you have any questions or suggestions. Best regards, Lennart. [1] http://jenkins-ci.org/ -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Mon Nov 18 15:11:55 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 15:11:55 +0100 Subject: [GenABEL-dev] new release GenA Message-ID: <33F05BDD-E959-4FEE-94D5-46273BBE7D39@gmail.com> Hi Maksim, Just to make sure - the RUnit tests are not part of the CRAN tests for any of our packages, so in that we do want them to be OK, but the requirement for the CRAN are less strict (basically the package made with makedistrib_GenABEL.sh should pass the checks). This SH strips down a number of things, including RUnit tests (this was requested by CRAN at some stage). For a number of releases, one of the RUnit tests (related to the iterator) fails, but this does not affect the CRAN submission. best, Y From yurii.aulchenko at gmail.com Mon Nov 18 15:33:12 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 18 Nov 2013 15:33:12 +0100 Subject: [GenABEL-dev] Jenkins continuous integration server available In-Reply-To: <528A1410.8010802@karssen.org> References: <528A1410.8010802@karssen.org> Message-ID: <266E3715-BF62-4F9D-AEA5-7C4807BA3BD4@gmail.com> just wow Y On Nov 18, 2013, at 14:20 PM, L.C. Karssen wrote: > Dear all, > > In order to keep an eye on the build stability of the various C/C++ > packages in the GenABEL suite, I've taken some time in the past weeks to > set up a Jenkins server [1] at http://www.karssen.org/jenkins/ > > Jenkins checks out the source code from R-forge, runs several static > code analysis tools and tries to compile the code. Any errors or > warnings that pop up are listed. 
This way we can make sure that the code > is always in a proper (compilable) state and also provides hints on how > to improve the code. > > Maarten Kooyman piloted the use of Jenkins at my previous employer. Many > of the things we learned from that setup have been incorporated in this > Jenkins install. Thanks a lot Maarten! > > The ProbABEL project is configured in most detail with checks of the > C/C++ style, violations of the Google coding standards and Valgrind > checks for memory leaks. More checks still need to be added for ProbABEL > and other projects. Consider this a work in progress. > > Have a look around and let me know if you have any questions or > suggestions. > > > Best regards, > > Lennart. > > > [1] http://jenkins-ci.org/ > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From fabregat at aices.rwth-aachen.de Mon Nov 18 15:45:03 2013 From: fabregat at aices.rwth-aachen.de (Diego Fabregat Traver) Date: Mon, 18 Nov 2013 15:45:03 +0100 Subject: [GenABEL-dev] Jenkins continuous integration server available In-Reply-To: <266E3715-BF62-4F9D-AEA5-7C4807BA3BD4@gmail.com> References: <528A1410.8010802@karssen.org> <266E3715-BF62-4F9D-AEA5-7C4807BA3BD4@gmail.com> Message-ID: On 18/11/13, Yury Aulchenko wrote: > just wow Yep, it looks pretty cool :) PS: Apparently my programming style is deficient (36 style warnings) xD Best, Diego > Y > > On Nov 18, 2013, at 14:20 PM, L.C. 
Karssen wrote: > > > Dear all, > > > > In order to keep an eye on the build stability of the various C/C++ > > packages in the GenABEL suite, I've taken some time in the past weeks to > > set up a Jenkins server [1] at http://www.karssen.org/jenkins/ > > > > Jenkins checks out the source code from R-forge, runs several static > > code analysis tools and tries to compile the code. Any errors or > > warnings that pop up are listed. This way we can make sure that the code > > is always in a proper (compilable) state and also provides hints on how > > to improve the code. > > > > Maarten Kooyman piloted the use of Jenkins at my previous employer. Many > > of the things we learned from that setup have been incorporated in this > > Jenkins install. Thanks a lot Maarten! > > > > The ProbABEL project is configured in most detail with checks of the > > C/C++ style, violations of the Google coding standards and Valgrind > > checks for memory leaks. More checks still need to be added for ProbABEL > > and other projects. Consider this a work in progress. > > > > Have a look around and let me know if you have any questions or > > suggestions. > > > > > > Best regards, > > > > Lennart. > > > > > > [1] http://jenkins-ci.org/ > > > > -- > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > > L.C. Karssen > > Utrecht > > The Netherlands > > > > lennart at karssen.org > > http://blog.karssen.org > > GPG key ID: A88F554A > > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > > > _______________________________________________ > > genabel-devel mailing list > > genabel-devel at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From lennart at karssen.org Tue Nov 19 08:46:51 2013 From: lennart at karssen.org (L.C. 
Karssen) Date: Tue, 19 Nov 2013 08:46:51 +0100 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins Message-ID: <528B176B.2060803@karssen.org> Dear all, The Jenkins setup already shows its value: After Maksim changed the call from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build of ProbABEL was triggered and it failed (because ISNAN() is an R function). I guess this is one more reason to try to convert fvlib into a real (shared) library. Does anyone have another workable solution? Lennart. -- ----------------------------------------------------------------- L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org Stuur mij aub geen Word of Powerpoint bestanden! Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html ------------------------------------------------------------------ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From m.v.struchalin at mail.ru Tue Nov 19 08:56:19 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Tue, 19 Nov 2013 14:56:19 +0700 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <528B176B.2060803@karssen.org> References: <528B176B.2060803@karssen.org> Message-ID: <528B19A3.5090203@mail.ru> Hi Lennart, How the users under win will install such a library? best, Maksim On 19/11/2013 14:46, L.C. Karssen wrote: > Dear all, > > The Jenkins setup already shows its value: After Maksim changed the call > from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build > of ProbABEL was triggered and it failed (because ISNAN() is an R function). > > I guess this is one more reason to try to convert fvlib into a real > (shared) library. > Does anyone have another workable solution? > > > Lennart. 
> > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Tue Nov 19 09:14:47 2013 From: lennart at karssen.org (L.C. Karssen) Date: Tue, 19 Nov 2013 09:14:47 +0100 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <528B19A3.5090203@mail.ru> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> Message-ID: <528B1DF7.90803@karssen.org> Hi Maksim, Good question... The idea is to generate a .dll file for Windows, but I'm not sure what would be the best way to distribute that. It would be interesting to see how other packages do that. For example, the XML package depends on libxml2: http://cran.r-project.org/web/packages/XML/index.html and the Rcurl package depends on libcurl: http://cran.r-project.org/web/packages/RCurl/index.html In the XML package .zip file for Windows there is a directory libs/x64 and a directory libs/i386. Both contain XML.dll, so I think that for Linux you simply specify a dependency on a library, whereas for Windows the actual .dll is in the package (which is quite logical because Windows lacks the package repositories that most Linux distros have). It seems that for MacOS the .tgz file also contains a lib directory with the .so file. Best regards, Lennart. On 19-11-13 08:56, Maksim Struchalin wrote: > Hi Lennart, > > How the users under win will install such a library? > > best, > Maksim > > On 19/11/2013 14:46, L.C. Karssen wrote: >> Dear all, >> >> The Jenkins setup already shows its value: After Maksim changed the call >> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >> of ProbABEL was triggered and it failed (because ISNAN() is an R function). 
>> >> I guess this is one more reason to try to convert fvlib into a real >> (shared) library. >> Does anyone have another workable solution? >> >> >> Lennart. >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- ----------------------------------------------------------------- L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org Stuur mij aub geen Word of Powerpoint bestanden! Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html ------------------------------------------------------------------ From m.v.struchalin at mail.ru Tue Nov 19 09:44:45 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Tue, 19 Nov 2013 15:44:45 +0700 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <528B1DF7.90803@karssen.org> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> Message-ID: <528B24FD.2070000@mail.ru> It seems that your solution is workable but I see little difference from what we have now. Now the filevector code is incorporated in each package. You propose to follow the same way but pack the filevector code in one file (.dll or .so) and distribute 9 packages from GenABEL with the same library. Last time I proposed to move filevector into DatABEL. All other packages (GenA and so on) will load DatAB in R and use filevector functions from DatA.
When DatABEL is loaded through library(DatABEL), the file DatABEL.so is loaded as well. Thus, you do not need to ask users to install an additional library because it is in DatABEL already. I think this is a workable approach that will allow us to delete the filevector code (or filevector so/dll) from all the packages. Here is a quote from the R manual on how to register functions to make them available from DatAB to GenAB: _______________________________________________ 5.4 Registering native routines By 'native' routine, we mean an entry point in compiled code. In calls to .C, .Call, .Fortran and .External, R must locate the specified native routine by looking in the appropriate shared object/DLL. By default, R uses the operating system-specific dynamic loader to lookup the symbol in all loaded DLLs and elsewhere. Alternatively, the author of the DLL can explicitly register routines with R and use a single, platform-independent mechanism for finding the routines in the DLL. One can use this registration mechanism to provide additional information about a routine, including the number and type of the arguments, and also make it available to R programmers under a different name. In the future, registration may be used to implement a form of "secure" or limited native access. _____________________________________________________ Your argument was from "5.8 Linking to other packages: It is not in general possible to link a DLL in package *packA* to a DLL provided by package *packB*." I do not quite understand what they mean by 'link'. Maybe they mean linking a library during installation? best, Maksim On 19/11/2013 15:14, L.C. Karssen wrote: > Hi Maksim, > > Good question... The idea is to generate a .dll file for Windows, but > I'm not sure what would be the best way to distribute that. It would be > interesting to see how other packages do that.
For example, the XML > package depends on libxml2: > http://cran.r-project.org/web/packages/XML/index.html and the Rcurl > package depends on libcurl: > http://cran.r-project.org/web/packages/RCurl/index.html > > In the XML package .zip file for Windows there is a directory libs/x64 > and a directory libs/i386. Both contain XML.dll, so I think that for > Linux you simply specify a dependency on a library, whereas for Windows > the actual .dll is in the package (which is quite logical because > Windows lacks the package repositories that most Linux distros have). > It seems that for MacOS the .tgz file also contains a lib directory with > the .so file. > > > Best regards, > > Lennart. > > On 19-11-13 08:56, Maksim Struchalin wrote: >> Hi Lennart, >> >> How the users under win will install such a library? >> >> best, >> Maksim >> >> On 19/11/2013 14:46, L.C. Karssen wrote: >>> Dear all, >>> >>> The Jenkins setup already shows its value: After Maksim changed the call >>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>> >>> I guess this is one more reason to try to convert fvlib into a real >>> (shared) library. >>> Does anyone have another workable solution? >>> >>> >>> Lennart. 
>>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Tue Nov 19 10:10:11 2013 From: lennart at karssen.org (L.C. Karssen) Date: Tue, 19 Nov 2013 10:10:11 +0100 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <528B24FD.2070000@mail.ru> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> Message-ID: <528B2AF3.3030903@karssen.org> Hi ??????, (trying a Russian keyboard layout, no idea if this works...). On 19-11-13 09:44, Maksim Struchalin wrote: > It seems that your solution is workable but I see little difference with > what it is now. Now the filevector code is incorporated in each > packages. This is what I would like to change, indeed. Code that is reused by so many packages should not be copied/symlinked into the code tree of those packages. By symlinking it as we have now, there is no proper way of specifying a version number of the filevector code. Which, in turn means that if something changes in the filevector code all other packages need to be changed immediately (just like what happened with your latest change). 
If the filevector code had been a proper library we could have simply said that ProbABEL still depends on the old filevector version and taken more time to make sure the two play nice together. Moreover, with the filevector code in a separate library the whole isnan() issue would not be a problem. We could simply use std::isnan(), because CRAN wouldn't need to compile the .so/.dll, so there would be no need for ISNAN(). When code is put in a library the internals don't matter as long as the interface (function names + arguments) to the outside doesn't change. > You propose to follow the same way but pack the filevector code > in one file (.dll or .so) and distribute 9 packages from GenABEL with the > same library. Indeed. The problem with incorporating it all in DatABEL is that non-R packages like ProbABEL and OmicABEL depend on the stuff in the fvlib directory as well. Filevector is central to (almost) all packages in the GenABEL suite, which is why I proposed to make a library out of it. And, as noted above, this way packages can depend on different versions of the library. We can of course discuss whether we want to distribute this .so/.dll as a separate (operating system) package or within the R packages. To me the first option is the 'correct' one, but I see that this may impose on the user (except on Windows and maybe MacOS, where the .so/.dll is included in the R package). > > Last time I proposed to move filevector into DatABEL. All other packages > (GenA and so on) will load DatAB in R and use filevector functions from > DatA. When DatABEL is loaded through library(DatABEL), the file > DatABEL.so is loaded as well. I think this is what should be done with the DAlib directory (another symlinked dir). > Thus, you do not need to ask users to > install an additional library because it is in DatABEL already. I think this is > a workable approach that will allow us to delete the filevector code (or > filevector so/dll) from all the packages.
> > > Here is a quote from the R manual on how to register functions to make > them available from DatAB to GenAB: > > > _______________________________________________ > > > 5.4 Registering native routines > > By 'native' routine, we mean an entry point in compiled code. > > In calls to .C, .Call, .Fortran and .External, R must locate the > specified native routine by looking in the appropriate shared > object/DLL. By default, R uses the operating system-specific dynamic > loader to lookup the symbol in all loaded DLLs and elsewhere. > Alternatively, the author of the DLL can explicitly register routines > with R and use a single, platform-independent mechanism for finding the > routines in the DLL. One can use this registration mechanism to provide > additional information about a routine, including the number and type of > the arguments, and also make it available to R programmers under a > different name. In the future, registration may be used to implement a > form of "secure" or limited native access. > > _____________________________________________________ > Hmm, I will have to think about this. This seems to be about how R finds out in which DLL a function is found (and maybe where the DLL is found in the filesystem). I think this is separate from the point below, but I'm not sure. > > > Your argument was from "5.8 Linking to other packages: It is not in > general possible to link a DLL in package *packA* to a DLL provided by > package *packB*." I do not quite understand what they mean by 'link'. > Maybe they mean linking a library during installation? Yes, as far as I understand, they mean linking to a library during installation/compilation. Best, Lennart. > > > best, > Maksim > > > > On 19/11/2013 15:14, L.C. Karssen wrote: >> Hi Maksim, >> >> Good question... The idea is to generate a .dll file for Windows, but >> I'm not sure what would be the best way to distribute that. It would be >> interesting to see how other packages do that.
For example, the XML >> package depends on libxml2: >> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >> package depends on libcurl: >> http://cran.r-project.org/web/packages/RCurl/index.html >> >> In the XML package .zip file for Windows there is a directory libs/x64 >> and a directory libs/i386. Both contain XML.dll, so I think that for >> Linux you simply specify a dependency on a library, whereas for Windows >> the actual .dll is in the package (which is quite logical because >> Windows lacks the package repositories that most Linux distros have). >> It seems that for MacOS the .tgz file also contains a lib directory with >> the .so file. >> >> >> Best regards, >> >> Lennart. >> >> On 19-11-13 08:56, Maksim Struchalin wrote: >>> Hi Lennart, >>> >>> How the users under win will install such a library? >>> >>> best, >>> Maksim >>> >>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>> Dear all, >>>> >>>> The Jenkins setup already shows its value: After Maksim changed the call >>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>>> >>>> I guess this is one more reason to try to convert fvlib into a real >>>> (shared) library. >>>> Does anyone have another workable solution? >>>> >>>> >>>> Lennart. 
>>>> >>>> >>>> _______________________________________________ >>>> genabel-devel mailing list >>>> genabel-devel at lists.r-forge.r-project.org >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- ----------------------------------------------------------------- L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org Stuur mij aub geen Word of Powerpoint bestanden! Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html ------------------------------------------------------------------ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From m.v.struchalin at mail.ru Tue Nov 19 15:17:11 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Tue, 19 Nov 2013 21:17:11 +0700 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <528B2AF3.3030903@karssen.org> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> Message-ID: <528B72E7.4080506@mail.ru> Hi Lennart, I see you are improving your Russian :-). I understand your arguments. I think we can combine our two approaches. 
1) We make a so/dll from filevector and let it be used by ProbABEL/OmicABEL/Another_not_R_softABEL. 2) For GenABEL and other R packages, we build it into DatABEL. The code of filevector is the same both for 1) and 2). We only add preprocessor commands (#ifdef and so on) to surround R specific code (ISNAN() and std::isnan). In this case, the compiler itself chooses whether it builds the lib for R or for the OS. If we want to use only approach 1) for GenABEL in the future, we can quickly switch to it later. best, Maksim On 19/11/2013 16:10, L.C. Karssen wrote: > Hi ??????, > > (trying a Russian keyboard layout, no idea if this works...). > > On 19-11-13 09:44, Maksim Struchalin wrote: >> It seems that your solution is workable but I see little difference from >> what we have now. Now the filevector code is incorporated in each >> package. > This is what I would like to change, indeed. Code that is reused by so > many packages should not be copied/symlinked into the code tree of those > packages. By symlinking it as we have now, there is no proper way of > specifying a version number of the filevector code. Which, in turn means > that if something changes in the filevector code all other packages need > to be changed immediately (just like what happened with your latest > change). If the filevector code had been a proper library we could have > simply said that ProbABEL still depends on the old filevector version > and taken more time to make sure the two play nice together. > > Moreover, with the filevector code in a separate library the whole > isnan() issue would not be a problem. We could simply use std::isnan(), > because CRAN wouldn't need to compile the .so/.dll, so there would be no need for ISNAN(). > When code is put in a library the internals don't matter as long as the > interface (function names + arguments) to the outside doesn't change.
>
>> You propose to follow the same way but pack filevector code
>> in one file (dll or so) and distribute 9 packages from GenABEL with the
>> same library.
> Indeed. The problem with incorporating it all in DatABEL is that non-R
> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib
> directory as well. Filevector is central to (almost) all packages in the
> GenABEL suite, which is why I proposed to make a library out of it. And,
> as noted above, this way packages can depend on different versions of the
> library.
>
> We can of course discuss whether we want to distribute this .so/.dll as
> a separate (operating system) package or within the R packages. To me
> the first option is the 'correct' one, but I see that this may impose on
> the user (except on Windows and maybe MacOS, where the .so/.dll is
> included in the R package).
>
>
>> Last time I proposed to move filevector into DatABEL. All other packages
>> (GenA and so on) will load DatAB in R and use filevector functions from
>> DatA. When DatABEL is loaded through library(DatABEL), the file
>> DatABEL.so is loaded as well.
> I think this is what should be done with the DAlib directory (another
> symlinked dir).
>
>> Thus, you do not need to ask users to
>> install an additional lib because it is in DatABEL already. I think this is
>> a workable approach that will allow us to delete the filevector code (or
>> filevector so/dll) from all the packages.
>>
>>
>> This is a quote from the R manual on how to register functions to make
>> them available from DatAB to GenAB:
>>
>>
>> _______________________________________________
>>
>>
>> 5.4 Registering native routines
>>
>> By 'native' routine, we mean an entry point in compiled code.
>>
>> In calls to .C, .Call, .Fortran and .External, R must locate the
>> specified native routine by looking in the appropriate shared
>> object/DLL.
>> By default, R uses the operating system-specific dynamic
>> loader to lookup the symbol in all loaded DLLs and elsewhere.
>> Alternatively, the author of the DLL can explicitly register routines
>> with R and use a single, platform-independent mechanism for finding the
>> routines in the DLL. One can use this registration mechanism to provide
>> additional information about a routine, including the number and type of
>> the arguments, and also make it available to R programmers under a
>> different name. In the future, registration may be used to implement a
>> form of "secure" or limited native access.
>>
>> _____________________________________________________
>>
> Hmm, I will have to think about this. This seems to be about how R finds
> out in which DLL a function is found (and maybe where the DLL is found
> in the filesystem). I think this is separate from the point below, but
> I'm not sure.
>
>>
>> Your argument was from "5.8 Linking to other packages: It is not in
>> general possible to link a DLL in package packA to a DLL provided by
>> package packB". I do not quite understand what they mean by 'link'.
>> Maybe they mean linking a library during installation?
> Yes, as far as I understand, they mean linking to a library during
> installation/compilation.
>
>
> Best,
>
> Lennart.
>>
>> best,
>> Maksim
>>
>>
>>
>> On 19/11/2013 15:14, L.C. Karssen wrote:
>>> Hi Maksim,
>>>
>>> Good question... The idea is to generate a .dll file for Windows, but
>>> I'm not sure what would be the best way to distribute that. It would be
>>> interesting to see how other packages do that. For example, the XML
>>> package depends on libxml2:
>>> http://cran.r-project.org/web/packages/XML/index.html and the RCurl
>>> package depends on libcurl:
>>> http://cran.r-project.org/web/packages/RCurl/index.html
>>>
>>> In the XML package .zip file for Windows there is a directory libs/x64
>>> and a directory libs/i386.
>>> Both contain XML.dll, so I think that for
>>> Linux you simply specify a dependency on a library, whereas for Windows
>>> the actual .dll is in the package (which is quite logical because
>>> Windows lacks the package repositories that most Linux distros have).
>>> It seems that for MacOS the .tgz file also contains a lib directory with
>>> the .so file.
>>>
>>>
>>> Best regards,
>>>
>>> Lennart.
>>>
>>> On 19-11-13 08:56, Maksim Struchalin wrote:
>>>> Hi Lennart,
>>>>
>>>> How will users on Windows install such a library?
>>>>
>>>> best,
>>>> Maksim
>>>>
>>>> On 19/11/2013 14:46, L.C. Karssen wrote:
>>>>> Dear all,
>>>>>
>>>>> The Jenkins setup already shows its value: After Maksim changed the call
>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build
>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R macro,
>>>>> not standard C++).
>>>>>
>>>>> I guess this is one more reason to try to convert fvlib into a real
>>>>> (shared) library.
>>>>> Does anyone have another workable solution?
>>>>>
>>>>>
>>>>> Lennart.

From yurii.aulchenko at gmail.com Tue Nov 19 22:37:54 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Tue, 19 Nov 2013 22:37:54 +0100
Subject: [GenABEL-dev] [genabel-Bugs][5118] Number of genotypes is limited to (2^31-1)/4
In-Reply-To: <20131119210018.C6FB21861B6@r-forge.r-project.org>
References: <20131119210018.C6FB21861B6@r-forge.r-project.org>
Message-ID:

Interesting. I thought the fact that R>3.0 allows for very long vectors "automatically" fixes the bug reported below.
Y

On Nov 19, 2013, at 22:00 PM, wrote:

> Bugs item #5118, was opened at 2013-11-19 16:00 by Ge Zhang
> You can respond by visiting:
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5118&group_id=505
>
> Status: Open
> Priority: 3
> Submitted By: Ge Zhang (zhangge)
> Assigned to: Nobody (None)
> Summary: Number of genotypes is limited to (2^31-1)/4
> Resolution: None
> Operating System: All
> Severity: major
> Hardware: All
> Version: v1.7-6
> Component: GenABEL
> URL:
>
>
> Initial Comment:
> The current GenABEL can only handle (2^31-1)*4 genotypes, which corresponds to roughly 4.3 million SNPs in 2,000 samples. The problem is caused by the upper limit of vector length 2^31-1 in R/GenABEL. A lot of functions in GenABEL are influenced by this limitation.
>
> The development team might consider fixing this problem since most machines are running 64-bit systems and R has supported long vectors since version 3.0.
>
> ----------------------------------------------------------------------
>
> You can respond by visiting:
> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5118&group_id=505

From lennart at karssen.org Wed Nov 20 12:06:27 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Wed, 20 Nov 2013 12:06:27 +0100
Subject: [GenABEL-dev] [genabel-Bugs][5118] Number of genotypes is limited to (2^31-1)/4
In-Reply-To:
References: <20131119210018.C6FB21861B6@r-forge.r-project.org>
Message-ID: <528C97B3.60000@karssen.org>

I was surprised too. This bug is also discussed on the forum:
http://forum.genabel.org/viewtopic.php?f=6&t=838

Lennart.

On 11/19/2013 10:37 PM, Yury Aulchenko wrote:
> Interesting. I thought the fact that R>3.0 allows for very long vectors "automatically" fixes the bug reported below.
>
> Y
>
>
> On Nov 19, 2013, at 22:00 PM, wrote:
>
>> Bugs item #5118, was opened at 2013-11-19 16:00 by Ge Zhang
>> You can respond by visiting:
>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5118&group_id=505
>>
>> Status: Open
>> Priority: 3
>> Submitted By: Ge Zhang (zhangge)
>> Assigned to: Nobody (None)
>> Summary: Number of genotypes is limited to (2^31-1)/4
>> Resolution: None
>> Operating System: All
>> Severity: major
>> Hardware: All
>> Version: v1.7-6
>> Component: GenABEL
>> URL:
>>
>>
>> Initial Comment:
>> The current GenABEL can only handle (2^31-1)*4 genotypes, which corresponds to roughly 4.3 million SNPs in 2,000 samples. The problem is caused by the upper limit of vector length 2^31-1 in R/GenABEL. A lot of functions in GenABEL are influenced by this limitation.
>>
>> The development team might consider fixing this problem since most machines are running 64-bit systems and R has supported long vectors since version 3.0.
>>
>> ----------------------------------------------------------------------
>>
>> You can respond by visiting:
>> https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=5118&group_id=505

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
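The figures in bug #5118 can be checked with a few lines of arithmetic. The sketch below assumes, as the factor of 4 in the report suggests, that genotypes are packed four per byte into a vector of at most 2^31-1 elements, and that each SNP occupies a whole number of bytes; the constant and function names are illustrative only and are not GenABEL identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Maximum length of a classic (non-long) R vector: 2^31 - 1 elements.
constexpr std::int64_t kMaxRVectorLength = 2147483647LL;
static_assert(kMaxRVectorLength == (1LL << 31) - 1, "2^31 - 1");

// Assumption: four 2-bit genotypes are stored in every byte.
constexpr std::int64_t kGenotypesPerByte = 4;

// Upper bound on the number of genotypes that fit in one such vector.
constexpr std::int64_t max_genotypes() {
    return kMaxRVectorLength * kGenotypesPerByte;
}

// Number of SNPs that fit for a given (positive) sample count, assuming each
// SNP takes ceil(n_samples / 4) bytes.
constexpr std::int64_t max_snps(std::int64_t n_samples) {
    return kMaxRVectorLength /
           ((n_samples + kGenotypesPerByte - 1) / kGenotypesPerByte);
}
```

For 2,000 samples each SNP takes 500 bytes under these assumptions, so max_snps(2000) comes to 4,294,967, in line with the "4.3 million SNPs" mentioned in the report.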

From m.v.struchalin at mail.ru Fri Nov 22 04:44:37 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 22 Nov 2013 10:44:37 +0700
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To:
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org>
Message-ID: <528ED325.8070707@mail.ru>

Yes. Looks like it was a bad idea to use the plink R-plugin for converting plink files to *ABEL format.

Maksim

On 18/11/2013 18:48, Yury Aulchenko wrote:
> I would say that in principle DatABEL::text2databel is the "natural"
> way to go from text-files to DatABEL-files
>
> The problem is that 'regular' text input may be allele by allele, not
> genotype by genotype... (e.g. data are in format "A G", or "A/G", not
> "0" or "1" or "2").
>
> Y
>
> On Nov 15, 2013, at 17:48 PM, L.C. Karssen wrote:
>
>> Hi Maksim,
>>
>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>> An easy way to write a function for converting a plink format file to a
>>> GenABEL format file:
>>>
>>> Use plink's support for 'plug-in' functions
>>
>> Nice find. I didn't know that existed.
>>
>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This
>>> allows us
>>> to write a simple R script (myscript.R) which is called by plink (plink
>>> --file mydata --R myscript.R). plink reads the file mydata (which is in
>>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>>> script myscript.R. This script contains a function
>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
>>> variable) and store it in *flv format by calling DatABEL
>>> functions.
>>>
>>> The whole process of conversion will look like this:
>>>
>>> 1) User asks GenA to convert a plink file to a GenA file
>>> 2) GenA checks whether plink is installed.
>>> If it is not installed,
>>> then GenA goes to the plink site and downloads/installs it itself (using the R
>>> function "download.file" from the "utils" package)
>>> 3) GenA runs a simple line: system('plink --file mydata --R myscript.R')
>>> 4) The Rplink function (from myscript.R) gets every SNP and stores it in *flv
>>> format. This function creates an flv file and then opens and closes it for
>>> saving every single SNP.
>>> 5) Work is done
>>
>> I'm not sure how portable it is to download and run plink. Also, the
>> plink page says: Currently, there is only support for R-plugins for
>> Linux-based and Mac OS PLINK distributions.
>>
>>>
>>> The only issue is how fast the conversion will run: how much time does
>>> it take to open a filevector file, store one SNP and close it? I cannot
>>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>>> DatABEL function which can do it?
>>
>> Wouldn't it be easier/possible to use plink to export to text (.csv) and
>> then use filevector's txt2fvf binary (of course this could be done from
>> R using system())?
>>
>> I'm also wondering if going per SNP is really necessary. If I understand
>> it correctly the R script (myscript.R) has to have a function called:
>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>> where GENO is the matrix of genotypes. So we could write that into a
>> DatABEL file at once. Of course you may want to do this per chromosome
>> to reduce memory consumption (not sure how plink/R would handle large
>> data sets).
>>
>> I agree completely with Maarten that opening a filevector file for each
>> SNP will be an I/O killer.
>>
>>
>> Lennart.
>>
>>>
>>> best,
>>> Maksim
>>
>> --
>> -----------------------------------------------------------------
>> L.C.
Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>>
>> Please do not send me Word or PowerPoint files!
>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>> ------------------------------------------------------------------

From yurii.aulchenko at gmail.com Fri Nov 22 09:54:29 2013
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Fri, 22 Nov 2013 09:54:29 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <528ED325.8070707@mail.ru>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru>
Message-ID:

Too slow, too difficult for the user, or both? :)

On Friday, November 22, 2013, Maksim Struchalin wrote:

> Yes. Looks like it was a bad idea to use the plink R-plugin for converting
> plink files to *ABEL format.
> Maksim
>
> On 18/11/2013 18:48, Yury Aulchenko wrote:
>
> I would say that in principle DatABEL::text2databel is the "natural" way
> to go from text-files to DatABEL-files
>
> The problem is that 'regular' text input may be allele by allele, not
> genotype by genotype... (e.g. data are in format "A G", or "A/G", not "0"
> or "1" or "2").
>
> Y
>
> On Nov 15, 2013, at 17:48 PM, L.C.
Karssen wrote: > > Hi Maksim, > > On 15-11-13 05:53, Maksim Struchalin wrote: > > An easy way to write a function for conversion a plink format file to a > GenABEL format file: > > Use plink support of 'plug-in' functions > > > Nice find. I didn't know that existed. > > (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us > to write a simple R script (myscript.R) which is called by plink (plink > --file mydata --R myscript.R). plink reads the file mydata (which is in > plink format) and iteratively, SNP by SNP, trasfer all the data to a > script myscript.R. This script contains a function > Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO > variable) and store it in a *flv format through calling DatABEL functions. > > The whole process of conversion will look like this: > > 1) User asks GenA convert plink file to GenA file > 2) GenA looks weather the plink is installed. If it is not installed, > then GenA goes to a plink site and download/install it itself (use an R > function "download.file" from "utils" package) > 3) GenA run a simple line: system('plink --file mydata --R myscript.R') > 4) Rplink function (from myscript.R) gets every SNP and stote it in *flv > format. This function creates an flv file and then open and close it for > saving every single SNP. > 5) Work is Done > > > I'm not sure how portable it is to download and run plink. Also, the > plink page says: Currently, there is only support for R-plugins for > Linux-based and Mac OS PLINK distributions. > > > The only issue is how fast the converssion will run: how much time does > it take to open a filvector file, store one SNP and close it? I can not > find a DatABEL R function for adding SNP to a flv file. Is there a C > DatABEL function which can do it? > > > Wouldn't it be easier/possible to use plink to export to text (.csv) and > then use filevector's txt2fvf binary (of course this could be done from > R using system())? 
>
> I'm also wondering if going per SNP is really necessary. If I understand
> it correctly the R script (myscript.R) has to have a function called:
> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> where GENO is the matrix of genotypes. So we could write that into a
> DatABEL file at once. Of course you may want to do this per chromosome
> to reduce memory consumption (not sure how plink/R would handle large
> data sets).
>
>

--
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn ] [ Twitter ] [ Blog ]

From lennart at karssen.org Fri Nov 22 10:04:30 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 22 Nov 2013 10:04:30 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To:
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru>
Message-ID: <528F1E1E.6050300@karssen.org>

How difficult would it be to import .bed files [1] instead of the text
conversion? Given the binary data of both the .bed and the GenABEL
format, wouldn't conversion be much quicker?


Lennart.

[1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml


On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
> Too slow, too difficult for the user, or both? :)
>
> On Friday, November 22, 2013, Maksim Struchalin wrote:
>
> Yes. Looks like it was a bad idea to use the plink R-plugin for
> converting plink files to *ABEL format.
> Maksim
>
> On 18/11/2013 18:48, Yury Aulchenko wrote:
>> I would say that in principle DatABEL::text2databel is the
>> "natural" way to go from text-files to DatABEL-files
>>
>> The problem is that 'regular' text input may be allele by allele,
>> not genotype by genotype... (e.g. data are in format "A G", or
>> "A/G", not "0" or "1" or "2").
>>
>> Y
>>
>> On Nov 15, 2013, at 17:48 PM, L.C.
Karssen wrote:
>>
>>> Hi Maksim,
>>>
>>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>>> An easy way to write a function for converting a plink format
>>>> file to a
>>>> GenABEL format file:
>>>>
>>>> Use plink's support for 'plug-in' functions
>>>
>>> Nice find. I didn't know that existed.
>>>
>>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>> This allows us
>>>> to write a simple R script (myscript.R) which is called by plink
>>>> (plink
>>>> --file mydata --R myscript.R). plink reads the file mydata
>>>> (which is in
>>>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>>>> script myscript.R. This script contains a function
>>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
>>>> variable) and store it in *flv format by calling DatABEL
>>>> functions.
>>>>
>>>> The whole process of conversion will look like this:
>>>>
>>>> 1) User asks GenA to convert a plink file to a GenA file
>>>> 2) GenA checks whether plink is installed. If it is not
>>>> installed,
>>>> then GenA goes to the plink site and downloads/installs it itself
>>>> (using the R
>>>> function "download.file" from the "utils" package)
>>>> 3) GenA runs a simple line: system('plink --file mydata --R
>>>> myscript.R')
>>>> 4) The Rplink function (from myscript.R) gets every SNP and stores it
>>>> in *flv
>>>> format. This function creates an flv file and then opens and
>>>> closes it for
>>>> saving every single SNP.
>>>> 5) Work is done
>>>
>>> I'm not sure how portable it is to download and run plink. Also, the
>>> plink page says: Currently, there is only support for R-plugins for
>>> Linux-based and Mac OS PLINK distributions.
>>>
>>>>
>>>> The only issue is how fast the conversion will run: how much
>>>> time does
>>>> it take to open a filevector file, store one SNP and close it? I
>>>> cannot
>>>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>>>> DatABEL function which can do it?
>>>
>>> Wouldn't it be easier/possible to use plink to export to text
>>> (.csv) and
>>> then use filevector's txt2fvf binary (of course this could be
>>> done from
>>> R using system())?
>>>
>>> I'm also wondering if going per SNP is really necessary. If I
>>> understand
>>> it correctly the R script (myscript.R) has to have a function called:
>>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>> where GENO is the matrix of genotypes. So we could write that into a
>>> DatABEL file at once. Of course you may want to do this per
>>> chromosome
>>> to reduce memory consumption (not sure how plink/R would handle large
>>> data sets).
>>>
>
>
> --
> -----------------------------------------------------
> Yurii S. Aulchenko
>
> [ LinkedIn ] [ Twitter ] [ Blog ]

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.kooijman at erasmusmc.nl Fri Nov 22 10:54:16 2013
From: m.kooijman at erasmusmc.nl (Maarten Kooyman)
Date: Fri, 22 Nov 2013 10:54:16 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <528F1E1E.6050300@karssen.org>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org>
Message-ID: <528F29C8.40003@erasmusmc.nl>

There is also a function to read .bed files in the Bioconductor
snpStats package. This might be a good starting point.

see: search.bioconductor.jp/codes/6594

Maarten Kooyman

On 11/22/2013 10:04 AM, L.C. Karssen wrote:
> How difficult would it be to import .bed files [1] instead of the text
> conversion? Given the binary data of both the .bed and the GenABEL
> format, wouldn't conversion be much quicker?
>
>
> Lennart.
>
> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>
>
> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>> Too slow, too difficult for the user, or both? :)
>>
>> On Friday, November 22, 2013, Maksim Struchalin wrote:
>>
>> Yes. Looks like it was a bad idea to use the plink R-plugin for
>> converting plink files to *ABEL format.
>> Maksim
>>
>> On 18/11/2013 18:48, Yury Aulchenko wrote:
>>> I would say that in principle DatABEL::text2databel is the
>>> "natural" way to go from text-files to DatABEL-files
>>>
>>> The problem is that 'regular' text input may be allele by allele,
>>> not genotype by genotype... (e.g. data are in format "A G", or
>>> "A/G", not "0" or "1" or "2").
>>>
>>> Y
>>>
>>> On Nov 15, 2013, at 17:48 PM, L.C. Karssen wrote:
>>>
>>>> Hi Maksim,
>>>>
>>>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>>>> An easy way to write a function for converting a plink format
>>>>> file to a
>>>>> GenABEL format file:
>>>>>
>>>>> Use plink's support for 'plug-in' functions
>>>>
>>>> Nice find. I didn't know that existed.
>>>>
>>>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>>> This allows us
>>>>> to write a simple R script (myscript.R) which is called by plink
>>>>> (plink
>>>>> --file mydata --R myscript.R). plink reads the file mydata
>>>>> (which is in
>>>>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>>>>> script myscript.R. This script contains a function
>>>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
>>>>> variable) and store it in *flv format by calling DatABEL
>>>>> functions.
>>>>>
>>>>> The whole process of conversion will look like this:
>>>>>
>>>>> 1) User asks GenA to convert a plink file to a GenA file
>>>>> 2) GenA checks whether plink is installed. If it is not
>>>>> installed,
>>>>> then GenA goes to the plink site and downloads/installs it itself
>>>>> (using the R
>>>>> function "download.file" from the "utils" package)
>>>>> 3) GenA runs a simple line: system('plink --file mydata --R
>>>>> myscript.R')
>>>>> 4) The Rplink function (from myscript.R) gets every SNP and stores it
>>>>> in *flv
>>>>> format. This function creates an flv file and then opens and
>>>>> closes it for
>>>>> saving every single SNP.
>>>>> 5) Work is done
>>>>
>>>> I'm not sure how portable it is to download and run plink. Also, the
>>>> plink page says: Currently, there is only support for R-plugins for
>>>> Linux-based and Mac OS PLINK distributions.
>>>>
>>>>>
>>>>> The only issue is how fast the conversion will run: how much
>>>>> time does
>>>>> it take to open a filevector file, store one SNP and close it? I
>>>>> cannot
>>>>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>>>>> DatABEL function which can do it?
>>>>
>>>> Wouldn't it be easier/possible to use plink to export to text
>>>> (.csv) and
>>>> then use filevector's txt2fvf binary (of course this could be
>>>> done from
>>>> R using system())?
>>>>
>>>> I'm also wondering if going per SNP is really necessary. If I
>>>> understand
>>>> it correctly the R script (myscript.R) has to have a function called:
>>>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>>> where GENO is the matrix of genotypes. So we could write that into a
>>>> DatABEL file at once. Of course you may want to do this per
>>>> chromosome
>>>> to reduce memory consumption (not sure how plink/R would handle large
>>>> data sets).
>>>>
>>
>>
>> --
>> -----------------------------------------------------
>> Yurii S.
Aulchenko
>>
>> [ LinkedIn ] [ Twitter ] [ Blog ]

From lennart at karssen.org Fri Nov 22 11:08:16 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 22 Nov 2013 11:08:16 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <528F29C8.40003@erasmusmc.nl>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> <528F29C8.40003@erasmusmc.nl>
Message-ID: <528F2D10.8060303@karssen.org>

Thanks Maarten, that's a good find. It does seem to return the data
(incl. genotype data) in a list. I'm not sure how well that will work
RAM-wise for large data sets. On the other hand the function does allow
SNP selection, so maybe conversion could be done per chromosome.


Lennart.

On 11/22/2013 10:54 AM, Maarten Kooyman wrote:
> There is also a function to read .bed files in the Bioconductor
> snpStats package. This might be a good starting point.
>
> see: search.bioconductor.jp/codes/6594
>
> Maarten Kooyman
>
> On 11/22/2013 10:04 AM, L.C. Karssen wrote:
>> How difficult would it be to import .bed files [1] instead of the text
>> conversion? Given the binary data of both the .bed and the GenABEL
>> format, wouldn't conversion be much quicker?
>>
>>
>> Lennart.
>>
>> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>>
>>
>> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>>> Too slow, too difficult for the user, or both?
>>> :)
>>>
>>> On Friday, November 22, 2013, Maksim Struchalin wrote:
>>>
>>> Yes. Looks like it was a bad idea to use the plink R-plugin for
>>> converting plink files to *ABEL format.
>>> Maksim
>>>
>>> On 18/11/2013 18:48, Yury Aulchenko wrote:
>>>> I would say that in principle DatABEL::text2databel is the
>>>> "natural" way to go from text-files to DatABEL-files
>>>>
>>>> The problem is that 'regular' text input may be allele by allele,
>>>> not genotype by genotype... (e.g. data are in format "A G", or
>>>> "A/G", not "0" or "1" or "2").
>>>>
>>>> Y
>>>>
>>>> On Nov 15, 2013, at 17:48 PM, L.C. Karssen wrote:
>>>>
>>>>> Hi Maksim,
>>>>>
>>>>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>>>>> An easy way to write a function for converting a plink format
>>>>>> file to a
>>>>>> GenABEL format file:
>>>>>>
>>>>>> Use plink's support for 'plug-in' functions
>>>>>
>>>>> Nice find. I didn't know that existed.
>>>>>
>>>>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>>>> This allows us
>>>>>> to write a simple R script (myscript.R) which is called by plink
>>>>>> (plink
>>>>>> --file mydata --R myscript.R). plink reads the file mydata
>>>>>> (which is in
>>>>>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>>>>>> script myscript.R. This script contains a function
>>>>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
>>>>>> variable) and store it in *flv format by calling DatABEL
>>>>>> functions.
>>>>>>
>>>>>> The whole process of conversion will look like this:
>>>>>>
>>>>>> 1) User asks GenA to convert a plink file to a GenA file
>>>>>> 2) GenA checks whether plink is installed.
>>>>>> If it is not
>>>>>> installed,
>>>>>> then GenA goes to the plink site and downloads/installs it itself
>>>>>> (using the R
>>>>>> function "download.file" from the "utils" package)
>>>>>> 3) GenA runs a simple line: system('plink --file mydata --R
>>>>>> myscript.R')
>>>>>> 4) The Rplink function (from myscript.R) gets every SNP and stores it
>>>>>> in *flv
>>>>>> format. This function creates an flv file and then opens and
>>>>>> closes it for
>>>>>> saving every single SNP.
>>>>>> 5) Work is done
>>>>>
>>>>> I'm not sure how portable it is to download and run plink. Also, the
>>>>> plink page says: Currently, there is only support for R-plugins for
>>>>> Linux-based and Mac OS PLINK distributions.
>>>>>
>>>>>>
>>>>>> The only issue is how fast the conversion will run: how much
>>>>>> time does
>>>>>> it take to open a filevector file, store one SNP and close it? I
>>>>>> cannot
>>>>>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>>>>>> DatABEL function which can do it?
>>>>>
>>>>> Wouldn't it be easier/possible to use plink to export to text
>>>>> (.csv) and
>>>>> then use filevector's txt2fvf binary (of course this could be
>>>>> done from
>>>>> R using system())?
>>>>>
>>>>> I'm also wondering if going per SNP is really necessary. If I
>>>>> understand
>>>>> it correctly the R script (myscript.R) has to have a function called:
>>>>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>>>> where GENO is the matrix of genotypes. So we could write that into a
>>>>> DatABEL file at once. Of course you may want to do this per
>>>>> chromosome
>>>>> to reduce memory consumption (not sure how plink/R would handle large
>>>>> data sets).
>>>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Yurii S.
Aulchenko
>>>
>>> [ LinkedIn ] [ Twitter ] [ Blog ]
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.kooijman at erasmusmc.nl Fri Nov 22 11:23:13 2013
From: m.kooijman at erasmusmc.nl (Maarten Kooyman)
Date: Fri, 22 Nov 2013 11:23:13 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <528F2D10.8060303@karssen.org>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> <528F29C8.40003@erasmusmc.nl> <528F2D10.8060303@karssen.org>
Message-ID: <528F3091.6030200@erasmusmc.nl>

snpStats does have a GPL-3 licence, so we can use the code and do whatever we want with it as long as we keep it under the same licence and give credit to the owner/writer of the code.

Is it possible to adjust the snpStats code to dump the genotypes of the bed file directly into DatABEL format? This sounds to me like the fastest (in terms of CPU time) option.
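As a rough illustration of what reading genotypes straight from a .bed file involves, here is a minimal C++ sketch. It assumes the SNP-major PLINK 1 binary layout (three header bytes 0x6c 0x1b 0x01, then two bits per genotype, four genotypes per byte, lowest bits first). The function names and the 0/1/2 coding with -1 for missing are illustrative choices, not snpStats or DatABEL code, and the step that would write the decoded values into a DatABEL/filevector file is deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical marker for a missing genotype in the decoded output.
const int kMissingGenotype = -1;

// Decode one 2-bit PLINK code into the number of copies of the second allele.
// Assumed coding (PLINK 1 .bed): 0 = homozygous first allele, 1 = missing,
// 2 = heterozygous, 3 = homozygous second allele.
int decode_bed_code(unsigned code) {
    switch (code & 0x3u) {
        case 0: return 0;
        case 2: return 1;
        case 3: return 2;
        default: return kMissingGenotype;
    }
}

// True if the buffer starts with the magic number of a SNP-major .bed file.
bool is_snp_major_bed(const std::vector<std::uint8_t>& bytes) {
    return bytes.size() >= 3 && bytes[0] == 0x6c && bytes[1] == 0x1b &&
           bytes[2] == 0x01;
}

// Decode all genotypes of one SNP (0-based index) for n_samples individuals.
std::vector<int> decode_bed_snp(const std::vector<std::uint8_t>& bytes,
                                std::size_t snp, std::size_t n_samples) {
    // Each SNP occupies a whole number of bytes, 4 genotypes per byte.
    const std::size_t bytes_per_snp = (n_samples + 3) / 4;
    const std::size_t start = 3 + snp * bytes_per_snp;
    if (!is_snp_major_bed(bytes) || start + bytes_per_snp > bytes.size()) {
        throw std::runtime_error("not a SNP-major .bed buffer or SNP out of range");
    }
    std::vector<int> genotypes(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        const unsigned byte = bytes[start + i / 4];
        // The first individual sits in the two lowest bits of the byte.
        genotypes[i] = decode_bed_code(byte >> (2 * (i % 4)));
    }
    return genotypes;
}
```

Decoding one SNP at a time keeps memory use proportional to the number of individuals, which fits the per-SNP (or per-chromosome) conversion discussed in this thread.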
Maarten Kooyman

On 11/22/2013 11:08 AM, L.C. Karssen wrote:
> Thanks Maarten, that's a good finding. It does seem to return the data
> (incl. genotype data) in a list. I'm not sure how well that will work
> RAM-wise for large data sets. On the other hand the function does allow
> SNP selection, so maybe conversion could be done per chromosome.
>

From nicola.pirastu at burlo.trieste.it Fri Nov 22 11:34:09 2013
From: nicola.pirastu at burlo.trieste.it (Nicola Pirastu)
Date: Fri, 22 Nov 2013 11:34:09 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <528F3091.6030200@erasmusmc.nl>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> <528F29C8.40003@erasmusmc.nl> <528F2D10.8060303@karssen.org> <528F3091.6030200@erasmusmc.nl>
Message-ID: <631BE31E-8E3B-4E6D-9ACF-74147E3B7103@burlo.trieste.it>

Just to add my two cents: there is also read.plink {snpMatrix}, which returns a simple matrix and might require less RAM.

Nicola

Dr. Nicola Pirastu PhD
Research Fellow
Medical Sciences, Chirurgical and Health Department
University of Trieste
Medical Genetics
IRCCS Burlo Garofolo
Via dell'Istria 65/1
34137 Italy
tel. +390403785539

On 22 Nov 2013, at 11:23, Maarten Kooyman wrote:

snpStats does have a GPL-3 licence, so we can use the code and do whatever we want with it as long as we keep it under the same licence and give credit to the owner/writer of the code.

Is it possible to adjust the snpStats code to dump the genotypes of the bed file directly into DatABEL format? This sounds to me like the fastest (in terms of CPU time) option.

Maarten Kooyman

On 11/22/2013 11:08 AM, L.C. Karssen wrote:
> Thanks Maarten, that's a good finding. It does seem to return the data
> (incl. genotype data) in a list.
I'm not sure how well that will work
> RAM-wise for large data sets. On the other hand the function does allow
> SNP selection, so maybe conversion could be done per chromosome.
>

From lennart at karssen.org Fri Nov 22 15:34:32 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 22 Nov 2013 15:34:32 +0100
Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins
In-Reply-To: <528B72E7.4080506@mail.ru>
References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru>
Message-ID: <528F6B78.20004@karssen.org>

Hi Максим,

On 11/19/2013 03:17 PM, Maksim Struchalin wrote:
> Hi Lennart,
>
> I see you are improving your Russian :-).

Getting to know the Russian alphabet is step one :-).

>
> I understand your arguments. I think we can combine our two approaches.
> 1) We make a so/dll from filevector and let it be used by
> ProbABEL/OmicABEL/Another_not_R_softABEL.
> 2) For GenABEL and other R packages, we make a DatABEL.
>
> The code of filevector is the same both for 1) and 2).

But that doesn't solve the problem of having symlinks to the fvlib directory in our SVN tree... Which means that any update to filevector can make the depending package (DatABEL) become uncompilable.

In the meantime I've taken the first steps towards 'libfilevector' in SVN, see commits 1415 and 1416. This works (at least for ProbABEL), but more polishing is needed.

> We only add
> preprocessor commands (#ifdef and so on) to surround R specific code
> (ISNAN() and std::isnan). In this case the compiler itself chooses whether
> it builds the lib for R or for the OS.
>
> If we want to use only approach 1) for GenABEL in the future, we
> can quickly switch to it later.

True, for now this will work.

Best,

Lennart.

>
> best,
> Maksim
>
>
>
> On 19/11/2013 16:10, L.C. Karssen wrote:
>> Hi Максим,
>>
>> (trying a Russian keyboard layout, no idea if this works...).
>>
>> On 19-11-13 09:44, Maksim Struchalin wrote:
>>> It seems that your solution is workable but I see little difference with
>>> what it is now. Now the filevector code is incorporated in each
>>> package.
>> This is what I would like to change, indeed. Code that is reused by so
>> many packages should not be copied/symlinked into the code tree of those
>> packages. By symlinking it as we have now, there is no proper way of
>> specifying a version number of the filevector code. Which, in turn, means
>> that if something changes in the filevector code all other packages need
>> to be changed immediately (just like what happened with your latest
>> change). If the filevector code had been a proper library we could have
>> simply said that ProbABEL still depends on the old filevector version
>> and taken more time to make sure the two play nice together.
>> >> Moreover, with the filevector code in a separate library the whole >> isnan() issue would not be a problem. We could simply use std::isnan(), >> because CRAN wouldn't need to compile the .so/.dll, so no need of ISNAN(). >> When code is put in a library the internals don't matter as long as the >> interface (function names + arguments) to the outside doesn't change. >> >>> You propose to follow the same way but pack filelvector code >>> in one file (dll or so) and distribute 9 packages form GenABEL with the >>> same library. >> Indeed. The problem with incorporating it all in DatABEL is that non-R >> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >> directory as well. Filevector is central to (almost) all packages in the >> GenABEL suite, which is why I proposed to make a library out of it. And, >> as noted above, this way packages can depend on different version of the >> library. >> >> We can of course discuss whether we want to distribute this .so/.dll as >> a separate (operating system) package or withing the R packages. To me >> the first option is the 'correct' one, but I see that this may impose on >> the user (except on Windows and maybe MacOS, where the .so/.dll is >> included in the R package). >> >> >>> Last time I proposed to move filevector in DatABEL. All other packages >>> (GenA and so on) will load DatAB in R and use filevector fucntions from >>> DatA. When DatABEL is loaded through library(DatABEL), the file >>> DatABEL.so is loaded as well. >> I think this is what should be done with the DAlib directory (another >> symlinked dir). >> >>> Thus, you do not need to ask users to >>> install additional lib because it is in DatABEL already. I think this is >>> a workable approach that will allow us to delete the filevector code (or >>> filevector so/dll) from all the packages. 
>>> >>> >>> This is some quote from the R manual how to register functions to make >>> it available from DatAB to GenAB: >>> >>> >>> _______________________________________________ >>> >>> >>> 5.4 Registering native routines >>> >>> By ?native? routine, we mean an entry point in compiled code. >>> >>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must locate the >>> specified native routine by looking in the appropriate shared >>> object/DLL. By default, R uses the operating system-specific dynamic >>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>> Alternatively, the author of the DLL can explicitly register routines >>> with R and use a single, platform-independent mechanism for finding the >>> routines in the DLL. One can use this registration mechanism to provide >>> additional information about a routine, including the number and type of >>> the arguments, and also make it available to R programmers under a >>> different name. In the future, registration may be used to implement a >>> form of ?secure? or limited native access. >>> >>> _____________________________________________________ >>> >> Hmm, I will have to think about this. This seems to be about how R finds >> out in which DLL a function is found (and maybe where the DLL is found >> in the filesystem). I think this is separate from the point below, but >> I'm not sure. >> >>> >>> Your argument was from "5.8 Linking to other packages: It is not in >>> general possible to link a DLL in package *packA* to a DLL provided by >>> package *packB". *I do not quite understand what they mean under 'link'. >>> May be the mean link a library during intsalltion? >> Yes, as far as I understand, they mean linking to a library during >> installation/compilation. >> >> >> Best, >> >> Lennart. >>> >>> best, >>> Maksim >>> >>> >>> >>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>> Hi Maksim, >>>> >>>> Good question... 
The idea is to generate a .dll file for Windows, but >>>> I'm not sure what would be the best way to distribute that. It would be >>>> interesting to see how other packages do that. For example, the XML >>>> package depends on libxml2: >>>> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >>>> package depends on libcurl: >>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>> >>>> In the XML package .zip file for Windows there is a directory libs/x64 >>>> and a directory libs/i386. Both contain XML.dll, so I think that for >>>> Linux you simply specify a dependency on a library, whereas for Windows >>>> the actual .dll is in the package (which is quite logical because >>>> Windows lacks the package repositories that most Linux distros have). >>>> It seems that for MacOS the .tgz file also contains a lib directory with >>>> the .so file. >>>> >>>> >>>> Best regards, >>>> >>>> Lennart. >>>> >>>> On 19-11-13 08:56, Maksim Struchalin wrote: >>>>> Hi Lennart, >>>>> >>>>> How the users under win will install such a library? >>>>> >>>>> best, >>>>> Maksim >>>>> >>>>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>>>> Dear all, >>>>>> >>>>>> The Jenkins setup already shows its value: After Maksim changed the call >>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>>>>> >>>>>> I guess this is one more reason to try to convert fvlib into a real >>>>>> (shared) library. >>>>>> Does anyone have another workable solution? >>>>>> >>>>>> >>>>>> Lennart. 
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Fri Nov 22 17:51:53 2013 From: yurii.aulchenko at gmail.com (Yurii Aulchenko) Date: Fri, 22 Nov 2013 17:51:53 +0100 Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file In-Reply-To: <528F1E1E.6050300@karssen.org> References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> Message-ID: Great idea I know nothing of plink bin format, but many packages make use of it, so it should be not that complicated. Also plink is gnu GPL if I remember correctly so we can use the code if needed Y On Friday, November 22, 2013, L.C. Karssen wrote: > How difficult would it be to import .bed files [1] instead of the text > conversion? Given the binary data of both the .bed and the GenABEL > format, wouldn't conversion be much quicker? > > > Lennart. > > [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml > > > On 11/22/2013 09:54 AM, Yurii Aulchenko wrote: > > Too slow, too difficult for the user, or both? :) > > > > On Friday, November 22, 2013, Maksim Struchalin wrote: > > > > Yes. Looks like it was a bad idea to use plink R-plugin for > > converting plink files to *ABEL format. > > Maksim > > > > On 18/11/2013 18:48, Yury Aulchenko wrote: > >> I would say that in principle DatABEL::text2databel is the > >> "natural" way to go from text-files to DatABEL-files > >> > >> The problem is that 'regular' text input may be allele by allele, > >> not genotype by genotype... (e.g. data are in format "A G", or > >> "A/G", not "0" or "1" or "2"). > >> > >> Y > >> > >> On Nov 15, 2013, at 17:48 PM, L.C. 
Karssen > > > >> wrote: > >> > >>> Hi Maksim, > >>> > >>> On 15-11-13 05:53, Maksim Struchalin wrote: > >>>> An easy way to write a function for conversion a plink format > >>>> file to a > >>>> GenABEL format file: > >>>> > >>>> Use plink support of 'plug-in' functions > >>> > >>> Nice find. I didn't know that existed. > >>> > >>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml > >>>> ). > >>>> This allows us > >>>> to write a simple R script (myscript.R) which is called by plink > >>>> (plink > >>>> --file mydata --R myscript.R). plink reads the file mydata > >>>> (which is in > >>>> plink format) and iteratively, SNP by SNP, trasfer all the data > to a > >>>> script myscript.R. This script contains a function > >>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO > >>>> variable) and store it in a *flv format through calling DatABEL > >>>> functions. > >>>> > >>>> The whole process of conversion will look like this: > >>>> > >>>> 1) User asks GenA convert plink file to GenA file > >>>> 2) GenA looks weather the plink is installed. If it is not > >>>> installed, > >>>> then GenA goes to a plink site and download/install it itself > >>>> (use an R > >>>> function "download.file" from "utils" package) > >>>> 3) GenA run a simple line: system('plink --file mydata --R > >>>> myscript.R') > >>>> 4) Rplink function (from myscript.R) gets every SNP and stote it > >>>> in *flv > >>>> format. This function creates an flv file and then open and > >>>> close it for > >>>> saving every single SNP. > >>>> 5) Work is Done > >>> > >>> I'm not sure how portable it is to download and run plink. Also, > the > >>> plink page says: Currently, there is only support for R-plugins for > >>> Linux-based and Mac OS PLINK distributions. > >>> > >>>> > >>>> The only issue is how fast the converssion will run: how much > >>>> time does > >>>> it take to open a filvector file, store one SNP and close it? 
I > >>>> can not > >>>> find a DatABEL R function for adding SNP to a flv file. Is there > a C > >>>> DatABEL function which can do it? > >>> > >>> Wouldn't it be easier/possible to use plink to export to text > >>> (.csv) and > >>> then use filevector's txt2fvf binary (of course this could be > >>> done from > >>> R using system())? > >>> > >>> I'm also wondering if going per SNP is really necessary. If I > >>> understand > >>> it correctly the R script (myscript.R) has to have a function > called: > >>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR) > >>> where GENO is the matrix of genotypes. So we could write that into > a > >>> DatABEL file at once. Of course you may want to do this per > >>> chromosome > >>> to reduce memory consumption (not sure how plink/R would handle > large > >>> data sets). > >>> > > > > > > -- > > ----------------------------------------------------- > > Yurii S. Aulchenko > > > > [ LinkedIn ] [ Twitter > > ] [ Blog > > ] > > > > > > > > _______________________________________________ > > genabel-devel mailing list > > genabel-devel at lists.r-forge.r-project.org > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > -- ----------------------------------------------------- Yurii S. Aulchenko [ LinkedIn ] [ Twitter] [ Blog ] -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From m.v.struchalin at mail.ru Sat Nov 23 15:12:56 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Sat, 23 Nov 2013 21:12:56 +0700
Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins
In-Reply-To: <528F6B78.20004@karssen.org>
References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru> <528F6B78.20004@karssen.org>
Message-ID: <5290B7E8.7060901@mail.ru>

Hi Lennart,

https://app.box.com/s/iy41ug5qg4sbylul9oyn

This is an example demonstrating how a GenABEL function "impute2databel" calls a function "iteratorDA" from DatABEL. Here, GenABEL is compiled without the flv and iterator code (I deleted it from src).

Could you run the test?
0) Download the file test_GenABEL_iterator.tar.gz from https://app.box.com/s/iy41ug5qg4sbylul9oyn
1) decompress test_GenABEL_iterator.tar.gz
2) cd test_GenABEL_iterator
3) R CMD INSTALL DatABEL_0.9-4.tar.gz
4) R CMD INSTALL GenABEL_1.7-7.tar.gz
5) run test.R

It works on my Ubuntu. If it works on your Ubuntu, win and mac, then we can delete the symlinks to flv and databel from GenABEL.

best,
Максим

On 22/11/2013 21:34, L.C. Karssen wrote:
> Hi Максим,
>
>
> On 11/19/2013 03:17 PM, Maksim Struchalin wrote:
>> Hi Lennart,
>>
>> I see you are improving your Russian :-).
> Getting to know the Russian alphabet is step one :-).
>
>> I understand your arguments. I think we can combine our two approaches.
>> 1) We make a so/dll from filevector and let it be used by
>> ProbABEL/OmicABEL/Another_not_R_softABEL.
>> 2) For GenABEL and other R packages, we make a DatABEL.
>>
>> The code of filevector is the same both for 1) and 2).
> But that doesn't solve the problem of having symlinks to the fvlib
> directory in our SVN tree... Which means that any update to filevector
> can make the depending package (DatABEL) become uncompilable.
> > In the mean time I've set the first steps towards 'libfilevector' in > SVN, see commits 1415 and 1416. This works (at least for ProbABEL), but > more polishing is needed. > > >> We only add >> preprocessor commands (#ifdef and so on) to surround R specific code >> (ISNAN() and std::isnan). In this case, compiler choose itself weather >> it buids the lib for R or for OS. >> >> If we will want to use only approach 1) for GenABEL in the future, we >> can quickly swith to it later. > True, for now this will work. > > > Best, > > Lennart. > >> best, >> Maksim >> >> >> >> On 19/11/2013 16:10, L.C. Karssen wrote: >>> Hi ??????, >>> >>> (trying a Russian keyboard layout, no idea if this works...). >>> >>> On 19-11-13 09:44, Maksim Struchalin wrote: >>>> It seems that your solution is workable but I see little difference with >>>> what it is now. Now the filevector code is incorporated in each >>>> packages. >>> This is what I would like to change, indeed. Code that is reused by so >>> many packages should not be copied/symlinked into the code tree of those >>> packages. By symlinking it as we have now, there is no proper way of >>> specifying a version number of the filevector code. Which, in turn means >>> that if something changes in the filevector code all other packages need >>> to be changed immediately (just like what happened with your latest >>> change). If the filevector code have been a proper library we could have >>> simply said that ProbABEL still depends on the old filevector version >>> and take more time to make sure the two play nice together. >>> >>> Moreover, with the filevector code in a separate library the whole >>> isnan() issue would not be a problem. We could simply use std::isnan(), >>> because CRAN wouldn't need to compile the .so/.dll, so no need of ISNAN(). >>> When code is put in a library the internals don't matter as long as the >>> interface (function names + arguments) to the outside doesn't change. 
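The preprocessor guard mentioned in this exchange could look roughly like the following C++ sketch. The macro name FV_BUILD_FOR_R and the helper fv_is_nan are hypothetical names chosen for illustration, not identifiers from the filevector sources; the only facts assumed are that R's headers provide the ISNAN() macro and that the C++ standard library provides std::isnan().

```cpp
#include <cmath>

// FV_BUILD_FOR_R is a hypothetical macro: define it (e.g. with -DFV_BUILD_FOR_R)
// only when the code is compiled as part of an R package.
#ifdef FV_BUILD_FOR_R
#include <R.h>  // provides the ISNAN() macro
#endif

// Single entry point for NaN checks, so the rest of the code never calls
// ISNAN() or std::isnan() directly.
inline bool fv_is_nan(double x) {
#ifdef FV_BUILD_FOR_R
    return ISNAN(x) != 0;   // R build: use the macro from R's headers
#else
    return std::isnan(x);   // stand-alone build (e.g. for ProbABEL)
#endif
}
```

With a wrapper like this, a stand-alone build needs no R headers at all, while the R build still goes through ISNAN(); which of the two paths is taken is decided at compile time by the build system.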
>>> >>>> You propose to follow the same way but pack filelvector code >>>> in one file (dll or so) and distribute 9 packages form GenABEL with the >>>> same library. >>> Indeed. The problem with incorporating it all in DatABEL is that non-R >>> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >>> directory as well. Filevector is central to (almost) all packages in the >>> GenABEL suite, which is why I proposed to make a library out of it. And, >>> as noted above, this way packages can depend on different version of the >>> library. >>> >>> We can of course discuss whether we want to distribute this .so/.dll as >>> a separate (operating system) package or withing the R packages. To me >>> the first option is the 'correct' one, but I see that this may impose on >>> the user (except on Windows and maybe MacOS, where the .so/.dll is >>> included in the R package). >>> >>> >>>> Last time I proposed to move filevector in DatABEL. All other packages >>>> (GenA and so on) will load DatAB in R and use filevector fucntions from >>>> DatA. When DatABEL is loaded through library(DatABEL), the file >>>> DatABEL.so is loaded as well. >>> I think this is what should be done with the DAlib directory (another >>> symlinked dir). >>> >>>> Thus, you do not need to ask users to >>>> install additional lib because it is in DatABEL already. I think this is >>>> a workable approach that will allow us to delete the filevector code (or >>>> filevector so/dll) from all the packages. >>>> >>>> >>>> This is some quote from the R manual how to register functions to make >>>> it available from DatAB to GenAB: >>>> >>>> >>>> _______________________________________________ >>>> >>>> >>>> 5.4 Registering native routines >>>> >>>> By 'native' routine, we mean an entry point in compiled code. >>>> >>>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must locate the >>>> specified native routine by looking in the appropriate shared >>>> object/DLL. 
By default, R uses the operating system-specific dynamic >>>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>>> Alternatively, the author of the DLL can explicitly register routines >>>> with R and use a single, platform-independent mechanism for finding the >>>> routines in the DLL. One can use this registration mechanism to provide >>>> additional information about a routine, including the number and type of >>>> the arguments, and also make it available to R programmers under a >>>> different name. In the future, registration may be used to implement a >>>> form of "secure" or limited native access. >>>> >>>> _____________________________________________________ >>>> >>> Hmm, I will have to think about this. This seems to be about how R finds >>> out in which DLL a function is found (and maybe where the DLL is found >>> in the filesystem). I think this is separate from the point below, but >>> I'm not sure. >>> >>>> Your argument was from "5.8 Linking to other packages: It is not in >>>> general possible to link a DLL in package *packA* to a DLL provided by >>>> package *packB". *I do not quite understand what they mean under 'link'. >>>> May be the mean link a library during intsalltion? >>> Yes, as far as I understand, they mean linking to a library during >>> installation/compilation. >>> >>> >>> Best, >>> >>> Lennart. >>>> best, >>>> Maksim >>>> >>>> >>>> >>>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>>> Hi Maksim, >>>>> >>>>> Good question... The idea is to generate a .dll file for Windows, but >>>>> I'm not sure what would be the best way to distribute that. It would be >>>>> interesting to see how other packages do that. 
For example, the XML >>>>> package depends on libxml2: >>>>> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >>>>> package depends on libcurl: >>>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>>> >>>>> In the XML package .zip file for Windows there is a directory libs/x64 >>>>> and a directory libs/i386. Both contain XML.dll, so I think that for >>>>> Linux you simply specify a dependency on a library, whereas for Windows >>>>> the actual .dll is in the package (which is quite logical because >>>>> Windows lacks the package repositories that most Linux distros have). >>>>> It seems that for MacOS the .tgz file also contains a lib directory with >>>>> the .so file. >>>>> >>>>> >>>>> Best regards, >>>>> >>>>> Lennart. >>>>> >>>>> On 19-11-13 08:56, Maksim Struchalin wrote: >>>>>> Hi Lennart, >>>>>> >>>>>> How the users under win will install such a library? >>>>>> >>>>>> best, >>>>>> Maksim >>>>>> >>>>>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>>>>> Dear all, >>>>>>> >>>>>>> The Jenkins setup already shows its value: After Maksim changed the call >>>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>>>>>> >>>>>>> I guess this is one more reason to try to convert fvlib into a real >>>>>>> (shared) library. >>>>>>> Does anyone have another workable solution? >>>>>>> >>>>>>> >>>>>>> Lennart. 
From lennart at karssen.org Mon Nov 25 11:01:10 2013
From: lennart at karssen.org (L.C.
Karssen)
Date: Mon, 25 Nov 2013 11:01:10 +0100
Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins
In-Reply-To: <5290B7E8.7060901@mail.ru>
References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru> <528F6B78.20004@karssen.org> <5290B7E8.7060901@mail.ru>
Message-ID: <52931FE6.7020509@karssen.org>

Hi Maksim,

I've run the tests you described below on my Ubuntu installation and they work. I only changed the names of the output files so that I could compare them with the ones you created. The .fvd files are the same, but the .fvi files are not. Maybe the .fvi file contains a date/time?

$ md5sum *dose.fv?
5de29707c2b2a09aa7e07569f35a565d  TEST10x15_T.geno.dose.fvd
91c2b36a71bdfabe5a568d1c1c90a308  TEST10x15_T.geno.dose.fvi
5de29707c2b2a09aa7e07569f35a565d  TEST10x15_T.geno.lck.dose.fvd
e811a8c5184bd2c7cb64b1b3bdde7827  TEST10x15_T.geno.lck.dose.fvi

Let me know if there is something else I can test.

Best,

Lennart.

On 11/23/2013 03:12 PM, Maksim Struchalin wrote:
> Hi Lennart,
>
> https://app.box.com/s/iy41ug5qg4sbylul9oyn
>
> This is an example demonstrating how a GenABEL function "impute2databel"
> calls a function "iteratorDA" from DatABEL. Here, GenABEL is compiled
> without the flv and iterator code (I deleted it from src).
>
> Could you run the test?
> 0) Download the file test_GenABEL_iterator.tar.gz from
> https://app.box.com/s/iy41ug5qg4sbylul9oyn
> 1) decompress test_GenABEL_iterator.tar.gz
> 2) cd test_GenABEL_iterator
> 3) R CMD INSTALL DatABEL_0.9-4.tar.gz
> 4) R CMD INSTALL GenABEL_1.7-7.tar.gz
> 5) run test.R
>
> It works on my Ubuntu. If it works on your Ubuntu, win and mac, then we
> can delete the symlinks to flv and databel from GenABEL.
>
> best,
> Максим
>
>
> On 22/11/2013 21:34, L.C.
Karssen wrote: >> Hi ??????, >> >> >> On 11/19/2013 03:17 PM, Maksim Struchalin wrote: >>> Hi Lennart, >>> >>> I see you are improving your Russian :-). >> Getting to know the Russian alphabet is step one :-). >> >>> I understand your arguments. I think we can combine our two approaches. >>> 1) We make a so/dll from filevector and let it use by >>> ProbABEL/OmicABEL/Another_not_R_softABEL. >>> 2) For GenABEL and other R packages, we make a DatABEL. >>> >>> The code of filevector is the same both for 1) and 2). >> But that doesn't solve the problem of having symlinks to the fvlib >> directory in our SVN tree... Which means that any update to filevector >> can make the depending package (DatABEL) become uncompilable. >> >> In the mean time I've set the first steps towards 'libfilevector' in >> SVN, see commits 1415 and 1416. This works (at least for ProbABEL), but >> more polishing is needed. >> >> >>> We only add >>> preprocessor commands (#ifdef and so on) to surround R specific code >>> (ISNAN() and std::isnan). In this case, compiler choose itself weather >>> it buids the lib for R or for OS. >>> >>> If we will want to use only approach 1) for GenABEL in the future, we >>> can quickly swith to it later. >> True, for now this will work. >> >> >> Best, >> >> Lennart. >> >>> best, >>> Maksim >>> >>> >>> >>> On 19/11/2013 16:10, L.C. Karssen wrote: >>>> Hi ??????, >>>> >>>> (trying a Russian keyboard layout, no idea if this works...). >>>> >>>> On 19-11-13 09:44, Maksim Struchalin wrote: >>>>> It seems that your solution is workable but I see little difference with >>>>> what it is now. Now the filevector code is incorporated in each >>>>> packages. >>>> This is what I would like to change, indeed. Code that is reused by so >>>> many packages should not be copied/symlinked into the code tree of those >>>> packages. By symlinking it as we have now, there is no proper way of >>>> specifying a version number of the filevector code. 
Which, in turn means >>>> that if something changes in the filevector code all other packages need >>>> to be changed immediately (just like what happened with your latest >>>> change). If the filevector code have been a proper library we could have >>>> simply said that ProbABEL still depends on the old filevector version >>>> and take more time to make sure the two play nice together. >>>> >>>> Moreover, with the filevector code in a separate library the whole >>>> isnan() issue would not be a problem. We could simply use std::isnan(), >>>> because CRAN wouldn't need to compile the .so/.dll, so no need of ISNAN(). >>>> When code is put in a library the internals don't matter as long as the >>>> interface (function names + arguments) to the outside doesn't change. >>>> >>>>> You propose to follow the same way but pack filelvector code >>>>> in one file (dll or so) and distribute 9 packages form GenABEL with the >>>>> same library. >>>> Indeed. The problem with incorporating it all in DatABEL is that non-R >>>> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >>>> directory as well. Filevector is central to (almost) all packages in the >>>> GenABEL suite, which is why I proposed to make a library out of it. And, >>>> as noted above, this way packages can depend on different version of the >>>> library. >>>> >>>> We can of course discuss whether we want to distribute this .so/.dll as >>>> a separate (operating system) package or withing the R packages. To me >>>> the first option is the 'correct' one, but I see that this may impose on >>>> the user (except on Windows and maybe MacOS, where the .so/.dll is >>>> included in the R package). >>>> >>>> >>>>> Last time I proposed to move filevector in DatABEL. All other packages >>>>> (GenA and so on) will load DatAB in R and use filevector fucntions from >>>>> DatA. When DatABEL is loaded through library(DatABEL), the file >>>>> DatABEL.so is loaded as well. 
>>>> I think this is what should be done with the DAlib directory (another >>>> symlinked dir). >>>> >>>>> Thus, you do not need to ask users to >>>>> install additional lib because it is in DatABEL already. I think this is >>>>> a workable approach that will allow us to delete the filevector code (or >>>>> filevector so/dll) from all the packages. >>>>> >>>>> >>>>> This is some quote from the R manual how to register functions to make >>>>> it available from DatAB to GenAB: >>>>> >>>>> >>>>> _______________________________________________ >>>>> >>>>> >>>>> 5.4 Registering native routines >>>>> >>>>> By ?native? routine, we mean an entry point in compiled code. >>>>> >>>>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must locate the >>>>> specified native routine by looking in the appropriate shared >>>>> object/DLL. By default, R uses the operating system-specific dynamic >>>>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>>>> Alternatively, the author of the DLL can explicitly register routines >>>>> with R and use a single, platform-independent mechanism for finding the >>>>> routines in the DLL. One can use this registration mechanism to provide >>>>> additional information about a routine, including the number and type of >>>>> the arguments, and also make it available to R programmers under a >>>>> different name. In the future, registration may be used to implement a >>>>> form of ?secure? or limited native access. >>>>> >>>>> _____________________________________________________ >>>>> >>>> Hmm, I will have to think about this. This seems to be about how R finds >>>> out in which DLL a function is found (and maybe where the DLL is found >>>> in the filesystem). I think this is separate from the point below, but >>>> I'm not sure. >>>> >>>>> Your argument was from "5.8 Linking to other packages: It is not in >>>>> general possible to link a DLL in package *packA* to a DLL provided by >>>>> package *packB". 
*I do not quite understand what they mean under 'link'. >>>>> May be the mean link a library during intsalltion? >>>> Yes, as far as I understand, they mean linking to a library during >>>> installation/compilation. >>>> >>>> >>>> Best, >>>> >>>> Lennart. >>>>> best, >>>>> Maksim >>>>> >>>>> >>>>> >>>>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>>>> Hi Maksim, >>>>>> >>>>>> Good question... The idea is to generate a .dll file for Windows, but >>>>>> I'm not sure what would be the best way to distribute that. It would be >>>>>> interesting to see how other packages do that. For example, the XML >>>>>> package depends on libxml2: >>>>>> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >>>>>> package depends on libcurl: >>>>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>>>> >>>>>> In the XML package .zip file for Windows there is a directory libs/x64 >>>>>> and a directory libs/i386. Both contain XML.dll, so I think that for >>>>>> Linux you simply specify a dependency on a library, whereas for Windows >>>>>> the actual .dll is in the package (which is quite logical because >>>>>> Windows lacks the package repositories that most Linux distros have). >>>>>> It seems that for MacOS the .tgz file also contains a lib directory with >>>>>> the .so file. >>>>>> >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Lennart. >>>>>> >>>>>> On 19-11-13 08:56, Maksim Struchalin wrote: >>>>>>> Hi Lennart, >>>>>>> >>>>>>> How the users under win will install such a library? >>>>>>> >>>>>>> best, >>>>>>> Maksim >>>>>>> >>>>>>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>>>>>> Dear all, >>>>>>>> >>>>>>>> The Jenkins setup already shows its value: After Maksim changed the call >>>>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>>>>>>> >>>>>>>> I guess this is one more reason to try to convert fvlib into a real >>>>>>>> (shared) library. 
>>>>>>>> Does anyone have another workable solution? >>>>>>>> >>>>>>>> >>>>>>>> Lennart. >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> genabel-devel mailing list >>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>> _______________________________________________ >>>>>>> genabel-devel mailing list >>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>> >>>>>> _______________________________________________ >>>>>> genabel-devel mailing list >>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>> >>>>> _______________________________________________ >>>>> genabel-devel mailing list >>>>> genabel-devel at lists.r-forge.r-project.org >>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>> >>>> >>>> _______________________________________________ >>>> genabel-devel mailing list >>>> genabel-devel at lists.r-forge.r-project.org >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. 
Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From yurii.aulchenko at gmail.com Mon Nov 25 12:26:55 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Mon, 25 Nov 2013 12:26:55 +0100
Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins
In-Reply-To: <5290B7E8.7060901@mail.ru>
References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru>
 <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru>
 <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru>
 <528F6B78.20004@karssen.org> <5290B7E8.7060901@mail.ru>
Message-ID: <2C15F0F0-C9DA-4F65-A879-5961F7EBECC6@gmail.com>

Tried this on Mac OS X, see below

On Nov 23, 2013, at 15:12 PM, Maksim Struchalin wrote:

> Hi Lennart,
>
> https://app.box.com/s/iy41ug5qg4sbylul9oyn
>
> This is an example demonstrating how a GenABEL function "impute2databel"
> calls a function "iteratorDA" from DatABEL. Here, GenABEL is compiled
> without flv's and iterator's code (I deleted it from src).
>
> Could you run the test?
> 0) Download the file test_GenABEL_iterator.tar.gz from
> https://app.box.com/s/iy41ug5qg4sbylul9oyn
> 1) decompress test_GenABEL_iterator.tar.gz
> 2) cd test_GenABEL_iterator
> 3) R CMD INSTALL DatABEL_0.9-4.tar.gz
> 4) R CMD INSTALL GenABEL_1.7-7.tar.gz

getting many warnings at step (4)

> 5) run test.R

getting

> x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", out="TEST10x15_T.geno", makeprob=FALSE, old=TRUE)
Loading required package: DatABEL
DatABEL v.0.9-4 (March 12, 2013) loaded

Options in effect:
--infile = TEST10x15.geno
--outfile = ./tmp333314
--skiprows = OFF
--skipcols = 5
--cnrow = OFF
--rncol = ON, using column 2 of 'TEST10x15.geno'
--transpose = ON
--Rmatrix = OFF
--nanString = NA
Number of lines in source file is 10
Number of words in source file is 20
skiprows = 0
cnrow = 0
skipcols = 5
rncol = 2
Rmatrix = 0
numWords = 20
Creating file with numRows = 10
Creating file with numColumns = 15
text2fvf finished.
File 'TEST10x15_T.geno.dose' already exists.
ERROR in Rstuff:failed in ini_empty_FileMatrix_RError in !result : invalid argument type
Calls: impute2databel -> apply2dfo -> make_empty_fvf
In addition: Warning message:
In uninames(.Object@data) :
  uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names
Execution halted

> It works on my Ubuntu. If it works on your Ubuntu, win and mac, then we
> can delete from GenABEL the symlinks to flv and databel.
>
> best,
> Максим

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lennart at karssen.org Mon Nov 25 13:23:59 2013
From: lennart at karssen.org (L.C.
Karssen) Date: Mon, 25 Nov 2013 13:23:59 +0100 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <2C15F0F0-C9DA-4F65-A879-5961F7EBECC6@gmail.com> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru> <528F6B78.20004@karssen.org> <5290B7E8.7060901@mail.ru> <2C15F0F0-C9DA-4F65-A879-5961F7EBECC6@gmail.com> Message-ID: <5293415F.1090201@karssen.org> I had the same issues as Yurii reported: - compilation warnings when installing - the 'file already exists' warning: this was one of the reasons why I renamed the filename in the 'out=' option in test.R - I also got the warning about not-unique column names, but I assumed that wasn't the issue we were trying to fix here, so I didn't mention it. Yurii, does the script finish when you change the output file name in test.R? Lennart. On 11/25/2013 12:26 PM, Yury Aulchenko wrote: > Tried this on Mac OS X, see below > > On Nov 23, 2013, at 15:12 PM, Maksim Struchalin > wrote: > >> Hi Lennart, >> >> https://app.box.com/s/iy41ug5qg4sbylul9oyn >> >> This is an example demonstrating how a GenABEL function >> "impute2databel" calls a function "iteratorDA" from DatABEL. Here, >> GenABEL is compiled without flv's and iterator's code (I deleted it >> from src). 
>> >> Could you run the test?: >> 0) Dowload the file test_GenABEL_iterator.tar.gz from >> https://app.box.com/s/iy41ug5qg4sbylul9oyn >> 1) decompress test_GenABEL_iterator.tar.gz >> 2) cd test_GenABEL_iterator >> 3) R CMD INSTALL DatABEL_0.9-4.tar.gz >> 4) R CMD INSTALL GenABEL_1.7-7.tar.gz > > getting many warnings at step (4) > >> 5) run test.R > > getting > >> x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", > out="TEST10x15_T.geno", makeprob=FALSE, old=TRUE) > Loading required package: DatABEL > DatABEL v.0.9-4 (March 12, 2013) loaded > > Options in effect: > --infile = TEST10x15.geno > --outfile = ./tmp333314 > --skiprows = OFF > --skipcols = 5 > --cnrow = OFF > --rncol = ON, using column 2 of 'TEST10x15.geno' > --transpose = ON > --Rmatrix = OFF > --nanString = NA > Number of lines in source file is 10 > Number of words in source file is 20 > skiprows = 0 > cnrow = 0 > skipcols = 5 > rncol = 2 > Rmatrix = 0 > numWords = 20 > Creating file with numRows = 10 > Creating file with numColumns = 15 > text2fvf finished. > File 'TEST10x15_T.geno.dose' already exists. > ERROR in Rstuff:failed in ini_empty_FileMatrix_RError in !result : > invalid argument type > Calls: impute2databel -> apply2dfo -> make_empty_fvf > In addition: Warning message: > In uninames(.Object at data) : > uninames: some column names are not unique; use > set_dimnames/get_dimnames for non-unique row/col names > Execution halted > > >> >> It works on my Ubuntu. If it works on your Ubuntu, win and mac, then >> we can delete from GenABEL the simlinks to flv and databel. >> >> best, >> ?????? >> >> >> On 22/11/2013 21:34, L.C. Karssen wrote: >>> Hi ??????, >>> >>> >>> On 11/19/2013 03:17 PM, Maksim Struchalin wrote: >>>> Hi Lennart, >>>> >>>> I see you are improving your Russian :-). >>> Getting to know the Russian alphabet is step one :-). >>> >>>> I understand your arguments. I think we can combine our two approaches. 
>>>> 1) We make a so/dll from filevector and let it use by >>>> ProbABEL/OmicABEL/Another_not_R_softABEL. >>>> 2) For GenABEL and other R packages, we make a DatABEL. >>>> >>>> The code of filevector is the same both for 1) and 2). >>> But that doesn't solve the problem of having symlinks to the fvlib >>> directory in our SVN tree... Which means that any update to filevector >>> can make the depending package (DatABEL) become uncompilable. >>> >>> In the mean time I've set the first steps towards 'libfilevector' in >>> SVN, see commits 1415 and 1416. This works (at least for ProbABEL), but >>> more polishing is needed. >>> >>> >>>> We only add >>>> preprocessor commands (#ifdef and so on) to surround R specific code >>>> (ISNAN() and std::isnan). In this case, compiler choose itself weather >>>> it buids the lib for R or for OS. >>>> >>>> If we will want to use only approach 1) for GenABEL in the future, we >>>> can quickly swith to it later. >>> True, for now this will work. >>> >>> >>> Best, >>> >>> Lennart. >>> >>>> best, >>>> Maksim >>>> >>>> >>>> >>>> On 19/11/2013 16:10, L.C. Karssen wrote: >>>>> Hi ??????, >>>>> >>>>> (trying a Russian keyboard layout, no idea if this works...). >>>>> >>>>> On 19-11-13 09:44, Maksim Struchalin wrote: >>>>>> It seems that your solution is workable but I see little difference with >>>>>> what it is now. Now the filevector code is incorporated in each >>>>>> packages. >>>>> This is what I would like to change, indeed. Code that is reused by so >>>>> many packages should not be copied/symlinked into the code tree of those >>>>> packages. By symlinking it as we have now, there is no proper way of >>>>> specifying a version number of the filevector code. Which, in turn means >>>>> that if something changes in the filevector code all other packages need >>>>> to be changed immediately (just like what happened with your latest >>>>> change). 
If the filevector code have been a proper library we could have >>>>> simply said that ProbABEL still depends on the old filevector version >>>>> and take more time to make sure the two play nice together. >>>>> >>>>> Moreover, with the filevector code in a separate library the whole >>>>> isnan() issue would not be a problem. We could simply use std::isnan(), >>>>> because CRAN wouldn't need to compile the .so/.dll, so no need of ISNAN(). >>>>> When code is put in a library the internals don't matter as long as the >>>>> interface (function names + arguments) to the outside doesn't change. >>>>> >>>>>> You propose to follow the same way but pack filelvector code >>>>>> in one file (dll or so) and distribute 9 packages form GenABEL with the >>>>>> same library. >>>>> Indeed. The problem with incorporating it all in DatABEL is that non-R >>>>> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >>>>> directory as well. Filevector is central to (almost) all packages in the >>>>> GenABEL suite, which is why I proposed to make a library out of it. And, >>>>> as noted above, this way packages can depend on different version of the >>>>> library. >>>>> >>>>> We can of course discuss whether we want to distribute this .so/.dll as >>>>> a separate (operating system) package or withing the R packages. To me >>>>> the first option is the 'correct' one, but I see that this may impose on >>>>> the user (except on Windows and maybe MacOS, where the .so/.dll is >>>>> included in the R package). >>>>> >>>>> >>>>>> Last time I proposed to move filevector in DatABEL. All other packages >>>>>> (GenA and so on) will load DatAB in R and use filevector fucntions from >>>>>> DatA. When DatABEL is loaded through library(DatABEL), the file >>>>>> DatABEL.so is loaded as well. >>>>> I think this is what should be done with the DAlib directory (another >>>>> symlinked dir). 
>>>>> >>>>>> Thus, you do not need to ask users to >>>>>> install additional lib because it is in DatABEL already. I think this is >>>>>> a workable approach that will allow us to delete the filevector code (or >>>>>> filevector so/dll) from all the packages. >>>>>> >>>>>> >>>>>> This is some quote from the R manual how to register functions to make >>>>>> it available from DatAB to GenAB: >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> >>>>>> >>>>>> 5.4 Registering native routines >>>>>> >>>>>> By ?native? routine, we mean an entry point in compiled code. >>>>>> >>>>>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must locate the >>>>>> specified native routine by looking in the appropriate shared >>>>>> object/DLL. By default, R uses the operating system-specific dynamic >>>>>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>>>>> Alternatively, the author of the DLL can explicitly register routines >>>>>> with R and use a single, platform-independent mechanism for finding the >>>>>> routines in the DLL. One can use this registration mechanism to provide >>>>>> additional information about a routine, including the number and type of >>>>>> the arguments, and also make it available to R programmers under a >>>>>> different name. In the future, registration may be used to implement a >>>>>> form of ?secure? or limited native access. >>>>>> >>>>>> _____________________________________________________ >>>>>> >>>>> Hmm, I will have to think about this. This seems to be about how R finds >>>>> out in which DLL a function is found (and maybe where the DLL is found >>>>> in the filesystem). I think this is separate from the point below, but >>>>> I'm not sure. >>>>> >>>>>> Your argument was from "5.8 Linking to other packages: It is not in >>>>>> general possible to link a DLL in package *packA* to a DLL provided by >>>>>> package *packB". *I do not quite understand what they mean under 'link'. 
>>>>>> May be the mean link a library during intsalltion? >>>>> Yes, as far as I understand, they mean linking to a library during >>>>> installation/compilation. >>>>> >>>>> >>>>> Best, >>>>> >>>>> Lennart. >>>>>> best, >>>>>> Maksim >>>>>> >>>>>> >>>>>> >>>>>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>>>>> Hi Maksim, >>>>>>> >>>>>>> Good question... The idea is to generate a .dll file for Windows, but >>>>>>> I'm not sure what would be the best way to distribute that. It would be >>>>>>> interesting to see how other packages do that. For example, the XML >>>>>>> package depends on libxml2: >>>>>>> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >>>>>>> package depends on libcurl: >>>>>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>>>>> >>>>>>> In the XML package .zip file for Windows there is a directory libs/x64 >>>>>>> and a directory libs/i386. Both contain XML.dll, so I think that for >>>>>>> Linux you simply specify a dependency on a library, whereas for Windows >>>>>>> the actual .dll is in the package (which is quite logical because >>>>>>> Windows lacks the package repositories that most Linux distros have). >>>>>>> It seems that for MacOS the .tgz file also contains a lib directory with >>>>>>> the .so file. >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Lennart. >>>>>>> >>>>>>> On 19-11-13 08:56, Maksim Struchalin wrote: >>>>>>>> Hi Lennart, >>>>>>>> >>>>>>>> How the users under win will install such a library? >>>>>>>> >>>>>>>> best, >>>>>>>> Maksim >>>>>>>> >>>>>>>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>>>>>>> Dear all, >>>>>>>>> >>>>>>>>> The Jenkins setup already shows its value: After Maksim changed the call >>>>>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>>>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). >>>>>>>>> >>>>>>>>> I guess this is one more reason to try to convert fvlib into a real >>>>>>>>> (shared) library. 
>>>>>>>>> Does anyone have another workable solution? >>>>>>>>> >>>>>>>>> >>>>>>>>> Lennart. >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> genabel-devel mailing list >>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > _______________________________________________ > genabel-devel
mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen Utrecht The Netherlands lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: OpenPGP digital signature URL: From yurii.aulchenko at gmail.com Mon Nov 25 13:28:53 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Mon, 25 Nov 2013 13:28:53 +0100 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: <5293415F.1090201@karssen.org> References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru> <528F6B78.20004@karssen.org> <5290B7E8.7060901@mail.ru> <2C15F0F0-C9DA-4F65-A879-5961F7EBECC6@gmail.com> <5293415F.1090201@karssen.org> Message-ID: after renaming out-file-name I get (finished with warnings): > x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", out="TEST10x15_T1.geno", makeprob=FALSE, old=TRUE) Loading required package: DatABEL DatABEL v.0.9-4 (March 12, 2013) loaded Options in effect: --infile = TEST10x15.geno --outfile = ./tmp300070 --skiprows = OFF --skipcols = 5 --cnrow = OFF --rncol = ON, using column 2 of 'TEST10x15.geno' --transpose = ON --Rmatrix = OFF --nanString = NA Number of lines in source file is 10 Number of words in source file is 20 skiprows = 0 cnrow = 0 skipcols = 5 rncol = 2 Rmatrix = 0 numWords = 20 Creating file with numRows = 10 Creating file with numColumns = 15 text2fvf finished. Loss of precision / loss of data during conversion from DOUBLE to FLOAT. Futher conversion warnings omitted. 
Read 4 items Read 4 items Read 4 items Read 20 items Warning messages: 1: In uninames(.Object at data) : uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names 2: In uninames(x at data) : uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names 3: In uninames(x at data) : uninames: some column names are not unique; use set_dimnames/get_dimnames for non-unique row/col names > On Nov 25, 2013, at 13:23 PM, L.C. Karssen wrote: > I had the same issues as Yurii reported: > - compilation warnings when installing > - the 'file already exists' warning: this was one of the reasons why I > renamed the filename in the 'out=' option in test.R > - I also got the warning about not-unique column names, but I assumed > that wasn't the issue we were trying to fix here, so I didn't mention it. > > Yurii, does the script finish when you change the output file name in > test.R? > > > > Lennart. > > On 11/25/2013 12:26 PM, Yury Aulchenko wrote: >> Tried this on Mac OS X, see below >> >> On Nov 23, 2013, at 15:12 PM, Maksim Struchalin > > wrote: >> >>> Hi Lennart, >>> >>> https://app.box.com/s/iy41ug5qg4sbylul9oyn >>> >>> This is an example demonstrating how a GenABEL function >>> "impute2databel" calls a function "iteratorDA" from DatABEL. Here, >>> GenABEL is compiled without flv's and iterator's code (I deleted it >>> from src). 
>>> >>> Could you run the test?: >>> 0) Dowload the file test_GenABEL_iterator.tar.gz from >>> https://app.box.com/s/iy41ug5qg4sbylul9oyn >>> 1) decompress test_GenABEL_iterator.tar.gz >>> 2) cd test_GenABEL_iterator >>> 3) R CMD INSTALL DatABEL_0.9-4.tar.gz >>> 4) R CMD INSTALL GenABEL_1.7-7.tar.gz >> >> getting many warnings at step (4) >> >>> 5) run test.R >> >> getting >> >>> x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", >> out="TEST10x15_T.geno", makeprob=FALSE, old=TRUE) >> Loading required package: DatABEL >> DatABEL v.0.9-4 (March 12, 2013) loaded >> >> Options in effect: >> --infile = TEST10x15.geno >> --outfile = ./tmp333314 >> --skiprows = OFF >> --skipcols = 5 >> --cnrow = OFF >> --rncol = ON, using column 2 of 'TEST10x15.geno' >> --transpose = ON >> --Rmatrix = OFF >> --nanString = NA >> Number of lines in source file is 10 >> Number of words in source file is 20 >> skiprows = 0 >> cnrow = 0 >> skipcols = 5 >> rncol = 2 >> Rmatrix = 0 >> numWords = 20 >> Creating file with numRows = 10 >> Creating file with numColumns = 15 >> text2fvf finished. >> File 'TEST10x15_T.geno.dose' already exists. >> ERROR in Rstuff:failed in ini_empty_FileMatrix_RError in !result : >> invalid argument type >> Calls: impute2databel -> apply2dfo -> make_empty_fvf >> In addition: Warning message: >> In uninames(.Object at data) : >> uninames: some column names are not unique; use >> set_dimnames/get_dimnames for non-unique row/col names >> Execution halted >> >> >>> >>> It works on my Ubuntu. If it works on your Ubuntu, win and mac, then >>> we can delete from GenABEL the simlinks to flv and databel. >>> >>> best, >>> ?????? >>> >>> >>> On 22/11/2013 21:34, L.C. Karssen wrote: >>>> Hi ??????, >>>> >>>> >>>> On 11/19/2013 03:17 PM, Maksim Struchalin wrote: >>>>> Hi Lennart, >>>>> >>>>> I see you are improving your Russian :-). >>>> Getting to know the Russian alphabet is step one :-). >>>> >>>>> I understand your arguments. 
I think we can combine our two approaches. >>>>> 1) We make a so/dll from filevector and let it use by >>>>> ProbABEL/OmicABEL/Another_not_R_softABEL. >>>>> 2) For GenABEL and other R packages, we make a DatABEL. >>>>> >>>>> The code of filevector is the same both for 1) and 2). >>>> But that doesn't solve the problem of having symlinks to the fvlib >>>> directory in our SVN tree... Which means that any update to filevector >>>> can make the depending package (DatABEL) become uncompilable. >>>> >>>> In the mean time I've set the first steps towards 'libfilevector' in >>>> SVN, see commits 1415 and 1416. This works (at least for ProbABEL), but >>>> more polishing is needed. >>>> >>>> >>>>> We only add >>>>> preprocessor commands (#ifdef and so on) to surround R specific code >>>>> (ISNAN() and std::isnan). In this case, compiler choose itself weather >>>>> it buids the lib for R or for OS. >>>>> >>>>> If we will want to use only approach 1) for GenABEL in the future, we >>>>> can quickly swith to it later. >>>> True, for now this will work. >>>> >>>> >>>> Best, >>>> >>>> Lennart. >>>> >>>>> best, >>>>> Maksim >>>>> >>>>> >>>>> >>>>> On 19/11/2013 16:10, L.C. Karssen wrote: >>>>>> Hi ??????, >>>>>> >>>>>> (trying a Russian keyboard layout, no idea if this works...). >>>>>> >>>>>> On 19-11-13 09:44, Maksim Struchalin wrote: >>>>>>> It seems that your solution is workable but I see little difference with >>>>>>> what it is now. Now the filevector code is incorporated in each >>>>>>> packages. >>>>>> This is what I would like to change, indeed. Code that is reused by so >>>>>> many packages should not be copied/symlinked into the code tree of those >>>>>> packages. By symlinking it as we have now, there is no proper way of >>>>>> specifying a version number of the filevector code. Which, in turn means >>>>>> that if something changes in the filevector code all other packages need >>>>>> to be changed immediately (just like what happened with your latest >>>>>> change). 
If the filevector code have been a proper library we could have >>>>>> simply said that ProbABEL still depends on the old filevector version >>>>>> and take more time to make sure the two play nice together. >>>>>> >>>>>> Moreover, with the filevector code in a separate library the whole >>>>>> isnan() issue would not be a problem. We could simply use std::isnan(), >>>>>> because CRAN wouldn't need to compile the .so/.dll, so no need of ISNAN(). >>>>>> When code is put in a library the internals don't matter as long as the >>>>>> interface (function names + arguments) to the outside doesn't change. >>>>>> >>>>>>> You propose to follow the same way but pack filelvector code >>>>>>> in one file (dll or so) and distribute 9 packages form GenABEL with the >>>>>>> same library. >>>>>> Indeed. The problem with incorporating it all in DatABEL is that non-R >>>>>> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >>>>>> directory as well. Filevector is central to (almost) all packages in the >>>>>> GenABEL suite, which is why I proposed to make a library out of it. And, >>>>>> as noted above, this way packages can depend on different version of the >>>>>> library. >>>>>> >>>>>> We can of course discuss whether we want to distribute this .so/.dll as >>>>>> a separate (operating system) package or withing the R packages. To me >>>>>> the first option is the 'correct' one, but I see that this may impose on >>>>>> the user (except on Windows and maybe MacOS, where the .so/.dll is >>>>>> included in the R package). >>>>>> >>>>>> >>>>>>> Last time I proposed to move filevector in DatABEL. All other packages >>>>>>> (GenA and so on) will load DatAB in R and use filevector fucntions from >>>>>>> DatA. When DatABEL is loaded through library(DatABEL), the file >>>>>>> DatABEL.so is loaded as well. >>>>>> I think this is what should be done with the DAlib directory (another >>>>>> symlinked dir). 
>>>>>> >>>>>>> Thus, you do not need to ask users to >>>>>>> install additional lib because it is in DatABEL already. I think this is >>>>>>> a workable approach that will allow us to delete the filevector code (or >>>>>>> filevector so/dll) from all the packages. >>>>>>> >>>>>>> >>>>>>> This is some quote from the R manual how to register functions to make >>>>>>> it available from DatAB to GenAB: >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> >>>>>>> >>>>>>> 5.4 Registering native routines >>>>>>> >>>>>>> By 'native' routine, we mean an entry point in compiled code. >>>>>>> >>>>>>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must locate the >>>>>>> specified native routine by looking in the appropriate shared >>>>>>> object/DLL. By default, R uses the operating system-specific dynamic >>>>>>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>>>>>> Alternatively, the author of the DLL can explicitly register routines >>>>>>> with R and use a single, platform-independent mechanism for finding the >>>>>>> routines in the DLL. One can use this registration mechanism to provide >>>>>>> additional information about a routine, including the number and type of >>>>>>> the arguments, and also make it available to R programmers under a >>>>>>> different name. In the future, registration may be used to implement a >>>>>>> form of 'secure' or limited native access. >>>>>>> >>>>>>> _____________________________________________________ >>>>>>> >>>>>> Hmm, I will have to think about this. This seems to be about how R finds >>>>>> out in which DLL a function is found (and maybe where the DLL is found >>>>>> in the filesystem). I think this is separate from the point below, but >>>>>> I'm not sure. >>>>>> >>>>>>> Your argument was from "5.8 Linking to other packages: It is not in >>>>>>> general possible to link a DLL in package *packA* to a DLL provided by >>>>>>> package *packB".
*I do not quite understand what they mean under 'link'. >>>>>>> May be the mean link a library during intsalltion? >>>>>> Yes, as far as I understand, they mean linking to a library during >>>>>> installation/compilation. >>>>>> >>>>>> >>>>>> Best, >>>>>> >>>>>> Lennart. >>>>>>> best, >>>>>>> Maksim >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>>>>>> Hi Maksim, >>>>>>>> >>>>>>>> Good question... The idea is to generate a .dll file for Windows, but >>>>>>>> I'm not sure what would be the best way to distribute that. It would be >>>>>>>> interesting to see how other packages do that. For example, the XML >>>>>>>> package depends on libxml2: >>>>>>>> http://cran.r-project.org/web/packages/XML/index.html and the Rcurl >>>>>>>> package depends on libcurl: >>>>>>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>>>>>> >>>>>>>> In the XML package .zip file for Windows there is a directory libs/x64 >>>>>>>> and a directory libs/i386. Both contain XML.dll, so I think that for >>>>>>>> Linux you simply specify a dependency on a library, whereas for Windows >>>>>>>> the actual .dll is in the package (which is quite logical because >>>>>>>> Windows lacks the package repositories that most Linux distros have). >>>>>>>> It seems that for MacOS the .tgz file also contains a lib directory with >>>>>>>> the .so file. >>>>>>>> >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> Lennart. >>>>>>>> >>>>>>>> On 19-11-13 08:56, Maksim Struchalin wrote: >>>>>>>>> Hi Lennart, >>>>>>>>> >>>>>>>>> How the users under win will install such a library? >>>>>>>>> >>>>>>>>> best, >>>>>>>>> Maksim >>>>>>>>> >>>>>>>>> On 19/11/2013 14:46, L.C. Karssen wrote: >>>>>>>>>> Dear all, >>>>>>>>>> >>>>>>>>>> The Jenkins setup already shows its value: After Maksim changed the call >>>>>>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp an automatic build >>>>>>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function). 
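The build failure quoted above comes from ISNAN() being a macro supplied by R's headers (R_ext/Arith.h), so it is not available when fvlib is compiled into a standalone program such as ProbABEL. A minimal sketch of the "#ifdef" idea suggested elsewhere in this thread follows. The macro name FV_BUILD_FOR_R and the helper fv_isnan() are assumptions made for illustration only; they are not taken from the filevector sources.

```cpp
#include <cmath>

// FV_BUILD_FOR_R is a hypothetical macro (an assumption, not from fvlib).
// An R package build would define it, e.g. through PKG_CPPFLAGS in Makevars;
// a standalone build (ProbABEL, OmicABEL) would leave it undefined.
#ifdef FV_BUILD_FOR_R
#include <R_ext/Arith.h>            // declares the ISNAN() macro
#define FV_ISNAN(x) ISNAN(x)
#else
#define FV_ISNAN(x) std::isnan(x)   // plain C++ path, no R headers needed
#endif

// Hypothetical helper: the only place where the NaN check is selected,
// so the rest of the library never mentions ISNAN() or std::isnan() directly.
inline bool fv_isnan(double value) {
    return FV_ISNAN(value) != 0;
}
```

With a wrapper of this kind, code such as CastUtils.cpp would call fv_isnan() and the same source file could be built both inside an R package and as part of a non-R program. Whether this is preferable to a separate shared library is the design question discussed in the thread.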
>>>>>>>>>> >>>>>>>>>> I guess this is one more reason to try to convert fvlib into a real >>>>>>>>>> (shared) library. >>>>>>>>>> Does anyone have another workable solution? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Lennart. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> genabel-devel mailing list >>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at
lists.r-forge.r-project.org >>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.v.struchalin at mail.ru Mon Nov 25 15:39:21 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Mon, 25 Nov 2013 21:39:21 +0700 Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file In-Reply-To: References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> Message-ID: <52936119.5070704@mail.ru> I checked read.plink from snpMatrix (Nicola) and snpStats (Maarten). The code behind them is quite simple (~40 lines of C code behind snpMatrix's read.plink). The plink bed format is very similar to the GenABEL format (http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml). It looks like the main difference between them is that the plink bed file starts with 3 bytes that have a special meaning (a two-byte magic number followed by one byte that gives the SNP-major or individual-major mode). The remaining bytes store genotypes (0, 1, 2 or NA) at 2 bits per genotype (like in GenA). I think it would be easy just to write a C function which converts bed to databel format. Also, we can think about making bed a format that is natively supported by genabel.
For this, we only need a function which extract an array from bed and make iterator to use this function. best, Maksim On 22/11/2013 23:51, Yurii Aulchenko wrote: > Great idea > > I know nothing of plink bin format, but many packages make use of it, > so it should be not that complicated. Also plink is gnu GPL if I > remember correctly so we can use the code if needed > > Y > > On Friday, November 22, 2013, L.C. Karssen wrote: > > How difficult would it be to import .bed files [1] instead of the text > conversion? Given the binary data of both the .bed and the GenABEL > format, wouldn't conversion be much quicker? > > > Lennart. > > [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml > > > > On 11/22/2013 09:54 AM, Yurii Aulchenko wrote: > > Too slow, too difficult for the user, or both? :) > > > > On Friday, November 22, 2013, Maksim Struchalin wrote: > > > > Yes. Looks like it was a bad idea to use plink R-plugin for > > converting plink files to *ABEL format. > > Maksim > > > > On 18/11/2013 18:48, Yury Aulchenko wrote: > >> I would say that in principle DatABEL::text2databel is the > >> "natural" way to go from text-files to DatABEL-files > >> > >> The problem is that 'regular' text input may be allele by > allele, > >> not genotype by genotype... (e.g. data are in format "A G", or > >> "A/G", not "0" or "1" or "2"). > >> > >> Y > >> > >> On Nov 15, 2013, at 17:48 PM, L.C. Karssen > > > >> wrote: > >> > >>> Hi Maksim, > >>> > >>> On 15-11-13 05:53, Maksim Struchalin wrote: > >>>> An easy way to write a function for conversion a plink format > >>>> file to a > >>>> GenABEL format file: > >>>> > >>>> Use plink support of 'plug-in' functions > >>> > >>> Nice find. I didn't know that existed. > >>> > >>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml > > >>>> ). > >>>> This allows us > >>>> to write a simple R script (myscript.R) which is called > by plink > >>>> (plink > >>>> --file mydata --R myscript.R). 
plink reads the file mydata > >>>> (which is in > >>>> plink format) and iteratively, SNP by SNP, trasfer all > the data to a > >>>> script myscript.R. This script contains a function > >>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every > SNP (GENO > >>>> variable) and store it in a *flv format through calling > DatABEL > >>>> functions. > >>>> > >>>> The whole process of conversion will look like this: > >>>> > >>>> 1) User asks GenA convert plink file to GenA file > >>>> 2) GenA looks weather the plink is installed. If it is not > >>>> installed, > >>>> then GenA goes to a plink site and download/install it itself > >>>> (use an R > >>>> function "download.file" from "utils" package) > >>>> 3) GenA run a simple line: system('plink --file mydata --R > >>>> myscript.R') > >>>> 4) Rplink function (from myscript.R) gets every SNP and > stote it > >>>> in *flv > >>>> format. This function creates an flv file and then open and > >>>> close it for > >>>> saving every single SNP. > >>>> 5) Work is Done > >>> > >>> I'm not sure how portable it is to download and run plink. > Also, the > >>> plink page says: Currently, there is only support for > R-plugins for > >>> Linux-based and Mac OS PLINK distributions. > >>> > >>>> > >>>> The only issue is how fast the converssion will run: how much > >>>> time does > >>>> it take to open a filvector file, store one SNP and close > it? I > >>>> can not > >>>> find a DatABEL R function for adding SNP to a flv file. > Is there a C > >>>> DatABEL function which can do it? > >>> > >>> Wouldn't it be easier/possible to use plink to export to text > >>> (.csv) and > >>> then use filevector's txt2fvf binary (of course this could be > >>> done from > >>> R using system())? > >>> > >>> I'm also wondering if going per SNP is really necessary. 
If I > >>> understand > >>> it correctly the R script (myscript.R) has to have a > function called: > >>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR) > >>> where GENO is the matrix of genotypes. So we could write > that into a > >>> DatABEL file at once. Of course you may want to do this per > >>> chromosome > >>> to reduce memory consumption (not sure how plink/R would > handle large > >>> data sets). > >>> > > > > > > -- > > ----------------------------------------------------- > > Yurii S. Aulchenko > > > > [ LinkedIn ] [ Twitter > > ] [ Blog > > ] > > > > > > > > _______________________________________________ > > genabel-devel mailing list > > genabel-devel at lists.r-forge.r-project.org > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > > > -- > ----------------------------------------------------- > Yurii S. Aulchenko > > [ LinkedIn ] [ Twitter > ] [ Blog > ] > > > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... 
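The .bed layout discussed in this thread (three leading bytes, then genotypes packed at 2 bits each) can be decoded with very little code. The following is a minimal sketch, assuming a SNP-major file laid out as on the PLINK binary-format page cited above. The function name decode_bed_snp_major, the in-memory input, the choice to count copies of the second allele, and the use of -1 for a missing genotype are all assumptions made for illustration; none of this is existing GenABEL or DatABEL code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode an in-memory SNP-major PLINK .bed image.
// Returns one vector per SNP holding, for each individual, the number of
// copies of the second allele (0, 1 or 2), or -1 for a missing genotype.
std::vector<std::vector<int>> decode_bed_snp_major(
        const std::vector<std::uint8_t>& bed, std::size_t n_individuals) {
    if (n_individuals == 0)
        throw std::runtime_error("number of individuals must be positive");
    // Bytes 0-1: magic number; byte 2: 0x01 means SNP-major order.
    if (bed.size() < 3 || bed[0] != 0x6C || bed[1] != 0x1B)
        throw std::runtime_error("not a PLINK .bed file (bad magic number)");
    if (bed[2] != 0x01)
        throw std::runtime_error("only SNP-major .bed files are handled here");

    // Each SNP occupies a whole number of bytes, 4 genotypes per byte.
    const std::size_t bytes_per_snp = (n_individuals + 3) / 4;
    if ((bed.size() - 3) % bytes_per_snp != 0)
        throw std::runtime_error("file size does not match individual count");
    const std::size_t n_snps = (bed.size() - 3) / bytes_per_snp;

    // 2-bit code -> genotype: 0 = hom. first allele, 1 = missing,
    // 2 = heterozygous, 3 = hom. second allele.
    static const int kDecode[4] = {0, -1, 1, 2};

    std::vector<std::vector<int>> genotypes(
        n_snps, std::vector<int>(n_individuals));
    for (std::size_t snp = 0; snp < n_snps; ++snp) {
        const std::uint8_t* block = bed.data() + 3 + snp * bytes_per_snp;
        for (std::size_t ind = 0; ind < n_individuals; ++ind) {
            // The first individual sits in the two lowest-order bits.
            const unsigned code = (block[ind / 4] >> (2 * (ind % 4))) & 0x3u;
            genotypes[snp][ind] = kDecode[code];
        }
    }
    return genotypes;
}
```

The number of individuals is not stored in the .bed file itself, so a caller would take it from the accompanying .fam file (one line per individual). Writing the decoded values into a DatABEL file, or exposing them through an iterator as proposed above, is left out of this sketch.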
URL: From m.v.struchalin at mail.ru Mon Nov 25 19:21:57 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Tue, 26 Nov 2013 01:21:57 +0700 Subject: [GenABEL-dev] ISNAN problem in filevector found by Jenkins In-Reply-To: References: <528B176B.2060803@karssen.org> <528B19A3.5090203@mail.ru> <528B1DF7.90803@karssen.org> <528B24FD.2070000@mail.ru> <528B2AF3.3030903@karssen.org> <528B72E7.4080506@mail.ru> <528F6B78.20004@karssen.org> <5290B7E8.7060901@mail.ru> <2C15F0F0-C9DA-4F65-A879-5961F7EBECC6@gmail.com> <5293415F.1090201@karssen.org> Message-ID: <52939545.5070506@mail.ru> That's good. The warnings appear because the SNP name in the second line of the TEST10x15.geno file is the same as in the first one. This is fixed in the version at https://app.box.com/s/6irjkikm3mukwvtk1wgw. Can you download it and run the test once again? BTW1: The versions of GenABEL and DatABEL here are old (I generated this example a week ago). That's why there are some warnings about other functions. BTW2: Remove the files TEST10x15_T.geno.dose.fvd and TEST10x15_T.geno.dose.fvi after each run. Otherwise you get an error message (File 'TEST10x15_T.geno.dose' already exists.). best, Maksim On 25/11/2013 19:28, Yury Aulchenko wrote: > after renaming out-file-name I get (finished with warnings): > > > x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", > out="TEST10x15_T1.geno", makeprob=FALSE, old=TRUE) > Loading required package: DatABEL > DatABEL v.0.9-4 (March 12, 2013) loaded > > Options in effect: > --infile = TEST10x15.geno > --outfile = ./tmp300070 > --skiprows = OFF > --skipcols = 5 > --cnrow = OFF > --rncol = ON, using column 2 of 'TEST10x15.geno' > --transpose = ON > --Rmatrix = OFF > --nanString = NA > Number of lines in source file is 10 > Number of words in source file is 20 > skiprows = 0 > cnrow = 0 > skipcols = 5 > rncol = 2 > Rmatrix = 0 > numWords = 20 > Creating file with numRows = 10 > Creating file with numColumns = 15 > text2fvf finished.
> Loss of precision / loss of data during conversion from DOUBLE to FLOAT. > Futher conversion warnings omitted. > Read 4 items > Read 4 items > Read 4 items > Read 20 items > Warning messages: > 1: In uninames(.Object at data) : > uninames: some column names are not unique; use > set_dimnames/get_dimnames for non-unique row/col names > 2: In uninames(x at data) : > uninames: some column names are not unique; use > set_dimnames/get_dimnames for non-unique row/col names > 3: In uninames(x at data) : > uninames: some column names are not unique; use > set_dimnames/get_dimnames for non-unique row/col names > > > > > On Nov 25, 2013, at 13:23 PM, L.C. Karssen > wrote: > >> I had the same issues as Yurii reported: >> - compilation warnings when installing >> - the 'file already exists' warning: this was one of the reasons why I >> renamed the filename in the 'out=' option in test.R >> - I also got the warning about not-unique column names, but I assumed >> that wasn't the issue we were trying to fix here, so I didn't mention it. >> >> Yurii, does the script finish when you change the output file name in >> test.R? >> >> >> >> Lennart. >> >> On 11/25/2013 12:26 PM, Yury Aulchenko wrote: >>> Tried this on Mac OS X, see below >>> >>> On Nov 23, 2013, at 15:12 PM, Maksim Struchalin >>> >>> > wrote: >>> >>>> Hi Lennart, >>>> >>>> https://app.box.com/s/iy41ug5qg4sbylul9oyn >>>> >>>> This is an example demonstrating how a GenABEL function >>>> "impute2databel" calls a function "iteratorDA" from DatABEL. Here, >>>> GenABEL is compiled without flv's and iterator's code (I deleted it >>>> from src). 
>>>> >>>> Could you run the test?: >>>> 0) Dowload the file test_GenABEL_iterator.tar.gz from >>>> https://app.box.com/s/iy41ug5qg4sbylul9oyn >>>> 1) decompress test_GenABEL_iterator.tar.gz >>>> 2) cd test_GenABEL_iterator >>>> 3) R CMD INSTALL DatABEL_0.9-4.tar.gz >>>> 4) R CMD INSTALL GenABEL_1.7-7.tar.gz >>> >>> getting many warnings at step (4) >>> >>>> 5) run test.R >>> >>> getting >>> >>>> x <- impute2databel(geno="TEST10x15.geno", sample="impute.sample5", >>> out="TEST10x15_T.geno", makeprob=FALSE, old=TRUE) >>> Loading required package: DatABEL >>> DatABEL v.0.9-4 (March 12, 2013) loaded >>> >>> Options in effect: >>> --infile = TEST10x15.geno >>> --outfile = ./tmp333314 >>> --skiprows = OFF >>> --skipcols = 5 >>> --cnrow = OFF >>> --rncol = ON, using column 2 of 'TEST10x15.geno' >>> --transpose = ON >>> --Rmatrix = OFF >>> --nanString = NA >>> Number of lines in source file is 10 >>> Number of words in source file is 20 >>> skiprows = 0 >>> cnrow = 0 >>> skipcols = 5 >>> rncol = 2 >>> Rmatrix = 0 >>> numWords = 20 >>> Creating file with numRows = 10 >>> Creating file with numColumns = 15 >>> text2fvf finished. >>> File 'TEST10x15_T.geno.dose' already exists. >>> ERROR in Rstuff:failed in ini_empty_FileMatrix_RError in !result : >>> invalid argument type >>> Calls: impute2databel -> apply2dfo -> make_empty_fvf >>> In addition: Warning message: >>> In uninames(.Object at data) : >>> uninames: some column names are not unique; use >>> set_dimnames/get_dimnames for non-unique row/col names >>> Execution halted >>> >>> >>>> >>>> It works on my Ubuntu. If it works on your Ubuntu, win and mac, then >>>> we can delete from GenABEL the simlinks to flv and databel. >>>> >>>> best, >>>> ?????? >>>> >>>> >>>> On 22/11/2013 21:34, L.C. Karssen wrote: >>>>> Hi ??????, >>>>> >>>>> >>>>> On 11/19/2013 03:17 PM, Maksim Struchalin wrote: >>>>>> Hi Lennart, >>>>>> >>>>>> I see you are improving your Russian :-). >>>>> Getting to know the Russian alphabet is step one :-). 
>>>>> >>>>>> I understand your arguments. I think we can combine our two >>>>>> approaches. >>>>>> 1) We make a so/dll from filevector and let it use by >>>>>> ProbABEL/OmicABEL/Another_not_R_softABEL. >>>>>> 2) For GenABEL and other R packages, we make a DatABEL. >>>>>> >>>>>> The code of filevector is the same both for 1) and 2). >>>>> But that doesn't solve the problem of having symlinks to the fvlib >>>>> directory in our SVN tree... Which means that any update to filevector >>>>> can make the depending package (DatABEL) become uncompilable. >>>>> >>>>> In the mean time I've set the first steps towards 'libfilevector' in >>>>> SVN, see commits 1415 and 1416. This works (at least for >>>>> ProbABEL), but >>>>> more polishing is needed. >>>>> >>>>> >>>>>> We only add >>>>>> preprocessor commands (#ifdef and so on) to surround R specific code >>>>>> (ISNAN() and std::isnan). In this case, compiler choose itself >>>>>> weather >>>>>> it buids the lib for R or for OS. >>>>>> >>>>>> If we will want to use only approach 1) for GenABEL in the future, we >>>>>> can quickly swith to it later. >>>>> True, for now this will work. >>>>> >>>>> >>>>> Best, >>>>> >>>>> Lennart. >>>>> >>>>>> best, >>>>>> Maksim >>>>>> >>>>>> >>>>>> >>>>>> On 19/11/2013 16:10, L.C. Karssen wrote: >>>>>>> Hi ??????, >>>>>>> >>>>>>> (trying a Russian keyboard layout, no idea if this works...). >>>>>>> >>>>>>> On 19-11-13 09:44, Maksim Struchalin wrote: >>>>>>>> It seems that your solution is workable but I see little >>>>>>>> difference with >>>>>>>> what it is now. Now the filevector code is incorporated in each >>>>>>>> packages. >>>>>>> This is what I would like to change, indeed. Code that is reused >>>>>>> by so >>>>>>> many packages should not be copied/symlinked into the code tree >>>>>>> of those >>>>>>> packages. By symlinking it as we have now, there is no proper way of >>>>>>> specifying a version number of the filevector code. 
Which, in >>>>>>> turn means >>>>>>> that if something changes in the filevector code all other >>>>>>> packages need >>>>>>> to be changed immediately (just like what happened with your latest >>>>>>> change). If the filevector code have been a proper library we >>>>>>> could have >>>>>>> simply said that ProbABEL still depends on the old filevector >>>>>>> version >>>>>>> and take more time to make sure the two play nice together. >>>>>>> >>>>>>> Moreover, with the filevector code in a separate library the whole >>>>>>> isnan() issue would not be a problem. We could simply use >>>>>>> std::isnan(), >>>>>>> because CRAN wouldn't need to compile the .so/.dll, so no need >>>>>>> of ISNAN(). >>>>>>> When code is put in a library the internals don't matter as long >>>>>>> as the >>>>>>> interface (function names + arguments) to the outside doesn't >>>>>>> change. >>>>>>> >>>>>>>> You propose to follow the same way but pack filelvector code >>>>>>>> in one file (dll or so) and distribute 9 packages form GenABEL >>>>>>>> with the >>>>>>>> same library. >>>>>>> Indeed. The problem with incorporating it all in DatABEL is that >>>>>>> non-R >>>>>>> packages like ProbABEL and OmicABEL depend on the stuff in the fvlib >>>>>>> directory as well. Filevector is central to (almost) all >>>>>>> packages in the >>>>>>> GenABEL suite, which is why I proposed to make a library out of >>>>>>> it. And, >>>>>>> as noted above, this way packages can depend on different >>>>>>> version of the >>>>>>> library. >>>>>>> >>>>>>> We can of course discuss whether we want to distribute this >>>>>>> .so/.dll as >>>>>>> a separate (operating system) package or withing the R packages. >>>>>>> To me >>>>>>> the first option is the 'correct' one, but I see that this may >>>>>>> impose on >>>>>>> the user (except on Windows and maybe MacOS, where the .so/.dll is >>>>>>> included in the R package). >>>>>>> >>>>>>> >>>>>>>> Last time I proposed to move filevector in DatABEL. 
All other >>>>>>>> packages >>>>>>>> (GenA and so on) will load DatAB in R and use filevector >>>>>>>> fucntions from >>>>>>>> DatA. When DatABEL is loaded through library(DatABEL), the file >>>>>>>> DatABEL.so is loaded as well. >>>>>>> I think this is what should be done with the DAlib directory >>>>>>> (another >>>>>>> symlinked dir). >>>>>>> >>>>>>>> Thus, you do not need to ask users to >>>>>>>> install additional lib because it is in DatABEL already. I >>>>>>>> think this is >>>>>>>> a workable approach that will allow us to delete the filevector >>>>>>>> code (or >>>>>>>> filevector so/dll) from all the packages. >>>>>>>> >>>>>>>> >>>>>>>> This is some quote from the R manual how to register functions >>>>>>>> to make >>>>>>>> it available from DatAB to GenAB: >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> >>>>>>>> >>>>>>>> 5.4 Registering native routines >>>>>>>> >>>>>>>> By 'native' routine, we mean an entry point in compiled code. >>>>>>>> >>>>>>>> In calls to |.C|, |.Call|, |.Fortran| and |.External|, R must >>>>>>>> locate the >>>>>>>> specified native routine by looking in the appropriate shared >>>>>>>> object/DLL. By default, R uses the operating system-specific >>>>>>>> dynamic >>>>>>>> loader to lookup the symbol in all loaded DLLs and elsewhere. >>>>>>>> Alternatively, the author of the DLL can explicitly register >>>>>>>> routines >>>>>>>> with R and use a single, platform-independent mechanism for >>>>>>>> finding the >>>>>>>> routines in the DLL. One can use this registration mechanism to >>>>>>>> provide >>>>>>>> additional information about a routine, including the number >>>>>>>> and type of >>>>>>>> the arguments, and also make it available to R programmers under a >>>>>>>> different name. In the future, registration may be used to >>>>>>>> implement a >>>>>>>> form of "secure" or limited native access. 
>>>>>>>> >>>>>>>> _____________________________________________________ >>>>>>>> >>>>>>> Hmm, I will have to think about this. This seems to be about how >>>>>>> R finds >>>>>>> out in which DLL a function is found (and maybe where the DLL is >>>>>>> found >>>>>>> in the filesystem). I think this is separate from the point >>>>>>> below, but >>>>>>> I'm not sure. >>>>>>> >>>>>>>> Your argument was from "5.8 Linking to other packages: It is not in >>>>>>>> general possible to link a DLL in package *packA* to a DLL >>>>>>>> provided by >>>>>>>> package *packB". *I do not quite understand what they mean >>>>>>>> under 'link'. >>>>>>>> May be the mean link a library during intsalltion? >>>>>>> Yes, as far as I understand, they mean linking to a library during >>>>>>> installation/compilation. >>>>>>> >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Lennart. >>>>>>>> best, >>>>>>>> Maksim >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 19/11/2013 15:14, L.C. Karssen wrote: >>>>>>>>> Hi Maksim, >>>>>>>>> >>>>>>>>> Good question... The idea is to generate a .dll file for >>>>>>>>> Windows, but >>>>>>>>> I'm not sure what would be the best way to distribute that. It >>>>>>>>> would be >>>>>>>>> interesting to see how other packages do that. For example, >>>>>>>>> the XML >>>>>>>>> package depends on libxml2: >>>>>>>>> http://cran.r-project.org/web/packages/XML/index.html and the >>>>>>>>> Rcurl >>>>>>>>> package depends on libcurl: >>>>>>>>> http://cran.r-project.org/web/packages/RCurl/index.html >>>>>>>>> >>>>>>>>> In the XML package .zip file for Windows there is a directory >>>>>>>>> libs/x64 >>>>>>>>> and a directory libs/i386. Both contain XML.dll, so I think >>>>>>>>> that for >>>>>>>>> Linux you simply specify a dependency on a library, whereas >>>>>>>>> for Windows >>>>>>>>> the actual .dll is in the package (which is quite logical because >>>>>>>>> Windows lacks the package repositories that most Linux distros >>>>>>>>> have). 
>>>>>>>>> It seems that for MacOS the .tgz file also contains a lib directory with
>>>>>>>>> the .so file.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Lennart.
>>>>>>>>>
>>>>>>>>> On 19-11-13 08:56, Maksim Struchalin wrote:
>>>>>>>>>> Hi Lennart,
>>>>>>>>>>
>>>>>>>>>> How will users on Windows install such a library?
>>>>>>>>>>
>>>>>>>>>> best,
>>>>>>>>>> Maksim
>>>>>>>>>>
>>>>>>>>>> On 19/11/2013 14:46, L.C. Karssen wrote:
>>>>>>>>>>> Dear all,
>>>>>>>>>>>
>>>>>>>>>>> The Jenkins setup already shows its value: after Maksim changed the call
>>>>>>>>>>> from std::isnan() to ISNAN() in fvlib's CastUtils.cpp, an automatic build
>>>>>>>>>>> of ProbABEL was triggered and it failed (because ISNAN() is an R function).
>>>>>>>>>>>
>>>>>>>>>>> I guess this is one more reason to try to convert fvlib into a real
>>>>>>>>>>> (shared) library.
>>>>>>>>>>> Does anyone have another workable solution?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Lennart.
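The #ifdef idea raised in this thread for the ISNAN()/std::isnan() clash could look roughly like the sketch below. It is only an illustration under stated assumptions: the macro name FV_BUILD_FOR_R and the helpers fv_is_missing() and fv_self_check() are hypothetical and not taken from the filevector sources, and the R branch assumes that R's headers are on the include path.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// FV_BUILD_FOR_R is a hypothetical macro: define it only when the code
// is compiled as part of an R package, leave it undefined otherwise.
#ifdef FV_BUILD_FOR_R
#include <R_ext/Arith.h>            // R header that provides ISNAN()
#define FV_ISNAN(x) ISNAN(x)
#else
#define FV_ISNAN(x) std::isnan(x)   // plain C++ build, e.g. for ProbABEL
#endif

// Hypothetical helper: the only place that needs to know which NaN test
// is in use. The rest of the library calls this function.
inline bool fv_is_missing(double value) {
    return FV_ISNAN(value) != 0;
}

// Small self-check that behaves the same in both build modes.
inline void fv_self_check() {
    assert(fv_is_missing(std::numeric_limits<double>::quiet_NaN()));
    assert(!fv_is_missing(1.5));
}
```

With such a guard, an R package build could pass -DFV_BUILD_FOR_R (for example through PKG_CPPFLAGS in src/Makevars), while stand-alone builds leave the macro undefined and never see an R header.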
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m.v.struchalin at mail.ru Tue Nov 26 12:11:18 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Tue, 26 Nov 2013 18:11:18 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To:
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org>
Message-ID: <529481D6.5070200@mail.ru>

I am still working on compressing the GenABEL data. To remind you: the idea is to compress the original text data files and use them later to generate the RData files (e.g. srdta).

Yurii proposed generating the RData files in the examples that use them. I see only one way to implement this idea: in every file where it is used, we replace the "data(srdta)" line with a call to a function, e.g. "generate_srdt()", which generates the srdta object. The same goes for the other five *.RData files in GenABEL/data. If we go this way, we have to change 71 files in the man directory and, in addition, the GenABEL manual.
Also, users will no longer be able to load the srdta set (and the others) by typing "data(srdta)" at the command line, as they are used to, and will have to know that the function generate_srdt() now serves this purpose. This all sounds nasty :-).

Generating the data at package installation time is also a bad idea, as Yurii noted below. Actually, it is impossible, because generating the GenABEL data requires GenABEL functions, which are not available at installation time (they only become available once GenABEL is installed).

I see only one good solution now: move all the GenABEL data to a new package, e.g. GenABELdata, as the CRAN people proposed from the beginning. In that case it is possible to generate the RData files at installation time using GenABEL functions (which are installed by then). I think this solution is platform independent, because the R rules permit running *.R scripts to generate data at installation time.

What do you think about making a data package for GenABEL? Is the name GenABELdata OK? Or maybe we can move all the *ABEL data into the DatABEL package instead of making separate *ABELdata packages?

best,
Maksim

On 18/11/2013 18:54, Yury Aulchenko wrote:
> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>
>> Hi Maksim,
>>
>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>> In this email, I propose a new approach which reduces the total
>>> size of the data from 8Mb to 2Mb, which reduces the entire GenABEL size from
>>> 12Mb to 6Mb.
>> I guess you mean B (bytes) instead of b (bits) here :-).
>>
>>> "R CMD check --as-cran" reports that the following sub-directories are
>>> too big: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
>>> last GenABEL submission to CRAN, the maintainers suggested creating a
>>> new package called GenABELdata and moving all the data there. I went
>>> through the data and found that:
>>> 1) The "exdata" directory can be compressed with gzip and reduced from 5.8Mb
>>> -> 1.1Mb.
>>> - There is a function guzip() from library R.utils which can >>> decompress the files. It works on any OS. >>> - Moreover: the native R function read.table() can read gzip files >>> without decompression. >>> - Even more: it looks like that the biggest file "srgenos.dat" is >>> used only once a long time ago for generating "srdta.RData" and now it >>> is just sitting there and eating space needlessly. >> Sounds like a waste of space! >> >>> 2) We can delete some files from the "data" directory. The deleted files >>> will be generated on the user computer based on the files from exdata. >>> It can be done during INSTALLATION (a line in Makefile?) or on the first >>> load through (|run funcion .onAttach() in R/zzz.R|). >> This sounds like a perfectly acceptable option. > > I suggest this is done in the "example" which make use of this data, NOT in the INSTALL etc. - we should make things as "robust" as possible and interfere as little as possible with the usual workflow (which is very much system-specific, in that we will need to to test on all platforms) > > >>> It will reduce >>> total size of "data" directory from 2.3Mb to 800Kb. >> Fantastic! If no one has other objections I say: go ahead. >> >> >> Best, >> >> Lennart. >> >> >>> Any objections/suggestions? >>> >>> best, >>> Maksim >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >> -- >> ----------------------------------------------------------------- >> L.C. Karssen >> Utrecht >> The Netherlands >> >> lennart at karssen.org >> http://blog.karssen.org >> >> Stuur mij aub geen Word of Powerpoint bestanden! 
>> Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html
>> ------------------------------------------------------------------

From lennart at karssen.org Tue Nov 26 14:37:20 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Tue, 26 Nov 2013 14:37:20 +0100
Subject: [GenABEL-dev] function for conversion a plink format file to a GenABEL format file
In-Reply-To: <52936119.5070704@mail.ru>
References: <5285A8CF.2030402@mail.ru> <52865058.9020105@karssen.org> <528ED325.8070707@mail.ru> <528F1E1E.6050300@karssen.org> <52936119.5070704@mail.ru>
Message-ID: <5294A410.8090601@karssen.org>

Dear Maksim,

On 11/25/2013 03:39 PM, Maksim Struchalin wrote:
> I checked read.plink from snpMatrix (Nicola) and snpStats (Maarten).
> I see that the code behind them is quite simple (~40 lines of C code
> behind snpMatrix's read.plink).
>
> The plink bed format is very similar to the GenABEL format
> (http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml). It looks like
> the main difference between them is that the plink bed file starts with
> 3 bytes that have a special meaning. The other bytes store genotypes
> (0, 1, 2 or NA) in 2 bits per genotype (like in GenA).

That sounds good. I actually never looked under the hood of the GenABEL format, and the plink format is indeed quite simple. If it is only the first few bytes that differ, that sounds promising!

> I think it would be easy just to write a C function which converts bed to
> databel format.

That sounds useful! But what does that mean for the GenABEL functions?
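For reference, the .bed layout described above (three leading bytes, then 2 bits per genotype) can be decoded along the lines of the sketch below. It is an illustration only, not code from GenABEL, DatABEL or snpMatrix. Assumptions: the two magic bytes 0x6C 0x1B and the SNP-major mode byte 0x01 follow the PLINK binary format description; the names kMissing, decode_code() and decode_bed() are hypothetical; the output convention (count of A2 alleles, -1 for missing) was chosen for the example and is not necessarily GenABEL's internal coding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr int kMissing = -1;  // hypothetical marker for a missing genotype

// Map one 2-bit PLINK code to an A2 allele count (illustrative convention).
inline int decode_code(unsigned code) {
    switch (code & 0x3u) {
        case 0x0: return 0;         // 00: homozygous for the first allele
        case 0x1: return kMissing;  // 01: missing genotype
        case 0x2: return 1;         // 10: heterozygous
        default:  return 2;         // 11: homozygous for the second allele
    }
}

// Decode a whole SNP-major .bed buffer held in memory.
// Returns one vector of n_samples genotypes per SNP.
inline std::vector<std::vector<int>> decode_bed(
        const std::vector<std::uint8_t>& bytes, std::size_t n_samples) {
    if (bytes.size() < 3 || bytes[0] != 0x6C || bytes[1] != 0x1B)
        throw std::runtime_error("not a PLINK .bed buffer");
    if (bytes[2] != 0x01)
        throw std::runtime_error("only SNP-major mode is handled here");
    if (n_samples == 0)
        throw std::runtime_error("n_samples must be positive");

    // Each SNP block is padded to a whole number of bytes (4 samples/byte).
    const std::size_t bytes_per_snp = (n_samples + 3) / 4;
    assert(bytes_per_snp > 0);
    const std::size_t payload = bytes.size() - 3;
    if (payload % bytes_per_snp != 0)
        throw std::runtime_error("truncated SNP block");

    const std::size_t n_snps = payload / bytes_per_snp;
    std::vector<std::vector<int>> result(n_snps,
                                         std::vector<int>(n_samples));
    for (std::size_t snp = 0; snp < n_snps; ++snp) {
        for (std::size_t i = 0; i < n_samples; ++i) {
            const std::uint8_t byte = bytes[3 + snp * bytes_per_snp + i / 4];
            // Genotypes are stored starting from the lowest-order bit pair.
            const unsigned code = static_cast<unsigned>(byte) >> (2 * (i % 4));
            result[snp][i] = decode_code(code);
        }
    }
    return result;
}
```

A real converter would read the sample and SNP counts from the accompanying .fam and .bim files and stream one SNP block at a time instead of holding the whole file in memory.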
Do you propose to let the GenABEL functions (like merge.snp.data/merge.gwaa.data) work on DatABEL objects as well?

> Also, we can think about making bed the format
> which is natively supported by genabel. For this, we only need a
> function which extracts an array from bed, and to make an iterator that uses this
> function.

Similar to my question above: what exactly do you mean? Do you want to change all (relevant) GenABEL functions to work with three backend formats (GenABEL/DatABEL/.bed)? That sounds like quite a lot of work! Or do you simply mean writing import/convert functions between these formats?


Best,

Lennart.

>
> best,
> Maksim
>
>
> On 22/11/2013 23:51, Yurii Aulchenko wrote:
>> Great idea
>>
>> I know nothing of plink bin format, but many packages make use of it,
>> so it should not be that complicated. Also plink is GNU GPL if I
>> remember correctly, so we can use the code if needed
>>
>> Y
>>
>> On Friday, November 22, 2013, L.C. Karssen wrote:
>>
>> How difficult would it be to import .bed files [1] instead of the text
>> conversion? Given the binary data of both the .bed and the GenABEL
>> format, wouldn't conversion be much quicker?
>>
>>
>> Lennart.
>>
>> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>>
>>
>>
>> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>> > Too slow, too difficult for the user, or both? :)
>> >
>> > On Friday, November 22, 2013, Maksim Struchalin wrote:
>> >
>> > Yes. Looks like it was a bad idea to use plink R-plugin for
>> > converting plink files to *ABEL format.
>> > Maksim
>> >
>> > On 18/11/2013 18:48, Yury Aulchenko wrote:
>> >> I would say that in principle DatABEL::text2databel is the
>> >> "natural" way to go from text-files to DatABEL-files
>> >>
>> >> The problem is that 'regular' text input may be allele by allele,
>> >> not genotype by genotype... (e.g. data are in format "A G", or
>> >> "A/G", not "0" or "1" or "2").
>> >>
>> >> Y
>> >>
>> >> On Nov 15, 2013, at 17:48 PM, L.C.
Karssen >> > >> >> wrote: >> >> >> >>> Hi Maksim, >> >>> >> >>> On 15-11-13 05:53, Maksim Struchalin wrote: >> >>>> An easy way to write a function for conversion a plink format >> >>>> file to a >> >>>> GenABEL format file: >> >>>> >> >>>> Use plink support of 'plug-in' functions >> >>> >> >>> Nice find. I didn't know that existed. >> >>> >> >>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml >> >> >>>> ). >> >>>> This allows us >> >>>> to write a simple R script (myscript.R) which is called >> by plink >> >>>> (plink >> >>>> --file mydata --R myscript.R). plink reads the file mydata >> >>>> (which is in >> >>>> plink format) and iteratively, SNP by SNP, trasfer all >> the data to a >> >>>> script myscript.R. This script contains a function >> >>>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every >> SNP (GENO >> >>>> variable) and store it in a *flv format through calling >> DatABEL >> >>>> functions. >> >>>> >> >>>> The whole process of conversion will look like this: >> >>>> >> >>>> 1) User asks GenA convert plink file to GenA file >> >>>> 2) GenA looks weather the plink is installed. If it is not >> >>>> installed, >> >>>> then GenA goes to a plink site and download/install it itself >> >>>> (use an R >> >>>> function "download.file" from "utils" package) >> >>>> 3) GenA run a simple line: system('plink --file mydata --R >> >>>> myscript.R') >> >>>> 4) Rplink function (from myscript.R) gets every SNP and >> stote it >> >>>> in *flv >> >>>> format. This function creates an flv file and then open and >> >>>> close it for >> >>>> saving every single SNP. >> >>>> 5) Work is Done >> >>> >> >>> I'm not sure how portable it is to download and run plink. >> Also, the >> >>> plink page says: Currently, there is only support for >> R-plugins for >> >>> Linux-based and Mac OS PLINK distributions. 
>> >>> >> >>>> >> >>>> The only issue is how fast the converssion will run: how much >> >>>> time does >> >>>> it take to open a filvector file, store one SNP and close >> it? I >> >>>> can not >> >>>> find a DatABEL R function for adding SNP to a flv file. >> Is there a C >> >>>> DatABEL function which can do it? >> >>> >> >>> Wouldn't it be easier/possible to use plink to export to text >> >>> (.csv) and >> >>> then use filevector's txt2fvf binary (of course this could be >> >>> done from >> >>> R using system())? >> >>> >> >>> I'm also wondering if going per SNP is really necessary. If I >> >>> understand >> >>> it correctly the R script (myscript.R) has to have a >> function called: >> >>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR) >> >>> where GENO is the matrix of genotypes. So we could write >> that into a >> >>> DatABEL file at once. Of course you may want to do this per >> >>> chromosome >> >>> to reduce memory consumption (not sure how plink/R would >> handle large >> >>> data sets). >> >>> >> > >> > >> > -- >> > ----------------------------------------------------- >> > Yurii S. Aulchenko >> > >> > [ LinkedIn ] [ Twitter >> > ] [ Blog >> > ] >> > >> > >> > >> > _______________________________________________ >> > genabel-devel mailing list >> > genabel-devel at lists.r-forge.r-project.org >> > >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> > >> >> -- >> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* >> L.C. Karssen >> Utrecht >> The Netherlands >> >> lennart at karssen.org >> http://blog.karssen.org >> GPG key ID: A88F554A >> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- >> >> >> >> -- >> ----------------------------------------------------- >> Yurii S. 
Aulchenko

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Tue Nov 26 14:48:04 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Tue, 26 Nov 2013 14:48:04 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529481D6.5070200@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru>
Message-ID: <5294A694.4090306@karssen.org>

Hi Maksim,

On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
> I am still in the way of compressing GenABEL data.
> To remind you: the idea consists of compressing the original data text
> files and use them later for generating RData files (e.g. srdta).
>
> Yurii proposed to make RData files in examples which use them. I see now
> only one way how this idea can be implemented. We replace "data(srdta)"
> line in every file where it is used by a function e.g. "generate_srdt()"
> which generates srdta object. The same procedure for other five *.RData
> files from GenABEL/data. If we follow this way, we have to change 71
> files in man directory and, additionally to this, the GenABEL manual.
> Also, users will not be able to load the srdta set (and others) by
> typing "data(srdta)" in a command line (how they get used to) and has to
> know that the function generate_srdt() now services for these needs.
> This all sounds nasty :-).

I'm not sure how many users actually type data(srdta), but I see your point.

> Making the data during package installation time is also a bad idea as
> Yurii noted below. Actually, this is impossible because the process of
> making GenABEL data requires GenABEL functions which are not available
> during installation time (they are available only after GenABEL installed).

Good point!

> I see only one good solution now: move all the GenABEL data to a new
> package e.g. GenABELdata as it was proposed by CRAN people from the
> beginning. In this case, it is possible to generate RData during
> installation time using GenABEL functions (which are installed by that
> time). I think this solution is platform independent because R rules
> permit running *.R scripts to generate data during installation time.
>
> What do you think about making a data package for GenABEL? Do you think
> the name GenABELdata is ok? May be we can move all the *ABEL data in
> DatABEL package instead of making *ABELdata data packages?

Sounds like this is the best solution. Thanks for digging into this. As for the package name, either GenABELdata or GenABEL.data sounds fine to me (the latter being a bit clearer in my opinion).


Best,

Lennart

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From m.v.struchalin at mail.ru Wed Nov 27 19:58:20 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Thu, 28 Nov 2013 01:58:20 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <5294A694.4090306@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org>
Message-ID: <529640CC.7000006@mail.ru>

Hi All,

I created a GenABEL.data package, to which I moved the following data: GenABEL/data/*, inst/exdata/srgenos.dat and inst/exdata/srphenos.dat. All the corresponding files have been deleted from GenABEL. GenABEL.data also contains an R directory with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These scripts do not go into the final distribution and are kept only for possible future use. Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after running "R CMD build GenABEL.data". The directories "R" and "inst" are removed by running GenABEL/data/clean.R during the "build" process. Maybe this is not a good way to do it but, at least, it is convenient and has no effect on end users (please suggest a better way).

GenABEL.data does not work the way we discussed below. It is impossible to generate the files during "R CMD INSTALL" and undesirable during "R CMD build". The best option was simply to move all the data from GenABEL to GenABEL.data (as the CRAN people suggested). In this case, we can install GenABEL.data without having GenABEL installed. After that, we install GenABEL. When we run library(GenABEL), it automatically attaches GenABEL.data. Thus, the only change for users is that they now need to install two packages (GenABEL.data and GenABEL).

Both packages are now much smaller: 469K for GenABEL and 2.4M for GenABEL.data.

It should work now, but if you experience any problems, let me know.
best,
Maksim

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yurii.aulchenko at gmail.com Wed Nov 27 20:02:42 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Wed, 27 Nov 2013 20:02:42 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529481D6.5070200@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru>
Message-ID: <560B6367-89BD-49DB-9EEC-8B2DEA86C994@gmail.com>

On Nov 26, 2013, at 12:11 PM, Maksim Struchalin wrote:

> I am still in the way of compressing GenABEL data.
> To remind you: the idea consists of compressing the original data text files and use them later for generating RData files (e.g. srdta).
>
> Yurii proposed to make RData files in examples which use them. I see now only one way how this idea can be implemented. We replace "data(srdta)" line in every file where it is used by a function e.g. "generate_srdt()" which generate srdta object.
The same procedure for other five *.RData files from GenABEL/data. If we follow this way, we have to change 71 files in man directory and, additionally to this, the GenABEL manual. Also, users will not be able to load the srdta set (and others) by typing "data(srdta)" in a command line (how they get used to) and has to know that the function generate_srdt() now services for these needs. This all sounds nasty :-). agree, this is too much of a hassle; does not look elegant > > Making the data during package installation time is also a bad idea as Yurii noted below. Actually, this is impossible because the process of making GenABEL data requires GenABEL functions which are not available during installation time (they are avaialble only after GenABEL installed). > > I see only one good solution now: move all the GenABEL data to a new package e.g. GenABELdata as it was proposed by CRAN people from the begining. In this case, it is possible to generate RData during installation time using GenABEL functions (which are installed by that time). I think this solution is paltform independent because R rules permit runing *.R scripts to generate data during installation time. > > What do you think about making a data package for GenABEL? agree > Do you think the name GenABELdata is ok? May be we can move all the *ABEL data in DatABEL package instead of making *ABELdata data packages? What are names used by other guys? GenABELdata or GenABEL.data sound like good names (not sure about "." in the package names) Moving data to DatABEL does not sound good to me - DatABEL is, in principle, not about genetics, so whay should genetic data hang there. best, Y > > best, > Maksim > > On 18/11/2013 18:54, Yury Aulchenko wrote: >> On Nov 15, 2013, at 17:21 PM, L.C. 
Karssen wrote: >> >>> Hi Maksim, >>> >>> On 14-11-13 22:38, Maksim Struchalin wrote: >>>> In this email, I propose a new approach which allows to reduce total >>>> size of data from 8Mb to 2Mb that reduce the entire GenABEL size from >>>> 12Mb to 6Mb. >>> I gues you mean B (bytes) instead of b (bits) here :-). >>> >>>> "R CMD check --as-cran" reports that the following sub-directories have >>>> too big size: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the >>>> last GenABEL submission to CRAN, the maintainers suggested to create a >>>> new package called GenABELdata and move all the data there. I run >>>> through the data and found that: >>>> 1) "exdata" directory can be compressed by gzip and reduced from 5.8Mb >>>> -> 1.1Mb. >>>> - There is a function guzip() from library R.utils which can >>>> decompress the files. It works on any OS. >>>> - Moreover: the native R function read.table() can read gzip files >>>> without decompression. >>>> - Even more: it looks like that the biggest file "srgenos.dat" is >>>> used only once a long time ago for generating "srdta.RData" and now it >>>> is just sitting there and eating space needlessly. >>> Sounds like a waste of space! >>> >>>> 2) We can delete some files from the "data" directory. The deleted files >>>> will be generated on the user computer based on the files from exdata. >>>> It can be done during INSTALLATION (a line in Makefile?) or on the first >>>> load through (|run funcion .onAttach() in R/zzz.R|). >>> This sounds like a perfectly acceptable option. >> >> I suggest this is done in the "example" which make use of this data, NOT in the INSTALL etc. - we should make things as "robust" as possible and interfere as little as possible with the usual workflow (which is very much system-specific, in that we will need to to test on all platforms) >> >> >>>> It will reduce >>>> total size of "data" directory from 2.3Mb to 800Kb. >>> Fantastic! If no one has other objections I say: go ahead. 
>>> >>> >>> Best, >>> >>> Lennart. >>> >>> >>>> Any objections/suggestions? >>>> >>>> best, >>>> Maksim >>>> >>>> >>>> _______________________________________________ >>>> genabel-devel mailing list >>>> genabel-devel at lists.r-forge.r-project.org >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>> >>> -- >>> ----------------------------------------------------------------- >>> L.C. Karssen >>> Utrecht >>> The Netherlands >>> >>> lennart at karssen.org >>> http://blog.karssen.org >>> >>> Stuur mij aub geen Word of Powerpoint bestanden! >>> Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html >>> ------------------------------------------------------------------ >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From yurii.aulchenko at gmail.com Wed Nov 27 20:07:11 2013 From: yurii.aulchenko at gmail.com (Yury Aulchenko) Date: Wed, 27 Nov 2013 20:07:11 +0100 Subject: [GenABEL-dev] new approach for data storage in GenABEL package In-Reply-To: <529640CC.7000006@mail.ru> References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> Message-ID: <4DD079E7-8B29-4C7A-A06E-96BA37448858@gmail.com> Wow, very impressive, Maksim! 
Can you please check whether GenABEL.data complies with the naming conventions? (I do not recall seeing dots in package names; what names do other data packages use?)

If the naming is ok, do you think we are close to submitting to CRAN? With so many changes, I think we should "jump" the version number (say, to 1.8-0?)

best,
Yurii

On Nov 27, 2013, at 19:58 PM, Maksim Struchalin wrote:

> Hi All,
>
> I created a GenABEL.data package, to which I moved the following data: GenABEL/data/*, inst/exdata/srgenos.dat and inst/exdata/srphenos.dat. All the corresponding files are deleted from GenABEL.
> GenABEL.data also contains an R directory with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These scripts do not go into the final distribution and are kept only for possible future use.
> Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after running "R CMD build GenABEL.data". The directories "R" and "inst" are removed by running GenABEL/data/clean.R in the "build" process. Maybe it is not a good idea to do it this way but, at least, it is convenient and has no effect on end users (please suggest a better way).
>
> The way GenABEL.data works now is not what we discussed below. It is impossible to generate the files during "R CMD INSTALL" and undesirable during "R CMD build". The best option was just to move all the data from GenABEL to GenABEL.data (as the CRAN people suggested). In this case, we can install GenABEL.data without having GenABEL installed. After this, we install GenABEL. When we run library(GenABEL), it automatically attaches GenABEL.data. Thus, the only change for users is that they now need to install two packages (GenABEL.data and GenABEL).
>
> Now both packages are much smaller: 469K for GenABEL and 2.4M for GenABEL.data.
>
> It should work now, but if you experience any problems, let me know.
>
> best,
> Maksim

From m.v.struchalin at mail.ru Wed Nov 27 20:45:20 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Thu, 28 Nov 2013 02:45:20 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <4DD079E7-8B29-4C7A-A06E-96BA37448858@gmail.com>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <4DD079E7-8B29-4C7A-A06E-96BA37448858@gmail.com>
Message-ID: <52964BD0.2050208@mail.ru>

About package names: There are 5056 packages on CRAN.
42 of them are data packages, and 6 of them have names like packagename.data ("cluster.datasets", "data.table", "gamlss.data", "g.data", "survJamda.data" and "TH.data"). Thus, GenABELdata would be more in line with common practice than GenABEL.data.

About submission to CRAN: I still see small warnings in the "R CMD check" output and at least one FAILURE in test.polylik from the RUnit tests. I think we can make the next submission to CRAN by the end of next week: one week for testing our new data package and fixing small errors.

p.s. We should submit GenABEL.data to CRAN as well.

best,
Maksim

From yurii.aulchenko at gmail.com Thu Nov 28 10:38:54 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Thu, 28 Nov 2013 10:38:54 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man
In-Reply-To: <20131128043855.1EDED1861B7@r-forge.r-project.org>
References: <20131128043855.1EDED1861B7@r-forge.r-project.org>
Message-ID: <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com>

Yakov, please pay attention to this commit (I know that you were changing the code of the GC procedures recently).

YA

On Nov 28, 2013, at 05:38 AM, noreply at r-forge.r-project.org wrote:

> Author: maksim
> Date: 2013-11-28 05:38:54 +0100 (Thu, 28 Nov 2013)
> New Revision: 1429
>
> Modified:
>    pkg/GenABEL/man/PGC.Rd
> Log:
> Deleted comments because R CMD check --as-cran said that these lines are wider than 100 characters and they will be truncated in the PDF manual. This should not have any influence on the code generated from it.
>
> Modified: pkg/GenABEL/man/PGC.Rd
> ===================================================================
> --- pkg/GenABEL/man/PGC.Rd 2013-11-28 04:21:29 UTC (rev 1428)
> +++ pkg/GenABEL/man/PGC.Rd 2013-11-28 04:38:54 UTC (rev 1429)
> @@ -63,8 +63,6 @@
>  s <- summary(ge03d2)
>  freq <- s$Q.2
>  result=PGC(data=chi2.1df,method="median",p=freq,df=1, pol.d=2, plot=TRUE, lmax=1.1,start.corr=FALSE)
> -#"group_regress" is better to use when we have more than 50K SNPs
> -#result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1, pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)
>  }
>  \author{
>  Yakov Tsepilov
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits

From lennart at karssen.org Thu Nov 28 11:59:32 2013
From: lennart at karssen.org (L.C.
Karssen) Date: Thu, 28 Nov 2013 11:59:32 +0100 Subject: [GenABEL-dev] new approach for data storage in GenABEL package In-Reply-To: <529640CC.7000006@mail.ru> References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> Message-ID: <52972214.8030602@karssen.org> Hi Maksim, First of all, thanks for the good work! On 11/27/2013 07:58 PM, Maksim Struchalin wrote: > Hi All, > > I created a GenABEL.data package where I moved the following data: > GenABEL/data/* , inst/exdata/srgenos.dat and inst/exdata/srphenos.dat. > All the corresponding files are deleted from GenABEL. > Also, GenABEL.data contains R directory with three files (ge03d2c.R, > ge03d2ex.R and srdta.R). These scripts does not go to the final > distribution and needed only for possible future usage. > Only GenABEL.data/data/* files go to GenABEL.data_1.0.tar.gz after > running "R CMD build GenABEL.data". The directories "R" and "inst" are > removed by running GenABEL/data/clean.R in "build" process. May be it is > not a good idea to do it in such a way but, at least, it is convinient > and has no any reflection on end users (suggest a better way plz). > > The way how GenABEL.data works now is not like how we discussed below. > It is impossible to generate files during "R CMD INSTALL" and > undisarable during "R CMD build". The best opition was just to move all > the data to GenABEL.data from GenABEL (like CRAN people suggested). In > this case, we can install GenABEL.data without having GenABEL installed. > After this, we install GenABELL. This sounds very strange to me. Does the user first need to install the GenABEL.data package and then the 'main' GenABEL package? Or do I misunderstand you? What happens if the user installs them in a different order? I guess that shouldn't matter, right, as the package contains only data? > When we run library(GenABEL), it > automaticly attaches GenBEL.data. 
Thus, the only change for users is > that they need to install two packages now (GenABEL.data and GebABEL). And GenABEL.data is only needed if they actually want to use the examples, right? Or do we simply put GenABEL.data in the list of required packages in the DESCRIPTION file? Thanks, Lennart. > > Now we have sizes of both packages much smaller: 469K for GenABEL and > 2.4M for GenABEL.data. > > It should work now, but if you experience some problems, let me know. > > best, > Maksim > > > On 26/11/2013 20:48, L.C. Karssen wrote: >> Hi Maksim, >> >> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>> I am still in the way of compressing GenABEL data. >>> To remind you: the idea consists of compressing the original data text >>> files and use them later for generating RData files (e.g. srdta). >>> >>> Yurii proposed to make RData files in examples which use them. I see now >>> only one way how this idea can be implemented. We replace "data(srdta)" >>> line in every file where it is used by a function e.g. "generate_srdt()" >>> which generate srdta object. The same procedure for other five *.RData >>> files from GenABEL/data. If we follow this way, we have to change 71 >>> files in man directory and, additionally to this, the GenABEL manual. >>> Also, users will not be able to load the srdta set (and others) by >>> typing "data(srdta)" in a command line (how they get used to) and has to >>> know that the function generate_srdt() now services for these needs. >>> This all sounds nasty :-). >> I'm not sure how many user actually type data(srdta), but I see you point. >> >>> Making the data during package installation time is also a bad idea as >>> Yurii noted below. Actually, this is impossible because the process of >>> making GenABEL data requires GenABEL functions which are not available >>> during installation time (they are avaialble only after GenABEL installed). >> Good point! 
>>
>>> I see only one good solution now: move all the GenABEL data to a new
>>> package, e.g. GenABELdata, as was proposed by the CRAN people from the
>>> beginning. In this case, it is possible to generate the RData during
>>> installation time using GenABEL functions (which are installed by that
>>> time). I think this solution is platform independent because the R rules
>>> permit running *.R scripts to generate data at installation time.
>>>
>>> What do you think about making a data package for GenABEL? Do you think
>>> the name GenABELdata is OK? Maybe we can move all the *ABEL data into the
>>> DatABEL package instead of making *ABELdata data packages?
>> Sounds like this is the best solution. Thanks for digging into this. As
>> for the package name, either GenABELdata or GenABEL.data sounds fine
>> to me (the latter being a bit clearer in my opinion).
>>
>>
>> Best,
>>
>> Lennart
>>
>>> best,
>>> Maksim
>>>
>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>
>>>>> Hi Maksim,
>>>>>
>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>> In this email, I propose a new approach which reduces the total
>>>>>> size of the data from 8Mb to 2Mb, which in turn reduces the entire
>>>>>> GenABEL size from 12Mb to 6Mb.
>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>
>>>>>> "R CMD check --as-cran" reports that the following sub-directories are
>>>>>> too big: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
>>>>>> last GenABEL submission to CRAN, the maintainers suggested creating a
>>>>>> new package called GenABELdata and moving all the data there. I went
>>>>>> through the data and found that:
>>>>>> 1) The "exdata" directory can be compressed by gzip and reduced from
>>>>>> 5.8Mb -> 1.1Mb.
>>>>>> - There is a function gunzip() in the R.utils package which can
>>>>>> decompress the files. It works on any OS.
>>>>>> - Moreover: the native R function read.table() can read gzip files
>>>>>> without decompressing them first.
>>>>>> - Even more: it looks like the biggest file, "srgenos.dat", was
>>>>>> used only once, a long time ago, to generate "srdta.RData" and now it
>>>>>> is just sitting there and eating space needlessly.
>>>>> Sounds like a waste of space!
>>>>>
>>>>>> 2) We can delete some files from the "data" directory. The deleted
>>>>>> files will be generated on the user's computer based on the files
>>>>>> from exdata. It can be done during INSTALLATION (a line in Makefile?)
>>>>>> or on the first load (run the function .onAttach() in R/zzz.R).
>>>>> This sounds like a perfectly acceptable option.
>>>> I suggest this is done in the "example" which makes use of this data,
>>>> NOT in the INSTALL etc. - we should make things as "robust" as
>>>> possible and interfere as little as possible with the usual workflow
>>>> (which is very much system-specific, in that we will need to test
>>>> on all platforms)
>>>>
>>>>
>>>>>> It will reduce the
>>>>>> total size of the "data" directory from 2.3Mb to 800Kb.
>>>>> Fantastic! If no one has other objections I say: go ahead.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>
>>>>>> Any objections/suggestions?
>>>>>>
>>>>>> best,
>>>>>> Maksim
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> genabel-devel mailing list
>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>
>>>>>>
>>>>> --
>>>>> -----------------------------------------------------------------
>>>>> L.C. Karssen
>>>>> Utrecht
>>>>> The Netherlands
>>>>>
>>>>> lennart at karssen.org
>>>>> http://blog.karssen.org
>>>>>
>>>>> Please do not send me Word or PowerPoint files!
>>>>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>> ------------------------------------------------------------------
>>>>>
>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From lennart at karssen.org Thu Nov 28 11:59:46 2013
From: lennart at karssen.org (L.C.
Karssen)
Date: Thu, 28 Nov 2013 11:59:46 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52964BD0.2050208@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <4DD079E7-8B29-4C7A-A06E-96BA37448858@gmail.com> <52964BD0.2050208@mail.ru>
Message-ID: <52972222.3000503@karssen.org>

Dear all,

On 11/27/2013 08:45 PM, Maksim Struchalin wrote:
> About package names: There are 5056 packages on CRAN. 42 of them are
> data packages and 6 of them have a name like packagename.data
> ("cluster.datasets", "data.table", "gamlss.data", "g.data",
> "survJamda.data" and "TH.data"). Thus, GenABELdata would be more in line
> with common practice than GenABEL.data.

I proposed GenABEL.data because I saw such names appear multiple times on
CRAN (although I didn't do the counting Maksim did). I still think having
the dot in the name is better as it improves the readability of the package
name; it provides a better "visual" cue as to what the package contains.
The fact that most of the other data packages don't have a dot doesn't
carry much weight in my opinion. I think the improvement in readability
of the package name (with dot) outweighs the 'conform to the majority'
argument.

>
> About submission to CRAN: I still see small warnings in the "R CMD check"
> output and at least one FAILURE in test.polylik from the RUnit tests. I
> think we can make the next submission to CRAN by the end of next week: one
> week for testing our new data package + fixing small errors.
>
> p.s. We should submit GenABEL.data to CRAN as well.

I agree with Yurii that a jump in minor version number is warranted.
Let's go for 1.8-0!

Best,

Lennart.

>
> best,
> Maksim
>
>
>
> On 28/11/2013 02:07, Yury Aulchenko wrote:
>> Wow, very impressive, Maksim!
>>
>> Can you please check if GenABEL.data complies with the naming
>> conventions (I do not recall seeing names with dots as package
>> names; what names do other data packages use?)
>>
>> If the naming is OK, do you think we are close to submitting to CRAN?
>> With so many changes, I think we should "jump" on the version number
>> (say, to 1.8-0?)
>>
>> best,
>> Yurii
>>
>> On Nov 27, 2013, at 19:58 PM, Maksim Struchalin wrote:
>>
>>> Hi All,
>>>
>>> I created a GenABEL.data package where I moved the following data:
>>> GenABEL/data/*, inst/exdata/srgenos.dat and
>>> inst/exdata/srphenos.dat. All the corresponding files are deleted
>>> from GenABEL.
>>> Also, GenABEL.data contains an R directory with three files (ge03d2c.R,
>>> ge03d2ex.R and srdta.R). These scripts do not go into the final
>>> distribution and are needed only for possible future use.
>>> Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after
>>> running "R CMD build GenABEL.data". The directories "R" and "inst"
>>> are removed by running GenABEL/data/clean.R in the "build" process.
>>> Maybe it is not a good idea to do it this way but, at least, it is
>>> convenient and has no effect on end users (please suggest a better
>>> way).
>>>
>>> The way GenABEL.data works now is not what we discussed
>>> below. It is impossible to generate files during "R CMD INSTALL" and
>>> undesirable during "R CMD build". The best option was just to move
>>> all the data from GenABEL to GenABEL.data (as the CRAN people
>>> suggested). In this case, we can install GenABEL.data without having
>>> GenABEL installed. After this, we install GenABEL. When we run
>>> library(GenABEL), it automatically attaches GenABEL.data. Thus, the only
>>> change for users is that they need to install two packages now
>>> (GenABEL.data and GenABEL).
>>>
>>> Now the sizes of both packages are much smaller: 469K for GenABEL and
>>> 2.4M for GenABEL.data.
>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL:

From nicola.pirastu at burlo.trieste.it Thu Nov 28 12:06:31 2013
From: nicola.pirastu at burlo.trieste.it (Nicola Pirastu)
Date: Thu, 28 Nov 2013 12:06:31 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52972214.8030602@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org>
Message-ID:

Hi all,

I've been following this conversation with much interest, although I'm sorry I can't contribute much.

I was just wondering, could GenABEL.data not simply be a dependency of GenABEL? That way, installing GenABEL through install.packages would also install GenABEL.data, without the user having to do it himself.

Best.

Nicola

Dr. Nicola Pirastu PhD
Research Fellow
Medical Sciences, Chirurgical and Health Department
University of Trieste
Medical Genetics
IRCCS Burlo Garofolo
Via dell'Istria 65/1
34137 Italy
tel. +390403785539

On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:

> Hi Maksim,
>
> First of all, thanks for the good work!
>
> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>> Hi All,
>>
>> I created a GenABEL.data package where I moved the following data:
>> GenABEL/data/*, inst/exdata/srgenos.dat and inst/exdata/srphenos.dat.
>> All the corresponding files are deleted from GenABEL.
>> Also, GenABEL.data contains an R directory with three files (ge03d2c.R,
>> ge03d2ex.R and srdta.R). These scripts do not go into the final
>> distribution and are needed only for possible future use.
>> Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after
>> running "R CMD build GenABEL.data". The directories "R" and "inst" are
>> removed by running GenABEL/data/clean.R in the "build" process.

AVVISO DI RISERVATEZZA Informazioni riservate possono essere contenute nel messaggio o nei suoi allegati.
Se non siete i destinatari indicati nel messaggio, o responsabili per la sua consegna alla persona, o se avete ricevuto il messaggio per errore, siete pregati di non trascriverlo, copiarlo o inviarlo a nessuno. In tal caso vi invitiamo a cancellare il messaggio ed i suoi allegati. Grazie.

CONFIDENTIALITY NOTICE Confidential information may be contained in this message or in its attachments. If you are not the addressee indicated in this message, or responsible for delivering the message to that person, or if you have received this message in error, you may not transcribe, copy or deliver this message to anyone. In that case, you should delete this message and its attachments. Thank you.

From yurii.aulchenko at gmail.com Thu Nov 28 12:12:04 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Thu, 28 Nov 2013 12:12:04 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To:
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org>
Message-ID:

I would think that GenABEL(.)data is "suggested" and then any examples using the data from this package start with something like

if (require("GenABEL(.)data")) ...

How do other packages which lean on data packages solve this?

As for the "dot" - I do not have any strong opinion - both options seem OK to me :)

best,
Yurii

On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:

> Hi all,
>
> I've been following this conversation with much interest, although I'm sorry I can't contribute much.
>
> I was just wondering, could GenABEL.data not simply be a dependency of GenABEL? That way, installing GenABEL through install.packages would also install GenABEL.data, without the user having to do it himself.
>
> Best.
>
> Nicola
>
>
> Dr.
Nicola Pirastu PhD > Research Fellow > Medical Sciences, Chirurgical and Health Department > University of Trieste > Medical Genetics > IRCCS Burlo Garofolo > Via dell'Istria 65/1 > 34137 Italy > tel. +390403785539 > > Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen" ha scritto: > >> Hi Maksim, >> >> First of all, thanks for the good work! >> >> On 11/27/2013 07:58 PM, Maksim Struchalin wrote: >>> Hi All, >>> >>> I created a GenABEL.data package where I moved the following data: >>> GenABEL/data/* , inst/exdata/srgenos.dat and inst/exdata/srphenos.dat. >>> All the corresponding files are deleted from GenABEL. >>> Also, GenABEL.data contains R directory with three files (ge03d2c.R, >>> ge03d2ex.R and srdta.R). These scripts does not go to the final >>> distribution and needed only for possible future usage. >>> Only GenABEL.data/data/* files go to GenABEL.data_1.0.tar.gz after >>> running "R CMD build GenABEL.data". The directories "R" and "inst" are >>> removed by running GenABEL/data/clean.R in "build" process. May be it is >>> not a good idea to do it in such a way but, at least, it is convinient >>> and has no any reflection on end users (suggest a better way plz). >>> >>> The way how GenABEL.data works now is not like how we discussed below. >>> It is impossible to generate files during "R CMD INSTALL" and >>> undisarable during "R CMD build". The best opition was just to move all >>> the data to GenABEL.data from GenABEL (like CRAN people suggested). In >>> this case, we can install GenABEL.data without having GenABEL installed. >>> After this, we install GenABELL. >> >> This sounds very strange to me. Does the user first need to install the >> GenABEL.data package and then the 'main' GenABEL package? Or do I >> misunderstand you? >> What happens if the user installs them in a different order? I guess >> that shouldn't matter, right, as the package contains only data? >> >>> When we run library(GenABEL), it >>> automaticly attaches GenBEL.data. 
Thus, the only change for users is >>> that they need to install two packages now (GenABEL.data and GebABEL). >> >> And GenABEL.data is only needed if they actually want to use the >> examples, right? >> Or do we simply put GenABEL.data in the list of required packages in the >> DESCRIPTION file? >> >> >> Thanks, >> >> Lennart. >> >>> >>> Now we have sizes of both packages much smaller: 469K for GenABEL and >>> 2.4M for GenABEL.data. >>> >>> It should work now, but if you experience some problems, let me know. >>> >>> best, >>> Maksim >>> >>> >>> On 26/11/2013 20:48, L.C. Karssen wrote: >>>> Hi Maksim, >>>> >>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>>>> I am still in the way of compressing GenABEL data. >>>>> To remind you: the idea consists of compressing the original data text >>>>> files and use them later for generating RData files (e.g. srdta). >>>>> >>>>> Yurii proposed to make RData files in examples which use them. I see now >>>>> only one way how this idea can be implemented. We replace "data(srdta)" >>>>> line in every file where it is used by a function e.g. "generate_srdt()" >>>>> which generate srdta object. The same procedure for other five *.RData >>>>> files from GenABEL/data. If we follow this way, we have to change 71 >>>>> files in man directory and, additionally to this, the GenABEL manual. >>>>> Also, users will not be able to load the srdta set (and others) by >>>>> typing "data(srdta)" in a command line (how they get used to) and has to >>>>> know that the function generate_srdt() now services for these needs. >>>>> This all sounds nasty :-). >>>> I'm not sure how many user actually type data(srdta), but I see you point. >>>> >>>>> Making the data during package installation time is also a bad idea as >>>>> Yurii noted below. 
>>>>> Actually, this is impossible because the process of
>>>>> making the GenABEL data requires GenABEL functions, which are not available
>>>>> during installation (they are available only after GenABEL is installed).
>>>> Good point!
>>>>
>>>>> I see only one good solution now: move all the GenABEL data to a new
>>>>> package, e.g. GenABELdata, as was proposed by the CRAN people from the
>>>>> beginning. In this case, it is possible to generate RData during
>>>>> installation using GenABEL functions (which are installed by that
>>>>> time). I think this solution is platform independent because the R rules
>>>>> permit running *.R scripts to generate data during installation.
>>>>>
>>>>> What do you think about making a data package for GenABEL? Do you think
>>>>> the name GenABELdata is ok? Maybe we can move all the *ABEL data into the
>>>>> DatABEL package instead of making *ABELdata data packages?
>>>> Sounds like this is the best solution. Thanks for digging into this. As
>>>> for the package name, either GenABELdata or GenABEL.data sounds fine
>>>> to me (the latter being a bit clearer in my opinion).
>>>>
>>>> Best,
>>>>
>>>> Lennart
>>>>
>>>>> best,
>>>>> Maksim
>>>>>
>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>>>
>>>>>>> Hi Maksim,
>>>>>>>
>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>> In this email, I propose a new approach which reduces the total
>>>>>>>> size of the data from 8Mb to 2Mb, which reduces the entire GenABEL size from
>>>>>>>> 12Mb to 6Mb.
>>>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>>>
>>>>>>>> "R CMD check --as-cran" reports that the following sub-directories are
>>>>>>>> too large: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
>>>>>>>> last GenABEL submission to CRAN, the maintainers suggested creating a
>>>>>>>> new package called GenABELdata and moving all the data there.
>>>>>>>> I went through the data and found that:
>>>>>>>> 1) The "exdata" directory can be compressed with gzip and reduced from 5.8Mb
>>>>>>>> -> 1.1Mb.
>>>>>>>> - There is a function gunzip() in the R.utils package which can
>>>>>>>> decompress the files. It works on any OS.
>>>>>>>> - Moreover: the native R function read.table() can read gzip files
>>>>>>>> without decompression.
>>>>>>>> - Even more: it looks like the biggest file, "srgenos.dat", was
>>>>>>>> used only once, a long time ago, to generate "srdta.RData" and now it
>>>>>>>> is just sitting there and eating space needlessly.
>>>>>>> Sounds like a waste of space!
>>>>>>>
>>>>>>>> 2) We can delete some files from the "data" directory. The deleted
>>>>>>>> files will be generated on the user's computer from the files in exdata.
>>>>>>>> This can be done during INSTALLATION (a line in Makefile?) or on the
>>>>>>>> first load (by running the function .onAttach() in R/zzz.R).
>>>>>>> This sounds like a perfectly acceptable option.
>>>>>> I suggest this is done in the "example" which makes use of this data,
>>>>>> NOT in the INSTALL etc. - we should make things as "robust" as
>>>>>> possible and interfere as little as possible with the usual workflow
>>>>>> (which is very much system-specific, in that we will need to test
>>>>>> on all platforms)
>>>>>>
>>>>>>>> It will reduce the
>>>>>>>> total size of the "data" directory from 2.3Mb to 800Kb.
>>>>>>> Fantastic! If no one has other objections I say: go ahead.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Lennart.
>>>>>>>
>>>>>>>> Any objections/suggestions?
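The gzip points in the quoted proposal above can be tried out with base R alone. The following is only a minimal sketch, not code from GenABEL: the file name and the toy table are made up for illustration, and it merely shows that read.table() reads a gzip-compressed text file when given its path.

```r
# Hypothetical toy table standing in for a file such as srphenos.dat.
toy <- data.frame(id  = c("p1", "p2", "p3"),
                  sex = c(1L, 0L, 1L),
                  qt1 = c(0.5, -1.2, 2.0),
                  stringsAsFactors = FALSE)

# Write the table as gzip-compressed text through a gzfile() connection.
gz_path <- file.path(tempdir(), "toy_phenos.dat.gz")
con <- gzfile(gz_path, open = "wt")
write.table(toy, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() opens the path in read mode, where the compression is
# detected automatically, so no explicit decompression step is needed.
back <- read.table(gz_path, header = TRUE, stringsAsFactors = FALSE)
```

R.utils::gunzip() would only be needed if the uncompressed file itself had to exist on disk.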
>>>>>>>>
>>>>>>>> best,
>>>>>>>> Maksim
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> genabel-devel mailing list
>>>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>>
>>>>>>> --
>>>>>>> -----------------------------------------------------------------
>>>>>>> L.C. Karssen
>>>>>>> Utrecht
>>>>>>> The Netherlands
>>>>>>>
>>>>>>> lennart at karssen.org
>>>>>>> http://blog.karssen.org
>>>>>>>
>>>>>>> Please do not send me Word or PowerPoint files!
>>>>>>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>>>> ------------------------------------------------------------------
>>
>> --
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> L.C. Karssen
>> Utrecht
>> The Netherlands
>>
>> lennart at karssen.org
>> http://blog.karssen.org
>> GPG key ID: A88F554A
>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>
> CONFIDENTIALITY NOTICE Confidential information may be contained in this
> message or in its attachments. If you are not the addressee indicated in
> this message, or responsible for delivering the message to that person, or
> if you have received this message in error, you may not transcribe, copy
> or deliver this message to anyone. In that case, you should delete this
> message and its attachments. Thank you.

From lennart at karssen.org Thu Nov 28 12:24:39 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Thu, 28 Nov 2013 12:24:39 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To:
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org>
Message-ID: <529727F7.1000204@karssen.org>

On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
> I would think that GenABEL(.)data is "suggested" and then any
> examples using the data from this package start with something like
>
> if (require("GenABEL(.)data") ...

This sounds like a good solution.

> How do other packages which lean on data-packages solve this?
>
> As for the "dot" - I do not have any strong opinion - both options
> seem ok to me :)

Great :-). Then I propose (of course) to stick with the dot, also
because that's already used now.

Best,

Lennart.

> best, Yurii
>
> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>
>> Hi all,
>>
>> I've been following this conversation with much interest although
>> I'm sorry I can't contribute much.
>>
>> I was just wondering, could GenABEL.data not be just a dependency
>> of GenABEL? This way installing GenABEL through install.packages
>> would also install GenABEL.data without the
>> user actually having to do it himself.
>>
>> Best.
>>
>> Nicola
>>
>> Dr. Nicola Pirastu PhD
>> Research Fellow
>> Medical Sciences, Chirurgical and Health Department
>> University of Trieste
>> Medical Genetics
>> IRCCS Burlo Garofolo
>> Via dell'Istria 65/1
>> 34137 Italy
>> tel. +390403785539
>>
>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>
>>> Hi Maksim,
>>>
>>> First of all, thanks for the good work!
>>>
>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>> Hi All,
>>>>
>>>> I created a GenABEL.data package, into which I moved the following
>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>> inst/exdata/srphenos.dat.
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.v.struchalin at mail.ru Fri Nov 29 08:43:50 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 14:43:50 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529727F7.1000204@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org>
Message-ID: <529845B6.7060009@mail.ru>

I looked at how other developers deal with the dependency between a package and its data package.
I checked two random packages from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA and gamlss) depend on their data packages, meaning that their DESCRIPTION files reference the data packages in the "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data neither depends on nor suggests gamlss).

When I made GenABEL depend on GenABEL.data, I had in mind the same idea that Nicola expressed below: in this case, GenABEL.data is installed automatically when users run install.packages("GenABEL"). This is convenient for users who install GenABEL from CRAN and it is in line with GANPA and gamlss, but it probably does not fully reflect the GenABEL reality. The dependency between GenABEL and GenABEL.data is weak: GenABEL will mostly be used without GenABEL.data. So I support Yurii's idea of making GenABEL.data 'suggested' and including 'require(...'.

About the dot: personally, I like GenABEL.data. From this name it is clear that the package is a kind of 'subpackage' of the GenABEL package and not a standalone one.

best,
Maksim

On 28/11/2013 18:24, L.C. Karssen wrote:
>
> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>> I would think that GenABEL(.)data is "suggested" and then any
>> examples using the data from this package start with something like
>>
>> if (require("GenABEL(.)data") ...
> This sounds like a good solution.
>
>> How do other packages which lean on data-packages solve this?
>>
>> As for the "dot" - I do not have any strong opinion - both options
>> seem ok to me :)
> Great :-). Then I propose (of course) to stick with the dot, also
> because that's already used now.
>
> Best,
>
> Lennart.
>
>> best, Yurii
>>
>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>
>>> Hi all,
>>>
>>> I've been following this conversation with much interest although
>>> I'm sorry I can't contribute much.
>>>
>>> I was just wondering, could GenABEL.data not be just a dependency
>>> of GenABEL?
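As an illustration of the "suggested package plus guard" pattern that this message settles on, here is a minimal sketch. It assumes that GenABEL's DESCRIPTION would list GenABEL.data under "Suggests:" and that the srdta data set ships in GenABEL.data, as described in the thread; the helper name load_example_data() is hypothetical and belongs to neither package.

```r
# Hypothetical helper: load a data set from an optional (suggested) package.
# Returns TRUE if the data set was loaded and FALSE if the package is missing.
load_example_data <- function(dataset, package, envir = globalenv()) {
  if (!requireNamespace(package, quietly = TRUE)) {
    message("Package '", package, "' is not installed; skipping example.")
    return(FALSE)
  }
  data(list = dataset, package = package, envir = envir)
  TRUE
}

# An example would then guard its data-dependent part like this:
if (load_example_data("srdta", "GenABEL.data")) {
  # ... code that uses srdta goes here ...
}
```

With this arrangement the examples still run (and simply skip the data-dependent part) for users who installed GenABEL without GenABEL.data, whereas a "Depends:" entry would force both packages to be installed together.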
Se non siete i destinatari indicati > nel messaggio, o responsabili per la sua consegna alla persona, o se > avete ricevuto il messaggio per errore, siete pregati di non > trascriverlo, copiarlo o inviarlo a nessuno. In tal caso vi invitiamo a > cancellare il messaggio ed i suoi allegati. Grazie. CONFIDENTIALITY > NOTICE Confidential information may be contained in this message or in > its attachments. If you are not the addressee indicated in this message, > or responsible for message delivering to that person, or if you have > received this message in error, you may not transcribe, copy or deliver > this message to anyone. In that case, you should delete this message and > its attachments. Thank you. >>> _______________________________________________ genabel-devel >>> mailing list genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> >>> >>> _______________________________________________ >>> genabel-devel mailing list >>> genabel-devel at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.v.struchalin at mail.ru Fri Nov 29 10:20:33 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Fri, 29 Nov 2013 16:20:33 +0700 Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man In-Reply-To: <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> References: <20131128043855.1EDED1861B7@r-forge.r-project.org> <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> Message-ID: <52985C61.2070908@mail.ru> Hi Yakov, I made this change because when I run "R CMD check --as-cran ", I got a message: ______________________________________________ * checking Rd line widths ... 
NOTE
Rd file '/home/maksim/work/GenABEL_project/genabel/pkg/GenABEL.Rcheck/00_pkg_src/GenABEL/man/PGC.Rd':
  \examples lines wider than 100 characters:
     #result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1, pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)
______________________________________________

Here are guidelines on how to write .Rd files (http://developer.r-project.org/Rds.html).
If I were you, I would replace "result=PGC(d..." by "result<-PGC(d..." because using <- instead of = is the common R convention for assignments.

The guidelines recommend keeping the line length in .Rd files to no more than 65 characters, but I guess this is an old rule and now the limit is 100 characters.

best,
Maksim

On 28/11/2013 16:38, Yury Aulchenko wrote:
> Yakov,
>
> please pay attention to this commit (I know that you were changing the code of the GC procedures recently)
>
> YA
>
> On Nov 28, 2013, at 05:38 AM, noreply at r-forge.r-project.org wrote:
>
>> Author: maksim
>> Date: 2013-11-28 05:38:54 +0100 (Thu, 28 Nov 2013)
>> New Revision: 1429
>>
>> Modified:
>>    pkg/GenABEL/man/PGC.Rd
>> Log:
>> Deleteted comments because R CMD check --as-cran said that these lines are wider than 100 characters and they will be truncated in the PDF manual. This should not have any influence on the code generated from it.
>>
>> Modified: pkg/GenABEL/man/PGC.Rd
>> ===================================================================
>> --- pkg/GenABEL/man/PGC.Rd 2013-11-28 04:21:29 UTC (rev 1428)
>> +++ pkg/GenABEL/man/PGC.Rd 2013-11-28 04:38:54 UTC (rev 1429)
>> @@ -63,8 +63,6 @@
>>  s <- summary(ge03d2)
>>  freq <- s$Q.2
>>  result=PGC(data=chi2.1df,method="median",p=freq,df=1, pol.d=2, plot=TRUE, lmax=1.1,start.corr=FALSE)
>> -#"group_regress" is better to use when we have more than 50K SNPs
>> -#result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1, pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)
>>  }
>>  \author{
>>  Yakov Tsepilov

From lennart at karssen.org Fri Nov 29 10:25:57 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 29 Nov 2013 10:25:57 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man
In-Reply-To: <52985C61.2070908@mail.ru>
References: <20131128043855.1EDED1861B7@r-forge.r-project.org> <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> <52985C61.2070908@mail.ru>
Message-ID: <52985DA5.80400@karssen.org>

Dear Maksim, Yakov, others,

About coding style, please see the document on GenABEL coding standards (work in progress) we created some time ago:
http://genabel.r-forge.r-project.org/codingstyle.html
It would be great to have uniform coding in all of the GenABEL suite.

By the way, suggestions for additions/corrections of the document are welcome!

Lennart.
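
The line-width issue in this thread can be illustrated with a minimal sketch. It is an illustration only and not code from the GenABEL repository: the helper name too_wide() is hypothetical, and the 100-character limit is taken from the check output quoted in Maksim's message. The sketch flags example lines that exceed the limit and shows the same commented-out call wrapped over several lines, with <- used for assignment.

```r
# Hypothetical helper (not part of GenABEL or of R itself): return the
# indices of the lines that are longer than `limit` characters.
too_wide <- function(lines, limit = 100) {
  which(nchar(lines) > limit)
}

# The commented-out example line reported by the check in this thread.
long_line <- paste0(
  "#result=PGC(data=chi2.1df,method=\"group_regress\",p=freq,df=1, ",
  "pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)"
)

# The same call wrapped over several lines, using `<-` for assignment.
wrapped <- c(
  "#result <- PGC(data=chi2.1df, method=\"group_regress\", p=freq,",
  "#              df=1, pol.d=2, plot=TRUE, start.corr=FALSE,",
  "#              n_quiantile=3)"
)

too_wide(long_line)  # returns 1: the single line is too wide
too_wide(wrapped)    # returns integer(0): no line exceeds the limit
```

Wrapping the comment this way would keep the "group_regress" hint in the manual page instead of deleting it, assuming the wrapped form is acceptable to the maintainers.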
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From lennart at karssen.org Fri Nov 29 10:36:10 2013
From: lennart at karssen.org (L.C.
Karssen)
Date: Fri, 29 Nov 2013 10:36:10 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529845B6.7060009@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru>
Message-ID: <5298600A.40607@karssen.org>

Hi Maksim,

On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
> I looked at how other developers deal with the issue of dependency between a
> package and its data package. I checked out two random packages from
> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA
> and gamlss) depend on their data packages - that means their
> DESCRIPTION files contain a reference to their data packages in the
> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not
> depend on or suggest gamlss).
>
> When I made GenABEL depend on GenABEL.data, I had in mind the
> same idea as Nicola expressed below - that, in this case, GenABEL.data
> is installed automatically when users run install.packages("GenABEL"). This
> is convenient for users who install GenABEL from CRAN and this is in
> line with GANPA and gamlss, but it probably does not fully reflect the
> GenABEL reality. The dependency between GenABEL and GenABEL.data is weak
> - GenABEL is going to be used mostly without GenABEL.data. So, I support
> Yurii's idea of making GenABEL.data 'suggested' and including
> 'require(...)'.
>

I agree with you that the dependence between GA and GA.data is rather weak. On the other hand, why not keep GA.data in Depends? That gives the same behaviour as before (install everything by default). Sounds convenient to me.
With modern internet bandwidth the few MB of the data package are not a problem.

> About dot: Personally, I like GenABEL.data. From this name, it is clear
> that this package is some kind of 'subpackage' of the GenABEL package and
> is not a standalone one.

Good point!

Best regards,

Lennart.

>
> best,
> Maksim
>
> On 28/11/2013 18:24, L.C. Karssen wrote:
>>
>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>> I would think that GenABEL(.)data is "suggested" and then any
>>> examples using the data from this package start with something like
>>>
>>> if (require("GenABEL(.)data") ...
>> This sounds like a good solution.
>>
>>> How do other packages which lean on data-packages solve this?
>>>
>>> As for the "dot" - I do not have any strong opinion - both options
>>> seem ok to me :)
>> Great :-). Then I propose (of course) to stick with the dot, also
>> because that's already used now.
>>
>> Best,
>>
>> Lennart.
>>
>>> best, Yurii
>>>
>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been following this conversation with much interest although
>>>> I'm sorry I can't contribute much.
>>>>
>>>> I was just wondering, could GenABEL.data not be just a dependency
>>>> of GenABEL? This way installing GenABEL through install.packages
>>>> would also install GenABEL.data without the
>>>> user actually having to do it himself.
>>>>
>>>> Best.
>>>>
>>>> Nicola
>>>>
>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>>>
>>>>> Hi Maksim,
>>>>>
>>>>> First of all, thanks for the good work!
>>>>>
>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I created a GenABEL.data package where I moved the following
>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and
>>>>>> inst/exdata/srphenos.dat. All the corresponding files are
>>>>>> deleted from GenABEL. Also, GenABEL.data contains an R directory
>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>> scripts do not go into the final distribution and are needed only
>>>>>> for possible future usage. Only GenABEL.data/data/* files go to
>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>> not a good idea to do it in such a way but, at least, it is
>>>>>> convenient and has no effect on end users (suggest a
>>>>>> better way plz).
>>>>>>
>>>>>> The way GenABEL.data works now is not how we discussed
>>>>>> below. It is impossible to generate files during "R CMD
>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>> was just to move all the data to GenABEL.data from GenABEL
>>>>>> (like CRAN people suggested). In this case, we can install
>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>> install GenABEL.
>>>>> This sounds very strange to me. Does the user first need to
>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>> package? Or do I misunderstand you? What happens if the user
>>>>> installs them in a different order? I guess that shouldn't
>>>>> matter, right, as the package contains only data?
>>>>>
>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>> required packages in the DESCRIPTION file?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>> Now we have sizes of both packages much smaller: 469K for
>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>
>>>>>> It should work now, but if you experience some problems, let me
>>>>>> know.
>>>>>>
>>>>>> best, Maksim
>>>>>>
>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>> Hi Maksim,
>>>>>>>
>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>> I am still in the process of compressing the GenABEL data. To
>>>>>>>> remind you: the idea consists of compressing the original
>>>>>>>> data text files and using them later for generating RData
>>>>>>>> files (e.g. srdta).
>>>>>>>>
>>>>>>>> Yurii proposed to make RData files in the examples which use
>>>>>>>> them. I now see only one way this idea can be
>>>>>>>> implemented. We replace the "data(srdta)" line in every file
>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>> generates the srdta object. The same procedure applies to the
>>>>>>>> other five *.RData files from GenABEL/data. If we go this way, we
>>>>>>>> have to change 71 files in the man directory and, in addition
>>>>>>>> to this, the GenABEL manual. Also, users will not be able
>>>>>>>> to load the srdta set (and others) by typing "data(srdta)"
>>>>>>>> on the command line (as they are used to) and have to know
>>>>>>>> that the function generate_srdt() now serves these
>>>>>>>> needs. This all sounds nasty :-).
>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>> see your point.
>>>>>>>
>>>>>>>> Making the data at package installation time is also a
>>>>>>>> bad idea, as Yurii noted below. Actually, this is impossible
>>>>>>>> because the process of making GenABEL data requires GenABEL
>>>>>>>> functions which are not available at installation time
>>>>>>>> (they are available only after GenABEL is installed).
>>>>>>> Good point!
>>>>>>>
>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by
>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>> to generate RData at installation time using GenABEL
>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>> solution is platform independent because R rules permit
>>>>>>>> running *.R scripts to generate data at installation
>>>>>>>> time.
>>>>>>>>
>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>> *ABELdata data packages?
>>>>>>> Sounds like this is the best solution. Thanks for digging
>>>>>>> into this. As for the package name, either GenABELdata or
>>>>>>> GenABEL.data sounds fine with me (the latter one being a bit
>>>>>>> clearer in my opinion).
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Lennart
>>>>>>>
>>>>>>>> best, Maksim
>>>>>>>>
>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maksim,
>>>>>>>>>>
>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>> In this email, I propose a new approach which allows
>>>>>>>>>>> us to reduce the total size of data from 8Mb to 2Mb, which
>>>>>>>>>>> reduces the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here
>>>>>>>>>> :-).
>>>>>>>>>>
>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>> sub-directories are too big: data (2.3Mb),
>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last
>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested
>>>>>>>>>>> to create a new package called GenABELdata and move
>>>>>>>>>>> all the data there. I went through the data and found
>>>>>>>>>>> that:
>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip
>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>> - There is a function gunzip() from library R.utils which
>>>>>>>>>>> can decompress the files. It works on any OS.
>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>> gzip files without decompression.
>>>>>>>>>>> - Even more: it looks like the biggest file "srgenos.dat"
>>>>>>>>>>> was used only once a long time ago for generating
>>>>>>>>>>> "srdta.RData" and now it is just sitting there and eating
>>>>>>>>>>> space needlessly.
>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>
>>>>>>>>>>> 2) We can delete some files from the "data"
>>>>>>>>>>> directory. The deleted files will be generated on the
>>>>>>>>>>> user's computer based on the files from exdata. It can
>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or
>>>>>>>>>>> on the first load (run function .onAttach()
>>>>>>>>>>> in R/zzz.R).
>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>> platforms)
>>>>>>>>>
>>>>>>>>>>> It will reduce the total size of the "data" directory from
>>>>>>>>>>> 2.3Mb to 800Kb.
>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>> ahead.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Lennart.
>>>>>>>>>>
>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>
>>>>>>>>>>> best, Maksim

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C.
Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From yurii.aulchenko at gmail.com Fri Nov 29 10:43:34 2013
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Fri, 29 Nov 2013 10:43:34 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <5298600A.40607@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org>
Message-ID: <-3891723065030522185@unknownmsgid>

Lennart,

Good point about "depends"! Again, my question would be: how do other people do it?

Y

----------------------
Yurii Aulchenko
(sent from mobile device)
Zie >>>>>>>>>>> http://www.gnu.org/philosophy/no-word-attachments.nl.html >>> ------------------------------------------------------------------ >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> _______________________________________________ >>>>>>>>>> genabel-devel mailing list >>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> _______________________________________________ >>>>>>>>> genabel-devel mailing list >>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> _______________________________________________ >>>>>>>> genabel-devel mailing list >>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> _______________________________________________ >>>>>>> genabel-devel mailing list >>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> -- >>>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* L.C. Karssen >>>>>> Utrecht The Netherlands >>>>>> >>>>>> lennart at karssen.org http://blog.karssen.org GPG key ID: A88F554A >>>>>> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- >>>>>> >>>>>> _______________________________________________ genabel-devel >>>>>> mailing list genabel-devel at lists.r-forge.r-project.org >>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>> AVVISO DI RISERVATEZZA Informazioni riservate possono essere contenute >>> nel messaggio o nei suoi allegati. 
Se non siete i destinatari indicati >>> nel messaggio, o responsabili per la sua consegna alla persona, o se >>> avete ricevuto il messaggio per errore, siete pregati di non >>> trascriverlo, copiarlo o inviarlo a nessuno. In tal caso vi invitiamo a >>> cancellare il messaggio ed i suoi allegati. Grazie. CONFIDENTIALITY >>> NOTICE Confidential information may be contained in this message or in >>> its attachments. If you are not the addressee indicated in this message, >>> or responsible for message delivering to that person, or if you have >>> received this message in error, you may not transcribe, copy or deliver >>> this message to anyone. In that case, you should delete this message and >>> its attachments. Thank you. >>>>> _______________________________________________ genabel-devel >>>>> mailing list genabel-devel at lists.r-forge.r-project.org >>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>> >>>>> >>>>> _______________________________________________ >>>>> genabel-devel mailing list >>>>> genabel-devel at lists.r-forge.r-project.org >>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >> >> >> >> _______________________________________________ >> genabel-devel mailing list >> genabel-devel at lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > L.C. 
Karssen > Utrecht > The Netherlands > > lennart at karssen.org > http://blog.karssen.org > GPG key ID: A88F554A > -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > > _______________________________________________ > genabel-devel mailing list > genabel-devel at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel From m.v.struchalin at mail.ru Fri Nov 29 10:54:57 2013 From: m.v.struchalin at mail.ru (Maksim Struchalin) Date: Fri, 29 Nov 2013 16:54:57 +0700 Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man In-Reply-To: <52985DA5.80400@karssen.org> References: <20131128043855.1EDED1861B7@r-forge.r-project.org> <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> <52985C61.2070908@mail.ru> <52985DA5.80400@karssen.org> Message-ID: <52986471.8070101@mail.ru> Thanks you Lennart So, I guess 80 lines should be a limit for PGC.Rd. Two suggestions to the document: 1) I would add a recommendation to leave comments to the C/R code. Sometimes it is just impossible to understand what is some code for. 2) Add description at the head of to each file with code like. For example, in every file with code I ever wrote I have a small description on the top like: //#===================================================================================== //# //# Filename: gtps_container.h //# //# Description: Set of functions for merging two snp.data class objects. //# //# Version: 1.0 //# Created: 18-March-2008 //# Revision: none //# //# //# Author: Maksim V. Struchalin, Yurii S. Aulchenko //# Company: ErasmusMC, Epidemiology & Biostatistics Department, The Netherlands. //# Email: m.struchalin@@erasmusmc.nl, i.aoultchenko at erasmusmc.nl //# //#===================================================================================== from which it is clear who wrote the file (person's name and contacts), when and for whom. 
This is convenient for other developers who edit the file in the future: they can address their questions directly to the person who wrote the original code.

best,
Maksim

On 29/11/2013 16:25, L.C. Karssen wrote:
> Dear Maksim, Yakov, others,
>
> About coding style, please see the document on GenABEL coding standards
> (work in progress) we created some time ago:
> http://genabel.r-forge.r-project.org/codingstyle.html
>
> It would be great to have uniform coding in all of the GenABEL suite.
>
> By the way, suggestions for additions/corrections of the document are
> welcome!
>
>
> Lennart.
>
>
>
> On 11/29/2013 10:20 AM, Maksim Struchalin wrote:
>> Hi Yakov,
>>
>> I made this change because when I ran "R CMD check --as-cran", I got this
>> message:
>> ______________________________________________
>> * checking Rd line widths ... NOTE
>> Rd file
>> '/home/maksim/work/GenABEL_project/genabel/pkg/GenABEL.Rcheck/00_pkg_src/GenABEL/man/PGC.Rd':
>>
>> \examples lines wider than 100 characters:
>> #result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1, pol.d=2,
>> plot=TRUE, start.corr=FALSE,n_quiantile=3)
>> ______________________________________________
>>
>> Here are the guidelines on how to write .Rd files
>> (http://developer.r-project.org/Rds.html).
>> If I were you, I would replace "result=PGC(d..." by "result<-PGC(d...",
>> because using <- instead of = is the common R convention for assignments.
>>
>> The guidelines recommend keeping lines in .Rd files to no more than 65
>> characters, but I guess this is an old rule and now the limit is 100
>> characters.
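The width limit discussed above can be checked before committing. What follows is only a rough sketch: `find_wide_lines()` is a hypothetical helper, not part of R CMD check or GenABEL, and it looks at every raw line of a file, whereas the real check appears to look only at particular sections of the Rd file, so the two may disagree.

```r
# Hypothetical helper (not part of R CMD check or GenABEL): report the
# lines of a text file that are wider than a given number of characters.
find_wide_lines <- function(path, max_width = 100L) {
  lines <- readLines(path, warn = FALSE)
  widths <- nchar(lines, type = "chars")
  wide <- which(widths > max_width)
  data.frame(line = wide, width = widths[wide])
}

# Demo on a temporary file: only the second line exceeds the limit.
demo_file <- tempfile(fileext = ".Rd")
writeLines(c("short line", strrep("x", 120), "another short line"), demo_file)
wide_lines <- find_wide_lines(demo_file)
print(wide_lines)
```

Running the helper on man/PGC.Rd before the commit would presumably have flagged the commented-out PGC() call, which could then have been wrapped instead of deleted.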
>>
>> best,
>> Maksim
>>
>> On 28/11/2013 16:38, Yury Aulchenko wrote:
>>> Yakov,
>>>
>>> please pay attention to this commit (I know that you were changing the
>>> code of the GC procedures recently)
>>>
>>> YA
>>>
>>> On Nov 28, 2013, at 05:38 AM, noreply at r-forge.r-project.org wrote:
>>>
>>>> Author: maksim
>>>> Date: 2013-11-28 05:38:54 +0100 (Thu, 28 Nov 2013)
>>>> New Revision: 1429
>>>>
>>>> Modified:
>>>> pkg/GenABEL/man/PGC.Rd
>>>> Log:
>>>> Deleted comments because R CMD check --as-cran said that these
>>>> lines are wider than 100 characters and they will be truncated in the
>>>> PDF manual. This should not have any influence on the code generated
>>>> from it.
>>>>
>>>> Modified: pkg/GenABEL/man/PGC.Rd
>>>> ===================================================================
>>>> --- pkg/GenABEL/man/PGC.Rd 2013-11-28 04:21:29 UTC (rev 1428)
>>>> +++ pkg/GenABEL/man/PGC.Rd 2013-11-28 04:38:54 UTC (rev 1429)
>>>> @@ -63,8 +63,6 @@
>>>> s <- summary(ge03d2)
>>>> freq <- s$Q.2
>>>> result=PGC(data=chi2.1df,method="median",p=freq,df=1, pol.d=2,
>>>> plot=TRUE, lmax=1.1,start.corr=FALSE)
>>>> -#"group_regress" is better to use when we have more than 50K SNPs
>>>> -#result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1,
>>>> pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)
>>>> }
>>>> \author{
>>>> Yakov Tsepilov
>>>>
>>>> _______________________________________________
>>>> Genabel-commits mailing list
>>>> Genabel-commits at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits

From lennart at karssen.org Fri Nov 29 11:14:42 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 29 Nov 2013 11:14:42 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man
In-Reply-To: <52986471.8070101@mail.ru>
References: <20131128043855.1EDED1861B7@r-forge.r-project.org> <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> <52985C61.2070908@mail.ru> <52985DA5.80400@karssen.org> <52986471.8070101@mail.ru>
Message-ID: <52986912.4070108@karssen.org>

Good suggestions.

On 11/29/2013 10:54 AM, Maksim Struchalin wrote:
> Thank you, Lennart.
>
> So, I guess 80 characters per line should be the limit for PGC.Rd.
>
> Two suggestions for the document:
> 1) I would add a recommendation to comment the C/R code.
> Sometimes it is just impossible to understand what some code is for.

Very important! Good documentation (within the code) is extremely important.

> 2) Add a description at the head of each code file.

A header is good. Some points:
- Version/revision number: I don't think it's a good idea to include a version or revision number. That's what SVN is for. Otherwise you need to update those all the time.
- Author names: the problem with adding author names is that the list can grow a lot as more people work on a file. Do we really want this kind of bookkeeping? In principle you can look up in SVN who added what to which file.
- Licence: as far as I understand, officially/legally each file should have the licence in the header (or at least mention it). You see that in many GPL programs.
- For C/C++ files: maybe we should use Doxygen-style headers?
That will also help document the code (see my recent experiments with that in ProbABEL).

Best,

Lennart.

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.v.struchalin at mail.ru Fri Nov 29 11:42:42 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 17:42:42 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <-3891723065030522185@unknownmsgid>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid>
Message-ID: <52986FA2.9080706@mail.ru>

Hi Yurii & Lennart,

Yesterday, you supported the idea of making GenABEL.data 'suggested':
________________________________________________________________
On 28/11/2013 18:24, L.C. Karssen wrote:
> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
> I would think that GenABEL(.)data is "suggested" and then any
> examples using the data from this package start with something like
>
> if (require("GenABEL(.)data") ...
This sounds like a good solution.
________________________________________________________________

Today, you propose to make it 'depends', or do I misunderstand something here?

About how other people do it: I looked at the GANPAdata and gamlss.data packages. GANPA and gamlss 'depend' on them (see my message below).

best,
Maksim

On 29/11/2013 16:43, Yurii Aulchenko wrote:
> Lennart,
>
> Good point about "depends"!
>
> Again, my question would be how other people do it?
>
> Y
>
> ----------------------
> Yurii Aulchenko
> (sent from mobile device)
>
>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>
>> Hi Maksim,
>>
>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>> I looked at how other developers deal with the issue of dependency
>>> between a package and its data package. I checked two random packages
>>> from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them
>>> (GANPA and gamlss) depend on their data packages - that means their
>>> DESCRIPTION files contain a reference to their data packages in the
>>> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not
>>> Depend on or Suggest gamlss).
>>>
>>> When I made GenABEL depend on GenABEL.data, I had in mind the
>>> same idea as Nicola expressed below - that, in this case, GenABEL.data
>>> is installed automatically when users run install.packages("GenABEL").
>>> This is convenient for users who install GenABEL from CRAN and it is in
>>> line with GANPA and gamlss, but it probably does not fully reflect the
>>> GenABEL reality. The dependency between GenABEL and GenABEL.data is weak
>>> - GenABEL is going to be used mostly without GenABEL.data. So, I support
>>> Yurii's idea of making GenABEL.data 'suggested' and including
>>> 'require(...'.
>> I agree with you that the dependence between GA and GA.data is rather
>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>> same behaviour as before (install everything by default). Sounds
>> convenient to me.
>> With modern internet bandwidth the few MB of the data package are not a
>> problem.
>>
>>> About the dot: personally, I like GenABEL.data. From this name, it is
>>> clear that this package is a kind of 'subpackage' of the GenABEL package
>>> and not a standalone one.
>> Good point!
>>
>>
>> Best regards,
>>
>> Lennart.
>>
>>> best,
>>> Maksim
>>>
>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>
>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>> examples using the data from this package start with something like
>>>>>
>>>>> if (require("GenABEL(.)data") ...
>>>> This sounds like a good solution.
>>>>
>>>>> How do other packages which lean on data-packages solve this?
>>>>>
>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>> seem ok to me :)
>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>> because that's already used now.
>>>>
>>>>
>>>> Best,
>>>>
>>>> Lennart.
>>>>
>>>>
>>>>> best, Yurii
>>>>>
>>>>>
>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've been following this conversation with much interest although
>>>>>> I'm sorry I can't contribute much.
>>>>>>
>>>>>> I was just wondering, could GenABEL.data not be just a dependency
>>>>>> of GenABEL? This way installing GenABEL through install.packages
>>>>>> would also install GenABEL.data without the
>>>>>> user actually having to do it himself.
>>>>>>
>>>>>> Best.
>>>>>>
>>>>>> Nicola
>>>>>>
>>>>>>
>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences,
>>>>>> Chirurgical and Health Department University of Trieste Medical
>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel.
>>>>>> +390403785539
>>>>>>
>>>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen"
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Maksim,
>>>>>>>
>>>>>>> First of all, thanks for the good work!
>>>>>>>
>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I created a GenABEL.data package to which I moved the following
>>>>>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files have been
>>>>>>>> deleted from GenABEL.
>>>>>>>> Also, GenABEL.data contains an R directory
>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>> scripts do not go into the final distribution and are kept only
>>>>>>>> for possible future use. Only the GenABEL.data/data/* files go into
>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>>>> not a good idea to do it this way but, at least, it is
>>>>>>>> convenient and has no effect on end users (please suggest a
>>>>>>>> better way).
>>>>>>>>
>>>>>>>> The way GenABEL.data works now is not what we discussed
>>>>>>>> below. It is impossible to generate files during "R CMD
>>>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>>>> was just to move all the data from GenABEL to GenABEL.data
>>>>>>>> (as the CRAN people suggested). In this case, we can install
>>>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>>>> install GenABEL.
>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>> matter, right, as the package contains only data?
>>>>>>>
>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Lennart.
>>>>>>>
>>>>>>>> Both packages are now much smaller: 469K for
>>>>>>>> GenABEL and 2.4M for GenABEL.data.
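The two options weighed in this thread differ in a single field of GenABEL's DESCRIPTION file: listing GenABEL.data under "Depends:" installs and attaches it automatically, while listing it under "Suggests:" leaves it optional, so every example that uses the data has to check for it first. The following is a minimal sketch of such a guard, not the code that was committed. `run_if_available()` is a hypothetical helper, the data set name `srdta` is taken from the thread, and `requireNamespace()` is used here in place of the `require()` call mentioned above so that the package is not attached merely to test for its presence.

```r
# Hypothetical helper: evaluate an example only when an optional
# (Suggests-level) package is installed; otherwise skip it with a message.
run_if_available <- function(pkg, example) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    example()
    invisible(TRUE)
  } else {
    message("Package '", pkg, "' is not installed; skipping example.")
    invisible(FALSE)
  }
}

# Intended use in a GenABEL example (assumes GenABEL.data provides 'srdta').
ran <- run_if_available("GenABEL.data", function() {
  data("srdta", package = "GenABEL.data", envir = globalenv())
})

# The same guard with a package name that is assumed not to exist.
skipped <- run_if_available("surelyNotAnInstalledPackage", function() {
  stop("never reached")
})
```

With "Depends:" none of this is needed, at the cost of always installing the data package, which is the trade-off Lennart describes above.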
>>>>>>>> >>>>>>>> It should work now, but if you experience some problems, let me >>>>>>>> know. >>>>>>>> >>>>>>>> best, Maksim >>>>>>>> >>>>>>>> >>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote: >>>>>>>>> Hi Maksim, >>>>>>>>> >>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>>>>>>>>> I am still in the way of compressing GenABEL data. To >>>>>>>>>> remind you: the idea consists of compressing the original >>>>>>>>>> data text files and use them later for generating RData >>>>>>>>>> files (e.g. srdta). >>>>>>>>>> >>>>>>>>>> Yurii proposed to make RData files in examples which use >>>>>>>>>> them. I see now only one way how this idea can be >>>>>>>>>> implemented. We replace "data(srdta)" line in every file >>>>>>>>>> where it is used by a function e.g. "generate_srdt()" which >>>>>>>>>> generate srdta object. The same procedure for other five >>>>>>>>>> *.RData files from GenABEL/data. If we follow this way, we >>>>>>>>>> have to change 71 files in man directory and, additionally >>>>>>>>>> to this, the GenABEL manual. Also, users will not be able >>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)" >>>>>>>>>> in a command line (how they get used to) and has to know >>>>>>>>>> that the function generate_srdt() now services for these >>>>>>>>>> needs. This all sounds nasty :-). >>>>>>>>> I'm not sure how many user actually type data(srdta), but I >>>>>>>>> see you point. >>>>>>>>> >>>>>>>>>> Making the data during package installation time is also a >>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible >>>>>>>>>> because the process of making GenABEL data requires GenABEL >>>>>>>>>> functions which are not available during installation time >>>>>>>>>> (they are avaialble only after GenABEL installed). >>>>>>>>> Good point! >>>>>>>>> >>>>>>>>>> I see only one good solution now: move all the GenABEL data >>>>>>>>>> to a new package e.g. GenABELdata as it was proposed by >>>>>>>>>> CRAN people from the begining. 
In this case, it is possible >>>>>>>>>> to generate RData during installation time using GenABEL >>>>>>>>>> functions (which are installed by that time). I think this >>>>>>>>>> solution is paltform independent because R rules permit >>>>>>>>>> runing *.R scripts to generate data during installation >>>>>>>>>> time. >>>>>>>>>> >>>>>>>>>> What do you think about making a data package for GenABEL? >>>>>>>>>> Do you think the name GenABELdata is ok? May be we can move >>>>>>>>>> all the *ABEL data in DatABEL package instead of making >>>>>>>>>> *ABELdata data packages? >>>>>>>>> Sounds like this is the best solution. Thanks for digging in >>>>>>>>> to this. As for the package name, either GenABELdata or >>>>>>>>> GenABEL.data sounds find with me (the latter one being a bit >>>>>>>>> clearer in my opinion). >>>>>>>>> >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> >>>>>>>>> Lennart >>>>>>>>> >>>>>>>>>> best, Maksim >>>>>>>>>> >>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote: >>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Maksim, >>>>>>>>>>>> >>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote: >>>>>>>>>>>>> In this email, I propose a new approach which allows >>>>>>>>>>>>> to reduce total size of data from 8Mb to 2Mb that >>>>>>>>>>>>> reduce the entire GenABEL size from 12Mb to 6Mb. >>>>>>>>>>>> I gues you mean B (bytes) instead of b (bits) here >>>>>>>>>>>> :-). >>>>>>>>>>>> >>>>>>>>>>>>> "R CMD check --as-cran" reports that the following >>>>>>>>>>>>> sub-directories have too big size: data (2.3Mb), >>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last >>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested >>>>>>>>>>>>> to create a new package called GenABELdata and move >>>>>>>>>>>>> all the data there. I run through the data and found >>>>>>>>>>>>> that: 1) "exdata" directory can be compressed by gzip >>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb. 
- There is a >>>>>>>>>>>>> function guzip() from library R.utils which can >>>>>>>>>>>>> decompress the files. It works on any OS. - Moreover: >>>>>>>>>>>>> the native R function read.table() can read gzip >>>>>>>>>>>>> files without decompression. - Even more: it looks >>>>>>>>>>>>> like that the biggest file "srgenos.dat" is used only >>>>>>>>>>>>> once a long time ago for generating "srdta.RData" and >>>>>>>>>>>>> now it is just sitting there and eating space >>>>>>>>>>>>> needlessly. >>>>>>>>>>>> Sounds like a waste of space! >>>>>>>>>>>> >>>>>>>>>>>>> 2) We can delete some files from the "data" >>>>>>>>>>>>> directory. The deleted files will be generated on the >>>>>>>>>>>>> user computer based on the files from exdata. It can >>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or >>>>>>>>>>>>> on the first load through (|run funcion .onAttach() >>>>>>>>>>>>> in R/zzz.R|). >>>>>>>>>>>> This sounds like a perfectly acceptable option. >>>>>>>>>>> I suggest this is done in the "example" which make use of >>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make >>>>>>>>>>> things as "robust" as possible and interfere as little as >>>>>>>>>>> possible with the usual workflow (which is very much >>>>>>>>>>> system-specific, in that we will need to to test on all >>>>>>>>>>> platforms) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> It will reduce total size of "data" directory from >>>>>>>>>>>>> 2.3Mb to 800Kb. >>>>>>>>>>>> Fantastic! If no one has other objections I say: go >>>>>>>>>>>> ahead. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> >>>>>>>>>>>> Lennart. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Any objections/suggestions? 
From yurii.aulchenko at gmail.com Fri Nov 29 12:14:59 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Fri, 29 Nov 2013 12:14:59 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52986FA2.9080706@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru>
Message-ID:

So it looks like other people do "depend"...
I am more in favor of "depends" now, but no strong preferences - it is up to you

Y

On Nov 29, 2013, at 11:42 AM, Maksim Struchalin wrote:

> Hi Yurii & Lennart,
>
> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>
> ________________________________________________________________
> On 28/11/2013 18:24, L.C. Karssen wrote:
>
>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>> I would think that GenABEL(.)data is "suggested" and then any
>> examples using the data from this package start with something like
>>
>> if (require("GenABEL(.)data") ...
>
> This sounds like a good solution.
> ________________________________________________________________
>
>
> Today, you propose to make it 'depends', or do I misunderstand something here?
>
> About how other people do it: I looked at the GANPAdata and gamlss.data
> packages. GANPA and gamlss 'depend' on them (see my message below).
>
> best,
> Maksim
>
>
> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>> Lennart,
>>
>> Good point about "depends"!
>>
>> Again, my question would be how other people do it?
>>
>> Y
>>
>> ----------------------
>> Yurii Aulchenko
>> (sent from mobile device)
>>
>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>
>>> Hi Maksim,
>>>
>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>> I looked at how other developers deal with the issue of dependency between a
>>>> package and its data package. I checked two random packages from
>>>> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA
>>>> and gamlss) depend on their data packages, which means their
>>>> DESCRIPTION files contain a reference to their data packages in the
>>>> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not
>>>> Depends/Suggests gamlss).
>>>>
>>>> When I made GenABEL depend on GenABEL.data, I had in mind the
>>>> same idea as Nicola voiced below: that, in this case, GenABEL.data
>>>> is installed automatically when users run install.packages("GenABEL").
>>>> This is convenient for users who install GenABEL from CRAN and it is in
>>>> line with GANPA and gamlss, but it probably does not fully reflect the
>>>> GenABEL reality. The dependency between GenABEL and GenABEL.data is weak:
>>>> GenABEL is mostly going to be used without GenABEL.data. So, I support
>>>> Yurii's idea of making GenABEL.data 'suggested' and including
>>>> 'require(...'.
>>> I agree with you that the dependence between GA and GA.data is rather
>>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>>> same behaviour as before (install everything by default). Sounds
>>> convenient to me.
>>> With modern internet bandwidth the few MB of the data package are not a
>>> problem.
>>>
>>>> About the dot: Personally, I like GenABEL.data. From this name, it is clear
>>>> that this package is some kind of 'subpackage' of the GenABEL package and
>>>> not a standalone one.
>>> Good point!
>>>
>>>
>>> Best regards,
>>>
>>> Lennart.
>>>
>>>> best,
>>>> Maksim
>>>>
>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>
>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>> examples using the data from this package start with something like
>>>>>>
>>>>>> if (require("GenABEL(.)data") ...
>>>>> This sounds like a good solution.
>>>>>
>>>>>> How do other packages which lean on data packages solve this?
>>>>>>
>>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>>> seem ok to me :)
>>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>>> because that's already used now.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>
>>>>>> best, Yurii
>>>>>>
>>>>>>
>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've been following this conversation with much interest although
>>>>>>> I'm sorry I can't contribute much.
>>>>>>> >>>>>>> I was just wondering, could GenABEL.data not be just a dependency >>>>>>> on GenABEL? This way installing GenABEL trough install.packages >>>>>>> would result in the installation also of GenABEL.data without the >>>>>>> user actually having to do it himself. >>>>>>> >>>>>>> Best. >>>>>>> >>>>>>> Nicola >>>>>>> >>>>>>> >>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences, >>>>>>> Chirurgical and Health Department University of Trieste Medical >>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel. >>>>>>> +390403785539 >>>>>>> >>>>>>> Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen" >>>>>>> ha scritto: >>>>>>> >>>>>>>> Hi Maksim, >>>>>>>> >>>>>>>> First of all, thanks for the good work! >>>>>>>> >>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I created a GenABEL.data package where I moved the following >>>>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and >>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are >>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains R directory >>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These >>>>>>>>> scripts does not go to the final distribution and needed only >>>>>>>>> for possible future usage. Only GenABEL.data/data/* files go to >>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build >>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by >>>>>>>>> running GenABEL/data/clean.R in "build" process. May be it is >>>>>>>>> not a good idea to do it in such a way but, at least, it is >>>>>>>>> convinient and has no any reflection on end users (suggest a >>>>>>>>> better way plz). >>>>>>>>> >>>>>>>>> The way how GenABEL.data works now is not like how we discussed >>>>>>>>> below. It is impossible to generate files during "R CMD >>>>>>>>> INSTALL" and undisarable during "R CMD build". 
The best opition >>>>>>>>> was just to move all the data to GenABEL.data from GenABEL >>>>>>>>> (like CRAN people suggested). In this case, we can install >>>>>>>>> GenABEL.data without having GenABEL installed. After this, we >>>>>>>>> install GenABELL. >>>>>>>> This sounds very strange to me. Does the user first need to >>>>>>>> install the GenABEL.data package and then the 'main' GenABEL >>>>>>>> package? Or do I misunderstand you? What happens if the user >>>>>>>> installs them in a different order? I guess that shouldn't >>>>>>>> matter, right, as the package contains only data? >>>>>>>> >>>>>>>>> When we run library(GenABEL), it automaticly attaches >>>>>>>>> GenBEL.data. Thus, the only change for users is that they need >>>>>>>>> to install two packages now (GenABEL.data and GebABEL). >>>>>>>> And GenABEL.data is only needed if they actually want to use the >>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of >>>>>>>> required packages in the DESCRIPTION file? >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Lennart. >>>>>>>> >>>>>>>>> Now we have sizes of both packages much smaller: 469K for >>>>>>>>> GenABEL and 2.4M for GenABEL.data. >>>>>>>>> >>>>>>>>> It should work now, but if you experience some problems, let me >>>>>>>>> know. >>>>>>>>> >>>>>>>>> best, Maksim >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote: >>>>>>>>>> Hi Maksim, >>>>>>>>>> >>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>>>>>>>>>> I am still in the way of compressing GenABEL data. To >>>>>>>>>>> remind you: the idea consists of compressing the original >>>>>>>>>>> data text files and use them later for generating RData >>>>>>>>>>> files (e.g. srdta). >>>>>>>>>>> >>>>>>>>>>> Yurii proposed to make RData files in examples which use >>>>>>>>>>> them. I see now only one way how this idea can be >>>>>>>>>>> implemented. We replace "data(srdta)" line in every file >>>>>>>>>>> where it is used by a function e.g. 
"generate_srdt()" which >>>>>>>>>>> generate srdta object. The same procedure for other five >>>>>>>>>>> *.RData files from GenABEL/data. If we follow this way, we >>>>>>>>>>> have to change 71 files in man directory and, additionally >>>>>>>>>>> to this, the GenABEL manual. Also, users will not be able >>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)" >>>>>>>>>>> in a command line (how they get used to) and has to know >>>>>>>>>>> that the function generate_srdt() now services for these >>>>>>>>>>> needs. This all sounds nasty :-). >>>>>>>>>> I'm not sure how many user actually type data(srdta), but I >>>>>>>>>> see you point. >>>>>>>>>> >>>>>>>>>>> Making the data during package installation time is also a >>>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible >>>>>>>>>>> because the process of making GenABEL data requires GenABEL >>>>>>>>>>> functions which are not available during installation time >>>>>>>>>>> (they are avaialble only after GenABEL installed). >>>>>>>>>> Good point! >>>>>>>>>> >>>>>>>>>>> I see only one good solution now: move all the GenABEL data >>>>>>>>>>> to a new package e.g. GenABELdata as it was proposed by >>>>>>>>>>> CRAN people from the begining. In this case, it is possible >>>>>>>>>>> to generate RData during installation time using GenABEL >>>>>>>>>>> functions (which are installed by that time). I think this >>>>>>>>>>> solution is paltform independent because R rules permit >>>>>>>>>>> runing *.R scripts to generate data during installation >>>>>>>>>>> time. >>>>>>>>>>> >>>>>>>>>>> What do you think about making a data package for GenABEL? >>>>>>>>>>> Do you think the name GenABELdata is ok? May be we can move >>>>>>>>>>> all the *ABEL data in DatABEL package instead of making >>>>>>>>>>> *ABELdata data packages? >>>>>>>>>> Sounds like this is the best solution. Thanks for digging in >>>>>>>>>> to this. 
As for the package name, either GenABELdata or >>>>>>>>>> GenABEL.data sounds find with me (the latter one being a bit >>>>>>>>>> clearer in my opinion). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> >>>>>>>>>> Lennart >>>>>>>>>> >>>>>>>>>>> best, Maksim >>>>>>>>>>> >>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote: >>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Maksim, >>>>>>>>>>>>> >>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote: >>>>>>>>>>>>>> In this email, I propose a new approach which allows >>>>>>>>>>>>>> to reduce total size of data from 8Mb to 2Mb that >>>>>>>>>>>>>> reduce the entire GenABEL size from 12Mb to 6Mb. >>>>>>>>>>>>> I gues you mean B (bytes) instead of b (bits) here >>>>>>>>>>>>> :-). >>>>>>>>>>>>> >>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following >>>>>>>>>>>>>> sub-directories have too big size: data (2.3Mb), >>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last >>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested >>>>>>>>>>>>>> to create a new package called GenABELdata and move >>>>>>>>>>>>>> all the data there. I run through the data and found >>>>>>>>>>>>>> that: 1) "exdata" directory can be compressed by gzip >>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb. - There is a >>>>>>>>>>>>>> function guzip() from library R.utils which can >>>>>>>>>>>>>> decompress the files. It works on any OS. - Moreover: >>>>>>>>>>>>>> the native R function read.table() can read gzip >>>>>>>>>>>>>> files without decompression. - Even more: it looks >>>>>>>>>>>>>> like that the biggest file "srgenos.dat" is used only >>>>>>>>>>>>>> once a long time ago for generating "srdta.RData" and >>>>>>>>>>>>>> now it is just sitting there and eating space >>>>>>>>>>>>>> needlessly. >>>>>>>>>>>>> Sounds like a waste of space! >>>>>>>>>>>>> >>>>>>>>>>>>>> 2) We can delete some files from the "data" >>>>>>>>>>>>>> directory. 
The deleted files will be generated on the >>>>>>>>>>>>>> user computer based on the files from exdata. It can >>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or >>>>>>>>>>>>>> on the first load through (|run funcion .onAttach() >>>>>>>>>>>>>> in R/zzz.R|). >>>>>>>>>>>>> This sounds like a perfectly acceptable option. >>>>>>>>>>>> I suggest this is done in the "example" which make use of >>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make >>>>>>>>>>>> things as "robust" as possible and interfere as little as >>>>>>>>>>>> possible with the usual workflow (which is very much >>>>>>>>>>>> system-specific, in that we will need to to test on all >>>>>>>>>>>> platforms) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> It will reduce total size of "data" directory from >>>>>>>>>>>>>> 2.3Mb to 800Kb. >>>>>>>>>>>>> Fantastic! If no one has other objections I say: go >>>>>>>>>>>>> ahead. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> >>>>>>>>>>>>> Lennart. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Any objections/suggestions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> best, Maksim >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>> -- >>>>>>>>>>>>> ----------------------------------------------------------------- >>>>> L.C. Karssen >>>>>>>>>>>>> Utrecht The Netherlands >>>>>>>>>>>>> >>>>>>>>>>>>> lennart at karssen.org http://blog.karssen.org >>>>>>>>>>>>> >>>>>>>>>>>>> Stuur mij aub geen Word of Powerpoint bestanden! 
From drosophila.simulans at gmail.com Fri Nov 29 12:15:24 2013
From: drosophila.simulans at gmail.com (Yakov Tsepilov)
Date: Fri, 29 Nov 2013 12:15:24 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1429 - pkg/GenABEL/man
In-Reply-To: <52985DA5.80400@karssen.org>
References: <20131128043855.1EDED1861B7@r-forge.r-project.org> <666AE9E2-530D-4E79-93F6-04A4C1D8D9B6@gmail.com> <52985C61.2070908@mail.ru> <52985DA5.80400@karssen.org>
Message-ID:

Dear Maksim, Lennart,

Thanks a lot for your notes. I will bring the code style in line with the standards a little later.

With kind regards,
Yakov Tsepilov.

2013/11/29 L.C. Karssen

> Dear Maksim, Yakov, others,
>
> About coding style, please see the document on GenABEL coding standards
> (work in progress) we created some time ago:
> http://genabel.r-forge.r-project.org/codingstyle.html
>
> It would be great to have uniform coding in all of the GenABEL suite.
>
> By the way, suggestions for additions/corrections of the document are
> welcome!
>
>
> Lennart.
>
>
> On 11/29/2013 10:20 AM, Maksim Struchalin wrote:
> > Hi Yakov,
> >
> > I made this change because when I ran "R CMD check --as-cran", I got this
> > message:
> > ______________________________________________
> > * checking Rd line widths ... NOTE
> > Rd file
> > ‘/home/maksim/work/GenABEL_project/genabel/pkg/GenABEL.Rcheck/00_pkg_src/GenABEL/man/PGC.Rd’:
> >
> > \examples lines wider than 100 characters:
> > #result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1, pol.d=2,
> > plot=TRUE, start.corr=FALSE,n_quiantile=3)
> > ______________________________________________
> >
> > Here are the guidelines on how to write .Rd files
> > (http://developer.r-project.org/Rds.html).
> > If I were you, I would replace "result=PGC(d..." with "result<-PGC(d..."
> > because using <- instead of = for assignment is the common R convention.
> >
> > The guidelines recommend keeping lines in .Rd files to at most 65
> > characters, but I guess this is an old rule and the limit is now 100
> > characters.
> >
> > best,
> > Maksim
> >
> > On 28/11/2013 16:38, Yury Aulchenko wrote:
> >> Yakov,
> >>
> >> please pay attention to this commit (I know that you were changing the
> >> code of the GC procedures recently)
> >>
> >> YA
> >>
> >> On Nov 28, 2013, at 05:38 AM, noreply at r-forge.r-project.org wrote:
> >>
> >>> Author: maksim
> >>> Date: 2013-11-28 05:38:54 +0100 (Thu, 28 Nov 2013)
> >>> New Revision: 1429
> >>>
> >>> Modified:
> >>> pkg/GenABEL/man/PGC.Rd
> >>> Log:
> >>> Deleteted comments because R CMD check --as-cran said that these
> >>> lines are wider than 100 characters and they will be truncated in the
> >>> PDF manual. This should not have any influence on the code generated
> >>> from it.
> >>>
> >>> Modified: pkg/GenABEL/man/PGC.Rd
> >>> ===================================================================
> >>> --- pkg/GenABEL/man/PGC.Rd 2013-11-28 04:21:29 UTC (rev 1428)
> >>> +++ pkg/GenABEL/man/PGC.Rd 2013-11-28 04:38:54 UTC (rev 1429)
> >>> @@ -63,8 +63,6 @@
> >>> s <- summary(ge03d2)
> >>> freq <- s$Q.2
> >>> result=PGC(data=chi2.1df,method="median",p=freq,df=1, pol.d=2,
> >>> plot=TRUE, lmax=1.1,start.corr=FALSE)
> >>> -#"group_regress" is better to use when we have more than 50K SNPs
> >>> -#result=PGC(data=chi2.1df,method="group_regress",p=freq,df=1,
> >>> pol.d=2, plot=TRUE, start.corr=FALSE,n_quiantile=3)
> >>> }
> >>> \author{
> >>> Yakov Tsepilov

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lennart at karssen.org Fri Nov 29 12:16:49 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 29 Nov 2013 12:16:49 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52986FA2.9080706@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru>
Message-ID: <529877A1.8080207@karssen.org>

Hi Maksim,

Good that you raise this again. I've been thinking about it a bit longer. What is the point of splitting the data off into a separate package if the user still downloads it automatically ('depends')? The idea behind the data package is of course to download only what is necessary (even if a few MB is not very much). So that would point to using 'suggests'. When I worked in Africa (very limited bandwidth) it was actually really good to have this kind of 'suggests', because then you can download only what is really necessary.

Something I haven't tested: what happens if we use 'suggests' and the user wants to run an example (and GA.data is not installed)? Will (s)he get an error/warning message? I guess so if each example in GenABEL has a 'require(GenABEL.data)' line at the start.

If the message to the user is very clear then "suggests" is fine. Otherwise I would go with the old behavior (have everything installed): 'depends'.

Lennart.

On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
> Hi Yurii & Lennart,
>
> Yesterday, you supported the idea of making GenABEL.data as 'suggested':
>
> ________________________________________________________________
> On 28/11/2013 18:24, L.C. Karssen wrote:
>
>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>> I would think that GenABEL(.)data is "suggested" and then any
>> examples using the data from this packages start with something like
>>
>> if (require("GenABEL(.)data") ...
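A guarded example along the lines of the require() idea quoted here might look roughly like the sketch below. It is an illustration under stated assumptions, not code from GenABEL: the helper name load_example_data() and the wording of the message are made up, and it assumes the main package lists the data package under "Suggests:" in its DESCRIPTION file. The point is that a missing data package gives the user one clear message instead of a failing example.

```r
# Sketch only: load_example_data() is a hypothetical helper, and the
# message text is an assumption. With this pattern the DESCRIPTION of the
# main package would contain the line:
#   Suggests: GenABEL.data

load_example_data <- function(name, package = "GenABEL.data") {
  # requireNamespace() returns FALSE instead of stopping when the
  # package is not installed, so the example can be skipped cleanly.
  if (!requireNamespace(package, quietly = TRUE)) {
    message("Example skipped: install the '", package,
            "' package to run it, e.g. install.packages(\"", package, "\")")
    return(invisible(FALSE))
  }
  # Load the named data set into the caller's environment.
  data(list = name, package = package, envir = parent.frame())
  invisible(TRUE)
}

if (load_example_data("srdta")) {
  # ... the body of the example would go here and use srdta ...
}
```

With 'depends' none of this is needed, since the data package is always installed and attached; the guard only matters for the 'suggests' variant.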
> > This sounds like a good solution. > ________________________________________________________________ > > > Today, you propose to make it 'depends' or I misunderstand something here? > > About how other people do it: I looked in GANPAdata and gamlss.data > packages. They 'depends' on GANPA and gamlss (see my message below). > > best, > Maksim > > > On 29/11/2013 16:43, Yurii Aulchenko wrote: >> Lennart, >> >> Good point about "depends"! >> >> Again, my question would be how other people do it? >> >> Y >> >> ---------------------- >> Yurii Aulchenko >> (sent from mobile device) >> >>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote: >>> >>> Hi Maksim, >>> >>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote: >>>> I looked at how other developres deal with issue of dependency >>>> between a >>>> package and its data.package. I checked out two random packages from >>>> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA >>>> and gamlss) dependes on their data packages - that means their >>>> DESCRIPTION files contain a reference to their data packages in the >>>> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not >>>> Depends/Suggests gamlss). >>>> >>>> When I made GenABEL depending on GenABEL.data, I kept in my mind the >>>> same idea as Nicola pronounced below - that, in this case, GenABEL.data >>>> is installed automaticly when users run "install.package(GenABEL)". >>>> This >>>> is convinient for users who install GenABEL from CRAN and this is in >>>> line with GANPA and gamlss but it, probably, does not fully reflect the >>>> GenABEL reality. The dependency between GenABEL and GenABL.data is weak >>>> - GenABEL is gonna be mostly used without GenABEL.data. So, I support >>>> the Yurii's idea about making GenABEL.data as 'suggested' and including >>>> 'requre(...'. >>> I agree with you that the dependence between GA and GA.data is rather >>> weak. On the other hand, why not keep GA.data in Depends? 
That gives the >>> same behaviour as before (install everything by default). Sounds >>> convenient to me. >>> With modern internet bandwidth the few MB of the data package are not a >>> problem. >>> >>>> About dot: Personally, I like GenABEL.data. From this name, It is clear >>>> that this package is some kind of a 'subpackage' of GenABEL package and >>>> it is not a standalone one. >>> Good point! >>> >>> >>> Best regards, >>> >>> Lennart. >>> >>>> best, >>>> Maksim >>>> >>>>> On 28/11/2013 18:24, L.C. Karssen wrote: >>>>> >>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote: >>>>>> I would think that GenABEL(.)data is "suggested" and then any >>>>>> examples using the data from this packages start with something like >>>>>> >>>>>> if (require("GenABEL(.)data") ... >>>>> This sounds like a good solution. >>>>> >>>>>> How do other packages which lean on data-packages solve this? >>>>>> >>>>>> As for the "dot" - I do not have any strong opinion - both options >>>>>> seem ok to me :) >>>>> Great :-). Then I propose (of course) to stick with the dot, also >>>>> because that's already used now. >>>>> >>>>> >>>>> Best, >>>>> >>>>> Lennart. >>>>> >>>>> >>>>>> best, Yurii >>>>>> >>>>>> >>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu >>>>>> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I've been following this conversation with much interest although >>>>>>> I'm sorry I can't contribute much. >>>>>>> >>>>>>> I was just wondering, could GenABEL.data not be just a dependency >>>>>>> on GenABEL? This way installing GenABEL trough install.packages >>>>>>> would result in the installation also of GenABEL.data without the >>>>>>> user actually having to do it himself. >>>>>>> >>>>>>> Best. >>>>>>> >>>>>>> Nicola >>>>>>> >>>>>>> >>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences, >>>>>>> Chirurgical and Health Department University of Trieste Medical >>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel. 
>>>>>>> +390403785539
>>>>>>>
>>>>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>>>>>>
>>>>>>>> Hi Maksim,
>>>>>>>>
>>>>>>>> First of all, thanks for the good work!
>>>>>>>>
>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I created a GenABEL.data package, into which I moved the following
>>>>>>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files have been
>>>>>>>>> deleted from GenABEL. GenABEL.data also contains an R directory
>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>>> scripts do not go into the final distribution and are kept only
>>>>>>>>> for possible future use. Only the GenABEL.data/data/* files go
>>>>>>>>> into GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>>>>> not a good idea to do it this way but, at least, it is convenient
>>>>>>>>> and has no effect on end users (please suggest a better way).
>>>>>>>>>
>>>>>>>>> The way GenABEL.data works now is not what we discussed below. It
>>>>>>>>> is impossible to generate files during "R CMD INSTALL" and
>>>>>>>>> undesirable during "R CMD build". The best option was just to
>>>>>>>>> move all the data from GenABEL to GenABEL.data (as the CRAN
>>>>>>>>> people suggested). In this case, we can install GenABEL.data
>>>>>>>>> without having GenABEL installed. After this, we install GenABEL.
>>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>>> matter, right, as the package contains only data?
>>>>>>>>
>>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>>> GenABEL.data. Thus, the only change for users is that they now
>>>>>>>>> need to install two packages (GenABEL.data and GenABEL).
>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Lennart.
>>>>>>>>
>>>>>>>>> Now both packages are much smaller: 469K for GenABEL and 2.4M
>>>>>>>>> for GenABEL.data.
>>>>>>>>>
>>>>>>>>> It should work now, but if you experience any problems, let me
>>>>>>>>> know.
>>>>>>>>>
>>>>>>>>> best, Maksim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>> Hi Maksim,
>>>>>>>>>>
>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>> I am still working on compressing the GenABEL data. To remind
>>>>>>>>>>> you: the idea consists of compressing the original data text
>>>>>>>>>>> files and using them later for generating RData files (e.g.
>>>>>>>>>>> srdta).
>>>>>>>>>>>
>>>>>>>>>>> Yurii proposed to make the RData files in the examples which
>>>>>>>>>>> use them. I now see only one way this idea can be implemented.
>>>>>>>>>>> We replace the "data(srdta)" line in every file where it is
>>>>>>>>>>> used by a function, e.g. "generate_srdt()", which generates
>>>>>>>>>>> the srdta object. The same procedure applies to the other five
>>>>>>>>>>> *.RData files from GenABEL/data. If we follow this way, we
>>>>>>>>>>> have to change 71 files in the man directory and, in addition,
>>>>>>>>>>> the GenABEL manual. Also, users will not be able to load the
>>>>>>>>>>> srdta set (and others) by typing "data(srdta)" at the command
>>>>>>>>>>> line (as they are used to) and will have to know that the
>>>>>>>>>>> function generate_srdt() now serves this purpose. This all
>>>>>>>>>>> sounds nasty :-).
>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>> see your point.
>>>>>>>>>>
>>>>>>>>>>> Making the data during package installation time is also a
>>>>>>>>>>> bad idea, as Yurii noted below. Actually, this is impossible
>>>>>>>>>>> because the process of making the GenABEL data requires
>>>>>>>>>>> GenABEL functions which are not available during installation
>>>>>>>>>>> time (they are available only after GenABEL is installed).
>>>>>>>>>> Good point!
>>>>>>>>>>
>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>> time.
>>>>>>>>>>>
>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>> Sounds like this is the best solution. Thanks for digging into
>>>>>>>>>> this. As for the package name, either GenABELdata or
>>>>>>>>>> GenABEL.data sounds fine to me (the latter being a bit
>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Lennart
>>>>>>>>>>
>>>>>>>>>>> best, Maksim
>>>>>>>>>>>
>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C.
>>>>>>>>>>>> Karssen wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>> In this email, I propose a new approach which reduces the
>>>>>>>>>>>>>> total size of the data from 8Mb to 2Mb, which reduces the
>>>>>>>>>>>>>> entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>>>>>>>>>
>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb), exdata (5.7Mb)
>>>>>>>>>>>>>> and libs (2.6Mb). After the last GenABEL submission to
>>>>>>>>>>>>>> CRAN, the maintainers suggested creating a new package
>>>>>>>>>>>>>> called GenABELdata and moving all the data there. I went
>>>>>>>>>>>>>> through the data and found that:
>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip and
>>>>>>>>>>>>>> reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>> - There is a function gunzip() in the R.utils library which
>>>>>>>>>>>>>> can decompress the files. It works on any OS.
>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>>>>> gzip files without decompression.
>>>>>>>>>>>>>> - Even more: it looks like the biggest file, "srgenos.dat",
>>>>>>>>>>>>>> was used only once, a long time ago, for generating
>>>>>>>>>>>>>> "srdta.RData" and now it is just sitting there and eating
>>>>>>>>>>>>>> space needlessly.
>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) We can delete some files from the "data" directory. The
>>>>>>>>>>>>>> deleted files will be generated on the user's computer
>>>>>>>>>>>>>> based on the files from exdata. It can be done during
>>>>>>>>>>>>>> INSTALLATION (a line in Makefile?) or on the first load
>>>>>>>>>>>>>> (run function .onAttach() in R/zzz.R).
>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
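[Editorial illustration] The remark in the quoted message above, that read.table() can read gzip-compressed text without a separate decompression step, can be tried with a minimal sketch like the one below. The table contents, column names and file name are made up for illustration; they are not the real layout of srgenos.dat or srphenos.dat.

```r
# Toy stand-in for an exdata text file (columns and values are hypothetical).
pheno <- data.frame(id  = c("p1", "p2", "p3"),
                    age = c(43L, 51L, 38L),
                    stringsAsFactors = FALSE)

# Write the table gzip-compressed through a gzfile() connection.
gz_path <- tempfile(fileext = ".dat.gz")
con <- gzfile(gz_path, open = "wt")
write.table(pheno, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() opens the path via file(), which detects gzip compression
# on its own, so no explicit gunzip step is needed here.
restored <- read.table(gz_path, header = TRUE, stringsAsFactors = FALSE)
print(restored)
```

This only shows the reading side; whether compressed text files are preferable to shipping .RData files is the design question the thread goes on to discuss.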
>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make things as
>>>>>>>>>>>> "robust" as possible and interfere as little as possible with
>>>>>>>>>>>> the usual workflow (which is very much system-specific, in
>>>>>>>>>>>> that we will need to test on all platforms)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> It will reduce the total size of the "data" directory from
>>>>>>>>>>>>>> 2.3Mb to 800Kb.
>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go ahead.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> genabel-devel mailing list
>>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>>>>>> L.C. Karssen
>>>>>>>>>>>>> Utrecht
>>>>>>>>>>>>> The Netherlands
>>>>>>>>>>>>>
>>>>>>>>>>>>> lennart at karssen.org
>>>>>>>>>>>>> http://blog.karssen.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please do not send me Word or PowerPoint files!
>>>>>>>>>>>>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>>>>>>>>>> ------------------------------------------------------------------

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: 

From m.v.struchalin at mail.ru  Fri Nov 29 12:37:25 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 18:37:25 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529877A1.8080207@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org>
Message-ID: <52987C75.1050308@mail.ru>

Changed back to 'depends'. Now users have to install GenABEL.data first, then GenABEL.

Maksim

On 29/11/2013 18:16, L.C. Karssen wrote:
> Hi Maksim,
>
> Good that you raise this again. I've been thinking about it a bit longer.
> What is the point of separating the data into a separate package if the
> user still downloads it automatically ('depends')? The idea behind the
> data package is of course to download only what is necessary (even if a
> few MB is not very much). So that would point to using 'suggests'.
>
> When I worked in Africa (very limited bandwidth) it was actually really
> good to have this kind of 'suggests', because then you can download
> only what is really necessary.
>
> Something I haven't tested: what happens if we use 'suggests' and the
> user wants to run an example (and GA.data is not installed)? Will (s)he
> get an error/warning message? I guess so if each example in GenABEL has
> a 'require(GenABEL.data)' line at the start. If the message to the user
> is very clear then "suggests" is fine. Otherwise I would go with the old
> behavior (have everything installed): 'depends'.
>
>
>
> Lennart.
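[Editorial illustration] The 'suggests' route discussed in the quoted message (the data package listed under Suggests, each example guarded by a require-style check with a clear message) could look roughly like the sketch below. The helper name load_optional_data() is hypothetical and is not part of GenABEL; the DESCRIPTION line in the comment is likewise an assumption about how the field would be written. The sketch uses requireNamespace(), which checks for a package without attaching it, where the thread itself mentions require().

```r
# Assumed DESCRIPTION fragment for the 'suggests' variant (not from the source):
#   Package: GenABEL
#   Suggests: GenABEL.data

# Hypothetical helper: load a dataset from an optional package, or tell the
# user clearly why the example is being skipped.
load_optional_data <- function(name, pkg = "GenABEL.data",
                               envir = parent.frame()) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping this example. ",
            "Install it with install.packages(\"", pkg, "\").")
    return(invisible(FALSE))
  }
  utils::data(list = name, package = pkg, envir = envir)
  invisible(TRUE)
}

# In an example, this would stand in for a bare data(srdta):
if (load_optional_data("srdta")) {
  print(class(srdta))
}
```

With 'depends' none of this is needed, since the data package is always present; the guard is the price of letting users skip the download.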
-------------- next part -------------- An HTML attachment was scrubbed... URL: From lennart at karssen.org Fri Nov 29 12:41:55 2013 From: lennart at karssen.org (L.C.
Karssen)
Date: Fri, 29 Nov 2013 12:41:55 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52987C75.1050308@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org>
 <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org>
 <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org>
 <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru>
 <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid>
 <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org>
 <52987C75.1050308@mail.ru>
Message-ID: <52987D83.3050700@karssen.org>

Hi Maksim,

Sorry if I don't understand you correctly, but that doesn't sound
user-friendly. Why should people need to install GA.data first?
Actually, if GA depends on GA.data, GA.data is installed automatically
when the user runs install.packages("GenABEL"), right?


Lennart.

On 29-11-13 12:37, Maksim Struchalin wrote:
> Changed back to 'depends'. Now users have to install GenABEL.data first,
> then GenABEL.
> Maksim
>
> On 29/11/2013 18:16, L.C. Karssen wrote:
>> Hi Maksim,
>>
>> Good that you raise this again. I've been thinking about it a bit longer.
>> What is the point of separating the data into a separate package if the
>> user still downloads it automatically ('depends')? The idea behind the
>> data package is of course to only download what is necessary (even if a
>> few MB is not very much). So that would point to using 'suggests'.
>>
>> When I worked in Africa (very limited bandwidth) it was actually really
>> good to have this kind of 'suggests', because then you can download
>> only what is really necessary.
>>
>> Something I haven't tested: what happens if we use 'suggests' and the
>> user wants to run an example (and GA.data is not installed)? Will (s)he
>> get an error/warning message? I guess so if each example in GenABEL has
>> a 'require(GenABEL.data)' line at the start. If the message to the user
>> is very clear then "suggests" is fine. Otherwise I would go with the old
>> behavior (have everything installed): 'depends'.
>>
>>
>> Lennart.
>>
>> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>>> Hi Yurii & Lennart,
>>>
>>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>>
>>> ________________________________________________________________
>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>
>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>> examples using the data from this package start with something like
>>>>
>>>> if (require("GenABEL(.)data")) ...
>>> This sounds like a good solution.
>>> ________________________________________________________________
>>>
>>> Today, you propose to make it 'depends', or do I misunderstand
>>> something here?
>>>
>>> About how other people do it: I looked at the GANPAdata and gamlss.data
>>> packages. GANPA and gamlss 'depend' on them (see my message below).
>>>
>>> best,
>>> Maksim
>>>
>>>
>>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>>> Lennart,
>>>>
>>>> Good point about "depends"!
>>>>
>>>> Again, my question would be how other people do it?
>>>>
>>>> Y
>>>>
>>>> ----------------------
>>>> Yurii Aulchenko
>>>> (sent from mobile device)
>>>>
>>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>>>
>>>>> Hi Maksim,
>>>>>
>>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>>> I looked at how other developers deal with the issue of dependency
>>>>>> between a package and its data package. I checked out two random
>>>>>> packages from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both
>>>>>> of them (GANPA and gamlss) depend on their data packages - that
>>>>>> means their DESCRIPTION files contain a reference to their data
>>>>>> packages in the "Depends:" field. Only GANPAdata suggests GANPA
>>>>>> (gamlss.data does not Depends/Suggests gamlss).
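A minimal sketch of the guarded-example idea quoted in this thread, assuming GenABEL.data is listed under Suggests rather than Depends. `run_if_available()` is a hypothetical helper, not part of GenABEL, and `requireNamespace()` is used here in place of the `require()` call mentioned in the thread so that nothing is attached as a side effect. It illustrates that a missing data package can produce a clear message instead of an error:

```r
# Hypothetical helper (not part of GenABEL): run an example only when an
# optional package listed under Suggests is installed.
run_if_available <- function(pkg, example) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    example()
    return(invisible(TRUE))
  }
  message("Package '", pkg, "' is not installed; skipping this example. ",
          "Install it with install.packages(\"", pkg, "\").")
  invisible(FALSE)
}

# Assumes the srdta data set lives in GenABEL.data, as described in the thread.
ran <- run_if_available("GenABEL.data", function() {
  data("srdta", package = "GenABEL.data")
})
```

With 'depends' no such guard is needed, at the cost of always downloading the data package.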
>>>>>>
>>>>>> When I made GenABEL depend on GenABEL.data, I had in mind the same
>>>>>> idea that Nicola voiced below - that, in this case, GenABEL.data
>>>>>> is installed automatically when users run
>>>>>> install.packages("GenABEL"). This is convenient for users who
>>>>>> install GenABEL from CRAN and it is in line with GANPA and gamlss,
>>>>>> but it probably does not fully reflect the GenABEL reality. The
>>>>>> dependency between GenABEL and GenABEL.data is weak - GenABEL is
>>>>>> going to be used mostly without GenABEL.data. So, I support Yurii's
>>>>>> idea of making GenABEL.data 'suggested' and including
>>>>>> 'require(...'.
>>>>> I agree with you that the dependence between GA and GA.data is rather
>>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>>>>> same behaviour as before (install everything by default). Sounds
>>>>> convenient to me.
>>>>> With modern internet bandwidth the few MB of the data package are not a
>>>>> problem.
>>>>>
>>>>>> About the dot: Personally, I like GenABEL.data. From this name, it is
>>>>>> clear that this package is some kind of 'subpackage' of the GenABEL
>>>>>> package and not a standalone one.
>>>>> Good point!
>>>>>
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>> best,
>>>>>> Maksim
>>>>>>
>>>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>>>
>>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>>>> examples using the data from this package start with something like
>>>>>>>>
>>>>>>>> if (require("GenABEL(.)data")) ...
>>>>>>> This sounds like a good solution.
>>>>>>>
>>>>>>>> How do other packages which lean on data-packages solve this?
>>>>>>>>
>>>>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>>>>> seem ok to me :)
>>>>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>>>>> because that's already used now.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Lennart.
>>>>>>>
>>>>>>>
>>>>>>>> best, Yurii
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I've been following this conversation with much interest although
>>>>>>>>> I'm sorry I can't contribute much.
>>>>>>>>>
>>>>>>>>> I was just wondering, could GenABEL.data not be just a dependency
>>>>>>>>> of GenABEL? This way installing GenABEL through install.packages
>>>>>>>>> would also install GenABEL.data without the user actually having
>>>>>>>>> to do it himself.
>>>>>>>>>
>>>>>>>>> Best.
>>>>>>>>>
>>>>>>>>> Nicola
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences,
>>>>>>>>> Chirurgical and Health Department University of Trieste Medical
>>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel.
>>>>>>>>> +390403785539
>>>>>>>>>
>>>>>>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maksim,
>>>>>>>>>>
>>>>>>>>>> First of all, thanks for the good work!
>>>>>>>>>>
>>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I created a GenABEL.data package where I moved the following
>>>>>>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and
>>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are
>>>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains an R directory
>>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>>>>> scripts do not go into the final distribution and are needed
>>>>>>>>>>> only for possible future usage. Only GenABEL.data/data/* files
>>>>>>>>>>> go to GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>>>>> running GenABEL/data/clean.R in "build" process.
>>>>>>>>>>> Maybe it is not a good idea to do it this way but, at least, it
>>>>>>>>>>> is convenient and has no effect on end users (suggest a better
>>>>>>>>>>> way plz).
>>>>>>>>>>>
>>>>>>>>>>> The way GenABEL.data works now is not how we discussed it
>>>>>>>>>>> below. It is impossible to generate files during "R CMD
>>>>>>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>>>>>>> was just to move all the data from GenABEL to GenABEL.data
>>>>>>>>>>> (like the CRAN people suggested). In this case, we can install
>>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>>>>>>> install GenABEL.
>>>>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>>>>> matter, right, as the package contains only data?
>>>>>>>>>>
>>>>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Lennart.
>>>>>>>>>>
>>>>>>>>>>> Now the sizes of both packages are much smaller: 469K for
>>>>>>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>>>>>>
>>>>>>>>>>> It should work now, but if you experience some problems, let me
>>>>>>>>>>> know.
>>>>>>>>>>>
>>>>>>>>>>> best, Maksim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>>>> I am still working on compressing the GenABEL data. To
>>>>>>>>>>>>> remind you: the idea consists of compressing the original
>>>>>>>>>>>>> data text files and using them later for generating RData
>>>>>>>>>>>>> files (e.g. srdta).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yurii proposed to make the RData files in the examples which
>>>>>>>>>>>>> use them. I now see only one way this idea can be
>>>>>>>>>>>>> implemented. We replace the "data(srdta)" line in every file
>>>>>>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>>>>>>> generates the srdta object. The same procedure applies to the
>>>>>>>>>>>>> other five *.RData files from GenABEL/data. If we go this
>>>>>>>>>>>>> way, we have to change 71 files in the man directory and, in
>>>>>>>>>>>>> addition to this, the GenABEL manual. Also, users will not be
>>>>>>>>>>>>> able to load the srdta set (and others) by typing
>>>>>>>>>>>>> "data(srdta)" on the command line (as they are used to) and
>>>>>>>>>>>>> have to know that the function generate_srdt() now serves
>>>>>>>>>>>>> this need. This all sounds nasty :-).
>>>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>>>> see your point.
>>>>>>>>>>>>
>>>>>>>>>>>>> Making the data during package installation time is also a
>>>>>>>>>>>>> bad idea, as Yurii noted below. Actually, this is impossible
>>>>>>>>>>>>> because the process of making the GenABEL data requires
>>>>>>>>>>>>> GenABEL functions which are not available during
>>>>>>>>>>>>> installation time (they are available only after GenABEL is
>>>>>>>>>>>>> installed).
>>>>>>>>>>>> Good point!
>>>>>>>>>>>>
>>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>>>> time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging in
>>>>>>>>>>>> to this. As for the package name, either GenABELdata or
>>>>>>>>>>>> GenABEL.data sounds fine with me (the latter one being a bit
>>>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Lennart
>>>>>>>>>>>>
>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>>>> In this email, I propose a new approach which reduces
>>>>>>>>>>>>>>>> the total size of the data from 8Mb to 2Mb, which reduces
>>>>>>>>>>>>>>>> the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here
>>>>>>>>>>>>>>> :-).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb), exdata (5.7Mb)
>>>>>>>>>>>>>>>> and libs (2.6Mb). After the last GenABEL submission to
>>>>>>>>>>>>>>>> CRAN, the maintainers suggested creating a new package
>>>>>>>>>>>>>>>> called GenABELdata and moving all the data there. I ran
>>>>>>>>>>>>>>>> through the data and found that:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip and
>>>>>>>>>>>>>>>> reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>>>> - There is a function gunzip() from library R.utils which
>>>>>>>>>>>>>>>> can decompress the files. It works on any OS.
>>>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>>>>>>> gzip files without decompression.
>>>>>>>>>>>>>>>> - Even more: it looks like the biggest file "srgenos.dat"
>>>>>>>>>>>>>>>> was used only once, a long time ago, for generating
>>>>>>>>>>>>>>>> "srdta.RData" and now it is just sitting there and eating
>>>>>>>>>>>>>>>> space needlessly.
>>>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) We can delete some files from the "data" directory.
>>>>>>>>>>>>>>>> The deleted files will be generated on the user's computer
>>>>>>>>>>>>>>>> based on the files from exdata. It can be done during
>>>>>>>>>>>>>>>> INSTALLATION (a line in Makefile?) or on the first load
>>>>>>>>>>>>>>>> (run function .onAttach() in R/zzz.R).
>>>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>>>>>>> platforms)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It will reduce the total size of the "data" directory from
>>>>>>>>>>>>>>>> 2.3Mb to 800Kb.
>>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>>>>>>> ahead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any objections/suggestions?
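The point that read.table() can read gzip-compressed text without a separate decompression step can be checked with a small round trip. This is only a sketch with made-up toy data and a temporary file; it does not use the actual srgenos.dat or srphenos.dat files:

```r
# Write a small toy table to a gzip-compressed temporary file.
tmp <- tempfile(fileext = ".dat.gz")
toy <- data.frame(id = c("p1", "p2", "p3"), snp1 = c(0, 1, 2))
con <- gzfile(tmp, "wt")
write.table(toy, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() opens the path through file(), which detects gzip
# compression when reading, so no explicit decompression is needed.
back <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
print(back)
unlink(tmp)
```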
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> best, Maksim

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.v.struchalin at mail.ru Fri Nov 29 12:41:58 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 18:41:58 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52987C75.1050308@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org>
 <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org>
 <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org>
 <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru>
 <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid>
 <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org>
 <52987C75.1050308@mail.ru>
Message-ID: <52987D86.40603@mail.ru>

For a user who runs install.packages("GenABEL"), this version is more
convenient because the user does not need to care about anything (nothing
changes for them). The version with 'suggests' is not much different from
'depends' but is more flexible in case something happens to GenABEL.data.

best,
Maksim

On 29/11/2013 18:37, Maksim Struchalin wrote:
> Changed back to 'depends'. Now users have to install GenABEL.data first,
> then GenABEL.
> Maksim
>
> On 29/11/2013 18:16, L.C. Karssen wrote:
>> Hi Maksim,
>>
>> Good that you raise this again. I've been thinking about it a bit longer.
>> What is the point of separating the data into a separate package if the
>> user still downloads it automatically ('depends')?
>> The idea behind the data package is of course to only download what is
>> necessary (even if a few MB is not very much). So that would point to
>> using 'suggests'.
>>
>> When I worked in Africa (very limited bandwidth) it was actually really
>> good to have this kind of 'suggests', because then you can download
>> only what is really necessary.
>>
>> Something I haven't tested: what happens if we use 'suggests' and the
>> user wants to run an example (and GA.data is not installed)? Will (s)he
>> get an error/warning message? I guess so if each example in GenABEL has
>> a 'require(GenABEL.data)' line at the start. If the message to the user
>> is very clear then "suggests" is fine. Otherwise I would go with the old
>> behavior (have everything installed): 'depends'.
>>
>>
>> Lennart.
>>
>> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>>> Hi Yurii & Lennart,
>>>
>>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>>
>>> ________________________________________________________________
>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>
>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>> examples using the data from this package start with something like
>>>>
>>>> if (require("GenABEL(.)data")) ...
>>> This sounds like a good solution.
>>> ________________________________________________________________
>>>
>>> Today, you propose to make it 'depends', or do I misunderstand
>>> something here?
>>>
>>> About how other people do it: I looked at the GANPAdata and gamlss.data
>>> packages. GANPA and gamlss 'depend' on them (see my message below).
>>>
>>> best,
>>> Maksim
>>>
>>>
>>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>>> Lennart,
>>>>
>>>> Good point about "depends"!
>>>>
>>>> Again, my question would be how other people do it?
>>>>
>>>> Y
>>>>
>>>> ----------------------
>>>> Yurii Aulchenko
>>>> (sent from mobile device)
>>>>
>>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>>>
>>>>> Hi Maksim,
>>>>>
>>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>>> I looked at how other developers deal with the issue of dependency
>>>>>> between a package and its data package. I checked out two random
>>>>>> packages from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both
>>>>>> of them (GANPA and gamlss) depend on their data packages - that
>>>>>> means their DESCRIPTION files contain a reference to their data
>>>>>> packages in the "Depends:" field. Only GANPAdata suggests GANPA
>>>>>> (gamlss.data does not Depends/Suggests gamlss).
>>>>>>
>>>>>> When I made GenABEL depend on GenABEL.data, I had in mind the same
>>>>>> idea that Nicola voiced below - that, in this case, GenABEL.data
>>>>>> is installed automatically when users run
>>>>>> install.packages("GenABEL"). This is convenient for users who
>>>>>> install GenABEL from CRAN and it is in line with GANPA and gamlss,
>>>>>> but it probably does not fully reflect the GenABEL reality. The
>>>>>> dependency between GenABEL and GenABEL.data is weak - GenABEL is
>>>>>> going to be used mostly without GenABEL.data. So, I support Yurii's
>>>>>> idea of making GenABEL.data 'suggested' and including
>>>>>> 'require(...'.
>>>>> I agree with you that the dependence between GA and GA.data is rather
>>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>>>>> same behaviour as before (install everything by default). Sounds
>>>>> convenient to me.
>>>>> With modern internet bandwidth the few MB of the data package are not a
>>>>> problem.
>>>>>
>>>>>> About the dot: Personally, I like GenABEL.data. From this name, it is
>>>>>> clear that this package is some kind of 'subpackage' of the GenABEL
>>>>>> package and not a standalone one.
>>>>> Good point!
>>>>>
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>> best,
>>>>>> Maksim
>>>>>>
>>>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>>>
>>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>>>> examples using the data from this package start with something like
>>>>>>>>
>>>>>>>> if (require("GenABEL(.)data")) ...
>>>>>>> This sounds like a good solution.
>>>>>>>
>>>>>>>> How do other packages which lean on data-packages solve this?
>>>>>>>>
>>>>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>>>>> seem ok to me :)
>>>>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>>>>> because that's already used now.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Lennart.
>>>>>>>
>>>>>>>
>>>>>>>> best, Yurii
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I've been following this conversation with much interest although
>>>>>>>>> I'm sorry I can't contribute much.
>>>>>>>>>
>>>>>>>>> I was just wondering, could GenABEL.data not be just a dependency
>>>>>>>>> of GenABEL? This way installing GenABEL through install.packages
>>>>>>>>> would also install GenABEL.data without the user actually having
>>>>>>>>> to do it himself.
>>>>>>>>>
>>>>>>>>> Best.
>>>>>>>>>
>>>>>>>>> Nicola
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences,
>>>>>>>>> Chirurgical and Health Department University of Trieste Medical
>>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel.
>>>>>>>>> +390403785539
>>>>>>>>>
>>>>>>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maksim,
>>>>>>>>>>
>>>>>>>>>> First of all, thanks for the good work!
>>>>>>>>>>
>>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I created a GenABEL.data package where I moved the following
>>>>>>>>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are
>>>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains an R directory
>>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>>>>> scripts do not go into the final distribution and are needed only
>>>>>>>>>>> for possible future use. Only GenABEL.data/data/* files go into
>>>>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>>>>>>> not a good idea to do it this way but, at least, it is
>>>>>>>>>>> convenient and has no effect on end users (please suggest a
>>>>>>>>>>> better way).
>>>>>>>>>>>
>>>>>>>>>>> The way GenABEL.data works now is not what we discussed
>>>>>>>>>>> below. It is impossible to generate files during "R CMD
>>>>>>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>>>>>>> was just to move all the data to GenABEL.data from GenABEL
>>>>>>>>>>> (like CRAN people suggested). In this case, we can install
>>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>>>>>>> install GenABEL.
>>>>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>>>>> matter, right, as the package contains only data?
>>>>>>>>>>
>>>>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Lennart.
>>>>>>>>>>
>>>>>>>>>>> Now we have sizes of both packages much smaller: 469K for
>>>>>>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>>>>>>
>>>>>>>>>>> It should work now, but if you experience some problems, let me
>>>>>>>>>>> know.
>>>>>>>>>>>
>>>>>>>>>>> best, Maksim
>>>>>>>>>>>
>>>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>>>> I am still working on compressing the GenABEL data. To
>>>>>>>>>>>>> remind you: the idea consists of compressing the original
>>>>>>>>>>>>> data text files and using them later for generating RData
>>>>>>>>>>>>> files (e.g. srdta).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yurii proposed to make the RData files in the examples which
>>>>>>>>>>>>> use them. I now see only one way this idea can be
>>>>>>>>>>>>> implemented. We replace the "data(srdta)" line in every file
>>>>>>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>>>>>>> generates the srdta object. The same procedure applies to the
>>>>>>>>>>>>> other five *.RData files from GenABEL/data. If we go this way,
>>>>>>>>>>>>> we have to change 71 files in the man directory and, in
>>>>>>>>>>>>> addition, the GenABEL manual. Also, users will not be able
>>>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)"
>>>>>>>>>>>>> on the command line (as they are used to) and will have to
>>>>>>>>>>>>> know that the function generate_srdt() now serves this
>>>>>>>>>>>>> purpose. This all sounds nasty :-).
>>>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>>>> see your point.
>>>>>>>>>>>>
>>>>>>>>>>>>> Making the data during package installation time is also a
>>>>>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible
>>>>>>>>>>>>> because the process of making GenABEL data requires GenABEL
>>>>>>>>>>>>> functions which are not available during installation time
>>>>>>>>>>>>> (they are available only after GenABEL is installed).
>>>>>>>>>>>> Good point!
>>>>>>>>>>>>
>>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>>>> time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>>>>>>> all the *ABEL data into a DatABEL package instead of making
>>>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging
>>>>>>>>>>>> into this. As for the package name, either GenABELdata or
>>>>>>>>>>>> GenABEL.data sounds fine with me (the latter one being a bit
>>>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Lennart
>>>>>>>>>>>>
>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>>>> In this email, I propose a new approach which reduces
>>>>>>>>>>>>>>>> the total size of the data from 8Mb to 2Mb, and thereby
>>>>>>>>>>>>>>>> the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here
>>>>>>>>>>>>>>> :-).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb),
>>>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last
>>>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested
>>>>>>>>>>>>>>>> creating a new package called GenABELdata and moving
>>>>>>>>>>>>>>>> all the data there. I went through the data and found
>>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip
>>>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>>>> - There is a function gunzip() in the R.utils package
>>>>>>>>>>>>>>>> which can decompress the files. It works on any OS.
>>>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>>>>>>> gzip files without decompression.
>>>>>>>>>>>>>>>> - Even more: it looks like the biggest file,
>>>>>>>>>>>>>>>> "srgenos.dat", was used only once, a long time ago,
>>>>>>>>>>>>>>>> for generating "srdta.RData" and now it is just
>>>>>>>>>>>>>>>> sitting there and eating space needlessly.
>>>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) We can delete some files from the "data"
>>>>>>>>>>>>>>>> directory. The deleted files will be generated on the
>>>>>>>>>>>>>>>> user's computer based on the files from exdata. It can
>>>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or
>>>>>>>>>>>>>>>> on the first load (by running the function .onAttach()
>>>>>>>>>>>>>>>> in R/zzz.R).
>>>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>>>>>>> platforms)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It will reduce the total size of the "data" directory from
>>>>>>>>>>>>>>>> 2.3Mb to 800Kb.
>>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>>>>>>> ahead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> genabel-devel mailing list
>>>>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> L.C. Karssen
>>>>>>>>>>>>>>> Utrecht
>>>>>>>>>>>>>>> The Netherlands
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> lennart at karssen.org
>>>>>>>>>>>>>>> http://blog.karssen.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please do not send me Word or PowerPoint files! See
>>>>>>>>>>>>>>> http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>>>>
>>>>>>> CONFIDENTIALITY NOTICE Confidential information may be contained
>>>>>>> in this message or in its attachments. If you are not the addressee
>>>>>>> indicated in this message, or responsible for delivering the message
>>>>>>> to that person, or if you have received this message in error, you
>>>>>>> may not transcribe, copy or deliver this message to anyone. In that
>>>>>>> case, you should delete this message and its attachments. Thank you.
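A minimal sketch of the guarded-example idea discussed in this thread (an `if (require(...))` check at the top of each example), assuming GenABEL.data ends up listed under "Suggests:" in GenABEL's DESCRIPTION. The helper name with_optional_data() is hypothetical and not part of GenABEL; it relies only on base R's requireNamespace() and message(), and is meant as an illustration rather than the approach the developers actually adopted.

```r
# Hypothetical helper (not part of GenABEL): evaluate example code only when
# an optional ("Suggests") package is installed; otherwise tell the user
# clearly what is missing and how to get it.
with_optional_data <- function(pkg, code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    force(code)        # 'code' is a lazily evaluated argument; run it now
    invisible(TRUE)
  } else {
    message("Package '", pkg, "' is not installed; skipping this example. ",
            "Install it with install.packages(\"", pkg, "\").")
    invisible(FALSE)
  }
}

# Usage as it might appear in an example section (assumes the srdta data set
# lives in GenABEL.data, as described in the thread):
# with_optional_data("GenABEL.data", {
#   data(srdta, package = "GenABEL.data")
# })
```

With a guard of this kind, a user without the data package gets an explicit message instead of an error, which is the condition Lennart gives later in the thread for 'suggests' being acceptable.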
From yurii.aulchenko at gmail.com Fri Nov 29 12:43:59 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Fri, 29 Nov 2013 12:43:59 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <529877A1.8080207@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org>
Message-ID: <70DA5486-27BF-4E6E-8F9F-DFDBF7D454C4@gmail.com>

On Nov 29, 2013, at 12:16 PM, L.C. Karssen wrote:

> Hi Maksim,
>
> Good that you raise this again. I've been thinking about it a bit longer.
> What is the point of separating the data into a separate package if the
> user still downloads it automatically ('depends')? The idea behind the
> data package is of course to only download what is necessary (even if a
> few MB is not very much). So that would point to using 'suggests'.

I think the suggestion for splitting comes from CRAN - for some reason they do NOT like bigger packages. This also concerns the data-packages, BTW. What exactly the reason behind this is, I do not know.

>
> When I worked in Africa (very limited bandwidth) it was actually really
> good to have this kind of 'suggests', because then you can only
> download what is really necessary.

Good point. However, I think limited bandwidth is of limited concern in our case (genomic data do ask for very decent bandwidth anyway).

>
> Something I haven't tested: what happens if we use 'suggests' and the
> user wants to run an example (and GA.data is not installed)? Will (s)he
> get an error/warning message? I guess so if each example in GenABEL has
> a 'require(GenABEL.data)' line at the start. If the message to the user
> is very clear then "suggests" is fine. Otherwise I would go with the old
> behavior (have everything installed): 'depends'.

The latter is of concern for the --as-cran checks - everything which is fine for them is fine for us :)

Y

>
> Lennart.
>
> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>> Hi Yurii & Lennart,
>>
>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>
>> ________________________________________________________________
>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>
>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>> I would think that GenABEL(.)data is "suggested" and then any
>>> examples using the data from this package start with something like
>>>
>>> if (require("GenABEL(.)data")) ...
>>
>> This sounds like a good solution.
>> ________________________________________________________________
>>
>> Today, you propose to make it 'depends', or do I misunderstand something here?
>>
>> About how other people do it: I looked at the GANPAdata and gamlss.data
>> packages. GANPA and gamlss 'depend' on them (see my message below).
>>
>> best,
>> Maksim
>>
>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>> Lennart,
>>>
>>> Good point about "depends"!
>>>
>>> Again, my question would be how other people do it?
>>>
>>> Y
>>>
>>> ----------------------
>>> Yurii Aulchenko
>>> (sent from mobile device)
>>>
>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>>
>>>> Hi Maksim,
>>>>
>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>> I looked at how other developers deal with the issue of dependency
>>>>> between a package and its data package. I checked out two random
>>>>> packages from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data).
>>>>> Both of them (GANPA and gamlss) depend on their data packages - that
>>>>> means their DESCRIPTION files contain a reference to their data
>>>>> packages in the "Depends:" field. Only GANPAdata suggests GANPA
>>>>> (gamlss.data does not Depends/Suggests gamlss).
>>>>> >>>>> When I made GenABEL depending on GenABEL.data, I kept in my mind the >>>>> same idea as Nicola pronounced below - that, in this case, GenABEL.data >>>>> is installed automaticly when users run "install.package(GenABEL)". >>>>> This >>>>> is convinient for users who install GenABEL from CRAN and this is in >>>>> line with GANPA and gamlss but it, probably, does not fully reflect the >>>>> GenABEL reality. The dependency between GenABEL and GenABL.data is weak >>>>> - GenABEL is gonna be mostly used without GenABEL.data. So, I support >>>>> the Yurii's idea about making GenABEL.data as 'suggested' and including >>>>> 'requre(...'. >>>> I agree with you that the dependence between GA and GA.data is rather >>>> weak. On the other hand, why not keep GA.data in Depends? That gives the >>>> same behaviour as before (install everything by default). Sounds >>>> convenient to me. >>>> With modern internet bandwidth the few MB of the data package are not a >>>> problem. >>>> >>>>> About dot: Personally, I like GenABEL.data. From this name, It is clear >>>>> that this package is some kind of a 'subpackage' of GenABEL package and >>>>> it is not a standalone one. >>>> Good point! >>>> >>>> >>>> Best regards, >>>> >>>> Lennart. >>>> >>>>> best, >>>>> Maksim >>>>> >>>>>> On 28/11/2013 18:24, L.C. Karssen wrote: >>>>>> >>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote: >>>>>>> I would think that GenABEL(.)data is "suggested" and then any >>>>>>> examples using the data from this packages start with something like >>>>>>> >>>>>>> if (require("GenABEL(.)data") ... >>>>>> This sounds like a good solution. >>>>>> >>>>>>> How do other packages which lean on data-packages solve this? >>>>>>> >>>>>>> As for the "dot" - I do not have any strong opinion - both options >>>>>>> seem ok to me :) >>>>>> Great :-). Then I propose (of course) to stick with the dot, also >>>>>> because that's already used now. >>>>>> >>>>>> >>>>>> Best, >>>>>> >>>>>> Lennart. 
>>>>>> >>>>>> >>>>>>> best, Yurii >>>>>>> >>>>>>> >>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I've been following this conversation with much interest although >>>>>>>> I'm sorry I can't contribute much. >>>>>>>> >>>>>>>> I was just wondering, could GenABEL.data not be just a dependency >>>>>>>> on GenABEL? This way installing GenABEL trough install.packages >>>>>>>> would result in the installation also of GenABEL.data without the >>>>>>>> user actually having to do it himself. >>>>>>>> >>>>>>>> Best. >>>>>>>> >>>>>>>> Nicola >>>>>>>> >>>>>>>> >>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences, >>>>>>>> Chirurgical and Health Department University of Trieste Medical >>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel. >>>>>>>> +390403785539 >>>>>>>> >>>>>>>> Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen" >>>>>>>> ha scritto: >>>>>>>> >>>>>>>>> Hi Maksim, >>>>>>>>> >>>>>>>>> First of all, thanks for the good work! >>>>>>>>> >>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I created a GenABEL.data package where I moved the following >>>>>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and >>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are >>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains R directory >>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These >>>>>>>>>> scripts does not go to the final distribution and needed only >>>>>>>>>> for possible future usage. Only GenABEL.data/data/* files go to >>>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build >>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by >>>>>>>>>> running GenABEL/data/clean.R in "build" process. 
May be it is >>>>>>>>>> not a good idea to do it in such a way but, at least, it is >>>>>>>>>> convinient and has no any reflection on end users (suggest a >>>>>>>>>> better way plz). >>>>>>>>>> >>>>>>>>>> The way how GenABEL.data works now is not like how we discussed >>>>>>>>>> below. It is impossible to generate files during "R CMD >>>>>>>>>> INSTALL" and undisarable during "R CMD build". The best opition >>>>>>>>>> was just to move all the data to GenABEL.data from GenABEL >>>>>>>>>> (like CRAN people suggested). In this case, we can install >>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we >>>>>>>>>> install GenABELL. >>>>>>>>> This sounds very strange to me. Does the user first need to >>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL >>>>>>>>> package? Or do I misunderstand you? What happens if the user >>>>>>>>> installs them in a different order? I guess that shouldn't >>>>>>>>> matter, right, as the package contains only data? >>>>>>>>> >>>>>>>>>> When we run library(GenABEL), it automaticly attaches >>>>>>>>>> GenBEL.data. Thus, the only change for users is that they need >>>>>>>>>> to install two packages now (GenABEL.data and GebABEL). >>>>>>>>> And GenABEL.data is only needed if they actually want to use the >>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of >>>>>>>>> required packages in the DESCRIPTION file? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Lennart. >>>>>>>>> >>>>>>>>>> Now we have sizes of both packages much smaller: 469K for >>>>>>>>>> GenABEL and 2.4M for GenABEL.data. >>>>>>>>>> >>>>>>>>>> It should work now, but if you experience some problems, let me >>>>>>>>>> know. >>>>>>>>>> >>>>>>>>>> best, Maksim >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote: >>>>>>>>>>> Hi Maksim, >>>>>>>>>>> >>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>>>>>>>>>>> I am still in the way of compressing GenABEL data. 
To >>>>>>>>>>>> remind you: the idea consists of compressing the original >>>>>>>>>>>> data text files and use them later for generating RData >>>>>>>>>>>> files (e.g. srdta). >>>>>>>>>>>> >>>>>>>>>>>> Yurii proposed to make RData files in examples which use >>>>>>>>>>>> them. I see now only one way how this idea can be >>>>>>>>>>>> implemented. We replace "data(srdta)" line in every file >>>>>>>>>>>> where it is used by a function e.g. "generate_srdt()" which >>>>>>>>>>>> generate srdta object. The same procedure for other five >>>>>>>>>>>> *.RData files from GenABEL/data. If we follow this way, we >>>>>>>>>>>> have to change 71 files in man directory and, additionally >>>>>>>>>>>> to this, the GenABEL manual. Also, users will not be able >>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)" >>>>>>>>>>>> in a command line (how they get used to) and has to know >>>>>>>>>>>> that the function generate_srdt() now services for these >>>>>>>>>>>> needs. This all sounds nasty :-). >>>>>>>>>>> I'm not sure how many user actually type data(srdta), but I >>>>>>>>>>> see you point. >>>>>>>>>>> >>>>>>>>>>>> Making the data during package installation time is also a >>>>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible >>>>>>>>>>>> because the process of making GenABEL data requires GenABEL >>>>>>>>>>>> functions which are not available during installation time >>>>>>>>>>>> (they are avaialble only after GenABEL installed). >>>>>>>>>>> Good point! >>>>>>>>>>> >>>>>>>>>>>> I see only one good solution now: move all the GenABEL data >>>>>>>>>>>> to a new package e.g. GenABELdata as it was proposed by >>>>>>>>>>>> CRAN people from the begining. In this case, it is possible >>>>>>>>>>>> to generate RData during installation time using GenABEL >>>>>>>>>>>> functions (which are installed by that time). 
I think this >>>>>>>>>>>> solution is paltform independent because R rules permit >>>>>>>>>>>> runing *.R scripts to generate data during installation >>>>>>>>>>>> time. >>>>>>>>>>>> >>>>>>>>>>>> What do you think about making a data package for GenABEL? >>>>>>>>>>>> Do you think the name GenABELdata is ok? May be we can move >>>>>>>>>>>> all the *ABEL data in DatABEL package instead of making >>>>>>>>>>>> *ABELdata data packages? >>>>>>>>>>> Sounds like this is the best solution. Thanks for digging in >>>>>>>>>>> to this. As for the package name, either GenABELdata or >>>>>>>>>>> GenABEL.data sounds find with me (the latter one being a bit >>>>>>>>>>> clearer in my opinion). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> >>>>>>>>>>> Lennart >>>>>>>>>>> >>>>>>>>>>>> best, Maksim >>>>>>>>>>>> >>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote: >>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Maksim, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote: >>>>>>>>>>>>>>> In this email, I propose a new approach which allows >>>>>>>>>>>>>>> to reduce total size of data from 8Mb to 2Mb that >>>>>>>>>>>>>>> reduce the entire GenABEL size from 12Mb to 6Mb. >>>>>>>>>>>>>> I gues you mean B (bytes) instead of b (bits) here >>>>>>>>>>>>>> :-). >>>>>>>>>>>>>> >>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following >>>>>>>>>>>>>>> sub-directories have too big size: data (2.3Mb), >>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last >>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested >>>>>>>>>>>>>>> to create a new package called GenABELdata and move >>>>>>>>>>>>>>> all the data there. I run through the data and found >>>>>>>>>>>>>>> that: 1) "exdata" directory can be compressed by gzip >>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb. - There is a >>>>>>>>>>>>>>> function guzip() from library R.utils which can >>>>>>>>>>>>>>> decompress the files. 
It works on any OS. - Moreover: >>>>>>>>>>>>>>> the native R function read.table() can read gzip >>>>>>>>>>>>>>> files without decompression. - Even more: it looks >>>>>>>>>>>>>>> like that the biggest file "srgenos.dat" is used only >>>>>>>>>>>>>>> once a long time ago for generating "srdta.RData" and >>>>>>>>>>>>>>> now it is just sitting there and eating space >>>>>>>>>>>>>>> needlessly. >>>>>>>>>>>>>> Sounds like a waste of space! >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2) We can delete some files from the "data" >>>>>>>>>>>>>>> directory. The deleted files will be generated on the >>>>>>>>>>>>>>> user computer based on the files from exdata. It can >>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or >>>>>>>>>>>>>>> on the first load through (|run funcion .onAttach() >>>>>>>>>>>>>>> in R/zzz.R|). >>>>>>>>>>>>>> This sounds like a perfectly acceptable option. >>>>>>>>>>>>> I suggest this is done in the "example" which make use of >>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make >>>>>>>>>>>>> things as "robust" as possible and interfere as little as >>>>>>>>>>>>> possible with the usual workflow (which is very much >>>>>>>>>>>>> system-specific, in that we will need to to test on all >>>>>>>>>>>>> platforms) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> It will reduce total size of "data" directory from >>>>>>>>>>>>>>> 2.3Mb to 800Kb. >>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go >>>>>>>>>>>>>> ahead. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Lennart. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Any objections/suggestions? 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> best, Maksim >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>>>>>>>>>> >>>>>> -- >>>>>>>>>>>>>> ----------------------------------------------------------------- >>>>>>>>>>>>>> >>>>>> L.C. Karssen >>>>>>>>>>>>>> Utrecht The Netherlands >>>>>>>>>>>>>> >>>>>>>>>>>>>> lennart at karssen.org http://blog.karssen.org >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stuur mij aub geen Word of Powerpoint bestanden! Zie >>>>>>>>>>>>>> http://www.gnu.org/philosophy/no-word-attachments.nl.html >>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>> genabel-devel mailing list >>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org >>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel >>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>> genabel-devel mailing list >>>>>>>>>> genabel-devel at 
lists.r-forge.r-project.org >>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel

From yurii.aulchenko at gmail.com Fri Nov 29 12:49:30 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Fri, 29 Nov 2013 12:49:30 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52987D86.40603@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org> <52987C75.1050308@mail.ru> <52987D86.40603@mail.ru>
Message-ID: <446503B3-43D2-4F80-B167-551250915B30@gmail.com>

On Nov 29, 2013, at 12:41 PM, Maksim Struchalin wrote:

> From the point of view of a user who runs install.packages("GenABEL"), this
> version is more convenient because the user does not need to care about
> anything (nothing has changed for him).

Agreed: with "depends", packages are installed automatically (provided the standard arguments asking to install dependencies are supplied).

> The version with 'suggests' is not much different from 'depends', but it is
> more flexible in case something happens to GenABEL.data.

I think that we will still treat these as supplementary packages that have little value on their own, so I would not expect it to go away. For me, this is a rather technical matter, because CRAN asked us to do it. I do not think it improves anything, so we just want to make sure that nothing breaks and that things do not become more difficult for us afterwards.
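To make the two options concrete, here is a minimal sketch of what a guarded example could look like under 'suggests'. It is an illustration only, not code from the GenABEL sources: it assumes the data package is named GenABEL.data and that it ships the srdta data set mentioned earlier in the thread.

```r
# Sketch only. Assumes the DESCRIPTION file of GenABEL contains either
#   Depends: GenABEL.data    (installed and attached together with GenABEL)
# or
#   Suggests: GenABEL.data   (optional; examples have to check for it)

# Guard for the 'Suggests' case: use the data only if the package is installed.
if (requireNamespace("GenABEL.data", quietly = TRUE)) {
  # Load the example data set from the data package (assumed to contain srdta).
  data(srdta, package = "GenABEL.data")
  # ... example code that uses srdta would go here ...
} else {
  # Tell the user clearly why the example was skipped and how to fix it.
  message("GenABEL.data is not installed; skipping this example. ",
          "Install it with install.packages(\"GenABEL.data\").")
}
```

With 'depends' the guard would be unnecessary, because the data package would always be present once GenABEL is installed.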
Given the limits on data packages as well, in terms of design we probably need to think of a set of functions (some are already there) to quickly generate/simulate data for use in examples (without running into problems with example run-time, which is also a concern for CRAN :) ).

Y

> best,
> Maksim

From lennart at karssen.org Fri Nov 29 12:52:04 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 29 Nov 2013 12:52:04 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52987D86.40603@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org> <52987C75.1050308@mail.ru> <52987D86.40603@mail.ru>
Message-ID: <52987FE4.9060103@karssen.org>

Hi Maksim,

On 29-11-13 12:41, Maksim Struchalin wrote:
> From the point of view of a user who runs install.packages("GenABEL"), this
> version is more convenient because the user does not need to care about
> anything (nothing has changed for him).

Great! Then it's ok for me.

Lennart.

> The version with 'suggests' is not
> much different from 'depends', but it is more flexible in case something
> happens to GenABEL.data.
>
> best,
> Maksim
>
> On 29/11/2013 18:37, Maksim Struchalin wrote:
>> Changed back to 'depends'. Now users have to install GenABEL.data first,
>> then GenABEL.
>> Maksim
>>
>> On 29/11/2013 18:16, L.C. Karssen wrote:
>>> Hi Maksim,
>>>
>>> Good that you raise this again. I've been thinking about it a bit longer.
>>> What is the point of separating the data into a separate package if the
>>> user still downloads it automatically ('depends')? The idea behind the
>>> data package is of course to only download what is necessary (even if a
>>> few MB is not very much). So that would point to using 'suggests'.
>>> >>> When I worked in Africa (very limited bandwidth) it was actually really >>> good to have these kind of 'suggests', because then you can only >>> download what is really necessary. >>> >>> Something I haven't tested: what happens if we use 'suggests' and the >>> user wants to run an example (and GA.data is not installed)? Will (s)he >>> get an error/warning message? I guess so if each example in GenABEL has >>> a 'require(GenABEL.data)' line at the start. If the message to the user >>> is very clear then "suggests" is fine. Otherwise I would go with the old >>> behavior (have everything installed): 'depends'. >>> >>> >>> >>> Lennart. >>> >>> >>> >>> On 11/29/2013 11:42 AM, Maksim Struchalin wrote: >>>> Hi Yurii & Lennart, >>>> >>>> Yesterday, you supported the idea of making GenABEL.data as 'suggested': >>>> >>>> ________________________________________________________________ >>>> On 28/11/2013 18:24, L.C. Karssen wrote: >>>> >>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote: >>>>> I would think that GenABEL(.)data is "suggested" and then any >>>>> examples using the data from this packages start with something like >>>>> >>>>> if (require("GenABEL(.)data") ... >>>> This sounds like a good solution. >>>> ________________________________________________________________ >>>> >>>> >>>> Today, you propose to make it 'depends' or I misunderstand something here? >>>> >>>> About how other people do it: I looked in GANPAdata and gamlss.data >>>> packages. They 'depends' on GANPA and gamlss (see my message below). >>>> >>>> best, >>>> Maksim >>>> >>>> >>>> On 29/11/2013 16:43, Yurii Aulchenko wrote: >>>>> Lennart, >>>>> >>>>> Good point about "depends"! >>>>> >>>>> Again, my question would be how other people do it? >>>>> >>>>> Y >>>>> >>>>> ---------------------- >>>>> Yurii Aulchenko >>>>> (sent from mobile device) >>>>> >>>>>> On Nov 29, 2013, at 10:36, "L.C. 
Karssen" wrote: >>>>>> >>>>>> Hi Maksim, >>>>>> >>>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote: >>>>>>> I looked at how other developres deal with issue of dependency >>>>>>> between a >>>>>>> package and its data.package. I checked out two random packages from >>>>>>> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA >>>>>>> and gamlss) dependes on their data packages - that means their >>>>>>> DESCRIPTION files contain a reference to their data packages in the >>>>>>> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not >>>>>>> Depends/Suggests gamlss). >>>>>>> >>>>>>> When I made GenABEL depending on GenABEL.data, I kept in my mind the >>>>>>> same idea as Nicola pronounced below - that, in this case, GenABEL.data >>>>>>> is installed automaticly when users run "install.package(GenABEL)". >>>>>>> This >>>>>>> is convinient for users who install GenABEL from CRAN and this is in >>>>>>> line with GANPA and gamlss but it, probably, does not fully reflect the >>>>>>> GenABEL reality. The dependency between GenABEL and GenABL.data is weak >>>>>>> - GenABEL is gonna be mostly used without GenABEL.data. So, I support >>>>>>> the Yurii's idea about making GenABEL.data as 'suggested' and including >>>>>>> 'requre(...'. >>>>>> I agree with you that the dependence between GA and GA.data is rather >>>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the >>>>>> same behaviour as before (install everything by default). Sounds >>>>>> convenient to me. >>>>>> With modern internet bandwidth the few MB of the data package are not a >>>>>> problem. >>>>>> >>>>>>> About dot: Personally, I like GenABEL.data. From this name, It is clear >>>>>>> that this package is some kind of a 'subpackage' of GenABEL package and >>>>>>> it is not a standalone one. >>>>>> Good point! >>>>>> >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Lennart. >>>>>> >>>>>>> best, >>>>>>> Maksim >>>>>>> >>>>>>>> On 28/11/2013 18:24, L.C. 
Karssen wrote: >>>>>>>> >>>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote: >>>>>>>>> I would think that GenABEL(.)data is "suggested" and then any >>>>>>>>> examples using the data from this packages start with something like >>>>>>>>> >>>>>>>>> if (require("GenABEL(.)data") ... >>>>>>>> This sounds like a good solution. >>>>>>>> >>>>>>>>> How do other packages which lean on data-packages solve this? >>>>>>>>> >>>>>>>>> As for the "dot" - I do not have any strong opinion - both options >>>>>>>>> seem ok to me :) >>>>>>>> Great :-). Then I propose (of course) to stick with the dot, also >>>>>>>> because that's already used now. >>>>>>>> >>>>>>>> >>>>>>>> Best, >>>>>>>> >>>>>>>> Lennart. >>>>>>>> >>>>>>>> >>>>>>>>> best, Yurii >>>>>>>>> >>>>>>>>> >>>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I've been following this conversation with much interest although >>>>>>>>>> I'm sorry I can't contribute much. >>>>>>>>>> >>>>>>>>>> I was just wondering, could GenABEL.data not be just a dependency >>>>>>>>>> on GenABEL? This way installing GenABEL trough install.packages >>>>>>>>>> would result in the installation also of GenABEL.data without the >>>>>>>>>> user actually having to do it himself. >>>>>>>>>> >>>>>>>>>> Best. >>>>>>>>>> >>>>>>>>>> Nicola >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences, >>>>>>>>>> Chirurgical and Health Department University of Trieste Medical >>>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel. >>>>>>>>>> +390403785539 >>>>>>>>>> >>>>>>>>>> Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen" >>>>>>>>>> ha scritto: >>>>>>>>>> >>>>>>>>>>> Hi Maksim, >>>>>>>>>>> >>>>>>>>>>> First of all, thanks for the good work! 
>>>>>>>>>>> >>>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> I created a GenABEL.data package where I moved the following >>>>>>>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and >>>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are >>>>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains R directory >>>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These >>>>>>>>>>>> scripts does not go to the final distribution and needed only >>>>>>>>>>>> for possible future usage. Only GenABEL.data/data/* files go to >>>>>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build >>>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by >>>>>>>>>>>> running GenABEL/data/clean.R in "build" process. May be it is >>>>>>>>>>>> not a good idea to do it in such a way but, at least, it is >>>>>>>>>>>> convinient and has no any reflection on end users (suggest a >>>>>>>>>>>> better way plz). >>>>>>>>>>>> >>>>>>>>>>>> The way how GenABEL.data works now is not like how we discussed >>>>>>>>>>>> below. It is impossible to generate files during "R CMD >>>>>>>>>>>> INSTALL" and undisarable during "R CMD build". The best opition >>>>>>>>>>>> was just to move all the data to GenABEL.data from GenABEL >>>>>>>>>>>> (like CRAN people suggested). In this case, we can install >>>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we >>>>>>>>>>>> install GenABELL. >>>>>>>>>>> This sounds very strange to me. Does the user first need to >>>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL >>>>>>>>>>> package? Or do I misunderstand you? What happens if the user >>>>>>>>>>> installs them in a different order? I guess that shouldn't >>>>>>>>>>> matter, right, as the package contains only data? >>>>>>>>>>> >>>>>>>>>>>> When we run library(GenABEL), it automaticly attaches >>>>>>>>>>>> GenBEL.data. 
>>>>>>>>>>>> Thus, the only change for users is that they need
>>>>>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Lennart.
>>>>>>>>>>>
>>>>>>>>>>>> Now the sizes of both packages are much smaller: 469K for
>>>>>>>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>>>>>>>
>>>>>>>>>>>> It should work now, but if you experience some problems, let me
>>>>>>>>>>>> know.
>>>>>>>>>>>>
>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>>>>> I am still working on compressing the GenABEL data. To
>>>>>>>>>>>>>> remind you: the idea consists of compressing the original
>>>>>>>>>>>>>> data text files and using them later for generating RData
>>>>>>>>>>>>>> files (e.g. srdta).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yurii proposed to make the RData files in the examples which use
>>>>>>>>>>>>>> them. I see now only one way this idea can be
>>>>>>>>>>>>>> implemented. We replace the "data(srdta)" line in every file
>>>>>>>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>>>>>>>> generates the srdta object. The same procedure applies to the other five
>>>>>>>>>>>>>> *.RData files from GenABEL/data. If we go this way, we
>>>>>>>>>>>>>> have to change 71 files in the man directory and, in addition
>>>>>>>>>>>>>> to this, the GenABEL manual. Also, users will not be able
>>>>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)"
>>>>>>>>>>>>>> on the command line (as they are used to) and will have to know
>>>>>>>>>>>>>> that the function generate_srdt() now serves this
>>>>>>>>>>>>>> purpose. This all sounds nasty :-).
>>>>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>>>>> see your point.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Making the data during package installation time is also a
>>>>>>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible
>>>>>>>>>>>>>> because the process of making GenABEL data requires GenABEL
>>>>>>>>>>>>>> functions which are not available during installation time
>>>>>>>>>>>>>> (they are available only after GenABEL is installed).
>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging
>>>>>>>>>>>>> into this. As for the package name, either GenABELdata or
>>>>>>>>>>>>> GenABEL.data sounds fine with me (the latter one being a bit
>>>>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lennart
>>>>>>>>>>>>>
>>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>>>>> In this email, I propose a new approach which allows us
>>>>>>>>>>>>>>>>> to reduce the total size of data from 8Mb to 2Mb, which
>>>>>>>>>>>>>>>>> reduces the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here
>>>>>>>>>>>>>>>> :-).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb),
>>>>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last
>>>>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested
>>>>>>>>>>>>>>>>> creating a new package called GenABELdata and moving
>>>>>>>>>>>>>>>>> all the data there. I went through the data and found
>>>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip
>>>>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>>>>> - There is a function gunzip() in the R.utils package which can
>>>>>>>>>>>>>>>>> decompress the files. It works on any OS.
>>>>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read gzip
>>>>>>>>>>>>>>>>> files without decompression.
>>>>>>>>>>>>>>>>> - Even more: it looks like the biggest file "srgenos.dat" was used only
>>>>>>>>>>>>>>>>> once, a long time ago, for generating "srdta.RData" and
>>>>>>>>>>>>>>>>> now it is just sitting there and eating space
>>>>>>>>>>>>>>>>> needlessly.
>>>>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) We can delete some files from the "data"
>>>>>>>>>>>>>>>>> directory. The deleted files will be generated on the
>>>>>>>>>>>>>>>>> user's computer based on the files from exdata. It can
>>>>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or
>>>>>>>>>>>>>>>>> on the first load (run function .onAttach()
>>>>>>>>>>>>>>>>> in R/zzz.R).
>>>>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>>>>>>>> platforms)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It will reduce total size of "data" directory from
>>>>>>>>>>>>>>>>> 2.3Mb to 800Kb.
>>>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>>>>>>>> ahead.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> best, Maksim

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.v.struchalin at mail.ru Fri Nov 29 12:57:11 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 18:57:11 +0700
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52987D83.3050700@karssen.org>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org> <52987C75.1050308@mail.ru> <52987D83.3050700@karssen.org>
Message-ID: <52988117.3050507@mail.ru>

Now GenABEL.data should be installed automatically if users install GenABEL from CRAN (I hope).
If you install it through R CMD INSTALL, then you have to install GenABEL.data first and then GenABEL.

If you change it from 'Depends' to 'Suggests', then GenABEL can be installed successfully without GenABEL.data. If a user types data(srdta), then it gives an error message saying that srdta does not exist.
If a user runs one of the examples, (s)he gets an error message saying that GenABEL.data is not installed. I think in this case the user should understand that (s)he should run install.packages("GenABEL.data"), but I guess there will be users who will get depressed seeing this message.

So: 'Depends' is user-friendly if everything works ok.
best,
Maksim

On 29/11/2013 18:41, L.C. Karssen wrote:
> Hi Maksim,
>
> Sorry if I don't understand you correctly, but that doesn't sound
> user-friendly. Why should people need to install GA.data first?
> Actually, if GA depends on GA.data, GA.data is installed automatically
> when the user runs install.packages("GenABEL"), right?
>
>
> Lennart.
>
> On 29-11-13 12:37, Maksim Struchalin wrote:
>> Changed back to 'Depends'. Now users have to install GenABEL.data first,
>> then GenABEL.
>> Maksim
>>
>> On 29/11/2013 18:16, L.C. Karssen wrote:
>>> Hi Maksim,
>>>
>>> Good that you raise this again. I've been thinking about it a bit longer.
>>> What is the point of separating the data into a separate package if the
>>> user still downloads it automatically ('Depends')? The idea behind the
>>> data package is of course to only download what is necessary (even if a
>>> few MB is not very much). So that would point to using 'Suggests'.
>>>
>>> When I worked in Africa (very limited bandwidth) it was actually really
>>> good to have this kind of 'Suggests', because then you can download
>>> only what is really necessary.
>>>
>>> Something I haven't tested: what happens if we use 'Suggests' and the
>>> user wants to run an example (and GA.data is not installed)? Will (s)he
>>> get an error/warning message? I guess so if each example in GenABEL has
>>> a 'require(GenABEL.data)' line at the start. If the message to the user
>>> is very clear then 'Suggests' is fine. Otherwise I would go with the old
>>> behavior (have everything installed): 'Depends'.
>>>
>>>
>>>
>>> Lennart.
>>>
>>>
>>>
>>> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>>>> Hi Yurii & Lennart,
>>>>
>>>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>>>
>>>> ________________________________________________________________
>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>
>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>> examples using the data from this package start with something like
>>>>>
>>>>> if (require("GenABEL(.)data")) ...
>>>> This sounds like a good solution.
>>>> ________________________________________________________________
>>>>
>>>>
>>>> Today, you propose to make it 'Depends', or do I misunderstand something here?
>>>>
>>>> About how other people do it: I looked at the GANPAdata and gamlss.data
>>>> packages. GANPA and gamlss 'Depend' on them (see my message below).
>>>>
>>>> best,
>>>> Maksim
>>>>
>>>>
>>>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>>>> Lennart,
>>>>>
>>>>> Good point about "depends"!
>>>>>
>>>>> Again, my question would be how other people do it?
>>>>>
>>>>> Y
>>>>>
>>>>> ----------------------
>>>>> Yurii Aulchenko
>>>>> (sent from mobile device)
>>>>>
>>>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>>>>
>>>>>> Hi Maksim,
>>>>>>
>>>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>>>> I looked at how other developers deal with the issue of dependency
>>>>>>> between a
>>>>>>> package and its data package. I checked out two random packages from
>>>>>>> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA
>>>>>>> and gamlss) depend on their data packages - that means their
>>>>>>> DESCRIPTION files contain a reference to their data packages in the
>>>>>>> "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does not
>>>>>>> Depend on/Suggest gamlss).
>>>>>>>
>>>>>>> When I made GenABEL depend on GenABEL.data, I had in mind the
>>>>>>> same idea as Nicola expressed below - that, in this case, GenABEL.data
>>>>>>> is installed automatically when users run install.packages("GenABEL").
>>>>>>> This >>>>>>> is convinient for users who install GenABEL from CRAN and this is in >>>>>>> line with GANPA and gamlss but it, probably, does not fully reflect the >>>>>>> GenABEL reality. The dependency between GenABEL and GenABL.data is weak >>>>>>> - GenABEL is gonna be mostly used without GenABEL.data. So, I support >>>>>>> the Yurii's idea about making GenABEL.data as 'suggested' and including >>>>>>> 'requre(...'. >>>>>> I agree with you that the dependence between GA and GA.data is rather >>>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the >>>>>> same behaviour as before (install everything by default). Sounds >>>>>> convenient to me. >>>>>> With modern internet bandwidth the few MB of the data package are not a >>>>>> problem. >>>>>> >>>>>>> About dot: Personally, I like GenABEL.data. From this name, It is clear >>>>>>> that this package is some kind of a 'subpackage' of GenABEL package and >>>>>>> it is not a standalone one. >>>>>> Good point! >>>>>> >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Lennart. >>>>>> >>>>>>> best, >>>>>>> Maksim >>>>>>> >>>>>>>> On 28/11/2013 18:24, L.C. Karssen wrote: >>>>>>>> >>>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote: >>>>>>>>> I would think that GenABEL(.)data is "suggested" and then any >>>>>>>>> examples using the data from this packages start with something like >>>>>>>>> >>>>>>>>> if (require("GenABEL(.)data") ... >>>>>>>> This sounds like a good solution. >>>>>>>> >>>>>>>>> How do other packages which lean on data-packages solve this? >>>>>>>>> >>>>>>>>> As for the "dot" - I do not have any strong opinion - both options >>>>>>>>> seem ok to me :) >>>>>>>> Great :-). Then I propose (of course) to stick with the dot, also >>>>>>>> because that's already used now. >>>>>>>> >>>>>>>> >>>>>>>> Best, >>>>>>>> >>>>>>>> Lennart. 
>>>>>>>> >>>>>>>> >>>>>>>>> best, Yurii >>>>>>>>> >>>>>>>>> >>>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I've been following this conversation with much interest although >>>>>>>>>> I'm sorry I can't contribute much. >>>>>>>>>> >>>>>>>>>> I was just wondering, could GenABEL.data not be just a dependency >>>>>>>>>> on GenABEL? This way installing GenABEL trough install.packages >>>>>>>>>> would result in the installation also of GenABEL.data without the >>>>>>>>>> user actually having to do it himself. >>>>>>>>>> >>>>>>>>>> Best. >>>>>>>>>> >>>>>>>>>> Nicola >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences, >>>>>>>>>> Chirurgical and Health Department University of Trieste Medical >>>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel. >>>>>>>>>> +390403785539 >>>>>>>>>> >>>>>>>>>> Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen" >>>>>>>>>> ha scritto: >>>>>>>>>> >>>>>>>>>>> Hi Maksim, >>>>>>>>>>> >>>>>>>>>>> First of all, thanks for the good work! >>>>>>>>>>> >>>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> I created a GenABEL.data package where I moved the following >>>>>>>>>>>> data: GenABEL/data/* , inst/exdata/srgenos.dat and >>>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are >>>>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains R directory >>>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These >>>>>>>>>>>> scripts does not go to the final distribution and needed only >>>>>>>>>>>> for possible future usage. Only GenABEL.data/data/* files go to >>>>>>>>>>>> GenABEL.data_1.0.tar.gz after running "R CMD build >>>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by >>>>>>>>>>>> running GenABEL/data/clean.R in "build" process. 
May be it is >>>>>>>>>>>> not a good idea to do it in such a way but, at least, it is >>>>>>>>>>>> convinient and has no any reflection on end users (suggest a >>>>>>>>>>>> better way plz). >>>>>>>>>>>> >>>>>>>>>>>> The way how GenABEL.data works now is not like how we discussed >>>>>>>>>>>> below. It is impossible to generate files during "R CMD >>>>>>>>>>>> INSTALL" and undisarable during "R CMD build". The best opition >>>>>>>>>>>> was just to move all the data to GenABEL.data from GenABEL >>>>>>>>>>>> (like CRAN people suggested). In this case, we can install >>>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we >>>>>>>>>>>> install GenABELL. >>>>>>>>>>> This sounds very strange to me. Does the user first need to >>>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL >>>>>>>>>>> package? Or do I misunderstand you? What happens if the user >>>>>>>>>>> installs them in a different order? I guess that shouldn't >>>>>>>>>>> matter, right, as the package contains only data? >>>>>>>>>>> >>>>>>>>>>>> When we run library(GenABEL), it automaticly attaches >>>>>>>>>>>> GenBEL.data. Thus, the only change for users is that they need >>>>>>>>>>>> to install two packages now (GenABEL.data and GebABEL). >>>>>>>>>>> And GenABEL.data is only needed if they actually want to use the >>>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of >>>>>>>>>>> required packages in the DESCRIPTION file? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Lennart. >>>>>>>>>>> >>>>>>>>>>>> Now we have sizes of both packages much smaller: 469K for >>>>>>>>>>>> GenABEL and 2.4M for GenABEL.data. >>>>>>>>>>>> >>>>>>>>>>>> It should work now, but if you experience some problems, let me >>>>>>>>>>>> know. >>>>>>>>>>>> >>>>>>>>>>>> best, Maksim >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On 26/11/2013 20:48, L.C. 
Karssen wrote: >>>>>>>>>>>>> Hi Maksim, >>>>>>>>>>>>> >>>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote: >>>>>>>>>>>>>> I am still in the way of compressing GenABEL data. To >>>>>>>>>>>>>> remind you: the idea consists of compressing the original >>>>>>>>>>>>>> data text files and use them later for generating RData >>>>>>>>>>>>>> files (e.g. srdta). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yurii proposed to make RData files in examples which use >>>>>>>>>>>>>> them. I see now only one way how this idea can be >>>>>>>>>>>>>> implemented. We replace "data(srdta)" line in every file >>>>>>>>>>>>>> where it is used by a function e.g. "generate_srdt()" which >>>>>>>>>>>>>> generate srdta object. The same procedure for other five >>>>>>>>>>>>>> *.RData files from GenABEL/data. If we follow this way, we >>>>>>>>>>>>>> have to change 71 files in man directory and, additionally >>>>>>>>>>>>>> to this, the GenABEL manual. Also, users will not be able >>>>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)" >>>>>>>>>>>>>> in a command line (how they get used to) and has to know >>>>>>>>>>>>>> that the function generate_srdt() now services for these >>>>>>>>>>>>>> needs. This all sounds nasty :-). >>>>>>>>>>>>> I'm not sure how many user actually type data(srdta), but I >>>>>>>>>>>>> see you point. >>>>>>>>>>>>> >>>>>>>>>>>>>> Making the data during package installation time is also a >>>>>>>>>>>>>> bad idea as Yurii noted below. Actually, this is impossible >>>>>>>>>>>>>> because the process of making GenABEL data requires GenABEL >>>>>>>>>>>>>> functions which are not available during installation time >>>>>>>>>>>>>> (they are avaialble only after GenABEL installed). >>>>>>>>>>>>> Good point! >>>>>>>>>>>>> >>>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data >>>>>>>>>>>>>> to a new package e.g. GenABELdata as it was proposed by >>>>>>>>>>>>>> CRAN people from the begining. 
In this case, it is possible >>>>>>>>>>>>>> to generate RData during installation time using GenABEL >>>>>>>>>>>>>> functions (which are installed by that time). I think this >>>>>>>>>>>>>> solution is paltform independent because R rules permit >>>>>>>>>>>>>> runing *.R scripts to generate data during installation >>>>>>>>>>>>>> time. >>>>>>>>>>>>>> >>>>>>>>>>>>>> What do you think about making a data package for GenABEL? >>>>>>>>>>>>>> Do you think the name GenABELdata is ok? May be we can move >>>>>>>>>>>>>> all the *ABEL data in DatABEL package instead of making >>>>>>>>>>>>>> *ABELdata data packages? >>>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging in >>>>>>>>>>>>> to this. As for the package name, either GenABELdata or >>>>>>>>>>>>> GenABEL.data sounds find with me (the latter one being a bit >>>>>>>>>>>>> clearer in my opinion). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> >>>>>>>>>>>>> Lennart >>>>>>>>>>>>> >>>>>>>>>>>>>> best, Maksim >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote: >>>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Maksim, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote: >>>>>>>>>>>>>>>>> In this email, I propose a new approach which allows >>>>>>>>>>>>>>>>> to reduce total size of data from 8Mb to 2Mb that >>>>>>>>>>>>>>>>> reduce the entire GenABEL size from 12Mb to 6Mb. >>>>>>>>>>>>>>>> I gues you mean B (bytes) instead of b (bits) here >>>>>>>>>>>>>>>> :-). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following >>>>>>>>>>>>>>>>> sub-directories have too big size: data (2.3Mb), >>>>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last >>>>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested >>>>>>>>>>>>>>>>> to create a new package called GenABELdata and move >>>>>>>>>>>>>>>>> all the data there. 
I run through the data and found >>>>>>>>>>>>>>>>> that: 1) "exdata" directory can be compressed by gzip >>>>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb. - There is a >>>>>>>>>>>>>>>>> function guzip() from library R.utils which can >>>>>>>>>>>>>>>>> decompress the files. It works on any OS. - Moreover: >>>>>>>>>>>>>>>>> the native R function read.table() can read gzip >>>>>>>>>>>>>>>>> files without decompression. - Even more: it looks >>>>>>>>>>>>>>>>> like that the biggest file "srgenos.dat" is used only >>>>>>>>>>>>>>>>> once a long time ago for generating "srdta.RData" and >>>>>>>>>>>>>>>>> now it is just sitting there and eating space >>>>>>>>>>>>>>>>> needlessly. >>>>>>>>>>>>>>>> Sounds like a waste of space! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2) We can delete some files from the "data" >>>>>>>>>>>>>>>>> directory. The deleted files will be generated on the >>>>>>>>>>>>>>>>> user computer based on the files from exdata. It can >>>>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or >>>>>>>>>>>>>>>>> on the first load through (|run funcion .onAttach() >>>>>>>>>>>>>>>>> in R/zzz.R|). >>>>>>>>>>>>>>>> This sounds like a perfectly acceptable option. >>>>>>>>>>>>>>> I suggest this is done in the "example" which make use of >>>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make >>>>>>>>>>>>>>> things as "robust" as possible and interfere as little as >>>>>>>>>>>>>>> possible with the usual workflow (which is very much >>>>>>>>>>>>>>> system-specific, in that we will need to to test on all >>>>>>>>>>>>>>> platforms) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It will reduce total size of "data" directory from >>>>>>>>>>>>>>>>> 2.3Mb to 800Kb. >>>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go >>>>>>>>>>>>>>>> ahead. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Lennart. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any objections/suggestions? 
From yurii.aulchenko at gmail.com Fri Nov 29 13:01:03 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Fri, 29 Nov 2013 13:01:03 +0100
Subject: [GenABEL-dev] new approach for data storage in GenABEL package
In-Reply-To: <52988117.3050507@mail.ru>
References: <528542EC.1030006@mail.ru> <528649FE.7090501@karssen.org> <529481D6.5070200@mail.ru> <5294A694.4090306@karssen.org> <529640CC.7000006@mail.ru> <52972214.8030602@karssen.org> <529727F7.1000204@karssen.org> <529845B6.7060009@mail.ru> <5298600A.40607@karssen.org> <-3891723065030522185@unknownmsgid> <52986FA2.9080706@mail.ru> <529877A1.8080207@karssen.org> <52987C75.1050308@mail.ru> <52987D83.3050700@karssen.org> <52988117.3050507@mail.ru>
Message-ID:

On Nov 29, 2013, at 12:57 PM, Maksim Struchalin wrote:

> Now, it should be installed automatically if users install it from CRAN (I hope).
> If you install it through R CMD INSTALL, then you have to install GenABEL.data first and then GenABEL.
>
> If you change it from 'Depends' to 'Suggests', then GenABEL can be installed successfully without GenA.data. If the user types data(srdta), it gives an error message saying that srdta does not exist.

Then it will not pass CRAN checks.

Y

> If a user runs one of the examples, they get an error message saying that GenABEL.data is not installed. I think in this case the user should understand that (s)he should run install.packages("GenABEL.data"), but I guess there will be users who will get depressed seeing this message.
>
> So: 'depends' is user friendly if everything works ok.
>
> best,
> Maksim
>
> On 29/11/2013 18:41, L.C. Karssen wrote:
>> Hi Maksim,
>>
>> Sorry if I don't understand you correctly, but that doesn't sound
>> user-friendly. Why should people need to install GA.data first?
>> Actually, if GA depends on GA.data, GA.data is installed automatically
>> when the user runs install.packages("GenABEL"), right?
>>
>> Lennart.
>>
>> On 29-11-13 12:37, Maksim Struchalin wrote:
>>> Changed back to 'depends'. Now users have to install GenABEL.data first,
>>> then GenABEL.
>>> Maksim
>>>
>>> On 29/11/2013 18:16, L.C. Karssen wrote:
>>>> Hi Maksim,
>>>>
>>>> Good that you raise this again. I've been thinking about it a bit longer.
>>>> What is the point of separating the data into a separate package if the
>>>> user still downloads it automatically ('depends')? The idea behind the
>>>> data package is of course to only download what is necessary (even if a
>>>> few MB is not very much). So that would point to using 'suggests'.
>>>>
>>>> When I worked in Africa (very limited bandwidth) it was actually really
>>>> good to have this kind of 'suggests', because then you can download
>>>> only what is really necessary.
>>>>
>>>> Something I haven't tested: what happens if we use 'suggests' and the
>>>> user wants to run an example (and GA.data is not installed)? Will (s)he
>>>> get an error/warning message? I guess so if each example in GenABEL has
>>>> a 'require(GenABEL.data)' line at the start. If the message to the user
>>>> is very clear then "suggests" is fine. Otherwise I would go with the old
>>>> behavior (have everything installed): 'depends'.
>>>>
>>>> Lennart.
>>>>
>>>> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>>>>> Hi Yurii & Lennart,
>>>>>
>>>>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>>>>
>>>>> ________________________________________________________________
>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>
>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>> examples using the data from this package start with something like
>>>>>>
>>>>>> if (require("GenABEL(.)data")) ...
>>>>> This sounds like a good solution.
>>>>> ________________________________________________________________
>>>>>
>>>>> Today, you propose to make it 'depends', or do I misunderstand something here?
>>>>>
>>>>> About how other people do it: I looked at the GANPAdata and gamlss.data
>>>>> packages. GANPA and gamlss 'depend' on them (see my message below).
>>>>>
>>>>> best,
>>>>> Maksim
>>>>>
>>>>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>>>>> Lennart,
>>>>>>
>>>>>> Good point about "depends"!
>>>>>>
>>>>>> Again, my question would be how other people do it?
>>>>>>
>>>>>> Y
>>>>>>
>>>>>> ----------------------
>>>>>> Yurii Aulchenko
>>>>>> (sent from mobile device)
>>>>>>
>>>>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" wrote:
>>>>>>>
>>>>>>> Hi Maksim,
>>>>>>>
>>>>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>>>>> I looked at how other developers deal with the issue of dependency
>>>>>>>> between a package and its data package. I checked out two random
>>>>>>>> packages from CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both
>>>>>>>> of them (GANPA and gamlss) depend on their data packages, which means
>>>>>>>> their DESCRIPTION files contain a reference to their data packages in
>>>>>>>> the "Depends:" field. Only GANPAdata suggests GANPA (gamlss.data does
>>>>>>>> not Depends/Suggests gamlss).
>>>>>>>>
>>>>>>>> When I made GenABEL depend on GenABEL.data, I had in mind the
>>>>>>>> same idea that Nicola voiced below: that, in this case, GenABEL.data
>>>>>>>> is installed automatically when users run "install.packages(GenABEL)".
>>>>>>>> This is convenient for users who install GenABEL from CRAN and is in
>>>>>>>> line with GANPA and gamlss, but it probably does not fully reflect the
>>>>>>>> GenABEL reality. The dependency between GenABEL and GenABEL.data is
>>>>>>>> weak: GenABEL is going to be used mostly without GenABEL.data. So I
>>>>>>>> support Yurii's idea of making GenABEL.data 'suggested' and including
>>>>>>>> 'require(...'.
>>>>>>> I agree with you that the dependence between GA and GA.data is rather
>>>>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>>>>>>> same behaviour as before (install everything by default). Sounds
>>>>>>> convenient to me.
>>>>>>> With modern internet bandwidth the few MB of the data package are not a
>>>>>>> problem.
>>>>>>>
>>>>>>>> About the dot: personally, I like GenABEL.data. From this name it is
>>>>>>>> clear that this package is some kind of 'subpackage' of the GenABEL
>>>>>>>> package and not a standalone one.
>>>>>>> Good point!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Lennart.
>>>>>>>
>>>>>>>> best,
>>>>>>>> Maksim
>>>>>>>>
>>>>>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>>>>>
>>>>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>>>>>> examples using the data from this package start with something like
>>>>>>>>>>
>>>>>>>>>> if (require("GenABEL(.)data")) ...
>>>>>>>>> This sounds like a good solution.
>>>>>>>>>
>>>>>>>>>> How do other packages which lean on data packages solve this?
>>>>>>>>>>
>>>>>>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>>>>>>> seem ok to me :)
>>>>>>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>>>>>>> because that's already used now.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Lennart.
>>>>>>>>>
>>>>>>>>>> best, Yurii
>>>>>>>>>>
>>>>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I've been following this conversation with much interest, although
>>>>>>>>>>> I'm sorry I can't contribute much.
>>>>>>>>>>>
>>>>>>>>>>> I was just wondering, could GenABEL.data not be just a dependency
>>>>>>>>>>> of GenABEL? This way installing GenABEL through install.packages
>>>>>>>>>>> would also install GenABEL.data without the user actually having
>>>>>>>>>>> to do it himself.
>>>>>>>>>>>
>>>>>>>>>>> Best.
>>>>>>>>>>>
>>>>>>>>>>> Nicola
>>>>>>>>>>>
>>>>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences,
>>>>>>>>>>> Chirurgical and Health Department University of Trieste Medical
>>>>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel.
>>>>>>>>>>> +390403785539
>>>>>>>>>>>
>>>>>>>>>>> On 28 Nov 2013, at 11:59, "L.C. Karssen" wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>
>>>>>>>>>>>> First of all, thanks for the good work!
>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I created a GenABEL.data package where I moved the following
>>>>>>>>>>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files are
>>>>>>>>>>>>> deleted from GenABEL. Also, GenABEL.data contains an R directory
>>>>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>>>>>>> scripts do not go into the final distribution and are needed
>>>>>>>>>>>>> only for possible future usage. Only GenABEL.data/data/* files
>>>>>>>>>>>>> go into GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>>>>>>>>> not a good idea to do it in such a way but, at least, it is
>>>>>>>>>>>>> convenient and has no effect on end users (suggest a
>>>>>>>>>>>>> better way plz).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The way GenABEL.data works now is not how we discussed
>>>>>>>>>>>>> below. It is impossible to generate files during "R CMD
>>>>>>>>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>>>>>>>>> was just to move all the data to GenABEL.data from GenABEL
>>>>>>>>>>>>> (like the CRAN people suggested). In this case, we can install
>>>>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>>>>>>>>> install GenABEL.
>>>>>>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>>>>>>> matter, right, as the package contains only data?
>>>>>>>>>>>>
>>>>>>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>
>>>>>>>>>>>>> Now the sizes of both packages are much smaller: 469K for
>>>>>>>>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It should work now, but if you experience some problems, let me
>>>>>>>>>>>>> know.
>>>>>>>>>>>>>
>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>>>>>> I am still working on compressing the GenABEL data. To
>>>>>>>>>>>>>>> remind you: the idea consists of compressing the original
>>>>>>>>>>>>>>> data text files and using them later for generating RData
>>>>>>>>>>>>>>> files (e.g. srdta).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yurii proposed to make the RData files in the examples which
>>>>>>>>>>>>>>> use them. I see now only one way this idea can be
>>>>>>>>>>>>>>> implemented. We replace the "data(srdta)" line in every file
>>>>>>>>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>>>>>>>>> generates the srdta object. The same procedure applies to the
>>>>>>>>>>>>>>> other five *.RData files from GenABEL/data. If we go this
>>>>>>>>>>>>>>> way, we have to change 71 files in the man directory and,
>>>>>>>>>>>>>>> in addition, the GenABEL manual. Also, users will not be able
>>>>>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)"
>>>>>>>>>>>>>>> on the command line (as they are used to) and have to know
>>>>>>>>>>>>>>> that the function generate_srdt() now serves this
>>>>>>>>>>>>>>> need. This all sounds nasty :-).
>>>>>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>>>>>> see your point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Making the data during package installation time is also a
>>>>>>>>>>>>>>> bad idea, as Yurii noted below. Actually, this is impossible
>>>>>>>>>>>>>>> because the process of making GenABEL data requires GenABEL
>>>>>>>>>>>>>>> functions which are not available during installation time
>>>>>>>>>>>>>>> (they are available only after GenABEL is installed).
>>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>>>>>> Do you think the name GenABELdata is ok? Maybe we can move
>>>>>>>>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging in
>>>>>>>>>>>>>> to this. As for the package name, either GenABELdata or
>>>>>>>>>>>>>> GenABEL.data sounds fine with me (the latter one being a bit
>>>>>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lennart
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>>>>>> In this email, I propose a new approach which allows us
>>>>>>>>>>>>>>>>>> to reduce the total size of the data from 8Mb to 2Mb,
>>>>>>>>>>>>>>>>>> which reduces the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb),
>>>>>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last
>>>>>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested
>>>>>>>>>>>>>>>>>> creating a new package called GenABELdata and moving
>>>>>>>>>>>>>>>>>> all the data there. I went through the data and found
>>>>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed by gzip
>>>>>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>>>>>> - There is a function gunzip() from library R.utils
>>>>>>>>>>>>>>>>>> which can decompress the files. It works on any OS.
>>>>>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>>>>>>>>> gzip files without decompression.
>>>>>>>>>>>>>>>>>> - Even more: it looks like the biggest file,
>>>>>>>>>>>>>>>>>> "srgenos.dat", was used only once, a long time ago, for
>>>>>>>>>>>>>>>>>> generating "srdta.RData", and now it is just sitting
>>>>>>>>>>>>>>>>>> there and eating space needlessly.
>>>>>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) We can delete some files from the "data"
>>>>>>>>>>>>>>>>>> directory. The deleted files will be generated on the
>>>>>>>>>>>>>>>>>> user's computer based on the files from exdata. It can
>>>>>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or
>>>>>>>>>>>>>>>>>> on the first load (run function .onAttach()
>>>>>>>>>>>>>>>>>> in R/zzz.R).
>>>>>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>>>>>>>>> platforms)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It will reduce the total size of the "data" directory
>>>>>>>>>>>>>>>>>> from 2.3Mb to 800Kb.
>>>>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>>>>>>>>> ahead.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> best, Maksim
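The claim in the quoted proposal that read.table() can read gzip-compressed files without a separate decompression step can be checked with a small round trip. This is only a sketch under stated assumptions: the temporary file and the two-column table are made up for illustration and are not the actual exdata files.

```r
# Write a small, made-up table to a gzip-compressed text file.
tmp <- tempfile(fileext = ".dat.gz")
con <- gzfile(tmp, "w")
write.table(data.frame(id = 1:3, trait = c(0.5, 1.2, -0.3)),
            con, row.names = FALSE, quote = FALSE)
close(con)

# read.table() opens the path through file(), which detects gzip
# compression when reading, so no explicit gunzip step is needed.
dat <- read.table(tmp, header = TRUE)
print(dat)

# Remove the temporary file again.
unlink(tmp)
```

The same read.table() call should work on a compressed file shipped inside a package, given its full path, though that has not been tried here against the GenABEL files.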
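For the 'Suggests' variant discussed in this thread, an example could guard its use of the data package roughly as follows. This is a sketch, not what GenABEL actually ships: the helper name has_data_pkg is made up, it assumes GenABEL.data is listed under "Suggests:" in GenABEL's DESCRIPTION, and it uses requireNamespace() instead of the require() call quoted above so that nothing is attached when the package is missing.

```r
# Hypothetical helper (not part of GenABEL): TRUE if the optional data
# package is installed and loadable, FALSE otherwise.
has_data_pkg <- function(pkg = "GenABEL.data") {
  requireNamespace(pkg, quietly = TRUE)
}

if (has_data_pkg()) {
  # Load the example data set from the data package.
  data(srdta, package = "GenABEL.data")
  print(class(srdta))
} else {
  # Tell the user clearly what is missing and how to get it.
  message("GenABEL.data is not installed; run ",
          "install.packages(\"GenABEL.data\") to use this example.")
}
```

With a guard like this the example is skipped with a clear message instead of failing, which is the behaviour Lennart asks for when the data package is only suggested.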
From m.v.struchalin at mail.ru Fri Nov 29 13:55:26 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 19:55:26 +0700
Subject: [GenABEL-dev] preparation for the next submission to CRAN
Message-ID: <52988EBE.6010109@mail.ru>

Dear All,

I updated the info in the file GenABEL/ChangeLog: I explained what I changed in GenABEL since the last submission.
Please add your updates if you have any. If I forgot something, please add it.

best,
Maksim

From m.v.struchalin at mail.ru Fri Nov 29 14:04:34 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Fri, 29 Nov 2013 20:04:34 +0700
Subject: [GenABEL-dev] Jenkins continuous integration server available
In-Reply-To: <528A1410.8010802@karssen.org>
References: <528A1410.8010802@karssen.org>
Message-ID: <529890E2.1000008@mail.ru>

Hi Lennart,

Is it possible to add a link to http://www.karssen.org/jenkins/ from http://www.genabel.org/developers or from r-forge (or better, from both)? It would be more convenient to access it, and future developers will know that such a tool exists. Right now, I have to look up your email to find the link to Jenkins.

best,
Maksim

On 18/11/2013 20:20, L.C. Karssen wrote:
> Dear all,
>
> In order to keep an eye on the build stability of the various C/C++
> packages in the GenABEL suite, I've taken some time in the past weeks to
> set up a Jenkins server [1] at http://www.karssen.org/jenkins/
>
> Jenkins checks out the source code from R-forge, runs several static
> code analysis tools and tries to compile the code. Any errors or
> warnings that pop up are listed. This way we can make sure that the code
> is always in a proper (compilable) state and also provides hints on how
> to improve the code.
>
> Maarten Kooyman piloted the use of Jenkins at my previous employer. Many
> of the things we learned from that setup have been incorporated in this
> Jenkins install. Thanks a lot Maarten!
>
> The ProbABEL project is configured in most detail with checks of the
> C/C++ style, violations of the Google coding standards and Valgrind
> checks for memory leaks. More checks still need to be added for ProbABEL
> and other projects. Consider this a work in progress.
>
> Have a look around and let me know if you have any questions or
> suggestions.
>
> Best regards,
>
> Lennart.
>
> [1] http://jenkins-ci.org/
From yurii.aulchenko at gmail.com Fri Nov 29 14:06:56 2013
From: yurii.aulchenko at gmail.com (Yury Aulchenko)
Date: Fri, 29 Nov 2013 14:06:56 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1436 - pkg/GenABEL
In-Reply-To: <20131129125014.5BB9A185010@r-forge.r-project.org>
References: <20131129125014.5BB9A185010@r-forge.r-project.org>
Message-ID: <04259CB5-51AA-4A2E-9437-DC24094AD3DA@gmail.com>

Hi Maksim,

We include an extract of these changes when announcing new releases, so it is important to give some indication of what the changes are about on a grand scale (performance improved? bug fixed? - which bug, did it affect all users?)

A few questions below.

On Nov 29, 2013, at 13:50 PM, noreply at r-forge.r-project.org wrote:

> Author: maksim
> Date: 2013-11-29 13:50:14 +0100 (Fri, 29 Nov 2013)
> New Revision: 1436
>
> Modified:
>    pkg/GenABEL/ChangeLog
> Log:
> Updated info for the next submission.
>
> Modified: pkg/GenABEL/ChangeLog
> ===================================================================
> --- pkg/GenABEL/ChangeLog 2013-11-29 11:37:13 UTC (rev 1435)
> +++ pkg/GenABEL/ChangeLog 2013-11-29 12:50:14 UTC (rev 1436)
> @@ -1,3 +1,10 @@
> +*** v. 1.8-0
> +(2013.12.06)
> +
> +Fixed WARNINGs and NOTEs from R CMD check --as-cran. Fixed many errors

The word "errors" is dangerous (users may start thinking GenABEL was doing something wrong) - I think you mean warnings here.

> reported by Jenkins: corrected coding style, fixed memory leaks, deleted unused variables.
> +Repaired the convert.snp.illumina() function.

What is this about? Was it acting wrongly? I can foresee this comment generating a lot of anxiety in users of this function: they will start wondering whether there was a major bug and all their work was screwed up.

best,
Y

> +Moved data objects ge03d2.clean, ge03d2c, ge03d2ex.clean, ge03d2ex, ge03d2 and srdta to a new R package called GenABEL.data. This made GenABEL distribution much smaller.
> +
> *** v. 1.7-7
> (2013.11.04)
> Fixed bug #5040: Spelling of the name of G. Svischeva incorrect.

From lennart at karssen.org Fri Nov 29 15:39:19 2013
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 29 Nov 2013 15:39:19 +0100
Subject: [GenABEL-dev] Jenkins continuous integration server available
In-Reply-To: <529890E2.1000008@mail.ru>
References: <528A1410.8010802@karssen.org> <529890E2.1000008@mail.ru>
Message-ID: <5298A717.1060705@karssen.org>

Good suggestion! I'll do that.

Lennart.

On 11/29/2013 02:04 PM, Maksim Struchalin wrote:
> Hi Lennart,
>
> Is it possible to add a link to http://www.karssen.org/jenkins/
> from http://www.genabel.org/developers or from r-forge (or better, from
> both)? It would be more convenient to access it, and future
> developers will know that such a tool exists. Right now, I have to look
> up your email to find the link to Jenkins.
>
> best,
> Maksim
>
> On 18/11/2013 20:20, L.C. Karssen wrote:
>> Dear all,
>>
>> In order to keep an eye on the build stability of the various C/C++
>> packages in the GenABEL suite, I've taken some time in the past weeks to
>> set up a Jenkins server [1] at http://www.karssen.org/jenkins/
>>
>> Jenkins checks out the source code from R-forge, runs several static
>> code analysis tools and tries to compile the code. Any errors or
>> warnings that pop up are listed. This way we can make sure that the code
>> is always in a proper (compilable) state and also provides hints on how
>> to improve the code.
>>
>> Maarten Kooyman piloted the use of Jenkins at my previous employer. Many
>> of the things we learned from that setup have been incorporated in this
>> Jenkins install. Thanks a lot Maarten!
>>
>> The ProbABEL project is configured in most detail with checks of the
>> C/C++ style, violations of the Google coding standards and Valgrind
>> checks for memory leaks. More checks still need to be added for ProbABEL
>> and other projects. Consider this a work in progress.
>>
>> Have a look around and let me know if you have any questions or
>> suggestions.
>>
>>
>> Best regards,
>>
>> Lennart.
>>
>>
>> [1] http://jenkins-ci.org/
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

From m.v.struchalin at mail.ru Fri Nov 29 19:02:59 2013
From: m.v.struchalin at mail.ru (Maksim Struchalin)
Date: Sat, 30 Nov 2013 01:02:59 +0700
Subject: [GenABEL-dev] [Genabel-commits] r1436 - pkg/GenABEL
In-Reply-To: <04259CB5-51AA-4A2E-9437-DC24094AD3DA@gmail.com>
References: <20131129125014.5BB9A185010@r-forge.r-project.org> <04259CB5-51AA-4A2E-9437-DC24094AD3DA@gmail.com>
Message-ID: <5298D6D3.3050200@mail.ru>

Hi Yurii,

I changed it in the way you suggested. The message about convert.snp.illumina is deleted. It looks like I broke it when I was fixing memory leak issues (I deleted arrays of data before the data was saved to file).

best,
Maksim

From yurii.aulchenko at gmail.com Fri Nov 29 21:38:22 2013
From: yurii.aulchenko at gmail.com (Yurii Aulchenko)
Date: Fri, 29 Nov 2013 21:38:22 +0100
Subject: [GenABEL-dev] [Genabel-commits] r1436 - pkg/GenABEL
In-Reply-To: <5298D6D3.3050200@mail.ru>
References: <20131129125014.5BB9A185010@r-forge.r-project.org> <04259CB5-51AA-4A2E-9437-DC24094AD3DA@gmail.com> <5298D6D3.3050200@mail.ru>
Message-ID: <-8329612302679564043@unknownmsgid>

Thank you! These are indeed grand-level descriptors and we need to be clear and concise to the user :)

----------------------
Yurii Aulchenko
(sent from mobile device)