[GenABEL-dev] new approach for data storage in GenABEL package
Yury Aulchenko
yurii.aulchenko at gmail.com
Wed Nov 27 20:07:11 CET 2013
Wow, very impressive, Maksim!
Can you please check whether the name GenABEL.data complies with the package naming conventions? (I do not recall seeing package names with dots; what names do other data packages use?)
If the naming is OK, do you think we are close to submitting to CRAN? With so many changes, I think we should "jump" the version number (say, to 1.8-0?).
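
For reference on the naming question: the "Writing R Extensions" manual states that a package name may contain only ASCII letters, numbers and dots, must have at least two characters, must start with a letter and must not end in a dot. Under that rule a dotted name is allowed (R.utils, mentioned further down this thread, is one such name). A minimal sketch of the check follows; the helper name is_valid_pkg_name() is hypothetical and is not part of R or GenABEL.

```r
# Hypothetical helper (not part of R or GenABEL): checks a candidate name
# against the rule quoted above: ASCII letters, digits and dots only,
# at least two characters, starts with a letter, does not end in a dot.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("GenABEL.data", "GenABELdata", "GenABEL_data", "data.")
print(is_valid_pkg_name(candidates))
# [1]  TRUE  TRUE FALSE FALSE
```

Both names under discussion pass; the underscore variant and a name ending in a dot do not.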
best,
Yurii
On Nov 27, 2013, at 19:58 PM, Maksim Struchalin <m.v.struchalin at mail.ru> wrote:
> Hi All,
>
> I created a GenABEL.data package into which I moved the following data: GenABEL/data/* , inst/exdata/srgenos.dat and inst/exdata/srphenos.dat. All the corresponding files have been deleted from GenABEL.
> Also, GenABEL.data contains an R directory with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These scripts do not go into the final distribution and are kept only for possible future use.
> Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after running "R CMD build GenABEL.data". The directories "R" and "inst" are removed by running GenABEL/data/clean.R during the "build" process. Maybe it is not a good idea to do it this way, but at least it is convenient and has no effect on end users (please suggest a better way).
>
> GenABEL.data now works differently from what we discussed below. It is impossible to generate the files during "R CMD INSTALL" and undesirable during "R CMD build". The best option was simply to move all the data from GenABEL to GenABEL.data (as the CRAN people suggested). This way, we can install GenABEL.data without having GenABEL installed, and then install GenABEL. When we run library(GenABEL), it automatically attaches GenABEL.data. Thus, the only change for users is that they now need to install two packages (GenABEL.data and GenABEL).
>
> Both packages are now much smaller: 469K for GenABEL and 2.4M for GenABEL.data.
>
> It should work now, but if you experience any problems, let me know.
>
> best,
> Maksim
>
>
> On 26/11/2013 20:48, L.C. Karssen wrote:
>> Hi Maksim,
>>
>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>> I am still working on compressing the GenABEL data.
>>> To remind you: the idea consists of compressing the original text data
>>> files and using them later to generate the RData files (e.g. srdta).
>>>
>>> Yurii proposed generating the RData files in the examples which use
>>> them. I now see only one way this idea can be implemented. We replace
>>> the "data(srdta)" line in every file where it is used by a function,
>>> e.g. "generate_srdt()", which generates the srdta object. The same
>>> procedure applies to the other five *.RData files from GenABEL/data. If
>>> we go this way, we have to change 71 files in the man directory and, in
>>> addition, the GenABEL manual.
>>> Also, users will not be able to load the srdta set (and others) by
>>> typing "data(srdta)" on the command line (as they are used to) and will
>>> have to know that the function generate_srdt() now serves this purpose.
>>> This all sounds nasty :-).
>> I'm not sure how many users actually type data(srdta), but I see your point.
>>
>>> Generating the data at package installation time is also a bad idea, as
>>> Yurii noted below. Actually, this is impossible because the process of
>>> making the GenABEL data requires GenABEL functions, which are not
>>> available at installation time (they are available only after GenABEL
>>> is installed).
>> Good point!
>>
>>> I see only one good solution now: move all the GenABEL data to a new
>>> package, e.g. GenABELdata, as was proposed by the CRAN people from the
>>> beginning. In this case, it is possible to generate the RData files at
>>> installation time using GenABEL functions (which are installed by that
>>> time). I think this solution is platform independent because R rules
>>> permit running *.R scripts to generate data at installation time.
>>>
>>> What do you think about making a data package for GenABEL? Do you think
>>> the name GenABELdata is OK? Maybe we can move all the *ABEL data into
>>> the DatABEL package instead of making *ABELdata data packages?
>> Sounds like this is the best solution. Thanks for digging into this. As
>> for the package name, either GenABELdata or GenABEL.data sounds fine
>> to me (the latter being a bit clearer in my opinion).
>>
>>
>> Best,
>>
>> Lennart
>>
>>> best,
>>> Maksim
>>>
>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen <lennart at karssen.org> wrote:
>>>>
>>>>> Hi Maksim,
>>>>>
>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>> In this email, I propose a new approach that reduces the total
>>>>>> size of the data from 8Mb to 2Mb, which reduces the entire GenABEL
>>>>>> size from 12Mb to 6Mb.
>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>
>>>>>> "R CMD check --as-cran" reports that the following sub-directories are
>>>>>> too large: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
>>>>>> last GenABEL submission to CRAN, the maintainers suggested creating a
>>>>>> new package called GenABELdata and moving all the data there. I went
>>>>>> through the data and found that:
>>>>>> 1) The "exdata" directory can be compressed with gzip and reduced from
>>>>>> 5.8Mb to 1.1Mb.
>>>>>> - There is a function gunzip() in the R.utils package which can
>>>>>> decompress the files. It works on any OS.
>>>>>> - Moreover: the native R function read.table() can read gzip files
>>>>>> without decompressing them first.
>>>>>> - Even more: it looks like the biggest file, "srgenos.dat", was
>>>>>> used only once, a long time ago, to generate "srdta.RData", and now it
>>>>>> is just sitting there and eating space needlessly.
>>>>> Sounds like a waste of space!
>>>>>
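
The point in item 1 about read.table() reading gzip files directly can be illustrated with a small sketch in base R. The toy table and the temporary file are made up for illustration; they are not the real srgenos.dat or srphenos.dat.

```r
# Toy table standing in for a data file (contents are made up).
toy <- data.frame(id = c("p1", "p2", "p3"),
                  snp1 = c(0L, 1L, 2L),
                  stringsAsFactors = FALSE)

# Write the table through a gzip-compressing connection.
tmp <- tempfile(fileext = ".dat.gz")
con <- gzfile(tmp, open = "wt")
write.table(toy, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() is given the compressed file name directly; no separate
# decompression step is needed.
back <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
print(back)

unlink(tmp)
```

This works because read.table() opens a file name through file(), which detects gzip compression when reading.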
>>>>>> 2) We can delete some files from the "data" directory. The deleted
>>>>>> files will be generated on the user's computer from the files in
>>>>>> exdata. This can be done during INSTALLATION (a line in a Makefile?)
>>>>>> or on first load (by running the function .onAttach() in R/zzz.R).
>>>>> This sounds like a perfectly acceptable option.
>>>> I suggest this is done in the "example" which makes use of this data,
>>>> NOT in the INSTALL etc. - we should make things as "robust" as
>>>> possible and interfere as little as possible with the usual workflow
>>>> (which is very much system-specific, in that we would need to test
>>>> on all platforms)
>>>>
>>>>
>>>>>> It will reduce
>>>>>> total size of "data" directory from 2.3Mb to 800Kb.
>>>>> Fantastic! If no one has other objections I say: go ahead.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Lennart.
>>>>>
>>>>>
>>>>>> Any objections/suggestions?
>>>>>>
>>>>>> best,
>>>>>> Maksim
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> genabel-devel mailing list
>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>
>>>>>>
>>>>> --
>>>>> -----------------------------------------------------------------
>>>>> L.C. Karssen
>>>>> Utrecht
>>>>> The Netherlands
>>>>>
>>>>> lennart at karssen.org
>>>>> http://blog.karssen.org
>>>>>
>>>>> Please do not send me Word or PowerPoint files!
>>>>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>> ------------------------------------------------------------------