[GenABEL-dev] new approach for data storage in GenABEL package

Maksim Struchalin m.v.struchalin at mail.ru
Wed Nov 27 20:45:20 CET 2013


About package names: There are 5056 packages on CRAN. 42 of them are 
data packages and 6 of them have names like packagename.data 
("cluster.datasets", "data.table", "gamlss.data", "g.data", 
"survJamda.data" and "TH.data"). Thus, GenABELdata would be more in line 
with existing practice than GenABEL.data.
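
As a rough illustration of how such a count could be made, the sketch below classifies a vector of package names. The helper name count_data_names and the sample vector are hypothetical, and the patterns are an assumption, since the exact criteria behind the counts above are not stated. With network access, the full list of names could be taken from rownames(available.packages()) instead of the sample.

```r
# Hypothetical helper: count names that mention "data" and, among those,
# names that also contain a dot.
count_data_names <- function(pkgs) {
  has_data <- grepl("data", pkgs, ignore.case = TRUE)
  has_dot  <- grepl(".", pkgs, fixed = TRUE)
  c(total       = length(pkgs),
    data_like   = sum(has_data),
    dotted_data = sum(has_data & has_dot))
}

# Sample only: the six names quoted above plus two unrelated packages.
sample_pkgs <- c("cluster.datasets", "data.table", "gamlss.data", "g.data",
                 "survJamda.data", "TH.data", "HSAUR", "MASS")

counts <- count_data_names(sample_pkgs)
print(counts)
```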

About submission to CRAN: I still see minor warnings in the R CMD check 
output and at least one FAILURE in test.polylik from the RUnit tests. I think 
we can make the next submission to CRAN by the end of next week: one 
week for testing our new data package and fixing small errors.

P.S. We should submit GenABEL.data to CRAN as well.

best,
Maksim



On 28/11/2013 02:07, Yury Aulchenko wrote:
> Wow, very impressive, Maksim!
>
> Can you please check whether GenABEL.data complies with the naming 
> conventions? (I do not recall seeing package names with dots; 
> what names do other data packages use?)
>
> If the naming is OK, do you think we are close to submitting to CRAN? With so 
> many changes, I think we should "jump" in the version number (say, to 
> 1.8-0?)
>
> best,
> Yurii
>
> On Nov 27, 2013, at 19:58 PM, Maksim Struchalin 
> <m.v.struchalin at mail.ru> wrote:
>
>> Hi All,
>>
>> I created a GenABEL.data package into which I moved the following data: 
>> GenABEL/data/*, inst/exdata/srgenos.dat and 
>> inst/exdata/srphenos.dat. All the corresponding files have been deleted 
>> from GenABEL.
>> GenABEL.data also contains an R directory with three files (ge03d2c.R, 
>> ge03d2ex.R and srdta.R). These scripts do not go into the final 
>> distribution and are kept only for possible future use.
>> Only the GenABEL.data/data/* files go into GenABEL.data_1.0.tar.gz after 
>> running "R CMD build GenABEL.data". The directories "R" and "inst" 
>> are removed by running GenABEL/data/clean.R during the "build" process. 
>> Maybe it is not a good idea to do it this way but, at least, it is 
>> convenient and has no effect on end users (please suggest a better 
>> way).
>>
>> GenABEL.data now works differently from what we discussed 
>> below. It is impossible to generate files during "R CMD INSTALL" and 
>> undesirable during "R CMD build". The best option was simply to move 
>> all the data from GenABEL to GenABEL.data (as the CRAN people 
>> suggested). In this case, we can install GenABEL.data without having 
>> GenABEL installed. After this, we install GenABEL. When we run 
>> library(GenABEL), it automatically attaches GenABEL.data. Thus, the only 
>> change for users is that they now need to install two packages 
>> (GenABEL.data and GenABEL).
>>
>> Both packages are now much smaller: 469K for GenABEL and 
>> 2.4M for GenABEL.data.
>>
>> It should work now, but if you experience some problems, let me know.
>>
>> best,
>> Maksim
>>
>>
>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>> Hi Maksim,
>>>
>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>> I am still working on compressing the GenABEL data.
>>>> To remind you: the idea is to compress the original data text
>>>> files and use them later for generating RData files (e.g. srdta).
>>>>
>>>> Yurii proposed to make the RData files in the examples which use them. I
>>>> see only one way to implement this idea. We replace the "data(srdta)"
>>>> line in every file where it is used by a function, e.g. "generate_srdt()",
>>>> which generates the srdta object. The same procedure applies to the other
>>>> five *.RData files from GenABEL/data. If we go this way, we have to change
>>>> 71 files in the man directory and, in addition, the GenABEL manual.
>>>> Also, users will not be able to load the srdta set (and others) by
>>>> typing "data(srdta)" on the command line (as they are used to) and will
>>>> have to know that the function generate_srdt() now serves this purpose.
>>>> This all sounds nasty :-).
>>> I'm not sure how many users actually type data(srdta), but I see your point.
>>>
>>>> Making the data at package installation time is also a bad idea, as
>>>> Yurii noted below. Actually, this is impossible because the process of
>>>> making the GenABEL data requires GenABEL functions, which are not available
>>>> at installation time (they are available only after GenABEL is installed).
>>> Good point!
>>>
>>>> I see only one good solution now: move all the GenABEL data to a new
>>>> package, e.g. GenABELdata, as was proposed by the CRAN people from the
>>>> beginning. In this case, it is possible to generate RData at
>>>> installation time using GenABEL functions (which are installed by that
>>>> time). I think this solution is platform independent because the R rules
>>>> permit running *.R scripts to generate data at installation time.
>>>>
>>>> What do you think about making a data package for GenABEL? Do you think
>>>> the name GenABELdata is OK? Maybe we can move all the *ABEL data into the
>>>> DatABEL package instead of making *ABELdata data packages?
>>> Sounds like this is the best solution. Thanks for digging into this. As
>>> for the package name, either GenABELdata or GenABEL.data sounds fine
>>> to me (the latter being a bit clearer in my opinion).
>>>
>>>
>>> Best,
>>>
>>> Lennart
>>>
>>>> best,
>>>> Maksim
>>>>
>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen <lennart at karssen.org> wrote:
>>>>>
>>>>>> Hi Maksim,
>>>>>>
>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>> In this email, I propose a new approach which reduces the total
>>>>>>> size of the data from 8Mb to 2Mb, which in turn reduces the entire
>>>>>>> GenABEL size from 12Mb to 6Mb.
>>>>>> I guess you mean B (bytes) instead of b (bits) here :-).
>>>>>>
>>>>>>> "R CMD check --as-cran" reports that the following sub-directories are
>>>>>>> too big: data (2.3Mb), exdata (5.7Mb) and libs (2.6Mb). After the
>>>>>>> last GenABEL submission to CRAN, the maintainers suggested creating a
>>>>>>> new package called GenABELdata and moving all the data there. I went
>>>>>>> through the data and found that:
>>>>>>> 1) The "exdata" directory can be compressed with gzip and reduced from
>>>>>>> 5.8Mb to 1.1Mb.
>>>>>>>      - There is a function gunzip() in the R.utils package which can
>>>>>>> decompress the files. It works on any OS.
>>>>>>>      - Moreover: the native R function read.table() can read gzip files
>>>>>>> without decompressing them first.
>>>>>>>      - Even more: it looks like the biggest file, "srgenos.dat", was
>>>>>>> used only once, a long time ago, for generating "srdta.RData" and now it
>>>>>>> is just sitting there and eating space needlessly.
>>>>>> Sounds like a waste of space!
>>>>>>
>>>>>>> 2) We can delete some files from the "data" directory. The deleted
>>>>>>> files will be generated on the user's computer from the files in exdata.
>>>>>>> It can be done during INSTALLATION (a line in a Makefile?) or on the
>>>>>>> first load (by running the function .onAttach() in R/zzz.R).
>>>>>> This sounds like a perfectly acceptable option.
>>>>> I suggest this is done in the "example" which makes use of this data,
>>>>> NOT in the INSTALL etc. - we should make things as "robust" as
>>>>> possible and interfere as little as possible with the usual workflow
>>>>> (which is very much system-specific, in that we will need to test
>>>>> on all platforms)
>>>>>
>>>>>
>>>>>>> It will reduce
>>>>>>> the total size of the "data" directory from 2.3Mb to 800Kb.
>>>>>> Fantastic! If no one has other objections I say: go ahead.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Lennart.
>>>>>>
>>>>>>
>>>>>>> Any objections/suggestions?
>>>>>>>
>>>>>>> best,
>>>>>>> Maksim
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> genabel-devel mailing list
>>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>>
>>>>>>>
>>>>>> -- 
>>>>>> -----------------------------------------------------------------
>>>>>> L.C. Karssen
>>>>>> Utrecht
>>>>>> The Netherlands
>>>>>>
>>>>>> lennart at karssen.org
>>>>>> http://blog.karssen.org
>>>>>>
>>>>>> Please do not send me Word or PowerPoint files!
>>>>>> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>>> ------------------------------------------------------------------
>
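
The gzip point raised in the quoted thread (read.table() reading compressed files directly) can be checked with a small round trip. This is a minimal sketch using only base R; the toy data frame and the temporary file are made up for illustration and are not the real srgenos.dat or srphenos.dat files.

```r
# Toy phenotype-like table (hypothetical data, not from GenABEL)
toy <- data.frame(id  = paste0("p", 1:5),
                  sex = c(1L, 0L, 1L, 1L, 0L),
                  qt1 = c(0.5, -1.2, 0.3, 2.1, -0.7),
                  stringsAsFactors = FALSE)

gz_path <- tempfile(fileext = ".dat.gz")

# Write the table through a gzip-compressing connection
con <- gzfile(gz_path, open = "wt")
write.table(toy, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() is given the path of the compressed file; no explicit
# decompression step is needed
back <- read.table(gz_path, header = TRUE, stringsAsFactors = FALSE)
print(back)
```

If explicit decompression is preferred, gunzip() from R.utils is the alternative mentioned in the thread; it is not used here so that the sketch stays within base R.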
