[GenABEL-dev] new approach for data storage in GenABEL package

Maksim Struchalin m.v.struchalin at mail.ru
Fri Nov 29 12:37:25 CET 2013


Changed back to 'Depends'. Users now have to install GenABEL.data first,
and then GenABEL.
Maksim
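
For reference, the change comes down to one field in GenABEL's DESCRIPTION
file. A minimal sketch in R, assuming a cut-down DESCRIPTION (the two lines
written below are illustrative, not the real file), that writes such a file
and reads the dependency fields back with read.dcf():

```r
# Illustrative DESCRIPTION content; the real GenABEL file has more fields.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: GenABEL",
  "Depends: GenABEL.data"
), desc_file)

# read.dcf() returns a character matrix with one column per requested
# field; fields that are absent from the file come back as NA.
fields <- read.dcf(desc_file, fields = c("Depends", "Suggests"))
print(fields)
```

With 'Suggests' instead, the entry would move to the Suggests field and
every example using the data would have to check for the package itself.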

On 29/11/2013 18:16, L.C. Karssen wrote:
> Hi Maksim,
>
> Good that you raise this again. I've been thinking about it a bit longer.
> What is the point of separating the data into a separate package if the
> user still downloads it automatically ('depends')? The idea behind the
> data package is of course to only download what is necessary (even if a
> few MB is not very much). So that would point to using 'suggests'.
>
> When I worked in Africa (very limited bandwidth) it was actually really
> good to have this kind of 'suggests', because then you can download
> only what is really necessary.
>
> Something I haven't tested: what happens if we use 'suggests' and the
> user wants to run an example (and GA.data is not installed)? Will (s)he
> get an error/warning message? I guess so if each example in GenABEL has
> a 'require(GenABEL.data)' line at the start. If the message to the user
> is very clear then "suggests" is fine. Otherwise I would go with the old
> behavior (have everything installed): 'depends'.
>
>
>
> Lennart.
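
A minimal sketch of what such a guarded example could look like, assuming a
hypothetical helper run_if_available() (not part of GenABEL). It uses
requireNamespace() rather than the require() mentioned above, so the check
itself attaches nothing, and it prints an explicit message when the data
package is missing:

```r
# Hypothetical helper, not part of GenABEL: run `example_code` only when
# the package `pkg` is installed, otherwise tell the user what to do.
run_if_available <- function(pkg, example_code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    example_code()
    invisible(TRUE)
  } else {
    message("Package '", pkg, "' is not installed, so this example is ",
            "skipped. Install it with install.packages(\"", pkg, "\").")
    invisible(FALSE)
  }
}

# Assumed usage in an example that needs the srdta data set.
run_if_available("GenABEL.data", function() {
  data(srdta, package = "GenABEL.data")
})
```

A plain require() call would also work as the guard: it returns FALSE with a
warning when the package cannot be found.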
>
>
>
> On 11/29/2013 11:42 AM, Maksim Struchalin wrote:
>> Hi Yurii & Lennart,
>>
>> Yesterday, you supported the idea of making GenABEL.data 'suggested':
>>
>> ________________________________________________________________
>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>
>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>> I would think that GenABEL(.)data is "suggested" and then any
>>> examples using the data from this packages start with something like
>>>
>>> if (require("GenABEL(.)data") ...
>> This sounds like a good solution.
>> ________________________________________________________________
>>
>>
>> Today you propose to make it 'depends', or am I misunderstanding
>> something here?
>>
>> About how other people do it: I looked at the GANPAdata and gamlss.data
>> packages. GANPA and gamlss 'depend' on them (see my message below).
>>
>> best,
>> Maksim
>>
>>
>> On 29/11/2013 16:43, Yurii Aulchenko wrote:
>>> Lennart,
>>>
>>> Good point about "depends"!
>>>
>>> Again, my question would be how other people do it?
>>>
>>> Y
>>>
>>> ----------------------
>>> Yurii Aulchenko
>>> (sent from mobile device)
>>>
>>>> On Nov 29, 2013, at 10:36, "L.C. Karssen" <lennart at karssen.org> wrote:
>>>>
>>>> Hi Maksim,
>>>>
>>>>> On 11/29/2013 08:43 AM, Maksim Struchalin wrote:
>>>>> I looked at how other developers deal with the dependency between a
>>>>> package and its data package. I checked two random packages from
>>>>> CRAN: GANPA (GANPAdata) and gamlss (gamlss.data). Both of them (GANPA
>>>>> and gamlss) depend on their data packages, meaning that their
>>>>> DESCRIPTION files list the data package in the "Depends:" field. Only
>>>>> GANPAdata suggests GANPA (gamlss.data neither depends on nor suggests
>>>>> gamlss).
>>>>>
>>>>> When I made GenABEL depend on GenABEL.data, I had in mind the same
>>>>> idea that Nicola expressed below: in this case GenABEL.data is
>>>>> installed automatically when users run install.packages("GenABEL").
>>>>> This is convenient for users who install GenABEL from CRAN and it is
>>>>> in line with GANPA and gamlss, but it probably does not fully reflect
>>>>> the GenABEL reality. The dependency between GenABEL and GenABEL.data
>>>>> is weak: GenABEL is going to be used mostly without GenABEL.data. So
>>>>> I support Yurii's idea of making GenABEL.data 'suggested' and
>>>>> including 'require(...'.
>>>> I agree with you that the dependence between GA and GA.data is rather
>>>> weak. On the other hand, why not keep GA.data in Depends? That gives the
>>>> same behaviour as before (install everything by default). Sounds
>>>> convenient to me.
>>>> With modern internet bandwidth the few MB of the data package are not a
>>>> problem.
>>>>
>>>>> About the dot: personally, I like GenABEL.data. From this name it is
>>>>> clear that this package is a kind of 'subpackage' of GenABEL and not
>>>>> a standalone one.
>>>> Good point!
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Lennart.
>>>>
>>>>> best,
>>>>> Maksim
>>>>>
>>>>>> On 28/11/2013 18:24, L.C. Karssen wrote:
>>>>>>
>>>>>>> On 11/28/2013 12:12 PM, Yury Aulchenko wrote:
>>>>>>> I would think that GenABEL(.)data is "suggested" and then any
>>>>>>> examples using the data from this packages start with something like
>>>>>>>
>>>>>>> if (require("GenABEL(.)data") ...
>>>>>> This sounds like a good solution.
>>>>>>
>>>>>>> How do other packages which lean on data-packages solve this?
>>>>>>>
>>>>>>> As for the "dot" - I do not have any strong opinion - both options
>>>>>>> seem ok to me :)
>>>>>> Great :-). Then I propose (of course) to stick with the dot, also
>>>>>> because that's already used now.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Lennart.
>>>>>>
>>>>>>
>>>>>>> best, Yurii
>>>>>>>
>>>>>>>
>>>>>>> On Nov 28, 2013, at 12:06 PM, Nicola Pirastu
>>>>>>> <nicola.pirastu at burlo.trieste.it> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been following this conversation with much interest although
>>>>>>>> I'm sorry I can't contribute much.
>>>>>>>>
>>>>>>>> I was just wondering, could GenABEL.data not just be a dependency
>>>>>>>> of GenABEL? This way, installing GenABEL through install.packages
>>>>>>>> would also install GenABEL.data, without the user actually having
>>>>>>>> to do it himself.
>>>>>>>>
>>>>>>>> Best.
>>>>>>>>
>>>>>>>> Nicola
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr. Nicola Pirastu PhD Research Fellow Medical Sciences,
>>>>>>>> Chirurgical and Health Department University of Trieste Medical
>>>>>>>> Genetics IRCCS Burlo Garofolo Via dell'Istria 65/1 34137 Italy tel.
>>>>>>>> +390403785539
>>>>>>>>
>>>>>>>> Il giorno 28/nov/2013, alle ore 11:59, "L.C. Karssen"
>>>>>>>> <lennart at karssen.org> ha scritto:
>>>>>>>>
>>>>>>>>> Hi Maksim,
>>>>>>>>>
>>>>>>>>> First of all, thanks for the good work!
>>>>>>>>>
>>>>>>>>>> On 11/27/2013 07:58 PM, Maksim Struchalin wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I created a GenABEL.data package to which I moved the following
>>>>>>>>>> data: GenABEL/data/*, inst/exdata/srgenos.dat and
>>>>>>>>>> inst/exdata/srphenos.dat. All the corresponding files have been
>>>>>>>>>> deleted from GenABEL. GenABEL.data also contains an R directory
>>>>>>>>>> with three files (ge03d2c.R, ge03d2ex.R and srdta.R). These
>>>>>>>>>> scripts do not go into the final distribution and are kept only
>>>>>>>>>> for possible future use. Only the GenABEL.data/data/* files go
>>>>>>>>>> into GenABEL.data_1.0.tar.gz after running "R CMD build
>>>>>>>>>> GenABEL.data". The directories "R" and "inst" are removed by
>>>>>>>>>> running GenABEL/data/clean.R in the "build" process. Maybe it is
>>>>>>>>>> not a good idea to do it this way but, at least, it is
>>>>>>>>>> convenient and has no effect on end users (please suggest a
>>>>>>>>>> better way).
>>>>>>>>>>
>>>>>>>>>> GenABEL.data now works differently from what we discussed
>>>>>>>>>> below. It is impossible to generate files during "R CMD
>>>>>>>>>> INSTALL" and undesirable during "R CMD build". The best option
>>>>>>>>>> was simply to move all the data from GenABEL to GenABEL.data
>>>>>>>>>> (as the CRAN people suggested). In this case, we can install
>>>>>>>>>> GenABEL.data without having GenABEL installed. After this, we
>>>>>>>>>> install GenABEL.
>>>>>>>>> This sounds very strange to me. Does the user first need to
>>>>>>>>> install the GenABEL.data package and then the 'main' GenABEL
>>>>>>>>> package? Or do I misunderstand you? What happens if the user
>>>>>>>>> installs them in a different order? I guess that shouldn't
>>>>>>>>> matter, right, as the package contains only data?
>>>>>>>>>
>>>>>>>>>> When we run library(GenABEL), it automatically attaches
>>>>>>>>>> GenABEL.data. Thus, the only change for users is that they need
>>>>>>>>>> to install two packages now (GenABEL.data and GenABEL).
>>>>>>>>> And GenABEL.data is only needed if they actually want to use the
>>>>>>>>> examples, right? Or do we simply put GenABEL.data in the list of
>>>>>>>>> required packages in the DESCRIPTION file?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Lennart.
>>>>>>>>>
>>>>>>>>>> Both packages are now much smaller: 469K for
>>>>>>>>>> GenABEL and 2.4M for GenABEL.data.
>>>>>>>>>>
>>>>>>>>>> It should work now, but if you experience some problems, let me
>>>>>>>>>> know.
>>>>>>>>>>
>>>>>>>>>> best, Maksim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 26/11/2013 20:48, L.C. Karssen wrote:
>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>
>>>>>>>>>>>> On 11/26/2013 12:11 PM, Maksim Struchalin wrote:
>>>>>>>>>>>> I am still in the process of compressing the GenABEL data.
>>>>>>>>>>>> To remind you: the idea is to compress the original
>>>>>>>>>>>> data text files and use them later for generating RData
>>>>>>>>>>>> files (e.g. srdta).
>>>>>>>>>>>>
>>>>>>>>>>>> Yurii proposed to make the RData files in the examples which
>>>>>>>>>>>> use them. I now see only one way to implement this idea:
>>>>>>>>>>>> we replace the "data(srdta)" line in every file
>>>>>>>>>>>> where it is used by a function, e.g. "generate_srdt()", which
>>>>>>>>>>>> generates the srdta object, and do the same for the other
>>>>>>>>>>>> five *.RData files from GenABEL/data. If we go this way, we
>>>>>>>>>>>> have to change 71 files in the man directory and, in
>>>>>>>>>>>> addition, the GenABEL manual. Also, users will not be able
>>>>>>>>>>>> to load the srdta set (and others) by typing "data(srdta)"
>>>>>>>>>>>> on the command line (as they are used to) and will have to
>>>>>>>>>>>> know that the function generate_srdt() now serves this
>>>>>>>>>>>> purpose. This all sounds nasty :-).
>>>>>>>>>>> I'm not sure how many users actually type data(srdta), but I
>>>>>>>>>>> see your point.
>>>>>>>>>>>
>>>>>>>>>>>> Making the data at package installation time is also a
>>>>>>>>>>>> bad idea, as Yurii noted below. Actually, it is impossible
>>>>>>>>>>>> because making the GenABEL data requires GenABEL
>>>>>>>>>>>> functions, which are not available at installation time
>>>>>>>>>>>> (they are available only after GenABEL is installed).
>>>>>>>>>>> Good point!
>>>>>>>>>>>
>>>>>>>>>>>> I see only one good solution now: move all the GenABEL data
>>>>>>>>>>>> to a new package, e.g. GenABELdata, as was proposed by the
>>>>>>>>>>>> CRAN people from the beginning. In this case, it is possible
>>>>>>>>>>>> to generate RData during installation time using GenABEL
>>>>>>>>>>>> functions (which are installed by that time). I think this
>>>>>>>>>>>> solution is platform independent because R rules permit
>>>>>>>>>>>> running *.R scripts to generate data during installation
>>>>>>>>>>>> time.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think about making a data package for GenABEL?
>>>>>>>>>>>> Do you think the name GenABELdata is OK? Maybe we can move
>>>>>>>>>>>> all the *ABEL data into the DatABEL package instead of making
>>>>>>>>>>>> *ABELdata data packages?
>>>>>>>>>>> Sounds like this is the best solution. Thanks for digging
>>>>>>>>>>> into this. As for the package name, either GenABELdata or
>>>>>>>>>>> GenABEL.data sounds fine with me (the latter being a bit
>>>>>>>>>>> clearer in my opinion).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Lennart
>>>>>>>>>>>
>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/11/2013 18:54, Yury Aulchenko wrote:
>>>>>>>>>>>>> On Nov 15, 2013, at 17:21 PM, L.C. Karssen
>>>>>>>>>>>>> <lennart at karssen.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Maksim,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 14-11-13 22:38, Maksim Struchalin wrote:
>>>>>>>>>>>>>>> In this email, I propose a new approach which reduces the
>>>>>>>>>>>>>>> total size of the data from 8Mb to 2Mb, which in turn
>>>>>>>>>>>>>>> reduces the entire GenABEL size from 12Mb to 6Mb.
>>>>>>>>>>>>>> I guess you mean B (bytes) instead of b (bits) here
>>>>>>>>>>>>>> :-).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "R CMD check --as-cran" reports that the following
>>>>>>>>>>>>>>> sub-directories are too big: data (2.3Mb),
>>>>>>>>>>>>>>> exdata (5.7Mb) and libs (2.6Mb). After the last
>>>>>>>>>>>>>>> GenABEL submission to CRAN, the maintainers suggested
>>>>>>>>>>>>>>> creating a new package called GenABELdata and moving
>>>>>>>>>>>>>>> all the data there. I went through the data and found
>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) The "exdata" directory can be compressed with gzip
>>>>>>>>>>>>>>> and reduced from 5.8Mb -> 1.1Mb.
>>>>>>>>>>>>>>> - There is a function gunzip() in the R.utils package
>>>>>>>>>>>>>>>   which can decompress the files. It works on any OS.
>>>>>>>>>>>>>>> - Moreover: the native R function read.table() can read
>>>>>>>>>>>>>>>   gzip files without decompression.
>>>>>>>>>>>>>>> - Even more: it looks like the biggest file,
>>>>>>>>>>>>>>>   "srgenos.dat", was used only once, a long time ago,
>>>>>>>>>>>>>>>   for generating "srdta.RData", and now it is just
>>>>>>>>>>>>>>>   sitting there and eating space needlessly.
>>>>>>>>>>>>>> Sounds like a waste of space!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) We can delete some files from the "data"
>>>>>>>>>>>>>>> directory. The deleted files will be generated on the
>>>>>>>>>>>>>>> user's computer from the files in exdata. This can
>>>>>>>>>>>>>>> be done during INSTALLATION (a line in Makefile?) or
>>>>>>>>>>>>>>> on the first load (run function .onAttach()
>>>>>>>>>>>>>>> in R/zzz.R).
>>>>>>>>>>>>>> This sounds like a perfectly acceptable option.
>>>>>>>>>>>>> I suggest this is done in the "example" which makes use of
>>>>>>>>>>>>> this data, NOT in the INSTALL etc. - we should make
>>>>>>>>>>>>> things as "robust" as possible and interfere as little as
>>>>>>>>>>>>> possible with the usual workflow (which is very much
>>>>>>>>>>>>> system-specific, in that we will need to test on all
>>>>>>>>>>>>> platforms)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It will reduce the total size of the "data" directory
>>>>>>>>>>>>>>> from 2.3Mb to 800Kb.
>>>>>>>>>>>>>> Fantastic! If no one has other objections I say: go
>>>>>>>>>>>>>> ahead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lennart.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any objections/suggestions?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> best, Maksim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> genabel-devel mailing list
>>>>>>>>>>>>>>> genabel-devel at lists.r-forge.r-project.org
>>>>>>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>>>>>>> L.C. Karssen
>>>>>>>>>>>>>> Utrecht The Netherlands
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> lennart at karssen.org http://blog.karssen.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please do not send me Word or PowerPoint files! See
>>>>>>>>>>>>>> http://www.gnu.org/philosophy/no-word-attachments.nl.html
>>>>>>>>>>>>>> ------------------------------------------------------------------
