[GenABEL-dev] DatABEL
L.C. Karssen
lennart at karssen.org
Fri May 8 12:20:40 CEST 2015
Hi Benjamin,
Sorry for the late reply. We had a couple of national holidays here.
On 04-05-15 11:52, Benjamin Hofner wrote:
> Dear Lennart,
>
> Thank you very much for your reply.
>
> We will start copying all the saved files when transferring the data.
> As we are using databel objects in more complex objects we will have to
> think about options to do this (semi-)automatically.
OK. It would be nice if you could let us know how you solved this.
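A DatABEL object only holds a reference to its .fvi/.fvd backing files (this is discussed further down in this thread). Saving the object therefore stores that reference, not the data. The minimal sketch below shows the idea in Python rather than R, so it is an illustration only. `FileBackedMatrix`, `write_matrix` and `relocate()` are hypothetical names. `pickle` stands in for R's `save()`/`load()`. The column-major layout of doubles is an assumption and is not the actual filevector format. `relocate()` plays the role of the proposed `backingfilename<-` setter, which DatABEL does not appear to provide, as far as this thread shows.

```python
import os
import pickle
import shutil
import struct
import tempfile


class FileBackedMatrix:
    """Hypothetical stand-in for a file-backed matrix (not the DatABEL API).

    Only the backing-file path and the dimensions live in memory; values are
    read from disk on demand, so pickling the object stores just the path.
    """

    def __init__(self, path, nrow, ncol):
        self.path = path
        self.nrow = nrow
        self.ncol = ncol

    def column(self, j):
        # Assumed layout: column-major, little-endian 8-byte doubles.
        with open(self.path, "rb") as fh:
            fh.seek(8 * j * self.nrow)
            return list(struct.unpack(f"<{self.nrow}d", fh.read(8 * self.nrow)))

    def relocate(self, new_path):
        # Hypothetical analogue of the proposed `backingfilename<-` setter.
        self.path = new_path


def write_matrix(path, columns):
    # Write each column's values one after another (column-major order).
    with open(path, "wb") as fh:
        for col in columns:
            fh.write(struct.pack(f"<{len(col)}d", *col))


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "geno.bin")
write_matrix(data_path, [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0]])

m = FileBackedMatrix(data_path, nrow=3, ncol=2)
blob = pickle.dumps(m)  # like save(): only the path and dimensions are stored

# Move the backing file, as happens when copying to another machine.
moved_path = os.path.join(workdir, "moved", "geno.bin")
os.makedirs(os.path.dirname(moved_path))
shutil.move(data_path, moved_path)

restored = pickle.loads(blob)
try:
    restored.column(0)
except FileNotFoundError:
    print("backing file not found at the saved path")

restored.relocate(moved_path)
print(restored.column(1))  # [1.0, 1.0, 0.0]
```

Only the path is serialized. The restored object therefore works on another machine only if the backing files sit at the same path, or if the path is updated after loading.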
>
> As a short follow-up regarding parallelization: we are not estimating
> individual SNP effects in parallel, but rather multiple models that each
> combine many SNPs. So the option you raised might not be suitable. But
> again, we will need to think about that. Perhaps it will help to look at
> the filevector documentation. Is it possible that you wanted to include a
> reference [1]?
Sorry about that. Here are the links to the filevector source code on our
SVN server:
- stable versions:
https://r-forge.r-project.org/scm/viewvc.php/tags/filevector/?root=genabel
- development version:
https://r-forge.r-project.org/scm/viewvc.php/pkg/filevector/?root=genabel
The fvfutil directory contains the regular (non-R) tools.
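The thread does not establish why forked processes lose access to the DatABEL data. One general pattern for file-backed data under fork-based parallelism is to pass each worker the backing-file path and let the worker open its own handle. Without that, forked children inherit the parent's file descriptors, and those share a single file offset, so concurrent seek-and-read calls can interfere. The Python sketch below stands in for R's mclapply, which also forks. `fit_model`, `run_models`, the toy "model" that sums SNP columns, and the column-major file layout are all hypothetical, and the sketch has not been tested against DatABEL or filevector.

```python
import multiprocessing as mp
import os
import struct
import tempfile

NROW = 4  # number of individuals in the toy data (hypothetical)


def write_columns(path, columns):
    # Assumed layout: column-major little-endian doubles (not the .fvd format).
    with open(path, "wb") as fh:
        for col in columns:
            fh.write(struct.pack(f"<{len(col)}d", *col))


def read_column(fh, j, nrow):
    fh.seek(8 * j * nrow)
    return struct.unpack(f"<{nrow}d", fh.read(8 * nrow))


def fit_model(path, snp_set, nrow, queue):
    # Each worker opens its OWN handle from the path after the fork, instead
    # of reusing a handle (or pointer) created in the parent process.
    with open(path, "rb") as fh:
        cols = [read_column(fh, j, nrow) for j in snp_set]
    # Toy "model" combining several SNPs: per-individual sum of the columns.
    score = [sum(vals) for vals in zip(*cols)]
    queue.put((tuple(snp_set), score))


def run_models(path, models, nrow):
    ctx = mp.get_context("fork")  # fork, like mclapply in R
    queue = ctx.Queue()
    procs = [ctx.Process(target=fit_model, args=(path, m, nrow, queue))
             for m in models]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


path = os.path.join(tempfile.mkdtemp(), "geno.bin")
write_columns(path, [[0.0, 1.0, 2.0, 1.0],
                     [1.0, 1.0, 0.0, 0.0],
                     [2.0, 0.0, 1.0, 1.0]])
results = run_models(path, [[0, 1], [1, 2]], NROW)
print(results[(0, 1)])  # [1.0, 2.0, 2.0, 1.0]
```

In this sketch only the path, a plain string, crosses the process boundary, so no open handle is shared between parent and children.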
Best,
Lennart.
>
> Thanks for your help,
> Benjamin
>
> Am 01.05.2015 um 16:57 schrieb L.C. Karssen:
>> Hi Benjamin,
>>
>> Thanks for your interest in DatABEL. Because most of DatABEL was
>> developed before I took over maintenance of the package, I have put our
>> development mailing list in CC, just in case one of the other developers
>> wants to chime in.
>>
>> On 28-04-15 14:29, Benjamin Hofner wrote:
>>> Hi Lennart,
>>>
>>> we are currently trying to use your package DatABEL to store the data
>>> for complex GWAS analyses. We are not using the standard tool sets
>>> implemented in GenABEL and related packages, but are trying to
>>> implement a novel method ourselves. Currently, we are facing several
>>> problems, which are most likely related to the fact that you store the
>>> data on the HDD and use pointers (?) to access the data.
>>
>> The DatABEL package is basically an R interface to a lower-level library
>> written in C++, which we call filevector [1]. Maybe it's worth looking
>> at that as well. In the source code repo at [1] you will also find a few
>> utilities written in C++ to convert text files to and from fvi/fvd files.
>>
>> When you create a DatABEL object in R it is indeed basically a pointer
>> to the data in the backing file. The .fvi file contains index data which
>> is then used to quickly read the actual data from the .fvd file.
>>
>>>
>>> 1) How can one store and share databel objects? I.e., is it possible to
>>> store a databel object using save("objectname", file = "data.Rda")? On
>>> one system it works fine.
>>
>> So you say you basically create a DatABEL object using databel() and
>> then want to save that object. Interesting, I never tried that.
>>
>>> It seems to be transferable if one moves the
>>> Rda file together with the fvd and fvi files (and does not rename them).
>>
>> Yes, that's what I expect. Because of the large amount of data, the
>> actual data is not copied from the .fv{i,d} files into the object (and
>> therefore not into your .Rda file) when the object is created. As you
>> surmised, the object is only a pointer to the data (with some associated
>> information like the buffer size).
>>
>>> Couldn't one include these files in the Rda file and/or allow the path
>>> to be altered via
>>>
>>> backingfilename(objectname) <- "newpath/filename"
>>>
>>> 2) We are trying to use multicore techniques (i.e., mclapply) to speed
>>> up computations.
>>
>> If I understand it correctly, you would like to share a (saved) DatABEL
>> object among several processes where each process works on a subset of
>> the data in that object. Is that correct?
>>
>> My first reaction is to say that (imputed) genetic data is usually
>> already split into several hundred files (assuming 1kG imputed data), so
>> you could simply use those for data parallelism.
>> But I can see that parallel access to a subset of a DatABEL object has
>> its use.
>>
>>> However, this does not work, as the forked processes seem
>>> to have lost the pointer to the databel file. Sequentially, i.e., using
>>> lapply, everything works fine. Do you have any experience with this?
>>
>> Unfortunately not.
>>
>>
>> Best regards,
>>
>> Lennart.
>>
>>
>>> Can you
>>> provide any help? If necessary, I can try to provide a minimal example
>>> that reproduces this problem/error.
>>>
>>> Best regards,
>>> Benjamin
>>
>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands
lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
More information about the genabel-devel mailing list