[GenABEL-dev] export.plink() uses too much memory in GenABEL v1.7-0 (beta)

Yurii Aulchenko yurii.aulchenko at gmail.com
Tue Jul 3 20:15:26 CEST 2012


Lennart,

I think the point is to always show the warning, so people are aware that
they had better abort the operation and use tped instead.

I would say that rationally we should change defaults as you
suggested, but defaults feel somehow almost sacred to me :)

As there are no other suggestions, I would say that either option is
good, so please do what you feel is better. I will be fine with
either option (anything except leaving things "as is").
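
For a sense of scale, an always-on warning could also report a rough size
estimate. The sketch below is illustrative only: estimated_ped_bytes() and
ped_warning() are hypothetical helpers, not part of GenABEL, and the estimate
assumes single-character alleles, so that each genotype costs about 4 bytes
of text (" A G") in a PLINK ped line.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of GenABEL): rough size of the genotype text
// in a PLINK .ped file, assuming single-character alleles so that every
// genotype is written as " A G" (4 bytes). The six leading pedigree columns
// and the newlines are ignored.
std::uint64_t estimated_ped_bytes(std::uint64_t n_individuals,
                                  std::uint64_t n_snps) {
    const std::uint64_t bytes_per_genotype = 4;
    return n_individuals * n_snps * bytes_per_genotype;
}

// Builds the text of an unconditional warning that includes the estimate.
std::string ped_warning(std::uint64_t n_individuals, std::uint64_t n_snps) {
    const double gib =
        static_cast<double>(estimated_ped_bytes(n_individuals, n_snps)) /
        (1024.0 * 1024.0 * 1024.0);
    return "export in 'ped' format: about " + std::to_string(gib) +
           " GiB of genotype text; consider 'tped' for large data sets";
}
```

With the numbers from the original report (almost 3700 individuals, about
700k SNPs) this comes to roughly 10 GB of genotype text, before any
overhead from how the exporter holds it in memory.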

Yurii
----------------------
Yurii Aulchenko
Independent consultant
(sent from mobile device)

On 3 Jul 2012, at 20:33, "L.C. Karssen" <l.karssen at erasmusmc.nl> wrote:

> That is certainly an option, although then the question is when a
> data set is 'large'. Do we take the user's total RAM into account? Platform
> (on 32-bit Windows a process can by default use only 2 GB of RAM)?
>
> Anyway, before implementing your suggestion I want to do some more
> checks. Recently we exported a not too large dataset (~750 people, ~100k
> SNPs) and that seemed to work ok, but checking the genotypes afterwards
> showed that many homozygous ones (but not all as far as I can tell) were
> incorrectly exported.
> This may simply be the same bug, not leading to a crash this time because
> it was run on a machine with 128 GB RAM and finished before completely
> filling that up. But I'd like to be sure.
>
>
>
> Lennart.
>
> On 07/03/2012 02:23 PM, Yurii Aulchenko wrote:
>> What about keeping the default as before ("ped"), but introducing a message
>> when "ped" is used, warning that with large data sets it is strongly
>> recommended to use the "tped" format?
>>
>> Yurii
>>
>> On Mon, Jul 2, 2012 at 7:59 PM, L.C. Karssen <lennart at karssen.org> wrote:
>>> Dear List,
>>>
>>> Sorry for digging deep into the past, but this issue of export.plink()
>>> still hasn't been resolved. After a recent question by e-mail I opened a
>>> bug report and started a forum thread on the subject:
>>> - https://r-forge.r-project.org/tracker/?func=detail&atid=2058&aid=2055&group_id=505
>>> - http://forum.genabel.org/viewtopic.php?f=6&t=652
>>>
>>> Since solving the issue costs more time than I can presently afford, I
>>> suggest making export.plink(..., transpose=TRUE) the default (instead
>>> of transpose=FALSE).
>>> Moving to a new default will need to be communicated very clearly, but I
>>> think shipping a function that is broken by default is even worse.
>>>
>>> What are your opinions?
>>>
>>>
>>> Best,
>>>
>>> Lennart.
>>>
>>> On 12/07/2011 05:36 PM, L.C. Karssen wrote:
>>>> Dear Yurii,
>>>>
>>>> We just tried svn revision 827 and had the same problem. We killed the
>>>> program at 51% memory usage (32GB). So, unfortunately I think the
>>>> problem is not solved yet.
>>>>
>>>>
>>>> Lennart
>>>>
>>>>
>>>>
>>>> On 07-12-11 11:36, Yury Aulchenko wrote:
>>>>> Ok, should be fixed in r823 just committed.
>>>>>
>>>>> Let me know if the problem persists
>>>>>
>>>>> On Dec 7, 2011, at 11:13 AM, Yury Aulchenko wrote:
>>>>>
>>>>>> I think this is something I introduced in rev. 810-814 (a 'new' without
>>>>>> a matching 'delete'). Now (hopefully) fixed; I will commit the changes
>>>>>> within the next hour.
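
For readers unfamiliar with this class of bug, here is a minimal sketch. The
actual code from rev. 810-814 is not shown in this thread, so
format_genotypes() is a hypothetical function used only to illustrate the
pattern and one common way to avoid it (letting a container own the buffer).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Leaky pattern (shown only as a comment):
//     char* buf = new char[n];   // allocated on every call
//     ...                        // used, but never passed to delete[]
// Called once per SNP or per individual, the lost memory adds up quickly.
//
// RAII alternative: std::vector owns the buffer and frees it automatically
// when it goes out of scope, including on early return or exception.
std::string format_genotypes(const std::vector<char>& alleles) {
    std::vector<char> buf;
    buf.reserve(alleles.size() * 2);
    for (std::size_t i = 0; i < alleles.size(); ++i) {
        if (i > 0) buf.push_back(' ');  // separate alleles with a space
        buf.push_back(alleles[i]);
    }
    return std::string(buf.begin(), buf.end());
}
```
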
>>>>>> -------------------------------------------------------
>>>>>> Yurii Aulchenko, PhD, Dr. Habil.
>>>>>> Independent researcher and consultant
>>>>>> yurii [dot] aulchenko [at] gmail [dot] com
>>>>>>
>>>>>> On Dec 7, 2011, at 11:03 AM, L.C. Karssen wrote:
>>>>>>
>>>>>>> Dear list,
>>>>>>>
>>>>>>> We just tried to convert a GenABEL object to plink format using
>>>>>>> export.plink() from GenABEL v 1.7-0 (still under development,
>>>>>>> package built from SVN yesterday), and it nearly brought the machine
>>>>>>> to a halt because it used all available memory (RAM + swap).
>>>>>>>
>>>>>>> Our GenABEL object contained almost 3700 individuals and about 700k
>>>>>>> SNPs.
>>>>>>>
>>>>>>> Have others experienced this as well? I haven't looked at Yurii's
>>>>>>> latest implementation of the function in C++ yet. Hopefully I will
>>>>>>> be able to find some time later today. Does anyone here know how we
>>>>>>> could limit memory usage in C++?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Lennart.
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
> --
> -----------------------------------------------
> dr. L.C. Karssen
> Erasmus MC
> Department of Epidemiology
> Room Ee2224
>
> Postbus 2040
> 3000 CA Rotterdam
> The Netherlands
>
> phone: +31-10-7044217
> fax: +31-10-7044657
> email: l.karssen at erasmusmc.nl
> GPG key ID: 0E1D39E3
> -----------------------------------------------
>