[adegenet-forum] dataset too large? Follow-up

valeria montano mirainoshojo at gmail.com
Wed Jul 6 19:53:49 CEST 2011


What about the rbind function? I saw that it works for matrices and data frames;
might it be adapted to merge genind objects? OK, maybe not...
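
To make the rbind question concrete, here is a minimal base-R sketch on toy
allele-count matrices, not on genind objects. The helper name rbind_alleles and
the table contents are made up for illustration, and this is not how repool is
implemented. It only shows why a plain rbind is not enough when subsets carry
different alleles: the columns have to be aligned first.

```r
# Toy allele-count tables (rows = individuals, columns = "locus.allele").
# Names and values are invented for illustration only.
tab1 <- matrix(c(2, 0,
                 1, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"), c("L1.100", "L1.102")))
tab2 <- matrix(c(1, 1,
                 0, 2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind3", "ind4"), c("L1.102", "L1.104")))

# Plain rbind() on matrices binds by position and ignores column names,
# so rbind(tab1, tab2) would stack L1.102 counts under L1.100.

# Hypothetical helper: align every table on the union of the column names,
# filling alleles that are absent from a subset with 0, then row-bind.
rbind_alleles <- function(...) {
  tabs <- list(...)
  all_cols <- sort(unique(unlist(lapply(tabs, colnames))))
  aligned <- lapply(tabs, function(x) {
    out <- matrix(0, nrow = nrow(x), ncol = length(all_cols),
                  dimnames = list(rownames(x), all_cols))
    out[, colnames(x)] <- x
    out
  })
  do.call(rbind, aligned)
}

merged <- rbind_alleles(tab1, tab2)
print(merged)
```

A genind object also carries more than its table (for instance the population
factor), so a real merge would still have to go through something like repool,
as suggested in the quoted messages.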

On 6 July 2011 19:25, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:

>  Hello,
>
> I thought about it too initially, but unfortunately df2genind is called by
> repool, and I'm afraid this is where the function gets stuck...
>
> May be worth a try, though.
>
> Cheers
>
> Thibaut
>
> *From:* Sébastien Puechmaille [s.puechmaille at gmail.com]
> *Sent:* 06 July 2011 17:57
> *To:* Thomas, Evert (Bioversity-Colombia)
> *Cc:* Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
> *Subject:* Re: [adegenet-forum] dataset too large? Follow-up
>
>  Dear Thomas,
>
> I'm not sure whether this would work, but it might be worth trying:
> 1- split your data set into many subsets (e.g. 25 subsets with 1,000
> individuals each),
> 2- load them as 25 different genind objects,
> 3- merge the 25 genind objects into a single genind object to recover the
> original data (function 'repool'; the markers have to be the same for all
> objects to be merged, but there is no constraint on alleles).
>
> Cheers,
>
> Sebastien.
>
> *********************
> Dr. Sébastien Puechmaille
> Max Planck Institute for Ornithology
> Sensory Ecology Group
> Eberhard-Gwinner-Straße
> Haus Nr. 11
> 82319 Seewiesen
> Germany
>
> and
>
> UCD School of Biological and Environmental Sciences
> University College Dublin (Zoology)
> UCD Science and Education Research Center (West)
> Belfield
> Dublin 4
> Ireland
>
> http://batlab.ucd.ie/~spuechmaille/
>
> http://www.ucd.ie/research/people/biologyenvscience/drsebastienpuechmaille/home/
> *********************
>
> On 6 July 2011 13:44, Thomas, Evert (Bioversity-Colombia) <
> E.Thomas at cgiar.org> wrote:
>
>>  Dear Thibaut,
>>
>> Thanks for this. I have now tried running it overnight several times, but each
>> time I get the message:
>>
>>
>> I am running Windows 7 on a 64-bit system with 4x 2.4 GHz and 4 GB RAM, so I
>> don’t think the problem is related to my PC?
>>
>> Many thanks for any suggestions you might have…
>>
>> Cheers, Evert
>>
>> (PS: when reading in my CSV I use “stringsAsFactors=F”, so that my marker
>> data are read in as characters. Could that be the problem?)
>>
>> *From:* Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
>> *Sent:* Monday, July 04, 2011 11:33 AM
>> *To:* Thomas, Evert (Bioversity-Colombia);
>> adegenet-forum at r-forge.wu-wien.ac.at
>> *Subject:* RE: [adegenet-forum] dataset too large? Follow-up
>>
>>
>> Dear Thomas,
>>
>> The algorithm for translating your data into individual frequencies does not
>> scale linearly with dataset size. RAM saturation is likely to cause additional
>> delays in any case, and Windows tends to make applications freeze or crash in
>> such cases ("R has stopped working...send a report"). How much memory do you
>> have on your computer? In any case, I would recommend running it overnight to
>> check whether it simply takes ages but does work.
>>
>> We are looking at a big dataset, but it is merely 2-3 times bigger than
>> eHGDP, which was not such a pain to obtain.
>>
>> As for multicore, the package is not available for Windows, unfortunately.
>>
>> Importing your data from STRUCTURE won't help; it will actually take longer
>> and be more RAM-demanding.
>>
>> On the bright side, once you have your data imported, analysis should
>> be slightly less time-consuming.
>>
>> Best
>>
>> Thibaut
>>
>>   ------------------------------
>>
>> *From:* adegenet-forum-bounces at r-forge.wu-wien.ac.at [
>> adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Thomas, Evert
>> (Bioversity-Colombia) [E.Thomas at CGIAR.ORG]
>> *Sent:* 04 July 2011 16:18
>> *To:* adegenet-forum at r-forge.wu-wien.ac.at
>> *Subject:* Re: [adegenet-forum] dataset too large? Follow-up
>>
>> Dear,
>>
>> The problem does not seem to be related to my commands, since I do get
>> results for subsets of my data (1000 individuals take 40 seconds), but it
>> does not seem to work for my entire dataset of >25000 individuals (should
>> theoretically take 16.6 minutes if run time scaled linearly, but after 4
>> hours there is still no result). Any suggestions?
>>
>> Many thanks in advance,
>>
>> Evert
>>
>> *From:* adegenet-forum-bounces at r-forge.wu-wien.ac.at
>> [mailto:adegenet-forum-bounces at r-forge.wu-wien.ac.at] *On Behalf Of* Thomas,
>> Evert (Bioversity-Colombia)
>> *Sent:* Friday, July 01, 2011 1:56 PM
>> *To:* adegenet-forum at r-forge.wu-wien.ac.at
>> *Subject:* [adegenet-forum] dataset too large?
>>
>>
>> Dear colleagues,
>>
>> I am new to R, so apologies for my ignorance, but I have a couple of
>> questions:
>>
>> I am trying to use adegenet (on a 64-bit system, Windows 7) to analyze an
>> SSR dataset. It consists of 96 loci and I have >25000 individuals (after
>> resampling). I have loaded the database as a data frame in R, but am not able
>> to convert it to genind format (the PC's physical memory becomes saturated,
>> while only 10% of the CPU is used). Could this be related to the size of my
>> dataset? Any suggestions?
>>
>> On another note: alternatively, I tried importing my data into a genind object
>> from the corresponding file in Structure format. However, my version of
>> Structure (2.3.3) does not seem to generate .stru or .str files. Is there any
>> solution for that?
>>
>> And a last point: I am unable to install/load the R package multicore
>> because it is not in the package list…
>>
>> This is what I have done:
>>
>> I did a read.csv with “header=T”, and then rownames<-cacaoCSV[,1]
>>
>> The problem occurs with the following command:
>>
>> cacao<-df2genind(cacaoCSV, sep="/",ind.names=NULL, loc.names=NULL,
>> pop=cacaoCSV[,2], missing=NA, ploidy=2, type="codom")
>>
>> Many thanks in advance for any advice or suggestion you might have!
>>
>> Enjoy the weekend
>>
>> Evert Thomas, *PhD*
>> *Associate Expert, Conservation and Use of*
>> *Forest Genetic Resources in Latin America*
>>
>> *Bioversity International*
>> Regional Office for the Americas
>> Recta Cali-Palmira Km 17 – CIAT
>> Cali, Colombia
>> P.O. Box 6713
>>
>> *Tel*. 57 2 4450048 / 49 Ext 113
>> *Fax* 57 2 4450096
>> *Email*: e.thomas at cgiar.org
>> *Skype*: evertthomas
>> *www.bioversityinternational.org*
>>
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>
>>
>
>
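
On the timing estimate in the quoted messages (40 seconds for 1000 individuals,
hence 16.6 minutes for 25000): that figure assumes run time grows linearly with
the number of individuals. The short R sketch below redoes the arithmetic and,
purely as an assumed illustration, shows what quadratic growth would give
instead. The thread only says the algorithm is not linear, so the quadratic
exponent is a guess, not a measured property of df2genind.

```r
# Figures taken from the thread.
n_sub  <- 1000    # individuals in the test subset
t_sub  <- 40      # seconds needed for the subset
n_full <- 25000   # individuals in the full dataset

ratio <- n_full / n_sub   # the full dataset is 25 times larger

# Linear scaling: time grows in proportion to the number of individuals.
linear_min <- t_sub * ratio / 60

# Quadratic scaling: ASSUMED for illustration only.
quadratic_h <- t_sub * ratio^2 / 3600

cat("linear estimate:   ", round(linear_min, 2), "minutes\n")   # 16.67 minutes
cat("quadratic estimate:", round(quadratic_h, 2), "hours\n")    # 6.94 hours
```

Under that assumption, four hours without a result would not by itself mean the
conversion has failed, which fits the advice to let it run overnight.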
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 21954 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20110706/77af6ffe/attachment-0001.png>

