[adegenet-forum] dataset too large? Follow-up

Sébastien Puechmaille s.puechmaille at gmail.com
Wed Jul 6 18:57:42 CEST 2011


Dear Thomas,

I'm not sure it will work, but it might be worth trying:
1- split your data set into many subsets (e.g. 25 subsets of 1,000
individuals each),
2- load them as 25 separate genind objects,
3- merge the 25 genind objects into a single genind object holding the
original data (function 'repool'; the markers have to be the same for all
objects to be merged, but there is no constraint on alleles).
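
A minimal sketch of these three steps in R, in case it helps. It assumes a
recent adegenet release and a table layout that is not confirmed anywhere in
this thread: column 1 holds individual IDs, column 2 the population, and the
remaining columns the loci coded as "allele/allele". The helper name
chunked_df2genind and the toy data are made up for illustration only.

```r
library(adegenet)

# Convert a large genotype table chunk by chunk, then merge with repool().
# Assumed layout (hypothetical): column 1 = individual ID, column 2 = population,
# remaining columns = loci coded as "allele/allele".
chunked_df2genind <- function(dat, chunk_size = 1000) {
  # Assign each row to a chunk of at most chunk_size individuals
  chunk_id <- ceiling(seq_len(nrow(dat)) / chunk_size)
  parts <- lapply(split(dat, chunk_id), function(d) {
    df2genind(d[, -(1:2), drop = FALSE], sep = "/", ploidy = 2,
              ind.names = as.character(d[[1]]),
              pop = as.factor(d[[2]]), type = "codom")
  })
  # repool() needs the same markers in every object; alleles may differ
  do.call(repool, unname(parts))
}

# Toy data standing in for the real table (invented values)
set.seed(1)
n <- 50
toy <- data.frame(
  id  = paste0("ind", seq_len(n)),
  pop = sample(c("A", "B"), n, replace = TRUE),
  L1  = paste(sample(150:154, n, TRUE), sample(150:154, n, TRUE), sep = "/"),
  L2  = paste(sample(200:203, n, TRUE), sample(200:203, n, TRUE), sep = "/"),
  stringsAsFactors = FALSE
)
toy_gen <- chunked_df2genind(toy, chunk_size = 10)
toy_gen
```

Only the marker columns are passed to df2genind here; the ID and population
columns go in through ind.names and pop. Whether this actually avoids the
memory problem on the full data set is untested.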

Cheers,

Sebastien.

*********************
Dr. Sébastien Puechmaille
Max Planck Institute for Ornithology
Sensory Ecology Group
Eberhard-Gwinner-Straße
Haus Nr. 11
82319 Seewiesen
Germany

and

UCD School of Biological and Environmental Sciences
University College Dublin (Zoology)
UCD Science and Education Research Center (West)
Belfield
Dublin 4
Ireland

http://batlab.ucd.ie/~spuechmaille/
http://www.ucd.ie/research/people/biologyenvscience/drsebastienpuechmaille/home/
*********************

On 6 July 2011 13:44, Thomas, Evert (Bioversity-Colombia) <
E.Thomas at cgiar.org> wrote:

> Dear Thibaut,
>
> Thanks for this. I have now tried running it overnight several times, but
> each time I get the message:
>
>
> I am running Windows 7 on a 64-bit system with 4x 2.4 GHz and 4 GB RAM, so I
> don’t think the problem is related to my PC?
>
> Many thanks for any suggestions you might have…
>
> Cheers Evert
>
> (PS when reading in my CSV I use “stringsAsFactor=F”, so that my marker
> data is read in as characters – could that be the problem?)
>
> *From:* Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
> *Sent:* Monday, July 04, 2011 11:33 AM
> *To:* Thomas, Evert (Bioversity-Colombia);
> adegenet-forum at r-forge.wu-wien.ac.at
> *Subject:* RE: [adegenet-forum] dataset too large? Follow-up
>
> Dear Thomas,
>
> The algorithm for translating your data into individual frequencies is not
> linear. RAM saturation is likely to cause supplementary delays in any case,
> but Windows is good at having applications freeze or crash in such cases
> ("R has stopped working...send a report"). How much memory do you have on
> your computer? In any case I would recommend running overnight to make sure
> it is not simply taking ages but does work.
>
> We are looking at a big dataset, but it is merely 2-3 times bigger than
> eHGDP, which was not such a pain to obtain.
>
> As for multicore, the package is not available for windows, unfortunately.
>
> Importing your data from STRUCTURE won't help, it will actually be longer
> and more RAM-demanding.
>
> On the bright side, once you have your data imported, analysis should be
> slightly less time-consuming.
>
> Best
>
> Thibaut
>
> ------------------------------
>
> *From:* adegenet-forum-bounces at r-forge.wu-wien.ac.at [
> adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Thomas, Evert
> (Bioversity-Colombia) [E.Thomas at CGIAR.ORG]
> *Sent:* 04 July 2011 16:18
> *To:* adegenet-forum at r-forge.wu-wien.ac.at
> *Subject:* Re: [adegenet-forum] dataset too large? Follow-up
>
> Dear,
>
> The problem does not seem to be related to my commands, since I do get
> results for subsets of my data (1000 individuals takes 40 seconds), but it
> does not seem to work for my entire dataset of >25000 individuals (should
> theoretically take 16.6 minutes, but after 4 hours still no result) … any
> suggestions?
>
> many thanks in advance
>
> evert
>
> *From:* adegenet-forum-bounces at r-forge.wu-wien.ac.at [mailto:
> adegenet-forum-bounces at r-forge.wu-wien.ac.at] *On Behalf Of *Thomas, Evert
> (Bioversity-Colombia)
> *Sent:* Friday, July 01, 2011 1:56 PM
> *To:* adegenet-forum at r-forge.wu-wien.ac.at
> *Subject:* [adegenet-forum] dataset too large?
>
> Dear colleagues,
>
> I am new to R so apologies for my ignorance, but I have a couple of
> questions:
>
> I am trying to use adegenet (on a 64-bit system, Windows 7) to analyze an
> SSR dataset. It consists of 96 loci and I have >25000 individuals (after
> resampling). I have loaded the database as a data frame in R, but am not able
> to convert it to genind format (PC physical memory becomes saturated, while
> only 10% of CPU is used). Could this be related to the size of my dataset?
> Any suggestions?
>
> On another note: alternatively, I tried importing my data into a genind
> object from the corresponding file in Structure format. However, my version
> of Structure (2.3.3) does not seem to generate .stru or .str files; any
> solution there?
>
> And a last point: I am unable to install/load the R package multicore
> because it is not in the package list…
>
>
> This is what I have done:
>
> I did a read.csv with “header=T”, and then rownames<-cacaoCSV[,1]
>
> The problem occurs with the following command:
>
> cacao<-df2genind(cacaoCSV, sep="/",ind.names=NULL, loc.names=NULL,
> pop=cacaoCSV[,2], missing=NA, ploidy=2, type="codom")
>
> Many thanks in advance for any advice or suggestion you might have!
>
> Enjoy the weekend
>
> Evert Thomas, *PhD*
>
> *Associate Expert, Conservation and Use of
> Forest Genetic Resources in Latin America*
>
> *Bioversity International*
> Regional Office for the Americas
> Recta Cali-Palmira Km 17 – CIAT
> Cali, Colombia
> P.O. Box 6713
>
> *Tel*. 57 2 4450048 / 49 Ext 113
> *Fax* 57 2 4450096
> *Email*: e.thomas at cgiar.org
> *Skype*: evertthomas
> *www.bioversityinternational.org*
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 21954 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20110706/553d6325/attachment-0001.png>

