[adegenet-forum] dataset too large? Follow-up

Mon Jul 4 18:33:05 CEST 2011

Dear Thomas,

The algorithm for translating your data into individual frequencies is not linear, so the run time for the full dataset cannot be extrapolated from a subset. RAM saturation is likely to cause additional delays in any case, and Windows tends to freeze or crash applications in such situations ("R has stopped working... send a report"). How much memory do you have on your computer? In any case, I would recommend running it overnight to check whether it simply takes a very long time but does finish.
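
A rough way to gauge the RAM involved is to estimate the size of the individuals-by-alleles table that the conversion has to build. The sketch below is only a back-of-the-envelope calculation: the 25000 individuals and 96 loci come from the original post, while `alleles_per_locus` is a hypothetical value that the thread does not give, and the table is assumed to be stored as doubles.

```r
# Back-of-the-envelope size of an individuals x alleles numeric table.
n_ind  <- 25000           # from the thread: >25000 individuals
n_loci <- 96              # from the thread: 96 SSR loci
alleles_per_locus <- 10   # ASSUMPTION: hypothetical average, not given in the thread
bytes_per_cell <- 8       # a double in R takes 8 bytes

n_cells  <- n_ind * n_loci * alleles_per_locus
size_mib <- n_cells * bytes_per_cell / 1024^2
cat(sprintf("about %.0f MiB for one copy of the table\n", size_mib))
```

Intermediate copies made during the conversion can multiply this figure several times, so peak memory use may be well above the size of the final object.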

We are looking at a big dataset, but it is merely 2-3 times bigger than eHGDP, which was not such a pain to obtain.

As for multicore, the package is not available for Windows, unfortunately.

Importing your data from STRUCTURE won't help; it will actually take longer and need more RAM.

On the bright side, once you have your data imported, the analysis should be slightly less time-consuming.

Best

Thibaut

________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Thomas, Evert (Bioversity-Colombia) [E.Thomas at CGIAR.ORG]
Sent: 04 July 2011 16:18
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] dataset too large? Follow-up

Dear all,

The problem does not seem to be related to my commands, since I do get results for subsets of my data (1000 individuals take 40 seconds). However, it does not seem to work for my entire dataset of >25000 individuals: scaling linearly, it should take about 16.7 minutes, but after 4 hours there is still no result. Any suggestions?
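
The extrapolation above can be written out as a small calculation. This is only a sketch: the subset size, timing and full dataset size are the figures reported in this message, whereas the quadratic case is a purely illustrative assumption, since the thread does not say how the conversion actually scales.

```r
# Extrapolate run time from a timed subset under two scaling assumptions.
n_small <- 1000    # subset size reported in the thread
t_small <- 40      # seconds reported for that subset
n_full  <- 25000   # full dataset size reported in the thread

ratio       <- n_full / n_small
t_linear    <- t_small * ratio     # time proportional to n
t_quadratic <- t_small * ratio^2   # ASSUMPTION: time proportional to n^2 (illustrative only)

cat(sprintf("linear:    %.1f minutes\n", t_linear / 60))
cat(sprintf("quadratic: %.1f hours\n", t_quadratic / 3600))
```

If the conversion scales worse than linearly, or if the machine starts swapping once RAM is full, the linear estimate is only a lower bound.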

Many thanks in advance

Evert

________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [mailto:adegenet-forum-bounces at r-forge.wu-wien.ac.at] On Behalf Of Thomas, Evert (Bioversity-Colombia)
Sent: Friday, July 01, 2011 1:56 PM
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] dataset too large?

Dear colleagues,

I am new to R so apologies for my ignorance, but I have a couple of questions:

I am trying to use adegenet (on a 64-bit Windows 7 system) to analyze an SSR dataset. It consists of 96 loci and I have >25000 individuals (after resampling). I have loaded the database as a data frame in R, but I am not able to convert it to genind format (the PC's physical memory becomes saturated, while only 10% of the CPU is used). Could this be related to the size of my dataset? Any suggestions?

On another note, I also tried importing my data into a genind object from the corresponding file in STRUCTURE format. However, my version of STRUCTURE (2.3.3) does not seem to generate .stru or .str files. Is there any solution for that?

And a last point: I am unable to install or load the R package multicore because it is not in the list of packages.

This is what I have done:

I ran read.csv with header=T, and then rownames<-cacaoCSV[,1]

The problem occurs with the following command:
cacao<-df2genind(cacaoCSV, sep="/",ind.names=NULL, loc.names=NULL, pop=cacaoCSV[,2], missing=NA, ploidy=2, type="codom")
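
Two details of the steps above may be worth checking. In R, `rownames<-cacaoCSV[,1]` creates a new object called `rownames` and does not set the row names of the data frame. In addition, the call passes the whole data frame, apparently including the name and population columns, as genotype data. The sketch below uses a small hypothetical data frame with the layout described in this message (column 1 = individual names, column 2 = population, remaining columns = loci) and passes only the locus columns. Whether this changes the memory behaviour on the full dataset is not established here.

```r
# Hypothetical toy data with the layout described in the post:
# column 1 = individual names, column 2 = population, other columns = loci
# coded as "allele/allele".
cacaoCSV <- data.frame(
  id   = c("ind1", "ind2", "ind3"),
  pop  = c("A", "A", "B"),
  loc1 = c("101/103", "101/101", "103/105"),
  loc2 = c("200/200", "200/204", "204/204"),
  stringsAsFactors = FALSE
)

loci <- cacaoCSV[, -(1:2)]        # keep only the genotype columns
rownames(loci) <- cacaoCSV[, 1]   # set row names on the data frame itself

# Only run the conversion if adegenet is installed.
if (requireNamespace("adegenet", quietly = TRUE)) {
  cacao <- adegenet::df2genind(loci, sep = "/", ind.names = cacaoCSV[, 1],
                               pop = cacaoCSV[, 2], ploidy = 2, type = "codom")
  print(cacao)
}
```

Only arguments that appear in the original command are used; the `missing` argument is left at its default.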

Many thanks in advance for any advice or suggestion you might have!

Enjoy the weekend
Evert Thomas, PhD
Associate Expert, Conservation and Use of
Forest Genetic Resources in Latin America

Bioversity International
Regional Office for the Americas
Recta Cali-Palmira Km 17 – CIAT
Cali, Colombia
P.O. Box 6713

Tel. 57 2 4450048 / 49 Ext 113
Fax 57 2 4450096
Email: e.thomas at cgiar.org
Skype: evertthomas
www.bioversityinternational.org
