[adegenet-forum] dataset too large? Follow-up

Thomas, Evert (Bioversity-Colombia) E.Thomas at CGIAR.ORG
Mon Jul 11 20:38:38 CEST 2011


Dear all,

 

My colleague Johannes (CC) may have found a solution to the problem, using the "ldply" function from the "plyr" package. With the attached change to the script, I am able to read in my data in less than 10 minutes!
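The attached df2genind2.R was scrubbed from the archive, so the actual change is not visible here. Purely as an illustration of what ldply does (apply a function to each element of a list and row-bind the results into a single data frame), a minimal sketch on made-up genotype strings could look like the following. The helper split_genotype and the toy data are hypothetical and are not taken from the attachment; the base-R branch is only a fallback for when plyr is not installed, and it omits the ".id" column that ldply adds.

```r
# Toy genotypes: one "allele1/allele2" string per individual at a single locus.
genotypes <- list(ind1 = "101/103", ind2 = "101/101", ind3 = "103/105")

# Hypothetical helper: split one genotype string into a one-row data frame.
split_genotype <- function(g) {
  alleles <- strsplit(g, "/", fixed = TRUE)[[1]]
  data.frame(allele1 = alleles[1], allele2 = alleles[2],
             stringsAsFactors = FALSE)
}

if (requireNamespace("plyr", quietly = TRUE)) {
  # ldply: list in, data frame out (one row-bound piece per list element).
  allele_table <- plyr::ldply(genotypes, split_genotype)
} else {
  # Base-R fallback doing the same row-binding without plyr.
  allele_table <- do.call(rbind, lapply(genotypes, split_genotype))
}

print(allele_table)
```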

 

Kind regards Evert

 

 

From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [mailto:adegenet-forum-bounces at r-forge.wu-wien.ac.at] On Behalf Of Thomas, Evert (Bioversity-Colombia)
Sent: Thursday, July 07, 2011 1:22 PM
To: Jombart, Thibaut; valeria montano
Cc: adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] dataset too large? Follow-up

 

Hi,

 

Splitting the data into parts of 1000 rows and converting them works fine. Merging them works up to 8k-10k rows; trying to merge blocks of that size causes the error...

We'll keep trying...

 

evert

 

From: Thomas, Evert (Bioversity-Colombia) 
Sent: Wednesday, July 06, 2011 1:17 PM
To: 'Jombart, Thibaut'; valeria montano
Cc: Sébastien Puechmaille; adegenet-forum at r-forge.wu-wien.ac.at
Subject: RE: [adegenet-forum] dataset too large? Follow-up

 

Dear all,

 

Thanks for the suggestions. Running it on our powerful office server does not seem to be the solution either: it ate all the memory there as well. Nor does it seem to be related to my data, because I am able to use GenAlEx in Excel to perform the analyses in one go (although I had some problems with alleles that were formatted as text instead of numbers). We will give it a last try with Sébastien's suggestion...

 

Thanks again Evert

 

 

From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk] 
Sent: Wednesday, July 06, 2011 1:05 PM
To: valeria montano
Cc: Sébastien Puechmaille; Thomas, Evert (Bioversity-Colombia); adegenet-forum at r-forge.wu-wien.ac.at
Subject: RE: [adegenet-forum] dataset too large? Follow-up

 

 

Hello, 

for this to work, loci and alleles should be exactly the same for all genind objects, which is rarely the case, and won't likely be the case here.

I'm afraid a bigger computer is needed until the function can be optimized.

Cheers

Thibaut

________________________________

From: valeria montano [mirainoshojo at gmail.com]
Sent: 06 July 2011 18:53
To: Jombart, Thibaut
Cc: Sébastien Puechmaille; Thomas, Evert (Bioversity-Colombia); adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] dataset too large? Follow-up

What about the rbind function? I saw that it works for matrices and data frames; might it be adapted to merge genind objects? OK, maybe not...

On 6 July 2011 19:25, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:

Hello, 

I thought about it too initially, but unfortunately df2genind is called by repool, and I'm afraid this is where the function gets stuck...

May be worth a try, though.

Cheers

Thibaut

From: Sébastien Puechmaille [s.puechmaille at gmail.com]
Sent: 06 July 2011 17:57
To: Thomas, Evert (Bioversity-Colombia)
Cc: Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] dataset too large? Follow-up

 

Dear Thomas,

I'm not sure if that would work but it might be worth trying:
1- split your data set into many subsets (e.g. 25 subsets with 1,000 individuals each),
2- load them as 25 different genind objects,
3- merge the 25 genind objects into a single genind object, so that the original data end up in one genind object (function 'repool'; the markers have to be the same for all objects to be merged, but there is no constraint on alleles)
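The three steps above can be sketched as follows. This is only a minimal sketch on a made-up six-individual table, with a chunk size of 2 instead of 1,000; the object names (geno, chunks, parts, merged) are hypothetical. The adegenet part runs only if the package is installed, and nothing here shows whether the merge scales to the full dataset.

```r
# Toy genotype table: rows are individuals, columns are loci ("allele/allele").
geno <- data.frame(
  loc1 = c("101/103", "101/101", "103/105", "101/105", "103/103", "105/105"),
  loc2 = c("200/202", "202/202", "200/200", "200/204", "204/204", "202/204"),
  row.names = paste0("ind", 1:6),
  stringsAsFactors = FALSE
)

chunk_size <- 2  # the thread uses 1000; 2 keeps the toy example small

# Step 1: assign each row to a consecutive block and split by block.
block_id <- ceiling(seq_len(nrow(geno)) / chunk_size)
chunks <- split(geno, block_id)

if (requireNamespace("adegenet", quietly = TRUE)) {
  # Step 2: convert each chunk to its own genind object.
  parts <- lapply(chunks, adegenet::df2genind, sep = "/", ploidy = 2)
  # Step 3: merge the genind objects into a single one.
  merged <- adegenet::repool(parts)
  print(merged)
}
```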

Cheers,

Sebastien.

*********************
Dr. Sébastien Puechmaille
Max Planck Institute for Ornithology
Sensory Ecology Group
Eberhard-Gwinner-Straße
Haus Nr. 11
82319 Seewiesen
Germany 

and

UCD School of Biological and Environmental Sciences
University College Dublin (Zoology)
UCD Science and Education Research Center (West)
Belfield
Dublin 4
Ireland

http://batlab.ucd.ie/~spuechmaille/
http://www.ucd.ie/research/people/biologyenvscience/drsebastienpuechmaille/home/
*********************

On 6 July 2011 13:44, Thomas, Evert (Bioversity-Colombia) <E.Thomas at cgiar.org> wrote:

Dear Thibaut,

 

Thanks for this. I have now tried running it overnight several times, but each time I get the message:

 

 

I am running Windows 7 on a 64-bit system with 4 x 2.4 GHz and 4 GB RAM, so I don't think the problem is related to my PC?

Many thanks for any suggestions you might have...

 

Cheers Evert

 

(PS: when reading in my CSV I use "stringsAsFactors=F", so that my marker data are read in as characters. Could that be the problem?)
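For reference, the effect of that argument can be checked on a tiny inline CSV. This is a minimal sketch with made-up data (csv_text is hypothetical); it only shows how the columns are typed and does not establish whether the column type has anything to do with the memory problem. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE.

```r
# Made-up CSV content with an ID column, a population column and two loci.
csv_text <- "id,pop,loc1,loc2
ind1,A,101/103,200/202
ind2,B,101/101,202/202"

# Marker columns read as character strings.
geno_chr <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Same data, marker columns read as factors.
geno_fac <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

print(sapply(geno_chr, class))
print(sapply(geno_fac, class))
```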

From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk] 
Sent: Monday, July 04, 2011 11:33 AM
To: Thomas, Evert (Bioversity-Colombia); adegenet-forum at r-forge.wu-wien.ac.at
Subject: RE: [adegenet-forum] dataset too large? Follow-up

 

Dear Thomas, 

The algorithm for translating your data into individual frequencies is not linear. RAM saturation is likely to cause additional delays in any case, and Windows is prone to freezing or crashing applications in such cases ("R has stopped working...send a report"). How much memory do you have on your computer? In any case, I would recommend running it overnight to check whether it simply takes ages but does work.

We are looking at a big dataset, but it is merely 2-3 times bigger than eHGDP, which was not such a pain to obtain.

As for multicore, the package is not available for Windows, unfortunately.

Importing your data from STRUCTURE won't help; it will actually take longer and be more RAM-demanding.

On the bright side, once you have your data imported, analysis should be slightly less time-consuming.

Best

Thibaut

 

________________________________

From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Thomas, Evert (Bioversity-Colombia) [E.Thomas at CGIAR.ORG]
Sent: 04 July 2011 16:18
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] dataset too large? Follow-up

Dear,

 

The problem does not seem to be related to my commands, since I do get results for subsets of my data (1000 individuals takes 40 seconds), but it does not seem to work for my entire dataset of >25000 individuals (should theoretically take 16.6 minutes, but after 4 hours still no result) ... any suggestions?  
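The 16.6-minute figure is a linear extrapolation from the 1000-individual run. A minimal sketch of that arithmetic is below; the quadratic line is an illustrative assumption only (nothing in this thread states the actual scaling), included to show how quickly a non-linear cost diverges from the linear estimate.

```r
secs_per_1000 <- 40      # measured: 1000 individuals take 40 seconds
n_total       <- 25000   # size of the full dataset
scale_factor  <- n_total / 1000

# Linear scaling: time grows in proportion to the number of individuals.
linear_minutes <- secs_per_1000 * scale_factor / 60

# Hypothetical quadratic scaling: time grows with the square of the size.
quadratic_hours <- secs_per_1000 * scale_factor^2 / 3600

cat("linear estimate:   ", round(linear_minutes, 1), "minutes\n")
cat("quadratic estimate:", round(quadratic_hours, 1), "hours\n")
```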


many thanks in advance

 

evert

From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [mailto:adegenet-forum-bounces at r-forge.wu-wien.ac.at] On Behalf Of Thomas, Evert (Bioversity-Colombia)
Sent: Friday, July 01, 2011 1:56 PM
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] dataset too large?

 

Dear colleagues,

 

I am new to R so apologies for my ignorance, but I have a couple of questions: 

 

I am trying to use adegenet (on a 64-bit system, Windows 7) to analyze an SSR dataset. It consists of 96 loci and I have >25000 individuals (after resampling). I have loaded the database as a data frame in R, but I am not able to convert it to genind format (the PC's physical memory becomes saturated, while only 10% of the CPU is used). Could this be related to the size of my dataset? Any suggestions?

 

On another note: I also tried importing my data into a genind object from the corresponding file in Structure format. However, my version of Structure (2.3.3) does not seem to generate .stru or .str files. Is there a solution for that?

 

And a last point: I am unable to install or load the R package multicore because it is not in the package list...

 

This is what I have done:

 

I did a read.csv with "header=T", and then rownames<-cacaoCSV[,1]

 

The problem occurs with the following command:

cacao<-df2genind(cacaoCSV, sep="/",ind.names=NULL, loc.names=NULL, pop=cacaoCSV[,2], missing=NA, ploidy=2, type="codom")
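One thing that may be worth checking, although nobody in this thread confirms it as the cause: the command passes the whole data frame as the first argument, including columns 1 and 2, while df2genind expects that argument to contain allele data only. Also, "rownames<-cacaoCSV[,1]" creates a variable called rownames rather than setting the row names of the data frame. A minimal sketch of separating the columns first, on a made-up stand-in for cacaoCSV (the names markers and pops are hypothetical), could look like this; the adegenet call runs only if the package is installed.

```r
# Hypothetical stand-in for cacaoCSV: column 1 = individual ID,
# column 2 = population, remaining columns = loci.
cacaoCSV <- data.frame(
  id   = c("ind1", "ind2", "ind3"),
  pop  = c("A", "A", "B"),
  loc1 = c("101/103", "101/101", "103/105"),
  loc2 = c("200/202", "202/202", "200/204"),
  stringsAsFactors = FALSE
)

# Keep only the marker columns and use the IDs as row names.
markers <- cacaoCSV[, -(1:2), drop = FALSE]
rownames(markers) <- cacaoCSV[, 1]
pops <- factor(cacaoCSV[, 2])

if (requireNamespace("adegenet", quietly = TRUE)) {
  # Only allele data go in as the first argument; populations go in via pop.
  cacao <- adegenet::df2genind(markers, sep = "/", pop = pops, ploidy = 2)
  print(cacao)
}
```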

 

 

Many thanks in advance for any advice or suggestion you might have!

 

Enjoy the weekend

Evert Thomas, PhD

Associate Expert, Conservation and Use of 

Forest Genetic Resources in Latin America

 

Bioversity International

Regional Office for the Americas

Recta Cali-Palmira Km 17 - CIAT

Cali, Colombia

P.O. Box 6713

 

Tel. 57 2 4450048 / 49 Ext 113

Fax 57 2 4450096

Email: e.thomas at cgiar.org

Skype: evertthomas

www.bioversityinternational.org

 

 


_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

 



 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 21954 bytes
Desc: image001.png
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20110711/4c70f804/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: df2genind2.R
Type: application/octet-stream
Size: 6358 bytes
Desc: df2genind2.R
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20110711/4c70f804/attachment-0001.obj>

