[Biomod-commits] memory error- again...
Wilfried Thuiller
wilfried.thuiller at ujf-grenoble.fr
Mon Jul 5 17:33:32 CEST 2010
Dear Pep,
> In that huge DF many species are recorded, so I thought of creating a little loop so I could go species by species and therefore would not load a huge dataset (Question 1- Does that make any sense?). In each species model, presences and absences are equal in number, so we keep prevalence constant, and the strategy of PseudoA selection is SRE. (see script below)
It does make sense to run BIOMOD species by species. This is what I usually do when using grid systems.
> For the first species (only 4500 presences) R returns this memory error when computing RF, which I cannot understand for a DF of 9000 rows (P+A), even though the number of repetitions is set to 1.
>
> Secondly, I checked that BIOMOD creates an object: Biomod.PA.data with approx. 40000 records. I assume that these are the presences and ALL the absences selected outside the SurfaceRange (strategy selected), and later on BIOMOD will select the number of absences indicated (=Number of Presences in this case). Question 2- Is this correct or is it loading ALL absences outside the range? (maybe that's why it returns an error.)
However, it would be much more powerful to also create a workspace and data folder to store the results for each species.
From what I understand from your code, everything will be overwritten at each iteration.
> Question 3- Any suggestion?
Yes indeed.
First of all, I would rather use random pseudo-absences instead of SRE and select a large number of them. BIOMOD will automatically make sure that the weighted sum of the 1s (presences) equals the weighted sum of the 0s (absences), i.e. weighted prevalence = 0.5. We are preparing a MS with virtual data to show that this is a much better solution than using SRE or anything else.
In the case where most of your species have fewer than 5000 occurrences, select 20000 absences and it should work rather well.
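To make the weighting idea concrete, here is a minimal base R sketch with made-up numbers. It is not BIOMOD's internal code: the object names (n.cells, pres.idx, abs.idx, w) and the pool of 100000 candidate sites are assumptions for illustration only. It draws 20000 random pseudo-absences and down-weights them so that both classes carry the same total weight.

```r
# Illustration only: all object names and sizes here are hypothetical.
set.seed(42)                                     # reproducible random draw
n.cells  <- 100000                               # assumed number of candidate sites
pres.idx <- 1:5000                               # assumed rows holding presences
cand.idx <- setdiff(seq_len(n.cells), pres.idx)  # sites available as pseudo-absences
abs.idx  <- sample(cand.idx, 20000)              # random pseudo-absence selection

# Response vector: 1 = presence, 0 = pseudo-absence
resp <- c(rep(1, length(pres.idx)), rep(0, length(abs.idx)))

# Presences get weight 1; absences are scaled so both classes sum to the same total
w <- ifelse(resp == 1, 1, length(pres.idx) / length(abs.idx))

# Weighted prevalence: share of the total weight carried by the presences
weighted.prev <- sum(w[resp == 1]) / sum(w)
print(weighted.prev)
```

A weight vector built this way could, for example, be passed to the weights argument of glm(); whether BIOMOD computes its weights in exactly this way is not something this sketch claims.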
>
> # i corresponds to the column where the species Presence or Absence is located
>
> for (i in "spColumID") {
>
> Initial.State (Response=Sp.Env[,i], Explanatory=Sp.Env[,c(250,248,251,199,164,181)],IndependentResponse=NULL, IndependentExplanatory=NULL,sp.name=names[i])
>
> Models(GLM = T, TypeGLM = "quad", Test = "AIC", GBM = T, No.trees = 3000, GAM = T, Spline=2, CTA = T, CV.tree = 50, ANN = T, CV.ann = 2, SRE = F, FDA = T, MARS = T, RF = T, NbRunEval = 1, DataSplit = 70, Yweights=NULL, Roc=T, Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent = F, VarImport=5, NbRepPA=1, strategy="sre", coor=Coor, distance=2, nb.absences=sum(Sp.Env[,i]))
>
> }
Depending on which OS you are on, there are different solutions. With MacOS or Unix, you could use R in batch mode.
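For the batch-mode route, one possible setup is a small script that handles a single species per R process, so that memory is released when each process exits. The sketch below is an assumption about how such a script could look: the file name run_species.R and the helper get.species.index are hypothetical, and the BIOMOD calls are left as a comment.

```r
# run_species.R (hypothetical file name)
# Possible invocations from a shell, one species per R process:
#   Rscript run_species.R 12
#   R CMD BATCH "--args 12" run_species.R

# Read the species column index from the command-line arguments.
get.species.index <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) stop("usage: Rscript run_species.R <column index>")
  i <- suppressWarnings(as.integer(args[1]))
  if (is.na(i) || i < 1) stop("column index must be a positive integer")
  i
}

# In the real script you would then do something like:
# i <- get.species.index()
# ... load Sp.Env, then call Initial.State() and Models() for column i ...
```

A shell loop or a grid scheduler can then launch the script once per species column.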
On Windows, what you wrote is fine except that you overwrite everything in the same .RData, which is not what you want to do (it explains the confusion with the number of records).
First of all, you should create a folder for each species in the loop, then run BIOMOD in this folder, and then, once Models is finished, remove everything from the workspace to make sure you are not using any data from the previous run.
It could simply be something along these lines, which you will probably need to adapt a bit to your data and OS.
setwd("YOUR FOLDER WHERE YOU WANT TO STORE THE RESULTS")
path = getwd()
#"spColumID" is a placeholder: replace it with the column indices of your species
for (i in "spColumID") {
#Create a folder to store the results for each species separately.
sp.dir = file.path(path, names[i])
dir.create(sp.dir, showWarnings=FALSE)
#Set the working directory to this folder
setwd(sp.dir)
Initial.State(Response=Sp.Env[,i], Explanatory=Sp.Env[,c(250,248,251,199,164,181)],IndependentResponse=NULL, IndependentExplanatory=NULL,sp.name=names[i])
Models(GLM = T, TypeGLM = "quad", Test = "AIC", GBM = T, No.trees = 3000, GAM = T, Spline=2, CTA = T, CV.tree = 50, ANN = T, CV.ann = 2, SRE = F, FDA = T, MARS = T, RF = T, NbRunEval = 1, DataSplit = 70, Yweights=NULL, Roc=T, Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent = F, VarImport=5, NbRepPA=1, strategy="sre", coor=Coor, distance=2, nb.absences=sum(Sp.Env[,i]))
#Store the names of the objects you want to keep in a vector before deleting everything
#(Coor is passed to Models at every iteration, so it has to be kept as well)
keep = c("path", "Sp.Env", "names", "Coor")
#Delete everything but the objects listed in keep.
rm(list=setdiff(ls(), keep))
}
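The clean-up step at the end of the loop can be tried in isolation. The toy sketch below uses a separate environment (ws, a hypothetical name) filled with dummy objects, so it can be run without touching your real workspace; tmp.model merely stands in for whatever a species run leaves behind.

```r
# Toy illustration of "keep a few objects, remove the rest".
ws <- new.env()                                       # stand-in for the workspace
assign("path", tempdir(), envir = ws)                 # object to keep
assign("Sp.Env", data.frame(sp1 = c(1, 0, 1)), envir = ws)  # object to keep
assign("tmp.model", list(fit = 1:10), envir = ws)     # dummy per-species output

keep <- c("path", "Sp.Env")                           # names that must survive

# Remove every object whose name is not listed in keep
rm(list = setdiff(ls(ws), keep), envir = ws)

# Ask R to return the freed memory before the next species is processed
invisible(gc())

print(ls(ws))
```

In the global workspace the same idea reduces to rm(list = setdiff(ls(), keep)) followed by gc().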
Hope it helps,
Best
Wilfried
>
>
> Thank you very much in advance for reading such a long post with 3 Questions!
>
> Pep
--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 51 44 97
fax: +33 (0)4 76 51 42 79
Email: wilfried.thuiller at ujf-grenoble.fr
Home page: http://www.will.chez-alice.fr
Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm
FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu