[Biomod-commits] Re : prevalence and pseudoabsences

Sat Apr 23 15:29:52 CEST 2011

Dear Brenna,

> Thanks Bruno & Wilfried,
> 
> So to clarify: I run pseudo.abs - in my case as so:
> 
> PA1 <- pseudo.abs(coor=Sp.Env[,2:3], status=Sp.Env[,1], strategy="random",
>                 env=Sp.Env[,4:10], nb.points=2736, species.name="Rhodiola",
>                 add.pres=F, create.dataset=T, plot=T, pcol="red", acol="grey80")
> 
> This creates two objects, "PA1" (a vector of cell numbers chosen as absences) and "Dataset.Rhodiola.random.partial", a dataframe of coordinates and "status" (zero).
> 
> I would then create a new dataset that has just my presence records (304) and these 2736 absences.  I would run that dataset (Sp.Env.PA1) in the Initial.State() and Models() functions, for example, as so:
> 
> Initial.State(Response=Sp.Env.PA1[,c(1)], Explanatory=Sp.Env.PA1[,4:10],
>                          IndependentResponse=NULL, IndependentExplanatory=NULL,
>                           sp.name="Rhodiola")
> 
> Models(GLM = T, TypeGLM = "simple", Test = "AIC", GBM = T, No.trees = 5000,
>               GAM = T, CTA = T, CV.tree = 100, ANN = T, CV.ann = 5, SRE = F, FDA = T, 
>               MARS = T, RF = T, NbRunEval = 10, DataSplit = 70, Yweights=NULL,
>               NbRepPA=0, Roc=T, Optimized.Threshold.Roc=T, Kappa=T, TSS=T,
>              KeepPredIndependent = F, VarImport=5)
> 
> I keep NbRepPA = 0 so it uses the entire dataset to evaluate the model, maintaining my prevalence at 0.1 (304 presence records/3040 total records in the dataset).
> I think I am correct on everything to this point?

Yes, you are correct. 
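
As a rough illustration of the bookkeeping described above, here is a minimal base-R sketch on simulated cell indices. It does not call pseudo.abs(); sample() stands in for the random strategy, and the names pres_idx, candidates, draw_pa and pa_sets are hypothetical. It only shows that every manual pull of 2736 pseudo-absences added to the 304 presences keeps the prevalence at 0.1.

```r
# Simulated stand-in for manual pseudo-absence pulls (not the BIOMOD API).
set.seed(42)
n_cells <- 6808   # pixels in the study area (figure quoted in the thread)
n_pres  <- 304    # presence records
n_pa    <- 2736   # pseudo-absences per pull

# Hypothetical presence cells; in practice these come from the real data.
pres_idx   <- sample(n_cells, n_pres)
candidates <- setdiff(seq_len(n_cells), pres_idx)

# One pull: presences plus a fresh random set of pseudo-absences.
draw_pa <- function() {
  pa_idx <- sample(candidates, n_pa)
  data.frame(cell   = c(pres_idx, pa_idx),
             status = c(rep(1, n_pres), rep(0, n_pa)))
}

# Five independent pulls, each to be modelled separately.
pa_sets    <- lapply(1:5, function(i) draw_pa())
prevalence <- sapply(pa_sets, function(d) mean(d$status))
print(prevalence)
```

Each element of pa_sets would then play the role of one Sp.Env.PA1-style dataset passed to Initial.State() and Models() with NbRepPA = 0.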

> So my question is: I want to do 5 PA pulls (as I would if I ran it in the Models() function, NbRepPA = 5), maintaining my 0.1 prevalence.  But I would then have run Models() five times on 5 datasets (each with different PA pulls).  How does BIOMOD create a final model when using PA pulls (e.g. NbRepPA = 5) within the Models() function, and can I replicate that when I run my PA pulls manually as above?

There is no final model when using several PA sets. There are as many "final models" as PA sets. 
If you want to use several sets of PA yourself, make predictions from every model (using the Projections function for instance on the overall area). Then you'll need to combine them yourself. 
There are several alternatives for combining projections from different models from different PA sets and from different repetitions from cross-validation:

You can compute a simple average and standard deviation of the projections in probability values. You can then derive a confidence interval if you want.
You could also compute a weighted sum using weights derived from TSS or ROC, for instance. This gives more weight to the best models (take the scores from the cross-validation column in Evaluation.results.TSS).
You could also perform what we usually call committee averaging, where you let the models vote for a presence or an absence. For this, you no longer use the probability of occurrence but the presence-absence (binary) projections directly. You then sum the presence-absence maps. If you have 5 repetitions, 5 models and 5 sets of PA, the maximum is 5 x 5 x 5 = 125. When the sum is equal to 125, all repetitions, PA sets and models agree that this is a presence; when it is zero, they all agree on an absence. Values between 0 and 125, once rescaled by dividing by 125, give you the proportion of models agreeing on a presence. This ensemble approach is very close to the Bayesian philosophy with posterior probabilities. I really like this approach, much better than looking at the probabilities of occurrence themselves.
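
The three alternatives above could be sketched in base R roughly as follows. This is only an illustration on random numbers: prob, tss and thr are hypothetical objects standing in for the stacked projections, the per-run TSS scores and the per-run binarisation thresholds, and none of it uses BIOMOD functions.

```r
# Simulated projections: one row per cell, one column per model run.
set.seed(1)
n_cells  <- 1000
n_models <- 125                       # 5 repetitions x 5 models x 5 PA sets
prob <- matrix(runif(n_cells * n_models), nrow = n_cells)
tss  <- runif(n_models, 0.3, 0.9)     # hypothetical TSS score of each run
thr  <- rep(0.5, n_models)            # hypothetical threshold of each run

# 1. Simple average and standard deviation of the probabilities.
ens_mean <- rowMeans(prob)
ens_sd   <- apply(prob, 1, sd)

# 2. Weighted average, with weights proportional to TSS.
w         <- tss / sum(tss)
ens_wmean <- as.vector(prob %*% w)

# 3. Committee averaging: binarise each run, then count the votes.
votes     <- sweep(prob, 2, thr, ">=")   # TRUE where a run predicts presence
vote_sum  <- rowSums(votes)              # between 0 and 125
committee <- vote_sum / n_models         # proportion of runs voting presence

print(summary(committee))
```

With real projections, each column of prob would be one model x repetition x PA-set projection over the same cells, in the same order.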

Now, I am not entirely sure why you want to keep your prevalence. Regression-like models do not cope well with artificially unbalanced datasets (prevalence different from 0.5). They are supposed to work well if the prevalence is the true prevalence of the species. This is the case with a perfect stratified sampling, but it is absolutely not the case when using random sets of pseudo-absences.
In practice, the results are usually similar anyway. The main difference is the "true" probability given by the models, which will be higher when the pseudo-absences are down-weighted. However, once the predictions are rescaled between 0 and 1, the results are usually very similar.
I think Wisz and Guisan recently showed that using weighted pseudo-absences was better. We also have a paper close to being accepted in Methods in Ecology and Evolution showing the same with virtual datasets.
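
To make the down-weighting idea concrete, here is a small sketch with a plain glm() on simulated data. It is not what Models() does internally; the weighting scheme (each pseudo-absence weighted by n_pres / n_abs so that both classes carry the same total weight) and the names x, w, fit_raw and fit_wt are assumptions made for illustration.

```r
# Simulated data: 304 presences, 2736 pseudo-absences, one predictor.
set.seed(7)
n_pres <- 304
n_abs  <- 2736
y <- c(rep(1, n_pres), rep(0, n_abs))
x <- c(rnorm(n_pres, mean = 1), rnorm(n_abs, mean = 0))

# Down-weight pseudo-absences so the weighted prevalence becomes 0.5.
w <- ifelse(y == 1, 1, n_pres / n_abs)
weighted_prev <- sum(w * y) / sum(w)

# Unweighted fit (prevalence 0.1) versus weighted fit (prevalence 0.5).
# quasibinomial gives the same coefficients as binomial here and avoids
# the non-integer weights warning.
fit_raw <- glm(y ~ x, family = binomial)
fit_wt  <- glm(y ~ x, family = quasibinomial, weights = w)

mean_raw <- mean(fitted(fit_raw))   # close to the raw prevalence
mean_wt  <- mean(fitted(fit_wt))    # shifted upwards by the weighting

# The ranking of cells is what survives a rescaling to [0, 1].
rank_agreement <- cor(fitted(fit_raw), fitted(fit_wt), method = "spearman")
print(c(weighted_prev = weighted_prev, mean_raw = mean_raw,
        mean_wt = mean_wt, rank_agreement = rank_agreement))
```

With a single predictor the two fits necessarily rank the cells the same way, so this toy case only illustrates the shift in probabilities; with several predictors the agreement has to be checked on the real projections.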

Hope it helps,

Wilfried

> 
> I hope this isn't too confusing! 
> Thank you!
> Brenna
> 
> 
> From: Bruno Lafourcade [brunolafourcade at aol.com]
> Sent: Thursday, April 21, 2011 11:37 PM
> To: wilfried.thuiller at ujf-grenoble.fr; Brenna Forester
> Cc: biomod-commits at r-forge.wu-wien.ac.at
> Subject: Re : [Biomod-commits] prevalence and pseudoabsences
> 
> 
> Hi Brenna, 
> 
> The pseudo-absence procedure within the Models function is automated and generates a
> weighting to give a prevalence of 0.5 for each run.
> 
> To make sure that the prevalence doesn't change, you have to build your own pseudo-absence
> data outside of the Models function (even prior to Initial.State). In that way, the Models function
> will not recognize your data as being pseudo.abs and will not weight them, just like for any 
> standard input data.
> 
> Use the pseudo.abs() function for this. Don't hesitate to ask for details on how to use it.
> 
> Best,
> Bruno 
> 
> 
> -------
> Bruno Lafourcade
> Statistical tools engineer
> 
> Laboratoire d'Ecologie Alpine, bureau 308
> CNRS - UMR 5553, 2233 rue de la piscine
> 38400 Saint Martin d'Hères
> -------
> 
> 
> -----Original message-----
> From: Wilfried Thuiller <wilfried.thuiller at ujf-grenoble.fr>
> To: Brenna Forester <forestb at students.wwu.edu>
> Cc: biomod-commits at lists.r-forge.r-project.org <biomod-commits at r-forge.wu-wien.ac.at>
> Sent: Friday, 22 April 2011 7:09
> Subject: Re: [Biomod-commits] prevalence and pseudoabsences
> 
> Dear Brenna,
> 
> Yes and no... 
> 
> If you do not ask for pseudo-absences (NbPA=0), there is no weighting and all your pseudo-absences will be used at once. Prevalence = 0.1
> If you set NbPA = 3040 (or more), yes, there is weighting. Prevalence = 0.5
> 
> Does it help?
> Wilfried
> 
> 
> 
> On 22 Apr 2011, at 00:53, Brenna Forester wrote:
> 
>> Hello,
>> 
>> I see in the "Presentation Manual for BIOMOD" (page 18) the following statement: "In all procedures, BIOMOD ensures that the prevalence of the original data is conserved in the calibration and evaluation datasets."
>> 
>> I have 304 presence records and am running my pseudoabsence pulls with 3040 absences (a prevalence of 0.1).  The number of pixels in my study area is 6808.
>> 
>> From the above quote, I think that BIOMOD is maintaining the original prevalence of 0.1.  Is that correct?  I just want to be sure that there is no weighting of absence records (e.g. weighting to simulate a prevalence of 0.5).
>> 
>> Thank you,
>> Brenna
>> _______________________________________________
>> Biomod-commits mailing list
>> Biomod-commits at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
> 
> --------------------------
> Dr. Wilfried Thuiller
> Laboratoire d'Ecologie Alpine, UMR CNRS 5553
> Université Joseph Fourier
> BP53, 38041 Grenoble cedex 9, France
> tel: +33 (0)4 76 51 44 97
> fax: +33 (0)4 76 51 42 79
> 
> Email: wilfried.thuiller at ujf-grenoble.fr
> Personal website: http://www.will.chez-alice.fr
> Team website: http://www-leca.ujf-grenoble.fr/equipes/emabio.htm
> 
> FP6 European MACIS project: http://www.macis-project.net
> FP6 European EcoChange project: http://www.ecochange-project.eu
> 

