[Biomod-commits] predictive accuracy issue

Wilfried Thuiller wilfried.thuiller at ujf-grenoble.fr
Tue Jan 26 16:34:59 CET 2010


First of all, you should read the BIOMOD manual. It is not entirely up to date, but it covers the basic principles.

Just type the following in the R console:

> Biomod.Manuals()

Secondly, BIOMOD does the splitting procedure for you. You do not have to do it yourself.
With "NbRunEval=3", the splitting procedure is repeated three times. Each time the data are split, the models are calibrated on 80% and evaluated on the remaining 20% (DataSplit=80). Once the three runs are done, only the average evaluation is recorded (in the Cross.validation column).
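
To make the repeated split concrete, here is a minimal base R sketch of the idea. It is not BIOMOD's internal code: the simulated data, the auc() helper, and the names nb_run_eval and data_split are made up for illustration, and the plain random split is an assumption.

```r
# Sketch only: base R, not BIOMOD internals. All names here are hypothetical.
set.seed(1)

# Simulated presence/absence data with two predictors.
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

# AUC computed with the rank (Mann-Whitney) formula.
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

nb_run_eval <- 3   # plays the role of NbRunEval=3
data_split <- 80   # plays the role of DataSplit=80

# For each run: draw a new 80% calibration set, fit, evaluate on the other 20%.
scores <- sapply(seq_len(nb_run_eval), function(i) {
  calib <- sample(n, size = round(n * data_split / 100))
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[calib, ])
  pred <- predict(fit, newdata = dat[-calib, ], type = "response")
  auc(dat$y[-calib], pred)
})

# Only the average over the runs is kept as the cross-validation score.
mean_auc <- mean(scores)
```

Each element of scores comes from data the model did not see during fitting, which is why no manual splitting is needed beforehand.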

It is only when you have TRULY independent data that you should use the IndependentResponse argument.
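
As a rough illustration of why this matters: if the same table is supplied both as calibration data and as "independent" data, the independent score is computed partly on rows the model was fitted to. The base R sketch below mimics that situation with simulated data. It does not use BIOMOD, and every name in it is hypothetical.

```r
# Sketch only: base R, simulated data, hypothetical names.
set.seed(2)

# Many predictors, only one of which (V1) carries signal.
n <- 300
p <- 20
X <- as.data.frame(matrix(rnorm(n * p), n, p))
X$y <- rbinom(n, 1, plogis(X$V1))

# AUC computed with the rank (Mann-Whitney) formula.
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Calibrate on 80% of the rows.
calib <- sample(n, size = round(0.8 * n))
fit <- glm(y ~ ., family = binomial, data = X[calib, ])

# Evaluation on the 20% the model never saw.
auc_heldout <- auc(X$y[-calib],
                   predict(fit, newdata = X[-calib, ], type = "response"))

# "Independent" evaluation on the full table, which contains the calibration rows.
auc_full <- auc(X$y, predict(fit, newdata = X, type = "response"))
```

With many predictors, auc_full will usually come out above auc_heldout, simply because most of the rows it is scored on were used for fitting. It is not a measure of accuracy on independent data.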

So, what you should do is:

> Initial.State(Response = data_1spec[2], Explanatory = data_1spec[,5:lastcol2], sp.name=i)
> 
> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
> nb.absences=1000)


Best
Wilfried





On 26 Jan 2010, at 15:57, Popko wrote:

> In reply to your (swift) response: I guess I'm still missing something, but 
> I did not have a 'completely independent dataset' as you assumed, but split 
> my dataset (80/20). I assumed that 20% of the data was used as if it were 
> independent data. This should then have resulted in on average equal results 
> in the first and second column, cross.validation and indepdt.data, resp. In 
> fact the validation results of the independent data are always higher (by 7% 
> on average).
> 
> FYI:
> Initial.State(Response = data_1spec[2], Explanatory = 
> data_1spec[,5:lastcol2], IndependentResponse = data_1spec[2], 
> IndependentExplanatory = data_1spec[,5:lastcol2], sp.name=i)
> 
> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
> nb.absences=1000)
> 
> Popko Wiersma
> SOVON Dutch Centre for Field Ornithology
> 
> --------------------------------------------------
> From: "Wilfried Thuiller" <wilfried.thuiller at ujf-grenoble.fr>
> Sent: Tuesday, January 26, 2010 3:08 PM
> To: "Popko" <popkowiersma at hotmail.com>
> Cc: <biomod-commits at r-forge.wu-wien.ac.at>
> Subject: Re: [Biomod-commits] predictive accuracy issue
> 
>> Dear Popko,
>> 
>> The predictive accuracy estimated during the calibration phase is NOT 
>> computed on the calibration data (80% of the data in your case) but on the 
>> remaining part (20% in your case). The cross-validation score is thus the 
>> mean of the evaluations on that 20% (in your case, a mean over 3 repetitions 
>> x 2 pseudo-absence runs). It is thus not really surprising that the 
>> predictive accuracy estimated on a completely independent dataset 
>> (indepdt.data) is higher than the one estimated on 20% of the initial 
>> dataset.
>> Therefore, you have shown what you called "lower" or "higher". TSS/Kappa 
>> or AUC are indices, not statistics. They should be taken as statistical 
>> tests.
>> 
>> Finally, if I understood correctly, you calibrated your models using 70 
>> different variables? What is the purpose of using so many variables? I 
>> suppose that many are correlated. GLM, and especially stepwise regression, 
>> does not deal well with a large number of correlated variables. GBM 
>> and randomForest are better techniques: they are robust to 
>> multicollinearity and are data-hungry methods. Just an opinion...
>> 
>> Hope it helps,
>> Wilfried
>> 
>> 
>> 
>> On 26 Jan 2010, at 14:51, Popko wrote:
>> 
>>> Dear colleagues,
>>> 
>>> Looking at my Evaluation.results I noticed that predictive accuracy of
>>> Calibration (called Cross.validation in output table) is lower than the
>>> predictive accuracy using Evaluation (called "indepdt.data" in output
>>> table). This is true for all 63 species I've analyzed, and independent of
>>> method (ROC, kappa, TSS).
>>> Q: How is this possible?
>>> 
>>> This is the model I ran:
>>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>>> nb.absences=1000)
>>> 
>>> Data consisted of presence/absence data with ca. 2500 cases per species.
>>> Some 70 environmental variables were entered in the model.
>>> 
>>> Eagerly awaiting your responses,
>>> 
>>> Popko Wiersma
>>> SOVON Dutch Centre for Field Ornithology
>>> 
>>> 
>>> _______________________________________________
>>> Biomod-commits mailing list
>>> Biomod-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
>> 
>> --------------------------
>> Dr. Wilfried Thuiller
>> Laboratoire d'Ecologie Alpine, UMR CNRS 5553
>> Université Joseph Fourier
>> BP53, 38041 Grenoble cedex 9, France
>> tel: +33 (0)4 76 63 54 53
>> fax: +33 (0)4 76 51 42 79
>> 
>> Email: wilfried.thuiller at ujf-grenoble.fr
>> Home page: http://www.will.chez-alice.fr
>> Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm
>> 
>> FP6 European MACIS project: http://www.macis-project.net
>> FP6 European EcoChange project: http://www.ecochange-project.eu
>> 
>> 
>> 
>> 
>> 




