[Biomod-commits] predictive accuracy issue

Wilfried Thuiller wilfried.thuiller at ujf-grenoble.fr
Tue Jan 26 15:08:34 CET 2010

Dear Popko,

The predictive accuracy estimated during the calibration phase is NOT computed on the calibration data (80% of the data in your case) but on the remaining part (20% in your case). The Cross.validation value is thus the mean of the evaluations on that 20% (in your case, a mean over 3 repetitions x 2 pseudo-absence runs). It is thus not really surprising to see that the predictive accuracy estimated on a completely independent dataset (indepdt.data) is higher than the one estimated on 20% of the initial dataset.
Moreover, you have not shown what you call "lower" or "higher". TSS, Kappa and AUC are indices, not statistics. They should not be taken as statistical tests.
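
To make the procedure above concrete, here is a minimal sketch in base R of a repeated split-sample evaluation. It is not BIOMOD's internal code: the data are simulated, and the predictor names (temp, precip), the helper functions accuracy_indices() and one_run(), and the fixed 0.5 threshold are all assumptions made for illustration. Only the 80/20 split and the 3 repetitions come from the setup discussed above.

```r
# Illustrative sketch only: simulated data, base R, not BIOMOD's internal code.
set.seed(42)

# Simulated presence/absence data with two hypothetical predictors.
n   <- 500
env <- data.frame(temp = rnorm(n), precip = rnorm(n))
env$presence <- rbinom(n, 1, plogis(1.5 * env$temp - 1.0 * env$precip))

# TSS and Cohen's kappa from observed 0/1 values and predicted probabilities.
# The 0.5 threshold is an assumption, not an optimized threshold.
accuracy_indices <- function(obs, prob, threshold = 0.5) {
  pred  <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & obs == 1); fn <- sum(pred == 0 & obs == 1)
  tn <- sum(pred == 0 & obs == 0); fp <- sum(pred == 1 & obs == 0)
  total <- tp + fn + tn + fp
  sens  <- tp / (tp + fn)                       # sensitivity
  spec  <- tn / (tn + fp)                       # specificity
  p_obs <- (tp + tn) / total                    # observed agreement
  p_exp <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total^2
  c(TSS = sens + spec - 1, Kappa = (p_obs - p_exp) / (1 - p_exp))
}

# One split-sample run: calibrate on 80%, evaluate on the held-out 20%.
one_run <- function(dat, split = 0.8) {
  calib_rows <- sample(nrow(dat), size = round(split * nrow(dat)))
  fit  <- glm(presence ~ temp + precip, family = binomial,
              data = dat[calib_rows, ])
  held <- dat[-calib_rows, ]
  accuracy_indices(held$presence,
                   predict(fit, newdata = held, type = "response"))
}

# Three repetitions; the reported score is the mean over the held-out parts.
runs     <- replicate(3, one_run(env))
cv_score <- rowMeans(runs)
print(runs)
print(cv_score)
```

With NbRepPA=2 the same loop would run once per pseudo-absence data set, giving 6 evaluations to average. The spread across the columns of runs also gives a feel for how much these indices vary from one split to the next.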

Finally, if I understood correctly, you calibrated your models using 70 different variables? What is the purpose of using so many variables? I suppose that many are correlated. GLM, and especially stepwise regression, is not very good at dealing with a large number of correlated variables. GBM or randomForest are better techniques: they are robust to multicollinearity, and they are data-hungry methods. Just an opinion...
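
One simple way to check how many variables are strongly correlated is to look at the pairwise correlation matrix before calibration. The sketch below uses base R on simulated data. The variable names x1 to x4, the 0.7 cutoff and the greedy screening rule are assumptions chosen for illustration, not a BIOMOD feature or a recommended threshold.

```r
# Illustrative sketch only: simulated predictors, base R.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
predictors <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # nearly a copy of x1
  x3 = rnorm(n),
  x4 = rnorm(n)
)

cor_mat <- abs(cor(predictors))  # absolute Pearson correlations
cutoff  <- 0.7                   # assumed cutoff, adjust to your own needs

# List each strongly correlated pair once (upper triangle only).
high  <- which(cor_mat > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
pairs <- data.frame(
  var_a = rownames(cor_mat)[high[, "row"]],
  var_b = colnames(cor_mat)[high[, "col"]],
  abs_r = cor_mat[high]
)
print(pairs)

# Greedy screening: keep a variable only if it is not strongly
# correlated with any variable already kept.
keep <- character(0)
for (v in colnames(cor_mat)) {
  if (all(cor_mat[v, keep] <= cutoff)) keep <- c(keep, v)
}
print(keep)
```

The order of the columns decides which member of a correlated pair is kept, so it is worth putting the ecologically most meaningful variables first.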

Hope it helps,

On 26 Jan 2010, at 14:51, Popko wrote:

> Dear colleagues,
> Looking at my Evaluation.results I noticed that predictive accuracy of 
> Calibration (called Cross.validation in output table) is lower than the 
> predictive accuracy using Evaluation (called "indepdt.data" in output 
> table). This is true for all 63 species I've analyzed, and independent of 
> method (ROC, kappa, TSS).
> Q: How is this possible?
> This is the model I ran:
> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F, 
> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F, 
> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T, 
> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T, 
> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2, 
> nb.absences=1000)
> Data consisted of presence/absence data with ca. 2500 cases per species. 
> Some 70 environmental variables were entered in the model.
> Eagerly awaiting your responses,
> Popko Wiersma
> SOVON Dutch Centre for Field Ornithology

Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 63 54 53
fax: +33 (0)4 76 51 42 79

Email: wilfried.thuiller at ujf-grenoble.fr
Home page: http://www.will.chez-alice.fr
Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm

FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu
