[Biomod-commits] predictive accuracy issue

Popko popkowiersma at hotmail.com
Tue Jan 26 15:57:15 CET 2010


In reply to your (swift) response: I guess I'm still missing something. I
did not have a 'completely independent dataset', as you assumed; instead I
split my dataset 80/20. I assumed that the 20% would be used as if it were
independent data. On average, this should then have given equal results in
the first and second columns (Cross.validation and indepdt.data,
respectively). In fact, the evaluation results on the independent data are
always higher (by 7% on average).

FYI:
Initial.State(Response = data_1spec[2], Explanatory = 
data_1spec[,5:lastcol2], IndependentResponse = data_1spec[2], 
IndependentExplanatory = data_1spec[,5:lastcol2], sp.name=i)

Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
nb.absences=1000)
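One detail in the call above may matter. IndependentResponse and IndependentExplanatory are given exactly the same objects as Response and Explanatory (data_1spec[2] and data_1spec[,5:lastcol2]), so the "independent" evaluation runs on rows that include the calibration data. The following is a minimal sketch, assuming about 2500 rows and a single 80% calibration draw, that measures how large that overlap is. The variable names are hypothetical and are not BIOMOD internals.

```r
# Hypothetical sketch: overlap between the calibration rows and the rows
# passed as "independent" data when both point at the same data frame.
n <- 2500                                   # approx. cases per species (from the thread)
all_rows <- seq_len(n)                      # IndependentResponse = all rows
calib <- sample(all_rows, size = 0.8 * n)   # one DataSplit = 80 calibration draw

# Fraction of the "independent" rows that were also used for calibration
overlap <- mean(all_rows %in% calib)
print(overlap)                              # 0.8, whatever rows are drawn
```

If the model evaluated on indepdt.data is instead calibrated on all rows, the overlap would be 100%. In either case, scores on rows already seen during calibration would typically be optimistic compared with the 20% hold-out scores.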

Popko Wiersma
SOVON Dutch Centre for Field Ornithology

--------------------------------------------------
From: "Wilfried Thuiller" <wilfried.thuiller at ujf-grenoble.fr>
Sent: Tuesday, January 26, 2010 3:08 PM
To: "Popko" <popkowiersma at hotmail.com>
Cc: <biomod-commits at r-forge.wu-wien.ac.at>
Subject: Re: [Biomod-commits] predictive accuracy issue

> Dear Popko,
>
> The predictive accuracy estimated during the Calibration phase is NOT
> computed on the calibration data (80% of the data in your case) but on the
> remaining part (20% in your case). The Cross-validation value is thus the
> mean of the evaluations on the 20% (in your case, a mean over 3
> repetitions x 2 pseudo-absence runs). It is thus not really surprising
> that the predictive accuracy estimated on a completely independent dataset
> (indepdt.data) is higher than the one estimated on 20% of the initial
> dataset.
> Moreover, you would have to show what you mean by "lower" or "higher".
> TSS, Kappa and AUC are indices, not statistics; they should not be taken
> as statistical tests.
>
> Finally, if I understood correctly, you calibrated your models using 70
> different variables? What is the purpose of using so many variables? I
> suppose that many are correlated. GLMs, and especially stepwise
> regressions, are not very good at dealing with a large number of
> correlated variables. GBM or randomForest are better techniques here:
> data-hungry methods that are robust to multicollinearity. Just an opinion...
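The point about correlated predictors can be checked before modelling. This is a minimal sketch that assumes a hypothetical |r| > 0.7 cut-off and a simple greedy rule that keeps the first variable of each highly correlated pair. It is one common screening heuristic and not something BIOMOD does for you. The function name `drop_correlated` is hypothetical.

```r
# Hypothetical sketch: greedy screening of highly correlated predictors.
# The 0.7 cut-off and the "keep the first of each pair" rule are assumptions.
drop_correlated <- function(X, threshold = 0.7) {
  r <- abs(cor(X))                # pairwise |Pearson correlation|
  keep <- colnames(X)
  for (v in colnames(X)) {
    if (!(v %in% keep)) next      # already dropped
    others <- setdiff(keep, v)
    too_close <- others[r[v, others] > threshold]
    keep <- setdiff(keep, too_close)
  }
  keep
}

# Toy example: x2 is a linear function of x1, x3 is nearly unrelated
X <- data.frame(x1 = 1:10, x2 = 2 * (1:10) + 1, x3 = rep(c(1, -1), 5))
drop_correlated(X)                # "x1" "x3"
```

With the real data, X would be the explanatory columns (data_1spec[,5:lastcol2] in the call posted in this thread).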
>
> Hope it helps,
> Wilfried
>
>
>
> On 26 Jan 2010, at 14:51, Popko wrote:
>
>> Dear colleagues,
>>
>> Looking at my Evaluation.results, I noticed that the predictive accuracy
>> of Calibration (called Cross.validation in the output table) is lower than
>> the predictive accuracy using Evaluation (called "indepdt.data" in the
>> output table). This is true for all 63 species I've analyzed, regardless
>> of the evaluation method (ROC, Kappa, TSS).
>> Q: How is this possible?
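TSS and Cohen's Kappa, two of the measures listed here, are both computed from a single 2 x 2 table of observed versus predicted presences and absences. The ROC-based AUC instead summarises performance over all thresholds. This is a minimal sketch with hypothetical counts. It shows how each value is derived and does not reproduce BIOMOD's own code.

```r
# Hypothetical 2 x 2 table of observed vs predicted presences/absences
tp <- 40; fp <- 10; fn <- 5; tn <- 45
n <- tp + fp + fn + tn

# True Skill Statistic
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
tss <- sensitivity + specificity - 1                 # 70/99, about 0.707

# Cohen's Kappa: observed agreement corrected for chance agreement
p_obs <- (tp + tn) / n
p_exp <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
cohen_kappa <- (p_obs - p_exp) / (1 - p_exp)         # 0.7
```

Each value is a single point estimate from one table. Comparing two of them says nothing about significance unless some resampling or test is added.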
>>
>> This is the model I ran:
>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>> nb.absences=1000)
>>
>> Data consisted of presence/absence data with ca. 2500 cases per species.
>> Some 70 environmental variables were entered in the model.
>>
>> Eagerly awaiting your responses,
>>
>> Popko Wiersma
>> SOVON Dutch Centre for Field Ornithology
>>
>>
>> _______________________________________________
>> Biomod-commits mailing list
>> Biomod-commits at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
>
> --------------------------
> Dr. Wilfried Thuiller
> Laboratoire d'Ecologie Alpine, UMR CNRS 5553
> Université Joseph Fourier
> BP53, 38041 Grenoble cedex 9, France
> tel: +33 (0)4 76 63 54 53
> fax: +33 (0)4 76 51 42 79
>
> Email: wilfried.thuiller at ujf-grenoble.fr
> Home page: http://www.will.chez-alice.fr
> Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm
>
> FP6 European MACIS project: http://www.macis-project.net
> FP6 European EcoChange project: http://www.ecochange-project.eu

