[Biomod-commits] predictive accuracy issue

Tue Jan 26 16:36:49 CET 2010

oops...

Biomod.Manual()

without the s.... there is obviously only one manual ;-)
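
For readers of the archive, here is a minimal base-R sketch of the repeated 80/20 splitting procedure described in the quoted message below. It does not use BIOMOD, and it is not BIOMOD's internal code. The simulated data and the names dat, auc and one_split are assumptions made up for illustration. The "all_rows" score assumes, as the thread suggests, that passing the calibration data again as IndependentResponse means the model is scored on rows it has largely already seen.

```r
# Illustrative sketch only: simulated data, base R, no BIOMOD calls.
set.seed(42)

# Hypothetical presence/absence data with one informative predictor (x1)
# and one pure-noise predictor (x2).
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(1.5 * dat$x1))

# AUC via the rank-sum (Mann-Whitney) formulation; ties get average ranks.
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# One run: calibrate a GLM on a random `split` fraction of the rows, then
# score it (a) on the held-out rows only and (b) on all rows, i.e. on data
# that include the calibration rows.
one_split <- function(data, split = 0.8) {
  idx <- sample(nrow(data), size = round(split * nrow(data)))
  fit <- glm(presence ~ x1 + x2, family = binomial, data = data[idx, ])
  p_all <- predict(fit, newdata = data, type = "response")
  c(held_out = auc(data$presence[-idx], p_all[-idx]),
    all_rows = auc(data$presence, p_all))
}

# Three repetitions (analogous to NbRunEval=3, DataSplit=80), then average.
scores <- replicate(3, one_split(dat))
print(rowMeans(scores))
```

The held_out mean plays the role of the cross-validation figure. The all_rows mean shows what happens when the "independent" data are not independent: most of those rows were used for calibration, so that score tends to be optimistic.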

On 26 Jan 2010, at 16:34, Wilfried Thuiller wrote:

> First of all you should read the BIOMOD manual. It is not entirely up-to-date but it should give you the first principles.
> 
> Just type the following in the R console
> 
> > Biomod.Manuals()
> 
> Secondly, BIOMOD is going to do the splitting procedure for you. You do not have to do it yourself. 
> When you have "NbRunEval=3", this means you repeat the splitting procedure three times. Every time the data are split, the models are calibrated on 80% and evaluated on the remaining 20%. Once this has been done three times, only the average evaluation is recorded (in the Cross-Validation column).
> 
> It is only when you have TRULY independent data that you should use the IndependentResponse argument.
> 
> So, what you should do:
> 
>> Initial.State(Response = data_1spec[2], Explanatory = data_1spec[,5:lastcol2], sp.name=i)
>> 
>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>> nb.absences=1000)
> 
> 
> Best
> Wilfried
> 
> 
> 
> 
> 
> On 26 Jan 2010, at 15:57, Popko wrote:
> 
>> In reply to your (swift) response: I guess I'm still missing something, but 
>> I did not have a 'completely independent dataset' as you assumed, but split 
>> my dataset (80/20). I assumed that 20% of the data was used as if it were 
>> independent data. This should then have resulted in on average equal results 
>> in the first and second column, cross.validation and indepdt.data, resp. In 
>> fact the validation results of the independent data are always higher (by 7% 
>> on average).
>> 
>> FYI:
>> Initial.State(Response = data_1spec[2], Explanatory = 
>> data_1spec[,5:lastcol2], IndependentResponse = data_1spec[2], 
>> IndependentExplanatory = data_1spec[,5:lastcol2], sp.name=i)
>> 
>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>> nb.absences=1000)
>> 
>> Popko Wiersma
>> SOVON Dutch Centre for Field Ornithology
>> 
>> --------------------------------------------------
>> From: "Wilfried Thuiller" <wilfried.thuiller at ujf-grenoble.fr>
>> Sent: Tuesday, January 26, 2010 3:08 PM
>> To: "Popko" <popkowiersma at hotmail.com>
>> Cc: <biomod-commits at r-forge.wu-wien.ac.at>
>> Subject: Re: [Biomod-commits] predictive accuracy issue
>> 
>>> Dear Popko,
>>> 
>>> The predictive accuracy estimated during the Calibration phase is NOT computed 
>>> on the calibration data (80% of the data in your case) but on the 
>>> remaining part (20% in your case). The Cross-validation is thus the mean 
>>> of the evaluations on the 20% (in your case a mean over 3 repetitions x 2 
>>> pseudo-absence runs). It is thus not really surprising that the 
>>> predictive accuracy estimated on a completely independent dataset 
>>> (indepdt.data) is higher than the one estimated on 20% of the initial 
>>> dataset.
>>> Also, you should show what you mean by "lower" or "higher". TSS, Kappa 
>>> and AUC are indices, not statistics; they should not be taken as 
>>> statistical tests.
>>> 
>>> Finally, if I understood correctly, you calibrated your models using 70 
>>> different variables? What is the purpose of using so many variables? I 
>>> suppose that many are correlated. GLM, and especially stepwise regression, 
>>> is not very good at dealing with a large number of correlated variables. 
>>> GBM and randomForest are better techniques here: they are robust to 
>>> multicollinearity, though they are data-hungry methods. Just an opinion...
>>> 
>>> Hope it helps,
>>> Wilfried
>>> 
>>> 
>>> 
>>> On 26 Jan 2010, at 14:51, Popko wrote:
>>> 
>>>> Dear colleagues,
>>>> 
>>>> Looking at my Evaluation.results I noticed that predictive accuracy of
>>>> Calibration (called Cross.validation in output table) is lower than the
>>>> predictive accuracy using Evaluation (called "indepdt.data" in output
>>>> table). This is true for all 63 species I've analyzed, and independent of
>>>> method (ROC, kappa, TSS).
>>>> Q: How is this possible?
>>>> 
>>>> This is the model I ran:
>>>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>>>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>>>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>>>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>>>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>>>> nb.absences=1000)
>>>> 
>>>> Data consisted of presence/absence data with ca. 2500 cases per species.
>>>> Some 70 environmental variables were entered in the model.
>>>> 
>>>> Eagerly awaiting your responses,
>>>> 
>>>> Popko Wiersma
>>>> SOVON Dutch Centre for Field Ornithology
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Biomod-commits mailing list
>>>> Biomod-commits at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
>>> 
>>> --------------------------
>>> Dr. Wilfried Thuiller
>>> Laboratoire d'Ecologie Alpine, UMR CNRS 5553
>>> Université Joseph Fourier
>>> BP53, 38041 Grenoble cedex 9, France
>>> tel: +33 (0)4 76 63 54 53
>>> fax: +33 (0)4 76 51 42 79
>>> 
>>> Email: wilfried.thuiller at ujf-grenoble.fr
>>> Home page: http://www.will.chez-alice.fr
>>> Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm
>>> 
>>> FP6 European MACIS project: http://www.macis-project.net
>>> FP6 European EcoChange project: http://www.ecochange-project.eu
>>> 
>>> 
>>> 
>>> 
>>> 

--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 63 54 53
fax: +33 (0)4 76 51 42 79

Email: wilfried.thuiller at ujf-grenoble.fr
Home page: http://www.will.chez-alice.fr
Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm

FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu
