[Biomod-commits] predictive accuracy issue
Wilfried Thuiller
wilfried.thuiller at ujf-grenoble.fr
Tue Jan 26 16:36:49 CET 2010
Oops...
Biomod.Manual()
without the "s". There is obviously only one manual ;-)
On 26 Jan 2010, at 16:34, Wilfried Thuiller wrote:
> First of all, you should read the BIOMOD manual. It is not entirely up to date, but it should give you the basic principles.
>
> Just type the following in the R console:
>
> > Biomod.Manuals()
>
> Secondly, BIOMOD does the splitting procedure for you; you do not have to do it yourself.
> With "NbRunEval=3", the splitting procedure is repeated three times. Each time the data are split, the models are calibrated on 80% of the data (DataSplit=80) and evaluated on the remaining 20%. Once this has been done three times, only the average evaluation is recorded (in the Cross.validation column).
>
> It is only when you have TRULY independent data that you should use the IndependentResponse argument.
>
> So, what you should run is:
>
>> Initial.State(Response = data_1spec[2], Explanatory = data_1spec[,5:lastcol2], sp.name=i)
>>
>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>> nb.absences=1000)
>
>
> Best
> Wilfried
>
> On 26 Jan 2010, at 15:57, Popko wrote:
>
>> In reply to your (swift) response: I guess I'm still missing something, but
>> I did not have a 'completely independent dataset' as you assumed; instead I
>> split my dataset (80/20). I assumed that the 20% would be used as if it were
>> independent data. This should then have given, on average, equal results
>> in the first and second columns (Cross.validation and indepdt.data,
>> respectively). In fact, the validation results on the independent data are
>> always higher (by 7% on average).
>>
>> FYI:
>> Initial.State(Response = data_1spec[2], Explanatory =
>> data_1spec[,5:lastcol2], IndependentResponse = data_1spec[2],
>> IndependentExplanatory = data_1spec[,5:lastcol2], sp.name=i)
>>
>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>> nb.absences=1000)
>>
>> Popko Wiersma
>> SOVON Dutch Centre for Field Ornithology
>>
>> --------------------------------------------------
>> From: "Wilfried Thuiller" <wilfried.thuiller at ujf-grenoble.fr>
>> Sent: Tuesday, January 26, 2010 3:08 PM
>> To: "Popko" <popkowiersma at hotmail.com>
>> Cc: <biomod-commits at r-forge.wu-wien.ac.at>
>> Subject: Re: [Biomod-commits] predictive accuracy issue
>>
>>> Dear Popko,
>>>
>>> The predictive accuracy estimated during the calibration phase is NOT
>>> computed on the calibration data (80% of the data in your case) but on the
>>> remaining part (20% in your case). The Cross.validation value is thus the
>>> mean of the evaluations on the 20% (in your case, a mean over 3
>>> repetitions x 2 pseudo-absence runs). It is therefore not really
>>> surprising that the predictive accuracy estimated on a completely
>>> independent dataset (indepdt.data) is higher than the one estimated on 20%
>>> of the initial dataset.
>>> Moreover, you should quantify what you call "lower" or "higher": TSS,
>>> Kappa and AUC are indices, not statistics, and should not be treated as
>>> statistical tests.
>>>
>>> Finally, if I understood correctly, you calibrated your models using 70
>>> different variables? What is the purpose of using so many variables? I
>>> suppose that many of them are correlated. GLMs, and especially stepwise
>>> regressions, do not deal well with a large number of correlated variables.
>>> GBM or randomForest are better techniques: they are robust to
>>> multicollinearity and are data-hungry methods. Just an opinion...
>>>
>>> Hope it helps,
>>> Wilfried
>>>
>>> On 26 Jan 2010, at 14:51, Popko wrote:
>>>
>>>> Dear colleagues,
>>>>
>>>> Looking at my Evaluation.results, I noticed that the predictive accuracy
>>>> of the calibration (called Cross.validation in the output table) is lower
>>>> than the predictive accuracy on the evaluation data (called "indepdt.data"
>>>> in the output table). This is true for all 63 species I've analyzed,
>>>> regardless of the evaluation metric (ROC, Kappa, TSS).
>>>> Q: How is this possible?
>>>>
>>>> This is the model I ran:
>>>> Models(GLM=T, TypeGLM="quad", Test="AIC", GBM=F, No.trees=2000, GAM=F,
>>>> Spline=3, CTA=F, CV.tree=50, ANN=F, CV.ann=2, SRE=F, Perc025=T, Perc05=F,
>>>> MDA=F, MARS=F, RF=F, NbRunEval=3, DataSplit=80, Yweights=NULL, Roc=T,
>>>> Optimized.Threshold.Roc=T, Kappa=T, TSS=T, KeepPredIndependent=T,
>>>> VarImport=5, NbRepPA=2, strategy="circles", coor=CoorXY, distance=2,
>>>> nb.absences=1000)
>>>>
>>>> The data consisted of presence/absence records, ca. 2500 cases per species.
>>>> Some 70 environmental variables were entered in the model.
>>>>
>>>> Eagerly awaiting your responses,
>>>>
>>>> Popko Wiersma
>>>> SOVON Dutch Centre for Field Ornithology
>>>>
>>>>
>>>> _______________________________________________
>>>> Biomod-commits mailing list
>>>> Biomod-commits at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
>>>
>>> --------------------------
>>> Dr. Wilfried Thuiller
>>> Laboratoire d'Ecologie Alpine, UMR CNRS 5553
>>> Université Joseph Fourier
>>> BP53, 38041 Grenoble cedex 9, France
>>> tel: +33 (0)4 76 63 54 53
>>> fax: +33 (0)4 76 51 42 79
>>>
>>> Email: wilfried.thuiller at ujf-grenoble.fr
>>> Home page: http://www.will.chez-alice.fr
>>> Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm
>>>
>>> FP6 European MACIS project: http://www.macis-project.net
>>> FP6 European EcoChange project: http://www.ecochange-project.eu
>>>
>
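The evaluation scheme described in the thread (NbRunEval=3, DataSplit=80: calibrate on 80%, evaluate on the held-out 20%, repeat three times and keep only the average) can be sketched in base R. This is a minimal illustration under stated assumptions, not BIOMOD's internal code: the data are synthetic, the helper names auc() and split_sample_auc() are hypothetical, and a quadratic GLM stands in for TypeGLM="quad". For comparison it also scores a model on the same data it was calibrated on, which roughly mimics passing the calibration data again as IndependentResponse/IndependentExplanatory (as in Popko's Initial.State call). Such a resubstitution score is usually optimistic, although with a simple, correctly specified model like this one the gap can be small.

```r
# Minimal sketch of repeated split-sample evaluation, mimicking
# NbRunEval = 3 and DataSplit = 80. Base R only; NOT BIOMOD's internal code.
# The data are synthetic and the helper names are hypothetical.

# AUC computed from the Mann-Whitney rank statistic
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Synthetic presence/absence data (2500 cases, 2 predictors)
set.seed(1)
n <- 2500
env <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat <- data.frame(pa = rbinom(n, 1, plogis(-0.5 + 1.2 * env$x1 - 0.8 * env$x2^2)), env)

# Quadratic GLM, standing in for TypeGLM = "quad"
glm_formula <- pa ~ x1 + I(x1^2) + x2 + I(x2^2)

split_sample_auc <- function(dat, n_run = 3, data_split = 80) {
  scores <- numeric(n_run)
  for (i in seq_len(n_run)) {
    # Calibrate on data_split % of the rows ...
    calib <- sample(nrow(dat), size = round(nrow(dat) * data_split / 100))
    fit <- glm(glm_formula, family = binomial, data = dat[calib, ])
    # ... and evaluate on the remaining rows only
    held_out <- dat[-calib, ]
    scores[i] <- auc(held_out$pa, predict(fit, newdata = held_out, type = "response"))
  }
  mean(scores)  # only the average over the runs is kept
}

cv_auc <- split_sample_auc(dat)

# Resubstitution: calibrate on all data and evaluate on the same data,
# i.e. treating the calibration data as if they were independent
full_fit <- glm(glm_formula, family = binomial, data = dat)
resub_auc <- auc(dat$pa, fitted(full_fit))

cat("split-sample AUC:", round(cv_auc, 3), " resubstitution AUC:", round(resub_auc, 3), "\n")
```

As Wilfried notes, the two numbers are indices to be compared with judgement, not the outcome of a statistical test.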
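Wilfried's remark that many of the 70 predictors are probably correlated can be checked before modelling. A minimal base-R sketch, assuming a hypothetical helper correlated_pairs() and the common but arbitrary |r| > 0.7 cut-off (not a BIOMOD setting); the example data are synthetic:

```r
# Minimal sketch: list pairs of explanatory variables whose absolute Pearson
# correlation exceeds a threshold. correlated_pairs() is a hypothetical helper;
# the 0.7 cut-off is a rule of thumb, not a BIOMOD default.
correlated_pairs <- function(expl, threshold = 0.7) {
  r <- cor(expl, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and drops the diagonal
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, 1]],
             var2 = colnames(r)[idx[, 2]],
             r = r[idx],
             stringsAsFactors = FALSE)
}

# Synthetic example: temp_max is built from temp, precip is unrelated
set.seed(2)
temp <- rnorm(500)
expl <- data.frame(temp = temp,
                   temp_max = temp + rnorm(500, sd = 0.2),
                   precip = rnorm(500))
pairs_found <- correlated_pairs(expl)
print(pairs_found)
```

With the objects from the thread, the call would be correlated_pairs(data_1spec[, 5:lastcol2]), assuming those columns are all numeric.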