[Biomod-commits] binary consensus output

Thu Jan 20 22:14:28 CET 2011

Dear Sami,

Funny enough, despite the answer to your email is relatively easy, by looking at the code, I discovered a problem that I just fixed. Thanks a lot! 

You understood almost correctly. 

If in the Ensemble.Forecasting call, you have "repetition.models=T", then for the binary transformation apply to Total.Consensus, we take the AVERAGE of the thresholds (and not only PA1). In your example, PA1, PA1_rep1, and PA1_rep2).

If you do not want the repetition added to the ensemble forecast, just replace "repetition.models=F".

Alternatively, you can also remove the projections made with the full model (run on all data) and only keep the average over the different repetitions (made on sub-parts of the data). For this you just to have to put: final.model.out = TRUE. 
There was a bug here, a local variable that was removed accidentally during the process. I fixed it. 

Look at your example below where I run the two cases. Everything works fine. 

In the mean time, I have also modified the two functions Ensemble.Forecasting and Ensemble.Forecasting.raster to allow the binary transformation for the median as well. The threshold is set to the median of the thresholds.

I have uploaded the new functions on R-Forge tonight (rev 258, version 1.1-6.3). Because R-Forge is always a bit long to compile the packages, I have also attached the two updated scripts. If you want to use them already, first load BIOMOD, and then "source" the two scripts. 
Please note that I have not tested the Ensemble.Forecasting.raster with these changes. Any bug report is more than welcome (only the binary transformation by the median has been changed). 

library(BIOMOD)
source("PATH/Ensemble.Forecasting")  ### make sure  to set up the correct path to the file)

Hope it helps,
Wilfried

#####################################################
#### Examples to show the results with and without the "full model".

library(BIOMOD)
data(Sp.Env)

Initial.State(Response=Sp.Env[, c(17:18)], Explanatory = Sp.Env[,c(4:10)], IndependentResponse = NULL, IndependentExplanatory = NULL)

Models(GLM = T, TypeGLM = "poly", Test = "AIC", GBM = F, No.trees = 3000,GAM = F, Spline = 3,CTA = T, CV.tree = 50,ANN = F, CV.ann = 2,FDA = F,SRE = F, quant=0.025,MARS = F,RF = T,NbRunEval = 2, DataSplit = 70,Yweights=NULL, Roc=TRUE,Optimized.Threshold.Roc=TRUE,Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,NbRepPA=1,strategy="random",nb.absences=1000)

Projection(Proj = Sp.Env[,4:10],Proj.name='present',GLM = T,GBM = F,GAM = F,CTA = T,ANN = F,FDA = F,SRE = F, quant=0.025,MARS = F,RF = T,BinRoc = T, BinKappa = F, BinTSS = F,FiltRoc = F, FiltKappa = F, FiltTSS = F,repetition.models=T)

Ensemble.Forecasting(Proj.name= "present",weight.method='Roc',PCA.median=F,binary=T,bin.method='Roc',Test=T,decay=1.6,repetition.models=T, , final.model.out=F)

# check outputs:

# binary output:
load("proj.present/Total_consensus_present_Bin")
binary_weighted_average <- Total_consensus_present_Bin[,,3]

apply(binary_weighted_average, 2, sum)

load("proj.present/Total_consensus_present")
probs_weighted_average <- Total_consensus_present[,,3]

load("proj.present/consensus_present_results")

sum(BinaryTransformation(probs_weighted_average[,1], mean(consensus_present_results[[1]][[3]][2,])))
sum(BinaryTransformation(probs_weighted_average[,2], mean(consensus_present_results[[2]][[3]][2,])))

#### WITHOUT FULL MODEL

Ensemble.Forecasting(Proj.name= "present",weight.method='Roc',PCA.median=F,binary=T,bin.method='Roc',Test=T,decay=1.6,repetition.models=T, final.model.out=T)

# check outputs:
# binary output: WITHOUT FULL MODELS
load("proj.present/Total_consensus_present_Bin")
binary_weighted_average <- Total_consensus_present_Bin[,,3]

apply(binary_weighted_average, 2, sum)

load("proj.present/Total_consensus_present")
probs_weighted_average <- Total_consensus_present[,,3]

load("proj.present/consensus_present_results")

sum(BinaryTransformation(probs_weighted_average[,1], mean(consensus_present_results[[1]][[3]][3,2:3])))
sum(BinaryTransformation(probs_weighted_average[,2], mean(consensus_present_results[[2]][[3]][3,2:3])))

Le 20 janv. 2011 à 19:12, Sami Domisch a écrit :

> Dear modellers, 
> 
> I have a question related to the method, how BIOMOD creates the binary consensus predictions. I played a bit around with the test data which comes with the package, and modelled the present distribution of 2 species (Sp185, Sp191) with 3 algorithms (GLM, CTA, RF x 2 repetitions each, one pseudo-absence-run to keep it simple and fast..). I used the Projection-function using the same present-day variables in order to receive the present-day distribution for the whole study area. Subsequently I used the Ensemble.Forecasting-function to get a consensus model using weighted averages (prob.mean.weighted, weight decay 1.6). So far nothing special about it, I pasted the code below.
> 
> I now compared the binary consensus output BIOMOD created for the two species (i.e. prob.mean.weighted in "Total_consensus_present_Bin") with the probability-output (0-1000) of the consensus prob.mean.weighted-model, after applying the prob.mean.weighted - threshold which is given in the "consensus_present_results"-table (PA1, which used the total data of the two repetitions PA1_rep1 and PA1_rep2). I thus created the binary results manually.
> 
> Now here is my problem: the number of presence-pixels ("1") differ between the two outputs, although they should be identical. For instance, Sp185 and Sp191 have 736 and 1217 presence-pixels, respectively, whereas the manually calculated ones have 678 and 1149 pixels classified as "1". Shouldn't the numbers be the same? How is BIOMOD creating the binary results, did I miss something? I guess this derives from the partitioning of the train/test-data vs. using the total data? 
> 
> I am interested in a solution since I want to average several projections for one species based on different climate scenarios, and binary maps would be essential for me. The idea was quite simple: to average the probabilities of the different climate-scenario runs for each grid cell, and then average the thresholds of these runs to get the binary outputs. And to check this method, I compared the BIOMOD-output and the manually calculated one. However there seems to be some kind of discrepancy...Has anybody a solution or maybe knows a work-around for this issue?
> Any help is appreciated, many thanks in advance!
> Sami
> 
> 
> #############
> 
> load("Sp.Env.rda")
> 
> library(BIOMOD)
> 
> Initial.State(Response=Sp.Env[, c(17:18)], Explanatory = Sp.Env[,c(4:10)],
>        IndependentResponse = NULL, IndependentExplanatory = NULL)
> 
> Models(    GLM = T, TypeGLM = "poly", Test = "AIC", 
>        GBM = F, No.trees = 3000,
>        GAM = F, Spline = 3,
>        CTA = T, CV.tree = 50,
>        ANN = F, CV.ann = 2,
>        FDA = F,
>        SRE = F, quant=0.025,
>        MARS = F,
>        RF = T,
>        NbRunEval = 2, DataSplit = 70,
>        Yweights=NULL, Roc=TRUE, Optimized.Threshold.Roc=TRUE,
>        Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,
>        NbRepPA=1, strategy="random",
>        nb.absences=1000)
> 
> 
> Projection(Proj = Sp.Env[,4:10],
>            Proj.name='present',
>            GLM = T,
>            GBM = F,
>            GAM = F,
>            CTA = T,
>            ANN = F,
>            FDA = F,
>            SRE = F, quant=0.025,
>            MARS = F,
>            RF = T,
>            BinRoc = T, BinKappa = F, BinTSS = F,
>            FiltRoc = F, FiltKappa = F, FiltTSS = F,
>            repetition.models=T)
> 
> 
> Ensemble.Forecasting(Proj.name= "present",
>            weight.method='Roc',
>            PCA.median=F,
>            binary=T,
>            bin.method='Roc',
>            Test=T,
>            decay=1.6,
>            repetition.models=T)
> 
> 
> # check outputs:
> 
> # binary output:
> load("proj.present/Total_consensus_present_Bin")
> binary_weighted_average <- Total_consensus_present_Bin[,,2]
> write.csv( binary_weighted_average, "binary_weighted_average.csv")
> 
> 
> # probs 0-1000:
> load("proj.present/Total_consensus_present")
> probs_weighted_average <- Total_consensus_present[,,2]
> write.csv(probs_weighted_average, "probs_weighted_average.csv")
> 
> # get appropriate threshold for prob.mean.weighted:
> consensus_present_results
> 
> #############
> 
> _______________________________________________
> Biomod-commits mailing list
> Biomod-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits

--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 51 44 97
fax: +33 (0)4 76 51 42 79

Email: wilfried.thuiller at ujf-grenoble.fr
Home page: http://www.will.chez-alice.fr
Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm

FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/biomod-commits/attachments/20110120/6802bc9c/attachment-0001.htm>