[Biomod-commits] binary consensus output

Fri Jan 21 12:40:47 CET 2011

Hi Wilfried,

Just wanted to double-check something - you say below that the full model
projections can be removed by setting final.model.out to TRUE.  However, the
help page for the functions says "final.model.out 	set to True if you
want the total ensemble to be build with the final models taken into
account".  Which is right?

Many thanks,
Raquel 

-----Original Message-----
From: biomod-commits-bounces at lists.r-forge.r-project.org
[mailto:biomod-commits-bounces at lists.r-forge.r-project.org] On Behalf Of
biomod-commits-request at lists.r-forge.r-project.org
Sent: 20 January 2011 09:21 PM
To: biomod-commits at lists.r-forge.r-project.org
Subject: Biomod-commits Digest, Vol 21, Issue 2

Send Biomod-commits mailing list submissions to
	biomod-commits at lists.r-forge.r-project.org

To subscribe or unsubscribe via the World Wide Web, visit

https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits

or, via email, send a message with subject or body 'help' to
	biomod-commits-request at lists.r-forge.r-project.org

You can reach the person managing the list at
	biomod-commits-owner at lists.r-forge.r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Biomod-commits digest..."

Today's Topics:

   1. binary consensus output (Sami Domisch)
   2. Re: binary consensus output (Wilfried Thuiller)

----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Jan 2011 19:12:40 +0100
From: "Sami Domisch" <Sami.Domisch at senckenberg.de>
Subject: [Biomod-commits] binary consensus output
To: <biomod-commits at lists.r-forge.r-project.org>
Message-ID: <4D388928020000CD000029BD at snggwia.senckenberg.de>
Content-Type: text/plain; charset=US-ASCII

Dear modellers, 

I have a question related to the method, how BIOMOD creates the binary
consensus predictions. I played a bit around with the test data which comes
with the package, and modelled the present distribution of 2 species (Sp185,
Sp191) with 3 algorithms (GLM, CTA, RF x 2 repetitions each, one
pseudo-absence-run to keep it simple and fast..). I used the
Projection-function using the same present-day variables in order to receive
the present-day distribution for the whole study area. Subsequently I used
the Ensemble.Forecasting-function to get a consensus model using weighted
averages (prob.mean.weighted, weight decay 1.6). So far nothing special
about it, I pasted the code below.

I now compared the binary consensus output BIOMOD created for the two
species (i.e. prob.mean.weighted in "Total_consensus_present_Bin") with the
probability-output (0-1000) of the consensus prob.mean.weighted-model, after
applying the prob.mean.weighted - threshold which is given in the
"consensus_present_results"-table (PA1, which used the total data of the two
repetitions PA1_rep1 and PA1_rep2). I thus created the binary results
manually.

Now here is my problem: the number of presence-pixels ("1") differ between
the two outputs, although they should be identical. For instance, Sp185 and
Sp191 have 736 and 1217 presence-pixels, respectively, whereas the manually
calculated ones have 678 and 1149 pixels classified as "1". Shouldn't the
numbers be the same? How is BIOMOD creating the binary results, did I miss
something? I guess this derives from the partitioning of the train/test-data
vs. using the total data? 

I am interested in a solution since I want to average several projections
for one species based on different climate scenarios, and binary maps would
be essential for me. The idea was quite simple: to average the probabilities
of the different climate-scenario runs for each grid cell, and then average
the thresholds of these runs to get the binary outputs. And to check this
method, I compared the BIOMOD-output and the manually calculated one.
However there seems to be some kind of discrepancy...Has anybody a solution
or maybe knows a work-around for this issue?
Any help is appreciated, many thanks in advance!
Sami

#############

load("Sp.Env.rda")

library(BIOMOD)

Initial.State(Response=Sp.Env[, c(17:18)], Explanatory = Sp.Env[,c(4:10)],
        IndependentResponse = NULL, IndependentExplanatory = NULL)

Models(    GLM = T, TypeGLM = "poly", Test = "AIC", 
        GBM = F, No.trees = 3000,
        GAM = F, Spline = 3,
        CTA = T, CV.tree = 50,
        ANN = F, CV.ann = 2,
        FDA = F,
        SRE = F, quant=0.025,
        MARS = F,
        RF = T,
        NbRunEval = 2, DataSplit = 70,
        Yweights=NULL, Roc=TRUE, Optimized.Threshold.Roc=TRUE,
        Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,
        NbRepPA=1, strategy="random",
        nb.absences=1000)

Projection(Proj = Sp.Env[,4:10],
            Proj.name='present',
            GLM = T,
            GBM = F,
            GAM = F,
            CTA = T,
            ANN = F,
            FDA = F,
            SRE = F, quant=0.025,
            MARS = F,
            RF = T,
            BinRoc = T, BinKappa = F, BinTSS = F,
            FiltRoc = F, FiltKappa = F, FiltTSS = F,
            repetition.models=T)

Ensemble.Forecasting(Proj.name= "present",
            weight.method='Roc',
            PCA.median=F,
            binary=T,
            bin.method='Roc',
            Test=T,
            decay=1.6,
            repetition.models=T)

# check outputs:

# binary output:
load("proj.present/Total_consensus_present_Bin")
binary_weighted_average <- Total_consensus_present_Bin[,,2]
write.csv( binary_weighted_average, "binary_weighted_average.csv")

# probs 0-1000:
load("proj.present/Total_consensus_present")
probs_weighted_average <- Total_consensus_present[,,2]
write.csv(probs_weighted_average, "probs_weighted_average.csv")

# get appropriate threshold for prob.mean.weighted:
consensus_present_results

#############

------------------------------

Message: 2
Date: Thu, 20 Jan 2011 22:14:28 +0100
From: Wilfried Thuiller <wilfried.thuiller at ujf-grenoble.fr>
Subject: Re: [Biomod-commits] binary consensus output
To: Sami Domisch <Sami.Domisch at senckenberg.de>
Cc: biomod-commits at r-forge.wu-wien.ac.at
Message-ID: <43030340-C1C6-4B95-A47A-B13AEFCDD510 at ujf-grenoble.fr>
Content-Type: text/plain; charset="iso-8859-1"

Dear Sami,

Funny enough, despite the answer to your email is relatively easy, by
looking at the code, I discovered a problem that I just fixed. Thanks a lot!

You understood almost correctly. 

If in the Ensemble.Forecasting call, you have "repetition.models=T", then
for the binary transformation apply to Total.Consensus, we take the AVERAGE
of the thresholds (and not only PA1). In your example, PA1, PA1_rep1, and
PA1_rep2).

If you do not want the repetition added to the ensemble forecast, just
replace "repetition.models=F".

Alternatively, you can also remove the projections made with the full model
(run on all data) and only keep the average over the different repetitions
(made on sub-parts of the data). For this you just to have to put:
final.model.out = TRUE. 
There was a bug here, a local variable that was removed accidentally during
the process. I fixed it. 

Look at your example below where I run the two cases. Everything works fine.

In the mean time, I have also modified the two functions
Ensemble.Forecasting and Ensemble.Forecasting.raster to allow the binary
transformation for the median as well. The threshold is set to the median of
the thresholds.

I have uploaded the new functions on R-Forge tonight (rev 258, version
1.1-6.3). Because R-Forge is always a bit long to compile the packages, I
have also attached the two updated scripts. If you want to use them already,
first load BIOMOD, and then "source" the two scripts. 
Please note that I have not tested the Ensemble.Forecasting.raster with
these changes. Any bug report is more than welcome (only the binary
transformation by the median has been changed). 

library(BIOMOD)
source("PATH/Ensemble.Forecasting")  ### make sure  to set up the correct
path to the file)

Hope it helps,
Wilfried

#####################################################
#### Examples to show the results with and without the "full model".

library(BIOMOD)
data(Sp.Env)

Initial.State(Response=Sp.Env[, c(17:18)], Explanatory = Sp.Env[,c(4:10)],
IndependentResponse = NULL, IndependentExplanatory = NULL)

Models(GLM = T, TypeGLM = "poly", Test = "AIC", GBM = F, No.trees = 3000,GAM
= F, Spline = 3,CTA = T, CV.tree = 50,ANN = F, CV.ann = 2,FDA = F,SRE = F,
quant=0.025,MARS = F,RF = T,NbRunEval = 2, DataSplit = 70,Yweights=NULL,
Roc=TRUE,Optimized.Threshold.Roc=TRUE,Kappa=TRUE, TSS=TRUE,
KeepPredIndependent = FALSE,
VarImport=5,NbRepPA=1,strategy="random",nb.absences=1000)

Projection(Proj = Sp.Env[,4:10],Proj.name='present',GLM = T,GBM = F,GAM =
F,CTA = T,ANN = F,FDA = F,SRE = F, quant=0.025,MARS = F,RF = T,BinRoc = T,
BinKappa = F, BinTSS = F,FiltRoc = F, FiltKappa = F, FiltTSS =
F,repetition.models=T)

Ensemble.Forecasting(Proj.name=
"present",weight.method='Roc',PCA.median=F,binary=T,bin.method='Roc',Test=T,
decay=1.6,repetition.models=T, , final.model.out=F)

# check outputs:

# binary output:
load("proj.present/Total_consensus_present_Bin")
binary_weighted_average <- Total_consensus_present_Bin[,,3]

apply(binary_weighted_average, 2, sum)

load("proj.present/Total_consensus_present")
probs_weighted_average <- Total_consensus_present[,,3]

load("proj.present/consensus_present_results")

sum(BinaryTransformation(probs_weighted_average[,1],
mean(consensus_present_results[[1]][[3]][2,])))
sum(BinaryTransformation(probs_weighted_average[,2],
mean(consensus_present_results[[2]][[3]][2,])))

#### WITHOUT FULL MODEL

Ensemble.Forecasting(Proj.name=
"present",weight.method='Roc',PCA.median=F,binary=T,bin.method='Roc',Test=T,
decay=1.6,repetition.models=T, final.model.out=T)

# check outputs:
# binary output: WITHOUT FULL MODELS
load("proj.present/Total_consensus_present_Bin")
binary_weighted_average <- Total_consensus_present_Bin[,,3]

apply(binary_weighted_average, 2, sum)

load("proj.present/Total_consensus_present")
probs_weighted_average <- Total_consensus_present[,,3]

load("proj.present/consensus_present_results")

sum(BinaryTransformation(probs_weighted_average[,1],
mean(consensus_present_results[[1]][[3]][3,2:3])))
sum(BinaryTransformation(probs_weighted_average[,2],
mean(consensus_present_results[[2]][[3]][3,2:3])))

Le 20 janv. 2011 ? 19:12, Sami Domisch a ?crit :

> Dear modellers, 
> 
> I have a question related to the method, how BIOMOD creates the binary
consensus predictions. I played a bit around with the test data which comes
with the package, and modelled the present distribution of 2 species (Sp185,
Sp191) with 3 algorithms (GLM, CTA, RF x 2 repetitions each, one
pseudo-absence-run to keep it simple and fast..). I used the
Projection-function using the same present-day variables in order to receive
the present-day distribution for the whole study area. Subsequently I used
the Ensemble.Forecasting-function to get a consensus model using weighted
averages (prob.mean.weighted, weight decay 1.6). So far nothing special
about it, I pasted the code below.
> 
> I now compared the binary consensus output BIOMOD created for the two
species (i.e. prob.mean.weighted in "Total_consensus_present_Bin") with the
probability-output (0-1000) of the consensus prob.mean.weighted-model, after
applying the prob.mean.weighted - threshold which is given in the
"consensus_present_results"-table (PA1, which used the total data of the two
repetitions PA1_rep1 and PA1_rep2). I thus created the binary results
manually.
> 
> Now here is my problem: the number of presence-pixels ("1") differ between
the two outputs, although they should be identical. For instance, Sp185 and
Sp191 have 736 and 1217 presence-pixels, respectively, whereas the manually
calculated ones have 678 and 1149 pixels classified as "1". Shouldn't the
numbers be the same? How is BIOMOD creating the binary results, did I miss
something? I guess this derives from the partitioning of the train/test-data
vs. using the total data? 
> 
> I am interested in a solution since I want to average several projections
for one species based on different climate scenarios, and binary maps would
be essential for me. The idea was quite simple: to average the probabilities
of the different climate-scenario runs for each grid cell, and then average
the thresholds of these runs to get the binary outputs. And to check this
method, I compared the BIOMOD-output and the manually calculated one.
However there seems to be some kind of discrepancy...Has anybody a solution
or maybe knows a work-around for this issue?
> Any help is appreciated, many thanks in advance!
> Sami
> 
> 
> #############
> 
> load("Sp.Env.rda")
> 
> library(BIOMOD)
> 
> Initial.State(Response=Sp.Env[, c(17:18)], Explanatory = Sp.Env[,c(4:10)],
>        IndependentResponse = NULL, IndependentExplanatory = NULL)
> 
> Models(    GLM = T, TypeGLM = "poly", Test = "AIC", 
>        GBM = F, No.trees = 3000,
>        GAM = F, Spline = 3,
>        CTA = T, CV.tree = 50,
>        ANN = F, CV.ann = 2,
>        FDA = F,
>        SRE = F, quant=0.025,
>        MARS = F,
>        RF = T,
>        NbRunEval = 2, DataSplit = 70,
>        Yweights=NULL, Roc=TRUE, Optimized.Threshold.Roc=TRUE,
>        Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,
>        NbRepPA=1, strategy="random",
>        nb.absences=1000)
> 
> 
> Projection(Proj = Sp.Env[,4:10],
>            Proj.name='present',
>            GLM = T,
>            GBM = F,
>            GAM = F,
>            CTA = T,
>            ANN = F,
>            FDA = F,
>            SRE = F, quant=0.025,
>            MARS = F,
>            RF = T,
>            BinRoc = T, BinKappa = F, BinTSS = F,
>            FiltRoc = F, FiltKappa = F, FiltTSS = F,
>            repetition.models=T)
> 
> 
> Ensemble.Forecasting(Proj.name= "present",
>            weight.method='Roc',
>            PCA.median=F,
>            binary=T,
>            bin.method='Roc',
>            Test=T,
>            decay=1.6,
>            repetition.models=T)
> 
> 
> # check outputs:
> 
> # binary output:
> load("proj.present/Total_consensus_present_Bin")
> binary_weighted_average <- Total_consensus_present_Bin[,,2]
> write.csv( binary_weighted_average, "binary_weighted_average.csv")
> 
> 
> # probs 0-1000:
> load("proj.present/Total_consensus_present")
> probs_weighted_average <- Total_consensus_present[,,2]
> write.csv(probs_weighted_average, "probs_weighted_average.csv")
> 
> # get appropriate threshold for prob.mean.weighted:
> consensus_present_results
> 
> #############
> 
> _______________________________________________
> Biomod-commits mailing list
> Biomod-commits at lists.r-forge.r-project.org
>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits

--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Universit? Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 51 44 97
fax: +33 (0)4 76 51 42 79

Email: wilfried.thuiller at ujf-grenoble.fr
Home page: http://www.will.chez-alice.fr
Website: http://www-leca.ujf-grenoble.fr/equipes/tde.htm

FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.r-forge.r-project.org/pipermail/biomod-commits/attachments/201
10120/6802bc9c/attachment.htm>

------------------------------

_______________________________________________
Biomod-commits mailing list
Biomod-commits at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits

End of Biomod-commits Digest, Vol 21, Issue 2
*********************************************