No subject
Mon Feb 21 17:26:18 CET 2011
lance of 0.1. Is that correct? I just want to be sure that there is no
weighting of absence records (e.g. weighting to simulate a prevalence of 0.5).
Thank you,
Brenna
_______________________________________________
Biomod-commits mailing list
Biomod-commits at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits
--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 51 44 97
fax: +33 (0)4 76 51 42 79
Email: wilfried.thuiller at ujf-grenoble.fr
Personal website: http://www.will.chez-alice.fr
Team website: http://www-leca.ujf-grenoble.fr/equipes/emabio.htm
FP6 European MACIS project: http://www.macis-project.net
FP6 European EcoChange project: http://www.ecochange-project.eu
<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style id="owaParaStyle" type="text/css">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
P {margin-top:0;margin-bottom:0;}</style>
</head>
<body ocsi="0" fpstyle="1" style="word-wrap: break-word;">
<div style="direction: ltr; font-family: Helvetica; color: rgb(0, 0, 0); font-size: 10pt;">
Thank you Wilfried; your answers are extremely helpful!<br>
<br>
I was originally modeling my species with presences = pseudo-absences, based on my reading of the literature. Then I found the Jimenez-Valverde, Lobo and Hortal paper in Community Ecology (2009), titled "The effect of prevalence and its interaction with sample size on the reliability of species distribution models". Using a virtual species, they found that biased prevalence only has a significant effect with extremely unbalanced samples, given many caveats (such as reliable training data and relevant predictors). In practice, they recommend using as large a sample size as possible, to improve model stability and sample coverage over the environmental and spatial gradients of the study area. This includes using as many absences as possible, down to a prevalence of 0.01. They include discussion of appropriate use of probabilities (since they are skewed due to prevalence) and appropriate assessment of model performance (e.g. don't use kappa).<br>
<br>
Anyway - it is a very interesting paper and made me want to try modeling my species using a prevalence of 0.1. So I ran my models in three ways:<br>
<br>
prevalence = 0.1 (presence = 304 / PA = 2736)<br>
prevalence = 0.5 (presence = 304 / PA = 304)<br>
prevalence = 0.5 (presence = 304 / PA = 2736 weighted)<br>
<br>
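As an illustration of the "weighted" run, here is a minimal R sketch of one common way to weight 2736 pseudo-absences so that they count as much as 304 presences. This is an assumption for illustration only, not taken from BIOMOD's source; the variable names are hypothetical.<br>
<pre>
```r
# Hypothetical sketch: case weights that make 2736 pseudo-absences carry
# the same total weight as 304 presences (weighted prevalence of 0.5).
n_pres <- 304
n_abs  <- 2736

# 1 = presence, 0 = pseudo-absence
status <- c(rep(1, n_pres), rep(0, n_abs))

# Prevalence of the unweighted dataset
raw_prevalence <- mean(status)

# Presences keep weight 1; each pseudo-absence gets weight n_pres / n_abs
w <- ifelse(status == 1, 1, n_pres / n_abs)

# Weighted prevalence: share of the total weight carried by presences
weighted_prevalence <- sum(w[status == 1]) / sum(w)

print(raw_prevalence)
print(weighted_prevalence)
```
</pre>
A weight vector like <span style="font-family: 'Courier New';">w</span> is the kind of object that could be passed as case weights to a model-fitting function that accepts them.<br>
<br>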
I compared ROC and TSS scores for cross-validation, sensitivity and specificity. Models with prevalence = 0.1 had the best specificity scores, but the worst CV and sensitivity. Prevalence = 0.5 (304/304) had the best CV and sensitivity scores (with one exception), with specificity second to the prevalence = 0.1 models. Prevalence = 0.5 (304/2736 weighted) was in the middle. Most of these differences were relatively small.<br>
<br>
Right now I'm assessing the stability of my 304/304 models to PA pulls, since 304 PAs sample only a small number of the possible absences (total grid cells in my study area = 6808).<br>
<br>
With real (not virtual) data sets, there are obviously many interacting factors that influence final CV/sens/spec scores. I was surprised to see the relatively small differences made by changing prevalence and the number of PA records. I'd be interested to hear your and others' thoughts on these issues. I wonder how your upcoming paper using virtual datasets compares with the Jimenez-Valverde et al. paper?<br>
<br>
Thanks for an interesting discussion!<br>
Brenna<br>
<br>
<div style="font-family: Times New Roman; color: rgb(0, 0, 0); font-size: 16px;">
<hr tabindex="-1">
<div id="divRpF925400" style="direction: ltr;"><font color="#000000" face="Tahoma" size="2"><b>From:</b> Wilfried Thuiller [wilfried.thuiller at ujf-grenoble.fr]<br>
<b>Sent:</b> Saturday, April 23, 2011 6:29 AM<br>
<b>To:</b> Brenna Forester<br>
<b>Cc:</b> biomod-commits at lists.r-forge.r-project.org<br>
<b>Subject:</b> Re: Re : [Biomod-commits] prevalence and pseudoabsences<br>
</font><br>
</div>
<div></div>
<div>Dear Brenna,
<div><br>
</div>
<div>
<div>
<blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;">
<div>
<div style="direction: ltr; font-family: Helvetica; color: rgb(0, 0, 0); font-size: 10pt;">
Thanks Bruno &amp; Wilfried,<br>
<br>
So to clarify: I run pseudo.abs - in my case as so:<br>
<br>
<font size="2"><span style="font-family: 'Courier New';">PA1 &lt;- pseudo.abs(coor=Sp.Env[,2:3], status=Sp.Env[,1], strategy="random",<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;env=Sp.Env[,4:10], nb.points=2736, species.name="Rhodiola",<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;add.pres=F, create.dataset=T, plot=T, pcol="red", acol="grey80")</span></font><br>
<br>
This creates two objects, "<span style="font-family: 'Courier New';">PA1</span>" (a vector of cell numbers chosen as absences) and "<span style="font-family: 'Courier New';">Dataset.Rhodiola.random.partial</span>", a dataframe of coordinates and "status" (zero).<br>
<br>
I would then create a new dataset that has just my presence records (304) and these 2736 absences. I would run that dataset (<span style="font-family: 'Courier New';">Sp.Env.PA1</span>) in the <span style="font-family: 'Courier New';">Initial.State()</span> and <span style="font-family: 'Courier New';">Models()</span> functions, for example, as so:<br>
<br>
<span style="font-family: 'Courier New';">Initial.State(Response=Sp.Env.PA1[,c(1)], Explanatory=Sp.Env.PA1[,4:10],<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;IndependentResponse=NULL, IndependentExplanatory=NULL,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sp.name="Rhodiola")</span><br>
<br>
<span style="font-family: 'Courier New';">Models(GLM = T, TypeGLM = "simple", Test = "AIC", GBM = T, No.trees = 5000,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;GAM = T, CTA = T, CV.tree = 100, ANN = T, CV.ann = 5, SRE = F, FDA = T,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MARS = T, RF = T, NbRunEval = 10, DataSplit = 70, Yweights=NULL,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NbRepPA=0, Roc=T, Optimized.Threshold.Roc=T, Kappa=T, TSS=T,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;KeepPredIndependent = F, VarImport=5)</span><br>
<br>
I keep <span style="font-family: 'Courier New';">NbRepPA = 0</span> so it uses the entire dataset to evaluate the model, maintaining my prevalence at 0.1 (304 presence records/3040 total records in the dataset).<br>
I think I am correct on everything to this point?<br>
</div>
</div>
</span></blockquote>
<div><br>
</div>
<div>Yes, you are correct. </div>
<br>
<blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;">
<div>
<div style="direction: ltr; font-family: Helvetica; color: rgb(0, 0, 0); font-size: 10pt;">
So my question is: I want to do 5 PA pulls (as I would if I ran it in the <span style="font-family: 'Courier New';">Models()</span> function, <span style="font-family: 'Courier New';">NbRepPA = 5</span>), maintaining my 0.1 prevalence. But I would then have to run <span style="font-family: 'Courier New';">Models()</span> five times on 5 datasets (each with a different PA pull). How does BIOMOD create a final model when using PA pulls (e.g. <span style="font-family: 'Courier New';">NbRepPA = 5</span>) within the <span style="font-family: 'Courier New';">Models()</span> function, and can I replicate that when I run my PA pulls manually as above?<br>
</div>
</div>
</span></blockquote>
<div><br>
</div>
<div>There is no final model when using several PA sets. There are as many "final models" as PA sets.</div>
<div>If you want to use several sets of PA yourself, make predictions from every model (using the Projections function, for instance, on the overall area). Then you'll need to combine them yourself.</div>
<div>There are several alternatives for combining projections from different models, different PA sets and different cross-validation repetitions:</div>
<div><br>
</div>
<div>You can compute a simple average and standard deviation from the projections in probability values. You can then derive a confidence interval if you want.</div>
<div>You could also perform a weighted sum using weights derived from TSS or ROC, for instance. This gives more weight to the best models (from the cross-validation column in Evaluation.results.TSS).</div>
<div>You could also perform what we usually call committee averaging, where you let the models vote for a presence or an absence. For this, you no longer use the probability of occurrence, but rather the presence-absence data directly. You then sum the presence-absence maps. If you have 5 repetitions, 5 models and 5 sets of PA, the maximum is 125. When the sum is equal to 125, all repetitions, PA sets and models agree that this is a presence; when it is zero, they all agree on an absence. Values between 0 and 125 give you the probability of agreement among the models for a presence (after rescaling everything by 125, for instance). This ensemble approach is very close to the Bayesian philosophy with posterior probabilities. I really like this approach, much better than looking at the probabilities of occurrence themselves.</div>
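<div>The three combination strategies described above can be sketched in base R as follows. This is only an illustrative sketch on simulated numbers: the objects <span style="font-family: 'Courier New';">probs</span>, <span style="font-family: 'Courier New';">tss</span> and <span style="font-family: 'Courier New';">thresholds</span> are hypothetical stand-ins for projections and evaluation scores you would extract yourself, not objects created by BIOMOD.</div>
<pre>
```r
# Hypothetical inputs: 'probs' holds one column of predicted probabilities
# per model run (rows = grid cells); 'tss' holds one evaluation score per run.
set.seed(1)
n_cells <- 6
n_runs  <- 125  # e.g. 5 repetitions x 5 models x 5 PA sets
probs <- matrix(runif(n_cells * n_runs), nrow = n_cells, ncol = n_runs)
tss   <- runif(n_runs, min = 0.4, max = 0.9)
thresholds <- rep(0.5, n_runs)  # assumed per-run binarisation cut-offs

# 1. Simple average and standard deviation across runs
ens_mean <- rowMeans(probs)
ens_sd   <- apply(probs, 1, sd)

# 2. Weighted mean: runs with a higher TSS get more weight
ens_wmean <- as.vector(probs %*% (tss / sum(tss)))

# 3. Committee averaging: binarise each run, then count presence votes
votes     <- sweep(probs, 2, thresholds, FUN = ">=") * 1
vote_sum  <- rowSums(votes)      # between 0 and n_runs
agreement <- vote_sum / n_runs   # share of runs voting "presence"

print(round(cbind(ens_mean, ens_sd, ens_wmean, agreement), 3))
```
</pre>
<div>With real projections, each column of <span style="font-family: 'Courier New';">probs</span> would be one projected map flattened to a vector, and each threshold would be the cut-off chosen for that run (for example the one optimising TSS or ROC).</div>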
<div><br>
</div>
<div>Now, I am not entirely sure why you want to keep your prevalence. Regression-like models are not really good with artificially unbalanced datasets (prevalence different from 0.5). They are supposed to work well if the prevalence is the true prevalence of the species. This is the case with a perfect stratified sampling, but absolutely not when using random sets of pseudo-absences.</div>
<div>Therefore, the results are usually similar anyway. The main difference is the "true" probability from the models, which will be higher when the pseudo-absences are downweighted. However, when the probabilities are rescaled between 0 and 1, results are usually very similar.</div>
<div>I think Witz and Guisan recently showed that using weighted pseudo-absences was better. We also have a paper close to being accepted in Methods in Ecology and Evolution showing the same with virtual datasets.</div>
<div><br>
</div>
<div>Hope it helps,</div>
<div><br>
</div>
<div>Wilfried</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<br>
<blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;">
<div>
<div style="direction: ltr; font-family: Helvetica; color: rgb(0, 0, 0); font-size: 10pt;">
<br>
I hope this isn't too confusing!<br>
Thank you!<br>
Brenna<br>
<br>
<br>
<div style="font-family: 'Times New Roman'; color: rgb(0, 0, 0); font-size: 16px;">
<hr tabindex="-1">
<div id="divRpF114927" style="direction: ltr;"><font color="#000000" face="Tahoma" size="2"><b>From:</b> Bruno Lafourcade [brunolafourcade at aol.com]<br>
<b>Sent:</b> Thursday, April 21, 2011 11:37 PM<br>
<b>To:</b> <a href="mailto:wilfried.thuiller at ujf-grenoble.fr" target="_blank">wilfried.thuiller@ujf-grenoble.fr</a>; Brenna Forester<br>
<b>Cc:</b> <a href="mailto:biomod-commits at r-forge.wu-wien.ac.at" target="_blank">biomod-commits@r-forge.wu-wien.ac.at</a><br>
<b>Subject:</b> Re : [Biomod-commits] prevalence and pseudoabsences<br>
</font><br>
</div>
<div></div>
<div><font color="black" face="arial" size="2"><font color="black" face="arial" size="2">
<div><br>
</div>
<div><font face="Arial, Helvetica, sans-serif">Hi Brenna,<br>
<br>
The pseudo-absence procedure within the Models function is automated and generates a weighting to give a prevalence of 0.5 for each run.<br>
<br>
To make sure that the prevalence doesn't change, you have to build your own pseudo-absence data outside of the Models function (even prior to Initial.State). That way, the Models function will not recognize your data as pseudo-absences and will not weight them, just as for any standard input data.<br>
<br>
Use the pseudo.abs() function for this. Don't hesitate to ask for details on how to use it.<br>
<br>
Best,<br>
Bruno<br>
<br>
<br>
</font></div>
<div style="clear: both;">-------<br>
Bruno Lafourcade<br>
Statistical tools engineer<br>
<br>
Laboratoire d'Ecologie Alpine, bureau 308<br>
CNRS - UMR 5553, 2233 rue de la piscine<br>
38400 Saint Martin d'H&egrave;res<br>
-------</div>
<div><br>
</div>
<div><br>
</div>
<div style="font-family: arial,helvetica; font-size: 10pt; color: black;">-----Original message-----<br>
From: Wilfried Thuiller &lt;<a href="mailto:wilfried.thuiller at ujf-grenoble.fr" target="_blank">wilfried.thuiller at ujf-grenoble.fr</a>&gt;<br>
To: Brenna Forester &lt;<a href="mailto:forestb at students.wwu.edu" target="_blank">forestb at students.wwu.edu</a>&gt;<br>
Cc: <a href="mailto:biomod-commits at lists.r-forge.r-project.org" target="_blank">biomod-commits at lists.r-forge.r-project.org</a> &lt;<a href="mailto:biomod-commits at r-forge.wu-wien.ac.at" target="_blank">biomod-commits at r-forge.wu-wien.ac.at</a>&gt;<br>
Sent: Friday, 22 April 2011 7:09<br>
Subject: Re: [Biomod-commits] prevalence and pseudoabsences<br>
<br>
<div id="AOLMsgPart_2_edb92e8f-d92e-4871-b43e-ec9efd37ba90">
<div>Dear Brenna,</div>
<div><br>
</div>
<div>Yes and no...</div>
<div><br>
</div>
<div>If you do not ask for pseudo-absences (NbPA=0), there is no weighting and all your pseudo-absences will be used at once. Prevalence = 0.1</div>
<div>If you add NbPA = 3040 (or more), then yes, there is weighting. Prevalence = 0.5</div>
<div><br>
</div>
<div>Does it help?</div>
<div>Wilfried</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>On 22 Apr 2011, at 00:53, Brenna Forester wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;">
<div>
<div style="direction: ltr; font-family: Helvetica; color: rgb(0, 0, 0); font-size: 10pt;">
Hello,<br>
<br>
I see in the "Presentation Manual for BIOMOD" (page 18) the following statement: "In all procedures, BIOMOD ensures that the prevalence of the original data is conserved in the calibration and evaluation datasets."<br>
<br>
I have 304 presence records and am running my pseudoabsence pulls with 3040 absences (a prevalence of 0.1). The number of pixels in my study area is 6808.<br>
<br>