[Biomod-commits] Fwd: GAM implementation not properly handling rasterstack()?

Damien Georges damien.georges2 at gmail.com
Mon Feb 4 15:37:04 CET 2013


Hi  Kyle,

You are right, the default formula for gam's are binomial(link = 
"logit")), I'll try to find a way to make this parameter more flexible.

The differences you observed between biomod2 and dismo MAXENT models 
comes from the background given to maxent models (only cells of Pseudo 
Absences selection in biomod against all available cells in dismo). If 
you have set rescal.all.models to TRUE, the scaling glm built to rescal 
maxent predictions can also drive to some differences between the 2 
packages outputs.

I realised that I had remove MARS rescaling step in the last version of 
biomod2 (2.0.3).. That leads to strange prediction.. It is now corrected 
(biomod2 2.0.4) and I guess that predictions will be much more realistic.

Concerning the NA's in evaluation stuff, I'm working on a way to deal 
with it.. will be corrected soon.

Just as a comment, I saw in the example you gave that you are working 
with all worldclim variables and 4*12 other variables... so you are 
working with 87 variables... I guess some of this variables are strongly 
correlated could lead to strange results. I guess you need to make a 
selection to keep only a subset of this variables.

Best,

Damien.

On 25/01/2013 18:04, Kyle Taylor wrote:
> Hey Damien,
>
> Thanks for the feedback.  Biomod/GAM didn't like that I was trying to 
> specify (family = binomial(link = "logit")) in GAMs options. That's 
> the wrong way to specify the family type for GAM using Biomod2.  The 
> proper way is not discussed in biomod2's documentation, however.  
> Thankfully, GAM's implementation in Biomod2 apparently defaults to a 
> logistic link function.  So formally declaring it at runtime is 
> unnecessary.
>
> After removing the offending line and re-running, GAM doesn't report 
> any errors.  Sadly, the model also returns NA, all of its ROC/TSS 
> evaluations fail, and its output becomes useless.  The only models I 
> can get to work with presence-only (random pseudo-absence) runs are 
> GBM, RF, and CTA.  The values below for variable importance 
> demonstrate... I used all WORLDCLIM data available, and specified 6000 
> random background points.  The output doesn't make sense and is 
> totally different from when I run Maxent.jar using the dismo package 
> or straight Java.  I'm going to play around with the default options 
> for maxent used in biomod2 to see how different they are from the 
> default options in straight maxent to see what could be causing the 
> discrepancy.  Then its off to seeing what went wrong with MARS.
>
> Kyle
>
> -- 
>
>  * ROC scores for all component model runs :
> 0.898 0.692 0.966 0.978 0.918 0.892 0.687 0.956 0.995 0.931 0.886 
> 0.681 0.96 0.966 0.924 0.887 0.682 0.958 0.995 0.947
>
>  * variable importance (RUN1, PA1):
>
>         MAXENT MARS   GBM    RF   CTA
> bio01    0.776    1 0.001 0.006 0.024
> bio02    0.000    0 0.001 0.006 0.024
> bio03    0.000    0 0.165 0.007 0.438
> bio04    0.000    0 0.001 0.006 0.031
> bio05    0.000    0 0.003 0.006 0.250
> bio06    0.000    0 0.001 0.006 0.024
> bio07    0.000    0 0.001 0.006 0.024
> bio08    0.000    0 0.001 0.007 0.023
> bio09    0.000    0 0.007 0.006 0.024
> bio10    0.000    0 0.001 0.006 0.024
> bio11    0.000    0 0.001 0.006 0.024
> bio12    0.000    0 0.001 0.006 0.024
> bio13    0.000    0 0.001 0.026 0.024
> bio14    0.000    0 0.001 0.006 0.024
> bio15    0.000    0 0.001 0.006 0.024
> bio16    0.000    0 0.001 0.006 0.024
> bio17    0.000    0 0.001 0.006 0.024
> bio18    0.000    0 0.001 0.006 0.024
> bio19    0.000    0 0.001 0.006 0.024
> prec01   0.000    0 0.001 0.006 0.024
> prec02   0.000    0 0.001 0.006 0.024
> prec03   0.000    0 0.001 0.006 0.026
> prec04   0.000    0 0.003 0.008 0.024
> prec05   0.000    0 0.001 0.006 0.024
> prec06   0.000    0 0.001 0.007 0.018
> prec07   0.000    0 0.001 0.006 0.024
> prec08   0.000    0 0.001 0.006 0.024
> prec09   0.000    0 0.001 0.006 0.024
> prec10   0.000    0 0.001 0.006 0.024
> prec11   0.000    0 0.001 0.006 0.024
> prec12   0.000    0 0.001 0.006 0.024
> tmax01   0.000    0 0.006 0.008 0.263
> tmax02   0.000    0 0.021 0.006 0.055
> tmax03   0.074    0 0.281 0.006 0.606
> tmax04   0.000    0 0.004 0.006 0.025
> tmax05   0.000    0 0.001 0.006 0.025
> tmax06   0.000    0 0.001 0.006 0.024
> tmax07   0.000    0 0.001 0.006 0.024
> tmax08   0.000    0 0.001 0.006 0.024
> tmax09   0.000    0 0.001 0.006 0.024
> tmax10   0.000    0 0.002 0.006 0.024
> tmax11   0.000    0 0.001 0.006 0.024
> tmax12   0.000    0 0.001 0.006 0.024
> tmean01  0.000    0 0.001 0.006 0.024
> tmean02  0.000    0 0.001 0.006 0.024
> tmean03  0.000    0 0.007 0.006 0.049
> tmean04  0.000    0 0.001 0.006 0.024
> tmean05  0.000    0 0.001 0.006 0.024
> tmean06  0.000    0 0.001 0.006 0.024
> tmean07  0.000    0 0.001 0.006 0.024
> tmean08  0.000    0 0.001 0.006 0.024
> tmean09  0.000    0 0.001 0.006 0.024
> tmean10  0.000    0 0.001 0.006 0.024
> tmean11  0.000    0 0.001 0.006 0.024
> tmean12  0.000    0 0.001 0.006 0.024
> tmin01   0.000    0 0.001 0.006 0.024
> tmin02   0.000    0 0.001 0.006 0.024
> tmin03   0.000    0 0.001 0.006 0.024
> tmin04   0.000    0 0.001 0.006 0.024
> tmin05   0.000    0 0.001 0.006 0.024
> tmin06   0.000    0 0.001 0.006 0.029
> tmin07   0.000    0 0.001 0.006 0.024
> tmin08   0.000    0 0.001 0.006 0.024
> tmin09   0.000    0 0.001 0.006 0.024
> tmin10   0.000    0 0.001 0.006 0.024
> tmin11   0.000    0 0.001 0.006 0.024
> tmin12   0.000    0 0.001 0.006 0.024
>
>
> On Thu, 24 Jan 2013 12:30:03 -0700, Damien Georges 
> <damien.georges2 at gmail.com> wrote:
>
>> Hy Kyle,
>>
>> This error means that you one of your variables hasn't enough unique 
>> value
>> to construct your GAM.
>> Maybe you can try to play with the "k" value of GAM option and set it 
>> to a
>> lower value (e.g. 2 or 3). (have a look at choose.k help file of mgcv
>> package) Or you can try to select more pseudo absences at formating 
>> step. A
>> last option should be to compute gam from gam package instead of mgcv 
>> one.
>>
>> Hope that helps,
>>
>> Best,
>>
>> Damien.
>>
>>
>>
>> 2013/1/24 Kyle Taylor <kyle.a.taylor at gmail.com>
>>
>>> Hey all,
>>>
>>> I'm trying to use biomod2 to produce an ensemble of 6 submodels for 
>>> some
>>> presence-only data for my species of interest.  The submodels 
>>> consist of
>>> Maxent, GAM, MARS, Boosted Regression Trees, RandomForest, and
>>> Classification Trees.  The explanatory variables I'm using to train 
>>> these
>>> models are fed-in as a rasterstack().
>>>
>>> To start with, I'm just trying to use simple climate variables from
>>> WORLDCLIM (tmin, tmax, tmean, etc...) to train with.  For the sake of
>>> argument, I'm pulling my pseudo-absence records randomly from the (very
>>> large... all of N. America) background extent of my input rasters.  
>>> All of
>>> my submodels work fine using my presence records and my rasterstack of
>>> WORLDCLIM data, except for GAM.
>>>
>>> GAM reports "A term has fewer unique covariate combinations than 
>>> specified
>>> maximum degrees of freedom".
>>>
>>>    -
>>>
>>> I know the rasters are valid.  I've used them independently with 
>>> Maxent.jar
>>> to produce models.  And to top it off, the other submodels (Maxent, 
>>> MARS,
>>> BRT, RF, CTA) don't appear to throw any errors when using these same 
>>> data.
>>>  Anyone have any ideas?  Am I doing something egregiously wrong with 
>>> my GAM
>>> implementation?  GAM's error message is kind-of cryptic.  I'm 
>>> assuming that
>>> it can't fit a logistic model to my input data, because it simply isn't
>>> "seeing" it.
>>>
>>> Thanks!
>>>
>>> Kyle
>>>
>>> ------ [code snippet]
>>> ------------------------------------------------------------
>>>
>>> # initialize biomod dataset
>>>
>>> # presenceShapefile is of type 'SpatialPoints', predictors is of type
>>> RasterStack
>>>
>>> myBiomodData <- BIOMOD_FormatingData(resp.var  = presenceShapefile,
>>>                                      expl.var  = predictors,
>>>                                      resp.name = "some.species",
>>>                                      PA.nb.rep = 1,
>>>                                      PA.nb.absences = 100,
>>>                                      PA.strategy = 'random',
>>>                                      na.rm = T)
>>> #
>>> # set any customized options for submodels we may need
>>> #
>>>
>>>     if(usingMAXENT) {
>>>       maxentOptions <- list(path_to_maxent.jar =
>>> "/home/kyle/R/x86_64-redhat-linux-gnu-library/2.15/dismo/java/",
>>>                            maximumiterations = 200,
>>>                            visible = FALSE)
>>>     } else { maxentOptions <- NULL }
>>>
>>>     if(usingGAM) {
>>>       gamOptions <- list(family = binomial(link = "logit"))
>>>     } else { gamOptions <- NULL }
>>>
>>>     myBiomodOption <- BIOMOD_ModelingOptions(MAXENT = maxentOptions,
>>>                                              GAM    = gamOptions)
>>>
>>>     save.image("this.rdata") # save a workspace snapshot to 
>>> demonstrate the
>>> objects we are passing into
>>>                              # BIOMOD_Modeling, in-case it craps-out.
>>>
>>> #
>>> # train our models
>>> #
>>>
>>> myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
>>>                                      models = submodels,
>>>                                      models.options = myBiomodOption,
>>>                                      NbRunEval=1,
>>>                                      DataSplit=80,
>>>                                      Yweights=NULL,
>>>                                      VarImport=3,
>>>                                      models.eval.meth = c('TSS','ROC'),
>>>                                      SaveObj = TRUE,
>>>                                      rescal.all.models = TRUE)
>>>
>>> ------ [end code snippet]
>>> ------------------------------------------------------------
>>>
>>> ------ [start biomod2 output
>>> ]--------------------------------------------------------
>>> -=-=-=-=-=-=-=-=-=-=-=-=-= some.species Data Formating
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=
>>>
>>>       ! Response variable is considered as a only presences one... 
>>> Is it
>>> really what you want?
>>>       ! No data has been set aside for modeling evaluation
>>>    > Pseudo Absences Selection checkings...
>>>    > random pseudo absences selection
>>>    > Pseudo absences are selected in explanatory variables
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>>
>>>
>>> Loading required library...
>>> Attaching package: ‘MASS’
>>>
>>> The following object(s) are masked from ‘package:raster’:
>>>
>>>     area, select
>>>
>>> Loading required package: splines
>>> Loaded gbm 1.6.3.2
>>>
>>> randomForest 4.6-7
>>> Type rfNews() to see new features/changes/bug fixes.
>>> This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
>>>
>>>
>>> Checking Models arguments...
>>>
>>> Creating suitable Workdir...
>>>
>>>                         ! Weights where defined to rise a 0.5 
>>> prevalence !
>>>
>>> -=-=-=-  some.species  Modeling Summary
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> 15  environmental variables ( tmax10 tmax11 tmax12 tmean01 tmean02 
>>> tmean03
>>> tmean04 tmean05 tmean06 tmean07 tmean08 tmean09 tmean10 tmean11 
>>> tmean12 )
>>> Number of evaluation repetitions : 2
>>> Models selected : MAXENT MARS GAM GBM RF CTA
>>> Total number of model runs : 12
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 
>>>
>>>
>>>
>>> -=-=-=- Run :  some.species_PA1
>>>
>>>
>>> -=-=-=--=-=-=- some.species_PA1_RUN1
>>>
>>> Model=MAXENT
>>>         Creating Maxent Temp Proj Data..
>>>  Runing Maxent...
>>>
>>>  Getting predictions...
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Multiple Adaptive Regression Splines
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=GAM
>>>          GAM_mgcv algorithm chosen
>>>         User defined control args building..
>>>         Automatic formula generation...
>>>         > GAM (mgcv) modelling...Error in
>>> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>>>   A term has fewer unique covariate combinations than specified maximum
>>> degrees of freedom
>>> In addition: Warning messages:
>>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 2: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>>>
>>>    ! Note :  some.species_PA1_RUN1_GAM failed!
>>>
>>> Model=Generalised Boosting Regression
>>>          500 maximum different trees and  5  Fold Cross-Validation
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Breiman and Cutler's random forests for classification and 
>>> regression
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Classification tree
>>>          5 Fold Cross-Validation
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>>
>>> -=-=-=--=-=-=- some.species_PA1_Full
>>>
>>> Model=MAXENT
>>>         Creating Maxent Temp Proj Data..
>>>  Runing Maxent...
>>>
>>>
>>>
>>>  Getting predictions...
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Multiple Adaptive Regression Splines
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=GAM
>>>          GAM_mgcv algorithm chosen
>>>         User defined control args building..
>>>         Automatic formula generation...
>>>         > GAM (mgcv) modelling...Error in
>>> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>>>   A term has fewer unique covariate combinations than specified maximum
>>> degrees of freedom
>>> In addition: Warning messages:
>>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 2: glm.fit: algorithm did not converge
>>> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 4: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 5: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>>>
>>>    ! Note :  some.species_PA1_Full_GAM failed!
>>>
>>> Model=Generalised Boosting Regression
>>>          500 maximum different trees and  5  Fold Cross-Validation
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Breiman and Cutler's random forests for classification and 
>>> regression
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>> Model=Classification tree
>>>          5 Fold Cross-Validation
>>>         Evaluating Model stuff...
>>>         Evaluating Predictor Contributions...
>>>
>>>         Removing Maxent Temp Data..
>>> -=-=-=- Done
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 
>>>
>>> Warning messages:
>>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> 2: glm.fit: algorithm did not converge
>>> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> ------ [end biomod2
>>> output]-----------------------------------------------------------
>>>
>>> ------ [start R session to see what went
>>> wrong]---------------------------------------
>>>
>>> > load("this.rdata")
>>> > ls()
>>> >
>>>  [1] "args"                       "biomod2Avail"
>>>  [3] "downsampleCoordData_kde"    "downsampleCoordData_simple"
>>>  [5] "gamOptions"                 "gdalAvail"
>>>  [7] "geoTiffToDataframe"         "hdf4ToDataframe"
>>>  [9] "i"                          "j"
>>> [11] "maptoolsAvail"              "maskRastersByVector"
>>> [13] "maxentOptions"              "myBiomodData"
>>> [15] "myBiomodOption"             "predictors"
>>> [17] "presenceShapefile"          "printUsage"
>>> [19] "submodels"                  "usingBRT"
>>> [21] "usingCTA"                   "usingGAM"
>>> [23] "usingGLM"                   "usingMARS"
>>> [25] "usingMAXENT"                "usingRF"
>>>
>>> > summary(predictors)
>>> >
>>>            tmax10   tmax11    tmax12   tmean01   tmean02 tmean03
>>> tmean04
>>> Min.    -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
>>> -3.40e+38
>>> 1st Qu. -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
>>> -3.40e+38
>>> Median   0.00e+00 -2.8e+01 -1.05e+02 -1.92e+02 -1.58e+02 -9.70e+01
>>> -5.00e+00
>>> 3rd Qu.  5.90e+01  0.0e+00  0.00e+00  0.00e+00  0.00e+00 0.00e+00
>>>  0.00e+00
>>> Max.     3.62e+02  3.6e+02  3.58e+02  2.83e+02  2.88e+02 2.94e+02
>>>  3.11e+02
>>> NA's     0.00e+00  0.0e+00  0.00e+00  0.00e+00  0.00e+00 0.00e+00
>>>  0.00e+00
>>>           tmean05   tmean06   tmean07   tmean08  tmean09 tmean10
>>> tmean11
>>> Min.    -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38
>>> -3.40e+38
>>> 1st Qu. -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38
>>> -3.40e+38
>>> Median   0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00 0.00e+00
>>> -6.70e+01
>>> 3rd Qu.  6.20e+01  1.19e+02  1.47e+02  1.32e+02  7.9e+01 1.90e+01
>>>  0.00e+00
>>> Max.     3.24e+02  3.29e+02  3.58e+02  3.48e+02  3.2e+02 2.98e+02
>>>  2.89e+02
>>> NA's     0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00 0.00e+00
>>>  0.00e+00
>>>           tmean12
>>> Min.    -3.40e+38
>>> 1st Qu. -3.40e+38
>>> Median  -1.55e+02
>>> 3rd Qu.  0.00e+00
>>> Max.     2.85e+02
>>> NA's     0.00e+00
>>> Warning message:
>>> In .local(object, ...) :
>>>   summary is an estimate based on a sample of 1e+05 cells (0.11% of all
>>> cells)
>>> _______________________________________________
>>> Biomod-commits mailing list
>>> Biomod-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits 
>>>
> _______________________________________________
> Biomod-commits mailing list
> Biomod-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits



More information about the Biomod-commits mailing list