[Biomod-commits] Fwd: GAM implementation not properly handling rasterstack()?

Kyle Taylor kyle.a.taylor at gmail.com
Fri Jan 25 18:04:17 CET 2013


Hey Damien,

Thanks for the feedback.  Biomod/GAM didn't like that I was trying to  
specify (family = binomial(link = "logit")) in GAMs options.  That's the  
wrong way to specify the family type for GAM using Biomod2.  The proper  
way is not discussed in biomod2's documentation, however.  Thankfully,  
GAM's implementation in Biomod2 apparently defaults to a logistic link  
function.  So formally declaring it at runtime is unnecessary.

After removing the offending line and re-running, GAM doesn't report any  
errors.  Sadly, the model also returns NA, all of its ROC/TSS evaluations  
fail, and its output becomes useless.  The only models I can get to work  
with presence-only (random pseudo-absence) runs are GBM, RF, and CTA.  The  
values below for variable importance demonstrate... I used all WORLDCLIM  
data available, and specified 6000 random background points.  The output  
doesn't make sense and is totally different from when I run Maxent.jar  
using the dismo package or straight Java.  I'm going to play around with  
the default options for maxent used in biomod2 to see how different they  
are from the default options in straight maxent to see what could be  
causing the discrepancy.  Then its off to seeing what went wrong with MARS.

Kyle

--

  * ROC scores for all component model runs :
0.898 0.692 0.966 0.978 0.918 0.892 0.687 0.956 0.995 0.931 0.886 0.681  
0.96 0.966 0.924 0.887 0.682 0.958 0.995 0.947

  * variable importance (RUN1, PA1):

         MAXENT MARS   GBM    RF   CTA
bio01    0.776    1 0.001 0.006 0.024
bio02    0.000    0 0.001 0.006 0.024
bio03    0.000    0 0.165 0.007 0.438
bio04    0.000    0 0.001 0.006 0.031
bio05    0.000    0 0.003 0.006 0.250
bio06    0.000    0 0.001 0.006 0.024
bio07    0.000    0 0.001 0.006 0.024
bio08    0.000    0 0.001 0.007 0.023
bio09    0.000    0 0.007 0.006 0.024
bio10    0.000    0 0.001 0.006 0.024
bio11    0.000    0 0.001 0.006 0.024
bio12    0.000    0 0.001 0.006 0.024
bio13    0.000    0 0.001 0.026 0.024
bio14    0.000    0 0.001 0.006 0.024
bio15    0.000    0 0.001 0.006 0.024
bio16    0.000    0 0.001 0.006 0.024
bio17    0.000    0 0.001 0.006 0.024
bio18    0.000    0 0.001 0.006 0.024
bio19    0.000    0 0.001 0.006 0.024
prec01   0.000    0 0.001 0.006 0.024
prec02   0.000    0 0.001 0.006 0.024
prec03   0.000    0 0.001 0.006 0.026
prec04   0.000    0 0.003 0.008 0.024
prec05   0.000    0 0.001 0.006 0.024
prec06   0.000    0 0.001 0.007 0.018
prec07   0.000    0 0.001 0.006 0.024
prec08   0.000    0 0.001 0.006 0.024
prec09   0.000    0 0.001 0.006 0.024
prec10   0.000    0 0.001 0.006 0.024
prec11   0.000    0 0.001 0.006 0.024
prec12   0.000    0 0.001 0.006 0.024
tmax01   0.000    0 0.006 0.008 0.263
tmax02   0.000    0 0.021 0.006 0.055
tmax03   0.074    0 0.281 0.006 0.606
tmax04   0.000    0 0.004 0.006 0.025
tmax05   0.000    0 0.001 0.006 0.025
tmax06   0.000    0 0.001 0.006 0.024
tmax07   0.000    0 0.001 0.006 0.024
tmax08   0.000    0 0.001 0.006 0.024
tmax09   0.000    0 0.001 0.006 0.024
tmax10   0.000    0 0.002 0.006 0.024
tmax11   0.000    0 0.001 0.006 0.024
tmax12   0.000    0 0.001 0.006 0.024
tmean01  0.000    0 0.001 0.006 0.024
tmean02  0.000    0 0.001 0.006 0.024
tmean03  0.000    0 0.007 0.006 0.049
tmean04  0.000    0 0.001 0.006 0.024
tmean05  0.000    0 0.001 0.006 0.024
tmean06  0.000    0 0.001 0.006 0.024
tmean07  0.000    0 0.001 0.006 0.024
tmean08  0.000    0 0.001 0.006 0.024
tmean09  0.000    0 0.001 0.006 0.024
tmean10  0.000    0 0.001 0.006 0.024
tmean11  0.000    0 0.001 0.006 0.024
tmean12  0.000    0 0.001 0.006 0.024
tmin01   0.000    0 0.001 0.006 0.024
tmin02   0.000    0 0.001 0.006 0.024
tmin03   0.000    0 0.001 0.006 0.024
tmin04   0.000    0 0.001 0.006 0.024
tmin05   0.000    0 0.001 0.006 0.024
tmin06   0.000    0 0.001 0.006 0.029
tmin07   0.000    0 0.001 0.006 0.024
tmin08   0.000    0 0.001 0.006 0.024
tmin09   0.000    0 0.001 0.006 0.024
tmin10   0.000    0 0.001 0.006 0.024
tmin11   0.000    0 0.001 0.006 0.024
tmin12   0.000    0 0.001 0.006 0.024


On Thu, 24 Jan 2013 12:30:03 -0700, Damien Georges  
<damien.georges2 at gmail.com> wrote:

> Hy Kyle,
>
> This error means that you one of your variables hasn't enough unique  
> value
> to construct your GAM.
> Maybe you can try to play with the "k" value of GAM option and set it to  
> a
> lower value (e.g. 2 or 3). (have a look at choose.k help file of mgcv
> package) Or you can try to select more pseudo absences at formating  
> step. A
> last option should be to compute gam from gam package instead of mgcv  
> one.
>
> Hope that helps,
>
> Best,
>
> Damien.
>
>
>
> 2013/1/24 Kyle Taylor <kyle.a.taylor at gmail.com>
>
>> Hey all,
>>
>> I'm trying to use biomod2 to produce an ensemble of 6 submodels for some
>> presence-only data for my species of interest.  The submodels consist of
>> Maxent, GAM, MARS, Boosted Regression Trees, RandomForest, and
>> Classification Trees.  The explanatory variables I'm using to train  
>> these
>> models are fed-in as a rasterstack().
>>
>> To start with, I'm just trying to use simple climate variables from
>> WORLDCLIM (tmin, tmax, tmean, etc...) to train with.  For the sake of
>> argument, I'm pulling my pseudo-absence records randomly from the (very
>> large... all of N. America) background extent of my input rasters.  All  
>> of
>> my submodels work fine using my presence records and my rasterstack of
>> WORLDCLIM data, except for GAM.
>>
>> GAM reports "A term has fewer unique covariate combinations than  
>> specified
>> maximum degrees of freedom".
>>
>>    -
>>
>> I know the rasters are valid.  I've used them independently with  
>> Maxent.jar
>> to produce models.  And to top it off, the other submodels (Maxent,  
>> MARS,
>> BRT, RF, CTA) don't appear to throw any errors when using these same  
>> data.
>>  Anyone have any ideas?  Am I doing something egregiously wrong with my  
>> GAM
>> implementation?  GAM's error message is kind-of cryptic.  I'm assuming  
>> that
>> it can't fit a logistic model to my input data, because it simply isn't
>> "seeing" it.
>>
>> Thanks!
>>
>> Kyle
>>
>> ------ [code snippet]
>> ------------------------------------------------------------
>>
>> # initialize biomod dataset
>>
>> # presenceShapefile is of type 'SpatialPoints', predictors is of type
>> RasterStack
>>
>> myBiomodData <- BIOMOD_FormatingData(resp.var  = presenceShapefile,
>>                                      expl.var  = predictors,
>>                                      resp.name = "some.species",
>>                                      PA.nb.rep = 1,
>>                                      PA.nb.absences = 100,
>>                                      PA.strategy = 'random',
>>                                      na.rm = T)
>> #
>> # set any customized options for submodels we may need
>> #
>>
>>     if(usingMAXENT) {
>>       maxentOptions <- list(path_to_maxent.jar =
>> "/home/kyle/R/x86_64-redhat-linux-gnu-library/2.15/dismo/java/",
>>                            maximumiterations = 200,
>>                            visible = FALSE)
>>     } else { maxentOptions <- NULL }
>>
>>     if(usingGAM) {
>>       gamOptions <- list(family = binomial(link = "logit"))
>>     } else { gamOptions <- NULL }
>>
>>     myBiomodOption <- BIOMOD_ModelingOptions(MAXENT = maxentOptions,
>>                                              GAM    = gamOptions)
>>
>>     save.image("this.rdata") # save a workspace snapshot to demonstrate  
>> the
>> objects we are passing into
>>                              # BIOMOD_Modeling, in-case it craps-out.
>>
>> #
>> # train our models
>> #
>>
>> myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
>>                                      models = submodels,
>>                                      models.options = myBiomodOption,
>>                                      NbRunEval=1,
>>                                      DataSplit=80,
>>                                      Yweights=NULL,
>>                                      VarImport=3,
>>                                      models.eval.meth = c('TSS','ROC'),
>>                                      SaveObj = TRUE,
>>                                      rescal.all.models = TRUE)
>>
>> ------ [end code snippet]
>> ------------------------------------------------------------
>>
>> ------ [start biomod2 output
>> ]--------------------------------------------------------
>> -=-=-=-=-=-=-=-=-=-=-=-=-= some.species Data Formating
>> -=-=-=-=-=-=-=-=-=-=-=-=-=
>>
>>       ! Response variable is considered as a only presences one... Is it
>> really what you want?
>>       ! No data has been set aside for modeling evaluation
>>    > Pseudo Absences Selection checkings...
>>    > random pseudo absences selection
>>    > Pseudo absences are selected in explanatory variables
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>
>>
>> Loading required library...
>> Attaching package: ‘MASS’
>>
>> The following object(s) are masked from ‘package:raster’:
>>
>>     area, select
>>
>> Loading required package: splines
>> Loaded gbm 1.6.3.2
>>
>> randomForest 4.6-7
>> Type rfNews() to see new features/changes/bug fixes.
>> This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
>>
>>
>> Checking Models arguments...
>>
>> Creating suitable Workdir...
>>
>>                         ! Weights where defined to rise a 0.5  
>> prevalence !
>>
>> -=-=-=-  some.species  Modeling Summary
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> 15  environmental variables ( tmax10 tmax11 tmax12 tmean01 tmean02  
>> tmean03
>> tmean04 tmean05 tmean06 tmean07 tmean08 tmean09 tmean10 tmean11 tmean12  
>> )
>> Number of evaluation repetitions : 2
>> Models selected : MAXENT MARS GAM GBM RF CTA
>> Total number of model runs : 12
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>
>>
>> -=-=-=- Run :  some.species_PA1
>>
>>
>> -=-=-=--=-=-=- some.species_PA1_RUN1
>>
>> Model=MAXENT
>>         Creating Maxent Temp Proj Data..
>>  Runing Maxent...
>>
>>  Getting predictions...
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Multiple Adaptive Regression Splines
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=GAM
>>          GAM_mgcv algorithm chosen
>>         User defined control args building..
>>         Automatic formula generation...
>>         > GAM (mgcv) modelling...Error in
>> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>>   A term has fewer unique covariate combinations than specified maximum
>> degrees of freedom
>> In addition: Warning messages:
>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> 2: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>>
>>    ! Note :  some.species_PA1_RUN1_GAM failed!
>>
>> Model=Generalised Boosting Regression
>>          500 maximum different trees and  5  Fold Cross-Validation
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Breiman and Cutler's random forests for classification and  
>> regression
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Classification tree
>>          5 Fold Cross-Validation
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>>
>> -=-=-=--=-=-=- some.species_PA1_Full
>>
>> Model=MAXENT
>>         Creating Maxent Temp Proj Data..
>>  Runing Maxent...
>>
>>
>>
>>  Getting predictions...
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Multiple Adaptive Regression Splines
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=GAM
>>          GAM_mgcv algorithm chosen
>>         User defined control args building..
>>         Automatic formula generation...
>>         > GAM (mgcv) modelling...Error in
>> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>>   A term has fewer unique covariate combinations than specified maximum
>> degrees of freedom
>> In addition: Warning messages:
>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> 2: glm.fit: algorithm did not converge
>> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> 4: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> 5: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>>
>>    ! Note :  some.species_PA1_Full_GAM failed!
>>
>> Model=Generalised Boosting Regression
>>          500 maximum different trees and  5  Fold Cross-Validation
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Breiman and Cutler's random forests for classification and  
>> regression
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>> Model=Classification tree
>>          5 Fold Cross-Validation
>>         Evaluating Model stuff...
>>         Evaluating Predictor Contributions...
>>
>>         Removing Maxent Temp Data..
>> -=-=-=- Done
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Warning messages:
>> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> 2: glm.fit: algorithm did not converge
>> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
>> ------ [end biomod2
>> output]-----------------------------------------------------------
>>
>> ------ [start R session to see what went
>> wrong]---------------------------------------
>>
>> > load("this.rdata")
>> > ls()
>> >
>>  [1] "args"                       "biomod2Avail"
>>  [3] "downsampleCoordData_kde"    "downsampleCoordData_simple"
>>  [5] "gamOptions"                 "gdalAvail"
>>  [7] "geoTiffToDataframe"         "hdf4ToDataframe"
>>  [9] "i"                          "j"
>> [11] "maptoolsAvail"              "maskRastersByVector"
>> [13] "maxentOptions"              "myBiomodData"
>> [15] "myBiomodOption"             "predictors"
>> [17] "presenceShapefile"          "printUsage"
>> [19] "submodels"                  "usingBRT"
>> [21] "usingCTA"                   "usingGAM"
>> [23] "usingGLM"                   "usingMARS"
>> [25] "usingMAXENT"                "usingRF"
>>
>> > summary(predictors)
>> >
>>            tmax10   tmax11    tmax12   tmean01   tmean02   tmean03
>> tmean04
>> Min.    -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
>> -3.40e+38
>> 1st Qu. -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
>> -3.40e+38
>> Median   0.00e+00 -2.8e+01 -1.05e+02 -1.92e+02 -1.58e+02 -9.70e+01
>> -5.00e+00
>> 3rd Qu.  5.90e+01  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
>>  0.00e+00
>> Max.     3.62e+02  3.6e+02  3.58e+02  2.83e+02  2.88e+02  2.94e+02
>>  3.11e+02
>> NA's     0.00e+00  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
>>  0.00e+00
>>           tmean05   tmean06   tmean07   tmean08  tmean09   tmean10
>> tmean11
>> Min.    -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38
>> -3.40e+38
>> 1st Qu. -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38
>> -3.40e+38
>> Median   0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00
>> -6.70e+01
>> 3rd Qu.  6.20e+01  1.19e+02  1.47e+02  1.32e+02  7.9e+01  1.90e+01
>>  0.00e+00
>> Max.     3.24e+02  3.29e+02  3.58e+02  3.48e+02  3.2e+02  2.98e+02
>>  2.89e+02
>> NA's     0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00
>>  0.00e+00
>>           tmean12
>> Min.    -3.40e+38
>> 1st Qu. -3.40e+38
>> Median  -1.55e+02
>> 3rd Qu.  0.00e+00
>> Max.     2.85e+02
>> NA's     0.00e+00
>> Warning message:
>> In .local(object, ...) :
>>   summary is an estimate based on a sample of 1e+05 cells (0.11% of all
>> cells)
>> _______________________________________________
>> Biomod-commits mailing list
>> Biomod-commits at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/biomod-commits


More information about the Biomod-commits mailing list