[Biomod-commits] Fwd: GAM implementation not properly handling rasterstack()?

Kyle Taylor kyle.a.taylor at gmail.com
Thu Jan 24 20:10:51 CET 2013


Hey all,

I'm trying to use biomod2 to produce an ensemble of 6 submodels for some
presence-only data for my species of interest.  The submodels consist of
Maxent, GAM, MARS, Boosted Regression Trees, RandomForest, and
Classification Trees.  The explanatory variables I'm using to train these
models are fed in as a RasterStack.

To start with, I'm just trying to use simple climate variables from
WORLDCLIM (tmin, tmax, tmean, etc...) to train with.  For the sake of
argument, I'm pulling my pseudo-absence records randomly from the (very
large... all of N. America) background extent of my input rasters.  All of
my submodels work fine using my presence records and my rasterstack of
WORLDCLIM data, except for GAM.

GAM reports "A term has fewer unique covariate combinations than specified
maximum degrees of freedom".

I know the rasters are valid.  I've used them independently with Maxent.jar
to produce models.  And to top it off, the other submodels (Maxent, MARS,
BRT, RF, CTA) don't appear to throw any errors when using these same data.
Does anyone have any ideas?  Am I doing something egregiously wrong with my GAM
implementation?  GAM's error message is kind of cryptic.  I'm assuming that
it can't fit a logistic model to my input data because it simply isn't
"seeing" it.
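
For reference, a minimal sketch of the situation in which mgcv raises this message, using made-up toy data rather than the rasters from this post. The names `toy`, `x_few`, and `x_many` are hypothetical. With the default `s()` settings, a single-covariate smooth asks for a basis dimension of 10, so a covariate with only a handful of distinct values cannot support it:

```r
# Hypothetical toy data, not the rasters or presence records from this post.
library(mgcv)

set.seed(42)
n <- 200
toy <- data.frame(
  y      = rbinom(n, 1, 0.5),               # made-up presence / pseudo-absence
  x_few  = sample(1:4, n, replace = TRUE),  # only 4 distinct values
  x_many = rnorm(n)                         # effectively all values distinct
)

# Count distinct values per candidate covariate.
n_unique <- sapply(toy[c("x_few", "x_many")], function(v) length(unique(v)))
print(n_unique)

# s() defaults to k = 10 for one covariate; x_few has fewer distinct values
# than that, so the fit is expected to stop with an error. Capture its message.
few_msg <- tryCatch({
  gam(y ~ s(x_few), family = binomial(link = "logit"), data = toy)
  "fit succeeded"
}, error = function(e) conditionMessage(e))
print(few_msg)

# The same model on a covariate with many distinct values fits normally.
many_fit <- gam(y ~ s(x_many), family = binomial(link = "logit"), data = toy)
```

Lowering `k` inside `s()` is the usual mgcv knob for this, though whether biomod2's automatic formula generation exposes it is not something the output in this post shows.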

Thanks!

Kyle

------ [code snippet] ------------------------------------------------------------

# initialize biomod dataset

# presenceShapefile is of type 'SpatialPoints'; predictors is of type
# RasterStack

myBiomodData <- BIOMOD_FormatingData(resp.var  = presenceShapefile,
                                     expl.var  = predictors,
                                     resp.name = "some.species",
                                     PA.nb.rep = 1,
                                     PA.nb.absences = 100,
                                     PA.strategy = 'random',
                                     na.rm = TRUE)
#
# set any customized options for submodels we may need
#

    if(usingMAXENT) {
      maxentOptions <- list(
        path_to_maxent.jar = "/home/kyle/R/x86_64-redhat-linux-gnu-library/2.15/dismo/java/",
        maximumiterations  = 200,
        visible            = FALSE)
    } else { maxentOptions <- NULL }

    if(usingGAM) {
      gamOptions <- list(family = binomial(link = "logit"))
    } else { gamOptions <- NULL }

    myBiomodOption <- BIOMOD_ModelingOptions(MAXENT = maxentOptions,
                                             GAM    = gamOptions)

    # save a workspace snapshot to demonstrate the objects we are passing
    # into BIOMOD_Modeling, in case it craps out
    save.image("this.rdata")

#
# train our models
#

myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
                                     models = submodels,
                                     models.options = myBiomodOption,
                                     NbRunEval=1,
                                     DataSplit=80,
                                     Yweights=NULL,
                                     VarImport=3,
                                     models.eval.meth = c('TSS','ROC'),
                                     SaveObj = TRUE,
                                     rescal.all.models = TRUE)

------ [end code snippet] ------------------------------------------------------------

------ [start biomod2 output] --------------------------------------------------------
-=-=-=-=-=-=-=-=-=-=-=-=-= some.species Data Formating
-=-=-=-=-=-=-=-=-=-=-=-=-=

      ! Response variable is considered as a only presences one... Is it
really what you want?
      ! No data has been set aside for modeling evaluation
   > Pseudo Absences Selection checkings...
   > random pseudo absences selection
   > Pseudo absences are selected in explanatory variables
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Loading required library...
Attaching package: ‘MASS’

The following object(s) are masked from ‘package:raster’:

    area, select

Loading required package: splines
Loaded gbm 1.6.3.2

randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.


Checking Models arguments...

Creating suitable Workdir...

                        ! Weights where defined to rise a 0.5 prevalence !

-=-=-=-  some.species  Modeling Summary
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
15  environmental variables ( tmax10 tmax11 tmax12 tmean01 tmean02 tmean03
tmean04 tmean05 tmean06 tmean07 tmean08 tmean09 tmean10 tmean11 tmean12 )
Number of evaluation repetitions : 2
Models selected : MAXENT MARS GAM GBM RF CTA
Total number of model runs : 12
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


-=-=-=- Run :  some.species_PA1


-=-=-=--=-=-=- some.species_PA1_RUN1

Model=MAXENT
        Creating Maxent Temp Proj Data..
 Runing Maxent...

 Getting predictions...
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Multiple Adaptive Regression Splines
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=GAM
         GAM_mgcv algorithm chosen
        User defined control args building..
        Automatic formula generation...
        > GAM (mgcv) modelling...Error in
smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  A term has fewer unique covariate combinations than specified maximum
degrees of freedom
In addition: Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: In storage.mode(tagx) <- "integer" : NAs introduced by coercion

   ! Note :  some.species_PA1_RUN1_GAM failed!

Model=Generalised Boosting Regression
         500 maximum different trees and  5  Fold Cross-Validation
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Breiman and Cutler's random forests for classification and regression
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Classification tree
         5 Fold Cross-Validation
        Evaluating Model stuff...
        Evaluating Predictor Contributions...


-=-=-=--=-=-=- some.species_PA1_Full

Model=MAXENT
        Creating Maxent Temp Proj Data..
 Runing Maxent...



 Getting predictions...
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Multiple Adaptive Regression Splines
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=GAM
         GAM_mgcv algorithm chosen
        User defined control args building..
        Automatic formula generation...
        > GAM (mgcv) modelling...Error in
smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  A term has fewer unique covariate combinations than specified maximum
degrees of freedom
In addition: Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: algorithm did not converge
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: In storage.mode(tagx) <- "integer" : NAs introduced by coercion

   ! Note :  some.species_PA1_Full_GAM failed!

Model=Generalised Boosting Regression
         500 maximum different trees and  5  Fold Cross-Validation
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Breiman and Cutler's random forests for classification and regression
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

Model=Classification tree
         5 Fold Cross-Validation
        Evaluating Model stuff...
        Evaluating Predictor Contributions...

        Removing Maxent Temp Data..
-=-=-=- Done
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: algorithm did not converge
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
------ [end biomod2 output] -----------------------------------------------------------

------ [start R session to see what went wrong] ---------------------------------------

> load("this.rdata")
> ls()
>
 [1] "args"                       "biomod2Avail"
 [3] "downsampleCoordData_kde"    "downsampleCoordData_simple"
 [5] "gamOptions"                 "gdalAvail"
 [7] "geoTiffToDataframe"         "hdf4ToDataframe"
 [9] "i"                          "j"
[11] "maptoolsAvail"              "maskRastersByVector"
[13] "maxentOptions"              "myBiomodData"
[15] "myBiomodOption"             "predictors"
[17] "presenceShapefile"          "printUsage"
[19] "submodels"                  "usingBRT"
[21] "usingCTA"                   "usingGAM"
[23] "usingGLM"                   "usingMARS"
[25] "usingMAXENT"                "usingRF"

> summary(predictors)
>
           tmax10   tmax11    tmax12   tmean01   tmean02   tmean03   tmean04
Min.    -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
1st Qu. -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
Median   0.00e+00 -2.8e+01 -1.05e+02 -1.92e+02 -1.58e+02 -9.70e+01 -5.00e+00
3rd Qu.  5.90e+01  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
Max.     3.62e+02  3.6e+02  3.58e+02  2.83e+02  2.88e+02  2.94e+02  3.11e+02
NA's     0.00e+00  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
          tmean05   tmean06   tmean07   tmean08  tmean09   tmean10   tmean11
Min.    -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38
1st Qu. -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38
Median   0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00 -6.70e+01
3rd Qu.  6.20e+01  1.19e+02  1.47e+02  1.32e+02  7.9e+01  1.90e+01  0.00e+00
Max.     3.24e+02  3.29e+02  3.58e+02  3.48e+02  3.2e+02  2.98e+02  2.89e+02
NA's     0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00  0.00e+00
          tmean12
Min.    -3.40e+38
1st Qu. -3.40e+38
Median  -1.55e+02
3rd Qu.  0.00e+00
Max.     2.85e+02
NA's     0.00e+00
Warning message:
In .local(object, ...) :
  summary is an estimate based on a sample of 1e+05 cells (0.11% of all
cells)
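
Every layer in the summary above has a minimum of -3.40e+38 and zero NA's. That magnitude matches the most negative 32-bit float, which is often used as a NoData fill value, so it may be that NoData cells are being read as real numbers rather than as NA. A minimal sketch of how one could check for that, on made-up values that mimic the summary rather than the actual rasters. The names `cells` and `nodata_cutoff` and the cutoff itself are assumptions:

```r
# Made-up cell values mimicking the summary above; not read from the rasters.
cells <- data.frame(
  tmax10  = c(-3.4e38, -3.4e38, 0, 59, 362),
  tmean01 = c(-3.4e38, -3.4e38, -192, 0, 283)
)

# Assumption: anything below this cutoff is a NoData sentinel, not a temperature.
nodata_cutoff <- -1e30

# How many values in each layer look like the sentinel?
n_sentinel <- sapply(cells, function(v) sum(v < nodata_cutoff))
print(n_sentinel)

# Recode the suspected sentinel values as NA and re-check the minimum.
cleaned <- as.data.frame(
  lapply(cells, function(v) replace(v, v < nodata_cutoff, NA))
)
print(sapply(cleaned, function(v) sum(is.na(v))))
print(sapply(cleaned, min, na.rm = TRUE))
```

If the real layers behave like this toy example, the sentinel cells would need to be flagged as NA before pseudo-absences are sampled from the background.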

