[Biomod-commits] Fwd: GAM implementation not properly handling rasterstack()?

Thu Jan 24 20:30:03 CET 2013

Hi Kyle,

This error means that one of your variables doesn't have enough unique values
to construct your GAM.
You could try lowering the "k" value in the GAM options (e.g. to 2 or 3);
have a look at the choose.k help file in the mgcv package. Or you can try
selecting more pseudo-absences at the formatting step. A last option would be
to fit the GAM with the gam package instead of mgcv.
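
To make the link between unique values and "k" concrete, here is a minimal
sketch on simulated data (not your rasters). The data frame "toy" and every
object name in it are made up for illustration, and it assumes mgcv is
installed. It counts the distinct values of a covariate, reproduces the error
with the default basis size, and then fits the same smooth with a smaller k.

```r
# Minimal sketch with simulated data; all object names here are hypothetical.
library(mgcv)

set.seed(42)
n <- 200

# A covariate with only 5 distinct values (e.g. a layer with many repeated cells)
toy <- data.frame(x = sample(c(-50, 0, 50, 100, 150), n, replace = TRUE))
toy$y <- rbinom(n, 1, plogis(toy$x / 100))

# Number of distinct values per predictor; s() needs at least k of them
n_unique <- sapply(toy["x"], function(v) length(unique(v)))
print(n_unique)

# The default basis dimension of a one-dimensional s() term is k = 10,
# which is more than the 5 distinct values available, so gam() stops.
err_msg <- tryCatch({
  gam(y ~ s(x), family = binomial(link = "logit"), data = toy)
  NA_character_
}, error = function(e) conditionMessage(e))
print(err_msg)

# With k no larger than the number of distinct values the smooth can be built
fit_ok <- gam(y ~ s(x, k = 3), family = binomial(link = "logit"), data = toy)
print(class(fit_ok))
```

Running the same kind of count on the data extracted at your presence and
pseudo-absence points should show which variable is the limiting one. In
biomod2 the smaller k would presumably be passed through the GAM options list
rather than written into the formula by hand.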

Hope that helps,

Best,

Damien.

2013/1/24 Kyle Taylor <kyle.a.taylor at gmail.com>

> Hey all,
>
> I'm trying to use biomod2 to produce an ensemble of 6 submodels for some
> presence-only data for my species of interest.  The submodels consist of
> Maxent, GAM, MARS, Boosted Regression Trees, RandomForest, and
> Classification Trees.  The explanatory variables I'm using to train these
> models are fed-in as a rasterstack().
>
> To start with, I'm just trying to use simple climate variables from
> WORLDCLIM (tmin, tmax, tmean, etc...) to train with.  For the sake of
> argument, I'm pulling my pseudo-absence records randomly from the (very
> large... all of N. America) background extent of my input rasters.  All of
> my submodels work fine using my presence records and my rasterstack of
> WORLDCLIM data, except for GAM.
>
> GAM reports "A term has fewer unique covariate combinations than specified
> maximum degrees of freedom".
>
> I know the rasters are valid.  I've used them independently with Maxent.jar
> to produce models.  And to top it off, the other submodels (Maxent, MARS,
> BRT, RF, CTA) don't appear to throw any errors when using these same data.
>  Anyone have any ideas?  Am I doing something egregiously wrong with my GAM
> implementation?  GAM's error message is kind-of cryptic.  I'm assuming that
> it can't fit a logistic model to my input data, because it simply isn't
> "seeing" it.
>
> Thanks!
>
> Kyle
>
> ------ [code snippet] ------------------------------------------------
>
> # initialize biomod dataset
>
> # presenceShapefile is of type 'SpatialPoints', predictors is of type
> # RasterStack
>
> myBiomodData <- BIOMOD_FormatingData(resp.var  = presenceShapefile,
>                                      expl.var  = predictors,
>                                      resp.name = "some.species",
>                                      PA.nb.rep = 1,
>                                      PA.nb.absences = 100,
>                                      PA.strategy = 'random',
>                                      na.rm = T)
> #
> # set any customized options for submodels we may need
> #
>
>     if(usingMAXENT) {
>       maxentOptions <- list(path_to_maxent.jar =
> "/home/kyle/R/x86_64-redhat-linux-gnu-library/2.15/dismo/java/",
>                            maximumiterations = 200,
>                            visible = FALSE)
>     } else { maxentOptions <- NULL }
>
>     if(usingGAM) {
>       gamOptions <- list(family = binomial(link = "logit"))
>     } else { gamOptions <- NULL }
>
>     myBiomodOption <- BIOMOD_ModelingOptions(MAXENT = maxentOptions,
>                                              GAM    = gamOptions)
>
>     # save a workspace snapshot to demonstrate the objects we are passing
>     # into BIOMOD_Modeling, in case it craps out
>     save.image("this.rdata")
>
> #
> # train our models
> #
>
> myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
>                                      models = submodels,
>                                      models.options = myBiomodOption,
>                                      NbRunEval=1,
>                                      DataSplit=80,
>                                      Yweights=NULL,
>                                      VarImport=3,
>                                      models.eval.meth = c('TSS','ROC'),
>                                      SaveObj = TRUE,
>                                      rescal.all.models = TRUE)
>
> ------ [end code snippet] --------------------------------------------
>
> ------ [start biomod2 output] ----------------------------------------
> -=-=-=-=-=-=-=-=-=-=-=-=-= some.species Data Formating
> -=-=-=-=-=-=-=-=-=-=-=-=-=
>
>       ! Response variable is considered as a only presences one... Is it
> really what you want?
>       ! No data has been set aside for modeling evaluation
>    > Pseudo Absences Selection checkings...
>    > random pseudo absences selection
>    > Pseudo absences are selected in explanatory variables
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
> Loading required library...
> Attaching package: ‘MASS’
>
> The following object(s) are masked from ‘package:raster’:
>
>     area, select
>
> Loading required package: splines
> Loaded gbm 1.6.3.2
>
> randomForest 4.6-7
> Type rfNews() to see new features/changes/bug fixes.
> This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
>
>
> Checking Models arguments...
>
> Creating suitable Workdir...
>
>                         ! Weights where defined to rise a 0.5 prevalence !
>
> -=-=-=-  some.species  Modeling Summary
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> 15  environmental variables ( tmax10 tmax11 tmax12 tmean01 tmean02 tmean03
> tmean04 tmean05 tmean06 tmean07 tmean08 tmean09 tmean10 tmean11 tmean12 )
> Number of evaluation repetitions : 2
> Models selected : MAXENT MARS GAM GBM RF CTA
> Total number of model runs : 12
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
> -=-=-=- Run :  some.species_PA1
>
>
> -=-=-=--=-=-=- some.species_PA1_RUN1
>
> Model=MAXENT
>         Creating Maxent Temp Proj Data..
>  Runing Maxent...
>
>  Getting predictions...
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Multiple Adaptive Regression Splines
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=GAM
>          GAM_mgcv algorithm chosen
>         User defined control args building..
>         Automatic formula generation...
>         > GAM (mgcv) modelling...Error in
> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>   A term has fewer unique covariate combinations than specified maximum
> degrees of freedom
> In addition: Warning messages:
> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 2: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>
>    ! Note :  some.species_PA1_RUN1_GAM failed!
>
> Model=Generalised Boosting Regression
>          500 maximum different trees and  5  Fold Cross-Validation
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Breiman and Cutler's random forests for classification and regression
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Classification tree
>          5 Fold Cross-Validation
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
>
> -=-=-=--=-=-=- some.species_PA1_Full
>
> Model=MAXENT
>         Creating Maxent Temp Proj Data..
>  Runing Maxent...
>
>
>
>  Getting predictions...
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Multiple Adaptive Regression Splines
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=GAM
>          GAM_mgcv algorithm chosen
>         User defined control args building..
>         Automatic formula generation...
>         > GAM (mgcv) modelling...Error in
> smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>   A term has fewer unique covariate combinations than specified maximum
> degrees of freedom
> In addition: Warning messages:
> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 2: glm.fit: algorithm did not converge
> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 4: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 5: In storage.mode(tagx) <- "integer" : NAs introduced by coercion
>
>    ! Note :  some.species_PA1_Full_GAM failed!
>
> Model=Generalised Boosting Regression
>          500 maximum different trees and  5  Fold Cross-Validation
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Breiman and Cutler's random forests for classification and regression
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
> Model=Classification tree
>          5 Fold Cross-Validation
>         Evaluating Model stuff...
>         Evaluating Predictor Contributions...
>
>         Removing Maxent Temp Data..
> -=-=-=- Done
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Warning messages:
> 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 2: glm.fit: algorithm did not converge
> 3: glm.fit: fitted probabilities numerically 0 or 1 occurred
> ------ [end biomod2 output] ------------------------------------------
>
> ------ [start R session to see what went wrong] ----------------------
>
> > load("this.rdata")
> > ls()
> >
>  [1] "args"                       "biomod2Avail"
>  [3] "downsampleCoordData_kde"    "downsampleCoordData_simple"
>  [5] "gamOptions"                 "gdalAvail"
>  [7] "geoTiffToDataframe"         "hdf4ToDataframe"
>  [9] "i"                          "j"
> [11] "maptoolsAvail"              "maskRastersByVector"
> [13] "maxentOptions"              "myBiomodData"
> [15] "myBiomodOption"             "predictors"
> [17] "presenceShapefile"          "printUsage"
> [19] "submodels"                  "usingBRT"
> [21] "usingCTA"                   "usingGAM"
> [23] "usingGLM"                   "usingMARS"
> [25] "usingMAXENT"                "usingRF"
>
> > summary(predictors)
> >
>            tmax10   tmax11    tmax12   tmean01   tmean02   tmean03   tmean04
> Min.    -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
> 1st Qu. -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38
> Median   0.00e+00 -2.8e+01 -1.05e+02 -1.92e+02 -1.58e+02 -9.70e+01 -5.00e+00
> 3rd Qu.  5.90e+01  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
> Max.     3.62e+02  3.6e+02  3.58e+02  2.83e+02  2.88e+02  2.94e+02  3.11e+02
> NA's     0.00e+00  0.0e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.00e+00
>           tmean05   tmean06   tmean07   tmean08  tmean09   tmean10   tmean11
> Min.    -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38
> 1st Qu. -3.40e+38 -3.40e+38 -3.40e+38 -3.40e+38 -3.4e+38 -3.40e+38 -3.40e+38
> Median   0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00 -6.70e+01
> 3rd Qu.  6.20e+01  1.19e+02  1.47e+02  1.32e+02  7.9e+01  1.90e+01  0.00e+00
> Max.     3.24e+02  3.29e+02  3.58e+02  3.48e+02  3.2e+02  2.98e+02  2.89e+02
> NA's     0.00e+00  0.00e+00  0.00e+00  0.00e+00  0.0e+00  0.00e+00  0.00e+00
>           tmean12
> Min.    -3.40e+38
> 1st Qu. -3.40e+38
> Median  -1.55e+02
> 3rd Qu.  0.00e+00
> Max.     2.85e+02
> NA's     0.00e+00
> Warning message:
> In .local(object, ...) :
>   summary is an estimate based on a sample of 1e+05 cells (0.11% of all
> cells)