[Biomod-commits] Manual, help files and news

Wed Feb 29 10:55:29 CET 2012

Dear all,

Although we are a bit disappointed by the tone of some recent emails, we will try to continue providing help and support as our limited time allows.

In the meantime, I do know the manuals and help files are not always easy to follow or understand. We did our best, but I admit we spend more time improving the code, helping beginners and working on a new version than improving the clarity of the help and manual files. This is a choice we have made until now, but we'll try to balance our efforts a bit more in the future.
We would be grateful if you would let us know where you ran into unclear passages or code that could not be reproduced. Without clear examples or precise pointers, it is difficult for us to find and correct these problems. There is no need to send such emails to the whole list; please send them to Damien and me instead. That way, we can make our help files easier to read and understand.

I would, however, like to remind you that BIOMOD is an R package combining several techniques, concepts and methods, and it allows an almost infinite number of combinations. It is impossible for us to anticipate and address every possible bug, typing error or potential misuse. We have tried to add error tracking everywhere, but surely it is not perfect. The same is true of the help and manual files.

Most of the recent bug reports concern the techniques themselves (cp$comp, neural networks, convergence problems) and have nothing to do with BIOMOD. These errors will come up outside of BIOMOD as well. We could indeed code some error trapping around them, but we feel we should not. R is designed so that you have to think about what you are doing. This is not something we can do for you, although we are happy to discuss your ideas and projects and help you shape them. This is actually our favorite part ;-)
Some species are difficult to model because there are not enough points, or the points are imprecise, or the environmental variables are not the most meaningful. Before running BIOMOD or any other model, try to analyze your data with exploratory analyses (simple histograms are quite informative) and correlation matrices between environmental variables. Try also to be logical: select the variables you consider important and reduce their number. Although most techniques such as RandomForest or Maxent can deal with a large number of variables, doing so usually brings more noise than meaningful information. This is even more important when you want to project in space and time, where the correlations between your environmental variables might change.
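As a rough illustration of the correlation check suggested above, here is a minimal sketch in plain Python (used only to keep the example self-contained; the same check is a one-liner with cor() in R). The variable names, the toy values and the 0.7 cut-off are all assumptions made up for this example; they are not BIOMOD defaults or recommendations.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(env, threshold=0.7):
    """Return (name_a, name_b, r) for every variable pair with |r| >= threshold.

    `threshold` is an arbitrary illustrative cut-off, not a rule.
    """
    pairs = []
    for a, b in combinations(sorted(env), 2):
        r = pearson(env[a], env[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy values invented for illustration only (hypothetical variable names).
env = {
    "temp_mean": [10.0, 12.0, 14.0, 16.0, 18.0],
    "temp_max": [15.0, 17.5, 19.0, 21.5, 23.0],
    "precip": [800.0, 650.0, 900.0, 700.0, 850.0],
}

flagged = correlated_pairs(env)
for a, b, r in flagged:
    print(f"{a} and {b} are strongly correlated (r = {r}); consider keeping only one")
```

With these toy values only the two temperature variables are flagged, which is the kind of redundancy worth removing before modeling.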
There is a striking example in the Bradypus run I did yesterday for Andreas on the Maxent tutorial data. There were 116 presence-only records, while 13 continuous variables + 1 categorical variable were used.
Do not forget that a categorical variable should not be counted as a single variable. The models see a categorical variable with 5 levels as 4 binary variables (n-1). In the case of the ecoregion variable for Bradypus, this categorical variable has 14 levels, meaning 13 binary variables. If we add those to the 13 continuous variables, that leaves us with 26 variables for modeling 116 presences. This is really not good practice: there are way too many variables and the models will completely over-fit the data. Fewer than 10 variables should be OK, but not more.
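The counting above can be written down as a small sketch. The function name is hypothetical and the code is plain Python for illustration only; the numbers are the ones from the Bradypus example (13 continuous variables, one categorical variable with 14 levels, 116 presences).

```python
def effective_predictors(n_continuous, categorical_levels):
    """Count the variables a model actually sees.

    Each categorical variable with k levels counts as k - 1 binary variables.
    """
    return n_continuous + sum(k - 1 for k in categorical_levels)


# Bradypus example: 13 continuous variables + ecoregion (14 levels).
n_vars = effective_predictors(13, [14])
presences_per_var = 116 / n_vars

print(f"{n_vars} effective variables, "
      f"about {presences_per_var:.1f} presences per variable")
```

Running the same count on your own variable set before fitting anything is a quick way to see whether you are above the roughly 10 variables mentioned above.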

Finally, despite our limited time, we should be able to provide an important update of BIOMOD by the end of March (BIOMOD v.2). We have made important changes (object structure) and added several improvements (more evaluation techniques, more flexibility for selecting thresholds, a full parameterization of each technique as an option, clamping procedures for extrapolations). We have fully incorporated Maxent into BIOMOD, and most probably we will add Cforest (another implementation of random forest, less prone to over-fitting) and Support Vector Machines (SVM). Users will also be able to load presence-only data directly (as a matrix/vector or as spatial data points) together with ASC or raster environmental layers (as in Maxent), and BIOMOD will make the background selection itself. We are in the test phase and are writing the corresponding help files (this is why any hint on clarity is more than welcome).
We will also write some migration routines to make sure past BIOMOD runs can be updated to the new version.
Please forgive us if we do not always answer your emails swiftly in the coming weeks, as we are dedicating most of our 'BIOMOD time' to this major update.
I really hope this version will make things easier and that you'll enjoy it.

To end with, I guess you have seen that R-Forge has become quite slow at building the binary files (it has been building BIOMOD for more than a day). It is a bit overloaded (too many packages). In the meantime, we have put the source files here: http://www.will.chez-alice.fr/Software.html
The BIOMOD major update (v.2) will be on CRAN (probably beginning of April). The R-Forge site will be dedicated to development only. 

All the best,

Wilfried

--------------------------
Dr. Wilfried Thuiller
Laboratoire d'Ecologie Alpine, UMR CNRS 5553
Université Joseph Fourier
BP53, 38041 Grenoble cedex 9, France
tel: +33 (0)4 76 51 44 97
fax: +33 (0)4 76 51 42 79

Email: wilfried.thuiller at ujf-grenoble.fr
Personal website: http://www.will.chez-alice.fr
Team website: http://www-leca.ujf-grenoble.fr/equipes/emabio.htm

ERC Starting Grant TEEMBIO project: http://www.will.chez-alice.fr/Research.html
FP6 European EcoChange project: http://www.ecochange-project.eu
