[Basta-users] memory usage and categorical covariates

Fernando Colchero colchero at imada.sdu.dk
Thu Aug 21 16:33:44 CEST 2014


Hi Caroline,

    The latest version of BaSTA is not yet on CRAN, but you can download it from http://basta.r-forge.r-project.org/#summary and follow the instructions there to install it. This latest version should include the fix for the error message you were getting when adding covariates. If it does not work, please let us know and we’ll go back to fixing it.

    Were you able to fix the memory issue with the thinning?
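
A rough sketch of why thinning reduces memory use, written in base R and not taken from the BaSTA source. It assumes that one iteration is kept every `thinning` steps from `burnin` onwards, and that each kept iteration stores one double per column. The helper names `kept_iterations` and `chain_gib`, and the figure of 10000 stored columns, are hypothetical and only for illustration; `niter` and `burnin` are the values from the calls quoted in this thread.

```r
# Number of iterations kept from one chain, assuming every `thinning`-th
# iteration is stored starting at `burnin` (an assumption, not BaSTA internals).
kept_iterations <- function(niter, burnin, thinning) {
  length(seq(burnin, niter, by = thinning))
}

# Approximate size in GiB of a numeric matrix with one row per kept
# iteration and `ncols` columns (8 bytes per double).
chain_gib <- function(niter, burnin, thinning, ncols) {
  kept_iterations(niter, burnin, thinning) * ncols * 8 / 1024^3
}

# Values from the thread: niter = 2000000, burnin = 8001
kept_20  <- kept_iterations(2000000, 8001, thinning = 20)   # 99600
kept_200 <- kept_iterations(2000000, 8001, thinning = 200)  # 9960

# Hypothetical 10000 stored columns per kept iteration
gib_20  <- chain_gib(2000000, 8001, thinning = 20,  ncols = 10000)
gib_200 <- chain_gib(2000000, 8001, thinning = 200, ncols = 10000)
```

Under these assumptions, raising the thinning interval from 20 to 200 cuts the stored output per chain by a factor of ten; the real saving depends on what BaSTA actually keeps in memory.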

   Best,

   Fernando


Fernando Colchero
Assistant Professor
Department of Mathematics and Computer Science
Max-Planck Odense Center on the Biodemography of Aging

Tlf.               +45 65 50 23 24
Email           colchero at imada.sdu.dk
Web             www.sdu.dk/staff/colchero
Pers. web   www.colchero.com
Adr.              Campusvej 55, 5230, Odense, Dk

University of Southern Denmark

On 16 Aug 2014, at 13:27, Caroline <yfcaroline at gmail.com> wrote:

> Hello, thanks Mirre,
> you're correct about the excessive CPUs for four sims; we realised that when analysing different model settings.
> 
> We're currently assessing if the thinning fixes the issues and will report back on the outcomes as soon as we can.
> 
> Fernando, regarding the covariate names issue that you mentioned is in fact a bug: can we look out for the next version of BaSTA with this fix included, or alternatively can you suggest any naming workarounds?
> 
> thanks muchly for your help,
> caroline
> 
> On 16/08/2014, at 1:03 PM, Mirre Simons <mirresimons at gmail.com> wrote:
> 
>> And why 16 CPUs for 4 sims? I don't think it distributes a single sim across multiple CPUs.
>> 
>> On 6 Aug 2014, at 14:08, caroline chong <yfcaroline at gmail.com> wrote:
>> 
>>> Dear Fernando,
>>> 
>>> Thanks so much for your response. I'm excited to hear that the covariate bug has been isolated. Looking forward to the fix. Could you suggest a timeline for when we can expect the next version of BaSTA to be released?
>>> 
>>> On the memory issue, I am currently running burn-ins of between 1000 and 8000, as in the call below, and encountering the memory problems. Is this where you meant the burnin to be specified?
>>> 
>>> speciesout <- basta(object = iM.spec.basta$newData, studyStart = 1, studyEnd = 109, model = "GO", shape = "simple", nsim = 4, parallel = TRUE, ncpus = 16, updateJumps = TRUE, niter = 2000000, burnin = 8001, lifeTable = TRUE)
>>> 
>>> Thanks again for your help,
>>> Caroline.
>>> 
>>> 
>>> 
>>> On 05/08/2014, at 10:41 AM, Fernando Colchero wrote:
>>> 
>>>> Hi Caroline,
>>>> 
>>>>    Sorry for the late reply. Regarding your first enquiry, we found a bug that produced that error, and we need to release an updated version of BaSTA that fixes it. As for the memory problem, based on the information you sent us I think the issue is that you have a very large dataset and that you are running the model for many iterations. A quick fix would be to specify the argument burnin = 200 or even larger; the default value is 20. The advantage is that you will keep far fewer iterations, reducing the amount of memory you use, while also reducing serial autocorrelation. Let us know if this solves the issue.
>>>> 
>>>>    Best,
>>>> 
>>>>    Fernando
>>>> 
>>>> 
>>>> On 03 Aug 2014, at 01:46, Caroline Chong <caroline.chong at anu.edu.au> wrote:
>>>> 
>>>>> 
>>>>>>> 
>>>>>>> Thanks Owen, Fernando
>>>>>>> 
>>>>>>> We are encountering problems with memory usage when running basta
>>>>>>> analyses (using the latest version), and would be most grateful for
>>>>>>> your suggestions on how to resolve the following situation. We're
>>>>>>> running basta on a 64-bit linux and are able to commence basta runs
>>>>>>> successfully (both in serial and parallel), but rapidly occupy 128 GB
>>>>>>> RAM and all available swap (90 GB). The number of iterations is
>>>>>>> currently set to 2 million but we have also attempted 1 million. We
>>>>>>> are keen to run the final analyses, and will look forward to your
>>>>>>> feedback (and from the basta community) with great anticipation.
>>>>>>> 
>>>>>>> - All iterations are currently stored in the PAR matrix. Is it
>>>>>>> possible to store only the thinned chain in memory and write the full
>>>>>>> chain to disc every (for example) 100000 iterations?
>>>>>>> 
>>>>>>> - If so, would this be a reasonably straightforward alteration to
>>>>>>> implement in the current basta version, and what would this look like?
>>>>>>> 
>>>>>>> - Alternatively, could you please advise of any other methods to
>>>>>>> reduce memory usage?
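
The idea of writing the chain to disk in blocks can be sketched in base R as below. This is a toy Metropolis sampler for the mean of a normal sample, not BaSTA's sampler, and nothing here reflects how BaSTA stores its output. The function `run_chain`, its arguments, and the file layout are all hypothetical. Unlike the request above, the sketch keeps only a small buffer of thinned draws in memory and appends each full buffer to a text file.

```r
# Toy Metropolis sampler that flushes thinned draws to disk in chunks.
# It stands in for any MCMC loop; it is NOT BaSTA's implementation.
run_chain <- function(y, niter, thinning, chunk_size, out_file) {
  buffer <- numeric(chunk_size)   # the only draws held in memory
  filled <- 0L
  theta  <- 0                     # starting value for the mean
  if (file.exists(out_file)) file.remove(out_file)

  for (i in seq_len(niter)) {
    # Random-walk proposal and Metropolis acceptance step (flat prior)
    proposal  <- theta + rnorm(1, sd = 0.5)
    log_ratio <- sum(dnorm(y, proposal, 1, log = TRUE)) -
      sum(dnorm(y, theta, 1, log = TRUE))
    if (log(runif(1)) < log_ratio) theta <- proposal

    # Keep every `thinning`-th draw; flush the buffer when it is full
    if (i %% thinning == 0) {
      filled <- filled + 1L
      buffer[filled] <- theta
      if (filled == chunk_size) {
        write(buffer, file = out_file, ncolumns = 1, append = TRUE)
        filled <- 0L
      }
    }
  }

  # Flush whatever is left in the buffer
  if (filled > 0L) {
    write(buffer[seq_len(filled)], file = out_file, ncolumns = 1, append = TRUE)
  }
  invisible(out_file)
}

set.seed(1)
y <- rnorm(50, mean = 2)
out_file <- tempfile(fileext = ".txt")
run_chain(y, niter = 5000, thinning = 20, chunk_size = 100, out_file = out_file)
draws <- scan(out_file, quiet = TRUE)  # read the thinned chain back from disk
```

Peak memory for the stored chain is then bounded by `chunk_size` rather than by the number of iterations, at the cost of extra disk I/O and a later read of the file.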
>>>>>>> 
>>>>>>> Example of basta commands are below. I am aiming to compare model
>>>>>>> types (exponential, gompertz, logistic) run for each of 30 or more
>>>>>>> species:
>>>>>>> 
>>>>>>> for (s in seq_along(species.list))
>>>>>>> {
>>>>>>> 
>>>>>>> ### iterate through species list, read input matrix file
>>>>>>> species <- species.list[s]
>>>>>>> iM.spec <- read.delim(paste("inputmatg.", species, ".txt", sep = ""),
>>>>>>> header = TRUE, sep = ",")
>>>>>>> 
>>>>>>> ### rename the 109 year columns to 1..109
>>>>>>> colnames(iM.spec)[4:112] <- 1:109
>>>>>>> 
>>>>>>> ### coerce the year columns to numeric (via character, in case they were read as factors)
>>>>>>> iM.spec[, 4:112] <- sapply(iM.spec[, 4:112], as.character)
>>>>>>> iM.spec[, 4:112] <- sapply(iM.spec[, 4:112], as.numeric)
>>>>>>> 
>>>>>>> ### perform data check on input matrix
>>>>>>> iM.spec.basta <- DataCheck(iM.spec, studyStart = 1, studyEnd = 109,
>>>>>>> autofix = rep(1, 7), silent = FALSE)
>>>>>>> 
>>>>>>> ### basta
>>>>>>> speciesout <- basta(object = iM.spec.basta$newData, studyStart = 1,
>>>>>>> studyEnd = 109, model = "GO", shape = "simple", nsim = 4, parallel =
>>>>>>> TRUE, ncpus = 16, updateJumps = TRUE, niter = 2000000, burnin = 8001,
>>>>>>> lifeTable = TRUE)
>>>>>>> }
>>>>>>> 
>>>>>>> Example input matrix showing data for the first three individuals of
>>>>>>> species1 only, for brevity:
>>>>>>> 
>>>>>>> "ID","birth","death","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109"
>>>>>>> "1",1,0,68,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
>>>>>>> "2",2,0,68,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
>>>>>>> "3",3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
>>>>>>> 
>>>>>>> 
>>>>>>> My second query is as per last week regarding the "Error in FUN(newX[,
>>>>>>> i], ...) : invalid 'type' (character) of argument" error encountered
>>>>>>> when attempting to include a single or multiple covariates
>>>>>>> (particularly of type categorical), but re-posted here to the mailing
>>>>>>> list in case the community can also help out. Do you have any
>>>>>>> suggestions on how to code or name categorical covariates to
>>>>>>> circumvent this error?
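
One possible workaround, offered as an untested assumption rather than a confirmed fix for the BaSTA bug: R raises "invalid 'type' (character) of argument" when a numeric function such as `sum()` is applied to character data, so recoding a categorical covariate as numeric indicator columns before passing it on may avoid the error. The sketch below uses only base R (`factor`, `model.matrix`); the data frame `covs` and its columns `SPEC` and `weight` are hypothetical.

```r
# Hypothetical covariate table: one categorical and one numeric covariate
covs <- data.frame(ID     = 1:6,
                   SPEC   = c("a", "b", "c", "a", "b", "c"),
                   weight = c(1.2, 0.8, 1.5, 1.1, 0.9, 1.4),
                   stringsAsFactors = FALSE)

# Convert the character column to a factor, then expand it into 0/1
# indicator columns; "- 1" drops the intercept so every level gets a column
covs$SPEC <- factor(covs$SPEC)
cov_mat   <- model.matrix(~ SPEC + weight - 1, data = covs)

# Purely numeric matrix with columns ID, SPECa, SPECb, SPECc, weight
cov_num <- cbind(ID = covs$ID, cov_mat)
```

Whether BaSTA accepts a matrix prepared this way depends on the version and on how it expects covariates to be attached to the capture-history data, so check the package documentation before relying on it.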
>>>>>>> 
>>>>>>> Many thanks for your help,
>>>>>>> best regards
>>>>>>> Caroline.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 21 Jul 2014, at 23:52, caroline <caroline.chong at anu.edu.au> wrote:
>>>>>>> 
>>>>>>> Dear Owen/Fernando,
>>>>>>> 
>>>>>>> I was wondering whether you had any updated advice on how to code a
>>>>>>> single or multiple categorical covariates to avoid the "FUN(newX[,i])"
>>>>>>> error (similarly reported by Richard on 30 May 2014). I have checked
>>>>>>> that the covariate names do not start with the same characters and
>>>>>>> have also tried simplified names (a, b, c) to no avail so far. I would
>>>>>>> be most happy to provide you with some of my data set if that would be
>>>>>>> helpful for more context and to find the solution. I also intend to
>>>>>>> use basta with both categorical and numeric covariates so would
>>>>>>> appreciate any suggestions you may have on covariate naming.
>>>>>>> 
>>>>>>> Also, I would be grateful for your advice on deciphering the following
>>>>>>> (run on a cluster with 32 cpus available):
>>>>>>> 
>>>>>>> iM.spec.basta <- DataCheck(iM.spec, studyStart = 1, studyEnd = 109,
>>>>>>> autofix = rep(1, 7), silent = FALSE)
>>>>>>> 
>>>>>>> exspeciesout <- basta(object = iM.spec.basta$newData, studyStart = 1,
>>>>>>> studyEnd = 109, model = "EX", shape= "simple", nsim = 4, parallel =
>>>>>>> TRUE, ncpus = 16, updateJumps = TRUE, niter = 2000000, burnin= 8001,
>>>>>>> lifeTable=TRUE)
>>>>>>> 
>>>>>>> Total MCMC computing time: 10.48 hours.
>>>>>>> 
>>>>>>> Survival parameters converged appropriately.
>>>>>>> DIC was calculated.
>>>>>>> Error: cannot allocate vector of size 109.5 Gb
>>>>>>> Execution halted
>>>>>>> Warning message:
>>>>>>> system call failed: Cannot allocate memory
>>>>>>> 
>>>>>>> Warmest thanks for your assistance,
>>>>>>> with best regards,
>>>>>>> Caroline.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 01/11/2013, at 8:54 AM, Owen Jones wrote:
>>>>>>> 
>>>>>>>> Dear Caroline,
>>>>>>>> 
>>>>>>>> This is a possible bug in one of the sub-functions in basta that
>>>>>>>> deals with the covariates. 
>>>>>>>> 
>>>>>>>> We're investigating and will get back to you shortly.
>>>>>>>> 
>>>>>>>> Best wishes,
>>>>>>>> Owen
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 1 Nov 2013, at 14:01, caroline <caroline.chong at anu.edu.au> wrote:
>>>>>>>> 
>>>>>>>>> Dear All,
>>>>>>>>> 
>>>>>>>>> I am running a simple Gompertz model with one categorical
>>>>>>>>> covariate, species (coded as species SPEC =  a, b, c.... ah for
>>>>>>>>> simplicity). After running DataCheck with autofix set to on, I
>>>>>>>>> encounter the following error "in FUN(newX[,i]...)" - would you
>>>>>>>>> have any experience with a similar situation, and could provide any
>>>>>>>>> help or suggestions on how to interpret and trouble-shoot the
>>>>>>>>> problem? I am unsure as to how to decipher where the error lies.
>>>>>>>>> N.b. this data set seems to run ok when I do not include the
>>>>>>>>> covariate matrix.
>>>>>>>>> 
>>>>>>>>> Very grateful for your help,
>>>>>>>>> best regards
>>>>>>>>> Caroline.
>>>>>>>>> 
>>>>>>>>> Caroline Chong
>>>>>>>>> Postdoctoral Fellow
>>>>>>>>> Research School of Biology
>>>>>>>>> Australian National University, Canberra ACT
>>>>>>>>> 
>>>>>>>>> speciesout <- basta(object = inputMat2$newData, studyStart = 1,
>>>>>>>>> studyEnd = 109, covarsStruct = "all.in.mort", nsim = 4, parallel =
>>>>>>>>> TRUE, ncpus = 4, updateJumps = TRUE, niter = 100000)
>>>>>>>>> 
>>>>>>>>> No problems were detected with the data.
>>>>>>>>> 
>>>>>>>>> Error in FUN(newX[, i], ...) : invalid 'type' (character) of
>>>>>>>>> argument
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> Basta-users mailing list
>>> Basta-users at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/basta-users
