[Basta-users] memory usage and categorical covariates

Sun Aug 3 01:46:34 CEST 2014

>> 
>> Thanks Owen, Fernando
>> 
>> We are encountering problems with memory usage when running basta
>> analyses (using the latest version), and would be most grateful for
>> your suggestions on how to resolve the following situation. We're
>> running basta on a 64-bit linux and are able to commence basta runs
>> successfully (both in serial and parallel), but rapidly occupy 128 GB
>> RAM and all available swap (90 GB). The number of iterations is
>> currently set to 2 million but we have also attempted 1 million. We
>> are keen to run the final analyses, and will look forward to your
>> feedback (and from the basta community) with great anticipation.
>> 
>> - All iterations are currently stored in the PAR matrix. Is it
>> possible to store only the thinned chain in memory and write the full
>> chain to disc every (for example) 100000 iterations?
>> 
>> - If so, would this be a reasonably straightforward alteration to
>> implement in the current basta version, and what would this look like?
>> 
>> - Alternatively, could you please advise of any other methods to
>> reduce memory usage?
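
On the first bullet above: a minimal sketch of the idea in plain R, not basta's internal code. It assumes a toy random-walk update in place of the real sampler, and the names niter, thinning, chunk, npar, buffer, thinned and outFile are all hypothetical. Only the thinned draws and one chunk-sized buffer stay in memory; the full chain is appended to a CSV file each time the buffer fills.

```r
# Toy illustration only: this is NOT basta's internal code.
set.seed(1)
niter    <- 10000   # total iterations (2e6 in the real run)
thinning <- 20      # keep every 20th draw in memory
chunk    <- 1000    # flush the full chain to disk every `chunk` iterations
npar     <- 2       # number of parameters (hypothetical)

outFile <- tempfile(fileext = ".csv")
buffer  <- matrix(NA_real_, nrow = chunk, ncol = npar)
thinned <- matrix(NA_real_, nrow = niter %/% thinning, ncol = npar)
theta   <- rep(0, npar)

for (i in seq_len(niter)) {
  # Stand-in for one MCMC update (a plain random walk, not a real sampler).
  theta <- theta + rnorm(npar, sd = 0.1)

  # Position of this draw inside the current chunk buffer.
  pos <- (i - 1) %% chunk + 1
  buffer[pos, ] <- theta

  # Keep only every `thinning`-th draw in memory.
  if (i %% thinning == 0) thinned[i %/% thinning, ] <- theta

  # Flush when the buffer is full, or at the very end for a partial chunk.
  if (pos == chunk || i == niter) {
    write.table(buffer[seq_len(pos), , drop = FALSE], outFile,
                append = i > chunk, sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
}
```

With these settings the in-memory footprint is chunk + niter/thinning rows rather than niter rows. Whether the same change is straightforward inside basta depends on how the package stores its chains, which this sketch does not assume.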
>> 
>> Example of basta commands are below. I am aiming to compare model
>> types (exponential, gompertz, logistic) run for each of 30 or more
>> species:
>> 
>> for (s in 1:length(species.list))
>> {
>> 
>>  ### iterate through species list, read input matrix file
>>  species     <- species.list[s] 
>>  iM.spec <- read.delim(paste("inputmatg.", species, ".txt", sep=""),
>>                        header=TRUE, sep=",")
>>  colnames(iM.spec)[4:112] <- 1:109
>> 
>>  iM.spec[,4:112] <- sapply(iM.spec[,4:112], as.character)
>>  iM.spec[,4:112] <- sapply(iM.spec[,4:112], as.numeric)
>> 
>>  ### perform data check on input matrix
>>  iM.spec.basta <- DataCheck(iM.spec, studyStart = 1, studyEnd = 109,
>> autofix = rep(1, 7), silent = FALSE)
>> 
>>  ### basta
>> speciesout <- basta(object = iM.spec.basta$newData, studyStart = 1,
>> studyEnd = 109, model = "GO", shape= "simple", nsim = 4, parallel =
>> TRUE, ncpus = 16, updateJumps = TRUE, niter = 2000000, burnin= 8001,
>> lifeTable=TRUE)
>> }
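
A note on the loop above: as written, speciesout is overwritten on every pass, so only the last species' fit survives. One general R pattern, independent of basta itself, is to save each fit to disk and release it before the next species starts. The sketch below is an assumption-laden illustration: fitOneSpecies is a hypothetical stand-in for the DataCheck() and basta() calls, and the species names and file names are made up.

```r
species.list <- c("species1", "species2")   # hypothetical species names
outDir <- tempdir()                         # hypothetical output directory

# Hypothetical stand-in for the DataCheck() + basta() calls in the loop above.
fitOneSpecies <- function(species) {
  list(species = species, draws = rnorm(1000))
}

for (species in species.list) {
  fit <- fitOneSpecies(species)

  # Write the fit to disk so it does not need to stay in memory.
  saveRDS(fit, file.path(outDir, paste0("basta.", species, ".rds")))

  # Drop the object and ask R to release the memory before the next species.
  rm(fit)
  invisible(gc())
}

# A saved fit can be reloaded later, one species at a time.
fit1 <- readRDS(file.path(outDir, "basta.species1.rds"))
```

This does not reduce the peak memory of a single basta run; it only prevents results from being lost or held across species.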
>> Example input matrix showing data for the first three individuals of
>> species1 only, for brevity:
>> 
>> "ID","birth","death","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109"
>> "1",1,0,68,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
>> "2",2,0,68,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
>> "3",3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
>> 
>> 
>> My second query is as per last week regarding the "Error in FUN(newX[,
>> i], ...) : invalid 'type' (character) of argument" error encountered
>> when attempting to include a single or multiple covariates
>> (particularly of type categorical), but re-posted here to the mailing
>> list in case the community can also help out. Do you have any
>> suggestions on how to code or name categorical covariates to
>> circumvent this error?
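
For context on the error message itself: in base R, this message appears when a numeric summary such as sum() is applied column-wise to a character matrix, which is what a data frame with a mix of character and numeric columns becomes under as.matrix(). Whether that is what happens inside basta is an assumption, not something confirmed in this thread. The sketch below reproduces the message with made-up data (the covs data frame and its columns are hypothetical) and shows a numeric 0/1 indicator coding built with model.matrix() from base R.

```r
# Hypothetical covariate table: one categorical and one numeric covariate.
covs <- data.frame(ID     = 1:4,
                   SPEC   = c("a", "b", "c", "a"),
                   weight = c(1.2, 0.8, 1.5, 1.1),
                   stringsAsFactors = FALSE)

# Mixed column types collapse to a character matrix.
m <- as.matrix(covs[, c("SPEC", "weight")])
is.character(m)   # TRUE

# A column-wise numeric summary then fails; capture the message instead of stopping.
res <- tryCatch(apply(m, 2, sum), error = function(e) conditionMessage(e))

# Numeric coding: one 0/1 indicator column per level of SPEC, plus weight.
covs$SPEC <- factor(covs$SPEC)
design <- model.matrix(~ SPEC + weight - 1, data = covs)
colnames(design)
```

The resulting design matrix is entirely numeric. Whether basta accepts covariates supplied in this form without the error is untested here.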
>> 
>> Many thanks for your help,
>> best regards
>> Caroline.
>> 
>> 
>> 
>> On 21 Jul 2014, at 23:52, caroline <caroline.chong at anu.edu.au> wrote:
>> 
>> Dear Owen/Fernando,
>> 
>> I was wondering whether you had any updated advice on how to code one
>> or more categorical covariates to avoid the "FUN(newX[,i])"
>> error (similarly reported by Richard on 30 May 2014). I have checked
>> that the covariate names do not start with the same characters and
>> have also tried simplified names (a, b, c) to no avail so far. I would
>> be most happy to provide you with some of my data set if that would be
>> helpful for more context and to find the solution. I also intend to
>> use basta with both categorical and numeric covariates so would
>> appreciate any suggestions you may have on covariate naming.
>> 
>> Also, I would be grateful for your advice on deciphering the following
>> (run on a cluster with 32 cpus available):
>> 
>> iM.spec.basta <- DataCheck(iM.spec, studyStart = 1, studyEnd = 109,
>> autofix = rep(1, 7), silent = FALSE)
>> 
>> exspeciesout <- basta(object = iM.spec.basta$newData, studyStart = 1,
>> studyEnd = 109, model = "EX", shape= "simple", nsim = 4, parallel =
>> TRUE, ncpus = 16, updateJumps = TRUE, niter = 2000000, burnin= 8001,
>> lifeTable=TRUE)
>> 
>> Total MCMC computing time: 10.48 hours.
>> 
>> Survival parameters converged appropriately.
>> DIC was calculated.
>> Error: cannot allocate vector of size 109.5 Gb
>> Execution halted
>> Warning message:
>> system call failed: Cannot allocate memory
>> 
>> Warmest thanks for your assistance,
>> with best regards,
>> Caroline.
>> 
>> 
>> 
>> On 01/11/2013, at 8:54 AM, Owen Jones wrote:
>> 
>>> Dear Caroline,
>>> 
>>> This is a possible bug in one of the sub-functions in basta that
>>> deals with the covariates. 
>>> 
>>> We're investigating and will get back to you shortly.
>>> 
>>> Best wishes,
>>> Owen
>>> 
>>> 
>>> 
>>> 
>>> On 1 Nov 2013, at 14:01, caroline <caroline.chong at anu.edu.au> wrote:
>>> 
>>>> Dear All,
>>>> 
>>>> I am running a simple Gompertz model with one categorical
>>>> covariate, species (coded as SPEC = a, b, c, ..., ah for
>>>> simplicity). After running DataCheck with autofix set to on, I
>>>> encounter the following error "in FUN(newX[,i]...)" - would you
>>>> have any experience with a similar situation, and could provide any
>>>> help or suggestions on how to interpret and trouble-shoot the
>>>> problem? I am unsure as to how to decipher where the error lies.
>>>> N.b. this data set seems to run ok when I do not include the
>>>> covariate matrix.
>>>> 
>>>> Very grateful for your help,
>>>> best regards
>>>> Caroline.
>>>> 
>>>> Caroline Chong
>>>> Postdoctoral Fellow
>>>> Research School of Biology
>>>> Australian National University, Canberra ACT
>>>> 
>>>> speciesout <- basta(object = inputMat2$newData, studyStart = 1,
>>>> studyEnd = 109, covarsStruct = "all.in.mort", nsim = 4, parallel =
>>>> TRUE, ncpus = 4, updateJumps = TRUE, niter = 100000)
>>>> 
>>>> No problems were detected with the data.
>>>> 
>>>> Error in FUN(newX[, i], ...) : invalid 'type' (character) of
>>>> argument