[Basta-users] errors in inputMat and `colnames<-`(`*tmp*`, value = c("b0", "b1"))?

Fri Sep 20 11:00:27 CEST 2013

Hi Caroline,

   I have to admit that your case was like the perfect storm for BaSTA! We've sorted out the issues with the dataset you sent us. Here are some ways of dealing with it:

   First, install the attached version of BaSTA, which has several bug fixes that apply to your case. To install it, save it in a folder, say "C:/Documents/Temp/", and then run the following command in the R console:

install.packages("C:/Documents/Temp/BaSTA_1.9.1.tar.gz", type = "source")

   Then do the following: 

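  # `path` is the folder that holds the CSV files; it must end with a slash
  cv <- read.csv(sprintf("%sCaptHist.csv", path))

  # Convert the observation dates to Date objects
  rd <- cv$ROBSDATES
  rd <- as.Date(rd)

  # Build the capture-history matrix at daily resolution
  Y <- CensusToCaptHist(ID = cv[, 1], d = rd, timeInt = "D")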

  birthDeath <- read.csv(sprintf("%spenults_birthdeath.csv", path))
  birthDeath2 <- birthDeath
  # Add 100 to every non-zero entry so that a 0 can only mean a missing date
  birthDeath2[birthDeath != 0] <- birthDeath[birthDeath != 0] + 100
  covarsRaw <- read.csv(sprintf("%sfixed_covars.csv", path))
  covars <- MakeCovMat(~SPECIES + CLADE, data = covarsRaw)
  # Change the colnames of two of the covariates that overlap with another two covariates:
  colnames(covars)[c(21, 31)] <- c("SPECIESmyrrh01", "CLADEA201")
  dat <- data.frame(birthDeath2, Y[, -1], covars[, -1])

  dat2 <- DataCheck(dat, studyStart = 101, studyEnd = 209, autofix = rep(1, 7), 
      silent = FALSE)
  out <- basta(dat2$newDat, studyStart = 101, studyEnd = 209, thetaStart = c(-10, 0.001))
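
   A small side note on the birthDeath2 step above: as far as I can tell, because the ID values are non-zero, that line adds 100 to the ID column as well. Whether that matters depends on how the IDs are used later. Here is a minimal sketch, on a made-up toy table, of shifting only the birth and death columns. The column names are assumed to match the head(birthDeath) output quoted further down in this thread, and `offset` is just an illustrative name:

```r
# Toy table (invented values); columns assumed to be ID, birth, death
birthDeath <- data.frame(ID    = 1:4,
                         birth = c(0, -5, 1, 0),
                         death = c(68, 0, 0, 109))
offset <- 100  # illustrative name; same shift as in the code above

shifted <- birthDeath
for (col in c("birth", "death")) {
  # Treat 0 as "missing" and leave it alone; shift every other date
  known <- birthDeath[[col]] != 0
  shifted[[col]][known] <- birthDeath[[col]][known] + offset
}
print(shifted)
```

   With this variant the IDs stay as they were, and the study window still has to be shifted by the same amount (studyStart = 101, studyEnd = 209).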

   I hope this really solves it. If not, please let us know. Best,

   Fernando

Fernando Colchero
Assistant Professor
Department of Mathematics and Computer Science
Max-Planck Odense Center on the Biodemography of Aging

Tlf.               +45 65 50 23 24
Email           colchero at imada.sdu.dk
Web             www.sdu.dk/staff/colchero
Pers. web   www.colchero.com
Adr.              Campusvej 55, 5230, Odense, Dk

University of Southern Denmark

On 18 Sep 2013, at 16:54, Caroline Chong <caroline.chong at anu.edu.au> wrote:

> Hi Fernando,
> Thanks so much for your explanations, deciphering and help. I'll re-inspect my input files as well to see if I can simplify any parameters further. I look forward to keeping in touch about how this goes, and to your updates on any function fixes.
> Thanks again,
> best,
> caroline.
> 
> On 18/09/2013, at 11:35 PM, "Fernando Colchero" <colchero at imada.sdu.dk> wrote:
> 
>> Hi Caroline,
>> 
>>   Well, the latest recorded death year is 149 because individual 1465 in your "penults_birthdeath.csv" table has a death date of 149.
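
A quick way to spot such records, sketched on a toy table. Apart from the 1465 / 149 pair mentioned above, the values are made up and not taken from the attached file; the column names are assumed to follow the head(birthDeath) output later in this thread:

```r
# Toy birth/death table; only the last row mirrors the case described above
bdToy <- data.frame(ID    = c(1463, 1464, 1465),
                    birth = c(0, 12, 0),
                    death = c(0, 68, 149))
studyEnd <- 109

# Rows whose death date falls after the end of the study
lateDeaths <- bdToy[bdToy$death > studyEnd, ]
print(lateDeaths)
```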
>> 
>>    The second problem, which is an issue with MakeCovMat() that we need to fix, is that if you specify the covariates just by name, the function assumes that you want to model all of them with interactions. With the tables you gave me, that produces more than 2,000 covariates, which is one quarter of the total number of points in your dataset. To avoid this, you can use the following code:
>> 
>> covarsRaw <- read.csv("fixed_covars.csv")
>> covars <- MakeCovMat(~SPECIES + CLADE + POP, data = covarsRaw)
>> 
>>   This way you'll have only 45, which is still a lot, but it's better than 2,000...
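
To illustrate why the formula matters, here is a sketch that uses base R's model.matrix() as a stand-in. It does not show MakeCovMat() internals, and the toy factor levels are invented. Crossing every covariate multiplies the level counts, while the additive formula only adds them up:

```r
# Toy design: 3 species, 2 clades, 4 populations (all invented)
toy <- expand.grid(SPECIES = factor(c("s1", "s2", "s3")),
                   CLADE   = factor(c("a", "b")),
                   POP     = factor(c("p1", "p2", "p3", "p4")))

# Main effects only: intercept + (3 - 1) + (2 - 1) + (4 - 1) columns
additive <- model.matrix(~ SPECIES + CLADE + POP, data = toy)

# All interactions: one column per combination, 3 * 2 * 4
crossed <- model.matrix(~ SPECIES * CLADE * POP, data = toy)

print(c(additive = ncol(additive), crossed = ncol(crossed)))
```

With many levels per covariate, the crossed version grows very quickly, which is the pattern behind the jump from 45 to more than 2,000 columns described above.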
>> 
>>   Another problem is that your range of dates goes from -5 to 109, so when you specify a 0 in the death or birth dates, it's not clear which of the 0s are missing values and which are actual dates. This is not your fault; it's something that we need to fix in BaSTA. If you could add, say, 200 to the real dates, then a 0 would always mean a missing value.
>> 
>>    There's still an issue when prepping the parameters that stops the algorithm from moving. We're working on it. We'll get you an answer as soon as possible.
>> 
>>    Best,
>> 
>>    Fernando
>>   
>> 
>> On Sep 17, 2013, at 4:08 AM, Caroline Chong <caroline.chong at anu.edu.au> wrote:
>> 
>>> Dear Fernando,
>>> 
>>> I have attempted to run using multiple categorical covariates (input covariates csv attached) but seem to have encountered a similar problem. ("Latest recorded death year" reports as 149 which is outside my range of observations).
>>> 
>>> Would you be able to suggest how to fix this in order to run MultiBasta? Also, may I enquire what might be an expected run time for model = "GO", nsim = 4, parallel = TRUE, ncpus = 4, updateJumps = TRUE - i.e. in the order of minutes, hours (or days?)
>>> 
>>> Greatest thanks for your assistance,
>>> with best regards
>>> Caroline.
>>> 
>>> 
>>> cv <- read.csv("CaptHist.csv")
>>> rd <- cv$ROBSDATES
>>> class(rd)
>>> sum(is.na(cv))
>>> rd<-as.Date(rd)
>>> Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
>>> head(Y)
>>> sum(is.na(cv))
>>> birthDeath <- read.csv("penults_birthdeath.csv")
>>> covar <- read.csv("~/fixeda_covars.csv")
>>> covars <- MakeCovMat(x= c("SPECIES", "SUBGEN", "CLADE", "SECT", "LOCAT"), data = covar)
>>> 
>>> colnames(covars)[-1] <- letters[1:(ncol(covars) - 1)]
>>> dat <- data.frame(birthDeath, Y[, -1], covars[, -1])
>>> dat2 <- DataCheck(dat, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent = FALSE)
>>> 
>>> The following rows deaths occur before observations start:
>>> [1] 550 689
>>> These records have been removed from the Dataframe
>>> The following rows have observations that occur after the year of death:
>>>   [1]    2   20   22   41   42   ..........
>>> Observations that post-date year of death have been removed.
>>> 
>>> The following rows have observations that occur before the year of birth:
>>>   [1]  298  299  300  301  .........
>>> [673] 1706 1707 1710 1711 1712 1715 1716 1717 1718
>>> Observations that pre-date year of birth have been removed.
>>> 
>>> The following rows have a one in the recapture matrix in the birth year:
>>>  [1]  14  25  36  47  58  80  91 102 113 114 125 136 147 158 169 191 213 224 225 236
>>> [21] 247 258 269 280
>>> *DataSummary*
>>> - Number of individuals         =    1,718 
>>> - Number with known birth year  =    1,260 
>>> - Number with known death year  =     361 
>>> - Number with known birth
>>>  AND death years                =      97 
>>> 
>>> - Total number of detections
>>>  in recapture matrix            =    8,789 
>>> 
>>> - Earliest detection time       =       1 
>>> - Latest detection time         =     109 
>>> - Earliest recorded birth year  =       1 
>>> - Latest recorded birth year    =     108 
>>> - Earliest recorded death year  =      16 
>>> - Latest recorded death year    =     149 
>>> 
>>> > source("/Users/caroline/BASTA/MultiBaSTA.r")
>>> > multiOut <- MultiBaSTA(dat2$newDat, studyStart = 1, studyEnd = 109, nsim=4, parallel = TRUE, ncpus = 4, models = c("GO"), shape = "simple", covarStruct="all.in.mort", updateJumps = TRUE)
>>> 
>>> --------------------------
>>> Run number 1, model: Go.Si
>>> --------------------------
>>> No problems were detected with the data.
>>> 
>>> Starting simulation to find jump sd's... 
>>> 
>>> On 14/09/2013, at 2:24 PM, Fernando Colchero wrote:
>>> 
>>>> Hi Caroline,
>>>> 
>>>>    In principle yes, but let us know if you find any problems.
>>>> 
>>>>   best,
>>>> 
>>>>   Fernando
>>>> 
>>>> 
>>>> 
>>>> On Sep 14, 2013, at 4:38 AM, Caroline Chong <caroline.chong at anu.edu.au> wrote:
>>>> 
>>>>> Hi Fernando,
>>>>> 
>>>>> Oh, that's brilliant. Thanks for explaining the problem! I have managed to run using the temporary fix and will let you know if I encounter any problems as I try out different covariate combinations etc.
>>>>> To confirm - should this code be ok to run regardless of the type or combination of covariates I select to run? e.g. multiple categorical covariates, or a mixture of integer and categorical covariates - I am intending to incorporate these into the analysis also.
>>>>> 
>>>>> Many thanks,
>>>>> Caroline.
>>>>> 
>>>>> On 13/09/2013, at 11:01 PM, Fernando Colchero wrote:
>>>>> 
>>>>>> Hi Caroline,
>>>>>> 
>>>>>>    I found the problem. The issue is not with the data but a bug in the code when assigning parameter names to the covariates. The names in your CLADE covariates conflicted with the way BaSTA processes the results and finds the parameters. We have to fix it but, for the time being, here's a temporary solution so you can run your analyses:
>>>>>> 
>>>>>> cv <- read.csv("CaptHist.csv")
>>>>>> 
>>>>>> rd <- cv$ROBSDATES
>>>>>> class(rd)
>>>>>> sum(is.na(cv))
>>>>>> rd<-as.Date(rd)
>>>>>> Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
>>>>>> head(Y)
>>>>>> sum(is.na(cv))
>>>>>> 
>>>>>> birthDeath <- read.csv("penults_birthdeath.csv")
>>>>>> covar <- read.csv("fixed_covars.csv")
>>>>>> 
>>>>>> covars <- MakeCovMat(x= "CLADE", data = covar)
>>>>>> 
>>>>>> # Here's the way to avoid the problem:
>>>>>> colnames(covars)[-1] <- letters[1:(ncol(covars) - 1)]
>>>>>> dat <- data.frame(birthDeath, Y[, -1], covars[, -1])
>>>>>> 
>>>>>> dat2 <- DataCheck(dat, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), 
>>>>>>                   silent = FALSE)
>>>>>> out <- basta(dat2$newDat, studyStart = 1, studyEnd = 109, updateJumps = FALSE)
>>>>>> 
>>>>>> 
>>>>>>   Let me know if this works. Best,
>>>>>> 
>>>>>>    Fernando
>>>>>> 
>>>>>> On 13 Sep 2013, at 12:08, Caroline Chong <caroline.chong at anu.edu.au> wrote:
>>>>>> 
>>>>>>> Hi Fernando,
>>>>>>> These errors are with the data attached here (and with my latest "errors in inputMat" email).
>>>>>>> 
>>>>>>> Thanks so much for taking a look!!
>>>>>>> 
>>>>>>> best,
>>>>>>> Caroline.
>>>>>>> "CaptHist.csv" = census matrix
>>>>>>> "penults_birthdeath.csv" = birthDeath matrix
>>>>>>> "fixed_covars.csv" = covariate matrix
>>>>>>> 
>>>>>>> 
>>>>>>> On 13/09/2013, at 7:49 PM, Fernando Colchero wrote:
>>>>>>> 
>>>>>>>> Hi Caroline,
>>>>>>>> 
>>>>>>>>    Are these errors with the data you sent me? If so, I'll run them myself and get back to you asap.
>>>>>>>> 
>>>>>>>>   Best,
>>>>>>>> 
>>>>>>>>   Fernando
>>>>>>>> 
>>>>>>>> On 13 Sep 2013, at 09:58, Caroline Chong <caroline.chong at anu.edu.au> wrote:
>>>>>>>> 
>>>>>>>>> Dear BaSTA
>>>>>>>>> 
>>>>>>>>> Owen, and Fernando - thanks for your assistance with my previous birth-death coding issue (rowSums error)- happy to report that I was able to re-code this successfully and DataCheck now passes with no errors.
>>>>>>>>> 
>>>>>>>>> However I am running into the below two errors - would you be able to solve or decipher what the issue is? Firstly in the final compiled matrix (im2 = inputMat), every single observation now has a recorded Death observation whereas this is not the case in my input birthDeath matrix. I tried editing this via bd.na (below code) but this didn't work. Is there some possible issue when reading 0s in the birth and death columns?
>>>>>>>>> 
>>>>>>>>> ##e.g. original births and deaths observation matrix
>>>>>>>>> > head(birthDeath)
>>>>>>>>>   ID birth death
>>>>>>>>> 1  1     0    68
>>>>>>>>> 2  2     0    68
>>>>>>>>> 3  3     0     0
>>>>>>>>> 4  4     1     0
>>>>>>>>> 5  5     1     0
>>>>>>>>> 6  6     1     0
>>>>>>>>> ## whereas final merged matrix below shows:
>>>>>>>>> > head(im2)
>>>>>>>>> ID birth death 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
>>>>>>>>> 1     1     0    53 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
>>>>>>>>> 10    2     0    99 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
>>>>>>>>> 100   3     0    99 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
>>>>>>>>> 1000  4     8   103 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>>>>>>>>> 1001  5     8   103 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>>>>>>>>> 1002  6     8   103 0 0 
>>>>>>>>> > dc <- DataCheck(im2, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent=FALSE)
>>>>>>>>> No problems were detected with the data.
>>>>>>>>> 
>>>>>>>>> *DataSummary*
>>>>>>>>> - Number of individuals         =    1,720 
>>>>>>>>> - Number with known birth year  =    1,428 
>>>>>>>>> - Number with known death year  =    1,720 
>>>>>>>>> - Number with known birth
>>>>>>>>>  AND death years                =    1,428 
>>>>>>>>> 
>>>>>>>>> - Total number of detections
>>>>>>>>>  in recapture matrix            =   10,339 
>>>>>>>>> 
>>>>>>>>> - Earliest detection time       =       1 
>>>>>>>>> - Latest detection time         =     109 
>>>>>>>>> - Earliest recorded birth year  =       1 
>>>>>>>>> - Latest recorded birth year    =     107 
>>>>>>>>> - Earliest recorded death year  =       2 
>>>>>>>>> - Latest recorded death year    =     109 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Secondly on running basta (run time to error 20mins) I get returned e.g.
>>>>>>>>> 
>>>>>>>>> > out <- basta(object = im2, studyStart = 1, studyEnd = 109)
>>>>>>>>> No problems were detected with the data.
>>>>>>>>> 
>>>>>>>>> Starting simulation to find jump sd's...  done.
>>>>>>>>> 
>>>>>>>>> Simulation started...
>>>>>>>>> 
>>>>>>>>> Error in `colnames<-`(`*tmp*`, value = c("b0", "b1")) : 
>>>>>>>>>   length of 'dimnames' [2] not equal to array extent
>>>>>>>>> 
>>>>>>>>>  It appears that the dimensions of the matrix are not 2 - is this correct? I am unsure how to interpret or fix this.
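
For what it's worth, R raises this message whenever the number of column names supplied differs from the number of columns in the matrix; the "[2]" refers to the column names, not to the matrix having two dimensions. A minimal base R reproduction on a toy matrix (not BaSTA's internal object, which is not shown in this thread):

```r
# A matrix with three columns...
m <- matrix(0, nrow = 2, ncol = 3)

# ...given only two column names, which triggers the same error
msg <- tryCatch({
  colnames(m) <- c("b0", "b1")
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```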
>>>>>>>>> 
>>>>>>>>> looking forward to hearing back,
>>>>>>>>> many thanks for your help,
>>>>>>>>> best regards
>>>>>>>>> Caroline.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> cv <- read.csv("~/CaptHist.csv")
>>>>>>>>> rd <- cv$ROBSDATES
>>>>>>>>> class(rd)
>>>>>>>>> sum(is.na(cv))
>>>>>>>>> rd<-as.Date(rd)
>>>>>>>>> Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
>>>>>>>>> head(Y)
>>>>>>>>> sum(is.na(cv))
>>>>>>>>> 
>>>>>>>>> birthDeath <- read.delim("~/penults_birthdeath.csv", sep=",", header=T)
>>>>>>>>> 
>>>>>>>>> bd.na <- t( # the below returns a transposed matrix, so we have to re-transpose it back to normal
>>>>>>>>>   apply( # foreach row (hence 1) in the birth dates matrix
>>>>>>>>>     birthDeath, 1,
>>>>>>>>>     function(r) { # apply by row this function (hence r)
>>>>>>>>>       if(r[2] == 0) { # if birth [2] is 0
>>>>>>>>>         r[2] <- NA # replace birth value with NA (R's missing data value)
>>>>>>>>>       }
>>>>>>>>>       if (r[3] == 0) { # if death [3] is 0
>>>>>>>>>         r[3] <- NA # replace death value with NA (R's missing data value)
>>>>>>>>>       }
>>>>>>>>>       return(r) # return the whole row
>>>>>>>>>     }
>>>>>>>>>   ))
>>>>>>>>> 
>>>>>>>>> table(is.na(bd.na[,3]))
>>>>>>>>> BD <- bd.na
>>>>>>>>> head(BD)
>>>>>>>>> covar <- read.delim("~/fixed_covars.csv", sep=",", header=T)
>>>>>>>>> head(covar)
>>>>>>>>> covMat <- MakeCovMat(x=c("CLADE"), data = covar)
>>>>>>>>> days <- as.numeric(colnames(Y)[2:ncol(Y)])
>>>>>>>>> y <- as.matrix(Y)
>>>>>>>>> bd <- apply(y,1,function(r) min(as.numeric(days[as.logical(r[2:ncol(Y)])]))) -1
>>>>>>>>> dd <- apply(y,1,function(r) max(as.numeric(days[as.logical(r[2:ncol(Y)])])))
>>>>>>>>> inputMat <- as.data.frame(cbind(BD, Y[, -1], covMat[, -1]))
>>>>>>>>> ##inputMat <- merge(BD, Y, by.x = "ID", by.y = "ID")
>>>>>>>>> ##inputMat <- merge(inputMat, covMat, by.x = "ID", by.y = "ID")
>>>>>>>>> dim(inputMat)
>>>>>>>>> colnames(inputMat)
>>>>>>>>> im2 <- inputMat
>>>>>>>>> im2[,2] <- bd
>>>>>>>>> im2[,3] <- dd
>>>>>>>>> head(im2)
>>>>>>>>> dc <- DataCheck(im2, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent=FALSE)
>>>>>>>>> names(dc)
>>>>>>>>> head(inputMat)
>>>>>>>>> # outMat <- dc$newData
>>>>>>>>> out <- basta(object = im2, studyStart = 1, studyEnd = 109)
>>>>>>>>> 
>>>>>>>>> <penults_birthdeath.csv><fixed_covars.csv><CaptHist.csv>
>>>>>>>> 
>>>>>>> 
>>>>>>> <penults_birthdeath.csv><fixed_covars.csv><CaptHist.csv>
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> <fixeda_covars.csv><CaptHist.csv><penults_birthdeath.csv>
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BaSTA_1.9.1.tar.gz
Type: application/x-gzip
Size: 1095963 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/basta-users/attachments/20130920/ff3791c6/attachment-0001.bin>