[Basta-users] errors in inputMat and `colnames<-`(`*tmp*`, value = c("b0", "b1"))?

Caroline Chong caroline.chong at anu.edu.au
Mon Sep 23 04:10:46 CEST 2013


Hi Fernando, Owen,

Thanks again for your assistance in helping me through the veritable storm. I have for today two persisting problems I want to clarify with you and have attached my working datasets just to confirm that we have the same versions. I have checked that my input birthdeath matrix contains values from 0 - 109 (and in future datasets will aim to code from 0...n to avoid the negatives).

First, the obvious - I installed the fixed BaSTA version you sent last week and checked that I am running this version in R (I typically am using a Mac, but also tried running this in Windows and Linux to be sure).

I am encountering an error that possibly originates in the Y <- CensusToCaptHist step. Is it possible that the input dates in CaptHist are not being translated entirely? For example, for individual 2, I have four observation dates recorded in CaptHist:

        ID      ROBSDATES
1       1       2012-07-14
2       1       2012-08-04
3       1       2012-08-18
4       1       2012-09-04
5       2       2012-07-14
6       2       2012-08-04
7       2       2012-08-18
8       2       2012-09-04
9       3       2012-07-14

and for this individual 2, birthdate = 0 (no data) and deathdate = 68.

However, on running Y <- CensusToCaptHist, the output shows individual 2 with nine observation dates, in "years" 1, 22, 36, 53, 68, 86, 99, which is not represented in my input data. Would you be able to resolve this translation discrepancy between the input CaptHist and the output Y matrix? It results in many "errors" being returned when I run DataCheck.
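
One way to cross-check this is to compute the day index of each observation date by hand and compare it with the column labels of Y for the same ID. The base-R sketch below uses only the four dates listed above and assumes that day 1 is the earliest date in the census; that numbering is an assumption about timeInt = "D", not confirmed BaSTA behaviour. It also shows that IDs sorted as character strings place "10" ahead of "2", so it may be worth matching rows by the ID column rather than by position.

```r
# The four observation dates listed above for individual 2.
obs <- data.frame(
  ID = c(2, 2, 2, 2),
  ROBSDATES = as.Date(c("2012-07-14", "2012-08-04", "2012-08-18", "2012-09-04"))
)

# Assumption: day 1 corresponds to the earliest observation date.
origin <- min(obs$ROBSDATES)
day_index <- as.integer(obs$ROBSDATES - origin) + 1L
print(day_index)

# Made-up IDs: sorting them as text does not follow numeric order.
ids <- c(1, 2, 3, 10, 100)
char_order <- sort(as.character(ids))
print(char_order)
```

Under that assumption the four dates map to days 1, 22, 36 and 53, which are the first four values reported above. The remaining values would then have to come from dates not shown in the excerpt or from a different individual's row.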

Secondly, and hopefully as a minor check, I have a finalised covariate matrix containing four categorical covariates named spec, subg, clade and sect (file attached). If I wanted to include spec, subg and clade in the analysis, would

covars <- MakeCovMat(~spec + subg + clade, data = covarsRaw)
dat <- data.frame(birthDeath2, Y[, -1], covars[, -1])

be the correct code? And should I then be able to proceed to use MultiBaSTA (which I am eager to use)?

Again, am incredibly appreciative of your feedback and help.
with thanks,
best,
Caroline.

On 20/09/2013, at 7:00 PM, Fernando Colchero wrote:

Hi Caroline,

   I have to admit that your case was like the perfect storm for BaSTA! We've sorted out the issues with the dataset you sent us. Here are some ways of dealing with it:

   First, install the attached version of BaSTA which has several bug fixes that apply to your case. To install it just save it in a folder, say "C:/Documents/Temp/" and then run the following command on the R console:

install.packages("C:/Documents/Temp/BaSTA_1.9.1.tar.gz", type = "source")

   Then do the following:

  # 'path' is the folder holding the CSV files, written with a trailing "/"
  cv <- read.csv(sprintf("%sCaptHist.csv", path))

  rd <- cv$ROBSDATES
  rd <- as.Date(rd)

  Y <- CensusToCaptHist(ID = cv[, 1], d = rd, timeInt = "D")

  birthDeath <- read.csv(sprintf("%spenults_birthdeath.csv", path))
  birthDeath2 <- birthDeath
  birthDeath2[birthDeath != 0] <- birthDeath[birthDeath != 0] + 100
  covarsRaw <- read.csv(sprintf("%sfixed_covars.csv", path))
  covars <- MakeCovMat(~SPECIES + CLADE, data = covarsRaw)
  # Change the colnames of two of the covariates that overlap with another two covariates:
  colnames(covars)[c(21, 31)] <- c("SPECIESmyrrh01", "CLADEA201")
  dat <- data.frame(birthDeath2, Y[, -1], covars[, -1])

  dat2 <- DataCheck(dat, studyStart = 101, studyEnd = 209, autofix = rep(1, 7),
      silent = FALSE)
  out <- basta(dat2$newDat, studyStart = 101, studyEnd = 209, thetaStart = c(-10, 0.001))
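
The renaming step in the code above deals with covariate names that overlap. As a rough illustration only, and assuming "overlap" means that one name is the leading part of another, such names can be spotted in base R as follows. The helper find_overlaps and the example names are hypothetical, and this is not BaSTA's internal code.

```r
# Hypothetical helper: list every name that is a leading substring of another name.
find_overlaps <- function(nms) {
  hits <- lapply(nms, function(n) setdiff(nms[startsWith(nms, n)], n))
  names(hits) <- nms
  hits[lengths(hits) > 0]
}

# Made-up column names for illustration; substitute colnames(covars).
nms <- c("cladeA2", "cladeA20", "cladeB")
overlaps <- find_overlaps(nms)
print(overlaps)
```

Any name reported here could be matched by a pattern search for the shorter name, which is the kind of clash that renaming avoids.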


   I hope this really solves it. If not please let us know. Best,

   Fernando



Fernando Colchero
Assistant Professor
Department of Mathematics and Computer Science
Max-Planck Odense Center on the Biodemography of Aging

Tlf.               +45 65 50 23 24
Email           colchero at imada.sdu.dk
Web             www.sdu.dk/staff/colchero
Pers. web   www.colchero.com
Adr.              Campusvej 55, 5230, Odense, Dk

University of Southern Denmark

On 18 Sep 2013, at 16:54, Caroline Chong <caroline.chong at anu.edu.au> wrote:

Hi Fernando,
Thanks so much for your explanations, deciphering and help. I'll re-inspect my input files as well to see if I can simplify any parameters further. I look forward to keeping in touch about how this goes, and to your updates on any function fixes.
Thanks again,
best,
caroline.

On 18/09/2013, at 11:35 PM, "Fernando Colchero" <colchero at imada.sdu.dk> wrote:

Hi Caroline,

  Well, the latest recorded death year is 149 because individual 1465 in your "penults_birthdeath.csv" table has a death date of 149.

   The second problem, which is an issue with MakeCovMat() that we need to fix, is that if you specify the covariates just by their names, the function assumes that you want to model all the covariates with interactions. With the tables you gave me, that produces more than 2,000 covariates, which is one quarter of the total number of points in your dataset. To avoid this, you can use the following code:

covarsRaw <- read.csv("fixed_covars.csv")
covars <- MakeCovMat(~SPECIES + CLADE + POP, data = covarsRaw)

  In this way, you'll have only 45 covariates, which is still a lot, but it's better than 2,000...
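
  The difference between an additive formula and one with interactions can be seen with base R's model.matrix(), used here only as a stand-in to count columns; it is not MakeCovMat(), and the toy factor levels are made up.

```r
# Toy covariates (made up): 3 species crossed with 4 clades.
toy <- expand.grid(
  SPECIES = factor(c("s1", "s2", "s3")),
  CLADE   = factor(c("c1", "c2", "c3", "c4"))
)

additive    <- model.matrix(~ SPECIES + CLADE, data = toy)
interacting <- model.matrix(~ SPECIES * CLADE, data = toy)

n_additive    <- ncol(additive)     # intercept + 2 species + 3 clade columns
n_interacting <- ncol(interacting)  # the additive columns + 2 * 3 interaction columns
print(c(n_additive, n_interacting))
```

  With many levels per factor, the interaction terms multiply quickly, which is how a handful of categorical covariates can turn into thousands of columns.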

  Another problem is that your range of dates goes from -5 to 109, so when you specify a 0 in the death or birth dates, it's not clear which of the 0s are missing values and which are actual dates. This is not your fault and it's something that we need to fix in BaSTA. If you could add, say, 200 to the real dates, then a 0 would always mean a missing value.
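
  A minimal sketch of that shift in base R, on a made-up table. It assumes that every 0 in the toy data is a missing value and that only the birth and death columns should move; the column names and the offset variable are illustrative.

```r
# Made-up birth/death table in which 0 marks a missing date.
bd <- data.frame(ID = 1:3, birth = c(0, -5, 1), death = c(68, 0, 109))
offset <- 200  # the shift suggested above

shifted <- bd
for (col in c("birth", "death")) {
  known <- bd[[col]] != 0                       # assumption: 0 always means missing here
  shifted[[col]][known] <- bd[[col]][known] + offset
}
print(shifted)
```

  The ID column is left untouched, and the study start and end passed to DataCheck() and basta() would need the same offset.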

   There's still an issue when prepping the parameters that keeps the algorithm from moving. We're working on it. We'll get you an answer as soon as possible.

   Best,

   Fernando


On Sep 17, 2013, at 4:08 AM, Caroline Chong <caroline.chong at anu.edu.au> wrote:

Dear Fernando,

I have attempted to run using multiple categorical covariates (input covariates csv attached) but seem to have encountered a similar problem. ("Latest recorded death year" reports as 149 which is outside my range of observations).

Would you be able to suggest how to fix this in order to run MultiBaSTA? Also, may I enquire what the expected run time might be for model = "GO", nsim = 4, parallel = TRUE, ncpus = 4, updateJumps = TRUE, i.e. in the order of minutes, hours or days?

Greatest thanks for your assistance,
with best regards
Caroline.


cv <- read.csv("CaptHist.csv")
rd <- cv$ROBSDATES
class(rd)
sum(is.na(cv))
rd<-as.Date(rd)
Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
head(Y)
sum(is.na(cv))
birthDeath <- read.csv("penults_birthdeath.csv")
covar <- read.csv("~/fixeda_covars.csv")
covars <- MakeCovMat(x= c("SPECIES", "SUBGEN", "CLADE", "SECT", "LOCAT"), data = covar)

colnames(covars)[-1] <- letters[1:(ncol(covars) - 1)]
dat <- data.frame(birthDeath, Y[, -1], covars[, -1])
dat2 <- DataCheck(dat, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent = FALSE)

The following rows deaths occur before observations start:
[1] 550 689
These records have been removed from the Dataframe
The following rows have observations that occur after the year of death:
  [1]    2   20   22   41   42   ..........
Observations that post-date year of death have been removed.

The following rows have observations that occur before the year of birth:
  [1]  298  299  300  301  .........
[673] 1706 1707 1710 1711 1712 1715 1716 1717 1718
Observations that pre-date year of birth have been removed.

The following rows have a one in the recapture matrix in the birth year:
 [1]  14  25  36  47  58  80  91 102 113 114 125 136 147 158 169 191 213 224 225 236
[21] 247 258 269 280
*DataSummary*
- Number of individuals         =    1,718
- Number with known birth year  =    1,260
- Number with known death year  =     361
- Number with known birth
 AND death years                =      97

- Total number of detections
 in recapture matrix            =    8,789

- Earliest detection time       =       1
- Latest detection time         =     109
- Earliest recorded birth year  =       1
- Latest recorded birth year    =     108
- Earliest recorded death year  =      16
- Latest recorded death year    =     149

> source("/Users/caroline/BASTA/MultiBaSTA.r")
> multiOut <- MultiBaSTA(dat2$newDat, studyStart = 1, studyEnd = 109, nsim=4, parallel = TRUE, ncpus = 4, models = c("GO"), shape = "simple", covarStruct="all.in.mort", updateJumps = TRUE)

--------------------------
Run number 1, model: Go.Si
--------------------------
No problems were detected with the data.

Starting simulation to find jump sd's...

On 14/09/2013, at 2:24 PM, Fernando Colchero wrote:

Hi Caroline,

   In principle yes, but let us know if you find any problems.

  best,

  Fernando



On Sep 14, 2013, at 4:38 AM, Caroline Chong <caroline.chong at anu.edu.au> wrote:

Hi Fernando,

Oh, that's brilliant. Thanks for explaining the problem! I have managed to run using the temporary fix and will let you know if I encounter any problems as I try out different covariate combinations etc.
To confirm - should this code be ok to run regardless of the type or combination of covariates I select to run? e.g. multiple categorical covariates, or a mixture of integer and categorical covariates - I am intending to incorporate these into the analysis also.

Many thanks,
Caroline.

On 13/09/2013, at 11:01 PM, Fernando Colchero wrote:

Hi Caroline,

   I found the problem. The issue is not with the data but a bug in the code when assigning parameter names to the covariates. The names in your CLADE covariates conflicted with the way BaSTA processes the results and finds the parameters. We have to fix it but, for the time being, here's a temporary solution so you can run your analyses:

cv <- read.csv("CaptHist.csv")

rd <- cv$ROBSDATES
class(rd)
sum(is.na(cv))
rd<-as.Date(rd)
Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
head(Y)
sum(is.na(cv))

birthDeath <- read.csv("penults_birthdeath.csv")
covar <- read.csv("fixed_covars.csv")

covars <- MakeCovMat(x= "CLADE", data = covar)

# Here's the way to avoid the problem:
colnames(covars)[-1] <- letters[1:(ncol(covars) - 1)]
dat <- data.frame(birthDeath, Y[, -1], covars[, -1])

dat2 <- DataCheck(dat, studyStart = 1, studyEnd = 109, autofix = rep(1, 7),
                  silent = FALSE)
out <- basta(dat2$newDat, studyStart = 1, studyEnd = 109, updateJumps = FALSE)


  Let me know if this works. Best,

   Fernando

On 13 Sep 2013, at 12:08, Caroline Chong <caroline.chong at anu.edu.au> wrote:

Hi Fernando,
These errors are with the data attached here (and with my latest "errors in inputMat" email).

Thanks so much for taking a look!!

best,
Caroline.
"CaptHist.csv" = census matrix
"penults_birthdeath.csv" = birthDeath matrix
"fixed_covars.csv" = covariate matrix


On 13/09/2013, at 7:49 PM, Fernando Colchero wrote:

Hi Caroline,

   Are these errors with the data you sent me? If so, I'll run them myself and get back to you asap.

  Best,

  Fernando

On 13 Sep 2013, at 09:58, Caroline Chong <caroline.chong at anu.edu.au> wrote:

Dear BaSTA

Owen, and Fernando - thanks for your assistance with my previous birth-death coding issue (rowSums error)- happy to report that I was able to re-code this successfully and DataCheck now passes with no errors.

However I am running into the below two errors - would you be able to solve or decipher what the issue is? Firstly in the final compiled matrix (im2 = inputMat), every single observation now has a recorded Death observation whereas this is not the case in my input birthDeath matrix. I tried editing this via bd.na (below code) but this didn't work. Is there some possible issue when reading 0s in the birth and death columns?

##e.g. original births and deaths observation matrix
> head(birthDeath)
  ID birth death
1  1     0    68
2  2     0    68
3  3     0     0
4  4     1     0
5  5     1     0
6  6     1     0
## whereas final merged matrix below shows:
> head(im2)
ID birth death 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1     1     0    53 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
10    2     0    99 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
100   3     0    99 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
1000  4     8   103 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1001  5     8   103 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1002  6     8   103 0 0
> dc <- DataCheck(im2, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent=FALSE)
No problems were detected with the data.

*DataSummary*
- Number of individuals         =    1,720
- Number with known birth year  =    1,428
- Number with known death year  =    1,720
- Number with known birth
 AND death years                =    1,428

- Total number of detections
 in recapture matrix            =   10,339

- Earliest detection time       =       1
- Latest detection time         =     109
- Earliest recorded birth year  =       1
- Latest recorded birth year    =     107
- Earliest recorded death year  =       2
- Latest recorded death year    =     109


Secondly on running basta (run time to error 20mins) I get returned e.g.

> out <- basta(object = im2, studyStart = 1, studyEnd = 109)
No problems were detected with the data.

Starting simulation to find jump sd's...  done.

Simulation started...

Error in `colnames<-`(`*tmp*`, value = c("b0", "b1")) :
  length of 'dimnames' [2] not equal to array extent

It appears that the dimensions of the matrix are not 2. Is this correct? I am unsure how to interpret or fix this.
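
For reference, this message generally means that the number of names supplied (here two, "b0" and "b1") does not match the number of columns of the object being named, rather than that the object lacks two dimensions. A minimal base-R reproduction, unrelated to BaSTA's internals and using a made-up matrix:

```r
m <- matrix(0, nrow = 2, ncol = 3)   # a matrix with three columns

msg <- tryCatch(
  {
    colnames(m) <- c("b0", "b1")     # only two names supplied for three columns
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the object that basta() was trying to label most likely had a number of columns other than two at that point.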

looking forward to hearing back,
many thanks for your help,
best regards
Caroline.




cv <- read.csv("~/CaptHist.csv")
rd <- cv$ROBSDATES
class(rd)
sum(is.na(cv))
rd<-as.Date(rd)
Y <- CensusToCaptHist(ID = cv[,1], d=rd, timeInt="D")
head(Y)
sum(is.na(cv))

birthDeath <- read.delim("~/penults_birthdeath.csv", sep = ",", header = TRUE)

bd.na <- t( # the below returns a transposed matrix, so we have to re-transpose it back to normal
  apply( # foreach row (hence 1) in the birth dates matrix
    birthDeath, 1,
    function(r) { # apply by row this function (hence r)
      if(r[2] == 0) { # if birth [2] is 0
        r[2] <- NA # replace birth value with NA (R's missing data value)
      }
      if (r[3] == 0) { # if death [3] is 0
        r[3] <- NA # replace death value with NA (R's missing data value)
      }
      return(r) # return the whole row
    }
  ))

table(is.na(bd.na[,3]))
BD <- bd.na
head(BD)
covar <- read.delim("~/fixed_covars.csv", sep = ",", header = TRUE)
head(covar)
covMat <- MakeCovMat(x=c("CLADE"), data = covar)
days <- as.numeric(colnames(Y)[2:ncol(Y)])
y <- as.matrix(Y)
bd <- apply(y,1,function(r) min(as.numeric(days[as.logical(r[2:ncol(Y)])]))) -1
dd <- apply(y,1,function(r) max(as.numeric(days[as.logical(r[2:ncol(Y)])])))
inputMat <- as.data.frame(cbind(BD, Y[, -1], covMat[, -1]))
##inputMat <- merge(BD, Y, by.x = "ID", by.y = "ID")
##inputMat <- merge(inputMat, covMat, by.x = "ID", by.y = "ID")
dim(inputMat)
colnames(inputMat)
im2 <- inputMat
im2[,2] <- bd
im2[,3] <- dd
head(im2)
dc <- DataCheck(im2, studyStart = 1, studyEnd = 109, autofix = rep(1, 7), silent=FALSE)
names(dc)
head(inputMat)
# outMat <- dc$newData
out <- basta(object = im2, studyStart = 1, studyEnd = 109)
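
A possible explanation for every individual ending up with a death value: in the script above, im2[,3] <- dd overwrites the death column with each individual's last detection day, and im2[,2] <- bd overwrites the birth column with the first detection day minus one. The toy example below (made-up data, base R only) reproduces the effect on the death column.

```r
# Made-up capture history: detection days 1, 22 and 36 for three individuals.
days <- c(1, 22, 36)
Y <- data.frame(ID = 1:3,
                `1` = c(1, 1, 0), `22` = c(1, 0, 1), `36` = c(0, 1, 1),
                check.names = FALSE)

# Made-up birth/death table: only individual 1 has a known death (0 = unknown).
birthDeath <- data.frame(ID = 1:3, birth = c(0, 0, 1), death = c(68, 0, 0))

# Last detection day per individual, computed as in the script above.
last_seen <- apply(as.matrix(Y[, -1]), 1, function(r) max(days[as.logical(r)]))

im2 <- cbind(birthDeath, Y[, -1])
im2[, 3] <- last_seen                      # overwrites the death column

known_before <- sum(birthDeath$death != 0)
known_after  <- sum(im2$death != 0)
print(c(known_before, known_after))
```

After the overwrite, every individual has a non-zero death value, which would match a DataCheck summary in which the number with known death year equals the number of individuals.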

<penults_birthdeath.csv><fixed_covars.csv><CaptHist.csv>


<penults_birthdeath.csv><fixed_covars.csv><CaptHist.csv>




<fixeda_covars.csv><CaptHist.csv><penults_birthdeath.csv>



-------------- next part --------------
A non-text attachment was scrubbed...
Name: CaptHist.csv
Type: text/csv
Size: 170082 bytes
Desc: CaptHist.csv
URL: <http://lists.r-forge.r-project.org/pipermail/basta-users/attachments/20130923/bfaafcb9/attachment-0003.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fixedb_covars.csv
Type: text/csv
Size: 40121 bytes
Desc: fixedb_covars.csv
URL: <http://lists.r-forge.r-project.org/pipermail/basta-users/attachments/20130923/bfaafcb9/attachment-0004.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: penults_birthdeath.csv
Type: text/csv
Size: 15339 bytes
Desc: penults_birthdeath.csv
URL: <http://lists.r-forge.r-project.org/pipermail/basta-users/attachments/20130923/bfaafcb9/attachment-0005.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BaSTA_1.9.1.tar.gz
Type: application/x-gzip
Size: 1095963 bytes
Desc: BaSTA_1.9.1.tar.gz
URL: <http://lists.r-forge.r-project.org/pipermail/basta-users/attachments/20130923/bfaafcb9/attachment-0001.bin>

