[Traminer-users] quantitative explanatory variables?

Matthias Studer Matthias.Studer at unige.ch
Mon Oct 31 08:57:04 CET 2011


Dear Juan Zuluaga,

I agree with you. Our examples currently lack one that uses a 
quantitative covariate.

There are two ways to analyse the association between the sequences and 
a quantitative covariate. The first is to discretize the variable before 
using it. The second is to use the tree procedure, which automatically 
finds the best cut points by testing all possible binary splits. The 
tree procedure also works with ordinal covariates.

An example of both solutions is given below using the biofam dataset 
(Swiss family life sequences between ages 15 and 30).

## Loading TraMineR
library(TraMineR)
## Loading the biofam dataset
data(biofam)

## States labels
bf.labels <- c("Parent", "Left", "Married", "Left/Married",  "Child",
                 "Left/Child", "Left/Married/Child", "Divorced")
## States short labels for the sequences
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
## Building the sequence object
biofam.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bf.labels)
## Computing distances using optimal matching with transition-based
## substitution costs
biodist <- seqdist(biofam.seq, method="OM", sm="TRATE", indel=1)

## First solution : Use a discretized variable
## The "cut" function creates a factor using the given cutting points
biofam$cohort <- cut(biofam$birthyr, c(1900, 1930, 1940, 1950, 1960),
                     right=FALSE,
                     labels=c("1900-1929", "1930-1939", "1940-1949",
                              "1950-1959"))
## Compute the association with this new variable
da <- dissassoc(biodist, biofam$cohort, R=1000)
## Printing results
## Differences are highly significant
print(da)
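
When no natural cut points are known in advance, one option is to 
derive them from the data. The following is only a sketch of that idea 
using quartiles: it runs on synthetic birth years rather than on biofam, 
and the names birthyr, brks and cohort.q are placeholders chosen for 
this illustration.

```r
## Synthetic birth years (an assumption for illustration, not biofam)
set.seed(1)
birthyr <- sample(1909:1957, 200, replace = TRUE)

## Quartile break points; unique() guards against duplicated quantiles
brks <- unique(quantile(birthyr, probs = seq(0, 1, by = 0.25)))

## include.lowest=TRUE keeps the minimum value in the first class,
## dig.lab=4 prints the years in full in the labels
cohort.q <- cut(birthyr, breaks = brks, include.lowest = TRUE, dig.lab = 4)

## Class sizes are roughly balanced by construction
table(cohort.q)
```

The resulting factor can be passed to dissassoc() in the same way as 
biofam$cohort above. Quantile classes have balanced sizes, but their 
boundaries carry no substantive meaning, so hand-picked cut points may 
still be easier to interpret.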


## Second solution : Use the tree procedure
## It will automatically find the best binary splits
biotree <- seqtree(biofam.seq~birthyr, data=biofam, diss=biodist)

## Printing the tree
print(biotree)
## Displaying the tree (adjusting legend fontsize otherwise it's too big)
## You will need to install GraphViz for this
seqtreedisplay(biotree, type="d", legend.fontsize=2)
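
To make "testing all possible binary splits" more concrete, here is a 
rough base-R sketch of a single split search on a quantitative 
covariate. It is not the seqtree() code: the helper names diss.ss and 
best.split are made up for this illustration, the data are a toy 
example, and the sketch leaves out the significance testing and 
node-size constraints of the real procedure. It assumes the split 
criterion is the share of discrepancy explained (pseudo R2) computed 
from the dissimilarity matrix.

```r
## Discrepancy ("sum of squares") from a dissimilarity matrix:
## sum of pairwise dissimilarities divided by the number of cases
diss.ss <- function(d) sum(d[upper.tri(d)]) / nrow(d)

## Try every cut point on x and keep the one with the highest pseudo R2
best.split <- function(d, x) {
  ss.tot <- diss.ss(d)
  ## Candidate cut points: every observed value except the smallest
  cuts <- sort(unique(x))[-1]
  r2 <- sapply(cuts, function(ct) {
    left <- x < ct
    ss.w <- diss.ss(d[left, left, drop = FALSE]) +
            diss.ss(d[!left, !left, drop = FALSE])
    1 - ss.w / ss.tot
  })
  list(cut = cuts[which.max(r2)], r2 = max(r2))
}

## Toy data: the outcome shifts for birth years from 1935 onwards
x <- 1911:1950
y <- ifelse(x < 1935, 0, 5) + (x %% 3) / 10
d <- as.matrix(dist(y))

res <- best.split(d, x)
res$cut   ## cases with x < res$cut go to the left node
res$r2
```

Applying the same search again within each resulting node gives the 
recursive partition, which is why the tree can return several cut 
points for a single covariate.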


## Creating a new cohort covariate according to the splitting points
## found with the tree procedure
biofam$cohort2 <- cut(biofam$birthyr, c(1900, 1929, 1941, 1947, 1951, 1970),
                      right=FALSE,
                      labels=c("<=1928", "1929-1940", "1941-1946",
                               "1947-1950", "1951+"))

## Computing association with this new variable
da2 <- dissassoc(biodist, biofam$cohort2, R=1000)
## Printing results
## Pseudo R2 is slightly higher than before
print(da2)

Hope this helps.

Matthias Studer



On 30.10.2011 02:04, Zuluaga, Juan wrote:
>
> Hello Traminer people,
>
> I read your Sociological Methods and Research paper.  The McVicar and 
> Anyadike-Danes (2002) dataset that you used has categorical covariates.
>
> How do you deal with quantitative variates?
>
> Thank you!
>
> -juan zuluaga