[Traminer-users] aligning sequences according to exogenous time variables
Matthias Studer
Matthias.Studer at unige.ch
Tue Aug 30 09:55:20 CEST 2011
Hi Simon,
Sorry for taking so long to answer your question. In this answer,
I reuse the code previously posted for truncating sequences to a
varying length (available here:
http://lists.r-forge.r-project.org/pipermail/traminer-users/2011-May/000070.html)
Below, I give two examples based on the mvad dataset to answer your
question. In the first one, we align the sequences according to a
random date specific to each individual. In the second one, we
compute the age at the first joblessness episode and use it to align
the sequences.
## loading TraMineR and the data
library(TraMineR)
data(mvad)
## Defining alphabet and so on
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school",
"training")
mvad.labels <- c("employment", "further education", "higher education",
"joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
## The idea is to transform the data before defining the sequence object
## Here we store the sequence data in the mvad.data object
mvad.data <- mvad[, 17:86]
## Here we generate a random integer to align the sequences
## The result is a vector of length 712 with integer values ranging
## from 1 to 20
## This should be the date you want to use to align your sequences
randomlength <- ceiling(runif(nrow(mvad))*20)
head(randomlength)
## One way is to use a "position" matrix, which stores, for each cell, the
## current position in the sequence (see the previous mail about truncating
## sequences)
positionindex <- matrix(1:70, nrow=nrow(mvad), ncol=70, byrow=TRUE)
head(positionindex)
## Using this position matrix, we can assign NA to every position
## that comes before the aligning date
mvad.data[positionindex < randomlength] <- NA
## Now, here is the trick. We delete the NA values appearing on the left
## using the option left="DEL"
mvad.seq <- seqdef(mvad.data, alphabet = mvad.alphabet, states = mvad.scodes,
    labels = mvad.labels, xtstep = 6, left = "DEL")
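## Check that each sequence now has length 71 - randomlength
all.equal(as.numeric(seqlength(mvad.seq)), 71 - randomlength)
## Plot the state distribution of the aligned sequences
seqdplot(mvad.seq)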
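The masking step above works because R stores matrices column by column, so the per-individual vector is recycled down each column and row i is always compared with element i. Here is a minimal base R sketch on toy data that also shows one way to turn a calendar year into a column position. The names toy, years, event.year and event.pos are made up for the illustration and the data are invented.

```r
## Toy data (hypothetical): two individuals observed yearly, 2000 to 2005
years <- 2000:2005
toy <- data.frame(matrix(c("A", "A", "B", "B", "B", "C",
                           "A", "A", "A", "B", "B", "C"),
                         nrow = 2, byrow = TRUE),
                  stringsAsFactors = FALSE)
names(toy) <- as.character(years)
## Hypothetical exogenous variable: calendar year of the aligning event
event.year <- c(2002, 2003)
## Convert the calendar year into a column position
event.pos <- match(event.year, years)
## Position matrix: every row is 1, 2, ..., 6
toy.index <- matrix(seq_along(years), nrow = nrow(toy),
                    ncol = length(years), byrow = TRUE)
## event.pos is recycled down the columns, so row i is compared
## with event.pos[i]
toy.mask <- toy.index < event.pos
toy.mask
## Set every position before the event to NA
toy[toy.mask] <- NA
toy
```

If the number of rows of the position matrix does not match the length of the aligning vector, the recycling no longer lines up with the rows, so it is worth checking both before masking.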
## Second example, we align the sequences with the age at first
## joblessness episode
### Start by retrieving this age
## Use a fresh data object
mvad.data <- mvad[, 17:86]
## Creating a vector containing by default NA values
age <- rep(NA, nrow(mvad.data))
state <- "joblessness"
## For each column
for (i in 1:ncol(mvad.data)) {
    ## Assign the current column index to individuals who enter the
    ## state "state" for the first time.
    ## To do that, search for individuals who did not experience the
    ## state before.
    ## Those individuals have NA entries in the age vector
    notexperienced <- is.na(age)
    ## Select the individuals who are now in the state "state"
    instate <- mvad.data[, i] == state
    ## Assign the current column index to individuals who meet both conditions
    age[notexperienced & instate] <- i
}
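As a side note, the same "position of the first occurrence" can be obtained without an explicit loop over columns. The following is only a sketch on invented toy data, using base R; toy.states, toy.state and toy.first are names made up for the illustration.

```r
## Toy monthly states for three individuals (hypothetical data)
toy.states <- matrix(c("school", "joblessness", "employment", "joblessness",
                       "school", "school", "employment", "employment",
                       "joblessness", "employment", "employment", "employment"),
                     nrow = 3, byrow = TRUE)
toy.state <- "joblessness"
## For each row, position of the first TRUE, or NA when the state
## never occurs in that row
toy.first <- apply(toy.states == toy.state, 1, function(x) match(TRUE, x))
toy.first
```

With a data frame of factors such as mvad.data, converting it first with as.matrix() gives a character matrix on which the same comparison can be made.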
## Individuals with NA values in age never experienced the state joblessness,
## so remove them.
mvad.dataJL <- mvad.data[!is.na(age), ]
ageJL <- age[!is.na(age)]
## Use the position matrix
positionindex <- matrix(1:70, nrow=nrow(mvad.dataJL), ncol=70, byrow=TRUE)
## Using this position matrix, we can assign NA to every position
## that comes before the aligning date
mvad.dataJL[positionindex < ageJL] <- NA
## Now, here is the trick. We delete the NA values appearing on the left
## using the option left="DEL"
mvad.seqJL <- seqdef(mvad.dataJL, alphabet = mvad.alphabet, states = mvad.scodes,
    labels = mvad.labels, xtstep = 6, left = "DEL")
seqdplot(mvad.seqJL)
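Note that in both examples the sequences start at the aligning date, so the positions before it are dropped. If the positions before the event should be kept (a T-2, T-1, T, T+1, ... layout), one possibility is to shift each row so that the event falls in the same column for everyone, padding with NA on both sides. The following is a minimal base R sketch under that assumption. align.on.event is a hypothetical helper written for this illustration, not a TraMineR function, and the toy data are invented.

```r
## Hypothetical helper (not part of TraMineR): shift each row so that
## the event position lands in the same column for all individuals
align.on.event <- function(states, event.pos) {
  states <- as.matrix(states)
  n.pos <- ncol(states)
  before <- max(event.pos) - 1      # columns needed before the event
  after <- n.pos - min(event.pos)   # columns needed after the event
  aligned <- matrix(NA_character_, nrow = nrow(states),
                    ncol = before + 1 + after)
  for (i in seq_len(nrow(states))) {
    ## Number of NA columns to put in front of row i
    offset <- before + 1 - event.pos[i]
    aligned[i, offset + seq_len(n.pos)] <- states[i, ]
  }
  colnames(aligned) <- c(if (before > 0) paste0("T-", before:1),
                         "T",
                         if (after > 0) paste0("T+", 1:after))
  aligned
}

## Toy data (hypothetical): events at positions 3 and 4
toy.seqs <- matrix(c("A", "A", "B", "B", "B", "C",
                     "A", "A", "A", "B", "B", "C"),
                   nrow = 2, byrow = TRUE)
toy.aligned <- align.on.event(toy.seqs, c(3, 4))
toy.aligned
```

The resulting matrix has NA on the left for individuals whose event comes early and on the right for those whose event comes late. How seqdef treats these NA values depends on its left and right arguments, so they should be set with this layout in mind (left="DEL" would undo the alignment).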
Hope this helps.
Matthias Studer
On 25.07.2011 18:42, Simon PAYE wrote:
> Dear colleagues,
> I come back to the TraMineR list for another technical question that might be of use for other career or time-use analysts.
> My purpose is to align all the sequences of my sequence object according to a date specific to each individual. The variable indicating the date is exogenous to the sequence. In my case, I would like to align careers according to the moment in which a specific turning point occurs (for example being given job tenure).
> Example: aligning two sequences according to the date of tenure award
> 2000 2001 2002 2003 2004 2005
> X A A B B B C
> Y A A B B
> (X was awarded tenure in 2002 and Y in 2003)
> I intend to end up with this:
> T-2 T-1 Ten. T+1 T+2 T+3
> X A A B B B C
> Y A A B B
> I have already aligned sequences manually in Excel, but it is a rather frustrating task. As I would like to do this aligning operation for various turning points in the career (e.g. parenthood, major publication, etc.), I hope that we can successfully develop a small loop in TraMineR or perhaps a new argument in the seqIplot function, something like:
> R> seqdef(mydata, var = 81:127, align = mydata$year.tenure)
>
> All the best and thank you again,
> Simon