[Traminer-users] Truncating sequences to a varying length - SOLVED
Simon PAYE
simon.paye at sciences-po.fr
Fri May 27 16:46:52 CEST 2011
Dear Matthias,
Thank you very much for your help, which is extremely valuable.
I implemented the first solution and, instead of a random variable, used a date that I have in my dataset. For those interested, I have attached a copy of the protocol I used.
Thank you again,
Simon
Matthias Studer wrote:
Hi Simon,
I am answering in English, as other users may be interested in your question. There are several ways to truncate sequences to a varying length.
Suppose we are working with the mvad data set and would like to truncate each sequence to a random, varying length.
## Creating the mvad sequence
library(TraMineR)
data(mvad)
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school",
"training")
mvad.labels <- c("employment", "further education", "higher education",
"joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes,
labels = mvad.labels, xtstep = 6)
## Here we generate a random integer at which to cut each sequence
## The result is a vector of length 712 (one value per sequence) with integer values ranging from 10 to 69
## In practice, this should be the length to which you would like to truncate each sequence
randomlength <- as.integer(runif(nrow(mvad))*60)+10
head(randomlength)
## One way is to use a "position" matrix, which stores, for each cell, the current position in the sequence
positionindex <- matrix(1:70, nrow=nrow(mvad), ncol=70, byrow=TRUE)
head(positionindex)
## Using this position matrix, we can assign the "void" attribute (i.e. end of sequence) to every position
## that is greater than the truncation point
mvad.seq[positionindex > randomlength] <- attr(mvad.seq, "void")
## Checking the results
all.equal(as.numeric(seqlength(mvad.seq)), randomlength)
## Plotting result
seqiplot(mvad.seq)
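
The comparison positionindex > randomlength works row by row because R stores matrices column by column and recycles the shorter vector: when the vector has one entry per row, cell [i, j] ends up compared with entry i. The following is a minimal base R sketch of that mechanism on a toy character matrix. It does not use TraMineR; the names toy, len, pos and mask are illustrative, and NA stands in for the "void" state as an assumption.

```r
## Toy data: 3 sequences of 5 positions (a plain character matrix, not a TraMineR object)
toy <- matrix(rep(c("a", "b", "c"), each = 5), nrow = 3, ncol = 5, byrow = TRUE)

## Desired length of each row (hypothetical values, one per sequence)
len <- c(2, 5, 0)

## Position matrix: every row holds 1, 2, ..., 5
pos <- matrix(1:5, nrow = 3, ncol = 5, byrow = TRUE)

## len is recycled down the columns, so cell [i, j] is compared with len[i]
mask <- pos > len

## NA plays the role of the "void" state in this sketch
toy[mask] <- NA

## Number of positions kept in each row
kept <- rowSums(!is.na(toy))
print(mask)
print(kept)
```

Note that this relies on the vector having exactly one entry per row of the matrix; a vector of any other length would be recycled differently and would not line up with the rows.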
## Another way would be to use a loop
## This may take much longer to compute
## Personally, I prefer the previous solution
mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes,
labels = mvad.labels, xtstep = 6)
## For each sequence
for (i in seq_along(randomlength)) {
  ## From the position after the cut until the end, assign the "void" element
  ## We add one to randomlength so that the cut falls after position randomlength[i]
  ## Note: this assumes randomlength[i] < ncol(mvad.seq), which holds here (the maximum is 69)
  mvad.seq[i, (randomlength[i]+1):ncol(mvad.seq)] <- attr(mvad.seq, "void")
}
seqiplot(mvad.seq)
## Checking the results
all.equal(as.numeric(seqlength(mvad.seq)), randomlength)
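
One caveat with the loop version: an index of the form (len + 1):ncol counts backwards when a sequence is to be kept at its full length, because len + 1 then exceeds the number of columns. This cannot happen with the random lengths above, but it may with real data. Below is a minimal base R sketch of a guarded loop on a plain matrix. The helper truncate_rows is hypothetical (it is not a TraMineR function), and NA again stands in for the "void" state.

```r
## Hypothetical helper (not part of TraMineR): blank out the positions after len[i] in row i
truncate_rows <- function(m, len, fill = NA) {
  stopifnot(length(len) == nrow(m), all(len >= 0), all(len <= ncol(m)))
  for (i in seq_len(nrow(m))) {
    ## Only truncate when there is something left to cut
    if (len[i] < ncol(m)) {
      m[i, seq(len[i] + 1, ncol(m))] <- fill
    }
  }
  m
}

m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
out <- truncate_rows(m, len = c(1, 4, 0))
print(out)

## Without the guard, the index for a full-length row (len = 4, ncol = 4) runs backwards
bad_index <- (4 + 1):4
print(bad_index)
```

The position-matrix approach does not need such a guard, since a comparison against the full length simply selects no cells.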
Hope this helps.
Matthias Studer
On 19.05.2011 at 16:17, Simon PAYE wrote:
Dear colleagues,
I am dropping into your list with a question that has been on my mind for some time and that may interest many analysts of sequences of varying durations (careers in particular).
I have a database of about a hundred academic careers coded in STS format, ranging in length from 5 to 47 years.
For my analyses, I need to align them to the right (external time reference, or calendar time axis) or to the left (internal time reference, or process time axis). So far there is no problem, because I can switch from one model to the other using the left and right options of seqdef.
Things get complicated when I want to truncate the sequences at a historical date that varies across individuals.
In my case, this is the year in which tenure (permanent employment) was obtained, which I recorded in a variable called year.tenure. If I want, for example, to analyze the sequences leading up to tenure by right-aligning them all on the year tenure was obtained, how should I proceed?
I did not find a solution in the user's guide, nor in the other documents available on the TraMineR website.
Thank you for your answer and for everything you have done so far,
Simon
--
Simon Paye
PhD candidate in sociology
Centre de Sociologie des Organisations - Sciences Po Paris
Tel: 0148741267
-------------- next part --------------
> # Truncating sequences according to a historical date that varies across individuals. Thanks to M. Studer for his advice.
> setwd("E:/R/0 bases de données")
> library(TraMineR)
> donnees <- read.table("E:/R/0 bases de données/DB generic 09 (NA).csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> Gsubset <- subset(donnees, career.type=="G")
> # eliminate people who have not (yet) reached tenure:
> Gsubset2 <- subset(Gsubset, year.perm>1000) #all the "NA's" are eliminated
> #*****TEST WITH THE FIRST SOLUTION
> G.seq <- seqdef(Gsubset2, var = 83:129)
> cut.year <- Gsubset2$year.perm-Gsubset2$clock.0 # this variable defines the column in which the sequence should be truncated
> cut.year # OK: every row has an integer specifying where to truncate the sequence
[1] 20 14 10 20 2 1 6 5 19 25 1 1 11 11 5 6 4 31 5 5 8 2 9 4 7 7 2 0 5 10 6 3 4 8 8 28 2 2 3 5 4 5 5 4 0 14 3 22 0 12 4 2 5 0 14 2
[57] 1 1 4 0 2 3 14 0 0 16 0 0 27 1 0 4 4 0 4 8 0 4 9 0 4 0 0 3 3 1 12 1 7 0 4 3 4 1 8 16 9 0 5 2 7 0 11 0 5 13 2 23 5 3 5 3
[113] 6 0 0 3 2 15 5 2 1 10
> positionindex <- matrix(1:47, nrow=nrow(G.seq), ncol=ncol(G.seq), byrow=TRUE)
> # We now use a "position" matrix, which stores, for each cell the current position in the sequence
> head(positionindex)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29]
[1,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[2,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[3,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[4,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[5,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[6,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
     [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47]
[1,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[2,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[3,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[4,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[5,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[6,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
> G.seq[positionindex > cut.year] <- attr(G.seq, "void")
> all.equal(as.numeric(seqlength(G.seq)), cut.year)
[1] TRUE
> seqiplot(G.seq)
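
The protocol above truncates each career at the tenure year. The original question also asked about right-aligning the truncated sequences on that year. A minimal base R sketch of one possible way to do this on a plain state matrix follows. The helper right_align and the toy matrix careers are hypothetical, the sketch assumes that truncated positions are coded as NA, and it assumes there are no gaps inside a sequence (internal NAs would be squeezed out).

```r
## Hypothetical helper (not part of TraMineR): push the valid states of each row to the right
right_align <- function(m) {
  out <- matrix(NA, nrow = nrow(m), ncol = ncol(m))
  for (i in seq_len(nrow(m))) {
    valid <- m[i, !is.na(m[i, ])]
    n <- length(valid)
    ## Rows without any valid state stay entirely NA
    if (n > 0) {
      out[i, (ncol(m) - n + 1):ncol(m)] <- valid
    }
  }
  out
}

## Toy careers: "P" = pre-tenure year, "T" = tenure year, NA = truncated
careers <- matrix(c("P", "P", "T", NA,  NA,
                    "P", "T", NA,  NA,  NA,
                    "P", "P", "P", "P", "T"),
                  nrow = 3, byrow = TRUE)

aligned <- right_align(careers)
print(aligned)
```

After this step the last column holds the tenure year for every career, so the columns can be read as "years before tenure". Whether the resulting matrix can be passed directly to seqdef, and with which missing-value options, has not been checked here.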