[Traminer-users] Truncating sequences to a varying length - SOLVED

Fri May 27 16:46:52 CEST 2011

  Dear Matthias,
Thank you very much for your help, which is extremely valuable.
I implemented the first solution, using a date from my dataset instead of a random variable. For those interested, I have attached a copy of the protocol I used.

Thank you again,
Simon

Matthias Studer wrote:

    Hi Simon,

    I am answering in English, as other users may be interested in your question. There are several ways to truncate sequences to a varying length.

    Suppose we are working with the mvad data set and would like to truncate each sequence to a random, varying length.

    ## Creating the mvad sequence
    library(TraMineR)
    data(mvad)

    mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school",
        "training")
    mvad.labels <- c("employment", "further education", "higher education",
        "joblessness", "school", "training")
    mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
    mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes,
        labels = mvad.labels, xtstep = 6)

    ## Here we generate a random integer for each sequence, to be used as the cut point.
    ## The result is a vector of length 712 with integer values ranging from 10 to 69.
    ## With your own data, use instead the length at which each sequence should be truncated.
    randomlength <- as.integer(runif(nrow(mvad)) * 60) + 10
    head(randomlength)

    ## One way is to use a "position" matrix, which stores, for each cell, its position in the sequence
    positionindex <- matrix(1:70, nrow = nrow(mvad), ncol = 70, byrow = TRUE)

    head(positionindex)

    ## Using this position matrix, we can assign the "void" attribute (i.e. end of sequence)
    ## to every position that is greater than the truncation length
    mvad.seq[positionindex > randomlength] <- attr(mvad.seq, "void")

    ## Checking the results
    all.equal(as.numeric(seqlength(mvad.seq)), randomlength)

    ## Plotting result
    seqiplot(mvad.seq)
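
The comparison `positionindex > randomlength` works because R stores matrices column by column and recycles the shorter vector, so a vector with one value per row lines up with the rows. The sketch below illustrates this in base R only, with a plain character matrix standing in for the sequence object; the names `pos`, `cutlen`, `toy` and `kept` are made up for the illustration and are not part of TraMineR.

```r
# Toy example in base R (no TraMineR needed): 3 sequences of 5 positions.
n <- 3
len <- 5
cutlen <- c(2, 5, 0)  # hypothetical truncation lengths, one per row

# Each row of the position matrix holds 1, 2, ..., len
pos <- matrix(seq_len(len), nrow = n, ncol = len, byrow = TRUE)

# Matrices are stored column by column, so a vector of length nrow(pos)
# is recycled down each column: cutlen[i] is always compared with row i
mask <- pos > cutlen

# Apply the mask to a plain character matrix standing in for the sequences
toy <- matrix("A", nrow = n, ncol = len)
toy[mask] <- NA

# Number of cells kept in each row; it should match cutlen
kept <- rowSums(!is.na(toy))
```

Note that this alignment relies on the vector having exactly one entry per row; a vector with one entry per column would need a different comparison.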

    ## Another way would be to use a loop.
    ## This may take much longer to compute.
    ## Personally, I prefer the previous solution.

    mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes,
        labels = mvad.labels, xtstep = 6)

    ## For each sequence
    for (i in seq_along(randomlength)) {
        ## From the position after the cut until the end, assign the "void" element.
        ## We add one to randomlength so that the cut falls after position randomlength[i].
        mvad.seq[i, (randomlength[i] + 1):ncol(mvad.seq)] <- attr(mvad.seq, "void")
    }

    seqiplot(mvad.seq)
    ## Checking the results
    all.equal(as.numeric(seqlength(mvad.seq)), randomlength)
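
One caveat when adapting the loop to other data: in R, `(k + 1):n` counts downwards when `k` equals `n`, instead of giving an empty range. In the mvad example the cut points never exceed 69 out of 70 positions, so the issue does not arise there, but it could if a cut point equals the full sequence length. The base R sketch below shows the behaviour and one possible guard; `after_cut` is a hypothetical helper written for this illustration, not a TraMineR function, and the character matrix `toy` only stands in for a sequence object.

```r
n_col <- 5
k <- 5                    # hypothetical cut point equal to the sequence length

bad <- (k + 1):n_col      # counts downwards (6, 5) instead of being empty

# Hypothetical helper: positions strictly after k, empty when k >= n_col
after_cut <- function(k, n_col) {
  if (k >= n_col) integer(0) else seq.int(k + 1L, n_col)
}

# Loop-based truncation on a plain character matrix
toy <- matrix("A", nrow = 2, ncol = n_col)
cuts <- c(2, 5)           # the second row should be left untouched
for (i in seq_along(cuts)) {
  toy[i, after_cut(cuts[i], n_col)] <- NA
}
kept <- rowSums(!is.na(toy))
```

The position-matrix approach does not need such a guard, since a cut point equal to the sequence length simply produces a row with no cell selected.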

    Hope this helps.

    Matthias Studer

    On 19.05.2011 16:17, Simon PAYE wrote:
Dear colleagues,

I am breaking into your list with a question that has been on my mind for some time and that may interest many analysts working with sequences of varying duration (careers in particular).

I have a database of about a hundred academic careers coded in STS format, ranging in length from 5 to 47 years.
For my analyses, I need to align them to the right ('external time reference' or 'calendar time axis') or to the left ('internal time reference' or 'process time axis'). So far, no problem, since I can switch from one model to the other using the 'left' and 'right' options of seqdef.

Things get complicated when I want to truncate the sequences at a historical date that varies across individuals.
In my case, it is the year in which tenure (permanent employment) was obtained, which I recorded in a variable called 'year.tenure'. If, for example, I want to analyze the sequences leading up to tenure by aligning them all to the right on the year of tenure, how should I proceed?

I did not find a solution in the user's guide or in the other documents available on the TraMineR website.

Thank you for your answer and for everything you have done so far,

Simon

      -- 
      Simon Paye
      PhD candidate in sociology
      Centre de Sociologie des Organisations - Sciences Po Paris

      Tel: 0148741267
-------------- next part --------------
> # Truncating sequences according to a historical date that varies across individuals. Thanks to M. Studer for his advice.

> setwd("E:/R/0 bases de données")

> library(TraMineR)

> donnees <- read.table("E:/R/0 bases de données/DB generic 09 (NA).csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)

> Gsubset <- subset(donnees, career.type=="G")

> # eliminate people who have not (yet) reached tenure:

> Gsubset2 <- subset(Gsubset, year.perm>1000) #all the "NA's" are eliminated

> #*****TEST WITH THE FIRST SOLUTION

> G.seq <- seqdef(Gsubset2, var = 83:129)

> cut.year <- Gsubset2$year.perm-Gsubset2$clock.0 # this variable defines the column in which the sequence should be truncated

> cut.year # OK: every row has an integer specifying where to truncate the sequence
  [1] 20 14 10 20  2  1  6  5 19 25  1  1 11 11  5  6  4 31  5  5  8  2  9  4  7  7  2  0  5 10  6  3  4  8  8 28  2  2  3  5  4  5  5  4  0 14  3 22  0 12  4  2  5  0 14  2
 [57]  1  1  4  0  2  3 14  0  0 16  0  0 27  1  0  4  4  0  4  8  0  4  9  0  4  0  0  3  3  1 12  1  7  0  4  3  4  1  8 16  9  0  5  2  7  0 11  0  5 13  2 23  5  3  5  3
[113]  6  0  0  3  2 15  5  2  1 10

> positionindex <- matrix(1:47, nrow=nrow(G.seq), ncol=ncol(G.seq), byrow=TRUE)

> # We now use a "position" matrix, which stores, for each cell the current position in the sequence

> head(positionindex)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29]
[1,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[2,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[3,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[4,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[5,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
[6,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29
     [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47]
[1,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[2,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[3,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[4,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[5,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47
[6,]    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47

> G.seq[positionindex > cut.year] <- attr(G.seq, "void")

> all.equal(as.numeric(seqlength(G.seq)), cut.year)
[1] TRUE

> seqiplot(G.seq)
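
The original question also asked how to align the truncated sequences to the right on the cut point, which the protocol above does not cover. As a rough sketch only, and assuming the states are available as a plain character matrix with one row per individual, the shift can be done in base R before the data are handed to `seqdef`. The names `states`, `cut_at` and `aligned` are hypothetical, and how the leading empty cells should then be declared to TraMineR (for example through the `left` argument of `seqdef`) is left open here.

```r
# Hypothetical toy data: 2 individuals, 4 yearly states each
states <- matrix(c("a", "b", "c", "d",
                   "e", "f", "g", "h"), nrow = 2, byrow = TRUE)
cut_at <- c(3, 1)   # hypothetical cut points (number of positions to keep)
len <- ncol(states)

# Start from an all-missing matrix, then copy the kept part of each row
# into the right-most cells so that every row ends at its cut point
aligned <- matrix(NA_character_, nrow = nrow(states), ncol = len)
for (i in seq_len(nrow(states))) {
  k <- cut_at[i]
  if (k > 0) {
    aligned[i, (len - k + 1):len] <- states[i, seq_len(k)]
  }
}
```

After this step the last column of `aligned` holds, for every individual, the state observed at the cut point, and rows with a cut point of 0 stay entirely empty.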