<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}

o\:* {behavior:url(#default#VML);}

w\:* {behavior:url(#default#VML);}

.shape {behavior:url(#default#VML);}

</style><![endif]--><style><!--

/* Font Definitions */

@font-face

        {font-family:Helvetica;

        panose-1:2 11 6 4 2 2 2 2 2 4;}

@font-face

        {font-family:Wingdings;

        panose-1:5 0 0 0 0 0 0 0 0 0;}

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;

        mso-fareast-language:EN-US;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:#0563C1;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:#954F72;

        text-decoration:underline;}

p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0cm;

        margin-right:0cm;

        margin-bottom:0cm;

        margin-left:36.0pt;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;

        mso-fareast-language:EN-US;}

span.EmailStyle18

        {mso-style-type:personal;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

span.EmailStyle19

        {mso-style-type:personal;

        font-family:"Calibri",sans-serif;

        color:#1F497D;}

span.EmailStyle20

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:3.0cm 1.0cm 2.0cm 3.0cm;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=LT link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Thanks Hadrien,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>I will keep your suggestion in mind</span><span lang=EN-US style='color:#1F497D'>! <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>For now, perhaps someone else will share some thoughts on that.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>Rimantas<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='mso-fareast-language:LT'>From:</span></b><span lang=EN-US style='mso-fareast-language:LT'> traminer-users-bounces@lists.r-forge.r-project.org [mailto:traminer-users-bounces@lists.r-forge.r-project.org] <b>On Behalf Of </b>Hadrien Commenges<br><b>Sent:</b> Monday, February 16, 2015 11:50 PM<br><b>To:</b> Users questions<br><b>Subject:</b> Re: [Traminer-users] linking short sequences with clusters based on long sequences<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>I understand your problem, but I'm not competent to give you sound advice, so I'll answer a separate but related question. You have two options: 1/ build the clusters from classical data and illustrate them with sequential data, or 2/ classify the sequential data and build cluster profiles from classical data. In my experience (with other kinds of data), the 1st option is safer: build your classification with classical (i.e. non-sequential) variables, and then extract a set of several representative sequences (with seqrep).
Doing so, you'll bypass your problem.</span><span style='font-size:12.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:LT'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>Good luck!<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>Hadrien<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'><o:p> </o:p></span></p></div><div class=MsoNormal align=center style='text-align:center'><span style='font-family:"Arial",sans-serif;color:black'><hr size=2 width="100%" align=center id=zwchr></span></div><div><p class=MsoNormal><b><span style='font-family:"Helvetica",sans-serif;color:black'>From: </span></b><span style='font-family:"Helvetica",sans-serif;color:black'>"Rimantas Vosylis" <<a href="mailto:rvosylis@live.com">rvosylis@live.com</a>><br><b>To: </b>"Users questions" <<a href="mailto:traminer-users@lists.r-forge.r-project.org">traminer-users@lists.r-forge.r-project.org</a>><br><b>Sent: </b>Monday, February 16, 2015 16:24:37<br><b>Subject: </b>Re: [Traminer-users] linking short sequences with clusters based on long sequences<o:p></o:p></span></p><div><p class=MsoNormal><span style='font-family:"Helvetica",sans-serif;color:black'><o:p> </o:p></span></p></div><p class=MsoNormal><span style='color:#1F497D'>Hadrien, </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thank you for these responses. I will try to explain the design of my data a bit more.
</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>My sequences are aligned to the moment my participants finish school (which happens at about 18 years of age). One object in the sequence represents a role-combination status for 6 months. So for 30-year-olds I have about 24 objects (12 years * 2) ±1 object. For 25-year-olds, it is about 14 objects (7 years * 2). For example:</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>For 30-year-olds:</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>223334457777888999999999</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>For 25-year-olds:</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>22333445777788</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>At the moment I have two sequences for each participant, because I analyze sequences for education-work and family (residence, marriage, parenthood) transitions separately. For now I focus only on education-work transitions, as I will repeat the same steps for family later. </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>The first goal of my study is to describe the existing transitions based on 30-year-olds only. </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>However, the next goal is to compare how these groups (clusters) differ on various psychosocial indicators, e.g.
personal identity.</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>In addition (this is where it gets complicated), I want to compare how individuals who are only in the middle of that particular life path (trajectory) differ on various psychosocial indicators. The best way to do that would be to have actual longitudinal data for both status sequences and psychosocial indicators. Yet I do not have such data. What I have is a group of 25-year-olds who were also assessed with a Life History Calendar. Since I know their sequences as well, I believe that I could link them to the most likely trajectory based on the similarity of their current sequence. For example, if the representative sequence of cluster X is 223334457777888999999999, then this sequence for a 25-year-old, 22333445777788, is very similar to the representative one of cluster X. It is only missing the information for the last 5 years. However, I am not sure which strategy is better: (A) to start with only 30-year-olds and then calculate the similarity of 25-year-olds to some representative sequence, or (B) to run all analyses with both 25- and 30-year-olds together. For (A) I have the problem of selecting a representative sequence, which I have not solved yet. For (B) I have the problem of getting somewhat different results with hierarchical cluster analysis (the clusters extracted look similar, but some notable differences exist).</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>I have considered converting them into <i>distinct state sequences</i>, but I do not think that is suitable for me.
Here is the reason why:</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Let’s say I have a sequence (a) for a 30-year-old: 1111222233333333333344. It will be converted into 1234. Now let’s say I have a sequence (b) for a 25-year-old: 111122223333. It will be converted into 123. Now let’s say I have a sequence (c) for a 25-year-old: 1234444444444. It will be converted into 1234. Sequence (b) will have a larger distance from sequence (a) than sequence (c) does, even though (a) and (b) are the same as far as (b) goes (except that I do not know how it will finish). What I want is the opposite: I want (b) to have a smaller distance from (a), and (c) to have a larger distance.</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>I also completely understand that this sort of analysis is valid only under the assumption that the 25-year-old cohort will follow the same life trajectories as the 30-year-olds. However, I think I can build enough support to believe so.</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Maybe you and others could give me some more thoughts about such an analysis.
</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thank you in advance – I really appreciate any help</span><span lang=EN-US style='color:#1F497D'>!</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Rimantas</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'> </span><span style='color:black'><o:p></o:p></span></p><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='color:black;mso-fareast-language:LT'>From:</span></b><span lang=EN-US style='color:black;mso-fareast-language:LT'> <a href="mailto:traminer-users-bounces@lists.r-forge.r-project.org">traminer-users-bounces@lists.r-forge.r-project.org</a> [<a href="mailto:traminer-users-bounces@lists.r-forge.r-project.org">mailto:traminer-users-bounces@lists.r-forge.r-project.org</a>] <b>On Behalf Of </b>Hadrien Commenges<br><b>Sent:</b> Monday, February 16, 2015 3:42 PM<br><b>To:</b> Users questions<br><b>Subject:</b> Re: [Traminer-users] linking short sequences with clusters based on long sequences</span><span style='color:black'><o:p></o:p></span></p></div></div><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>I'll try two answers: </span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span
style='font-family:"Arial",sans-serif;color:black'>1/ Your question is not just a technical decision; it is also a research choice, and we can't answer without knowing your dataset and your research objectives. For example, suppose you have 30 time steps (1 per year). If you work with calendar time, you have 30 values for the 30-year-olds and 25 values for the 25-year-olds. You could assign null values during the first 5 years for the 25-year-old individuals. Another option would be to align each individual at his or her birth year (time as process). In both cases, if you compute a distance in your dataset, the cohort will certainly affect the results, but you can't erase the differences between 30- and 25-year-old individuals; they do exist.</span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>2/ If you want to minimize the importance of the cohort, the easiest way is to drop time as a quantity and consider only the succession of states.
Convert your sequences into distinct-state sequences (seqdss) and compute your distances with this DSS object.</span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>Hope it helps.</span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'>Hadrien</span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><div class=MsoNormal align=center style='text-align:center'><span style='color:black'><hr size=2 width="100%" align=center></span></div><div><p class=MsoNormal><b><span style='font-family:"Helvetica",sans-serif;color:black'>From: </span></b><span style='font-family:"Helvetica",sans-serif;color:black'>"Rimantas Vosylis" <<a href="mailto:rvosylis@live.com" target="_blank">rvosylis@live.com</a>><br><b>To: </b><a href="mailto:traminer-users@lists.r-forge.r-project.org" target="_blank">traminer-users@lists.r-forge.r-project.org</a><br><b>Sent: </b>Monday, February 16, 2015 13:41:30<br><b>Subject: </b>[Traminer-users] linking short sequences with clusters based on long sequences</span><span style='color:black'><o:p></o:p></span></p><div><p class=MsoNormal><span style='font-family:"Helvetica",sans-serif;color:black'> </span><span style='color:black'><o:p></o:p></span></p></div><p class=MsoNormal><span style='color:black'>Dear Traminer users and experts,<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span
style='color:black'>I posted this question a few weeks ago, but no one answered. I will keep it brief this time, so maybe I will get some response </span><span style='color:black'>:)</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>I am interested in transitions to adulthood. I have two groups: one is called 30-year-olds and the other 25-year-olds. For both of these groups I have a sequence of life situation statuses. For 30-year-olds the sequence is longer than for 25-year-olds.<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>I want to find a typology of these sequences (transitions to adulthood), and I also want to assign the sequences of 25-year-olds and 30-year-olds to these types (trajectories).<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>So the main issue for me is how I can assign the 25-year-olds, who have shorter sequences, to the clusters that were found based on analyses that would also include the 30-year-old group.<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>I came up with several strategies, but I am not sure which one is better, or whether there is something else I could do that I am not aware of.<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoListParagraph style='text-indent:-18.0pt'><span style='color:black'>1.</span><span style='font-size:7.0pt;font-family:"Times New Roman",serif;color:black'>       </span><span style='color:black'>The first strategy is that I simply run optimal matching calculations for the full dataset (including both the long sequences and the shorter ones), so that those with shorter sequences are already assigned to some cluster.<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>Q1.
My first question to you is: does this seem like a valid strategy for assigning 25-year-olds to clusters that are actually created using the 30-year-olds as well?<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>2. The second strategy is that I first analyze only the 30-year-olds, then extract ideal types representing each cluster, then add these ideal types to the dataset containing only 25-year-olds and rerun the optimal matching analysis. Then, based on the shortest distance between each ideal-type sequence and each participant’s sequence, I assign the participants to those clusters. Something similar was discussed by Martin, P., Schoon, I., Ross, A., Beyond Transitions: Applying Optimal Matching to Life Course Research.<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>Q2. Does this seem like a more valid strategy than the first one?<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>Q3. Perhaps you could suggest another way to do such an assignment?<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><b><span style='color:black'>I would really appreciate any help on any of these questions.
</span></b><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='color:black'>Rimantas<o:p></o:p></span></p><p class=MsoNormal><span style='color:black'> <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Helvetica",sans-serif;color:black;mso-fareast-language:LT'><br>_______________________________________________<br>Traminer-users mailing list<br><a href="mailto:Traminer-users@lists.r-forge.r-project.org" target="_blank">Traminer-users@lists.r-forge.r-project.org</a><br><a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users</a></span><span style='color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:LT'> </span><span style='color:black'><o:p></o:p></span></p></div></div></div><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:LT'><o:p> </o:p></span></p></div></div></div></body></html>