[Traminer-users] linking short sequences with clusters based on long sequences

Hadrien Commenges hc at parisgeo.cnrs.fr
Mon Feb 16 22:50:00 CET 2015


I understand your problem, but I'm not competent to give you sound advice, so I'll answer a separate but related question. You have two options: 1/ build the clusters from classical data and illustrate them with sequence data, or 2/ classify on the sequence data and profile the clusters with classical data. In my experience (with other kinds of data), the first option is safer: build your classification with classical (i.e. non-sequential) variables, and then extract a set of several representative sequences (with seqrep). Doing so, you'll bypass your problem.

Good luck!

Hadrien 

----- Original message -----

From: "Rimantas Vosylis" <rvosylis at live.com>
To: "Users questions" <traminer-users at lists.r-forge.r-project.org>
Sent: Monday, 16 February 2015 16:24:37
Subject: Re: [Traminer-users] linking short sequences with clusters based on long sequences



Hadrien, 

Thank you for these responses. I will try to explain the design of my data a bit more.



My sequences are aligned to the moment my participants finish school (which happens at about 18 years of age). One object in the sequence represents a role-combination status for 6 months. So for 30-year-olds I have about 24 objects (12 years * 2), ±1 object. For 25-year-olds, it's about 14 objects. E.g.

For 30-year-olds 

223334457777888999999999 

For 25-year-olds 

22333445777788 
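
The lengths quoted above can be checked with a short sketch. It assumes exactly what the message states (sequences start at age 18, one object per 6 months); the constant and function names are hypothetical and only serve the illustration.

```python
# Illustrative check of the sequence lengths quoted in the message.
# Assumptions: sequences start at age 18 and each object covers 6 months.
SCHOOL_LEAVING_AGE = 18
OBJECTS_PER_YEAR = 2


def expected_length(age):
    """Number of 6-month objects observed between school leaving and `age`."""
    return (age - SCHOOL_LEAVING_AGE) * OBJECTS_PER_YEAR


# The two example sequences given in the message.
seq_30 = "223334457777888999999999"
seq_25 = "22333445777788"

print(expected_length(30), len(seq_30))
print(expected_length(25), len(seq_25))
```

Both example sequences have the expected number of objects, so the 25-year-old example is 10 objects (5 years) shorter than the 30-year-old one.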



At the moment I have two sequences for each participant, because I analyze the sequences for education-work and family (residence, marriage, parenthood) transitions separately. For now I focus only on education-work transitions, and I will repeat the same steps for family later.



The first goal of my study is to describe the existing transitions based on 30-year-olds only.

The next goal, however, is to compare how these groups (clusters) differ on various psychosocial indicators, e.g. personal identity.



In addition (this is where it gets complicated), I want to compare how individuals who are only in the middle of a particular life path (trajectory) differ on various psychosocial indicators. The best way to do that would be to have actual longitudinal data for both status sequences and psychosocial indicators, but I do not have such data. What I have is a group of 25-year-olds who were also assessed with the Life History Calendar. Since I know their sequences as well, I believe I could link them to the most likely trajectory based on the similarity of their current sequence. For example, if the representative sequence of cluster X is 223334457777888999999999, then this sequence for a 25-year-old, 22333445777788, is very similar to the representative one of cluster X; it only lacks the information for the last 5 years. However, I am not sure which strategy is better: (A) to start with only 30-year-olds and then compute the similarity of 25-year-olds to some representative sequence, or (B) to run all analyses with both 25- and 30-year-olds together. For (A) I have the problem of selecting a representative sequence, which I have not solved yet. For (B) I have the problem of getting somewhat different results from hierarchical cluster analysis (the extracted clusters look similar, but some notable differences exist).
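
One way to make the "very similar, only the last 5 years are missing" intuition concrete is to compare a short sequence only with the first part of the representative sequence. The sketch below is an assumption about how such a comparison could be done (a position-by-position mismatch count on the common prefix); it is not a TraMineR function, and the name `prefix_mismatches` is hypothetical.

```python
def prefix_mismatches(short_seq, long_seq):
    """Count positions where short_seq differs from the start of long_seq.

    Only the first len(short_seq) positions of long_seq are compared,
    so the unobserved tail of the short sequence is not penalized.
    """
    if len(short_seq) > len(long_seq):
        raise ValueError("short_seq must not be longer than long_seq")
    # zip stops at the end of the shorter sequence.
    return sum(a != b for a, b in zip(short_seq, long_seq))


# Sequences taken from the message.
representative = "223334457777888999999999"  # cluster X, 30-year-olds
observed_25 = "22333445777788"                # a 25-year-old

print(prefix_mismatches(observed_25, representative))  # 0
```

A count of 0 means the 25-year-old's sequence coincides with the first 14 objects of the representative sequence. Whether ignoring the tail is appropriate depends on the assumption, stated in the message, that the younger cohort follows the same trajectories.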



I have considered converting the sequences into distinct-state sequences, but I think this is not suitable for me. Here is the reason why:

Let's say I have a sequence (a) for a 30-year-old: 1111222233333333333344; it will be converted into 1234. Now let's say I have a sequence (b) for a 25-year-old: 111122223333; it will be converted into 123. And a sequence (c) for a 25-year-old: 1234444444444; it will be converted into 1234. Sequence (b) will have a larger distance from (a) than (c) does, even though (a) and (b) follow the same path (except that I do not know how (b) will finish). What I want is the opposite: (b) should have the smaller distance from (a), and (c) the larger one.
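
The objection can be reproduced with a small sketch. The `dss` function below re-implements only the idea of a distinct-state sequence (collapsing runs of identical states) and is not TraMineR's seqdss. The unit-cost edit distance is an assumed stand-in for whatever distance would actually be used.

```python
from itertools import groupby


def dss(seq):
    """Collapse each run of identical states into a single state."""
    return "".join(state for state, _ in groupby(seq))


def edit_distance(s, t):
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]


# Sequences (a), (b), (c) from the message.
a = "1111222233333333333344"
b = "111122223333"
c = "1234444444444"

print(dss(a), dss(b), dss(c))            # 1234 123 1234
print(edit_distance(dss(a), dss(b)))     # 1
print(edit_distance(dss(a), dss(c)))     # 0
```

On the distinct-state sequences, (c) is at distance 0 from (a) while (b) is at distance 1, which is the ordering the message argues against.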



I also completely understand that this sort of analysis is valid only with the assumption that the 25-year-old cohort will follow the same life trajectories as 30-year-olds. However, I think I can build enough support to believe so. 



Maybe you and others could share some more thoughts about this kind of analysis.



Thank you in advance. I really appreciate any help!



Rimantas 






From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Hadrien Commenges 
Sent: Monday, February 16, 2015 3:42 PM 
To: Users questions 
Subject: Re: [Traminer-users] linking short sequences with clusters based on long sequences 





I'll try two answers:





1/ Your question is not a simple technical decision; it is also a research choice, and we can't answer without knowing your dataset and your research objectives. For example, suppose you have 30 time steps (1 per year) and you work in calendar time: for the 30-year-olds you have 30 values, and for the 25-year-olds 25 values. You could assign null values to the first 5 years for the 25-year-old individuals. Another option would be to align each individual on their birth year (time as process). In both cases, if you compute distances on your dataset, the cohort will certainly affect the results, but you can't erase the differences between 30- and 25-year-old individuals: they do exist.





2/ If you want to minimize the importance of the cohort, the easiest way is to drop time as a quantity and consider only the succession of states. Convert your sequences into distinct-state sequences (seqdss) and compute your distances on this DSS object.





Hope it helps. 





Hadrien 



----- Original message -----



From: "Rimantas Vosylis" < rvosylis at live.com >
To: traminer-users at lists.r-forge.r-project.org
Sent: Monday, 16 February 2015 13:41:30
Subject: [Traminer-users] linking short sequences with clusters based on long sequences





Dear Traminer users and experts, 



I wrote this question a few weeks ago but no one answered. I will make it brief this time, so maybe I will get some response :)



I am interested in transitions to adulthood. I have two groups: one is called 30-year-olds and the other 25-year-olds. For both of these groups I have a sequence of life-situation statuses. For 30-year-olds the sequence is longer than for 25-year-olds.



I want to find a typology of these sequences (transitions to adulthood), and I also want to assign the sequences of 25-year-olds and 30-year-olds to these types (trajectories).

So the main issue for me is how I can assign the 25-year-olds, who have shorter sequences, to clusters found in analyses that also include the 30-year-old group.

I came up with several strategies, but I am not sure which one is better, or whether there is something else I could do that I am not aware of.



1. The first strategy is that I simply run the optimal matching calculations on the full dataset (including both the long and the shorter sequences), so that the participants with shorter sequences are assigned to a cluster directly.

Q1. My first question to you is: does this seem like a valid strategy for assigning 25-year-olds to clusters that are created using the 30-year-olds as well?



2. The second strategy is that I first analyze only the 30-year-olds, then extract ideal types representing each cluster, then add these ideal types to a dataset containing only the 25-year-olds and rerun the optimal matching analysis. Then I assign each participant to the cluster whose ideal-type sequence is at the shortest distance from that participant's sequence. Something similar was discussed by Martin, P., Schoon, I., Ross, A., Beyond Transitions: Applying Optimal Matching to Life Course Research



Q2. Does this seem like a more valid strategy than the first one? 
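
The assignment step of the second strategy can be sketched as a nearest-ideal-type rule. Everything here is an assumption made for illustration: the cluster labels, the second ideal type and the second short sequence are invented, and a unit-cost edit distance stands in for the optimal matching distance, whose costs the message does not specify.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]


def assign_to_nearest(seq, ideal_types):
    """Return the label of the ideal type with the smallest distance to seq."""
    return min(ideal_types, key=lambda label: edit_distance(seq, ideal_types[label]))


# Hypothetical ideal types: "A" reuses the example sequence from the thread,
# "B" and both cluster labels are invented for this sketch.
ideal_types = {
    "A": "223334457777888999999999",
    "B": "111111112222222222222222",
}

# The first short sequence comes from the thread, the second is invented.
short_sequences = ["22333445777788", "11111111222222"]

for seq in short_sequences:
    print(seq, "->", assign_to_nearest(seq, ideal_types))
```

Because every short sequence is 10 objects shorter than every ideal type, all distances include at least 10 insertions. That constant shift does not change which ideal type is nearest here, but it is one reason the choice of insertion and deletion costs matters for this strategy.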



Q3. Perhaps you could suggest another way to do this assignment?



I would really appreciate any help on any of these questions. 



Rimantas 




_______________________________________________ 
Traminer-users mailing list 
Traminer-users at lists.r-forge.r-project.org 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users 




