From rvosylis at live.com Mon Feb 16 13:41:30 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Mon, 16 Feb 2015 14:41:30 +0200
Subject: [Traminer-users] linking short sequences with clusters based on long sequences
Message-ID:

Dear TraMineR users and experts,

I posted this question a few weeks ago but no one answered. I will keep it brief this time, so maybe I will get some response :)

I am interested in transitions to adulthood. I have two groups: one is called 30-year-olds and the other 25-year-olds. For both groups I have a sequence of life situation statuses. For 30-year-olds the sequence is longer than for 25-year-olds. I want to find a typology of these sequences (transitions to adulthood), and I also want to assign the sequences of both 25-year-olds and 30-year-olds to these types (trajectories).

So the main issue for me is how to assign the 25-year-olds, who have shorter sequences, to clusters found in analyses that also include the 30-year-old group. I came up with several strategies, but I am not sure which one is better, or whether there is something else I could do that I don't know about.

1. The first strategy is to simply run the optimal matching calculations on the full dataset (both long and short sequences), so that those with shorter sequences are assigned to a cluster directly.

Q1. My first question to you is: does this seem like a valid strategy for assigning 25-year-olds to clusters that are created using the 30-year-olds as well?

2. The second strategy is to first analyze only the 30-year-olds, then extract ideal types representing each cluster, then add these ideal types to the dataset of only 25-year-olds and rerun the optimal matching analysis. Then, based on the shortest distance from each ideal-type sequence to each participant's sequence, I assign participants to those clusters.
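A minimal sketch of the assignment step in the second strategy, assuming the representative ("ideal type") sequences are already known. Base R's `adist` (unit-cost edit distance) is used here only as a stand-in for an optimal matching distance with study-specific costs. The first reference string appears later in this thread; the second reference string, the cluster labels `X` and `Y`, and the participant label `p1` are hypothetical.

```r
# Hypothetical representative sequences, one per cluster (30-year-olds).
# "X" is a sequence quoted later in the thread; "Y" is made up.
refs <- c(X = "223334457777888999999999",
          Y = "111111111111555555555555")

# Shorter sequence(s) to be assigned (25-year-olds).
short <- c(p1 = "22333445777788")

# Unit-cost edit distance as a stand-in for OM with real costs.
# Rows correspond to participants, columns to cluster references.
d <- adist(short, refs)

# Assign each participant to the cluster with the nearest reference.
assigned <- setNames(names(refs)[apply(d, 1, which.min)], names(short))

print(d)
print(assigned)
```

Because every reference here has the same length, the cost of the unobserved tail is the same for all clusters and does not change which reference is nearest; with references of unequal length that would no longer hold.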
Something similar was discussed by Martin, P., Schoon, I., Ross, A., "Beyond Transitions: Applying Optimal Matching to Life Course Research".

Q2. Does this seem like a more valid strategy than the first one?

Q3. Perhaps you could suggest another way to do such an assignment?

I would really appreciate any help on any of these questions.

Rimantas

From hc at parisgeo.cnrs.fr Mon Feb 16 14:42:13 2015
From: hc at parisgeo.cnrs.fr (Hadrien Commenges)
Date: Mon, 16 Feb 2015 14:42:13 +0100 (CET)
Subject: [Traminer-users] linking short sequences with clusters based on long sequences
In-Reply-To:
References:
Message-ID: <398970667.873780.1424094133685.JavaMail.zimbra@parisgeo.cnrs.fr>

I'll try two answers:

1/ Your question is not a simple technical decision; it is also a research choice, and we can't answer without knowing your dataset and your research objectives. For example, suppose you have 30 time steps (1 per year) and you work with calendar time: for the 30-year-olds you have 30 values, and for the 25-year-olds 25 values. You could assign null values to the first 5 years for the 25-year-old individuals. Another option would be to align each individual at their birth year (time as process). In both cases, if you compute distances in your dataset, the cohort will certainly affect the results, but you can't erase the differences between 30- and 25-year-old individuals; they do exist.

2/ If you want to minimize the importance of the cohort, the easiest way is to drop time as a quantity and consider only the succession of states. Convert your sequences into distinct-state sequences (seqdss) and compute your distances on this DSS object.

Hope it helps.
Hadrien

_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users

From rvosylis at live.com Mon Feb 16 16:24:37 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Mon, 16 Feb 2015 17:24:37 +0200
Subject: [Traminer-users] linking short sequences with clusters based on long sequences
In-Reply-To: <398970667.873780.1424094133685.JavaMail.zimbra@parisgeo.cnrs.fr>
References: <398970667.873780.1424094133685.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID:

Hadrien,

Thank you for these responses. I will try to explain the design of my data a bit more.

My sequences are aligned to the moment my participants finish school (at about 18 years of age). One object in the sequence represents a role combination status for 6 months. So for 30-year-olds I have about 24 objects (12 years * 2), ±1 object. For 25-year-olds, it's about 14 objects. E.g.:

For 30-year-olds: 223334457777888999999999
For 25-year-olds: 22333445777788

At the moment I have two sequences for each participant, because I analyze sequences for education-work and family (residence, marriage, parenthood) transitions separately. For now I focus only on education-work transitions, and I will repeat the same steps for family later.

My first goal of the study is to describe the existing transitions based on 30-year-olds only. The next goal, however, is to compare how these groups (clusters) differ on various psychosocial indicators, e.g.
personal identity.

In addition (this is where it gets complicated), I want to compare how individuals who are only in the middle of a particular life path (trajectory) differ on various psychosocial indicators. The best way to do that would be to have actual longitudinal data for both status sequences and psychosocial indicators. I do not have such data. What I have is a group of 25-year-olds who were also assessed with the Life History Calendar. Since I know their sequences as well, I believe I could link them to the most likely trajectory based on the similarity of their current sequence. For example, if the representative sequence of cluster X is 223334457777888999999999, then this sequence for a 25-year-old, 22333445777788, is very similar to the representative one of cluster X. It only lacks the information for the last 5 years.

However, I am not sure which strategy is better: (A) to start with only the 30-year-olds and then calculate the similarity of the 25-year-olds to some representative sequence, or (B) to run all analyses with both 25- and 30-year-olds together. For (A) I have the problem of selecting a representative sequence, which I have not solved yet. For (B) I have the problem of getting somewhat different results from the hierarchical cluster analysis (the extracted clusters look similar, but some notable differences exist).

I have considered converting to distinct-state sequences, but I think that is not suitable for me. Here is why. Let's say I have a sequence (a) for a 30-year-old: 1111222233333333333344; it will be converted into 1234. Now let's say I have a sequence (b) for a 25-year-old: 111122223333; it will be converted into 123. And let's say I have a sequence (c) for a 25-year-old: 1234444444444; it will be converted into 1234. Sequence (b) will have a larger distance from sequence (a), even though the two are the same as far as they have been observed (except that I do not know how (b) will finish).
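The conversion described above can be checked with a short base-R sketch. It is an illustration only: the helper `dss` is a hypothetical stand-in for what seqdss does on a sequence object, and `adist` (unit-cost edit distance) stands in for whatever distance would actually be computed on the DSS object.

```r
# Collapse a state string into its distinct successive states (DSS).
# Hypothetical helper; it mimics the idea of seqdss on plain strings.
dss <- function(s) paste(rle(strsplit(s, "")[[1]])$values, collapse = "")

seqs <- c(a = "1111222233333333333344",
          b = "111122223333",
          c = "1234444444444")

dss_seqs <- vapply(seqs, dss, character(1))
print(dss_seqs)

# Unit-cost edit distance between the DSS strings (rows and columns
# follow the order a, b, c).
d <- adist(dss_seqs)
print(d)
```

With these inputs, (a) and (c) collapse to the same DSS string while (b) is one state short of (a), so (b) ends up farther from (a) than (c) does.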
Therefore, what I want is the opposite: I want (b) to have a smaller distance from (a), and (c) to have a larger distance.

I also fully understand that this sort of analysis is valid only under the assumption that the 25-year-old cohort will follow the same life trajectories as the 30-year-olds. However, I think I can build enough support for that assumption.

Maybe you and others could give me some more thoughts about such an analysis.

Thank you in advance. I really appreciate any help!

Rimantas

From hc at parisgeo.cnrs.fr Mon Feb 16 22:50:00 2015
From: hc at parisgeo.cnrs.fr (Hadrien Commenges)
Date: Mon, 16 Feb 2015 22:50:00 +0100 (CET)
Subject: [Traminer-users] linking short sequences with clusters based on long sequences
In-Reply-To:
References: <398970667.873780.1424094133685.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID: <965766551.883454.1424123400661.JavaMail.zimbra@parisgeo.cnrs.fr>

I understand your problem, but I'm not competent to give you sound advice, so I'll answer a separate but related question. You have two options:

1/ build clusters from classical data and illustrate them with sequence data, or
2/ classify the sequence data and build cluster profiles from classical data.

In my experience (with other kinds of data), the first option is safer: build your classification with classical (i.e. non-sequential) variables, and then extract a set of several representative sequences (with seqrep). Doing so, you'll bypass your problem.

Good luck!

Hadrien
From rvosylis at live.com Mon Feb 16 23:19:26 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Tue, 17 Feb 2015 00:19:26 +0200
Subject: [Traminer-users] linking short sequences with clusters based on long sequences
In-Reply-To: <965766551.883454.1424123400661.JavaMail.zimbra@parisgeo.cnrs.fr>
References: <398970667.873780.1424094133685.JavaMail.zimbra@parisgeo.cnrs.fr> <965766551.883454.1424123400661.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID:

Thanks Hadrien, I will keep your suggestion in mind! For now, perhaps someone else will share some thoughts on this.

Rimantas
From Matthias.Studer at unige.ch Mon Feb 16 23:43:44 2015
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Mon, 16 Feb 2015 22:43:44 +0000
Subject: [Traminer-users] linking short sequences with custers based on long sequences
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC7166C1D88@kilo.isis.unige.ch>

Dear Rimantas Vosylis,

Here are my thoughts about your issue. You are studying an outcome of the trajectories, whereas sequence analysis is often used to study how starting conditions influence the subsequent trajectories. This makes a big difference. I think you should spell out the exact assumption you are making. Why exactly do you think there is a relationship between trajectories and psychosocial indicators (please find some examples below)?

- Previous semesters influence the current psychosocial indicator. In this case, you could align the sequences to the end of observation and add the state "in school/education" for the unobserved semesters (at the beginning of the sequence). You will have complete trajectories in both cases. Depending on the issue, this may be a good solution. Concretely, this would mean recoding the trajectory
    22333445777788
  to
    111111111122333445777788
  where state 1 is being in school. Your sequences would describe the last 24 semesters in all cases.
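The recoding in the item above can be sketched in base R as a left-padding step. This is only an illustration on plain state strings: the helper name `pad_left` and its arguments are hypothetical, and a real analysis would apply the same idea to the data before building the sequence object.

```r
# Left-pad a state string with a filler state up to a target length.
# Hypothetical helper; state "1" stands for "in school/education".
pad_left <- function(s, target_len = 24, state = "1") {
  n_missing <- target_len - nchar(s)
  stopifnot(all(n_missing >= 0))   # sequences must not exceed the target
  paste0(strrep(state, n_missing), s)
}

padded <- pad_left("22333445777788")
print(padded)
print(nchar(padded))
```

The 14 observed semesters are kept at the end of the string and 10 filler states are added in front, so all sequences end at the time of observation.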
- How are whole trajectories and psychosocial indicators linked from a holistic perspective? This kind of research question is generally too vague for me. The research question assumes that you measure complete trajectories; hence, you need to predict the end of incomplete trajectories. To reflect the uncertainty of the predictions, I would use multiple imputation in some way (though I have never tried it). I know Brendan Halpin has written an article about that. Strategy A goes in the same direction but does not reflect the uncertainty of the predictions.
- I think strategy B may be meaningful because it may capture the differences (in life history) between being 25 and 30 years old.

However, you should be more precise about your assumption. Because the first research question is the only way I can think about the relation you are studying (trajectories and psychosocial indicators), I would use that method. If you were studying the results of starting conditions (the effect of the situation at the end of education), I would go toward multiple imputation.

Hope this helps.

Matthias

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mpohlig at bigsss.uni-bremen.de Tue Feb 24 15:47:40 2015
From: mpohlig at bigsss.uni-bremen.de (Matthias Pohlig)
Date: Tue, 24 Feb 2015 15:47:40 +0100
Subject: [Traminer-users] as.clustrange produces "Error in `row.names" after weighted hclust
Message-ID: <54EC8F0C.4030908@bigsss.uni-bremen.de>

Dear TraMineR users,

I am running into a strange error message when using as.clustrange after hclust(... method="ward.D2") with weights. I have already filed a bug report about this error. Maybe someone on the mailing list has experienced the same error and can offer some helpful comments?
The complete error message is:

  Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
    duplicate 'row.names' are not allowed
  In addition: Warning message:
  non-unique value when setting 'row.names': 'cluster4'

Apparently, the row names of the sequence object cannot be the cause of the problem, as they are no hindrance in other cases: I can do everything else with the sequence object, such as seqplot(... weighted=TRUE). I am using the most current TraMineR version, 1.8-9, and I have tried both the stable and the development versions of the WeightedCluster package. If I use agnes(... method="Ward") or forgo weights, the command works just fine.

I assume that it has to do with the weights, but I do not know how to track down the problem. Seqdef does not produce any errors or warnings, and the weights do not seem to have problematic values. Unfortunately, I cannot post a public example from the EU-SILC microdata I am using due to confidentiality issues. Here is a short description of my weight variable:

  describe(mydata[, c("weight")])

  mydata[, c("weight")]
        n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
     5439       0    4022       1    2987   504.8   766.6  1228.2  2122.4  3861.6  6445.6  8464.7

  lowest :     0.00    69.51    71.14   116.27   127.13
  highest: 19790.94 21687.61 22317.26 24353.99 46816.19

I would appreciate your thoughts and comments on this error!

Best regards,
Matthias

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rvosylis at live.com Wed Feb 25 08:39:59 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Wed, 25 Feb 2015 09:39:59 +0200
Subject: [Traminer-users] linking short sequences with custers based on long sequences
In-Reply-To: <367AEF503B1B6A4EA602FB66D71A3EC7166C1D88@kilo.isis.unige.ch>
References: <367AEF503B1B6A4EA602FB66D71A3EC7166C1D88@kilo.isis.unige.ch>
Message-ID:

Dear Matthias Studer,

Thank you for sharing your thoughts about my issue. They gave me some ideas to think about, but I am still lost in choosing the right method.
I will explain my research questions a bit more, and why I think the approach I am using can be useful to me. Perhaps the most general goal of my study is to challenge the ideas laid out in the theory of "emerging adulthood" by Jeffrey Jensen Arnett. This theory states that the transitional events leading to the acquisition of adult roles are no longer that important for a person to become an adult. Instead, modern adulthood is achieved through the acquisition of individualistic character traits, such as becoming responsible, self-sufficient, and so on. It also says that a person becomes an adult at about 30 years of age, and that between adolescence and early adulthood there is now a new period, "emerging adulthood", described by delayed entry into adult roles, prolonged identity exploration, instability, feeling "in-between", and so on. Critics of this theory argue that this stage, described by these features, is not really a stage but a trajectory.

The first thing I want to do is to show that not all people delay entry into adult roles. The holistic trajectories I have revealed so far show that rather well (some do delay, some do not). Now I also want to show that these trajectories differ on these characteristics of emerging adulthood. I see some of these differences in 30-year-olds, but I also want to look at how people who tend to follow one path (trajectory) or another differ on these characteristics at 25 years of age. I believe the differences would be visible during that period as well.

Perhaps this question (how holistic trajectories are related to psychosocial indicators) is a bit vague, but on the other hand, I believe that this methodology serves it substantially better than focusing on single events (e.g. how marriage affects change in some behavior), as was done in previous studies. Single events are almost always confounded with other events (e.g.
those who have children will most likely be married), and sequence analysis using OM also provides the dimension of time spent in some status.

So now, I was thinking about what you suggested. I also created a fictitious dataset to play with and see how the OM algorithm computes distances when I use different options for those short sequences. First I used sequences like this (I used the right = "DEL" argument when defining the sequences):

  123334445556666
  12333444
  44555666

Then I tried inserting some other status, as you suggested, and transformed the two short sequences into

  000000012333444
  000000044555666

In both cases the OM algorithm still penalized the short sequences for the transformation quite similarly. If I treated the missing states on the left as void, it considered substituting each missing value the best way to align (as far as I could understand from the cost matrix). If I manually inserted some value, the distance was also similar. However, in both cases I found that the extracted clusters were very strongly linked to sequence length. The largest cluster contained most of the long sequences (30-year-olds), and the short sequences were divided among smaller clusters.

So I feel like I have hit a wall here. I am still considering option A from my previous letter (start with only the 30-year-olds and then compute the similarity of the 25-year-olds to some representative sequence); however, even that seems to be too "innovative", and I might find it very hard to defend. So I think I will just do separate analyses for 30- and 25-year-olds :(

Thanks again to everyone who shared thoughts about this!

Rimantas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
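
The option Rimantas is still weighing above (cluster the 30-year-olds first, then compare each 25-year-old with one representative sequence per cluster and pick the closest) can be sketched roughly as follows. This is only an illustration in plain Python, not TraMineR code: the function names, the two representative sequences, and the cost values (indel = 1, substitution = 2) are all assumptions made up for the example, and real representatives would have to come from the actual cluster solution.

```python
# Illustrative sketch only: plain Python, not TraMineR.
# The costs and the representative sequences are made-up assumptions.

def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance between two state sequences,
    using a constant indel cost and a constant substitution cost."""
    # prev[j]: cost of turning the already-processed prefix of a
    # into the first j states of b (initially the empty prefix).
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]  # turn the first i states of a into nothing
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                         # delete x
                curr[j - 1] + indel,                     # insert y
                prev[j - 1] + (0.0 if x == y else sub),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def assign_to_nearest(seq, representatives, indel=1.0, sub=2.0):
    """Return the name of the representative sequence closest to seq."""
    return min(
        representatives,
        key=lambda name: om_distance(seq, representatives[name], indel, sub),
    )


# Hypothetical representatives of two clusters found among 30-year-olds
# (15 time points each); they are not taken from any real data.
REPRESENTATIVES = {
    "early": "123334445556666",
    "late": "111111222233334",
}

# Hypothetical shorter sequences for 25-year-olds (8 time points each).
SHORT_SEQUENCES = ["12333444", "11112223"]

for s in SHORT_SEQUENCES:
    print(s, "->", assign_to_nearest(s, REPRESENTATIVES))
```

With a unit indel cost, an 8-state sequence is at least 7 away from any 15-state representative, because seven states have to be inserted whatever else happens. That floor is the same for every representative of equal length, so under these assumed costs the assignment is decided by how much of the short sequence each representative can match. Whether such an assignment is substantively defensible is the research question discussed in the thread, not something the sketch can settle.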