[Traminer-users] linking short sequences with clusters based on long sequences

Rimantas Vosylis rvosylis at live.com
Wed Feb 25 08:39:59 CET 2015


Dear Matthias Studer,

 

Thank you for sharing your thoughts about my issue. They gave me some ideas to think about, but I am still unsure which method to choose. I will explain my research questions in a bit more detail, and why I think the approach I am using can be useful to me.

 

Perhaps the most general goal of my study is to challenge the ideas laid out in the theory of “emerging adulthood” by Jeffrey Jensen Arnett. This theory states that the transitional events that lead to the acquisition of adult roles are no longer that important for a person to become an adult. Instead, modern adulthood is achieved through the acquisition of individualistic character traits such as becoming responsible, self-sufficient and so on. It also says that a person becomes an adult at about 30 years of age, and that between adolescence and early adulthood there is now a new period, “emerging adulthood”, characterized by delayed entry into adult roles, prolonged identity exploration, instability, feeling “in-between” and so on. Critics of this theory argue that this stage, described by these features, is not really a stage but a trajectory.

The first thing I want to do is to show that not all people delay entry into adult roles. The holistic trajectories I have found so far show that rather well (some do delay, some do not). Next, I want to show that these trajectories differ on the characteristics of emerging adulthood. I see some of these differences in 30-year-olds, but I also want to look at how people who tend to follow one trajectory or another differ on these characteristics at 25 years of age. I believe the differences would be visible during that period as well.

Perhaps this question (how holistic trajectories are related to psychosocial indicators) is a bit vague, but on the other hand I believe that this methodology serves it substantially better than focusing on single events (e.g. how marriage affects change in some behavior), as was done in previous studies. Single events are almost always confounded with other events (e.g. those who have children will most likely be married), and sequence analysis using OM also captures the time spent in each status.

 

So now, I have been thinking about what you suggested. I also created a fictitious dataset to play with, to see how the OM algorithm computes distances when I use different options for the short sequences.

 

 

First I used sequences like this (I used the right = "DEL" argument when defining the sequences).

123334445556666

12333444

44555666

Then I tried inserting some other status as you suggested, and transformed the two short sequences into

000000012333444

000000044555666

 

In both cases the OM algorithm still penalized the short sequences in much the same way. If I treated the missing values on the left as void, it treated substituting each missing value as the best way to align (as far as I could tell from the cost matrix). If I manually inserted some value, the distance was similar.
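The comparison described above can be sketched outside of TraMineR as follows. This is only an illustration under stated assumptions: `om_distance` and `pad_left` are hypothetical helper names, the costs (indel = 1, constant substitution = 2) are assumed rather than taken from the message, and the code implements a plain weighted edit distance, so it does not reproduce how TraMineR handles void or missing states.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Weighted edit (optimal matching) distance between two state strings.

    Assumed costs: one insertion or deletion costs `indel`,
    one substitution between different states costs `sub`.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            match_cost = 0.0 if x == y else sub
            curr.append(min(prev[j] + indel,           # delete x
                            curr[j - 1] + indel,       # insert y
                            prev[j - 1] + match_cost)) # match or substitute
        prev = curr
    return prev[-1]


def pad_left(seq, length, state="0"):
    """Left-pad a sequence with an extra state so it is aligned on its end."""
    return seq.rjust(length, state)


long_seq = "123334445556666"
short_a = "12333444"
short_b = "44555666"

# Variant 1: short sequences left at their own length.
raw = {
    "long vs short_a": om_distance(long_seq, short_a),
    "long vs short_b": om_distance(long_seq, short_b),
    "short_a vs short_b": om_distance(short_a, short_b),
}

# Variant 2: short sequences padded on the left with the extra state "0".
n = len(long_seq)
padded_a, padded_b = pad_left(short_a, n), pad_left(short_b, n)
padded = {
    "long vs short_a": om_distance(long_seq, padded_a),
    "long vs short_b": om_distance(long_seq, padded_b),
    "short_a vs short_b": om_distance(padded_a, padded_b),
}

for label in raw:
    print(f"{label}: raw={raw[label]}, padded={padded[label]}")
```

Under these assumed costs, the unpadded distance between the long sequence and either short one comes entirely from the seven states that have to be inserted or deleted, so sequence length alone already separates the two groups.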

 

However, in both cases I found that the clusters I extracted were very strongly linked to the length of the sequences. The largest cluster contained most of the long sequences (30-year-olds), and the short sequences were divided among the remaining smaller clusters.

 

So I feel like I have hit a wall here. I am still considering option A from my previous letter ((A) to start with only 30-year-olds and then recalculate the similarity of 25-year-olds to some representative sequence); however, even that seems to be too "innovative" and I might find it very hard to defend.

 

So I think I will just do separate analyses for 30- and 25-year-olds :(

 

Thanks again to everyone who shared thoughts about this!

 

Rimantas

 

 

 

From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Matthias Studer
Sent: Tuesday, February 17, 2015 12:44 AM
To: Users questions
Subject: Re: [Traminer-users] linking short sequences with clusters based on long sequences

 

Dear Rimantas Vosylis,

 

Here are my thoughts about your issue. You are studying an outcome of the trajectories, whereas sequence analysis is often used to study how starting conditions influence the subsequent trajectories. This makes a big difference.

 

I think you should spell out the exact assumption you are making. Why exactly do you think there is a relationship between trajectories and psychosocial indicators (please find some examples below)?

 

-          Previous semesters influence current psychosocial indicators. In this case, you could align the sequences on the end of observation and add the state “in school/education” for the unobserved semesters (at the beginning of the sequence). You’ll have complete trajectories in both cases. Depending on the issue, this may be a good solution. Concretely, this would mean recoding the trajectory

           22333445777788

           to

           111111111122333445777788

           where state 1 is being in school. Your sequences would then describe the last 24 semesters in all cases.

 

-          How are whole trajectories and psychosocial indicators linked from a holistic perspective? This kind of research question is generally too vague for me. The research question assumes that you measure complete trajectories; hence, you need to predict the end of incomplete trajectories. In order to reflect the uncertainty of the predictions, I would use multiple imputation in some way (but I have never tried it). I know Brendan Halpin has written an article about that. Strategy A goes in the same direction but does not reflect the uncertainty of the predictions.

 

-          I think strategy B may be meaningful because it may reflect the differences (in life history) between being 25 and 30 years old. However, you should be more precise about your assumption.
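The recoding in the first point above (aligning sequences on the end of observation and filling the unobserved earlier semesters with the school state) could be sketched as follows. The helper name `align_to_end` and its parameters are hypothetical, and keeping only the last 24 states of a longer sequence is an assumption made for this illustration.

```python
def align_to_end(seq, n_periods=24, fill_state="1"):
    """Return a sequence of exactly n_periods states, aligned on its last state.

    Shorter sequences are filled on the left with `fill_state`
    (here "1", standing for being in school). Longer sequences keep
    only their last n_periods states (an assumption of this sketch).
    """
    if len(seq) >= n_periods:
        return seq[-n_periods:]
    return fill_state * (n_periods - len(seq)) + seq


observed = "22333445777788"      # 14 observed semesters
recoded = align_to_end(observed)
print(recoded, len(recoded))
```

After this step every sequence has the same length, so distances between them no longer depend on how many semesters happened to be observed.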

 

Because I can only make sense of the relation you are studying (trajectories and psychosocial indicators) through the first research question, I would use that method. If you were studying the results of starting conditions (the effect of the situation at the end of education), I would go toward multiple imputation.

 

Hope this helps.

Matthias

 
