From shawn at ori.org  Fri Jan  3 02:37:53 2014
From: shawn at ori.org (Shawn Boles)
Date: Fri, 3 Jan 2014 01:37:53 +0000
Subject: [Traminer-users] minimum standards for datasets used with TraMineR
Message-ID: <05BF3E42C886444A86D060B044B7A1CD1570FB36@Exmail4.ori-eug.ori.org>

Hi All:

I would appreciate any suggestions on how to determine the minimum dataset size and the missingness characteristics for which TraMineR can sensibly be applied. I looked in the documentation but did not find anything that I could use as a heuristic.

I am using TraMineR to analyze 5 years of BMI observations, coded as a four-level ordinal categorical variable, for 5046 elementary school children (grades K-5). Only 414 of these have 5 observations (~56% of the K and grade 1 students measured at time 1 who could have been measured in years 1 to 5). I have two related, if not well stated, questions:


  1.  Is it legitimate to focus only on complete cases, given that I have only 5 data points and high cumulative natural attrition? Testing the complete cases against all cases reveals no substantive difference in the values of the predictors. The analyses of complete cases are informative, while including all cases, regardless of imputation choice, just makes things noisy. I tried admitting only sequences with 4 or 5 observations, but the results were still noisy.


  2.  Is five too short a sequence length to use with TraMineR, given the imputation patterns required by the full dataset, regardless of whether the missingness is planned or MAR?


Below is the number of cases with 1 to 5 observations:

      Observations     1      2     3     4     5
      Cases         1846   1287   869   630   414
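
For reference, the counts above can be tallied in a few lines of base R. This is only a sketch of the arithmetic: the variable names (n_obs, share, at_least) are made up for illustration, and nothing here comes from TraMineR itself.

```r
# Counts of children by number of BMI observations, as reported above
n_obs <- c(`1` = 1846, `2` = 1287, `3` = 869, `4` = 630, `5` = 414)

total    <- sum(n_obs)                      # all children in the dataset
share    <- round(100 * n_obs / total, 1)   # percent with exactly k observations
at_least <- rev(cumsum(rev(n_obs)))         # children with at least k observations

print(total)
print(share)
print(at_least)
```

The counts sum to the 5046 children mentioned above; complete cases (5 observations) are about 8.2% of them, and 1044 children have 4 or 5 observations.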

TraMineR is a great tool. I want to make sure I am using it appropriately.

Thanks.

Shawn Boles, Ph.D.
Senior Research Associate
Oregon Research Institute
1776 Millrace Drive
Eugene, Oregon 97403-2536
USA
Phone (541) 484-2123 ext 2225
Fax: (541) 484-1108



From jeremyr at uga.edu  Mon Jan 13 22:09:08 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Mon, 13 Jan 2014 16:09:08 -0500
Subject: [Traminer-users] plots with group option vs separate plots
Message-ID: <CABicVB9Uhod4m+uJdi-WgHRrb5mttbTeE6eESPc4dh58XzKrDA@mail.gmail.com>

Hello,

I have been making sequence plots, and I seem to get very different
results when I use the "group" option of seqdplot or seqIplot than when I
draw separate plots for each subgroup.

After creating a sequence object, computing optimal matching distances, and
clustering with PAM, I chose a 4-cluster solution.  I then create a single
plot that shows the state distribution in each of the 4 clusters like this:

seqdplot(seq.hc, group = pam5vs$clustering$cluster4, border = NA, title = "pam5vs")

If I subset the data and make a separate plot for one of the 4 clusters, as
in the code below, the N matches the results above (both the total N and
the N across the states), but the two graphs give a very different
impression of how the cases are distributed across the states.  Am I doing
something wrong?  I would be happy to provide more detail if needed.

Thanks,

Jeremy

cluster4 <- subset(bhps, pam5vs$clustering$cluster4 == 6875)
seq.cluster4 <- seqdef(cluster4[4:21], labels = c("M", "S", "F", "O", "U"))
seqdplot(seq.cluster4, border = NA, title = "pam5vs cluster 6875")

-- 
********************
Dr. Jeremy Reynolds
Associate Professor
Undergraduate Coordinator
Department of Sociology
116 Baldwin Hall
University of Georgia
Athens, GA 30602-1611
Phone: (706) 583-8072
Web: http://uga.edu/soc/people/faculty/reynolds_jeremy.php
Fax: (706) 542-4320

From Gilbert.Ritschard at unige.ch  Wed Jan 15 16:35:34 2014
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 15 Jan 2014 15:35:34 +0000
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To: <CABicVB9Uhod4m+uJdi-WgHRrb5mttbTeE6eESPc4dh58XzKrDA@mail.gmail.com>
References: <CABicVB9Uhod4m+uJdi-WgHRrb5mttbTeE6eESPc4dh58XzKrDA@mail.gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>

Hi Jeremy,

Thank you for your question. You are right that you should get the same plot in both cases. However, you have not provided enough information to identify the source of the problem. Could you provide a reproducible working example?

Your question is certainly of interest to many current and future TraMineR users. I would therefore suggest that you post it on StackOverflow (see http://mephisto.unige.ch/traminer/contrib.shtml) using the "traminer" tag, which is searchable, unlike this r-forge list.

Best.
Gilbert



From jeremyr at uga.edu  Fri Jan 17 18:01:21 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 17 Jan 2014 12:01:21 -0500
Subject: [Traminer-users] Traminer-users Digest, Vol 37, Issue 3
In-Reply-To: <5222aa1bf1e5494d9db4ddaaa1118396@BL2PR02MB433.namprd02.prod.outlook.com>
References: <5222aa1bf1e5494d9db4ddaaa1118396@BL2PR02MB433.namprd02.prod.outlook.com>
Message-ID: <CABicVB-=wmTKJgGmx1a--nk7MRC7rUmE9TFDudg9be=mibMV5g@mail.gmail.com>

Hi Gilbert,

Thank you for your response.  After more experimenting, I have found that
(as suspected) the error was on my side.  I am not sure exactly what I did
wrong.  I think my mistake was either a sloppy use of the aggregation
feature in the WeightedCluster package or an inappropriate use of the
subset() function when I created the separate data set.  In any case, I now
get sequence index and distribution plots that look the same whether I use
the group option or make the plots separately after subsetting the data.
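
For anyone who runs into the same discrepancy, one rough way to check the alignment is sketched below in base R. It assumes toy data and hypothetical names (seqs, cl, state_dist) and does not use TraMineR or WeightedCluster; it only illustrates that a cluster-label vector with one entry per row gives the same state distribution whether the cases are grouped or subset first.

```r
# Toy data: 6 cases observed at 3 time points (states "M" and "S").
# All object names here are hypothetical and only serve the illustration.
seqs <- data.frame(
  t1 = c("M", "M", "S", "S", "M", "S"),
  t2 = c("M", "S", "S", "S", "M", "M"),
  t3 = c("S", "S", "S", "M", "M", "M"),
  stringsAsFactors = FALSE
)
cl <- c(1, 2, 1, 2, 1, 2)   # one cluster label per row of `seqs`

# A grouping vector must have exactly one entry per case; labels computed
# on aggregated (unique) cases would fail this check.
stopifnot(length(cl) == nrow(seqs))

# Proportion of cases in each state at each time point
state_dist <- function(d, alphabet = c("M", "S")) {
  sapply(d, function(col) prop.table(table(factor(col, levels = alphabet))))
}

by_group  <- lapply(split(seqs, cl), state_dist)   # grouped computation
by_subset <- state_dist(seqs[cl == 1, ])           # separate subset, cluster 1

print(identical(by_group[["1"]], by_subset))
```

If the length check fails, the labels no longer line up with the rows of the data, and a plot built from a subset can differ from the grouped plot.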

Jeremy



From jeremyr at uga.edu  Fri Jan 17 19:41:36 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 17 Jan 2014 13:41:36 -0500
Subject: [Traminer-users] group and main title for index plot by group
Message-ID: <CABicVB9Y1m9_KgmUPYv2ujzaGExjVZv7TLtfCSLhbq4ES+R3mQ@mail.gmail.com>

Hi All,

Is there a way to include titles for each group and an overall title in a
sequence index plot that uses the group option?  The title option adds the
title for each group, but I can't figure out how to add an overall title
(while retaining the group titles).

Thanks,

Jeremy


From N.Frans at student.rug.nl  Thu Jan 23 12:23:30 2014
From: N.Frans at student.rug.nl (Niek Frans)
Date: Thu, 23 Jan 2014 12:23:30 +0100
Subject: [Traminer-users] Variance of a sequence
Message-ID: <CA+LR1+Ht9DM6K9aqNz-Bt+h=0BW+Vhb445cASLWq_Eb9wh1mTQ@mail.gmail.com>

Hello everyone,

This is probably a very simple question, but I've spent a couple of hours
trying to find the answer without any luck, so I thought I'd try here.

I'm trying to explain the Turbulence measure by Elzinga, but I'm stuck
trying to compute the variance of the state durations for a sequence.
Could anyone explain how this variance is calculated? I know it has
something to do with the sequence length and the number of consecutive
positions spent in one state, but I can't find the exact formula anywhere.

Kind regards,


Niek

From alexis.gabadinho at unige.ch  Thu Jan 23 12:44:57 2014
From: alexis.gabadinho at unige.ch (Alexis Gabadinho)
Date: Thu, 23 Jan 2014 12:44:57 +0100
Subject: [Traminer-users] Variance of a sequence
In-Reply-To: <CA+LR1+Ht9DM6K9aqNz-Bt+h=0BW+Vhb445cASLWq_Eb9wh1mTQ@mail.gmail.com>
References: <CA+LR1+Ht9DM6K9aqNz-Bt+h=0BW+Vhb445cASLWq_Eb9wh1mTQ@mail.gmail.com>
Message-ID: <52E100B9.60404@unige.ch>

Hi Niek,

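The variance is computed from the spell durations returned by the seqdur function:

library(TraMineR)

## Example sequence with four spells: AA, BB, CC, AAAA
s1 <- seqdef("A-A-B-B-C-C-A-A-A-A")
x <- seqdur(s1)   # spell durations (NA where a sequence has fewer spells)
x

n <- sum(!is.na(x))   # number of spells
## Variance with divisor n (not n - 1, unlike R's var())
dur.var <- 1/n * sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE)
dur.var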
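
As a rough cross-check, the same numbers can be reproduced in base R without TraMineR. This is only a sketch: it assumes the example sequence above, gets the spell durations from rle() rather than seqdur(), and uses illustrative names (states, dur, var_pop, var_smp).

```r
# Split the example sequence into its successive states
states <- strsplit("A-A-B-B-C-C-A-A-A-A", "-", fixed = TRUE)[[1]]

# Run lengths = time spent in each consecutive spell: 2, 2, 2, 4
dur <- rle(states)$lengths

n       <- length(dur)                      # number of spells (4)
var_pop <- sum((dur - mean(dur))^2) / n     # divisor n, as in the formula above
var_smp <- var(dur)                         # R's var() uses divisor n - 1

print(dur)
print(var_pop)   # 0.75
print(var_smp)   # 1
```

The mean duration is 2.5, so the squared deviations are 0.25, 0.25, 0.25 and 2.25; they sum to 3, which gives 3/4 = 0.75 with divisor n and 3/3 = 1 with divisor n - 1.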

Best regards,
Alexis


From N.Frans at student.rug.nl  Thu Jan 23 13:42:49 2014
From: N.Frans at student.rug.nl (Niek Frans)
Date: Thu, 23 Jan 2014 13:42:49 +0100
Subject: [Traminer-users] Variance of a sequence
In-Reply-To: <52E100B9.60404@unige.ch>
References: <CA+LR1+Ht9DM6K9aqNz-Bt+h=0BW+Vhb445cASLWq_Eb9wh1mTQ@mail.gmail.com>
 <52E100B9.60404@unige.ch>
Message-ID: <CA+LR1+HSiVWGXPt=76wsNGiYk_m5+evyYHczB2G+qjW=9CD=QQ@mail.gmail.com>

Thanks Alexis,

I think I understand it now. What confused me were the numbers in Table 5
of the article by Elzinga (2010,
http://smr.sagepub.com/content/38/3/463.full.pdf). These still don't add up
using the formula you described, but perhaps Elzinga accidentally swapped
the first two numbers.
Anyway, thanks for the quick and clear response.

Best regards,


Niek



From jeremyr at uga.edu  Fri Jan 31 19:38:24 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 31 Jan 2014 13:38:24 -0500
Subject: [Traminer-users] complex survey weights in TraMineR
Message-ID: <CABicVB_N8NTuLoqv20aqcQZ-cko+cbPajaMDnjWE8aEUeUiaqg@mail.gmail.com>

Hi All,

I am using TraMineR to examine ten waves of data from the BHPS.

Since using weights is recommended in sequence analysis (see the post
linked below), I would like to do so.  Incorporating the longitudinal
weight variable is not a problem, but I'm not sure how to handle the
information from the PSU and strata.

Does anyone have any tips?  Should I just use the longitudinal weight?

Thanks,

Jeremy


http://stats.stackexchange.com/questions/62012/when-and-how-to-use-weights-for-sequence-analysis-in-social-science
