From shawn at ori.org Fri Jan 3 02:37:53 2014
From: shawn at ori.org (Shawn Boles)
Date: Fri, 3 Jan 2014 01:37:53 +0000
Subject: [Traminer-users] minimum standards for datasets used with TraMineR
Message-ID: <05BF3E42C886444A86D060B044B7A1CD1570FB36@Exmail4.ori-eug.ori.org>

Hi All:

Any suggestions on how to determine the minimum dataset size and missingness characteristics for which TraMineR can sensibly be applied would be helpful. I looked in the documentation but did not find anything I could use as a heuristic.

I am using TraMineR to analyze 5 years of BMI observations, coded as a four-level ordinal categorical variable, for 5046 elementary school children (grades K-5). Only 414 of these have 5 observations (~56% of the K and grade 1 students measured at time 1 who could have been measured in years 1 to 5).

I have two related, if not well stated, questions:

1. Is it legitimate to focus only on complete cases, since I have only 5 data points and high cumulative natural attrition? Testing the complete cases against all cases reveals no substantive difference in the values of the predictors. The analyses from complete cases are informative, while including all cases, regardless of imputation choice, just makes things noisy. I tried admitting only sequences of length 4 or 5, but the results were still noisy.

2. Is five too short a sequence length to use with TraMineR, given the imputation patterns required by the full dataset, regardless of whether the missingness is planned or MAR?

3. Below is the number of cases with 1 to 5 observations.

   Observations      1     2     3     4     5
   Cases          1846  1287   869   630   414

TraMineR is a great tool. I want to make sure I am using it appropriately.

Thanks.

Shawn Boles, Ph.D.
Senior Research Associate
Oregon Research Institute
1776 Millrace Drive
Eugene, Oregon 97403-2536 USA
Phone (541) 484-2123 ext 2225
Fax: (541) 484-1108
From jeremyr at uga.edu Mon Jan 13 22:09:08 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Mon, 13 Jan 2014 16:09:08 -0500
Subject: [Traminer-users] plots with group option vs separate plots
Message-ID:

Hello,

I have been making sequence plots, and I seem to get very different results when I use the "group" option of the seqdplot or seqIplot command than when I draw separate plots for each subgroup.

After creating a sequence object, computing optimal matching distances, and clustering with PAM, I chose a 4-cluster solution. I then create a single plot that shows the distribution across states in each of the 4 clusters like this:

    seqdplot(seq.hc, group = pam5vs$clustering$cluster4, border = NA, title="pam5vs")

If I subset the data and make a separate plot for one of the 4 clusters, as in the code below, the N matches the results above (the total N and the N across the states), but the two graphs give a very different impression of how the cases are distributed across the states. Am I doing something wrong? I would be happy to provide more detail if needed.

Thanks,
Jeremy

    cluster4 <- subset(bhps, pam5vs$clustering$cluster4==(6875))
    seq.cluster4 <- seqdef(cluster4 [4:21], labels = c("M", "S", "F", "O", "U" ))
    seqdplot(seq.cluster4, border = NA, title="pam5vs cluster 6875")

--
********************
Dr. Jeremy Reynolds
Associate Professor
Undergraduate Coordinator
Department of Sociology
116 Baldwin Hall
University of Georgia
Athens, GA 30602-1611
Phone: (706) 583-8072
Web: http://uga.edu/soc/people/faculty/reynolds_jeremy.php
Fax: (706) 542-4320
From Gilbert.Ritschard at unige.ch Wed Jan 15 16:35:34 2014
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 15 Jan 2014 15:35:34 +0000
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To:
References:
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>

Hi Jeremy,

Thank you for your question. Could you provide a reproducible working example? You are right that you should get the same plot in both cases. However, you do not provide enough information to identify the source of the problem.

Your question is certainly of interest to many current and future TraMineR users. I would therefore suggest that you post it on StackOverflow (see http://mephisto.unige.ch/traminer/contrib.shtml) using the "traminer" tag, which is searchable, unlike this r-forge list.

Best.
Gilbert

From jeremyr at uga.edu Fri Jan 17 18:01:21 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 17 Jan 2014 12:01:21 -0500
Subject: [Traminer-users] Traminer-users Digest, Vol 37, Issue 3
In-Reply-To: <5222aa1bf1e5494d9db4ddaaa1118396@BL2PR02MB433.namprd02.prod.outlook.com>
References: <5222aa1bf1e5494d9db4ddaaa1118396@BL2PR02MB433.namprd02.prod.outlook.com>
Message-ID:

Hi Gilbert,

Thank you for your response. After more experimenting, I have found that (as suspected) the error must be on my side. I am not sure exactly what I did wrong. I think my mistake was either a sloppy use of the aggregation feature in the WeightedCluster package or an inappropriate use of the subset() function when I created the separate data set. In any case, I now get sequence index and distribution plots that look the same whether I use the group option or make the plots separately after subsetting the data.

Jeremy

From jeremyr at uga.edu Fri Jan 17 19:41:36 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 17 Jan 2014 13:41:36 -0500
Subject: [Traminer-users] group and main title for index plot by group
Message-ID:

Hi All,

Is there a way to include both a title for each group and an overall title in a sequence index plot that uses the group option? The title option adds a title for each group, but I can't figure out how to add an overall title while retaining the group titles.

Thanks,
Jeremy
From N.Frans at student.rug.nl Thu Jan 23 12:23:30 2014
From: N.Frans at student.rug.nl (Niek Frans)
Date: Thu, 23 Jan 2014 12:23:30 +0100
Subject: [Traminer-users] Variance of a sequence
Message-ID:

Hello everyone,

This is probably a very simple question, but I've spent a couple of hours trying to find the answer without any luck, so I thought I'd try it here.

I'm trying to explain the Turbulence measure by Elzinga, but I'm stuck trying to compute the variance of the state durations for the sequence. Could anyone explain how the variance of a sequence is calculated? I know it has something to do with the sequence length and the number of consecutive positions spent in one state, but I can't find the exact formula anywhere.

Kind regards,
Niek

From alexis.gabadinho at unige.ch Thu Jan 23 12:44:57 2014
From: alexis.gabadinho at unige.ch (Alexis Gabadinho)
Date: Thu, 23 Jan 2014 12:44:57 +0100
Subject: [Traminer-users] Variance of a sequence
In-Reply-To:
References:
Message-ID: <52E100B9.60404@unige.ch>

Hi Niek,

The variance is computed using the outcome of the seqdur function:

    library(TraMineR)
    s1 <- seqdef("A-A-B-B-C-C-A-A-A-A")
    x <- seqdur(s1)        # durations of the successive spells
    x

    n <- sum(!is.na(x))    # number of spells in the sequence
    # variance of the spell durations, with divisor n (not n - 1)
    dur.var <- 1/n * sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE)
    dur.var

Best regards,
Alexis
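As a hand check of the computation in Alexis's code, the arithmetic for the example sequence A-A-B-B-C-C-A-A-A-A can be written out. This is only a sketch: it assumes the spell durations returned by seqdur are 2, 2, 2, 4 (the runs A-A, B-B, C-C, A-A-A-A), and the symbols d_i (duration of spell i), n (number of spells) and s_d^2 are notation introduced here for illustration, not taken from the thread.

```latex
\begin{align*}
n &= 4, \qquad (d_1, d_2, d_3, d_4) = (2, 2, 2, 4) \\
\bar{d} &= \frac{1}{n} \sum_{i=1}^{n} d_i = \frac{2 + 2 + 2 + 4}{4} = 2.5 \\
s_d^2 &= \frac{1}{n} \sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2
       = \frac{3\,(2 - 2.5)^2 + (4 - 2.5)^2}{4}
       = \frac{0.75 + 2.25}{4} = 0.75
\end{align*}
```

Because the divisor is n rather than n - 1, this value differs from what R's built-in var() would give for the same four durations (3/3 = 1).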
From N.Frans at student.rug.nl Thu Jan 23 13:42:49 2014
From: N.Frans at student.rug.nl (Niek Frans)
Date: Thu, 23 Jan 2014 13:42:49 +0100
Subject: [Traminer-users] Variance of a sequence
In-Reply-To: <52E100B9.60404@unige.ch>
References: <52E100B9.60404@unige.ch>
Message-ID:

Thanks Alexis,

I think I understand it now. What confused me were the numbers in Table 5 of the article by Elzinga (2010), http://smr.sagepub.com/content/38/3/463.full.pdf. These still don't add up using the formula you described, but perhaps Elzinga switched the first two numbers around. Anyway, thanks for the quick and clear response.

Best regards,
Niek
From jeremyr at uga.edu Fri Jan 31 19:38:24 2014
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 31 Jan 2014 13:38:24 -0500
Subject: [Traminer-users] complex survey weights in TraMineR
Message-ID:

Hi All,

I am using TraMineR to examine ten waves of data from the BHPS. Since using weights is recommended in sequence analysis (see the post linked below), I would like to do so. Incorporating the longitudinal weight variable is not a problem, but I'm not sure how to handle the information from the PSU and strata. Does anyone have any tips? Should I just use the longitudinal weight?

Thanks,
Jeremy

http://stats.stackexchange.com/questions/62012/when-and-how-to-use-weights-for-sequence-analysis-in-social-science