From Arie.Riekhoff at uta.fi Wed Mar 5 14:12:30 2014
From: Arie.Riekhoff at uta.fi (Arie Riekhoff)
Date: Wed, 05 Mar 2014 15:12:30 +0200
Subject: [Traminer-users] Using the process time axis when converting from SPELL to STS
Message-ID: <20140305151230.Horde.CEBoGJMCGZlTFyK_XLQEF0A@imp2.uta.fi>

Hello!

The TraMineR package has been doing a wonderful job with my data and it's a real pleasure to work with it, even as a beginner in R. I have just run into a problem with the process time axis option of the seqformat command and I haven't managed to figure out what I'm doing wrong.

My data come in SPELL format and I want to convert them to STS before creating a sequence object. Following the instructions in the TraMineR user's guide, this works very nicely if I select process = FALSE. However, I need to use the process time axis, because I'm working with a cohort of 3 consecutive birth years and I want to start counting from the year in which they reach a specific age and then follow them for the next 10 years (i.e. 120 months). My data start in 1999 and register statuses per month. I have recoded the dates, so that January 1999 is 1, February 1999 is 2, etc. I want to start the time axis for the respondents from the 3 consecutive birth years at t = 1 (year 1), t = 13 (year 2) and t = 25 (year 3). I have imported a separate file with the IDs of the respondents and the different start times.

So, my command was the following:

> wr.sts.process <- seqformat(wr, id = "ID", begin = "BEGINTIME",
      end = "ENDTIME", status = "SOCECST_rec", from = "SPELL", to = "STS",
      process = TRUE, pdata = starttime, pvar = c("ID", "start"),
      limit = 120)
 [>] SPELL data converted into 2088 STS sequences

But it results in missing values (NA) for almost every status.

My wr data frame with the spell data looks something like this:

> wr[1:5,]
  ID BEGINTIME ENDTIME SOCECST_rec
1  1         1      16           1
2  1        17      18           4
3  1        19      20           4
4  1        21      21           4
5  1        22      22           3

I had followed the user guide's advice to convert the status variable into an integer.

And my starttime data frame looks like this:

> starttime[1:5,]
  ID start
1  1    13
2  2    13
3  3    25
4  4     1
5  5    25

I also tried converting directly into a sequence object with the seqdef() function from the spell data, but ran into the same problem ([!] sequence with index: 1,2,3 etc contains only missing values).

As I wrote, when I use process = FALSE, both seqformat and seqdef work perfectly well, so it's not the wr data that's the problem. I guess I'm doing something wrong with the process time axis.

Someone might have asked a similar question here before, but I couldn't find any definite answers anywhere. I hope that someone can point me in the right direction or give me a hint as to the solution of my problem!

Thanks in advance,

Aart-Jan

From Gilbert.Ritschard at unige.ch Fri Mar 7 15:47:46 2014
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Fri, 7 Mar 2014 14:47:46 +0000
Subject: [Traminer-users] Using the process time axis when converting from SPELL to STS
In-Reply-To: <20140305151230.Horde.CEBoGJMCGZlTFyK_XLQEF0A@imp2.uta.fi>
References: <20140305151230.Horde.CEBoGJMCGZlTFyK_XLQEF0A@imp2.uta.fi>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F14ECB2E@golf.isis.unige.ch>

Hi Aart-Jan,

After a quick glance at your code, I notice that you provide 'birthyear' values (your 'start' values in pdata) that are greater than the BEGINTIME. This should generate negative ages. This is probably the source of your problem.

In your example I see that the start time of the first spell is 1 for id 1. (Is that a calendar date?) What are the begin times of the first spells for cases 2, 3, 4? What does this begin time of observation correspond to?

Gilbert

_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users

From throy08 at gmail.com Tue Mar 11 14:53:46 2014
From: throy08 at gmail.com (Throy A Campbell)
Date: Tue, 11 Mar 2014 08:53:46 -0500
Subject: [Traminer-users] Missing Data
Message-ID:

Hello,

I am following the step-by-step instructions in the TraMineR manual and have executed the steps for treating missing data (*, %, NA). However, when I try to run dist.mostfreq and seqplot, I get the following message:

dist.mostfreq <- seqdist(Datam.seq, method = "LCS", refseq = 0)
Error in seqdist(Datam.seq, method = "LCS", refseq = 0) :
  found missing values in sequences, please set 'with.missing=TRUE' to nevertheless compute distances

Please advise. I am very thankful.

--
Throy Campbell
... Positive Thinking

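For readers hitting the same error as in the "Missing Data" message above, here is a minimal sketch of what the error message asks for. The toy data frame and the object names (toy, toy.seq, toy.dist) are made up for illustration; only seqdef(), seqdist() and the with.missing argument quoted in the error come from TraMineR. Whether missing values should count as a state of their own is an analytic choice that this sketch does not settle.

```r
library(TraMineR)

# Hypothetical toy data: three sequences over four time points,
# with one internal gap (NA) in the second sequence.
toy <- data.frame(
  t1 = c("A", "A", "B"),
  t2 = c("A", NA,  "B"),
  t3 = c("B", "B", "B"),
  t4 = c("B", "B", "A"),
  stringsAsFactors = FALSE
)

# Build the state sequence object; the internal NA is kept as a
# missing value in the sequence.
toy.seq <- seqdef(toy)

# Without with.missing = TRUE this call stops with the error quoted in
# the message above. With it, the missing value is treated as an
# additional state when computing the LCS distances.
toy.dist <- seqdist(toy.seq, method = "LCS", with.missing = TRUE)

# Applied to the call from the message (not run here, Datam.seq is the
# poster's object):
# dist.mostfreq <- seqdist(Datam.seq, method = "LCS", refseq = 0,
#                          with.missing = TRUE)
```

The result is an ordinary 3 x 3 distance matrix, so downstream steps such as clustering can use it like any other seqdist output.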
From Arie.Riekhoff at uta.fi Thu Mar 13 14:24:32 2014
From: Arie.Riekhoff at uta.fi (Arie Riekhoff)
Date: Thu, 13 Mar 2014 15:24:32 +0200
Subject: [Traminer-users] Traminer-users Digest, Vol 39, Issue 2
In-Reply-To:
References:
Message-ID: <20140313152432.Horde.HiMKBpVGfFpTIbGQWfSxHIA@imp1.uta.fi>

Dear Gilbert,

Thank you so much. I didn't realise that TraMineR would not automatically discard those values preceding the "birth date". I have recoded my data's begin as well as end dates, which seems to have solved my problems!

Best regards,

Aart-Jan

From pit.blavier at gmail.com Sun Mar 23 18:01:51 2014
From: pit.blavier at gmail.com (Pierre Blavier)
Date: Sun, 23 Mar 2014 18:01:51 +0100
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>
Message-ID:

Hi everybody,

Here is a plot from the TraMineR seqiplot command. It shows the first 10 sequences of my dataset; all my sequences are 21 periods long. Does anyone know how to reduce the length of the x-axis to the length of the longest sequence, i.e. 21 periods? It's probably simple, but I could not fix it. I have the same problem with other plots.

Thanks, Best,

Pierre

2014-01-15 16:35 GMT+01:00 Gilbert Ritschard :

> Hi Jeremy,
>
> Thank you for your question. Could you provide a reproducible working
> example? Indeed you are right, you should get the same plot in both cases.
> You do not provide enough information, however, to allow identifying the
> source of the problem.
>
> Your question certainly is of interest for many TraMineR users and future
> users. I would therefore suggest you post your question on StackOverflow
> (see http://mephisto.unige.ch/traminer/contrib.shtml) using the
> "traminer" tag which is searchable, unlike this r-forge list.
>
> Best.
> Gilbert
>
> *From:* traminer-users-bounces at lists.r-forge.r-project.org [mailto:
> traminer-users-bounces at lists.r-forge.r-project.org] *On Behalf Of *Jeremy
> Reynolds
> *Sent:* Monday, January 13, 2014 22:09
> *To:* traminer-users at lists.r-forge.r-project.org
> *Subject:* [Traminer-users] plots with group option vs separate plots
>
> Hello,
>
> I have been making sequence plots, and I seem to be getting very different
> results when I use the "group" option of the seqdplot or seqIplot command
> than when I draw separate plots for each subgroup.
>
> After creating a sequence object and performing optimal matching using
> PAM, I have chosen a 4 cluster solution. I then create a single plot that
> shows the distribution across states in each of the 4 clusters like this:
>
> seqdplot(seq.hc, group = pam5vs$clustering$cluster4, border = NA,
> title="pam5vs")
>
> If I subset the data and make a separate plot for one of the 4 clusters as
> in the code below, the N matches the results above (the total N and the N
> across the states), but I get a very different impression of how the cases
> are distributed across the states in the two graphs. Am I doing something
> wrong? I would be happy to provide more detail if needed.
>
> Thanks,
>
> Jeremy
>
> cluster4 <- subset(bhps, pam5vs$clustering$cluster4==(6875))
> seq.cluster4 <- seqdef(cluster4 [4:21], labels = c("M", "S", "F", "O", "U"))
>
> seqdplot(seq.cluster4, border = NA, title="pam5vs cluster 6875")
>
> --
> ********************
> Dr. Jeremy Reynolds
> Associate Professor
> Undergraduate Coordinator
> Department of Sociology
> 116 Baldwin Hall
> University of Georgia
> Athens, GA 30602-1611
> Phone: (706) 583-8072
> Web: http://uga.edu/soc/people/faculty/reynolds_jeremy.php
> Fax: (706) 542-4320

-------------- next part --------------
A non-text attachment was scrubbed...
Name: example of the problem.pdf
Type: application/pdf
Size: 11254 bytes
Desc: not available

From iwanttobeabadger at googlemail.com Mon Mar 24 15:46:57 2014
From: iwanttobeabadger at googlemail.com (Nathan Harmston)
Date: Mon, 24 Mar 2014 14:46:57 +0000
Subject: [Traminer-users] Fwd: How does dissrep work?
In-Reply-To:
References:
Message-ID:

Hi,

So I'm trying to use the TraMineR package to select a representative set of instances from a clustering I have performed, using the dissrep function.

However, the help and documentation on how dissrep chooses the "representative instances" are a little lacking. How does TraMineR decide what is the most representative instance? What is the difference between setting the criterion to "dist" and "density"?

I know these are simple questions, but I need to understand how this works before I can use this software in my work.

Many thanks in advance,

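The question above, on how the "dist" and "density" criteria differ, can be illustrated on a toy dissimilarity matrix. What follows is only a base-R sketch of the two scores as they are usually described (centrality versus neighbourhood density). It is not dissrep's actual code, it ignores weights and the later step that removes redundant candidates, and the points, the radius and all variable names are assumptions made up for the example.

```r
# Toy one-dimensional "objects" (arbitrary values): three close together,
# one further away, one isolated.
x <- c(0, 0.5, 1, 4, 10)
diss <- as.matrix(dist(x))           # pairwise dissimilarities

# "dist"-style score: centrality, i.e. the sum of dissimilarities to all
# other objects. Smaller means more central.
centrality <- rowSums(diss)

# "density"-style score: number of objects lying within a neighbourhood
# radius of each object. Larger means a denser neighbourhood.
radius <- 0.6                        # hypothetical neighbourhood radius
density <- rowSums(diss <= radius)   # each object also counts itself

best.by.dist    <- unname(which.min(centrality))  # object 3 (x = 1)
best.by.density <- unname(which.max(density))     # object 2 (x = 0.5)
```

On this toy data the two scores rank different objects first: the third object minimises the total distance to everyone else, while the second has the most neighbours within the radius. That is the kind of disagreement the choice of criterion controls.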
From alexis.gabadinho at unige.ch Mon Mar 24 15:54:54 2014
From: alexis.gabadinho at unige.ch (Alexis Gabadinho)
Date: Mon, 24 Mar 2014 15:54:54 +0100
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To:
References: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>
Message-ID: <5330473E.9050300@unige.ch>

Hi Pierre,

I suppose that there is a problem with the definition of your sequence object. What do you get when typing ncol(names_of_your_sequence_object)?

Best,

Alexis

From Gilbert.Ritschard at unige.ch Mon Mar 24 16:14:30 2014
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Mon, 24 Mar 2014 15:14:30 +0000
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To:
References: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F14FB215@golf.isis.unige.ch>

Hi Pierre,

It would be easier to help if you provided the code you are using to generate your state sequence object.

My guess is that you created your state sequence object by passing a table with 41 columns (the last 20 perhaps containing only NAs). Assuming your sequence data are in columns 1 to 21 of a data frame named data, a solution would be to use something like seqdef(data[,1:21]).

If your state sequence object is data.seq, you can use seqiplot(data.seq[,1:21]) to plot only the first 21 columns. Fixing the seqdef problem is preferable, however.

Best.

Gilbert

From alexis.gabadinho at unige.ch Mon Mar 24 16:23:44 2014
From: alexis.gabadinho at unige.ch (Alexis Gabadinho)
Date: Mon, 24 Mar 2014 16:23:44 +0100
Subject: [Traminer-users] Fwd: How does dissrep work?
In-Reply-To:
References:
Message-ID: <53304E00.6030800@unige.ch>

Hi Nathan,

A paper preprint explaining the procedure is available here:
http://mephisto.unige.ch/pub/publications/gr/Gabadinho-etal-Repr_seq_CCIS2011.pdf

Best,

Alexis

From pit.blavier at gmail.com Tue Mar 25 23:52:33 2014
From: pit.blavier at gmail.com (Pierre Blavier)
Date: Tue, 25 Mar 2014 23:52:33 +0100
Subject: [Traminer-users] plots with group option vs separate plots
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F14FB215@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F149F8AA@golf.isis.unige.ch> <66ABD43696E3DB4687E0BB396A76E5F14FB215@golf.isis.unige.ch>
Message-ID:

Hi,

Thanks to both of you, you are right:

> ncol(trajpro.seq)
[1] 41

instead of 21. In fact, my sequences are misaligned like stairs, year by year, so I do:

trajpro.seq <- seqdef(immimigpro, 1636:1676, states=labpro.court, labels=labpro.long, xstep=6, left="DEL", right="DEL")
summary(trajpro.seq)

> freq(seqlength(trajpro.seq), total=T)
         n     %
1        1   0.0
3        1   0.0
4        1   0.0
5        1   0.0
6        2   0.1
8        1   0.0
9        1   0.0
10       1   0.0
11       1   0.0
12       2   0.1
13       1   0.0
16       1   0.0
17       1   0.0
18       1   0.0
19       2   0.1
20       7   0.3
21    2264  98.9
41       1   0.0
NA       0   0.0
Total 2290 100.0

I guess these 2290-2264 individuals are "real missing values", and I do not understand why one remains at 41.

immimigpro <- subset(immimigpro, seqlength(trajpro.seq)==21)
freq(seqlength(trajpro.seq), total=T)

> trajpro.seq <- seqdef(immimigpro, 1636:1676, states=labpro.court, labels=labpro.long, xstep=6, left="DEL", right="DEL")
 [>] found missing values ('NA') in sequence data
 [>] preparing 2264 sequences
 [>] coding void elements with '%' and missing values with '*'
 [>] state coding:
       [alphabet]  [label]  [long label]
     1  1           Sa       Salarié
     2  2           In       Indépendant
     3  3           Ch       Chômage
     4  4           Et       Etudes
     5  5           Fo       Foyer
     6  6           Aut      Autre
     7  7           Var      Variable
 [>] 2264 sequences in the data set
 [>] min/max sequence length: 21/21

summary(trajpro.seq)

and then I obtain the graph with 41 periods even though all sequences are 21 periods long.

The solution with seqiplot(data.seq[,1:21]) works, but it would be better to fix the problem above. Any idea?

Thanks a lot, best,

Pierre
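One possible reading of the situation above, sketched in base R without TraMineR: if each respondent's observed periods sit at a different offset inside a wider window of columns, left-aligning the rows (roughly what left = "DEL" is documented as doing) shortens every sequence but not the table, so the trailing columns are empty yet still part of the object. The data, the sizes (a 5-column window holding 3 observed periods) and the helper left_align() are hypothetical; this is neither the poster's data nor TraMineR's implementation.

```r
# Hypothetical staggered data: 3 observed periods per row inside a
# 5-column window, starting one column later for each row ("stairs").
wide <- rbind(
  c("A", "A", "B", NA,  NA),
  c(NA,  "A", "B", "B", NA),
  c(NA,  NA,  "B", "B", "A")
)

# Hypothetical helper: drop the NAs of a row and pad on the right so the
# row keeps its original width. (It would also close internal gaps; the
# toy data has none.)
left_align <- function(row) {
  observed <- row[!is.na(row)]
  c(observed, rep(NA, length(row) - length(observed)))
}

aligned <- t(apply(wide, 1, left_align))

# Every row now holds 3 observed values, but the table is still 5 wide.
lengths.observed <- rowSums(!is.na(aligned))
width.before <- ncol(aligned)

# Keep only the columns in which at least one row has a value.
keep <- colSums(!is.na(aligned)) > 0
trimmed <- aligned[, keep, drop = FALSE]
width.after <- ncol(trimmed)
```

With a TraMineR sequence object, the analogous step is the column subset suggested earlier in the thread, e.g. data.seq[,1:21], or passing only the needed columns to seqdef in the first place.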