From rvosylis at live.com Wed Jun 10 12:30:38 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Wed, 10 Jun 2015 13:30:38 +0300
Subject: [Traminer-users] selecting the number of clusters

Dear Traminer users,

I am trying to build a typology of sequences using cluster analysis with OM distances and Ward's algorithm, and I am having trouble choosing the number of clusters. I use several empirical indexes, but they do not help much. The Calinski-Harabasz (CH) index peaks at the two-cluster solution and then goes down. The average silhouette width gives results similar to the CH index. I also ran a pseudo ANOVA to see which cluster solution explains the most variance, but it tells me the opposite: the more clusters, the higher the pseudo R2.

When I look at the various plots (e.g. seqdplot), I see that the most meaningful solutions (I have several types of sequences) lie somewhere between 4 and 6 clusters.

Could you suggest which indexes have worked best for you and matched your expectations / theoretical knowledge, and that I could use in my analysis?

Thank you in advance!

Sincerely,
Rimantas Vosylis
PhD student, lecturer
Institute of Psychology
Faculty of Social Technologies
Mykolas Romeris University
e-mail: rimantasv at mruni.eu
e-mail2: rvosylis at live.com

From Gerhard.Wuehrer at jku.at Wed Jun 10 12:55:06 2015
From: Gerhard.Wuehrer at jku.at (Gerhard Wührer)
Date: Wed, 10 Jun 2015 12:55:06 +0200
Subject: [Traminer-users] Antw: selecting the number of clusters
Message-ID: <557833AA02000083000B8488@gwia.im.jku.at>

Hello,

please try the R package 'NbClust' to decide how many clusters are feasible. In addition to these statistical measures, inspect the different cluster solutions by content and by how meaningful their interpretations are. You can also run some kind of chi-square test in which you cross the clusters with variables not used in the cluster analysis. That at least gives you some kind of face validity.

Best regards - Gerhard A. Wührer

o. Univ.-Prof. Dkfm. Dr. Gerhard A. Wührer
Institut für Handel, Absatz und Marketing
Johannes Kepler Universität Linz
Altenberger Str. 69
4040 Linz/Austria
tel.: 004373224689401
fax.: 004373224689404
mail: gerhard.wuehrer at jku.at
URL: www.marketing.jku.at
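The chi-square check suggested above, crossing cluster membership with a variable that played no role in the clustering, could be sketched in base R roughly as follows. Everything here is simulated: the vectors `cluster` and `education` are hypothetical stand-ins for a real cluster solution and a real covariate, and the association between them is built in on purpose.

```r
# Simulated illustration only: in practice `cluster` would come from cutree()
# or a PAM solution, and `education` from a covariate not used for clustering.
set.seed(1)
n <- 300
cluster <- factor(sample(1:4, n, replace = TRUE), labels = paste0("type", 1:4))

# Make the covariate depend on the cluster so that an association exists.
p_high <- c(type1 = 0.2, type2 = 0.4, type3 = 0.6, type4 = 0.8)[as.character(cluster)]
education <- factor(ifelse(runif(n) < p_high, "high", "low"))

# Cross-tabulate clusters against the external variable and test independence.
tab <- table(cluster, education)
test <- chisq.test(tab)

# Cramer's V as a rough effect size for the association.
v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))

print(tab)
print(test)
print(v)
```

A significant test with a non-trivial Cramer's V only indicates that the typology relates to something outside the clustering; it does not by itself say which number of clusters is best.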
From rvosylis at live.com Wed Jun 10 14:36:02 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Wed, 10 Jun 2015 15:36:02 +0300
Subject: [Traminer-users] Antw: selecting the number of clusters
In-Reply-To: <557833AA02000083000B8488@gwia.im.jku.at>
References: <557833AA02000083000B8488@gwia.im.jku.at>

Dear Gerhard,

Thank you for this suggestion!

Sincerely
Rimantas

From rvosylis at live.com Wed Jun 10 16:44:51 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Wed, 10 Jun 2015 17:44:51 +0300
Subject: [Traminer-users] Antw: selecting the number of clusters
References: <557833AA02000083000B8488@gwia.im.jku.at>

Dear Professor Gerhard,

I tried the NbClust package, but it does not seem to work for sequence analysis. The problem is that it has one mandatory argument, data, which is used to indicate the dataset. In sequence analysis, however, the data are sequences of numbers/symbols rather than vectors of numeric variable values. Even though it is possible to specify a distance matrix, the function still requires the actual dataset, and my impression is that there is no way around this. If you have successfully used this package for sequence analysis, could you copy and paste the call you used to calculate the fit indices?

Thank you in advance!
Rimantas

From Gerhard.Wuehrer at jku.at Wed Jun 10 17:04:13 2015
From: Gerhard.Wuehrer at jku.at (Gerhard Wührer)
Date: Wed, 10 Jun 2015 17:04:13 +0200
Subject: [Traminer-users] Antw: Re: Antw: selecting the number of clusters
References: <557833AA02000083000B8488@gwia.im.jku.at>
Message-ID: <55786E0D02000083000B84DC@gwia.im.jku.at>

Dear Rimantas,

I have used NbClust with other distance matrices, originating from other cluster/segmentation analyses. If you have the distance matrix, I think you can pass it to NbClust? It may also be that there really are no clusters and that the increase in the error sum simply follows a monotone pattern. Please also have a look at the additional literature that comes with the TraMineR package.

Best regards - Gerhard

o. Univ.-Prof. Dkfm. Dr. Gerhard A. Wührer
Institut für Handel, Absatz und Marketing
Johannes Kepler Universität Linz
Altenberger Str. 69
4040 Linz/Austria
tel.: 004373224689401
fax.: 004373224689404
mail: gerhard.wuehrer at jku.at
URL: www.marketing.jku.at

From rvosylis at live.com Wed Jun 10 17:39:58 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Wed, 10 Jun 2015 18:39:58 +0300
Subject: [Traminer-users] Antw: Re: Antw: selecting the number of clusters
In-Reply-To: <55786E0D02000083000B84DC@gwia.im.jku.at>
References: <557833AA02000083000B8488@gwia.im.jku.at> <55786E0D02000083000B84DC@gwia.im.jku.at>

Dear Gerhard,

Indeed, it is possible to input the distance matrix, but it is also mandatory to specify the data :( I tried to pass the sequence object as data, but that does not work :( I will look through the literature you suggested!
Rimantas

From anton.perdoncin at gmail.com Wed Jun 10 18:51:25 2015
From: anton.perdoncin at gmail.com (Anton Perdoncin)
Date: Wed, 10 Jun 2015 18:51:25 +0200
Subject: [Traminer-users] Format of sequences
Message-ID: <55786B0D.1000504@gmail.com>

Hi,

I have sequences in the following format:

ID  BEGIN1      END1        STATE1  BEGIN2      END2        STATE2  etc... until 18
1   01/01/1950  01/01/1960  X       02/01/1960  30/01/1960  Y       ...
2   01/01/1950  01/01/1960  X       02/01/1960  30/01/1960  Y       ...

One line = one individual. Successive episodes = successive columns.

I know that I need to convert dates into numbers: no problem with that.

However, does anyone have any idea how I could convert such a data frame into STS or SPELL format?

Thanks!

Best regards,

Anton Perdoncin

From thomas.collas at gmail.com Wed Jun 10 21:01:27 2015
From: thomas.collas at gmail.com (thomas collas)
Date: Wed, 10 Jun 2015 21:01:27 +0200
Subject: [Traminer-users] Format of sequences
In-Reply-To: <55786B0D.1000504@gmail.com>
References: <55786B0D.1000504@gmail.com>

Hello Anton,

An easy solution is to build a very short loop (I know R is not made for loops, but it's only 18 iterations) that separates each group of three columns, renames the headings to common ones (begin/end/state) and stacks each group below the previous one.

I hope that helps,
thomas collas

From thomas.collas at gmail.com Wed Jun 10 21:03:05 2015
From: thomas.collas at gmail.com (thomas collas)
Date: Wed, 10 Jun 2015 21:03:05 +0200
Subject: [Traminer-users] Format of sequences
References: <55786B0D.1000504@gmail.com>

Addendum: Do not forget to keep the ID column with the three other columns at each iteration.

From hc at parisgeo.cnrs.fr Thu Jun 11 08:45:06 2015
From: hc at parisgeo.cnrs.fr (Hadrien Commenges)
Date: Thu, 11 Jun 2015 08:45:06 +0200 (CEST)
Subject: [Traminer-users] Format of sequences
References: <55786B0D.1000504@gmail.com>
Message-ID: <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>

You could also split your table by sets of columns (columns 1:4, then columns c(1, 5:7), etc.) and then rbind() all the tables.

Another option would be the melt() function in the reshape2 package.
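The split-and-stack approach described in this thread (take each BEGIN/END/STATE triple together with ID, give the columns common names, then rbind() the pieces) might look like the base-R sketch below. The toy data frame `wide` mimics the layout from the question but has only two episodes instead of 18, and the lower-case column names `id`, `begin`, `end`, `state` are chosen here for illustration.

```r
# Toy data in the layout from the question, reduced to two episodes.
wide <- data.frame(
  ID     = 1:2,
  BEGIN1 = c("01/01/1950", "01/01/1950"),
  END1   = c("01/01/1960", "01/01/1960"),
  STATE1 = c("X", "X"),
  BEGIN2 = c("02/01/1960", "02/01/1960"),
  END2   = c("30/01/1960", "30/01/1960"),
  STATE2 = c("Y", "Y"),
  stringsAsFactors = FALSE
)
n_episodes <- 2  # 18 in the original question

# One piece per episode: ID plus the three episode columns, renamed.
pieces <- lapply(seq_len(n_episodes), function(k) {
  cols <- paste0(c("BEGIN", "END", "STATE"), k)
  piece <- wide[, c("ID", cols)]
  names(piece) <- c("id", "begin", "end", "state")
  piece
})

# Stack the pieces, drop empty episode slots, and convert the dates.
spell <- do.call(rbind, pieces)
spell <- spell[!is.na(spell$state) & spell$state != "", ]
spell$begin <- as.Date(spell$begin, format = "%d/%m/%Y")
spell$end   <- as.Date(spell$end, format = "%d/%m/%Y")
spell <- spell[order(spell$id, spell$begin), ]
rownames(spell) <- NULL

print(spell)
```

The result has one row per spell, which is the general shape of a SPELL layout. TraMineR's seqformat() can convert from SPELL to STS; the exact argument names for the id, begin, end and status columns should be checked in its help page rather than taken from this sketch.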
From thomas.collas at gmail.com Thu Jun 11 08:53:14 2015
From: thomas.collas at gmail.com (thomas collas)
Date: Thu, 11 Jun 2015 08:53:14 +0200
Subject: [Traminer-users] Format of sequences
In-Reply-To: <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>
References: <55786B0D.1000504@gmail.com> <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>

Hello,

That's actually the same proposal.

tc

From Gilbert.Ritschard at unige.ch Thu Jun 11 08:57:12 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Thu, 11 Jun 2015 06:57:12 +0000
Subject: [Traminer-users] Format of sequences
In-Reply-To: <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>
References: <55786B0D.1000504@gmail.com> <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F10A1093E0@golf.isis.unige.ch>

You could also consider the HSPELL_to_STS function provided by the TraMineRextras package.
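To make concrete what a spell-to-STS conversion does, here is a small base-R sketch. It is not the HSPELL_to_STS implementation and does not reproduce that function's arguments; it assumes the spells are already in long format with integer time positions, and the data frame `spell` with its values is made up for illustration.

```r
# Hypothetical long-format spells with integer time positions
# (e.g. years since the start of observation); all values are made up.
spell <- data.frame(
  id    = c(1, 1, 2, 2),
  begin = c(1, 4, 1, 3),
  end   = c(3, 6, 2, 6),
  state = c("X", "Y", "X", "Y"),
  stringsAsFactors = FALSE
)

# STS layout: one row per individual, one column per time position.
ids <- unique(spell$id)
t_max <- max(spell$end)
sts <- matrix(NA_character_, nrow = length(ids), ncol = t_max,
              dimnames = list(ids, paste0("t", seq_len(t_max))))

# Fill every position covered by a spell with that spell's state.
for (r in seq_len(nrow(spell))) {
  row <- as.character(spell$id[r])
  sts[row, spell$begin[r]:spell$end[r]] <- spell$state[r]
}

print(sts)
```

Positions not covered by any spell stay NA. A matrix of this kind is the sort of state-sequence table that seqdef() in TraMineR takes as input.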
URL:
From Gilbert.Ritschard at unige.ch Thu Jun 11 09:06:19 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Thu, 11 Jun 2015 07:06:19 +0000
Subject: [Traminer-users] Antw: Re: Antw: selecting the number of clusters
In-Reply-To:
References: <557833AA02000083000B8488@gwia.im.jku.at> <55786E0D02000083000B84DC@gwia.im.jku.at>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F10A109423@golf.isis.unige.ch>

Did you look at the possibilities offered by the WeightedCluster package of Matthias Studer? The package comes with a vignette that nicely documents the proposed tools.

Gilbert

From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Rimantas Vosylis
Sent: Wednesday, June 10, 2015 17:40
To: 'Users questions'
Subject: Re: [Traminer-users] Antw: Re: Antw: selecting the number of clusters

Dear Gerhard,

Indeed it is possible to input the distance matrix, but it is also mandatory to specify the data :( I tried to input the sequence object as data but it does not work :( I will look through the literature You suggested!

Rimantas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rvosylis at live.com Thu Jun 11 11:03:07 2015
From: rvosylis at live.com (Rimantas Vosylis)
Date: Thu, 11 Jun 2015 09:03:07 +0000
Subject: [Traminer-users] Antw: Re: Antw: selecting the number of clusters
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F10A109423@golf.isis.unige.ch>
References: , <557833AA02000083000B8488@gwia.im.jku.at>, , , <55786E0D02000083000B84DC@gwia.im.jku.at>, , <66ABD43696E3DB4687E0BB396A76E5F10A109423@golf.isis.unige.ch>
Message-ID:

Dear Gilbert,

indeed I found the WeightedCluster package last night and it helped me big time in my analysis. In fact I reran everything with PAM clustering and got better results. So thanks for this reference - if I had not found it myself, it would have been a big help!

Rimantas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From anton.perdoncin at gmail.com Thu Jun 11 12:22:17 2015
From: anton.perdoncin at gmail.com (Anton Perdoncin)
Date: Thu, 11 Jun 2015 12:22:17 +0200
Subject: [Traminer-users] Format of sequences
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F10A1093E0@golf.isis.unige.ch>
References: <55786B0D.1000504@gmail.com> <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr> <66ABD43696E3DB4687E0BB396A76E5F10A1093E0@golf.isis.unige.ch>
Message-ID: <55796159.3030502@gmail.com>

Hello,

Thanks a lot Thomas, Hadrien and Gilbert.

The HSPELL_to_STS function is very useful indeed, but it assumes that the begin and end variables are numeric, which is not my case since I have dates. And converting dates into numbers counted from the earliest start date of the first episode seems much quicker to me with the SPELL format.

So here is one way of converting my HSPELL data into SPELL. Very close to Thomas' suggestion (might not be the most elegant script... but it does the job):

d <- read.csv2("~ ... example.csv")
f <- data.frame(ident=NULL, deb=NULL, fin=NULL, etat=NULL, motif=NULL)
i <- 1
# stacking the 20 episode blocks under common column names
for (i in 1:20) {
  e <- d[, names(d) %in% c("ident", paste0(c("deb","fin","etat","motif"), as.character(i)))]
  names(e) <- c("ident","deb","fin","etat","motif")
  f <- rbind(f, e)
}
# ordering by identifiers
f <- f[order(f$ident),]
# deleting empty rows
f <- subset(f, deb != "")
# counting the number of episodes by individual
dim <- nrow(f)
f <- cbind(f, rep(1, dim))
colnames(f)[6] <- "nbepis"
for (i in 1:(dim-1)) {
  if (f[i+1, 1] == f[i, 1]) {
    f[i+1, 6] <- f[i, 6] + 1
  }
}
# checking that maximum number of episodes is equal to 20
max(f$nbepis)

For those who would like to see the actual result: see the example dataset attached.

Best,
Anton

On 11/06/2015 08:57, Gilbert Ritschard wrote:
> You could also consider the HSPELL_to_STS function provided by the
> TraMineRextras package.
>
> From: traminer-users-bounces at lists.r-forge.r-project.org
> [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Hadrien Commenges
> Sent: Thursday, June 11, 2015 08:45
> To: Users questions
> Subject: Re: [Traminer-users] Format of sequences
>
> You could also split your table by set of columns (columns 1:4, then
> col c(1, 5:7), etc.) and then rbind() all the tables.
>
> Another option would be the melt() function in the reshape2 package.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.csv
Type: text/csv
Size: 2250 bytes
Desc: not available
URL:
From simon.paye at sciencespo.fr Thu Jun 11 09:46:20 2015
From: simon.paye at sciencespo.fr (Simon PAYE)
Date: Thu, 11 Jun 2015 09:46:20 +0200
Subject: [Traminer-users] Format of sequences
In-Reply-To:
References: <55786B0D.1000504@gmail.com> <812216976.2959586.1434005106503.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID:

Hi there, Anton,

I just used the HSPELL_to_STS function and it worked well. I suggest you try something like this:

seq_sts <- HSPELL_to_STS(data,
                         begin=c("BEGIN1","BEGIN2",...),
                         end=c("END1","END2",...),
                         status=c("STATE1","STATE2",...))

Best,
Simon

2015-06-11 8:53 GMT+02:00 thomas collas:
> Hello,
> That's the same proposition actually.
> tc

--
--------------------------------------------------------------------------
All e-mails sent from the Sciences Po mail system must comply with the conditions of use. To read them, go to: http://www.sciencespo.fr/ressources-numeriques/fr/content/regles-de-confidentialite
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
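The column-block approach suggested in this thread (take each BEGINk/ENDk/STATEk group together with the ID, give the columns common names, stack the blocks, then turn the dates into numbers) can be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions: the toy data frame `wide`, its two episodes instead of 18, and the derived column names `BEGIN_NUM`, `END_NUM` and `EPISODE` are hypothetical and are not taken from Anton's actual file.

```r
# Hypothetical toy data in the wide layout described in the thread
# (two episodes only; dates stored as dd/mm/yyyy strings).
wide <- data.frame(
  ID     = c(1, 2),
  BEGIN1 = c("01/01/1950", "01/01/1950"),
  END1   = c("01/01/1960", "01/01/1955"),
  STATE1 = c("X", "X"),
  BEGIN2 = c("02/01/1960", ""),
  END2   = c("30/01/1960", ""),
  STATE2 = c("Y", ""),
  stringsAsFactors = FALSE
)
n_episodes <- 2  # assumption for the toy data; 18 in the original question

# One block of (ID, BEGINk, ENDk, STATEk) per episode, renamed to common headings.
blocks <- lapply(seq_len(n_episodes), function(k) {
  block <- wide[, c("ID", paste0(c("BEGIN", "END", "STATE"), k))]
  names(block) <- c("ID", "BEGIN", "END", "STATE")
  block
})
spell <- do.call(rbind, blocks)

# Drop empty episodes, parse the dates, and sort by individual and start date.
spell <- spell[spell$BEGIN != "", ]
spell$BEGIN <- as.Date(spell$BEGIN, format = "%d/%m/%Y")
spell$END   <- as.Date(spell$END,   format = "%d/%m/%Y")
spell <- spell[order(spell$ID, spell$BEGIN), ]
rownames(spell) <- NULL

# Dates as day counts from the earliest start date (earliest day = 1).
origin <- min(spell$BEGIN)
spell$BEGIN_NUM <- as.integer(spell$BEGIN - origin) + 1L
spell$END_NUM   <- as.integer(spell$END - origin) + 1L

# Episode counter within each individual.
spell$EPISODE <- ave(seq_along(spell$ID), spell$ID, FUN = seq_along)

print(spell)
```

The result has one row per episode with an ID, a begin, an end and a state, which is the general shape of SPELL data. Counting in days is an arbitrary choice here; months or years may suit long careers better, and the exact arguments of any TraMineR conversion function should be checked in its documentation.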