From ju.bourdais at gmail.com Mon Nov 17 10:09:29 2014
From: ju.bourdais at gmail.com (Julien Bourdais)
Date: Mon, 17 Nov 2014 10:09:29 +0100
Subject: [Traminer-users] Date format with SPELL
In-Reply-To: <546477DA.7030605@gmail.com>
References: <546477DA.7030605@gmail.com>
Message-ID: <5469BB49.5060005@gmail.com>

Dear TraMineR users,

I guess my question is a typical beginner's question about TraMineR.

I've been collecting a list of stays that some people had in a hospital. The beginning and end dates of each stay are rather precise (dd/mm/yyyy) and I'd like to stick to that precision for the moment. However, in the user's guide and on the TraMineR mailing list I couldn't find any example where the beginning and the end of a spell are written in a date format (not just as a year). I suppose TraMineR does not support the date format. Every time I try to define a sequence object I receive this message:

> SEJ.seq <- seqdef(sej, id="ID", begin="entree", end="sortie",
      status="sejour_type", fillblanks=NA, informat="SPELL",
      states=sej.states, labels=sej.labels, process=FALSE)
Error in Summary.factor(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, :
  min not meaningful for factors
In addition: Warning messages:
1: In Ops.factor(begincolumn, 1) : < not meaningful for factors
2: In Ops.factor(endcolumn, begincolumn) : - not meaningful for factors
3: In Ops.factor(begincolumn, 0) : > not meaningful for factors

Here is a sample of my data:

ID  sejour_type      entree      sortie
1   Temps plein      06/06/2013  19/03/2014
1   Temps plein      10/05/2010  16/05/2013
2   Temps plein      19/01/2012  27/01/2012
2   Temps plein      01/02/2011  04/08/2011
3   Temps plein      21/02/2013  19/03/2014
7   Hôpital de jour  18/09/2014  12/11/2014

Am I right in thinking that the date format is the problem, or does it lie somewhere else?

Kindly yours,
Julien
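A note on the error above: the "not meaningful for factors" messages suggest that entree and sortie were read as factors rather than as numbers. The following is a minimal sketch, in base R only, of turning dd/mm/yyyy strings into integer day counts. The data frame sej below is a hypothetical reconstruction of three rows of the sample, the column names begin_day and end_day are made up for illustration, and counting from the earliest entry date (so that it becomes day 1) is an assumption, not something the thread prescribes.

```r
# Hypothetical reconstruction of part of the sample data.
sej <- data.frame(
  ID = c(1, 1, 2),
  sejour_type = c("Temps plein", "Temps plein", "Temps plein"),
  entree = c("10/05/2010", "06/06/2013", "19/01/2012"),
  sortie = c("16/05/2013", "19/03/2014", "27/01/2012"),
  stringsAsFactors = TRUE  # mimic date columns that were read in as factors
)

# Convert via as.character() so the factor labels are parsed, not the codes.
entree_date <- as.Date(as.character(sej$entree), format = "%d/%m/%Y")
sortie_date <- as.Date(as.character(sej$sortie), format = "%d/%m/%Y")

# Assumption: days are counted from the earliest entry, which becomes day 1.
origin <- min(entree_date)
sej$begin_day <- as.integer(entree_date) - as.integer(origin) + 1L
sej$end_day   <- as.integer(sortie_date) - as.integer(origin) + 1L

print(sej[, c("ID", "begin_day", "end_day")])
```

The integer columns could then be supplied as begin and end in place of entree and sortie. Note that at daily resolution, spells spanning 2010 to 2014 lead to sequences of well over a thousand positions.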
From Gilbert.Ritschard at unige.ch Mon Nov 17 11:52:49 2014
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Mon, 17 Nov 2014 10:52:49 +0000
Subject: [Traminer-users] Date format with SPELL
In-Reply-To: <5469BB49.5060005@gmail.com>
References: <546477DA.7030605@gmail.com> <5469BB49.5060005@gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16224D3@golf.isis.unige.ch>

Dear Julien,

TraMineR does not support the date format. The begin and end arguments in seqformat (and seqdef) should be integer variables. In your case you should transform the dates into numbers of days (see http://stackoverflow.com/questions/19564930/how-do-i-convert-date-to-number-of-days-in-r ).

You may also want to have a look at this related question on SO: http://stackoverflow.com/questions/11801511/using-time-diary-data-with-traminer

In addition, we recommend first transforming your SPELL data into STS form with seqformat and then passing the STS form to seqdef.

Best,
Gilbert

From Gerhard.Wuehrer at jku.at Fri Nov 21 09:01:38 2014
From: Gerhard.Wuehrer at jku.at (Gerhard Wuehrer)
Date: Fri, 21 Nov 2014 09:01:38 +0100
Subject: [Traminer-users] large data sets
Message-ID: <546EFF7202000083000AE6A1@gwia.im.jku.at>

Hi TraMineRs,

What is your experience with large datasets? I mean n = 45,000 sequences with m = 184 possible states, where the sequence length varies from 1 to 66. States can repeat several times.

Thank you for your opinion and advice.

Best regards - Gerhard Wuehrer

o. Univ.-Prof. Dkfm. Dr. Gerhard A. Wührer
Institut für Handel, Absatz und Marketing
Johannes Kepler Universität Linz
Altenberger Str. 69
4040 Linz/Austria
tel.: 004373224689401
fax.: 004373224689404
mail: gerhard.wuehrer at jku.at
URL: www.marketing.jku.at
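For a sense of scale, here is a rough back-of-the-envelope estimate, in R, of the memory needed to hold all pairwise dissimilarities between n = 45,000 sequences. It assumes the analysis requires a pairwise dissimilarity matrix stored as 8-byte doubles, which is an assumption about the intended analysis rather than something stated in the question.

```r
n <- 45000             # number of sequences, taken from the question
bytes_per_double <- 8  # R stores numeric values as 8-byte doubles

# Full n x n matrix versus the lower triangle only (as in a "dist" object).
full_matrix_gib <- n^2 * bytes_per_double / 1024^3
lower_tri_gib   <- n * (n - 1) / 2 * bytes_per_double / 1024^3

cat(sprintf("Full n x n matrix: %.1f GiB\n", full_matrix_gib))
cat(sprintf("Lower triangle   : %.1f GiB\n", lower_tri_gib))
```

Because the requirement grows with the square of n, dividing the number of sequences by 10 divides the memory needed by roughly 100.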
From Alexandre.Pollien at unil.ch Fri Nov 21 15:32:25 2014
From: Alexandre.Pollien at unil.ch (Pollien Alexandre)
Date: Fri, 21 Nov 2014 14:32:25 +0000
Subject: [Traminer-users] large data sets (Gerhard Wuehrer)
In-Reply-To:
References:
Message-ID: <8664c4d3859b4b9793fac07240a69a3d@prdexch06.ad.unil.ch>

Hello,

Unless you have a very specific configuration (?), such a huge dataset won't be fully supported by TraMineR in R: you'll get a "cannot allocate vector of size ..." error, and other calculations will take very long (with much smaller files, I have waited overnight to get a pseudo-ANOVA).

One essential step is to reduce the file to "unique" sequences: create an ID that designates each distinct sequence and keep only one of each. Once the results of the sequence analysis are obtained (MDS, clusters), you can reassign them to all the identical sequences. Depending on the nature of the data, this can divide the size by 10-20.

I don't think a large number of modalities (states) causes an out-of-memory error, but I'm not sure the analysis will be successful. I have only worked with about 20 states, which I quickly recoded into fewer than 10 because I couldn't achieve anything good with too many modalities. You may simply have to try. Sequence analysis behaves very differently depending on the nature of the data.

Alexandre
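The reduction to unique sequences described above can be sketched in base R as follows. The toy data frame seqs and its column names are hypothetical, and the cluster labels are placeholders standing in for whatever the analysis of the unique sequences returns.

```r
# Hypothetical toy data: one row per case, one column per time position.
seqs <- data.frame(
  t1 = c("A", "A", "B", "A", "B"),
  t2 = c("A", "B", "B", "A", "B"),
  t3 = c("B", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Build one key per row so that identical sequences share the same key.
key <- do.call(paste, c(seqs, sep = "-"))

# Keep the first occurrence of each distinct sequence.
first_idx <- which(!duplicated(key))
unique_seqs <- seqs[first_idx, ]

# For every original row, the position of its representative in unique_seqs.
seq_id <- match(key, key[first_idx])

# Number of original cases behind each unique sequence.
n_cases <- tabulate(seq_id, nbins = length(first_idx))

# Placeholder result obtained on the unique sequences only.
cluster_unique <- c("c1", "c2", "c1")

# Reassign the result to all original cases.
cluster_all <- cluster_unique[seq_id]
print(data.frame(seq_id, cluster_all))
```

Keeping seq_id is what makes the reassignment step a single indexing operation; n_cases records how many cases each retained sequence stands for.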
From Matthias.Studer at unige.ch Fri Nov 21 20:01:56 2014
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Fri, 21 Nov 2014 19:01:56 +0000
Subject: [Traminer-users] large data sets (Gerhard Wuehrer)
In-Reply-To: <8664c4d3859b4b9793fac07240a69a3d@prdexch06.ad.unil.ch>
References: <8664c4d3859b4b9793fac07240a69a3d@prdexch06.ad.unil.ch>
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC70E9D512B@kilo.isis.unige.ch>

Hello,

The WeightedCluster library provides a way to work with unique sequences and weight them accordingly. You can have a look at the manual here: http://mephisto.unige.ch/weightedcluster/

You can find some additional comments here: http://stackoverflow.com/questions/15929936/problem-with-big-data-during-computation-of-sequence-distances-using-tramine

Hope this helps.

Matthias
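To illustrate why the unique sequences need weights, here is a small base R sketch. It does not use WeightedCluster itself; the sequences, the weights and the summary quantity (time spent in state "B") are all hypothetical. It only shows that a weighted statistic on the aggregated data reproduces the unweighted statistic on the full data.

```r
# Hypothetical aggregated data: distinct sequences and how often each occurs.
unique_seq <- c("AAB", "ABB", "BBB")
w <- c(2, 1, 2)

# A per-sequence quantity: the number of positions spent in state "B".
count_B <- function(x) lengths(regmatches(x, gregexpr("B", x)))

# Weighted mean computed on the 3 unique sequences only.
mean_weighted <- weighted.mean(count_B(unique_seq), w)

# The same quantity computed on the expanded data set of 5 cases.
full_seq <- rep(unique_seq, times = w)
mean_full <- mean(count_B(full_seq))

print(c(weighted = mean_weighted, full = mean_full))
```

Without the weights, the mean over the three unique sequences would treat rare and frequent sequences alike and would generally differ from the full-sample value.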