From pit.blavier at gmail.com Mon Mar 2 09:26:05 2015
From: pit.blavier at gmail.com (Pierre Blavier)
Date: Mon, 2 Mar 2015 09:26:05 +0100
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
Message-ID:

Hi TraMineR users!

A rather silly question.

What I have: data in STS format describing the time use of each individual across a 24-hour day, in periods of 10 minutes. Hence I have 24*6 = 144 variables (time1, time2, time3, ..., time144; time1 corresponds to 06:00-06:10 a.m., and so forth) describing 12 possible states (which is the maximum allowed by TraMineR).

My question: how can I obtain, in sequence plots (e.g. seqdplot), an x-axis labelled with the time in hours instead of the index of the variables defining the sequence (i.e. 1, 2, 3, ..., 144)? For example, something showing only the hours, beginning at 6:00 a.m.?

It looks like a silly question, but I have not yet found the solution, even though I have searched the subjects of this list and the TraMineR manuals, and looked at the options of the seqdplot function (xtlab, start, ...). Has anybody already met this problem?

Thanks a lot,
best,
Pierre Blavier

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hc at parisgeo.cnrs.fr Mon Mar 2 10:32:29 2015
From: hc at parisgeo.cnrs.fr (Hadrien Commenges)
Date: Mon, 2 Mar 2015 10:32:29 +0100 (CET)
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To:
References:
Message-ID: <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>

The x-axis shows the column names, so you just need to rename your variables. Be careful: R accepts names beginning with a number (6:00 for example), but you can't then refer to that variable directly (mydata$6:00).

Hadrien

----- Original message -----
From: "Pierre Blavier"
To: "Users questions"
Sent: Monday, 2 March 2015 09:26:05
Subject: [Traminer-users] transforming the x-axis scale of seqdplot

Hi TraMineR users!
_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dina.frommert at gmx.net Mon Mar 2 11:09:49 2015
From: dina.frommert at gmx.net (Dina Frommert)
Date: Mon, 2 Mar 2015 11:09:49 +0100
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To: <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
References: , <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID:

An HTML attachment was scrubbed...
URL:

From Gilbert.Ritschard at unige.ch Tue Mar 3 10:28:43 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Tue, 3 Mar 2015 09:28:43 +0000
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To:
References: , <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16EC20E@golf.isis.unige.ch>

The best solution is to provide the labels (column names) once, at the seqdef stage, with the "cnames" argument. The labels will then be used consistently for each plot (unless overridden with the xtlab argument of the seqplot function).

Likewise, the xtstep argument, which is also best specified at the seqdef stage, can be changed for a specific plot by passing an xtstep argument to the plot function (see the plot.stslist help page).

Best.
Gilbert

From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Dina Frommert
Sent: Monday, March 02, 2015 11:10
To: traminer-users at lists.r-forge.r-project.org
Subject: Re: [Traminer-users] transforming the x-axis scale of seqdplot

Hi Pierre,

you could use xtlab in the seqplot function to specify the labels, and xtstep in the seqdef function to specify the step between the tick marks.

Best
Dina

Sent: Monday, 2 March 2015 at 10:32
From: "Hadrien Commenges"
To: "Users questions"
Subject: Re: [Traminer-users] transforming the x-axis scale of seqdplot

The x-axis shows the column names, so you just need to rename your variables. Be careful: R accepts names beginning with a number (6:00 for example), but you can't then refer to that variable directly (mydata$6:00).

Hadrien

________________________________
From: "Pierre Blavier"
To: "Users questions"
Sent: Monday, 2 March 2015 09:26:05
Subject: [Traminer-users] transforming the x-axis scale of seqdplot

Hi TraMineR users!
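A minimal sketch of the labelling approach discussed in this thread, assuming the 144 ten-minute slots start at 06:00 as in the original question. The data frame name mydata, its column names time1 ... time144, and the one-hour tick step are illustrative assumptions, not taken from anyone's actual script; cnames and xtstep are the seqdef arguments named in the replies.

```r
# Clock-time labels for 144 ten-minute slots starting at 06:00.
slot_start  <- (6 * 60 + 10 * (0:143)) %% (24 * 60)   # minutes since midnight
time_labels <- sprintf("%02d:%02d", slot_start %/% 60, slot_start %% 60)

# Hypothetical use with TraMineR: only runs if the package is installed
# and a data frame called `mydata` (assumed name) exists.
if (requireNamespace("TraMineR", quietly = TRUE) && exists("mydata")) {
  myseq <- TraMineR::seqdef(mydata, var = paste0("time", 1:144),
                            cnames = time_labels,  # labels shown on the x-axis
                            xtstep = 6)            # one tick per hour (6 slots)
  TraMineR::seqdplot(myseq)
}
```

If the data frame columns themselves are renamed to such labels, a column whose name starts with a digit can still be accessed by quoting the name with backticks, for example mydata$`06:00`.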
From aron.lindberg at case.edu Tue Mar 3 22:52:42 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Tue, 03 Mar 2015 13:52:42 -0800 (PST)
Subject: [Traminer-users] Error in seqefsub
Message-ID: <1425419561471.5d56b0f4@Nodemailer>

Hi,

I'm trying to run code along these lines:

required_packages <- c('TraMineR', 'TraMineRextras', 'magrittr', 'dplyr', 'rmarkdown', 'devtools')
new_packages <- required_packages[!(required_packages %in% installed.packages()[, 'Package'])]
if (length(new_packages)) install.packages(new_packages)
lapply(required_packages, library, character.only = TRUE)
set.seed(1)
setwd("/home/rstudio/crowd_sequencing")
data <- read.csv(paste0(getwd(), "/data/dataset_error.csv"), header = TRUE)
data <- subset(data, select = c("SequenceIndex", "PostOrder", "Category"))
colnames(data) <- c("id", "time", "event")
data$end <- data$time
data <- data[with(data, order(time)), ]
data$time <- match(data$time, unique(data$time))
data$end <- match(data$end, unique(data$end))
data.sample <- data
# data.sample = data[sample(nrow(data), nrow(data)*0.01), ]
data.sample <- data.sample[order(data.sample$id), ]
data.seqe <- seqecreate(data = data.sample, id = id, timestamp = time,
                        event = event)
# Frequent subsequences:
fsubseq <- seqefsub(data.seqe, pMinSupport = 0.65, maxK = 3)

However, this generates:

Error in class(ret$subseq) <- c("seqelist", "list") :
  attempt to set an attribute on NULL

which is surprising, because everything seems fine up until the seqefsub function is called. I've put the full dataset here:
https://gist.github.com/aronlindberg/0421ca91dd598f74017e

Best,
Aron

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

-------------- next part --------------
An HTML attachment was scrubbed...
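One defensive way to run the mining step, sketched under the assumption that the error arises because no subsequence reaches the requested minimum support. The helper name first_working_support and the list of thresholds are hypothetical; the seqefsub arguments in the commented usage are the ones used in the script of the message.

```r
# Try decreasing support thresholds until the mining call succeeds.
# `mine` is any function of one argument (the support threshold);
# an error or a NULL result is treated as "nothing found at this level".
first_working_support <- function(mine, supports = c(0.65, 0.5, 0.35, 0.2)) {
  for (p in supports) {
    res <- tryCatch(mine(p), error = function(e) NULL)
    if (!is.null(res)) {
      return(list(pMinSupport = p, result = res))
    }
  }
  NULL  # no threshold in `supports` produced a result
}

# Hypothetical use with the event sequence object from the script:
# found <- first_working_support(
#   function(p) seqefsub(data.seqe, pMinSupport = p, maxK = 3))
```

The wrapper only hides the error; it is still worth inspecting which threshold was finally used before interpreting the subsequences.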
URL:

From aron.lindberg at case.edu Tue Mar 3 22:55:16 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Tue, 03 Mar 2015 13:55:16 -0800 (PST)
Subject: [Traminer-users] Error in seqefsub
In-Reply-To: <1425419561471.5d56b0f4@Nodemailer>
References: <1425419561471.5d56b0f4@Nodemailer>
Message-ID: <1425419716076.29f2febf@Nodemailer>

Sorry, scratch that: as soon as I went below pMinSupport = 0.5, I realized that there are no subsequences that occur in more than 50% of the sequences.

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zephyrin.soh at gmail.com Sat Mar 7 16:58:56 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Sat, 07 Mar 2015 10:58:56 -0500
Subject: [Traminer-users] seqici() provides different results for the same sequence?
Message-ID: <54FB2040.8030209@gmail.com>

Hi TraMineR users,

I wonder whether seqici() depends on the context. I have a sequence and compute its complexity with seqici(mySeq). I have the same sequence in a set of sequences, and when I compute the complexity there I get a different value.

Can someone help me understand what happens? Note that I carefully checked that the two sequences are the same.

Thanks.
Zéphyrin

From alexis.gabadinho at unige.ch Mon Mar 9 09:32:17 2015
From: alexis.gabadinho at unige.ch (Alexis gabadinho)
Date: Mon, 9 Mar 2015 08:32:17 +0000
Subject: [Traminer-users] seqici() provides different results for the same sequence?
In-Reply-To: <54FB2040.8030209@gmail.com>
References: <54FB2040.8030209@gmail.com>
Message-ID: <54FD5A91.101@unige.ch>

Hi Zéphyrin,

Without an example, it is difficult to give any answer. Could you describe the sequences and the code that you use?

Best,
Alexis
From Gilbert.Ritschard at unige.ch Mon Mar 9 12:26:17 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Mon, 9 Mar 2015 11:26:17 +0000
Subject: [Traminer-users] seqici() provides different results for the same sequence?
In-Reply-To: <54FD5A91.101@unige.ch>
References: <54FB2040.8030209@gmail.com> <54FD5A91.101@unige.ch>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>

Hi Alexis,

An example and the explanation have been given on Stack Overflow:
http://stackoverflow.com/questions/28916712/traminer-seqici-provide-different-result-for-the-same-sequence

Gilbert
From zephyrin.soh at gmail.com Mon Mar 9 14:01:40 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Mon, 09 Mar 2015 09:01:40 -0400
Subject: [Traminer-users] seqici() provides different results for the same sequence?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>
References: <54FB2040.8030209@gmail.com> <54FD5A91.101@unige.ch> <66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>
Message-ID: <54FD99B4.6060900@gmail.com>

Exactly, Gilbert. I also posted, and explained what happens, here:
http://stackoverflow.com/questions/28916712/traminer-seqici-provide-different-result-for-the-same-sequence

So is it right that, to compute the complexity OF A SEQUENCE, the alphabet considered is the alphabet of ALL the sequences? My point is that this makes it the complexity of the sequence RELATIVE to the other sequences.

What do you think, Alexis? Is it an issue?

Thanks,
Zéphyrin
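To see why the alphabet can matter, here is a base-R sketch of the complexity index as it is defined in the TraMineR literature: the square root of the product of the normalised number of transitions and the within-sequence entropy, the latter divided by the log of the alphabet size. This is not TraMineR's own code; the function name complexity_index and the toy sequence are made up for illustration, and the alphabet sizes are assumptions.

```r
# Complexity of one sequence, given the size of the alphabet it is judged against.
complexity_index <- function(states, alphabet_size) {
  n <- length(states)
  # share of positions at which the state changes
  transitions <- sum(states[-1] != states[-n]) / (n - 1)
  # within-sequence entropy of the state distribution
  p <- as.numeric(table(states)) / n
  entropy <- -sum(p * log(p))
  # entropy is normalised by its maximum, log(alphabet_size)
  sqrt(transitions * entropy / log(alphabet_size))
}

s <- c("A", "A", "B", "B", "A")
alone  <- complexity_index(s, alphabet_size = 2)  # alphabet {A, B} only
in_set <- complexity_index(s, alphabet_size = 4)  # set that also uses, say, C and D
c(alone = alone, in_set = in_set)
```

Under this definition the same sequence gets a lower value when the alphabet of the whole set is larger, which is consistent with the behaviour described in the thread.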
From zephyrin.soh at gmail.com Mon Mar 9 15:56:39 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Mon, 09 Mar 2015 10:56:39 -0400
Subject: [Traminer-users] Occurence of sequences and time
Message-ID: <54FDB4A7.5080801@gmail.com>

Hi TraMineR users,

I am looking at TraMineR and wonder if someone can help me with these two problems:

1. Subsequence/substring mining: I can find whether a sequence contains a subsequence/substring, but I would like to know the number of occurrences of the subsequence/substring in a sequence. I am interested in substrings, not subsequences. As far as I know, subsequences may also include non-consecutive states, but I want to consider only consecutive states.
For example, in the sequence A-B-A-C-B-A-C-A-B-A, I would like to get 3 as the number of occurrences of B-A.

2. Time: TraMineR considers the time as the time when the event occurs or when the state is reached. Am I wrong? I would like to consider the time spent (duration) in a state. As the duration (in seconds, for example) can be a "double" (e.g., 4.73), is there any way to mine sequences with state/event durations?

Thanks in advance for your help.
Zéphyrin.

From Gilbert.Ritschard at unige.ch Tue Mar 10 09:12:49 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Tue, 10 Mar 2015 08:12:49 +0000
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <54FDB4A7.5080801@gmail.com>
References: <54FDB4A7.5080801@gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>

Hi Zéphyrin,

For a workaround to find substrings instead of subsequences, look at
http://stackoverflow.com/a/27230558/1586731

The seqeconstraint function used there also lets you control the counting method (use CDIS to count distinct occurrences within the same sequence).

Best.
Gilbert
From zephyrin.soh at gmail.com Wed Mar 11 12:50:17 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 07:50:17 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com> <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
Message-ID: <55002BF9.8050309@gmail.com>

Hi Gilbert,

Thanks a lot! Running your script on my example (A-B-A-C-B-A-C-A-B-A), I get the entry below for the subsequence B-A.

Subsequence Support Count ntrans nevent
(B)-(A)           1     1      2      2

How should I interpret this line, since my sequence contains 3 occurrences of B-A?

By the way, any advice on taking into account the duration of a state (or of an event)?

Thanks,
Zéphyrin
From zephyrin.soh at gmail.com Wed Mar 11 12:52:59 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 07:52:59 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com> <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
Message-ID: <55002C9B.1070909@gmail.com>

PS: Even using CDIS in seqeconstraint.
From Gilbert.Ritschard at unige.ch Wed Mar 11 15:23:30 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 11 Mar 2015 14:23:30 +0000
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <55002C9B.1070909@gmail.com>
References: <54FDB4A7.5080801@gmail.com> <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch> <55002C9B.1070909@gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>

No problem here. I get the count 3.

I suggest you post your question, with the code you are using, under the traminer tag on Stack Overflow so that we can see and fix your error.

Best
Gilbert
From zephyrin.soh at gmail.com Wed Mar 11 15:55:39 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 10:55:39 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com> <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch> <55002C9B.1070909@gmail.com> <66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>
Message-ID: <5500576B.9070506@gmail.com>

Oops! My mistake in setting seqeconstraint().

Could you please show me how to combine it with seqpm()? I would like to count the occurrences of a specific subsequence. I posted this as a comment here:
http://stackoverflow.com/questions/27199222/searching-for-most-common-substrings-into-subsequences/28989953#28989953

Thanks.
Zéphyrin
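Independently of the seqeconstraint workaround mentioned in this thread, the expected count for the example sequence can be checked with a few lines of base R. This sketch is a plain sliding-window count on the state vector, not a TraMineR function; count_substring is a hypothetical helper name.

```r
# Count how many times `pattern` occurs as a run of consecutive states.
count_substring <- function(states, pattern) {
  k <- length(pattern)
  n <- length(states)
  if (k > n) return(0L)
  starts <- seq_len(n - k + 1)
  # compare each window of length k with the pattern
  hits <- vapply(starts,
                 function(i) all(states[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

s <- strsplit("A-B-A-C-B-A-C-A-B-A", "-", fixed = TRUE)[[1]]
count_substring(s, c("B", "A"))  # 3
```

Overlapping occurrences are counted separately, which makes no difference for a two-state pattern such as B-A but would for a pattern like A-B-A.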
>> Zéphyrin.

From aron.lindberg at case.edu Wed Mar 11 16:29:08 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Wed, 11 Mar 2015 08:29:08 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
Message-ID: <1426087747528.72703738@Nodemailer>

Hi,

When running sequential association rules I get several values in the output:

           Rules Support      Conf      Lift Standardlift   JMeasure ImplicStat   p.value
1 (I1) => (I2)      15 0.7142857 0.7482993   -1.1607143 0.45894146 -0.9770084 0.1642825
2 (I2) => (I1)      17 0.8095238 0.8480726   -0.4404762 0.20127953 -0.9770084 0.1642825
3 (I2) => (I3)      17 0.8095238 0.9373434    0.5193644 0.01626898  0.0805823 0.5321129

From here I understand what support, confidence, and lift are:
http://stackoverflow.com/questions/27947556/traminerseqerules-help-page

However, what does the p-value mean? It seems to be strongly and negatively correlated with the confidence, but at the same time I have many sequences with combinations of high support, confidence, and lift that are still insignificant.

Hence: what does the p-value pertain to? Can the rules be meaningfully interpreted even when the p-value is insignificant?

Best,
Aron

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

From Gilbert.Ritschard at unige.ch Wed Mar 11 23:09:06 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 11 Mar 2015 22:09:06 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426087747528.72703738@Nodemailer>
References: <1426087747528.72703738@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>

Hi Aron,

The p-value is that of the implication strength (ImplicStat). This criterion is a z value. The lower ImplicStat is (the larger the negative value), the greater the implication strength of the rule. Your first two rules have an implication strength of about -1, and P(Z<-1) is about .16 for a normal distribution. Likewise, for the 3rd rule, you have an implication strength of .08, and P(Z<.08) = .53. When the p-value is less than .05, the rule has a significant implication strength.

For more explanation see, for example:

Ritschard, G., V. Pisetta and D.A. Zighed (2008), Inducing and evaluating classification trees with statistical implicative criteria, in R. Gras, E. Suzuki, F. Guillet and F. Spagnolo (eds), Statistical Implicative Analysis: Theory and Applications, Series Studies in Computational Intelligence, Vol. 127. Berlin: Springer, 397-420.
Preprint: http://mephisto.unige.ch/pub/publications/gr/ritsch-pisetta-zighed_bookGras_final_plain.pdf

Best,
Gilbert
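
The relation between ImplicStat and p.value described in Gilbert's reply can be checked numerically. The sketch below assumes, as that reply states, that the p-value is the lower-tail probability P(Z < ImplicStat) of a standard normal distribution. It uses only the Python standard library rather than TraMineR, and the helper name `normal_cdf` is an illustrative choice, not part of any package.

```python
import math


def normal_cdf(z):
    """Lower-tail probability P(Z < z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# ImplicStat values copied from the rules table in the original question.
implic_stats = {
    "(I1) => (I2)": -0.9770084,
    "(I2) => (I1)": -0.9770084,
    "(I2) => (I3)": 0.0805823,
}

for rule, z in implic_stats.items():
    # A rule would be called significant at the 5% level when p < 0.05.
    p = normal_cdf(z)
    print(f"{rule}: ImplicStat = {z:+.7f}, p-value = {p:.4f}, significant = {p < 0.05}")
```

Under that assumption the computed values should agree with the p.value column of the table (about 0.164 for the first two rules and 0.532 for the third), so none of the three rules is significant at the 5% level.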

From aron.lindberg at case.edu Thu Mar 12 02:18:17 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Wed, 11 Mar 2015 18:18:17 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
Message-ID: <1426123096634.7488ccf6@Nodemailer>

Thanks Gilbert,

That's very helpful! How do I mesh this with the interpretation of the lift? For example, I have many rules where the lift is below 1 (and which, judged on the lift alone, are of no interest), but where the p-value is significant. I also have cases which show the reverse scenario: lift > 1, but insignificant p-values.

Is it the case that the lift is similar to a coefficient, and then there is some error around it, thus sometimes causing large lift values to be insignificant?

Best,
Aron

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

From Gilbert.Ritschard at unige.ch Thu Mar 12 08:42:03 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Thu, 12 Mar 2015 07:42:03 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426123096634.7488ccf6@Nodemailer>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch> <1426123096634.7488ccf6@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>

Lift and implication strength are not equivalent, and can well differ as you have observed.

Lift measures whether the chance of observing the conclusion increases when the condition holds, while the implication strength measures whether the number of counter-examples decreases when the condition holds. A significant p-value means that the latter decrease is significant.

Best.
Gilbert

From aron.lindberg at case.edu Sat Mar 14 01:21:24 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Fri, 13 Mar 2015 17:21:24 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
Message-ID: <1426292483203.9dc2fc39@Nodemailer>

Does that then mean that for a rule to really be considered interesting, it needs both a lift above 1 and a p-value below 0.05? Or could a rule with a lift above 1 and an insignificant p-value still be of value?

--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

From Gilbert.Ritschard at unige.ch Sun Mar 15 16:00:35 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Sun, 15 Mar 2015 15:00:35 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426292483203.9dc2fc39@Nodemailer>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch> <1426292483203.9dc2fc39@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F6419@golf.isis.unige.ch>

Hi Aron,

There are several papers in the literature devoted to comparing interestingness measures of association rules. See for example Lenca et al. (2006) http://www.sciencedirect.com/science/article/pii/S0377221706011465 . On page 619 of that paper there is something about the significance of the lift.
Best,
Gilbert

From elie.chosson at upmf-grenoble.fr Tue Mar 17 14:23:55 2015
From: elie.chosson at upmf-grenoble.fr (ELIE CHOSSON)
Date: Tue, 17 Mar 2015 14:23:55 +0100 (CET)
Subject: [Traminer-users] Method options in seqdist functions: "OMloc" and others...
In-Reply-To: <1997679886.2243354.1426596532614.JavaMail.zimbra@upmf-grenoble.fr>
Message-ID: <1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>

Hello TraMineR community!

I work on Optimal Matching Analysis and I'm interested in extensions of the method. Searching the Internet, I found that the seqdist function provides methods like "OMloc" for localized optimal matching analysis (like the Hollister proposal), as well as other methods (duration-sensitive OM, etc.). (For example here)

But these options are not accepted when I use the seqdist function (with version 1.8-9 of the TraMineR package). I installed the development version, but the seqdist function still does not provide these options.

So my question is: are these options available somewhere now?

If not, I would like to create variable indel costs, but that seems hard to do... I tried to use the seqdistmc function with one channel that is my real sequence object and another that is a new sequence object giving, for each position, an indicator that takes neighbouring elements into account ("0" if all neighbours are other states, "1" if one neighbour is another state and "0" if all neighbours are the same state). But I don't think this is a good way to do it (and it does not conform to the Hollister proposal...)

I thank you for your answers, and I thank all the contributors to TraMineR a lot.

Sorry for my English,

Kind regards,

--
Elie Chosson,
Attaché Temporaire à l'Enseignement et à la Recherche (A.TE.R),
Centre de Recherche en Économie de Grenoble (CREG)
BATEG, Bureau 513, tel: 04 76 82 59 89
BP 47 - 38040 Grenoble Cedex 9

From Matthias.Studer at unige.ch Tue Mar 17 21:28:45 2015
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Tue, 17 Mar 2015 20:28:45 +0000
Subject: [Traminer-users] Method options in seqdist functions: "OMloc" and others...
In-Reply-To: <1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>
References: <1997679886.2243354.1426596532614.JavaMail.zimbra@upmf-grenoble.fr> <1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC7166C556D@kilo.isis.unige.ch>

Dear Elie,

Unfortunately, we are a little behind on our release plan, because some special cases still need work (such as missing data handling and unequal sequence lengths for some distances/parameter configurations). We have decided to make all these distance measures available in a separate package until we can merge them into TraMineR.
This is a development package, so please note that you may experience errors in some cases. The package is now available as the seqdist2 library here:
https://r-forge.r-project.org/R/?group_id=743

More information about the arguments and all the distances can be found in
Studer, Matthias, Ritschard, Gilbert, A comparative review of sequence dissimilarity measures, LIVES Working Paper 2014/33

Best,
Matthias

From elie.chosson at upmf-grenoble.fr Wed Mar 18 12:32:31 2015
From: elie.chosson at upmf-grenoble.fr (ELIE CHOSSON)
Date: Wed, 18 Mar 2015 12:32:31 +0100 (CET)
Subject: [Traminer-users] Traminer-users Digest, Vol 44, Issue 12
In-Reply-To:
References:
Message-ID: <1484524036.2838167.1426678351297.JavaMail.zimbra@upmf-grenoble.fr>

Wow! I'm going to test this development package, with all due caution of course. Luckily I have neither missing values nor sequences of different lengths in my data...

Thank you very much,

Best,
Elie
So my question is: there is these options availables somewhere now? If not, I would like to create variable indel costs but it seem to be hard to do... I try to use seqdistmc function with a channel which is my real sequence objet and a new sequence object who give for each position an indicator that taking account neighbouring elements ("0" if all neighbours are others state, "1" if one neighbour is other state and "0" if all neighbours are same state). But I don't think is a good way to do (and it's not conform to the Hollister proposal...) I thank you for your answers, and I thank a lot all contributors to TraMineR. Sorry for my english, Kind regards, -- ________________________________ Elie Chosson, Attach? Temporaire ? l'Enseignement et ? la Recherche (A.TE.R), Centre de Recherche en ?conomie de Grenoble (CREG) BATEG, Bureau 513, tel: 04 76 82 59 89 BP 47 - 38040 Grenoble Cedex 9 ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ Traminer-users mailing list Traminer-users at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users End of Traminer-users Digest, Vol 44, Issue 12 ********************************************** -- Elie Chosson, Doctorant contractuel, BATEG, Bureau 513, tel: 04 76 82 59 89 Centre de Recherche en ?conomie de Grenoble (CREG) BP 47 - 38040 Grenoble Cedex 9 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rcant at sse.com.au Tue Mar 31 07:05:13 2015 From: rcant at sse.com.au (Rosemary Cant) Date: Tue, 31 Mar 2015 13:05:13 +0800 Subject: [Traminer-users] Use of TraMineR in Australia Message-ID: <551A2B09.2040100@sse.com.au> I am a PhD student in the School of Population Health at the University of Western Australia. 
I propose to use Dynamic Hamming Analysis to examine the trajectories/careers in the child protection system of children reported to the Western Australia statutory authority for possible child sexual abuse, starting from the time of first report. I am keen to make contact with researchers in Australia who have used sequence analysis and/or TraMineR.

Rosemary Cant

-- 
Rosemary Cant
Social Systems and Evaluation
PO Box 8009
Perth Business Centre 6849
Telephone: 08 93283086
Email: rcant at sse.com.au