From pit.blavier at gmail.com Mon Mar 2 09:26:05 2015
From: pit.blavier at gmail.com (Pierre Blavier)
Date: Mon, 2 Mar 2015 09:26:05 +0100
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
Message-ID:
Hi TraMineR users!
A rather silly question:
What I have: data in STS format describing the time use of each
individual across a 24-hour day, in 10-minute periods. Hence I have
24*6=144 variables (time1, time2, time3, ..., time144; time1 corresponds to
06h00-06h10 A.M., and so forth) describing 12 possible states (which is the
maximum allowed by TraMineR).
My question is: how can I obtain, in sequence plots (e.g. seqdplot),
x-axis labels showing the time in hours instead of the index of the
variables defining the sequence (i.e. 1, 2, 3, ..., 144)?
For example, something like only the hours, beginning at 6h00 A.M.?
It looks like a silly question, but I have not yet found the solution even
though I have searched the subjects of this list and the TraMineR manuals,
and checked the options of the seqdplot function (xtlab, start, ...). Has
anybody already met this problem?
Thanks a lot, best,
Pierre Blavier
From hc at parisgeo.cnrs.fr Mon Mar 2 10:32:29 2015
From: hc at parisgeo.cnrs.fr (Hadrien Commenges)
Date: Mon, 2 Mar 2015 10:32:29 +0100 (CET)
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To:
References:
Message-ID: <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
The x-axis shows the column names, so you just need to rename your variables. Be careful: R accepts names beginning with a number (6:00 for example), but you can't call that variable directly afterwards (mydata$6:00 is a syntax error; the name has to be quoted with backticks).
Hadrien
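
The caveat about names that start with a digit can be illustrated in base R. This is only a minimal sketch: the data frame mydata and its two time-labelled columns are made up for the example. Such names stay reachable through backticks or through [[ ]] with a character string.

```r
# Hypothetical two-slot data set; check.names = FALSE keeps the labels as given
mydata <- data.frame("6:00" = c("sleep", "work"),
                     "6:10" = c("sleep", "sleep"),
                     check.names = FALSE, stringsAsFactors = FALSE)

# mydata$6:00 is a syntax error, but the column is still accessible:
first_slot_a <- mydata$`6:00`     # backticks quote a non-syntactic name
first_slot_b <- mydata[["6:00"]]  # character indexing works as well

print(names(mydata))
print(identical(first_slot_a, first_slot_b))
```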
_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users
From dina.frommert at gmx.net Mon Mar 2 11:09:49 2015
From: dina.frommert at gmx.net (Dina Frommert)
Date: Mon, 2 Mar 2015 11:09:49 +0100
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To: <1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
References: ,
<1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID:
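Hi Pierre,
you could use xtlab in the seqplot function to specify the labels and xtstep in the seqdef function to specify the step between the tick marks.
Best
Dina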
From Gilbert.Ritschard at unige.ch Tue Mar 3 10:28:43 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Tue, 3 Mar 2015 09:28:43 +0000
Subject: [Traminer-users] transforming the x-axis scale of seqdplot
In-Reply-To:
References: ,
<1338392307.1133397.1425288749602.JavaMail.zimbra@parisgeo.cnrs.fr>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16EC20E@golf.isis.unige.ch>
The best solution is to provide the labels (column names) once at the seqdef stage with the "cnames" argument. The labels will then be used consistently for each plot (unless overridden with the xtlab argument of the seqplot function).
Likewise, the xtstep argument, which is also best specified at the seqdef stage, can be changed for a specific plot by providing an xtstep argument to the plot function (see the plot.stslist help page).
Best.
Gilbert
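
As a rough illustration of this advice, the sketch below builds clock labels for 144 ten-minute slots starting at 06:00 and passes them to seqdef through cnames, with xtstep = 6 so that one label is drawn per hour. The toy data, the state names and the object names (toy, time_labels, day.seq) are invented for the example, and the TraMineR calls are skipped when the package is not installed.

```r
# One label per 10-minute slot, starting at 06:00 and wrapping past midnight
n_slots <- 144
mins <- (6 * 60 + 10 * (seq_len(n_slots) - 1)) %% (24 * 60)
time_labels <- sprintf("%02d:%02d",
                       as.integer(mins %/% 60), as.integer(mins %% 60))

# Hypothetical stand-in for time1 ... time144: 20 individuals, random states
set.seed(1)
toy <- as.data.frame(
  matrix(sample(c("sleep", "work", "leisure"), 20 * n_slots, replace = TRUE),
         nrow = 20),
  stringsAsFactors = FALSE)

if (requireNamespace("TraMineR", quietly = TRUE)) {
  library(TraMineR)
  # cnames sets the position labels; xtstep = 6 labels every 6th slot (hourly)
  day.seq <- seqdef(toy, cnames = time_labels, xtstep = 6)
  seqdplot(day.seq)
}
```

Setting the labels at the seqdef stage means every later plot of day.seq inherits them without repeating xtlab in each call.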
From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Dina Frommert
Sent: Monday, March 02, 2015 11:10
To: traminer-users at lists.r-forge.r-project.org
Subject: Re: [Traminer-users] transforming the x-axis scale of seqdplot
Hi Pierre,
you could use xtlab in the seqplot function to specify the labels and xtstep in the seqdef function to specify the step between the tick marks.
Best
Dina
From aron.lindberg at case.edu Tue Mar 3 22:52:42 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Tue, 03 Mar 2015 13:52:42 -0800 (PST)
Subject: [Traminer-users] Error in seqefsub
Message-ID: <1425419561471.5d56b0f4@Nodemailer>
Hi,
I'm trying to run code along these lines:
required_packages <- c('TraMineR', 'TraMineRextras', 'magrittr', 'dplyr', 'rmarkdown', 'devtools')
new_packages <- required_packages[!(required_packages %in% installed.packages()[,'Package'])]
if(length(new_packages)) install.packages(new_packages)
lapply(required_packages, library, character.only = TRUE)
set.seed(1)
setwd("/home/rstudio/crowd_sequencing")
data <- read.csv(paste0(getwd(), "/data/dataset_error.csv"), header=TRUE)
data <- subset(data, select = c("SequenceIndex", "PostOrder", "Category"))
colnames(data) <- c("id", "time", "event")
data$end <- data$time
data <- data[with(data, order(time)), ]
data$time <- match(data$time, unique(data$time))
data$end <- match(data$end, unique(data$end))
data.sample <- data
# data.sample = data[sample(nrow(data), nrow(data)*0.01), ]
data.sample <- data.sample[order(data.sample$id), ]
data.seqe <- seqecreate(data = data.sample, id = id, timestamp = time,
                        event = event)
# Frequent subsequences:
fsubseq <- seqefsub(data.seqe, pMinSupport = 0.65, maxK = 3)
However, this generates:
Error in class(ret$subseq) <- c("seqelist", "list") :
  attempt to set an attribute on NULL
which is surprising because everything seems fine up until seqefsub is called. I've put the full dataset here:
https://gist.github.com/aronlindberg/0421ca91dd598f74017e
Best,
Aron
--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
From aron.lindberg at case.edu Tue Mar 3 22:55:16 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Tue, 03 Mar 2015 13:55:16 -0800 (PST)
Subject: [Traminer-users] Error in seqefsub
In-Reply-To: <1425419561471.5d56b0f4@Nodemailer>
References: <1425419561471.5d56b0f4@Nodemailer>
Message-ID: <1425419716076.29f2febf@Nodemailer>
Sorry, scratch that: as soon as I went below pMinSupport = 0.5, I realized that no subsequence occurs in more than 50% of the sequences.
--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
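
One way to avoid being surprised by a threshold that is too strict is to try several values and record how many subsequences each one yields. The sketch below is only an illustration under assumptions: it uses the actcal.tse example data shipped with TraMineR rather than the data set from this thread, the thresholds are arbitrary, the argument names pMinSupport and maxK are spelled as in the message above (other TraMineR versions may prefer different spellings), and any error raised by seqefsub is treated as "nothing found".

```r
thresholds <- c(0.65, 0.5, 0.25, 0.1)   # arbitrary values for illustration
n_found <- rep(NA_integer_, length(thresholds))

if (requireNamespace("TraMineR", quietly = TRUE)) {
  library(TraMineR)
  data(actcal.tse)                 # example event data shipped with TraMineR
  eseq <- seqecreate(actcal.tse)
  for (i in seq_along(thresholds)) {
    # An error here is assumed to mean that no subsequence reaches the threshold
    fs <- tryCatch(seqefsub(eseq, pMinSupport = thresholds[i], maxK = 3),
                   error = function(e) NULL)
    n_found[i] <- if (is.null(fs)) 0L else length(fs$subseq)
  }
}

# NA means TraMineR was not available; 0 means nothing reached the threshold
print(data.frame(pMinSupport = thresholds, n_subseq = n_found))
```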
From zephyrin.soh at gmail.com Sat Mar 7 16:58:56 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Sat, 07 Mar 2015 10:58:56 -0500
Subject: [Traminer-users] seqici() provides different results for the same
sequence?
Message-ID: <54FB2040.8030209@gmail.com>
Hi TraMineR users,
I wonder if seqici() depends on the context?
I have a sequence and I compute its complexity with seqici(mySeq).
I have the same sequence in a set of sequences, and when I compute the
complexity there I get a different value.
Can someone help me understand what is happening?
Note that I carefully checked that the two sequences are the same.
Thanks.
Zéphyrin
From alexis.gabadinho at unige.ch Mon Mar 9 09:32:17 2015
From: alexis.gabadinho at unige.ch (Alexis gabadinho)
Date: Mon, 9 Mar 2015 08:32:17 +0000
Subject: [Traminer-users] seqici() provides different results for the
same sequence?
In-Reply-To: <54FB2040.8030209@gmail.com>
References: <54FB2040.8030209@gmail.com>
Message-ID: <54FD5A91.101@unige.ch>
Hi Zéphyrin,
Without an example, it is difficult to give any answer. Could you
describe the sequences and the code that you use?
Best,
Alexis
From Gilbert.Ritschard at unige.ch Mon Mar 9 12:26:17 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Mon, 9 Mar 2015 11:26:17 +0000
Subject: [Traminer-users] seqici() provides different results for the
same sequence?
In-Reply-To: <54FD5A91.101@unige.ch>
References: <54FB2040.8030209@gmail.com> <54FD5A91.101@unige.ch>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>
Hi Alexis,
An example and the explanation have been given on Stack Overflow:
http://stackoverflow.com/questions/28916712/traminer-seqici-provide-different-result-for-the-same-sequence
Gilbert
From zephyrin.soh at gmail.com Mon Mar 9 14:01:40 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Mon, 09 Mar 2015 09:01:40 -0400
Subject: [Traminer-users] seqici() provides different results for the
same sequence?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>
References: <54FB2040.8030209@gmail.com> <54FD5A91.101@unige.ch>
<66ABD43696E3DB4687E0BB396A76E5F16F0E66@golf.isis.unige.ch>
Message-ID: <54FD99B4.6060900@gmail.com>
Exactly, Gilbert.
I also posted and explained what happens here:
http://stackoverflow.com/questions/28916712/traminer-seqici-provide-different-result-for-the-same-sequence
So is it right that, to compute the complexity OF A SEQUENCE, the
alphabet considered is the alphabet of ALL the sequences? My point is
that this gives the complexity of the sequence RELATIVE to the other sequences.
What do you think, Alexis? Is it an issue?
Thanks,
Zéphyrin
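
The dependence on the alphabet can be reproduced in a few lines of base R. The sketch below assumes the usual definition of the complexity index (the geometric mean of the normalised number of transitions and the normalised within-sequence entropy, the latter divided by the log of the alphabet size); complexity_index is a hypothetical helper written for this illustration, not TraMineR's own code.

```r
# Complexity of one state sequence for a given alphabet size
# (hypothetical helper, written only to illustrate the alphabet effect)
complexity_index <- function(states, alphabet_size) {
  n <- length(states)
  # Share of positions where the state changes, out of the n - 1 possible
  transitions <- sum(states[-1] != states[-n]) / (n - 1)
  # Entropy of the state distribution within the sequence
  p <- as.numeric(table(states)) / n
  entropy <- -sum(p * log(p))
  # Normalise the entropy by its maximum for the chosen alphabet
  sqrt(transitions * entropy / log(alphabet_size))
}

s <- c("A", "A", "B", "B", "C", "C")
c_alone  <- complexity_index(s, alphabet_size = 3)  # alphabet {A, B, C}
c_in_set <- complexity_index(s, alphabet_size = 4)  # set also contains a state D
print(c(alone = c_alone, in_set = c_in_set))
```

The same sequence gets a lower value as soon as the alphabet grows, because the maximum entropy used for normalisation increases while the sequence itself is unchanged.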
From zephyrin.soh at gmail.com Mon Mar 9 15:56:39 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Mon, 09 Mar 2015 10:56:39 -0400
Subject: [Traminer-users] Occurence of sequences and time
Message-ID: <54FDB4A7.5080801@gmail.com>
Hi TraMineR users,
I am looking at TraMineR and wonder if someone can help me deal with these
two problems:
1. Subsequence/substring mining: I can find whether a sequence contains a
subsequence/substring, but I would like to know the number of occurrences
of the subsequence/substring in a sequence.
I am interested in substrings, not subsequences. As far as I know,
subsequences also include non-consecutive states, but I want to consider
only consecutive states.
For example, in the sequence A-B-A-C-B-A-C-A-B-A, I would like to get 3
as the number of occurrences of B-A.
2. Time: TraMineR considers time as the time when the event occurs or
when the state is reached. Am I wrong?
I would like to consider the time spent (duration) in a state. As the
duration (in seconds, for example) can be a "double" (e.g., 4.73), is there
any way to mine sequences with state/event durations?
Thanks in advance for your help.
Zéphyrin.
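
For the first point, the count asked for can be checked independently of TraMineR with a few lines of base R. This is only a sketch: count_substring is a hypothetical helper, it works on a plain character vector of states, and it counts overlapping occurrences as well.

```r
# Count occurrences of a run of consecutive states (a substring)
# in a state sequence. Hypothetical helper, not a TraMineR function.
count_substring <- function(states, pattern) {
  n <- length(states)
  k <- length(pattern)
  if (k == 0 || k > n) return(0L)
  starts <- seq_len(n - k + 1)
  # A start position matches when every pattern element equals the state
  # found at the corresponding offset
  hits <- vapply(starts,
                 function(i) all(states[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

s <- strsplit("A-B-A-C-B-A-C-A-B-A", "-", fixed = TRUE)[[1]]
n_BA <- count_substring(s, c("B", "A"))
print(n_BA)  # 3
```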
From Gilbert.Ritschard at unige.ch Tue Mar 10 09:12:49 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Tue, 10 Mar 2015 08:12:49 +0000
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <54FDB4A7.5080801@gmail.com>
References: <54FDB4A7.5080801@gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
Hi Zéphyrin,
For a workaround to find substrings instead of subsequences, look at
http://stackoverflow.com/a/27230558/1586731
The seqeconstraint function used there also lets you control the counting method (use CDIS to count distinct occurrences within the same sequence).
Best.
Gilbert
From zephyrin.soh at gmail.com Wed Mar 11 12:50:17 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 07:50:17 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com>
<66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
Message-ID: <55002BF9.8050309@gmail.com>
Hi Gilbert,
Thanks a lot!
When I run your script on my example (A-B-A-C-B-A-C-A-B-A), I
get the entry below for the subsequence B-A:
Subsequence  Support  Count  ntrans  nevent
(B)-(A)            1      1       2       2
How should I interpret this line, since my sequence contains 3 occurrences of B-A?
By the way, any advice on taking into account the duration of the state (or
of the event)?
Thanks,
Zéphyrin
From zephyrin.soh at gmail.com Wed Mar 11 12:52:59 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 07:52:59 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com>
<66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
Message-ID: <55002C9B.1070909@gmail.com>
PS: Even using CDIS in seqeconstraint.
From Gilbert.Ritschard at unige.ch Wed Mar 11 15:23:30 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 11 Mar 2015 14:23:30 +0000
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <55002C9B.1070909@gmail.com>
References: <54FDB4A7.5080801@gmail.com>
<66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
<55002C9B.1070909@gmail.com>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>
No problem here. I get the count 3.
I suggest you post your question, with the code you are using, under the traminer tag on Stack Overflow so that we can see and fix your error.
Best
Gilbert
From zephyrin.soh at gmail.com Wed Mar 11 15:55:39 2015
From: zephyrin.soh at gmail.com (Zéphyrin Soh)
Date: Wed, 11 Mar 2015 10:55:39 -0400
Subject: [Traminer-users] Occurence of sequences and time
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>
References: <54FDB4A7.5080801@gmail.com>
<66ABD43696E3DB4687E0BB396A76E5F16F1BA9@golf.isis.unige.ch>
<55002C9B.1070909@gmail.com>
<66ABD43696E3DB4687E0BB396A76E5F16F2F06@golf.isis.unige.ch>
Message-ID: <5500576B.9070506@gmail.com>
Oops! My mistake was in setting seqeconstraint().
Could you please show me how to combine this with seqpm()?
I would like to count the occurrences of a specific subsequence.
I put this as comments here:
http://stackoverflow.com/questions/27199222/searching-for-most-common-substrings-into-subsequences/28989953#28989953
Thanks.
Zéphyrin
Le 2015-03-11 10:23, Gilbert Ritschard a ?crit :
> No problem here. I get the count 3.
>
> I suggest you post your question with the code you are using under the traminer tag on Stackoverflow so that we can see and fix your error.
>
> Best
> Gilbert
>
> -----Original Message-----
> From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Z?phyrin Soh
> Sent: Wednesday, March 11, 2015 12:53
> To: Users questions
> Subject: Re: [Traminer-users] Occurence of sequences and time
>
>
> PS: Even using CDIS in seqeconstraint.
>
> Le 2015-03-10 04:12, Gilbert Ritschard a ?crit :
>> Hi Z?phyrin,
>>
>> For a workaround to find substrings instead of subsequences look at
>> http://stackoverflow.com/a/27230558/1586731
>>
>> The seqeconstraint function used there also allows to control for the counting method (use CDIS for counting distinct occurrences in a same sequence).
>>
>> Best.
>> Gilbert
>>
>> -----Original Message-----
>> From: traminer-users-bounces at lists.r-forge.r-project.org
>> [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf
>> Of Z?phyrin Soh
>> Sent: Monday, March 09, 2015 15:57
>> To: Users questions
>> Subject: [Traminer-users] Occurence of sequences and time
>>
>> Hi TraMineR users,
>> I am looking at TraMineR and wonder if someone can help me with these two problems:
>>
>> 1. Subsequence/substring mining: I can find whether a sequence contains a subsequence/substring, but I would like to know the number of occurrences of a subsequence/substring in a sequence.
>> I am interested in substrings, not subsequences. As far as I know, subsequences also allow non-consecutive states, but I want to consider only consecutive states.
>> For example, in the sequence A-B-A-C-B-A-C-A-B-A, I would like to get 3 as the number of occurrences of B-A.
>>
>> 2. Time: TraMineR considers time as the moment when an event occurs or when a state is reached. Am I wrong?
>> I would like to consider the time spent (duration) in a state. As the duration (in seconds, for example) can be a "double" (e.g., 4.73), is there any way to mine sequences with state/event durations?
>>
>> Thanks in advance for your help.
>> Zéphyrin.
>>
> _______________________________________________
> Traminer-users mailing list
> Traminer-users at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users
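To make the substring count asked about in the quoted question concrete, here is a minimal sketch in plain Python. It is not TraMineR code and does not use seqeconstraint; the function name count_substring and the "-" separator are assumptions made for illustration. It counts how often a pattern appears as consecutive states, which is the substring notion described in the question.

```python
def count_substring(sequence, pattern, sep="-"):
    """Count occurrences of `pattern` as consecutive states in `sequence`."""
    states = sequence.split(sep)
    target = pattern.split(sep)
    k = len(target)
    # Slide a window of length k over the states and compare it to the pattern
    return sum(states[i:i + k] == target for i in range(len(states) - k + 1))


# Example from the question: B-A occurs three times as a substring
n_ba = count_substring("A-B-A-C-B-A-C-A-B-A", "B-A")
print(n_ba)  # prints 3
```

The window comparison also counts overlapping occurrences, which only matters for patterns that can overlap with themselves (for example A-A).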
From aron.lindberg at case.edu Wed Mar 11 16:29:08 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Wed, 11 Mar 2015 08:29:08 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
Message-ID: <1426087747528.72703738@Nodemailer>
Hi,
When running sequential association rules I get several values in the output:
  Rules        Support      Conf      Lift Standardlift   JMeasure ImplicStat   p.value
1 (I1) => (I2)      15 0.7142857 0.7482993   -1.1607143 0.45894146 -0.9770084 0.1642825
2 (I2) => (I1)      17 0.8095238 0.8480726   -0.4404762 0.20127953 -0.9770084 0.1642825
3 (I2) => (I3)      17 0.8095238 0.9373434    0.5193644 0.01626898  0.0805823 0.5321129
From here I understand what support, confidence, and lift are:
http://stackoverflow.com/questions/27947556/traminerseqerules-help-page
However, what does the p-value mean? It seems to be highly and negatively correlated with the confidence, but at the same time I have many sequences with combinations of high support, confidence, and lift that are still insignificant.
Hence: what does the p-value pertain to? Can the rules be meaningfully interpreted even when the p-value is insignificant?
Best,
Aron
--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
From Gilbert.Ritschard at unige.ch Wed Mar 11 23:09:06 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Wed, 11 Mar 2015 22:09:06 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426087747528.72703738@Nodemailer>
References: <1426087747528.72703738@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
Hi Aron,
The p-value is that of the implication strength (ImplicStat). This criterion is a z-value: the lower ImplicStat is (the larger its negative value), the greater the implication strength of the rule. Your first two rules have an implication strength of about -1, and P(Z < -1) is about .16 for a standard normal distribution. Likewise, for the third rule, you have an implication strength of .08, and P(Z < .08) = .53. When the p-value is less than .05, the rule has a significant implication strength.
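The arithmetic above can be checked directly. What follows is a minimal sketch in Python (standard library only, not TraMineR code), assuming, as described above, that p.value is the lower-tail standard normal probability of ImplicStat. The variable names are illustrative.

```python
from statistics import NormalDist

# ImplicStat values copied from the rules output quoted in this thread
implic_stat = {"(I1) => (I2)": -0.9770084, "(I2) => (I3)": 0.0805823}

std_normal = NormalDist()  # mean 0, standard deviation 1

# Lower-tail probability P(Z < ImplicStat) for each rule
p_values = {rule: std_normal.cdf(z) for rule, z in implic_stat.items()}

for rule, p in p_values.items():
    print(f"{rule}: p = {p:.4f}")
```

The results agree with the p.value column of the output (about .164 and .532). In R, the same lower-tail probability is given by pnorm(z).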
For more explanation see for example,
Ritschard, G., V. Pisetta and D.A. Zighed (2008), Inducing and evaluating classification trees with statistical implicative criteria, in R. Gras, E. Suzuki, F. Guillet and F. Spagnolo (eds), Statistical Implicative Analysis: Theory and Applications, Series Studies in Computational Intelligence, Vol. 127. Berlin: Springer, 397-420.
Preprint: http://mephisto.unige.ch/pub/publications/gr/ritsch-pisetta-zighed_bookGras_final_plain.pdf
Best,
Gilbert
From aron.lindberg at case.edu Thu Mar 12 02:18:17 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Wed, 11 Mar 2015 18:18:17 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
Message-ID: <1426123096634.7488ccf6@Nodemailer>
Thanks Gilbert,
That's very helpful! How do I mesh this with the interpretation of the lift? For example, I have many rules where the lift is below 1 (so that, judged by the lift alone, they are of no interest), but where the p-value is significant. I also have cases which show the reverse scenario: lift > 1, but insignificant p-values. Is it the case that the lift is similar to a coefficient, with some error around it, thus sometimes causing large lift values to be insignificant?
Best,
Aron
--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
From Gilbert.Ritschard at unige.ch Thu Mar 12 08:42:03 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Thu, 12 Mar 2015 07:42:03 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426123096634.7488ccf6@Nodemailer>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3541@golf.isis.unige.ch>
<1426123096634.7488ccf6@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
Lift and implication strength are not equivalent, and can well differ as you have observed.
Lift measures whether the chance of observing the conclusion increases when the condition holds, while the implication strength measures whether the number of counter-examples decreases when the condition holds. A significant p-value means that the latter decrease is significant.
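To illustrate the distinction, here is a minimal sketch in Python using textbook definitions only. It assumes the classical implication index (observed counter-examples compared with their expected number under independence, standardized by the square root of the expected number). Whether seqerules computes ImplicStat in exactly this way is not stated in this thread, so the sketch is not expected to reproduce the output quoted earlier. All counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts, not taken from the thread
n = 100     # sequences in total
n_a = 20    # sequences where the condition A holds
n_b = 50    # sequences where the conclusion B holds
n_ab = 12   # sequences where both A and B hold

# Lift: does observing A raise the chance of observing B?
conf = n_ab / n_a
lift = conf / (n_b / n)

# Implication index: are there fewer counter-examples (A without B) than expected?
counter = n_a - n_ab              # observed counter-examples
expected = n_a * (n - n_b) / n    # expected counter-examples under independence
implic = (counter - expected) / sqrt(expected)
p_value = NormalDist().cdf(implic)

print(f"lift = {lift:.2f}, implication index = {implic:.3f}, p = {p_value:.3f}")
```

With these counts the lift is above 1, yet the shortfall in counter-examples (8 observed against 10 expected) is too small to be significant at the .05 level. This is the "lift > 1 but insignificant p-value" situation raised in the thread.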
Best.
Gilbert
From aron.lindberg at case.edu Sat Mar 14 01:21:24 2015
From: aron.lindberg at case.edu (Aron Lindberg)
Date: Fri, 13 Mar 2015 17:21:24 -0700 (PDT)
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
Message-ID: <1426292483203.9dc2fc39@Nodemailer>
Does that then mean that for a rule to really be considered as interesting, it needs both a lift above 1 and a p-value below 0.05? Or could a rule with a lift above 1 and an insignificant p-value still be of value?
--
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io
From Gilbert.Ritschard at unige.ch Sun Mar 15 16:00:35 2015
From: Gilbert.Ritschard at unige.ch (Gilbert Ritschard)
Date: Sun, 15 Mar 2015 15:00:35 +0000
Subject: [Traminer-users] p-values for sequential association rules?
In-Reply-To: <1426292483203.9dc2fc39@Nodemailer>
References: <66ABD43696E3DB4687E0BB396A76E5F16F3AE0@golf.isis.unige.ch>
<1426292483203.9dc2fc39@Nodemailer>
Message-ID: <66ABD43696E3DB4687E0BB396A76E5F16F6419@golf.isis.unige.ch>
Hi Aron,
There are several papers in the literature that are devoted to the comparison between interestingness measures of association rules. See for example Lenca et al. (2006)
http://www.sciencedirect.com/science/article/pii/S0377221706011465 . On page 619 of that paper there is something about the significance of the lift.
Best,
Gilbert
From elie.chosson at upmf-grenoble.fr Tue Mar 17 14:23:55 2015
From: elie.chosson at upmf-grenoble.fr (ELIE CHOSSON)
Date: Tue, 17 Mar 2015 14:23:55 +0100 (CET)
Subject: [Traminer-users] Method options in seqdist functions: "OMloc" and
others...
In-Reply-To: <1997679886.2243354.1426596532614.JavaMail.zimbra@upmf-grenoble.fr>
Message-ID: <1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>
Hello TraMineR community!
I work on Optimal Matching Analysis and I am interested in extensions of the method.
While searching the Internet, I found that the seqdist function provides methods such as "OMloc" for localized optimal matching (like Hollister's proposal), and other methods (duration-sensitive OM, etc.).
(For example here)
But these options are not accepted when I use the seqdist function (with version 1.8-9 of the TraMineR package).
I installed the development version, but the seqdist function still does not provide these options.
So my question is: are these options available somewhere now?
If not, I would like to create variable indel costs, but it seems hard to do...
I tried to use the seqdistmc function with one channel that is my real sequence object and a second sequence object that gives, for each position, an indicator taking the neighbouring elements into account ("0" if all neighbours are in other states, "1" if one neighbour is in another state and "0" if all neighbours are in the same state). But I don't think this is a good way to do it (and it does not conform to Hollister's proposal...).
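As a rough illustration of such a neighbour indicator, here is a minimal sketch in plain Python (not TraMineR code). The coding given in the message is ambiguous, so this sketch assumes the indicator simply counts how many of the adjacent positions (left and right) hold a different state. The function name neighbour_indicator is hypothetical.

```python
def neighbour_indicator(states):
    """For each position, count adjacent positions holding a different state."""
    indicator = []
    last = len(states) - 1
    for i, state in enumerate(states):
        neighbours = []
        if i > 0:
            neighbours.append(states[i - 1])   # left neighbour, if any
        if i < last:
            neighbours.append(states[i + 1])   # right neighbour, if any
        indicator.append(sum(1 for other in neighbours if other != state))
    return indicator


seq = ["A", "A", "B", "A", "A"]
indicator = neighbour_indicator(seq)
print(indicator)  # prints [0, 1, 2, 1, 0]
```

The resulting vector could then be turned into a second sequence of the same length, which is the kind of second channel described in the message.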
Thank you for your answers, and many thanks to all contributors to TraMineR.
Sorry for my English,
Kind regards,
--
Elie Chosson,
Attaché Temporaire à l'Enseignement et à la Recherche (A.TE.R),
Centre de Recherche en Économie de Grenoble (CREG)
BATEG, Bureau 513, tel: 04 76 82 59 89
BP 47 - 38040 Grenoble Cedex 9
From Matthias.Studer at unige.ch Tue Mar 17 21:28:45 2015
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Tue, 17 Mar 2015 20:28:45 +0000
Subject: [Traminer-users] Method options in seqdist functions: "OMloc"
and others...
In-Reply-To: <1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>
References: <1997679886.2243354.1426596532614.JavaMail.zimbra@upmf-grenoble.fr>
<1953565023.2273114.1426598635194.JavaMail.zimbra@upmf-grenoble.fr>
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC7166C556D@kilo.isis.unige.ch>
Dear Elie,
Unfortunately, we are a little behind on our release plan, because some special cases still need work (such as missing data handling and unequal sequence lengths for some distances/configurations of parameters).
We have decided to make all these distance measures available in a separate package until we can merge them into TraMineR. This is a development package, so please note that you may encounter errors in some cases.
The package is now available as the seqdist2 library here: https://r-forge.r-project.org/R/?group_id=743
More information about the argument and all the distances can be found in
Studer, Matthias, Ritschard, Gilbert, A comparative review of sequence dissimilarity measures, LIVES Working Paper 2014/33
Best,
Matthias
From elie.chosson at upmf-grenoble.fr Wed Mar 18 12:32:31 2015
From: elie.chosson at upmf-grenoble.fr (ELIE CHOSSON)
Date: Wed, 18 Mar 2015 12:32:31 +0100 (CET)
Subject: [Traminer-users] Traminer-users Digest, Vol 44, Issue 12
In-Reply-To:
References:
Message-ID: <1484524036.2838167.1426678351297.JavaMail.zimbra@upmf-grenoble.fr>
Wow!
I'm going to test this development package, with all due caution of course.
Luckily, my data have neither missing values nor sequences of different lengths...
Thank you very much,
Best,
Elie
----- Mail original -----
De: traminer-users-request at lists.r-forge.r-project.org
?: traminer-users at lists.r-forge.r-project.org
Envoy?: Mercredi 18 Mars 2015 12:00:04
Objet: Traminer-users Digest, Vol 44, Issue 12
Send Traminer-users mailing list submissions to
traminer-users at lists.r-forge.r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users
or, via email, send a message with subject or body 'help' to
traminer-users-request at lists.r-forge.r-project.org
You can reach the person managing the list at
traminer-users-owner at lists.r-forge.r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Traminer-users digest..."
Today's Topics:
1. Method options in seqdist functions: "OMloc" and others...
(ELIE CHOSSON)
2. Re: Method options in seqdist functions: "OMloc" and
others... (Matthias Studer)
----------------------------------------------------------------------
Message: 1
Date: Tue, 17 Mar 2015 14:23:55 +0100 (CET)
From: ELIE CHOSSON
To: traminer-users at lists.r-forge.r-project.org
Subject: [Traminer-users] Method options in seqdist functions: "OMloc"
and others...
Message-ID:
<1953565023.2273114.1426598635194.JavaMail.zimbra at upmf-grenoble.fr>
Content-Type: text/plain; charset="utf-8"
Hello TraMineR community!
I work on Optimal Matching Analysis and I'm interested about extensions of the method.
In searching on the Internet, i found that the seqdist function provides methods like "OMloc" for localized optimal matching analysis (like the Hollister proposal), or others methods (duration sensitive OM, etc.).
(For example here)
But this options are not accepted when I use the seqdist function (on the 1.8-9 version of TraMineR package).
I install the development version but the seqdist function still not provide this options.
So my question is: there is these options availables somewhere now?
If not, I would like to create variable indel costs but it seem to be hard to do...
I try to use seqdistmc function with a channel which is my real sequence objet and a new sequence object who give for each position an indicator that taking account neighbouring elements ("0" if all neighbours are others state, "1" if one neighbour is other state and "0" if all neighbours are same state). But I don't think is a good way to do (and it's not conform to the Hollister proposal...)
I thank you for your answers, and I thank a lot all contributors to TraMineR.
Sorry for my english,
Kind regards,
--
Elie Chosson,
Attach? Temporaire ? l'Enseignement et ? la Recherche (A.TE.R),
Centre de Recherche en ?conomie de Grenoble (CREG)
BATEG, Bureau 513, tel: 04 76 82 59 89
BP 47 - 38040 Grenoble Cedex 9
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
------------------------------
Message: 2
Date: Tue, 17 Mar 2015 20:28:45 +0000
From: Matthias Studer
To: Users questions
Subject: Re: [Traminer-users] Method options in seqdist functions:
"OMloc" and others...
Message-ID:
<367AEF503B1B6A4EA602FB66D71A3EC7166C556D at kilo.isis.unige.ch>
Content-Type: text/plain; charset="utf-8"
Dear Elie,
Unfortunately, we are a little behind on our release plan, because some special cases still need work (such as missing data handling and unequal sequence lengths for some distances and parameter configurations).
We have decided to make all these distance measures available in a separate package until we can merge them into TraMineR. This is a development package, so please note that you may encounter errors in some cases.
The package is now available as the seqdist2 library here: https://r-forge.r-project.org/R/?group_id=743
More information about the arguments and all the distances can be found in
Studer, Matthias, Ritschard, Gilbert, A comparative review of sequence dissimilarity measures, LIVES Working Paper 2014/33
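
For readers following along, installing a package from R-Forge usually looks like the sketch below. The package name seqdist2 is taken from the message above; the repository URL is the standard R-Forge one and is an assumption here, as is the package still being built and served there.

```r
# Assumption: the development package is served under the name "seqdist2"
# from the standard R-Forge repository, as the message above suggests.
install.packages("seqdist2", repos = "http://R-Forge.R-project.org")

# Load it alongside TraMineR once the installation has succeeded.
library(TraMineR)
library(seqdist2)
```
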
Best,
Matthias
------------------------------
End of Traminer-users Digest, Vol 44, Issue 12
**********************************************
--
Elie Chosson,
Doctorant contractuel,
BATEG, Bureau 513, tel: 04 76 82 59 89
Centre de Recherche en Économie de Grenoble (CREG)
BP 47 - 38040 Grenoble Cedex 9
From rcant at sse.com.au Tue Mar 31 07:05:13 2015
From: rcant at sse.com.au (Rosemary Cant)
Date: Tue, 31 Mar 2015 13:05:13 +0800
Subject: [Traminer-users] Use of TraMineR in Australia
Message-ID: <551A2B09.2040100@sse.com.au>
I am a PhD student in the School of Population Health at the University
of Western Australia. I propose to use Dynamic Hamming Analysis to
examine the trajectories/careers in the child protection system from
the time of first report of children reported to the Western Australia
statutory authority for possible child sexual abuse. I am keen to make
contact with researchers in Australia who have used sequence analysis
and/or TraMineR.
Rosemary Cant
--
Rosemary Cant Social Systems and Evaluation PO Box 8009 Perth Business
Centre 6849 Telephone: 08 93283086 Email: rcant at sse.com.au