[Traminer-users] p-values for sequential association rules?

Gilbert Ritschard Gilbert.Ritschard at unige.ch
Sun Mar 15 16:00:35 CET 2015


Hi Aron,

There are several papers in the literature that are devoted to the comparison between interestingness measures of association rules. See for example Lenca et al. (2006)
http://www.sciencedirect.com/science/article/pii/S0377221706011465 . On page 619 of that paper there is something about the significance of the lift.

Best,
Gilbert



From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Aron Lindberg
Sent: Saturday, March 14, 2015 01:21
To: Users questions
Cc: Users questions
Subject: Re: [Traminer-users] p-values for sequential association rules?

Does that then mean that for a rule to really be considered as interesting, it needs both a lift above 1 and a p-value below 0.05? Or could a rule with a lift above 1 and an insignificant p-value still be of value?

--
Aron Lindberg

Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io


On Thu, Mar 12, 2015 at 12:42 AM, Gilbert Ritschard <Gilbert.Ritschard at unige.ch> wrote:
Lift and implication strength are not equivalent, and can well differ as you have observed.


Lift measures whether the chance of observing the conclusion increases when the condition holds, while the implication strength measures whether the number of counter-examples decreases when the condition holds. A significant p-value means that the latter decrease is significant.
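To illustrate how the two measures can disagree, here is a minimal sketch. It assumes the classical Gras implication index, (observed counter-examples - expected counter-examples) / sqrt(expected counter-examples), which may not be exactly what TraMineR computes for ImplicStat. The counts are hypothetical and are not taken from the output in this thread; the function name lift_and_implication is likewise only illustrative.

```python
import math

def normal_cdf(z):
    """P(Z < z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lift_and_implication(n, n_a, n_b, n_ab):
    """Lift and (assumed) Gras implication index for a rule A => B.

    n    : total number of cases
    n_a  : cases where the condition A holds
    n_b  : cases where the conclusion B holds
    n_ab : cases where both A and B hold
    """
    lift = (n_ab / n_a) / (n_b / n)
    observed = n_a - n_ab             # counter-examples: A without B
    expected = n_a * (n - n_b) / n    # counter-examples expected under independence
    index = (observed - expected) / math.sqrt(expected)
    return lift, index, normal_cdf(index)

# Hypothetical counts: same proportions, ten times more cases in the second call.
small = lift_and_implication(n=100, n_a=20, n_b=50, n_ab=12)
large = lift_and_implication(n=1000, n_a=200, n_b=500, n_ab=120)

for label, (lift, index, p) in (("small", small), ("large", large)):
    print(f"{label}: lift = {lift:.2f}, index = {index:.3f}, p = {p:.4f}")
```

In this sketch the lift is the same for both samples, because it depends only on proportions. The p-value falls below .05 only for the larger sample, because the index also depends on how many counter-examples are expected.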


Best.
Gilbert


From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Aron Lindberg
Sent: Thursday, March 12, 2015 02:18
To: Users questions
Cc: Users questions
Subject: Re: [Traminer-users] p-values for sequential association rules?


Thanks Gilbert,


That’s very helpful! How do I reconcile this with the interpretation of the lift? For example, I have many rules where the lift is below 1 (so, judged by lift alone, they are of no interest) but where the p-value is significant. I also have cases showing the reverse: lift > 1 but insignificant p-values. Is it the case that the lift is similar to a coefficient with some error around it, so that even large lift values can turn out to be insignificant?


Best,
Aron


--
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io



On Wed, Mar 11, 2015 at 3:09 PM, Gilbert Ritschard <Gilbert.Ritschard at unige.ch> wrote:
Hi Aron,


The p-value is that of the implication strength (ImplicStat). This criterion is a z value: the lower ImplicStat is (the larger its negative value), the greater the implication strength of the rule. Your first two rules have an ImplicStat of about -1, and P(Z<-1) is about .16 for a standard normal distribution. Likewise, for the 3rd rule, ImplicStat is .08, and P(Z<.08) = .53. When the p-value is less than .05, the rule has a significant implicative strength.
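As a quick check of this reading, the sketch below recomputes P(Z < ImplicStat) from the ImplicStat values in the output quoted further down, using only Python's standard library. It assumes nothing beyond the standard normal distribution described above; the helper name normal_cdf is just for illustration.

```python
import math

def normal_cdf(z):
    """P(Z < z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# ImplicStat values copied from the output quoted in this thread
implic_stats = {
    "(I1) => (I2)": -0.9770084,
    "(I2) => (I1)": -0.9770084,
    "(I2) => (I3)": 0.0805823,
}

# Lower-tail probability: very negative statistics give small p-values
p_values = {rule: normal_cdf(z) for rule, z in implic_stats.items()}

for rule, p in p_values.items():
    print(f"{rule}: ImplicStat = {implic_stats[rule]:+.4f}, p = {p:.4f}")
```

The computed probabilities should agree with the p.value column of the output, up to rounding.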


For more explanation see for example,
Ritschard, G., V. Pisetta and D.A. Zighed (2008), Inducing and evaluating classification trees with statistical implicative criteria, in R. Gras, E. Suzuki, F. Guillet and F. Spagnolo (eds), Statistical Implicative Analysis: Theory and Applications, Series Studies in Computational Intelligence, Vol. 127. Berlin: Springer, 397-420.
Preprint: http://mephisto.unige.ch/pub/publications/gr/ritsch-pisetta-zighed_bookGras_final_plain.pdf


Best,
Gilbert


From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Aron Lindberg
Sent: Wednesday, March 11, 2015 16:29
To: traminer-users at lists.r-forge.r-project.org
Subject: [Traminer-users] p-values for sequential association rules?


Hi,


When running sequential association rules I get several values in the output:


         Rules Support      Conf      Lift Standardlift   JMeasure ImplicStat   p.value
1 (I1) => (I2)      15 0.7142857 0.7482993   -1.1607143 0.45894146 -0.9770084 0.1642825
2 (I2) => (I1)      17 0.8095238 0.8480726   -0.4404762 0.20127953 -0.9770084 0.1642825
3 (I2) => (I3)      17 0.8095238 0.9373434    0.5193644 0.01626898  0.0805823 0.5321129


From here I understand what support, confidence, and lift are:
http://stackoverflow.com/questions/27947556/traminerseqerules-help-page


However, what does the p-value mean? It seems to be highly and negatively correlated with the confidence, but at the same time I have many sequences with combinations of high support, confidence, and lift that are still insignificant.


Hence: what does the p-value pertain to? Can the rules be meaningfully interpreted even when the p-value is insignificant?


Best,
Aron


--
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io


