[Traminer-users] clustering event sequence data

Wed Jul 11 11:05:49 CEST 2012

This is perfect. Many thanks Hugo!

Kind regards,

Mat

From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf Of Hugo Varet
Sent: 09 July 2012 18:33
To: Users questions
Subject: Re: [Traminer-users] clustering event sequence data

Hello Mat,

A few months ago I wanted to cluster event sequences, and Matthias Studer told me how to do this with the TraMineRextras package. I think you can find his message in the archives of the mailing list (he sent it on March 16th, 2012).

To extract association rules from event sequences and to get the corresponding hazard ratios, you have to use the seqerulesdisc function available in the same package.
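
Regarding the second question (maximal subsequences): I am not sure whether there is a built-in option for this, so what follows is only a minimal sketch of a post-filter in base R, not a TraMineR feature. It assumes the frequent subsequences are available as character strings in the printed form shown in the listing below, with exactly one event per parenthesised step. The helper names parse_subseq, is_subseq and maximal_only are made up for illustration.

```r
# Split "(assault)-(rape)" into c("assault", "rape").
# Assumption: one event per parenthesised step, steps joined by "-".
parse_subseq <- function(s) {
  gsub("[()]", "", strsplit(s, ")-(", fixed = TRUE)[[1]])
}

# TRUE if the events in `a` occur in `b` in the same order
# (not necessarily contiguously).
is_subseq <- function(a, b) {
  pos <- 1
  for (ev in b) {
    if (pos <= length(a) && identical(ev, a[pos])) pos <- pos + 1
  }
  pos > length(a)
}

# Keep only the subsequences that are not contained in a longer one
# from the same list.
maximal_only <- function(subs) {
  parsed <- lapply(subs, parse_subseq)
  keep <- vapply(seq_along(parsed), function(i) {
    !any(vapply(seq_along(parsed), function(j) {
      j != i &&
        length(parsed[[j]]) > length(parsed[[i]]) &&
        is_subseq(parsed[[i]], parsed[[j]])
    }, logical(1)))
  }, logical(1))
  subs[keep]
}

# The ten subsequences from the listing in the original message.
top10 <- c("(assault)", "(child molestation)", "(rape)", "(theft)", "(B&E)",
           "(noncontact SO)", "(public order)", "(alcohol)",
           "(assault)-(assault)", "(assault)-(rape)")
print(maximal_only(top10))
```

On those ten subsequences the filter drops (assault) and (rape), because both are contained in (assault)-(rape), and keeps the other eight. Note that it only compares the strings it is given, so the result depends on the minimum support used to extract them.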

Hope this helps, best regards,

Hugo

2012/7/9 Weldon, Mat <m.weldon at lancaster.ac.uk>
Hello,

I'm doing a project with a set of criminal histories (i.e. lists of age-stamped offences). Here is an example, where oftype is the type of crime and sid is the subject ID:
        sid        oftype   age
5556.1 5556           B&E    18
5556.2 5556 motor vehicle    18
5556.3 5556 motor vehicle    18
5556.4 5556           B&E    22
5556.5 5556       alcohol    24
5556.6 5556 miscellaneous    29

Since these are events, I'm using the event methods in TraMineR to analyse them. I've created a seqe object, and run a frequent sub-sequence analysis. Here is the top 10:

           Subsequence   Support Count
1            (assault) 0.6261261   417
2  (child molestation) 0.6246246   416
3               (rape) 0.5000000   333
4              (theft) 0.4429429   295
5                (B&E) 0.4159159   277
6      (noncontact SO) 0.3963964   264
7       (public order) 0.3858859   257
8            (alcohol) 0.3183183   212
9  (assault)-(assault) 0.3018018   201
10    (assault)-(rape) 0.2882883   192

Computed on 666 event sequences
  Constraint Value
 countMethod  COBJ

I'd like to compute clusters of sequences, either using agnes or pam algorithms, and then run a discriminating sequence analysis on the clusters (as demonstrated by Studer et al. 2010). However, I'm a bit stuck and I haven't been able to find any help in the documentation. I have a few questions:

1. Is there a function for computing dissimilarity measures, like seqdist, that works with event sequences? Something that I can feed into a clustering algorithm? I don't know how Studer et al. did it because no code was provided.

2. Is there a way to constrain frequent subsequences to be maximal, in the sense that if "(assault)-(assault)" is frequent then "(assault)" will not be listed, for example?

3. Is there a way to calculate association rules for sequences using a hazard ratio measure similar to that described in Muller et al. (2010)?

Many thanks in advance. Best wishes,

Mat

Mat Weldon
Department of Mathematics and Statistics
Room B18, Fylde College
Lancaster University
Lancaster, LA1 4YF
Tel: 07929 310475
Email: m.weldon at lancaster.ac.uk

_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users
