[Traminer-users] Event sequence analysis

Wed Mar 9 12:44:33 CET 2016

Hi everyone
For my PhD research on learners' self-regulation in online and blended learning environments, I am trying to gain more insight into the events, and the sequences of events, that occur when learners use a Moodle learning environment. The dataset consists of 80 learners (id), 60,000 lines of timestamped events (events) and 40 different event types (variables). I'm particularly interested in the "seqefsub" and "seqecmpgroup" functions.
I ran some tests on my data using these functions, and everything works fine if I use the command below:
fsubseq <- seqefsub(Data.seqe, pMinSupport = 0.05, maxK=2)
As soon as I increase pMinSupport and/or maxK, R (3.2.1-3) crashes. I assumed this was caused by a shortage of RAM and the limited capacity of my computer, so I turned to an HPC solution. If I run the code shown above, it works fine on the HPC as well. If I change the seqefsub parameters to maxK=4 (sequences of 4 events), leave pMinSupport at 0.05, and run it on the HPC with 20 GB of RAM for 10:00:00 hours, no results are produced because the job is killed when that time runs out. Since 10 hours with 20 GB of RAM is quite a lot of computing power, I find it strange that R also seems to stall there. This matters all the more because I ultimately want to run the "optimal" script, which covers (more or less) every possible event sequence, as shown in "rscript.R" below.
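A rough count may help explain why the runtime grows so quickly. It assumes the minimum support equals pMinSupport times the number of sequences (here 80 learners). It also counts only subsequences built from single events and ignores simultaneous events, so the candidate count below is a lower bound:

\[ \text{minSupport} = \text{pMinSupport} \times 80: \qquad 0.05 \times 80 = 4, \qquad 0.01 \times 80 = 0.8 \]
\[ \#\{\text{possible distinct } k\text{-event subsequences}\} \ge 40^{k}: \qquad 40^{2} = 1\,600, \qquad 40^{4} = 2\,560\,000 \]

With on average 750 events per learner (60,000 / 80), a threshold of 4 learners, or effectively a single learner at pMinSupport = 0.01, may let a large share of these candidates pass. The script below also sets no maxK.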
I have looked at the problem from different angles, within limits: I know something about statistics and computers, but not as much as the experts. I seem to be stuck for the moment, so I am contacting you.
I was wondering whether you (or researchers you know) have any experience with (1) running TraMineR on this kind of (large?) time-stamped event dataset, and (2) running it on a high-performance cluster like this one?
Thank you very much in advance! If any additional information is needed, please let me know!
Best wishes
Stijn Van Laer
The my.calc.PBS file used to submit the job on the HPC looks like this:
#!/bin/bash -l
module load R
cd $PBS_O_WORKDIR
Rscript --vanilla rscript.R
The rscript.R file run by R looks like this:
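#!/usr/bin/Rscript --vanilla --slave
library(TraMineR)
# Event log: one row per timestamped event (columns Id, Timestamp, Event)
Data<-read.csv("sequence_data_cvod.csv",header=TRUE)
# Cluster labels: expected to hold one label per learner, in the same order as the sequences
Labels<-read.csv("clust_cvod.csv",header=TRUE)
# Build the event sequence object (one sequence per learner)
Data.seqe<-seqecreate(Data,id=Data$Id,timestamp=Data$Timestamp,event=Data$Event)
# Frequent subsequences found in at least 1% of the sequences;
# maxK is not set, so subsequences of any length are searched
fsubseq <- seqefsub(Data.seqe, pMinSupport = 0.01)
sink("fsubseq.txt", split=TRUE)
fsubseq
sink()
# Subsequences that discriminate between the clusters (chi-squared test)
discrcohort <- seqecmpgroup(fsubseq, group = Labels$cluster, pvalue.limit=0.0005,weighted=TRUE,method = "chisq")
sink("discrcohort.txt", split=TRUE)
discrcohort
sink()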


Centrum voor Instructiepsychologie en -Technologie | Center for Instructional Psychology and Technology
Dekenstraat 2 | Post box 3773 | B-3000 Leuven | Room 05.69

Tel.: (+32) (0)16 32 82 76
E-mail: Stijn.vanlaer at ppw.kuleuven.be
LinkedIn: https://www.linkedin.com/in/stijnvanlaer
Twitter: https://twitter.com/Stijn_Van_Laer
Url: http://ppw.kuleuven.be/home/english/research/etrg/CIPT