From r.c.m.vanaert at tilburguniversity.edu Sun Mar 3 12:55:17 2013
From: r.c.m.vanaert at tilburguniversity.edu (Robbie Aert, van)
Date: Sun, 3 Mar 2013 12:55:17 +0100
Subject: [Traminer-users] Script for computing z-values for transition probabilities
Message-ID:

Dear TraMineR users,

In collaboration with Bertolt Meyer, I wrote a script for computing z-values for transition probabilities within the TraMineR package. The formulas described in the book by Bakeman and Gottman (1997, p. 108-111) were used as the starting point for the newly developed function. The output of this function is similar to the output of the seqtrate() function; the only difference is that the transition probabilities are replaced by z-values. Below and attached to this email you can find the script.

Best greetings,
Robbie van Aert

Reference:
Bakeman, R., & Gottman, J.M. (1997). Observing interaction: An introduction to sequential analysis (2nd ed.). Cambridge, UK: Cambridge University Press.

################################################ Script ################################################

seqtrate.z <- function(seq_obj) {

  seq_mat <- as.matrix(seq_obj)   # reformat the sequence object to a matrix

  ### Cross-tabulate each state (all columns but the last) with the state
  ### that follows it (all columns but the first).
  tr <- table(c(seq_mat[, -ncol(seq_mat)]), c(seq_mat[, -1]))

  ### Keep rows and columns 3 to 14 of the table only (hard-coded).
  transi <- tr[3:14, 3:14]

  ### Number of loops are number of rows in matrix.
  n_loops <- nrow(transi)

  ### Empty arrays in which the data is going to be stored.
  res.Xg  <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Xt  <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Xgt <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Mgt <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Pg  <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Pt  <- array(NA, dim = c(nrow(transi), ncol(transi)))

  ### Loops that are used for filling the arrays.
  for (i in 1:n_loops) {
    for (k in 1:n_loops) {
      res.Xg[i,k]  <- sum(transi[i, ])                                # Xg  = sum of observations in Gth row
      res.Xt[i,k]  <- sum(transi[ ,k])                                # Xt  = sum of observations in Tth column
      res.Xgt[i,k] <- transi[i,k]                                     # Xgt = observed freq
      res.Mgt[i,k] <- ((res.Xg[i,k] * res.Xt[i,k]) / sum(transi))     # Mgt = expected freq based on data
      res.Pg[i,k]  <- res.Xg[i,k] / sum(transi)                       # Pg  = Xg divided by total N
      res.Pt[i,k]  <- res.Xt[i,k] / sum(transi)                       # Pt  = Xt divided by total N
    }
  }

  ### Formula for z-values.
  res.Zgt <- round((res.Xgt - res.Mgt) / (sqrt(res.Mgt*(1 - res.Pg)*(1 - res.Pt))),3)

  ### Transition rates from seqtrate(); only their row and column names are used.
  tr.rates <- suppressMessages(round(seqtrate(seq_obj), 2))

  ### Assign column and row names to matrix.
  column <- colnames(tr.rates)
  colnames(res.Zgt) <- column
  row <- rownames(tr.rates)
  rownames(res.Zgt) <- row

  return(res.Zgt)
}

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Function seqtrate.z().R
Type: application/octet-stream
Size: 2475 bytes
Desc: not available
URL:

From Matthias.Studer at unige.ch Wed Mar 6 13:49:49 2013
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Wed, 6 Mar 2013 12:49:49 +0000
Subject: [Traminer-users] WeightedCluster: a new library for (sequences) clustering in R
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC7B0FF7E@kilo.isis.unige.ch>

Dear TraMineR Users,

I am pleased to announce the first official release of the WeightedCluster R library. This library greatly facilitates the clustering of state sequences and, more generally, of weighted data. The main functionalities of this library include:

* Aggregation of identical sequences (in order to save memory and cluster a larger number of sequences).
* Computation of several clustering quality measures.
* Methods facilitating the choice of the number of groups and of the clustering algorithm based on cluster quality measures.
* Clustering of weighted data using a distance matrix (for instance, using sampling weights or aggregated sequences).
* An optimized PAM clustering algorithm.
* Graphical representation of hierarchical clustering of state sequences (you need to install GraphViz http://www.graphviz.org before launching R).

The library comes with the "WeightedCluster Library Manual: A practical guide to creating typologies of trajectories in the social sciences with R", also available in French. Aside from presenting the library, this manual discusses several important issues that arise when clustering state sequences (or any other objects) in the social sciences, such as cluster validation and the usual sociological assumptions.

A short script (that can be easily reproduced) illustrating the functionalities of the library is available at the WeightedCluster website, http://mephisto.unige.ch/weightedcluster/, and is reproduced below.

The library can be installed with the following commands (R version 2.15 or higher is mandatory):

install.packages("WeightedCluster")
library(WeightedCluster)

## To get the manuals, please run:
vignette("WeightedCluster")        ## complete manual in English
vignette("WeightedCluster-fr")     ## complete manual in French
vignette("WeightedClusterPreview") ## short preview in English

Any comments, suggestions or bug reports are very welcome.

Kind regards,
Matthias Studer

## Loading the library
library(WeightedCluster)

## Loading the mvad dataset
data(mvad)

## Aggregating identical sequences
aggMvad <- wcAggregateCases(mvad[, 17:86])
print(aggMvad)
uniqueMvad <- mvad[aggMvad$aggIndex, 17:86]

## Defining the state sequence object
mvad.seq <- seqdef(uniqueMvad, weights=aggMvad$aggWeights)

## Computing the Hamming distance between sequences
diss <- seqdist(mvad.seq, method="HAM")

## Clustering the sequences using "average" hierarchical clustering
## Here, we need to set the weights (members argument) to account for the aggregation of identical sequences
averageClust <- hclust(as.dist(diss), method="average", members=aggMvad$aggWeights)

## Representing the hierarchical clustering as a tree
averageTree <- as.seqtree(averageClust, seqdata=mvad.seq, diss=diss, ncluster=6)

## Graphical representation of the tree (you need to have GraphViz installed before launching R)
seqtreedisplay(averageTree, type="d", border=NA, showdepth=TRUE)

## Compute several clustering quality measures for partitions in 2, 3, 4, ... 10 groups.
avgClustQual <- as.clustrange(averageClust, diss, weights=aggMvad$aggWeights, ncluster=10)

## Plot the evolution of the clustering quality according to the number of clusters.
plot(avgClustQual)

## The same, but using normalized values.
plot(avgClustQual, norm="zscore")

## Print the two best numbers of groups according to each quality measure
summary(avgClustQual, max.rank=2)

## Compute PAM clustering and cluster quality measures for different numbers of groups (ranging from 2 to 10)
pamClustRange <- wcKMedRange(diss, kvals=2:10, weights=aggMvad$aggWeights)

## Print the two best numbers of groups according to each quality measure for the PAM clustering
summary(pamClustRange, max.rank=2)

## The best clustering was found using average clustering in 5 groups according to ASW (average silhouette width)
seqdplot(mvad.seq, group=avgClustQual$clustering$cluster5, border=NA)

## Clustering was made on distinct sequences
## Recover the clustering solution in the original (full) dataset
uniqueCluster5 <- avgClustQual$clustering$cluster5
mvad$cluster5 <- uniqueCluster5[aggMvad$disaggIndex]

## Compute the association between clustering and father unemployment
chisq.test(table(mvad$funemp, mvad$cluster5))

---
Matthias Studer
Institut d'études démographiques et du parcours de vie
et Département des sciences économiques
Uni-Mail, bureau 5205
40, bd du Pont d'Arve
1211 Genève 4
Tel: +41 22 379 82 15
Fax: +41 22 379 82 99

From r.c.m.vanaert at tilburguniversity.edu Thu Mar 7 16:20:11 2013
From: r.c.m.vanaert at tilburguniversity.edu (Robbie Aert, van)
Date: Thu, 7 Mar 2013 16:20:11 +0100
Subject: [Traminer-users] Script for computing z-scores for transition probabilities
Message-ID:

Dear TraMineR users,

A couple of days ago, I sent an email with a newly developed function for computing z-scores for transition probabilities. However, the function did not handle missing data correctly. I apologize for this; the adjusted script is below.

Kind regards,
Robbie van Aert

################################################ SCRIPT ################################################

seqtrate.z <- function(seq_obj) {

  ### Transition rates from seqtrate(); only their row and column names are used.
  tr.rates <- suppressMessages(round(seqtrate(seq_obj), 2))

  seq_mat <- as.matrix(seq_obj)   # reformat the sequence object to a matrix

  ### Cross-tabulate each state (all columns but the last) with the state
  ### that follows it (all columns but the first).
  tr <- table(c(seq_mat[, -ncol(seq_mat)]), c(seq_mat[, -1]))

  ### Check whether there are missings in the matrix.
  ### The codes '%' and '*' are dropped by position, which assumes that
  ### they are the first row(s) and column(s) of the table.
  if ('%' %in% colnames(tr)) {
    tr.rev <- tr[-1,-1]
  } else {
    tr.rev <- tr
  }

  if ('*' %in% colnames(tr)) {
    tr.rev <- tr.rev[-1,-1]
  }

  ### Number of loops are number of rows in matrix.
  n_loops <- nrow(tr.rev)

  ### Empty arrays in which the data is going to be stored.
  res.Xg  <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Xt  <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Xgt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Mgt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Pg  <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Pt  <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))

  ### Loops that are used for filling the arrays.
  for (i in 1:n_loops) {
    for (k in 1:n_loops) {
      res.Xg[i,k]  <- sum(tr.rev[i, ])                                # Xg  = sum of observations in Gth row
      res.Xt[i,k]  <- sum(tr.rev[ ,k])                                # Xt  = sum of observations in Tth column
      res.Xgt[i,k] <- tr.rev[i,k]                                     # Xgt = observed freq
      res.Mgt[i,k] <- ((res.Xg[i,k] * res.Xt[i,k]) / sum(tr.rev))     # Mgt = expected freq based on data
      res.Pg[i,k]  <- res.Xg[i,k] / sum(tr.rev)                       # Pg  = Xg divided by total N
      res.Pt[i,k]  <- res.Xt[i,k] / sum(tr.rev)                       # Pt  = Xt divided by total N
    }
  }

  ### Formula for z-values.
  res.Zgt <- round((res.Xgt - res.Mgt) / (sqrt(res.Mgt*(1 - res.Pg)*(1 - res.Pt))),3)

  ### Assign column and row names to matrix.
  column <- colnames(tr.rates)
  colnames(res.Zgt) <- column
  row <- rownames(tr.rates)
  rownames(res.Zgt) <- row

  return(res.Zgt)
}
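
The z-value formula used in seqtrate.z() above, (Xgt - Mgt) / sqrt(Mgt * (1 - Pg) * (1 - Pt)), can also be written without the double loop. The following is only a minimal sketch in base R, not part of TraMineR or of the posted script: the helper name z_from_counts() is hypothetical, and the count table is made up for illustration. It starts from a table of transition counts, so it does not cover the construction of that table or the handling of missing states.

```r
## Toy table of transition counts (rows = "from" state, columns = "to" state).
## The counts are invented for illustration only.
counts <- matrix(c(20,  5, 10,
                    8, 30,  4,
                    6,  9, 25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(from = c("A", "B", "C"),
                                 to   = c("A", "B", "C")))

## Hypothetical helper (not a TraMineR function): z-values from a count table,
## using the same quantities as the loop-based script.
z_from_counts <- function(x) {
  n   <- sum(x)               # total number of transitions (N)
  xg  <- rowSums(x)           # Xg: sum of observations in each row
  xt  <- colSums(x)           # Xt: sum of observations in each column
  mgt <- outer(xg, xt) / n    # Mgt: expected frequencies
  pg  <- xg / n               # Pg: Xg divided by total N
  pt  <- xt / n               # Pt: Xt divided by total N
  ## outer(1 - pg, 1 - pt) holds (1 - Pg) * (1 - Pt) for every cell.
  (x - mgt) / sqrt(mgt * outer(1 - pg, 1 - pt))
}

z <- z_from_counts(counts)
print(round(z, 3))
```

For a complete table of counts such as this one, these values should agree with the standardized residuals returned by base R's chisq.test(counts)$stdres, which gives a quick way to cross-check the result.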