From r.c.m.vanaert at tilburguniversity.edu  Sun Mar  3 12:55:17 2013
From: r.c.m.vanaert at tilburguniversity.edu (Robbie Aert, van)
Date: Sun, 3 Mar 2013 12:55:17 +0100
Subject: [Traminer-users] Script for computing z-values for transition
	probabilities
Message-ID: <CAFgFF7o3A01Ns=2jmXJxknYpU0ensyD66zLfDeW-QHpf=ePi1A@mail.gmail.com>

Dear TraMineR users,

In collaboration with Bertolt Meyer, I wrote a script for computing
z-values for transition probabilities within the TraMineR package. The
formulas described in the book by Bakeman and Gottman (1997, pp. 108-111)
were used as starting point for the newly developed function. The output
that can be obtained with this function is similar to the output of the
seqtrate() function; the only difference is that the transition
probabilities are now replaced by z-values.
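
Written out, and using the notation of the comments in the script below, the
z-value for a transition from a given state g to a target state t can be
stated as follows. This is a transcription of what the code computes, not a
quotation from the book; N denotes the total number of observed transitions.

```latex
z_{gt} = \frac{x_{gt} - m_{gt}}{\sqrt{m_{gt}\,(1 - p_g)\,(1 - p_t)}},
\qquad m_{gt} = \frac{x_{g}\,x_{t}}{N},
\quad p_g = \frac{x_{g}}{N},
\quad p_t = \frac{x_{t}}{N}
```

Here x_gt is the observed count of g-to-t transitions, x_g the total of row g,
and x_t the total of column t.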

Below and attached to this email you can find the script.

Best greetings,
Robbie van Aert

Reference:

Bakeman, R., & Gottman, J.M. (1997). Observing interaction: An introduction
to sequential analysis (2nd ed.). Cambridge, UK: Cambridge University Press.

################################################ Script ################################################

seqtrate.z <- function(seq_obj) {
  matrix <- as.matrix(seq_obj)                            # reformat to matrix
  tr <- table(c(matrix[,-ncol(matrix)]), c(matrix[,-1]))
  transi <- tr[c(3:14), c(3:14)]   # hard-coded selection of rows/columns 3 to 14; adjust for other data

  ### The number of loop iterations equals the number of rows in the matrix.
  n_loops <- nrow(transi)

  ### Empty arrays in which the data is going to be stored.
  res.Xg <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Xt <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Xgt <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Mgt <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Pg <- array(NA, dim = c(nrow(transi), ncol(transi)))
  res.Pt <- array(NA, dim = c(nrow(transi), ncol(transi)))

  ### Loops that are used for filling the arrays.
  for (i in 1:n_loops) {
    for (k in 1:n_loops) {
      res.Xg[i,k] <- sum(transi[i, ])                               # Xg = sum of observations in Gth row
      res.Xt[i,k] <- sum(transi[ ,k])                               # Xt = sum of observations in Tth column
      res.Xgt[i,k] <- transi[i,k]                                   # Xgt = observed freq
      res.Mgt[i,k] <- ((res.Xg[i,k] * res.Xt[i,k]) / sum(transi))   # Mgt = expected freq based on data
      res.Pg[i,k] <- res.Xg[i,k] / sum(transi)                      # Pg = Xg divided by total N
      res.Pt[i,k] <- res.Xt[i,k] / sum(transi)                      # Pt = Xt divided by total N
    }
  }
  ### Formula for z-values.
  res.Zgt <- round((res.Xgt - res.Mgt) /
                   sqrt(res.Mgt * (1 - res.Pg) * (1 - res.Pt)), 3)

  ### Assign column and row names to matrix (taken from the seqtrate() output).
  tr.rates <- suppressMessages(seqtrate(seq_obj))
  colnames(res.Zgt) <- colnames(tr.rates)
  rownames(res.Zgt) <- rownames(tr.rates)

  return(res.Zgt)
}
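
The double loop above can also be written without loops. The following is a
minimal base-R sketch of the same formula, not part of TraMineR: the helper
name z_from_counts() and the toy count table are invented for illustration,
and the input is assumed to be a complete table of transition counts with
rows as the "given" state and columns as the "target" state.

```r
# Hypothetical helper (not part of TraMineR): z-values from a table of
# transition counts, rows = "given" state, columns = "target" state.
z_from_counts <- function(counts) {
  counts <- unclass(as.matrix(counts))
  n  <- sum(counts)        # total number of transitions (N)
  xg <- rowSums(counts)    # Xg: row totals
  xt <- colSums(counts)    # Xt: column totals
  m  <- outer(xg, xt) / n  # Mgt: expected frequencies
  z  <- (counts - m) / sqrt(m * outer(1 - xg / n, 1 - xt / n))
  dimnames(z) <- dimnames(counts)
  round(z, 3)
}

# Toy counts, invented for illustration only.
toy <- matrix(c(10, 5, 2,
                 4, 8, 6,
                 3, 7, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
z_toy <- z_from_counts(toy)
print(z_toy)
```

For a complete count table these values should coincide, up to rounding, with
the adjusted standardized residuals that base R returns as
chisq.test(toy)$stdres, which gives a convenient cross-check.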
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Function seqtrate.z().R
Type: application/octet-stream
Size: 2475 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/traminer-users/attachments/20130303/2a273784/attachment.obj>

From Matthias.Studer at unige.ch  Wed Mar  6 13:49:49 2013
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Wed, 6 Mar 2013 12:49:49 +0000
Subject: [Traminer-users] WeightedCluster: a new library for (sequences)
	clustering in R
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC7B0FF7E@kilo.isis.unige.ch>

Dear TraMineR Users,

I am pleased to announce the first official release of the WeightedCluster R library. This library greatly facilitates the clustering of state sequences and, more generally, of weighted data. The main functionalities of this library include:

  *   Aggregation of identical sequences (in order to save memory and cluster a larger number of sequences).
  *   Computation of several clustering quality measures.
  *   Methods facilitating the choice of the number of groups and of the clustering algorithm based on cluster quality measures.
  *   Clustering of weighted data using a distance matrix (for instance, using sampling weights or aggregated sequences).
  *   An optimized PAM clustering algorithm.
  *   Graphical representation of hierarchical clusterings of state sequences (you need to install GraphViz, http://www.graphviz.org, before launching R).
The library comes with the "WeightedCluster Library Manual: A practical guide to creating typologies of trajectories in the social sciences with R" <http://mephisto.unige.ch/weightedcluster/WeightedCluster.pdf>, also available in French <http://mephisto.unige.ch/weightedcluster/WeightedCluster-fr.pdf>. Aside from presenting the library, this manual discusses several important issues that arise when clustering state sequences (or any other objects) in the social sciences, such as cluster validation and the usual sociological assumptions.
A short, easily reproducible script illustrating the functionalities of the library is available on the WeightedCluster website, http://mephisto.unige.ch/weightedcluster/, and is reproduced after the signature of this message.

The library can be installed with the following commands (R version 2.15 or higher is required):
install.packages("WeightedCluster")
library(WeightedCluster)
## To get the manuals, please run:
   vignette("WeightedCluster") ## complete manual in English
   vignette("WeightedCluster-fr") ## complete manual in French
   vignette("WeightedClusterPreview") ## short preview in English

Any comments, suggestions or bug reports are very welcome.

Kind regards,
Matthias Studer

## Loading the library
library(WeightedCluster)

## Loading the mvad dataset
data(mvad)

## Aggregating identical sequences
aggMvad <- wcAggregateCases(mvad[, 17:86])
print(aggMvad)
uniqueMvad <- mvad[aggMvad$aggIndex, 17:86]

## defining the state sequence object
mvad.seq <- seqdef(uniqueMvad, weights=aggMvad$aggWeights)
## Computing Hamming distances between sequences
diss <- seqdist(mvad.seq, method="HAM")

## Clustering the sequences using "average" hierarchical clustering
## Here, we need to set the weights (members argument) to account for identical sequence aggregation
averageClust <- hclust(as.dist(diss), method="average", members=aggMvad$aggWeights)

## Representing the hierarchical clustering as a tree
averageTree <- as.seqtree(averageClust, seqdata=mvad.seq, diss=diss, ncluster=6)
## Graphical representation of the tree (you need to have Graphviz installed before launching R)
seqtreedisplay(averageTree, type="d", border=NA,  showdepth=TRUE)

## Compute several clustering quality measures for partitions in 2, 3, 4, ... 10 groups.
avgClustQual <- as.clustrange(averageClust, diss, weights=aggMvad$aggWeights, ncluster=10)

## Plot the evolution of the clustering quality according to number of clusters.
plot(avgClustQual)

## The same, but using normalized values.
plot(avgClustQual, norm="zscore")

## Print the two best numbers of groups according to each quality measure
summary(avgClustQual, max.rank=2)

## Compute PAM clustering and cluster quality measure for different number of groups (ranging from 2 to 10)
pamClustRange <- wcKMedRange(diss, kvals=2:10, weights=aggMvad$aggWeights)

## Print the two best numbers of groups according to each quality measure for the PAM clustering
summary(pamClustRange, max.rank=2)

## The best clustering was found using average clustering in 5 groups according to ASW (average silhouette width)
seqdplot(mvad.seq, group=avgClustQual$clustering$cluster5, border=NA)

## Clustering was made on distinct sequences
## Recover the clustering solution in the original (full) dataset
uniqueCluster5 <- avgClustQual$clustering$cluster5
mvad$cluster5 <- uniqueCluster5[aggMvad$disaggIndex]

## Compute association between clustering and father unemployment
chisq.test(table(mvad$funemp, mvad$cluster5))
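
## What aggregation and disaggregation do can be sketched in base R.
## This is only an illustration on invented toy data, not how
## wcAggregateCases() is implemented; the variable names merely mirror the
## roles of aggIndex, aggWeights and disaggIndex used above.

```r
# Toy data, invented for illustration: five cases, three distinct sequences.
toy <- data.frame(t1 = c("A", "B", "A", "C", "B"),
                  t2 = c("A", "B", "A", "C", "B"),
                  stringsAsFactors = FALSE)

# One key per case; identical rows share the same key.
key <- do.call(paste, c(toy, sep = "\r"))

# First occurrence of each distinct row.
agg_index <- which(!duplicated(key))
# Number of cases represented by each distinct row.
agg_weights <- as.vector(table(factor(key, levels = key[agg_index])))
# Position of each original case among the distinct rows.
disagg_index <- match(key, key[agg_index])

unique_toy <- toy[agg_index, ]

# Pretend the three distinct rows were assigned to these clusters,
# then carry the assignment back to all five original cases.
unique_cluster <- c(1, 2, 1)
toy$cluster <- unique_cluster[disagg_index]
print(toy)
```

## The weights sum to the number of original cases, which is why they can be
## passed to the clustering and quality functions in place of the duplicates.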


---
Matthias Studer
Institut d'études démographiques et du parcours de vie
et Département des sciences économiques
Uni-Mail, bureau 5205
40, bd du Pont d'Arve
1211 Genève 4
Tel: +41 22 379 82 15
Fax: +41 22 379 82 99


From r.c.m.vanaert at tilburguniversity.edu  Thu Mar  7 16:20:11 2013
From: r.c.m.vanaert at tilburguniversity.edu (Robbie Aert, van)
Date: Thu, 7 Mar 2013 16:20:11 +0100
Subject: [Traminer-users] Script for computing z-scores for transition
	probabilities
Message-ID: <CAFgFF7qS56aN5JBw7GM0Ysj33hUuP4s3hUAtxD_dBna-hgLU+w@mail.gmail.com>

Dear TraMineR users,

A couple of days ago, I sent an email with a newly developed function for
computing z-scores for transition probabilities. However, the function did
not handle missing data correctly. I apologize for this; the adjusted
script is below.

Kind regards,
Robbie van Aert


###################################### SCRIPT ######################################

seqtrate.z <- function(seq_obj) {
  tr.rates <- suppressMessages(round(seqtrate(seq_obj), 2))
  matrix <- as.matrix(seq_obj)                            # reformat to matrix
  tr <- table(c(matrix[,-ncol(matrix)]), c(matrix[,-1]))

  ### Check whether there are missings in the matrix.
  if('%' %in% colnames(tr) == TRUE) {
    tr.rev <- tr[-1,-1]
  } else { tr.rev <- tr }
  if('*' %in% colnames(tr) == TRUE) {
    tr.rev <- tr.rev[-1,-1] }

  ### The number of loop iterations equals the number of rows in the matrix.
  n_loops <- nrow(tr.rev)

  ### Empty arrays in which the data is going to be stored.
  res.Xg <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Xt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Xgt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Mgt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Pg <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))
  res.Pt <- array(NA, dim = c(nrow(tr.rev), ncol(tr.rev)))

  ### Loops that are used for filling the arrays.
  for (i in 1:n_loops) {
    for (k in 1:n_loops) {
      res.Xg[i,k] <- sum(tr.rev[i, ])                               # Xg = sum of observations in Gth row
      res.Xt[i,k] <- sum(tr.rev[ ,k])                               # Xt = sum of observations in Tth column
      res.Xgt[i,k] <- tr.rev[i,k]                                   # Xgt = observed freq
      res.Mgt[i,k] <- ((res.Xg[i,k] * res.Xt[i,k]) / sum(tr.rev))   # Mgt = expected freq based on data
      res.Pg[i,k] <- res.Xg[i,k] / sum(tr.rev)                      # Pg = Xg divided by total N
      res.Pt[i,k] <- res.Xt[i,k] / sum(tr.rev)                      # Pt = Xt divided by total N
    }
  }

  ### Formula for z-values.
  res.Zgt <- round((res.Xgt - res.Mgt) /
                   sqrt(res.Mgt * (1 - res.Pg) * (1 - res.Pt)), 3)

  ### Assign column and row names to matrix.
  column <- colnames(tr.rates)
  colnames(res.Zgt) <- column
  row <- rownames(tr.rates)
  rownames(res.Zgt) <- row

  return(res.Zgt)
}
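
The missing-data check in the script removes the first row and column by
position. That relies on the codes '%' and '*' sorting first and appearing in
both margins of the table. A name-based variant avoids this assumption. The
sketch below uses base R only; the helper name drop_missing_states() and the
toy sequences are invented for illustration and are not part of TraMineR.

```r
# Hypothetical helper: drop the rows and columns of a transition count table
# that correspond to missing-value codes, matching them by name.
drop_missing_states <- function(tr, codes = c("%", "*")) {
  keep_rows <- !(rownames(tr) %in% codes)
  keep_cols <- !(colnames(tr) %in% codes)
  tr[keep_rows, keep_cols, drop = FALSE]
}

# Toy sequences in wide format, invented for illustration;
# "*" and "%" stand for the two missing-value codes checked in the script.
toy_seq <- matrix(c("A", "B", "*", "B",
                    "B", "B", "A", "%",
                    "A", "A", "B", "B"),
                  nrow = 3, byrow = TRUE)

# Same construction of the count table as in the script above.
tr <- table(c(toy_seq[, -ncol(toy_seq)]), c(toy_seq[, -1]))
tr_clean <- drop_missing_states(tr)
print(tr_clean)
```

In this toy table '%' occurs only among the column labels, so the rows and
columns do not line up and removing by position would discard a valid state;
matching by name keeps exactly the states A and B.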