[Traminer-users] transition rates as substitution costs with HAM metric in seqdistmc

Mon Feb 7 10:06:26 CET 2011

Hi Florian,

Many thanks for your bug report. The problem you have pointed out also 
affects "seqdist". In fact, neither seqdist nor seqdistmc uses the 
substitution cost matrix when method="HAM".

This problem will be corrected in the next version of TraMineR which 
should be released soon.
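
For illustration only, here is a minimal base-R sketch of what a Hamming 
distance that takes substitution costs into account would compute. It is 
not TraMineR code: the helper name weighted_hamming and the cost value 1.5 
are made up for this example, and a real cost matrix would come from 
seqsubm().

```r
# Hypothetical helper, not part of TraMineR: Hamming-style distance between
# two equal-length sequences, where position k costs sm[x[k], y[k]].
weighted_hamming <- function(x, y, sm) {
  stopifnot(length(x) == length(y))
  # Index the cost matrix by (row state, column state) name pairs,
  # one pair per position, and add up the costs.
  sum(sm[cbind(x, y)])
}

states <- c("HighCl", "LowCl")
# Assumed example costs (zero on the diagonal, 1.5 per substitution)
sm <- matrix(c(0,   1.5,
               1.5, 0),
             nrow = 2, ncol = 2, dimnames = list(states, states))

# pers1 and pers2 from the first channel of the quoted example
pers1 <- c("HighCl", "LowCl", "HighCl")
pers2 <- c("LowCl", "HighCl", "LowCl")

d12 <- weighted_hamming(pers1, pers2, sm)  # the sequences differ at all 3 positions
d11 <- weighted_hamming(pers1, pers1, sm)  # identical sequences
```

With all off-diagonal costs set to 1, this reduces to the plain Hamming 
distance, i.e. the number of positions at which the two sequences differ.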

Many thanks again.
All the best.

Matthias Studer

On 05.02.2011 14:46, Florian Hertel wrote:
>  Hi all,
>
> first of all: thanks for TraMineR!
>
> I encountered a problem running a multichannel sequence analysis using the 
> HAM metric with substitution costs based on transition rates. I 
> started by creating a substitution cost matrix (scm) for each 
> channel based on the transition rates, not allowing the rates to vary 
> over time. The seqdistmc command (using the HAM metric), fed with 
> my scm, reported that the dimensions of the substitution matrix are 
> not equal to the number of episodes in my sequences. Thus, I created 
> an array replicating the substitution cost matrix for each possible 
> point in time. From the report, I read that seqdistmc created a 
> "time-varying substitution cost matrix using 1 as a constant value", 
> thereby abandoning my substitution cost matrix. To achieve 
> my goal, I instead used the DHD metric with my replicated substitution 
> cost array, which seemed to work fine.
>
> Here is a small example:
>
> dat25_1 <- matrix(c("HighCl", "LowCl", "LowCl",
>                   "LowCl", "HighCl", "LowCl",
>                   "HighCl", "LowCl", "HighCl"),
>                   3,3,dimnames=list(c("pers1","pers2","pers3"),
>                   c("class1","class2","class3")))
> dat25_2 <- matrix(c("BMW", "VW", "VW",
>                   "BMW", "BMW", "VW",
>                   "BMW", "VW", "VW"),
>                   3,3,dimnames=list(c("pers1","pers2","pers3"),
>                   c("car1","car2","car3")))
>
> seqdat1 <- seqdef(dat25_1)
> seqdat2 <- seqdef(dat25_2)
>
> subm1 <- seqsubm(seqdat1,method="TRATE",time.varying=FALSE)
> subm2 <- seqsubm(seqdat2,method="TRATE",time.varying=FALSE)
>
> subm1
> subm2
>
>
> # The first trial, which fails with an error message
> seqdistmc(channels=list(seqdat1,seqdat2),method="HAM", 
> sm=list(subm1,subm2))
>
> subm13 <- array(subm1,c(2,2,3),dimnames(subm1))
> subm23 <- array(subm2,c(2,2,3),dimnames(subm2))
>
> # The second trial, which ignores my sm list and uses a constant value
> # of 1 for all substitutions.
> seqdistmc(channels=list(seqdat1,seqdat2),method="HAM", 
> sm=list(subm13,subm23))
>
> #The third trial, working fine (hopefully)
> seqdistmc(channels=list(seqdat1,seqdat2),method="DHD", 
> sm=list(subm13,subm23))
>
> From what I understood, the first trial did not work because my 
> substitution cost matrices (scm) are only 2-dimensional. But the second 
> trial, which should be "OM without indels", did not use my scm; 
> seqdistmc instead fell back to the default mode with a constant 
> substitution cost of 1. Why did seqdistmc ignore my matrices? 
> Is it because the Hamming distance is the number of positions at which 
> the corresponding states differ? If so, why is it then possible to 
> supply one's own scm?
>
> Many thanks in advance and apologies for any inconvenience caused by 
> the long message.
>
> Best,
> Florian
>