From jeremyr at uga.edu  Fri Aug 14 19:57:03 2015
From: jeremyr at uga.edu (Jeremy Reynolds)
Date: Fri, 14 Aug 2015 13:57:03 -0400
Subject: [Traminer-users] violation of triangle inequality with future based substitution costs
Message-ID:

Dear Traminer Experts,

I have been using future based substitution costs as described in Studer and Ritschard 2014 and implemented in Traminer to make dissimilarity matrices. Unfortunately, the results often violate the triangle inequality. It seems to depend on what subset of my data I use, but I don't really know why the triangle inequality is violated in some cases and not in others.

Is there some appropriate way to fix this (e.g. alter one of the substitution costs manually)? Does the violation suggest that there is a problem somewhere else in my analysis or suggest something about my data?

I have pasted the output from my latest analysis below in case it is useful.

Thanks,
Jeremy

> ###########
> # Calculate Substitution Costs using future similarity
> ###########
>
> ######
> #default lag of 1
> ######
> future <- seqcost(seq.hc, method="FUTURE", lag=1)
 [>] creating substitution-cost matrix using common future...
 [>] computing transition rates for states 1/2/3/4/5 ...
> dimnames(future) = list( c("M", "S", "F", "O", "U"), c("M", "S", "F", "O", "U"))
> round(future, 4)
       M      S      F      O      U
M 0.0000 0.1701 0.4147 0.6304 0.3960
S 0.1701 0.0000 0.2375 0.6125 0.3910
F 0.4147 0.2375 0.0000 0.7700 0.5552
O 0.6304 0.6125 0.7700 0.0000 0.4573
U 0.3960 0.3910 0.5552 0.4573 0.0000
> #largest substitution cost (F vs O)
> maxsub <- max (max (future))
> maxsub
[1] 0.7699533
> #smallest substitution cost (M vs S)
> minsub <- min(future[lower.tri(future)])
> minsub
[1] 0.1701276
>
> ######
> #lag of 2
> ######
> future2 <- seqcost(seq.hc, method="FUTURE", lag=2)
 [>] creating substitution-cost matrix using common future...
 [>] computing transition rates for states 1/2/3/4/5 ...
> dimnames(future2) = list( c("M", "S", "F", "O", "U"), c("M", "S", "F", "O", "U"))
> round(future2, 4)
       M      S      F      O      U
M 0.0000 0.1015 0.2740 0.4063 0.2184
S 0.1015 0.0000 0.1692 0.4116 0.2428
F 0.2740 0.1692 0.0000 0.5508 0.3660
O 0.4063 0.4116 0.5508 0.0000 0.2577
U 0.2184 0.2428 0.3660 0.2577 0.0000
> #largest substitution cost (F vs O)
> maxsub2 <- max (max (future2))
> maxsub2
[1] 0.5508451
> #smallest substitution cost (M vs S)
> minsub2 <- min(future2[lower.tri(future2)])
> minsub2
[1] 0.1014867
>
> ##########################################
> # make distance matrices
> #########################################
> disomf1i5 <- seqdistOO(seq.hc, method = "OM", indel = .5*maxsub, sm = future)
 [>] 10923 sequences with 5 distinct events/states
 [>] 10923 distinct sequences
 [>] min/max sequence length: 18/18
 [>] computing distances using OM metric
 [>] total time: 59.34 secs
Warning message:
 [!] at least, one substitution cost doesn't respect the triangle inequality.
 [!] replacing 1 with 2 (cost=0.1701276) and then 2 with 3 (cost=0.2375203)
 [!] costs less than replacing directly 1 with 3 (cost=0.4147297)
 [!] total difference ([1=>2] + [2=>3] - [1=>3]): -0.007081773
> disomf2i5 <- seqdistOO(seq.hc, method = "OM", indel = .5*maxsub2, sm = future2)
 [>] 10923 sequences with 5 distinct events/states
 [>] 10923 distinct sequences
 [>] min/max sequence length: 18/18
 [>] computing distances using OM metric
 [>] total time: 59.04 secs
Warning message:
 [!] at least, one substitution cost doesn't respect the triangle inequality.
 [!] replacing 1 with 2 (cost=0.1014867) and then 2 with 3 (cost=0.1691506)
 [!] costs less than replacing directly 1 with 3 (cost=0.27397)
 [!] total difference ([1=>2] + [2=>3] - [1=>3]): -0.003332678

-- 
********************
Dr.
Jeremy Reynolds
Professor
Department of Sociology
116 Baldwin Hall
University of Georgia
Athens, GA 30602-1611
Phone: (706) 583-8072
Web: http://uga.edu/soc/people/faculty/reynolds_jeremy.php
Fax: (706) 542-4320

From Matthias.Studer at unige.ch  Mon Aug 17 14:34:41 2015
From: Matthias.Studer at unige.ch (Matthias Studer)
Date: Mon, 17 Aug 2015 12:34:41 +0000
Subject: [Traminer-users] violation of triangle inequality with future based substitution costs
In-Reply-To:
References:
Message-ID: <367AEF503B1B6A4EA602FB66D71A3EC720056F87@kilo.isis.unige.ch>

Dear Jeremy,

Thank you for pointing this out. In the first version of our paper, we forgot to take the square root of the result, which yields a squared Euclidean distance that does not guarantee the triangle inequality. We corrected this issue in the latest version of the paper, published in the Journal of the Royal Statistical Society: Series A. Unfortunately, the change was not carried over to the latest development version of seqdist2. Please excuse us for that. We plan to fix it in the coming days.

Future based costs are computed from your data, which is why the results depend on the subset you use.

Best regards,
Matthias
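
The warning and the square-root explanation in this thread can be checked by hand on the lag-1 costs printed above. The following is only a minimal sketch: `triangle_violations` is a hypothetical helper written for this illustration (it is not a TraMineR function), the matrix is retyped from the rounded output in the original post, and applying `sqrt()` to the posted costs assumes that they are exactly the squared distances Matthias describes. The corrected package code may compute or scale the costs differently.

```r
# Lag-1 substitution costs as printed (rounded to 4 decimals) in the post.
states <- c("M", "S", "F", "O", "U")
future <- matrix(c(
  0.0000, 0.1701, 0.4147, 0.6304, 0.3960,
  0.1701, 0.0000, 0.2375, 0.6125, 0.3910,
  0.4147, 0.2375, 0.0000, 0.7700, 0.5552,
  0.6304, 0.6125, 0.7700, 0.0000, 0.4573,
  0.3960, 0.3910, 0.5552, 0.4573, 0.0000
), nrow = 5, byrow = TRUE, dimnames = list(states, states))

# Hypothetical helper (NOT part of TraMineR): for each unordered pair (i, k)
# and each intermediate state j, report the cases where the detour
# i -> j -> k is cheaper than the direct substitution i -> k.
triangle_violations <- function(sm, tol = 1e-12) {
  n <- nrow(sm)
  from <- character(0)
  via <- character(0)
  to <- character(0)
  gap <- numeric(0)
  for (i in seq_len(n - 1)) {
    for (k in (i + 1):n) {
      for (j in setdiff(seq_len(n), c(i, k))) {
        g <- sm[i, j] + sm[j, k] - sm[i, k]
        if (g < -tol) {
          from <- c(from, rownames(sm)[i])
          via <- c(via, rownames(sm)[j])
          to <- c(to, rownames(sm)[k])
          gap <- c(gap, g)
        }
      }
    }
  }
  data.frame(from = from, via = via, to = to, gap = gap,
             stringsAsFactors = FALSE)
}

# Costs as posted: the M -> S -> F detour undercuts the direct M -> F cost.
viol_raw <- triangle_violations(future)
print(viol_raw)

# Assumption: the posted costs are squared distances, so their square roots
# should behave like Euclidean distances and show no violation.
viol_sqrt <- triangle_violations(sqrt(future))
print(nrow(viol_sqrt))
```

On the rounded values, the only triple reported for the posted costs is M, S, F, which matches states 1, 2, 3 in the warning, and no triple is reported after taking square roots. Passing `sqrt(future)` as `sm` would be a workaround of one's own making rather than the maintainers' fix, and the indel cost would then need to be recomputed from the transformed matrix.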