[Traminer-users] error splitting sts Object

Chris Cameron cjc73 at cornell.edu
Thu Mar 15 20:13:11 CET 2012


I think Alexis was correct in saying "The vector containing group membership information should be a standalone vector and should by no way be added as a further column to your sequence object." Please examine this further: the code below demonstrates that appending the column alters your sequences, even if that is not the source of your particular error message.

Try not appending the cluster label vector to the sequence object. In my example and testing, appending it definitely changes the sequences in the subsets. This is not apparent in your output, but I am not sure whether your summary(stsNWObject) was generated before or after you added the cluster variable.
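To make the difference concrete, here is a minimal sketch in base R. A plain data frame stands in for the sequence object (an assumption for illustration; the toy values and the names seqs, cluster, and appended are hypothetical), with one column per sequence position.

```r
# Toy stand-in (hypothetical data): one column per sequence position.
seqs <- data.frame(t1 = c("A", "B", "A", "C"),
                   t2 = c("A", "B", "C", "C"),
                   t3 = c("B", "B", "C", "A"),
                   stringsAsFactors = FALSE)
cluster <- factor(c(1, 2, 1, 2), labels = c("c 1", "c 2"))

# Standalone vector: the subset keeps exactly the original positions.
sub_standalone <- seqs[cluster == "c 1", ]

# Appended column: every subset now carries one extra column.
appended <- seqs
appended$CLUSTER <- cluster
sub_appended <- appended[appended$CLUSTER == "c 1", ]

print(ncol(sub_standalone))
print(ncol(sub_appended))
```

Any code that treats every column as a position in the sequence will see the label column as one more state when the appended form is used.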

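In case it helps, I think the nr referenced in the error is the sequence object's symbol for missing states (the nr argument of seqdef, "*" by default): your subset summary stops exactly where the "symbol for missing state" line is printed for the full object. You can produce an error message that shows nr in this context by summarizing an empty subset of the sequence object.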

# summary(atus.lim[atus.lim$CLUSTER == 'foo', ])  # where "foo" is not one of the CLUSTER values
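One way a condition like any(object == nr) can fail with "missing value where TRUE/FALSE needed" is sketched below in base R. This is a guess about a possible cause, not something the thread confirms: it assumes the cluster vector contains an NA, and it uses a plain data frame as a stand-in for the sequence object. The names seqs, cluster, and the value of nr are hypothetical.

```r
# Hypothetical toy data; a plain data frame stands in for the sequence object.
seqs <- data.frame(t1 = c("A", "B", "A"),
                   t2 = c("B", "B", "A"),
                   stringsAsFactors = FALSE)
cluster <- c(1, NA, 2)   # assumed: one sequence has no cluster label
nr <- "*"                # assumed missing-state symbol, as in the quoted summary

sub <- seqs[cluster == 1, ]   # the NA in the index produces an all-NA row
cond <- any(sub == nr)        # no TRUE and at least one NA, so the result is NA

msg <- tryCatch(
  if (cond) "missing states present" else "no missing states",
  error = function(e) conditionMessage(e)
)
print(msg)

# which() drops NA positions instead of propagating them into the subset.
sub_ok <- seqs[which(cluster == 1), ]
print(anyNA(sub_ok))
```

If this is what happens in your data, checking anyNA() on the cluster vector before subsetting would show it.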

library(TraMineR)

## atus.lim is the sequence dataset
# atus.lab will be the standalone vector of cluster labels, one per sequence

# Choose costs
# Let's suppose that activities that are frequently observed together are more interchangeable
sub_cost = seqsubm(atus.lim, method="TRATE")
# If sequence lengths were equal, then
# indel_cost = 2
indel_cost = .45*max(sub_cost[upper.tri(sub_cost, diag=FALSE)])
sub_cost <= 2*indel_cost  ## FALSE marks substitutions that will not be used (a deletion plus an insertion is cheaper)
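The check above rests on a simple comparison: replacing one state by another costs either the substitution cost or one deletion plus one insertion (2 * indel_cost), whichever is cheaper. A small base R sketch with a made-up 3-state cost matrix (the states and all numbers are hypothetical) counts the state pairs for which substitution loses:

```r
# Hypothetical substitution costs for three states.
states <- c("Sleep", "Work", "Relax")
sub_cost <- matrix(c(0.0, 2.0, 1.2,
                     2.0, 0.0, 1.9,
                     1.2, 1.9, 0.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(states, states))

# Same rule as in the script: 45% of the largest substitution cost.
indel_cost <- 0.45 * max(sub_cost[upper.tri(sub_cost, diag = FALSE)])

# TRUE where a deletion plus an insertion is cheaper than substituting.
too_costly <- sub_cost > 2 * indel_cost

# Count each unordered pair once by looking at the upper triangle only.
n_pairs <- sum(too_costly[upper.tri(too_costly)])
print(indel_cost)
print(n_pairs)
```

With a factor of .45, twice the indel cost is 90% of the largest substitution cost, so the most expensive substitution is always among those replaced by a deletion plus an insertion.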

# Compute Distances with Optimal Matching('OM') and costs
seq_dist = seqdist(atus.lim, method='OM', indel=indel_cost, sm=sub_cost, full.matrix=FALSE)

# seq.cluster <- agnes(seq_dist, diss = TRUE, method = "ward")
# The agnes function (package cluster) does not seem to be working, but we can use hclust
# Using package fastcluster, which overrides hclust
# Note: in R >= 3.1.0 this linkage is named "ward.D"
seq.cluster <- hclust(seq_dist, method = "ward")
plot(seq.cluster)


# This creates 3 clusters and produces atus.lab, which I think is what you want stsNWObject$CLUSTER to be
seq.c <- cutree(seq.cluster, k = 3)
atus.lab <- factor(seq.c, labels = paste("c", 1:3))
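The labelling step can be tried on toy data without any sequence object. The sketch below uses base R only; the numeric vector x and the "average" linkage are arbitrary choices for illustration, not what the script above uses.

```r
# Hypothetical one-dimensional data with three obvious groups.
x <- c(1, 2, 10, 11, 20, 21)
tree <- hclust(dist(x), method = "average")  # linkage chosen only for the demo

# cutree returns one integer cluster id per observation.
grp <- cutree(tree, k = 3)

# paste() inserts a space by default, so the labels are "c 1", "c 2", "c 3".
lab <- factor(grp, labels = paste("c", 1:3))

print(levels(lab))
print(table(lab))
```

Note the space in the labels: a comparison such as lab == "c1" would match nothing, while lab == "c 1" selects the first cluster.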

# Make a subset, indexing with the standalone vector:
atus.c1 = atus.lim[atus.lab == 'c 1', ]
summary(atus.c1)

 [>] sequence object created with TraMineR version 1.8-1 
 [>] 622 sequences in the data set, 619 unique 
 [>] min/max sequence length: 7/12
 [>] alphabet (state labels):  
     1=1 (Sleep)
     2=2 (Groom)
     3=3 (Eat)
     4=4 (Help)
     5=5 (Chores)
     6=6 (Work)
     7=7 (Local)
     8=8 (Relax)
 [>] dimensionality of the sequence space: 84 
 [>] colors: 1=#7FC97F 2=#BEAED4 3=#FDC086 4=#FFFF99 5=#386CB0 6=#F0027F 7=#BF5B17 8=#666666 
 [>] symbol for void element: %

# Using your method of appending the cluster column to the sequence data
# Note this changes the length and dimensionality of the sequences!
atus.lim$CLUSTER <- factor(seq.c, labels = paste(1:3))
atus.c1 = atus.lim[atus.lim$CLUSTER==1,]
summary(atus.c1)

 [>] sequence object created with TraMineR version 1.8-1 
 [>] 622 sequences in the data set, 619 unique 
 [>] min/max sequence length: 8/13
 [>] alphabet (state labels):  
     1=1 (Sleep)
     2=2 (Groom)
     3=3 (Eat)
     4=4 (Help)
     5=5 (Chores)
     6=6 (Work)
     7=7 (Local)
     8=8 (Relax)
 [>] dimensionality of the sequence space: 91 
 [>] colors: 1=#7FC97F 2=#BEAED4 3=#FDC086 4=#FFFF99 5=#386CB0 6=#F0027F 7=#BF5B17 8=#666666 
 [>] symbol for void element: % 
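The numbers in the summaries of this thread are consistent with a dimensionality equal to the maximum sequence length times the alphabet size minus one. That relation is inferred from the three summaries shown here, not taken from the TraMineR documentation, so treat it as an assumption. A quick arithmetic check:

```r
# Values copied from the summaries in this thread.
alphabet_size <- 8                      # Sleep ... Relax
dim_standalone <- 12 * (alphabet_size - 1)   # max length 12, standalone vector
dim_appended   <- 13 * (alphabet_size - 1)   # max length 13, CLUSTER appended

# Hadrien's object: 6 states, all sequences of length 288.
dim_nw <- 288 * (6 - 1)

print(c(dim_standalone, dim_appended, dim_nw))
```

Under that assumption, the jump from 84 to 91 corresponds to exactly one extra position per sequence, which is what the appended CLUSTER column adds.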

# Using your method again, this time with 10 clusters (the CLUSTER column is overwritten)
seq.c <- cutree(seq.cluster, k = 10)
atus.lim$CLUSTER <- factor(seq.c, labels = paste(1:10))
atus.c1 = atus.lim[atus.lim$CLUSTER==1,]
summary(atus.c1)
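If the goal is to split the whole object into one piece per cluster, a standalone factor can drive the split through row indices. The sketch below is base R with a plain data frame as a stand-in (hypothetical data and names); it assumes, as the subset summaries in this thread suggest, that row-indexing a sequence object with [ returns a usable sequence object.

```r
# Hypothetical toy data; one column per sequence position.
seqs <- data.frame(t1 = c("A", "B", "C", "A", "B", "C"),
                   t2 = c("B", "B", "A", "A", "C", "C"),
                   stringsAsFactors = FALSE)
cluster <- factor(c(1, 2, 3, 1, 2, 3), labels = paste("c", 1:3))

# Split the row numbers by cluster, then index the object once per group.
row_ids <- split(seq_len(nrow(seqs)), cluster)
parts <- lapply(row_ids, function(i) seqs[i, , drop = FALSE])

print(names(parts))
print(sapply(parts, nrow))
```

Each element of parts can then be summarized or plotted separately, and none of them carries a label column.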


On Mar 15, 2012, at 1:10 PM, Hadrien Commenges wrote:

> Hi,
> 
> I've created an sts object with the seqdef function and I'd like to split this object by a factor (cluster). I can use some functions with the "group=" option, but I need to work with smaller objects and I really want to split the big sts object into several ones. When I split this object I get an error and I can't compute or plot anything with the split object.
> 
> Here is the summary of the big sts object:
> 
> > summary(stsNWObject)
> 
>  [>] sequence object created with TraMineR version 1.8-1
>  [>] 22064 sequences in the data set, 19745 unique
>  [>] min/max sequence length: 288/288
>  [>] alphabet (state labels):
>      1=Hout (Home outside)
>      2=Ad (Adjacent)
>      3=Ne (Near)
>      4=Fa (Far)
>      5=Hoho (Home home)
>      6=Tr (Trip)
>  [>] dimensionality of the sequence space: 1440
>  [>] colors: 1=#A1D99B 2=#41AB5D 3=#006D2C 4=#00441B 5=#E5F5E0 6=#000000
>  [>] symbol for missing state: * 
> 
> 
> 
> And here is how I create the split object and the error I get:
> 
> > stsNWObject1 <- stsNWObject[stsNWObject$CLUSTER==1, ]
> 
> > summary(stsNWObject1)
> 
>  [>] sequence object created with TraMineR version 1.8-1
>  [>] 7862 sequences in the data set, 6438 unique
>  [>] min/max sequence length: 288/288
>  [>] alphabet (state labels):
>      1=Hout (Home outside)
>      2=Ad (Adjacent)
>      3=Ne (Near)
>      4=Fa (Far)
>      5=Hoho (Home home)
>      6=Tr (Trip)
>  [>] dimensionality of the sequence space: 1440
>  [>] colors: 1=#A1D99B 2=#41AB5D 3=#006D2C 4=#00441B 5=#E5F5E0 6=#000000
> 
> Error in if (any(object == nr)) { : missing value where TRUE/FALSE needed
> 
> 
> Could anyone help me to understand and resolve this error?
> 
> Thanks, 
> 
> Hadrien


