[Phylobase-devl] labelling order

Sun Dec 28 13:24:05 CET 2008

Here are my 2 cents concerning labelling order.

Instead of relying on the order in which the labels for the nodes
(internal and tips) are stored, I would argue that also storing the
node number with each label would make things more robust (and
potentially more flexible). I can see it helping in at least the 3
cases I describe below. To illustrate what I propose, here is an
example of @tip.labels for geospiza:

@tip.labels
             1              2              3              4
  "fuliginosa"       "fortis" "magnirostris"  "conirostris"

             5              6              7              8
  "difficilis"      "pallida"     "parvulus"   "psittacula"

             9             10             11             12
    "scandens"       "pauper"   "Platyspiza"        "fusca"

            13             14
"Pinaroloxias"     "olivacea"

As Thibault pointed out in his email, this would indeed be similar to
what getnodes() currently returns. However, getnodes() currently relies
on the order of the labels to determine the node number and vice versa.
With the system I propose, getnodes() (or getNode()) could then just
return the appropriate combination of label and node.
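
To make the idea concrete, here is a minimal sketch in base R. It does
not use phylobase itself: tip_labels is a plain named character vector
standing in for the @tip.labels slot, and label_for_node() is a
hypothetical helper, not an existing function.

```r
# Hypothetical stand-in for @tip.labels: a character vector whose
# names are the node numbers (stored as strings).
tip_labels <- c("1" = "fuliginosa", "2" = "fortis", "3" = "magnirostris",
                "4" = "conirostris", "5" = "difficilis")

# Hypothetical helper (not part of phylobase): look up labels by node
# number, using the names of the vector rather than the positions.
label_for_node <- function(labels, node) {
  unname(labels[as.character(node)])
}

# Shuffle the storage order; lookups by node number are unaffected.
shuffled <- tip_labels[c(3, 5, 1, 4, 2)]
label_for_node(shuffled, 5)        # "difficilis"
label_for_node(shuffled, c(2, 1))  # "fortis" "fuliginosa"
```

The point is only that the lookup goes through the stored node number,
so the order of the elements in the vector no longer matters.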

1. non-unique labels
If in the future we allow node labels to be non-unique, we will still be
able to track down the correct node by its number. This would be crucial
to avoid breaking data association and to reassign a name to the correct
node.
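
As a small illustration (base R only, with made-up node numbers and
labels, not the geospiza node labels), two internal nodes can share a
label and still be told apart by their number:

```r
# Hypothetical internal node labels; nodes 16 and 17 share a label.
node_labels <- c("15" = "root", "16" = "cladeA", "17" = "cladeA")

# Looking up by label alone is ambiguous: it matches two nodes.
ambiguous <- names(node_labels)[node_labels == "cladeA"]  # "16" "17"

# Renaming by node number touches exactly one node.
node_labels["17"] <- "cladeB"
```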

2. order of labels independent of tree order
Steve said:
> Say I want labels for nodes 5 and 6. Where do those labels go? i.e.  
> what does labels() look like for this edge matrix, and how do we  
> reorder this tree for plotting or traversal? What about after we root
> this tree at node 5? 

If labels are associated with their node number, then it would be
possible to identify the labels associated with nodes 5 and 6 (for
instance), no matter the order of the nodes in the tree (or in the edge
matrix). In addition, we can modify labels<- so that the index specifies
the node number whose label we want to replace. If a user does something
like
labels(geospiza)[5] <- "foo"
we can make sure that the number 5 refers to the node with the
identifier 5 (and not the 5th element of the vector returned by labels,
whose order can then be arbitrary). Furthermore, if the user tries
something like:
labels(geospiza, "node")[5] <- "N05" 
or
nodeLabels(geospiza)[5] <- "N05"
we can return an error message because the node with the identifier 5 is
not an internal node.
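
Here is a rough sketch of that behaviour in base R. The tree is a plain
list with two named label vectors, and set_label() is a hypothetical
setter, not the actual labels<- or nodeLabels<- method; the internal
node numbers 6 to 9 are invented for this reduced 5-tip example.

```r
# Hypothetical, simplified tree: label vectors keyed by node number.
tree <- list(
  tip.labels  = c("1" = "fuliginosa", "2" = "fortis", "3" = "magnirostris",
                  "4" = "conirostris", "5" = "difficilis"),
  node.labels = c("6" = NA_character_, "7" = NA_character_,
                  "8" = NA_character_, "9" = NA_character_)
)

# Hypothetical setter: replace the label of one node, identified by its
# number, and refuse if the node is not of the requested type.
set_label <- function(tree, node, value,
                      type = c("all", "tip", "internal")) {
  type <- match.arg(type)
  stopifnot(length(node) == 1)
  id <- as.character(node)
  is_tip <- id %in% names(tree$tip.labels)
  is_internal <- id %in% names(tree$node.labels)
  if (!is_tip && !is_internal) {
    stop("no node with identifier ", id)
  }
  if (type == "internal" && !is_internal) {
    stop("node ", id, " is not an internal node")
  }
  if (type == "tip" && !is_tip) {
    stop("node ", id, " is not a tip")
  }
  if (is_tip) {
    tree$tip.labels[id] <- value
  } else {
    tree$node.labels[id] <- value
  }
  tree
}

# Node 5 is a tip, so this works and changes only that node.
tree <- set_label(tree, 5, "foo")

# Asking for internal node 5 fails, because node 5 is a tip.
msg <- tryCatch(set_label(tree, 5, "N05", type = "internal"),
                error = function(e) conditionMessage(e))
```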

3. association with data more robust
If we could rely on node numbers to match the data with the tree, then
we minimize the risk of error when we coerce the objects to a data frame
for display or export purposes (print, tdata, etc.). Using node numbers
to match the data rather than the names would also allow non-unique and
missing labels.
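
For example, something along these lines (base R only; the labels and
the values in the data frame are placeholders, not real geospiza data)
would match data to labels through the node number, even with a missing
and a duplicated label:

```r
# Hypothetical labels keyed by node number: node 2 has no label, and
# nodes 3 and 4 share the same label.
labels_by_node <- c("1" = "fuliginosa", "2" = NA,
                    "3" = "fortis", "4" = "fortis")

# Placeholder data, stored in a different order than the labels.
trait <- data.frame(node = c(4, 2, 1, 3), value = c(10, 20, 30, 40))

# Attach labels through the node number, then sort for display.
trait$label <- unname(labels_by_node[as.character(trait$node)])
trait <- trait[order(trait$node), ]
```

Each row keeps its own value because the match goes through the node
column, not through the label or the row position.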

What I propose is not really new functionality. I think it would just
make things more robust by not having to rely on the order in which
labels are stored matching the order of the nodes in the tree/edge
matrix. I don't think it would change much for the things that are
currently implemented, but it would give us more flexibility in the
future (perhaps for 0.6) in the way we deal with labels, by allowing
non-unique and missing labels.

I haven't given as much thought to edge labels, so I don't know if we
could/should do something similar with them.