[Phylobase-devl] Aug. virtual hackathon

Peter D. Cowan pdc at berkeley.edu
Mon Aug 17 03:04:27 CEST 2009

On Sun, Aug 16, 2009 at 07:45:13PM -0400, François Michonneau wrote:
> > [From the PDF]
> >
> > |Tip and internal node labels have now internal names that are
> > |simply the node they are supposed to document. It thus becomes
> > |possible to store labels in any order and it makes assignment of
> > |labels more robust.
> >
> > I'm going to play devil's advocate (and profess ignorance) here: what
> > problem does this address?  How was label assignment non-robust
> > before?  Was it an issue of bugs, or was there a design flaw?
> I think it addresses 2 issues:
> 1. It seems to me that in the trunk version, it's difficult to make sure
> that the labels are returned in the correct order or in a way which
> allows the user to know which node is associated with each label. Also,
> it seemed to me that if we start to provide functions to reorder the
> trees then the issue of matching nodes and labels could have been
> complicated. Adding these internal names to the labels provides a more
> transparent way of matching labels than relying on the order of the
> nodes that can be altered by reordering methods.

I don't mean to belabor this, but I still don't quite understand.  My understanding was that node.label was a vector where the index corresponded to the node number. Tree reorder only changes the order of the edge matrix and the vectors of edge.length and edge.label.  It doesn't change the node numbers or the order of the node.label vector.
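
A minimal base-R sketch of the layout described above, using a toy edge matrix rather than a real phylo4 object. The names (edge, edge.length, label) and the permutation used for "reordering" are assumptions made for illustration, not phylobase internals:

```r
# Toy tree: tips are nodes 1-3, internal nodes are 4-5.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("ancestor", "descendant")))
edge.length <- c(0.5, 1.0, 1.2, 2.0)

# Labels indexed by node number: element i labels node i.
label <- c("fusca", "olivacea", "conirostris", "N4", "N5")

# A stand-in for a reorder method: permute the edge rows and keep
# edge.length in step. Node numbers and `label` are not touched.
perm <- order(edge[, "descendant"])
edge2 <- edge[perm, , drop = FALSE]
edge.length2 <- edge.length[perm]

# Looking up labels by node number gives the same answer for any edge,
# whichever row that edge now occupies.
label[edge2[, "descendant"]]
```

Under these assumptions, the label lookup only depends on node numbers, so permuting the edge rows cannot change which label belongs to which node.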

> 2. The other advantage of using these internal names is that it's going
> to be possible to allow non-unique names for labels and keeping the
> option of having different data associated with nodes named
> identically. 

If I understand the problem here, the primary issue is with associating the data in the first place.  Once data has been associated with a particular node we should be able to keep track of it, right?

> > Also what changes might need to be made to other code, specifically do you think I'll need to update the plotting code to handle the new label system?
> I think I have implemented most of the changes that the modification of
> the labeling structure implied... with the exception of the plotting code.
> I am not very familiar with this part and I can't really estimate if a
> lot of changes are necessary. However, most of the changes in my branch
> are internal. So, if we decide to go with fm-branch, there shouldn't be
> any more changes to make to the plotting code.

I'm planning on making some changes to the plotting code anyway, so I should be able to make any required changes by the end of the hackathon.


> Just to get an idea of possible problems, I found a small bug in both
> branches:
> - in trunk: the tree and the tip labels are matched correctly with their
> data but not the labels from the data
> - in fm-branch: data labels are matched with their data but are not
> aligned with the tree.
> (I made this observation only by looking at the position of olivacea)
> Also in both, the option "show.node.label" doesn't seem to work.
> > |The user can however provide a named vector (the names being the node numbers), 
> > |in which case, the labels will be matched. 
> > 
> > For me the difficulty with all of the node naming issues is keeping track of/figuring out which node number I want to change.  It seems in a case where the nodes already have a name I would want to match that if possible.  If I already have a node name and I want to change it, do I need to figure out the node number first, or can I use the node name that already exists?
> I totally agree. Having the node number associated with the label should
> help.
> We don't have proper methods to get the value or replace a subset of the
> nodes (i.e., [ ] and [ ]<- ). It's still possible to replace the label
> of a node knowing its node number:
> tipLabels(geospiza)[4] <- "G. conirostris"
> or 
> tipLabels(geospiza)["4"] <- "G. conirostris"
> This will work because the tip labels are currently stored in order.
> However, with internal nodes, only the second option would work. It's
> also not possible to do something like:
> tipLabels(geospiza)["fusca"] <- "G. fusca"
> Not using methods also can lead to bypassing the object validator.
> It seems that this kind of approach is also more S3 than S4. Should we
> still implement them? Should we alternatively (or also) implement
> something like:
> tipLabels(geospiza, 4) <- "G. conirostris"
> and
> tipLabels(geospiza, "conirostris") <- "G. conirostris"
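
To make the distinction in the quoted text concrete, here is a rough base-R sketch that uses a plain named character vector in place of the label slot. The helper replace_label() is hypothetical (it is not a phylobase function), and the toy labels are stored out of node order on purpose:

```r
# Names are node numbers (as strings); storage order is arbitrary.
labels <- c("5" = "fusca", "4" = "conirostris", "6" = "N6")

labels[2]     # positional: the second stored element, whatever node it is
labels["4"]   # by name: the label of node 4, wherever it is stored

# Hypothetical helper: identify the node by number or by its current
# label, and refuse anything that does not match exactly one node.
replace_label <- function(labels, node, value) {
  if (is.numeric(node)) {
    key <- as.character(node)              # node number -> internal name
  } else {
    key <- names(labels)[labels == node]   # existing label -> internal name
  }
  if (length(key) != 1L || !key %in% names(labels)) {
    stop("'node' must identify exactly one existing node")
  }
  labels[key] <- value
  labels
}

labels <- replace_label(labels, 4, "G. conirostris")
labels <- replace_label(labels, "fusca", "G. fusca")
```

Routing every replacement through one function like this is also where a validity check could be applied, which is the concern raised about bypassing the object validator. The exactly-one-match rule means non-unique labels would have to be changed by node number.
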
> > If no one is opposed to integrating these changes we should consider doing it before, or at the beginning of the sessions, so that we're all working with the most up-to date version.  I can help with merging if there are any questions.
> Help with merging will be appreciated if it's indeed the direction we
> choose.
> > [snip]
> > 
> > >   It builds without the vignette (R CMD build fm-branch --no-vignettes).
> > > It doesn't check however, but I don't think it's related to my changes
> > > as it fails with the following error message:
> > > * checking PDF version of manual without index ... ERROR
> > > It looks like it comes from a typo in the vignette. I'll try to
> > > investigate the problem.
> > 
> > We should also note that the latest versions of R-devel (2.10) have changed the way docs are built and checked.  Specifically there is now support in R check for documentation of S4 classes and methods.  R check w/ devel reports at least a couple more documentation warnings than it did before.  I'm going to try to get an automated R check running before the hackathon, hopefully with an R daily.
> Sounds good. That would be very useful!
>   -- François
> > Peter
