[Phylobase-devl] accessor and replace method consistency

Sat Sep 5 23:09:10 CEST 2009

On Fri, 2009-09-04 at 17:37 -0700, Jim Regetz wrote:
> Hi all,
> 
> I promise this will be my last biggie proposal before the release.
> 
> We've found and fixed a number of bugs resulting from misaligning the 
> elements of various slots (labels and nodes, edges and lengths, nodes 
> and data, etc). I've noticed a few new ones that I haven't reported yet 
> (in plot, prune, and tdata). Partly these occur because we sometimes 
> assume too much about slot data when accessing them directly via the @ 
> operator -- assumptions that tend to be true of the test data we usually 
> use, but aren't in fact enforced. And partly this is because the 
> accessor methods themselves are not consistent across slots (and have 
> changed over time).
> 
> Rather than go with a band-aid approach, I think we should step back and 
> do a little accessor cleanup. It won't be that hard -- I'm happy to make 
> the changes, test things, and fix any downstream functions/methods that 
> break as a result. If we go with this, then I would recommend we use 
> accessors more routinely in the code. If in some case someone elects to 
> circumvent the getter/setter methods for performance reasons, fine, but 
> tread carefully (and consider adding a code comment for clarification).

I have been trying to use accessors as much as possible in my code. I
had to use the slots directly when I updated the conversion method from
phylo4 to phylo, because it didn't work when edges were stored
pruningwise. (I wouldn't have found this without the testing framework,
thanks for this Jim!). 

> So, for the details. As I understand it, we place no restrictions on the 
> order of elements in the vector slots (tip.label, node.label, 
> edge.length, edge.label) nor on the order of rows in the tabular slots 
> (edge, tip.data, node.data). The association between slots is done via 
> the element names for the label and length slots, row names for the data 
> slots, and of course the actual matrix values for the edge matrix.
> 
> That's fine, and provides some flexibility, but it also imposes extra 
> overhead on most phylobase functions. In lieu of enforcing particular 
> ordering as part of the phylo[4] definition (which we may yet decide to 
> do in the future, for efficiency!), I propose making the following 
> consistent ordering rules a feature of the getter/setter methods.

I agree that some of the methods I rewrote may have a few extra-safety
layers that don't make things terribly efficient. 

> Note 
> that some of these already work this way, especially coming out of 
> Francois' branch, but the point is to really codify and test it:
> 
> * The following should always return the vectors ordered by node ID, in 
> ascending order:
> 
> nodeId
> nodeType (already does)
> labels
> tipLabels
> nodeLabels
> 
> This means the ordering will *always* correspond to 1:nTips for tips, 
> (nTips+1):(nTips+nNodes) for internal nodes, and 1:(nTips+nNodes) for 
> the 'all' case. In the case of missing labels, the accessor should still 
> return a complete vector of NAs.
> 
> * Similarly, tdata (whether tip, internal, or all) should always return 
> a data frame with rows ordered by node ID, in ascending order.
> 
> * edge should always returns the edge matrix as-is, because the actual 
> row order can matter for traversal methods.
> 
> * edgeLength and edgeLabel should always return the vectors in an order 
> that matches the rows of the current edge matrix. As with node labels, 
> in the case of missing edge lengths or labels, the accessor should still 
> return a complete vector of NAs.
> 
> * For Replace methods, when names are not used, the replacement ordering 
> should corresponding directly to accessor ordering (and not the existing 
> internal ordering). For example, even if the tip.label slot is:
> 
> x at tip.label
>     2    3    1
> "t2" "t3" "t1"
> 
> then tipLabels(x) <- c("tA", "tB", "tC") should still produce this:
> 
> x at tip.label
>     1    2    3
> "tA" "tB" "tC"
> 
> It already does, happily, but not all of the setters are like this.

I agree with all this.

> Incidentally, if we modify nodeId as I proposed, then the pretty print 
> method would show the object with tips first, then the internal nodes. I 
> realize that's different from what we have now, and maybe I missed an 
> earlier discussion about this, but I would be totally fine with that. Of 
> course, if folks always want internal nodes first, then tips, that's 
> fine, and we can tweak the print method accordingly. Either way would be 
> better than now IMHO, as the printed order can change when the edge 
> order changes (even with the pretty method).

I think nodeId determines the order of many things. Starting by this
function should fix a few things. I'm not sure why internal nodes are
first. I don't really mind either way.

> Whew. I'm happy to debate these (and can certainly change my mind on 
> details), but in any case I think it would be a boon for developers and 
> users alike to have more consistency on these fronts.

I agree.

I can help with the changes (mostly in the evening). Maybe we can
organize an IRC meeting to distribute the tasks? Tuesday, 11am EST?

  Cheers,
  -- François

> 
> Thoughts?
> 
> Jim
> 
> _______________________________________________
> Phylobase-devl mailing list
> Phylobase-devl at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : http://lists.r-forge.r-project.org/pipermail/phylobase-devl/attachments/20090905/1fcbfab2/attachment.pgp