[Phylobase-devl] accessor and replace method consistency
François Michonneau
francois.michonneau at gmail.com
Sat Sep 5 23:09:10 CEST 2009
On Fri, 2009-09-04 at 17:37 -0700, Jim Regetz wrote:
> Hi all,
>
> I promise this will be my last biggie proposal before the release.
>
> We've found and fixed a number of bugs resulting from misaligning the
> elements of various slots (labels and nodes, edges and lengths, nodes
> and data, etc). I've noticed a few new ones that I haven't reported yet
> (in plot, prune, and tdata). Partly these occur because we sometimes
> assume too much about slot data when accessing them directly via the @
> operator -- assumptions that tend to be true of the test data we usually
> use, but aren't in fact enforced. And partly this is because the
> accessor methods themselves are not consistent across slots (and have
> changed over time).
>
> Rather than go with a band-aid approach, I think we should step back and
> do a little accessor cleanup. It won't be that hard -- I'm happy to make
> the changes, test things, and fix any downstream functions/methods that
> break as a result. If we go with this, then I would recommend we use
> accessors more routinely in the code. If in some case someone elects to
> circumvent the getter/setter methods for performance reasons, fine, but
> tread carefully (and consider adding a code comment for clarification).
I have been trying to use accessors as much as possible in my code. I
had to use the slots directly when I updated the conversion method from
phylo4 to phylo, because it didn't work when edges were stored
pruningwise. (I wouldn't have found this without the testing framework,
thanks for this Jim!).
> So, for the details. As I understand it, we place no restrictions on the
> order of elements in the vector slots (tip.label, node.label,
> edge.length, edge.label) nor on the order of rows in the tabular slots
> (edge, tip.data, node.data). The association between slots is done via
> the element names for the label and length slots, row names for the data
> slots, and of course the actual matrix values for the edge matrix.
>
> That's fine, and provides some flexibility, but it also imposes extra
> overhead on most phylobase functions. In lieu of enforcing particular
> ordering as part of the phylo[4] definition (which we may yet decide to
> do in the future, for efficiency!), I propose making the following
> consistent ordering rules a feature of the getter/setter methods.
I agree that some of the methods I rewrote may have a few extra-safety
layers that don't make things terribly efficient.
> Note
> that some of these already work this way, especially coming out of
> Francois' branch, but the point is to really codify and test it:
>
> * The following should always return the vectors ordered by node ID, in
> ascending order:
>
> nodeId
> nodeType (already does)
> labels
> tipLabels
> nodeLabels
>
> This means the ordering will *always* correspond to 1:nTips for tips,
> (nTips+1):(nTips+nNodes) for internal nodes, and 1:(nTips+nNodes) for
> the 'all' case. In the case of missing labels, the accessor should still
> return a complete vector of NAs.
>
> * Similarly, tdata (whether tip, internal, or all) should always return
> a data frame with rows ordered by node ID, in ascending order.
>
> * edge should always returns the edge matrix as-is, because the actual
> row order can matter for traversal methods.
>
> * edgeLength and edgeLabel should always return the vectors in an order
> that matches the rows of the current edge matrix. As with node labels,
> in the case of missing edge lengths or labels, the accessor should still
> return a complete vector of NAs.
>
> * For Replace methods, when names are not used, the replacement ordering
> should corresponding directly to accessor ordering (and not the existing
> internal ordering). For example, even if the tip.label slot is:
>
> x at tip.label
> 2 3 1
> "t2" "t3" "t1"
>
> then tipLabels(x) <- c("tA", "tB", "tC") should still produce this:
>
> x at tip.label
> 1 2 3
> "tA" "tB" "tC"
>
> It already does, happily, but not all of the setters are like this.
I agree with all this.
> Incidentally, if we modify nodeId as I proposed, then the pretty print
> method would show the object with tips first, then the internal nodes. I
> realize that's different from what we have now, and maybe I missed an
> earlier discussion about this, but I would be totally fine with that. Of
> course, if folks always want internal nodes first, then tips, that's
> fine, and we can tweak the print method accordingly. Either way would be
> better than now IMHO, as the printed order can change when the edge
> order changes (even with the pretty method).
I think nodeId determines the order of many things. Starting by this
function should fix a few things. I'm not sure why internal nodes are
first. I don't really mind either way.
> Whew. I'm happy to debate these (and can certainly change my mind on
> details), but in any case I think it would be a boon for developers and
> users alike to have more consistency on these fronts.
I agree.
I can help with the changes (mostly in the evening). Maybe we can
organize an IRC meeting to distribute the tasks? Tuesday, 11am EST?
Cheers,
-- François
>
> Thoughts?
>
> Jim
>
> _______________________________________________
> Phylobase-devl mailing list
> Phylobase-devl at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/phylobase-devl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : http://lists.r-forge.r-project.org/pipermail/phylobase-devl/attachments/20090905/1fcbfab2/attachment.pgp
More information about the Phylobase-devl
mailing list