[Phylobase-devl] prune/subset questions
Ben Bolker
bolker at ufl.edu
Wed Sep 2 14:25:09 CEST 2009
Jim Regetz wrote:
> Some follow-up...
>
> Jim Regetz wrote:
>> Hi all,
>>
>> As far as I can tell, the new phylo4 prune method I've written is
>> working just fine, and supports both trim.internal=TRUE and
>> trim.internal=FALSE. It only does subtree=FALSE, more on that below.
>
> I just committed this. It is currently a prune method (for consistency
> with previous organization), but as mentioned below I would prefer to
> migrate it into the subset method.
>
>> Some questions for the group:
>>
>> 1. Is there a compelling reason to keep both subset *and* prune methods?
>> Or is this just a historical artifact? I think the only differences are:
>> (1) you can only pass the trim.internal and subtree arguments to prune,
>> but not subset, and (2) subset accepts tips.include, tips.exclude, mrca,
>> and node.subtree, whereas prune only does tips.exclude. Why not just
>> expose trim.internal and subtree (if desired) via the subset methods,
>> and eliminate prune? Or if someone really wants a prune function, it can
>> simply be an inflexible wrapper for subset, only accepting tips.exclude.
>
> I am increasingly inclined to drop prune as a formal method, making
> subset (and associated "[" extraction) the sole interface to this
> functionality.
>
> Objections?
OK with me.
>
> Going a step further, should we drop all the ape-based phylo subsetting
> methods (including pruning and tip dropping)? Seems like ape
> import/export are a sufficient link; maintaining additional coupling
> through wrapper functions will mean duplication -- and maintenance -- of
> code and documentation, which could become a headache as both phylobase
> and ape continue to evolve.
Yes, if we can.
>
>> 2. Do we need/want to support a subtree=TRUE option? I haven't worked on
>> this at all. For what it's worth, even using the current ape-based
>> subset method, this option unreliable for phylo4(d):
>>
>> require(phylobase)
>> data(geospiza)
>> geotree <- extractTree(geospiza)
>> prune(geotree, c(1,3), subtree=TRUE)
>> ## Error in checkTree(object) : All labels must be unique
>> ## In addition: Warning message:
>> ## In asMethod(object) : trees with unknown order may be unsafe in ape
>>
>> Here it's because the resulting tree would have two tip labels called
>> "[1_tips]". Anyway, I would be happy with leaving subtree as a future
>> feature possibility for now.
Also fine with me.
>
> Still not done. Currently, if you use subtree=TRUE, you get routed
> through phylo coercion and ape::drop.tip, as before. Of course, this
> won't be possible any longer if we drop ape-based subsetting altogether.
> I don't have a good sense of how much of a priority this is.
I don't know either, since I don't use it that much. Perhaps we
could just make this a future feature and see if any of our users
(ha!) yells for it.
>
>> 3. Any opinions on dealing with root edge length during subsetting? The
>> current method (using ape::drop.tip) just loses that information. In the
>> new method, the root edge essentially accumulates the edges associated
>> with any singletons that form along it as a consequence of the pruning.
>> Of course, that could make for a long root edge when retaining just two
>> closely related species in a large tree. Alternatively, albeit somewhat
>> arbitrarily, we could make it be the length of the edge connecting the
>> new root to its parent node in the original tree. Of course, this could
>> also be computed after the fact, e.g. with:
>>
>> edgeLength(phy, MRCA(phy, tips.included))
>>
>> where phy was the full (pre-subset) tree.
>
> Currently implemented as described. In case someone is interested in
> obtaining the alternative root edge length, I can just add that line of
> code to the example in the documentation.
Fine with me.
>
>> 4. This new method was initially kinda slow, but mostly because it makes
>> a bunch of descendants() calls in one part, and that can be slow. So I
>> rewrote descendants() to use a (very simple) C function that works on a
>> preordered edge matrix, which helps a lot with speed. I'll commit if
>> this are no objections. The new subset is still slower than ape's
>> drop.tip, but not horribly so.
>
> I've since modified phylo4 subset/prune to reduce the reliance on
> descendants(), but I can still commit the C-based descendants
> replacement. The subsetting does, however, now require at least one
> reorder(), which is fine for small trees, but slow for large ones. So I
> wrote a C function for reorder() as well (thanks to Peter for the push
> in that direction). There are actually two alternative C functions, one
> faster but only called if the tree is a true binary tree (no singletons
> or polytomies), and the other slower but usable on any valid phylo4 tree
> (I think). Performance of both is no better than the pure R reorder on
> small trees, maybe even trivially slower, but much better on large trees.
>
> Are folks fine with having these things reimplemented as compiled C
> functions? What I've done so far is pretty simple, and uses only the .C
> interface, for what it's worth. I could imagine writing an analogous C
> function for ancestors(), but that's probably enough at least for the
> time being. Opinions?
>
This is *amazing*. Thanks!
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
Url : http://lists.r-forge.r-project.org/pipermail/phylobase-devl/attachments/20090902/faeb46bb/attachment.pgp
More information about the Phylobase-devl
mailing list