[Phylobase-devl] prune/subset questions
Jim Regetz
regetz at nceas.ucsb.edu
Sat Aug 29 03:33:45 CEST 2009
Hi all,
As far as I can tell, the new phylo4 prune method I've written is
working just fine, and supports both trim.internal=TRUE and
trim.internal=FALSE. It only does subtree=FALSE, more on that below.
Some questions for the group:
1. Is there a compelling reason to keep both subset *and* prune methods?
Or is this just a historical artifact? I think the only differences are:
(1) you can only pass the trim.internal and subtree arguments to prune,
but not subset, and (2) subset accepts tips.include, tips.exclude, mrca,
and node.subtree, whereas prune only does tips.exclude. Why not just
expose trim.internal and subtree (if desired) via the subset methods,
and eliminate prune? Or if someone really wants a prune function, it can
simply be an inflexible wrapper for subset, only accepting tips.exclude.
2. Do we need/want to support a subtree=TRUE option? I haven't worked on
this at all. For what it's worth, even using the current ape-based
subset method, this option is unreliable for phylo4(d):
require(phylobase)
data(geospiza)
geotree <- extractTree(geospiza)
prune(geotree, c(1,3), subtree=TRUE)
## Error in checkTree(object) : All labels must be unique
## In addition: Warning message:
## In asMethod(object) : trees with unknown order may be unsafe in ape
Here it's because the resulting tree would have two tip labels called
"[1_tips]". Anyway, I would be happy with leaving subtree as a future
feature possibility for now.
3. Any opinions on dealing with root edge length during subsetting? The
current method (using ape::drop.tip) just loses that information. In the
new method, the root edge essentially accumulates the edges associated
with any singletons that form along it as a consequence of the pruning.
Of course, that could make for a long root edge when retaining just two
closely related species in a large tree. Alternatively, albeit somewhat
arbitrarily, we could make it be the length of the edge connecting the
new root to its parent node in the original tree. Of course, this could
also be computed after the fact, e.g. with:
edgeLength(phy, MRCA(phy, tips.included))
where phy was the full (pre-subset) tree.
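To make the two root-edge options in point 3 concrete, here is a minimal base-R sketch on a toy tree. It does not use phylobase or the new prune code: the edge table, node numbering, root edge value, and the helpers path_to_root() and mrca_of() are all made up for illustration.

```r
# Toy tree (((A,B),C),D): tips are nodes 1-4 (A-D), internal nodes are 5-7,
# and node 5 is the root. All values here are hypothetical.
edges <- data.frame(
  ancestor   = c(5, 5, 7, 7, 6, 6),
  descendant = c(7, 4, 6, 3, 1, 2),
  length     = c(1.0, 3.0, 0.5, 1.5, 1.0, 1.0)
)
root_node <- 5
root_edge <- 0.25   # length of the root edge in the full tree

# Nodes visited when walking from `node` up to the root (both inclusive).
path_to_root <- function(node) {
  path <- node
  while (node != root_node) {
    node <- edges$ancestor[edges$descendant == node]
    path <- c(path, node)
  }
  path
}

# MRCA of a set of tips: the first node shared by all tip-to-root paths.
mrca_of <- function(tips) {
  Reduce(intersect, lapply(tips, path_to_root))[1]
}

# Keep only tips A and B; their MRCA (node 6) becomes the new root.
new_root <- mrca_of(c(1, 2))

# Option 1: the root edge absorbs every edge between the old root and the
# new root, since those nodes all become singletons after pruning.
collapsed   <- setdiff(path_to_root(new_root), root_node)
accumulated <- root_edge + sum(edges$length[edges$descendant %in% collapsed])

# Option 2: use only the edge that led into the new root in the full tree.
parent_edge <- edges$length[edges$descendant == new_root]
```

In this toy case option 1 sums the original root edge and the two collapsed edges above node 6, while option 2 keeps just the single edge leading into node 6, so the two diverge more the deeper the retained clade sits in the tree.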
4. This new method was initially kinda slow, but mostly because it makes
a bunch of descendants() calls in one part, and that can be slow. So I
rewrote descendants() to use a (very simple) C function that works on a
preordered edge matrix, which helps a lot with speed. I'll commit if
there are no objections. The new subset is still slower than ape's
drop.tip, but not horribly so.
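For readers wondering why a preordered edge matrix helps in point 4, here is a rough sketch of the idea in plain R. It is not the C function mentioned above, only an illustration under the assumption that every edge appears after the edge leading into its ancestor; the function name descendants_preorder() and the toy matrix are hypothetical.

```r
# Toy tree (((A,B),C),D) as an edge matrix in preorder:
# column 1 = ancestor, column 2 = descendant; tips are 1-4, root is 5.
edge <- matrix(c(5, 7,
                 7, 6,
                 6, 1,
                 6, 2,
                 7, 3,
                 5, 4), ncol = 2, byrow = TRUE)

# Single pass over the edges: because of the preorder assumption, an
# ancestor is always flagged before any edge leaving it is visited.
descendants_preorder <- function(edge, node) {
  is_desc <- logical(max(edge))
  is_desc[node] <- TRUE                      # seed with the focal node
  for (i in seq_len(nrow(edge))) {
    if (is_desc[edge[i, 1]]) is_desc[edge[i, 2]] <- TRUE
  }
  is_desc[node] <- FALSE                     # exclude the focal node itself
  which(is_desc)
}

all_desc <- descendants_preorder(edge, 7)    # internal nodes and tips below 7
tip_desc <- setdiff(all_desc, edge[, 1])     # tips only: never an ancestor
```

The point of the sketch is that one linear scan replaces repeated searches through the edge matrix, which is where the speedup would come from if the matrix really is in preorder.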
Cheers,
Jim