[Roxygen-devel] S4 implementation of @usage

Fri Aug 31 01:30:41 CEST 2012

  >> Hadley Wickham <hadley at rice.edu>
  >> on Thu, 30 Aug 2012 15:07:37 -0500 wrote:

  >> 
  >> No, no ... that was precisely the main point! It should process each
  >> *tag* at "package level". Then process each *tag* at a "block level" and
  >> finally at "tag level". Here is the complete pseudo code for roxy
  >> package:

  > Hmmmm, that's an interesting approach.

  > I had another idea, and that was that what I was calling a GlobalTag
  > is originally what I was calling a Roccer - i.e. it's something that
  > takes a RoxyPackage as input and a RoxyPackage as output. 

You really seem to be emotionally attached to this Roccer. A Roccer is an
object which processes a RoxyPackage, right? So is it a function or a
method? That is, is it an action (a processor?). After so much time I am
still confused. GlobalTag is a class (it represents an object). And you
are equating Roccer and GlobalTag ... ???

  > Then it would be a clean separation: RoxyTags act locally (at the
  > RoxyBlock level) and Roccers act globally (at the RoxyPackage
  > level).

I sort of see your point. It's a useful distinction indeed. But the way
you put it is really confusing. RoxyTag is a class; it cannot "act" in
the S4 (or even the Java or C++) world. Methods and functions act on
objects.

Really, most of my confusion is terminological; you seem to mix objects,
classes, methods and actions in one big dunno-what :)

  > I think it makes sense to make this distinction because the process
  > you outline below doesn't quite fit the common Roccer (e.g. family,
  > inheritParams, include) scenario which typically involves two passes:
  > the first to build up a data structure that stores information about
  > the global structure, and the second that goes through and modifies
  > individual RocBlocks.

Yes, it does fit. Your roccer is my procPackage(package, tag) and
procPackage(package, block) methods. They are free to use as many
stages as they want. But often two stages won't even be necessary, as
things can be constructed incrementally.

For example @rdname will build a new block incrementally attaching
same-name-blocks to it, @param will build new tag @arguments
incrementally attaching parameters to it, @usage does the same, etc.
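
A minimal sketch of the @rdname case, under stated assumptions: the class
TagRdname, the slot layouts and the helper rdname_of are all hypothetical,
and the leading 'object' argument of procPackage is left out for brevity.
It only illustrates how a package-level method for one tag can fold
same-name blocks into a single block without a separate collection pass.

```r
library(methods)

# Hypothetical minimal classes; the slot layouts are assumptions.
setClass("RoxyTag", representation(name = "character", text = "character"))
setClass("TagRdname", contains = "RoxyTag")
setClass("RoxyBlock", representation(tags = "list"))
setClass("RoxyPackage", representation(blocks = "list"))

setGeneric("procPackage", function(package, unit) standardGeneric("procPackage"))

# Hypothetical helper: the @rdname value of a block, or NA if it has none.
rdname_of <- function(block) {
  for (tg in block@tags) if (is(tg, "TagRdname")) return(tg@text)
  NA_character_
}

# Package-level action of an @rdname tag: merge every block that shares
# its name into one block. Calling it again for the same name is a no-op.
setMethod("procPackage", signature("RoxyPackage", "TagRdname"),
          function(package, unit) {
  rdnames <- vapply(package@blocks, rdname_of, character(1))
  same <- !is.na(rdnames) & rdnames == unit@text
  if (sum(same) < 2) return(package)
  other_tags <- lapply(package@blocks[same], function(b) {
    Filter(function(tg) !is(tg, "TagRdname"), b@tags)
  })
  merged <- new("RoxyBlock", tags = c(list(unit), do.call(c, other_tags)))
  package@blocks <- c(package@blocks[!same], list(merged))
  package
})

rd <- new("TagRdname", name = "rdname", text = "shared")
blk_a <- new("RoxyBlock", tags = list(
  new("RoxyTag", name = "title", text = "Topic a."), rd))
blk_b <- new("RoxyBlock", tags = list(
  new("RoxyTag", name = "title", text = "Topic b."), rd))
blk_c <- new("RoxyBlock", tags = list(
  new("RoxyTag", name = "title", text = "Topic c.")))
pkg <- new("RoxyPackage", blocks = list(blk_a, blk_b, blk_c))

pkg <- procPackage(pkg, rd)
length(pkg@blocks)  # the two "shared" blocks are now one
```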

  > I also think it seems wrong to have a generic called procPackage - why
  > do you have the name of the class it operates on in the generic name?

:) because the output is a RoxyPackage, and it is not acting only on
RoxyPackage, but also on tags and blocks (and potentially on the object
itself!). These are all methods dispatched on up to *three* arguments:

1) These methods return RoxyPackage:
   procPackage(object, package), procPackage(object, package, tag), procPackage(object, package, block)

2) These return RoxyBlock:
   procBlock(object, block), procBlock(object, block, tag)

3) This returns RoxyTag:
   procTag(object, tag)

(I didn't include the 'object' argument in my pseudocode for simplicity)

It achieves the functional separation which you are talking about and
offers finer control. Not only can a tag have a global action on a
package, it can also have a sub-global action on a block. Blocks can
have global actions on a package too. Isn't that nice?
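
The three generics listed above could be declared roughly as follows. This
is only a sketch under assumptions: the slot layouts, the argument name
'unit' (standing for the tag or block that does the acting) and the
do-nothing default methods are illustrative, not an agreed design. The
two-argument forms are covered by dispatching on "missing".

```r
library(methods)

# Hypothetical minimal classes; the slot layouts are assumptions.
setClass("RoxyTag", representation(name = "character", text = "character"))
setClass("RoxyBlock", representation(tags = "list"))
setClass("RoxyPackage", representation(blocks = "list"))

# One generic per return type; 'unit' is the tag or block doing the acting.
setGeneric("procTag",
           function(object, tag) standardGeneric("procTag"))
setGeneric("procBlock",
           function(object, block, unit) standardGeneric("procBlock"))
setGeneric("procPackage",
           function(object, package, unit) standardGeneric("procPackage"))

# Do-nothing defaults, so a specific tag class overrides only what it needs.
setMethod("procTag", signature("ANY", "RoxyTag"),
          function(object, tag) tag)
setMethod("procBlock", signature("ANY", "RoxyBlock", "missing"),
          function(object, block, unit) block)
setMethod("procBlock", signature("ANY", "RoxyBlock", "RoxyTag"),
          function(object, block, unit) block)
setMethod("procPackage", signature("ANY", "RoxyPackage", "missing"),
          function(object, package, unit) package)
setMethod("procPackage", signature("ANY", "RoxyPackage", "RoxyTag"),
          function(object, package, unit) package)
setMethod("procPackage", signature("ANY", "RoxyPackage", "RoxyBlock"),
          function(object, package, unit) package)

tag <- new("RoxyTag", name = "title", text = "Topic a.")
block <- new("RoxyBlock", tags = list(tag))
pkg <- new("RoxyPackage", blocks = list(block))

# The generic's name says what comes back, whatever it was dispatched on.
is(procTag(NULL, tag), "RoxyTag")
is(procBlock(NULL, block, tag), "RoxyBlock")
is(procPackage(NULL, pkg, block), "RoxyPackage")
```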

In my first attempt I used only one generic roxyParse. And it was
confusing because the input type determines the output type and there
was no clear separation. Just replace procXXX in my previous mail with
roxyProcess:

1) These methods return RoxyPackage:
   roxyProcess(object, package), roxyProcess(object, package, tag), roxyProcess(object, package, block)

2) These return RoxyBlock:
   roxyProcess(object, block), roxyProcess(object, block, tag)

3) This returns RoxyTag:
   roxyProcess(object, tag)

I am more or less indifferent between the above two naming conventions.
Both work the same, but the first is a bit less of a brain stretch.

  >> roxygenise <- function(..., ## see below for explanation
  >>                        processors = c(procRd, procNamespace, procCollate, procMyCoolIndex)) {
  >>   roxypkg <- parse_all_files(...)
  >> 
  >>   roxypkg <- procPackage(roxypkg)
  >> 
  >>   for (proc in processors)
  >>     roxypkg <- proc(roxypkg)
  >> 
  >>   roxypkg  ## return the processed package
  >> }

  > I was thinking of something like

  > roxygenise <- function(..., tags, roccers, outputs) {

  >   pkg <- roxy_init(...)  # not an S4 method because it starts the ball rolling
  >   pkg <- roxyProcess(pkg, tags)
  >   pkg <- roxyProcess(pkg, roccers)
  >   roxyWrite(pkg, outputs)
  > }

  > where if not specified tags, roccers and outputs would default to
  > finding all classes that inherited from the appropriate base class.

I don't really follow. What are tags here? User-supplied tags? How then
is roxyProcess dispatched on tags?

Isn't roxy_init the tag generator? In my world it does precisely
that. It tokenizes the package into small pieces called blocks which
contain tags (roxy_init instantiates all the tags). Then roxyProcess can
be dispatched on those.

You seem to talk about tags, roccers and outputs as functions
(i.e. actions taken on the pkg). But at the same time you dispatch over
them. It's really confusing.

Maybe the best thing is that you give it a try and build the first
version of the package according to your vision. Then, if I still think
that my approach is more parsimonious, I will attempt a rewrite.
Afterward we can talk in more concrete terms.

  >> > I don't think that can work, because combining Rd commands can't work
  >> > at the string level, and it varies from command to command.  i.e. if
  >> > you have multiple @keywords you do
  >> 
  >> > \keyword{one}
  >> > \keyword{two}
  >> > \keyword{three}
  >> 
  >> > but for multiple @params you do
  >> 
  >> > \arguments{
  >> > \item{one}{desc one}
  >> > \item{two}{desc two}
  >> > }
  >> 
  >> Good example. Let me fit it into the above paradigm.
  >> 
  >> After the *block level* processing this RoxyBlock will have a tag of
  >> class "TagArguments" (and *no* tags of class TagParam). That is, it is a
  >> *standalone* entity which can be transformed by outRd into a string.
  >> 
  >> In other words, TagParam's roxyProcess *block level* method creates a tag
  >> @arguments and removes the corresponding TagParam argument from the
  >> roxyBlock object.

  > Problems:

  > * calling it "block" level doesn't really make sense because it's now
  > (potentially) working on multiple blocks

Not at all. procBlock(block, param) works on one block and generates a
new tag of class TagArguments (in the same block!). Then
procPackage(package, rdname) unifies all of these arguments tags into one
inside the final RoxyBlock object.
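
A minimal sketch of that block-level step, with the leading 'object'
argument omitted as in the pseudocode. The slot layouts, the "items" slot
and the assumption that a @param text has the form "name description" are
all illustrative. The outRd method shows why TagArguments can be rendered
as a standalone entity while individual @param tags cannot.

```r
library(methods)

# Hypothetical minimal classes; the slot layouts are assumptions.
setClass("RoxyTag", representation(name = "character", text = "character"))
setClass("TagParam", contains = "RoxyTag")
setClass("TagArguments", contains = "RoxyTag",
         representation(items = "character"))
setClass("RoxyBlock", representation(tags = "list"))

setGeneric("procBlock", function(block, tag) standardGeneric("procBlock"))
setGeneric("outRd", function(tag) standardGeneric("outRd"))

# Default: a tag leaves its block untouched.
setMethod("procBlock", signature("RoxyBlock", "RoxyTag"),
          function(block, tag) block)

# A @param tag folds itself into the block's single @arguments tag and
# removes itself from the block.
setMethod("procBlock", signature("RoxyBlock", "TagParam"),
          function(block, tag) {
  is_args <- vapply(block@tags, function(x) is(x, "TagArguments"), logical(1))
  args <- if (any(is_args)) {
    block@tags[[which(is_args)[1]]]
  } else {
    new("TagArguments", name = "arguments", text = "")
  }
  # Assumed text layout: first word is the parameter name, rest is its description.
  param_name <- sub("\\s.*$", "", tag@text)
  param_desc <- sub("^\\S+\\s*", "", tag@text)
  args@items[param_name] <- param_desc
  is_self <- vapply(block@tags, identical, logical(1), tag)
  block@tags <- c(block@tags[!is_self & !is_args], list(args))
  block
})

# Only the combined tag knows how to render itself as Rd.
setMethod("outRd", "TagArguments", function(tag) {
  items <- sprintf("\\item{%s}{%s}", names(tag@items), tag@items)
  paste(c("\\arguments{", items, "}"), collapse = "\n")
})

block <- new("RoxyBlock", tags = list(
  new("TagParam", name = "param", text = "one desc one"),
  new("TagParam", name = "param", text = "two desc two")))

# Block-level pass: each original tag gets its turn to act on the block.
for (tg in block@tags) block <- procBlock(block, tg)

cat(outRd(block@tags[[1]]), "\n")
```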

  > * what happens to the objects? Does each RoxyBlock now have a list of
  > objects associated with it? If so, how do you associate the usage
  > statements with the objects.  i.e. how would you resolve:

If you accept my evaluation mechanism (i.e. replacement of
object_from_call), then each RoxyBlock might end up associated with a
list of objects from the very first parsing stage. But this does not
seem to be a problem in this case.

  > #' Topic a.
  > #'
  > #' @rdname shared
  > a <- function(x) {}

  > #' Topic b.
  > #'
  > #' @rdname shared
  > #' @usage b() # some comment here
  > b <- function(x) {}

The order is procTag -> procBlock -> procPackage

So procTag(b, usage) is called first. It generates from the raw tag (aka
an object with @name and @text) a "complete" tag object with all the
other slots filled in. This tag, once processed, doesn't need to carry
the function 'b' any further. And even if it really needs to carry it,
the class "TagUsage" can have a slot "@object" which will hold object
'b'; procTag(b, usage) fills in this slot with object 'b'.

Actually, in this paradigm, dispatching procPackage and procBlock on the
object might not be needed at all. Dispatching only procTag on the
object might be enough. Tags which need to carry the object over can
have a slot @object to do so.
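
A minimal sketch of a tag that carries its object, assuming a TagUsage
class with an "object" slot of class "ANY" and a procTag method dispatched
on functions. The slot layouts are illustrative, and the raw @usage text is
deliberately left untouched here.

```r
library(methods)

# Hypothetical minimal classes; the slot layouts are assumptions.
setClass("RoxyTag", representation(name = "character", text = "character"))
setClass("TagUsage", contains = "RoxyTag", representation(object = "ANY"))

setGeneric("procTag", function(object, tag) standardGeneric("procTag"))

# Tag-level method dispatched on the documented object: it completes the
# raw tag by storing the object in the tag itself, so later block-level
# and package-level methods do not need to dispatch on the object.
setMethod("procTag", signature("function", "TagUsage"),
          function(object, tag) {
  tag@object <- object
  tag
})

b <- function(x) {}
raw <- new("TagUsage", name = "usage", text = "b() # some comment here")

done <- procTag(b, raw)
is.function(done@object)  # the tag now carries 'b' along
```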

  > * Output objects are used solely for their side effects (writing to
  > disk).  They return nothing.

  > * Roccer objects return a modified RoxyPackage and have no
  > side-effects.  This is important for testing.

  > * All roccers need to be run before output can occur.

Yes, a useful distinction. Especially the last part. As for the second
point, you cannot really force a Roccer to have no side effects. The user
can do whatever they like.

(Really, why not call roccer a processor? And why not call it a function
or method instead of an object? Let's be precise. User's life will be so
much simpler without these new terms.)

    Vitalie