[Roxygen-devel] Design of roxygen3

Wed Aug 29 15:18:06 CEST 2012

Below I've attempted to outline the basic object structure of
roxygen3.  I would really appreciate any feedback.  There's a prettier
version at https://gist.github.com/3512194.  Thanks!

Hadley

[Important terminology: __roc__ is short for roxygen comment block,
and __rocblock__ for the combination of a roc and the object that's
being documented]

# Roxygen design

The __roccer__ is the object in charge of processing a tag. It is made
up of two components:

* A `parser`, which is in charge of converting the raw text string
into an intermediate object that can be used by other roccers and is
later turned into output

* A `rocout` object, which takes the intermediate format (after it's
potentially been modified by other roccers) and turns it into an
output format

Each of these is described in turn below.

Each roccer has a name, and a set of dependencies (currently stored in
`base_prereqs`).  Before the roccers are called, a topological sort is
performed to ensure that they are run in the correct order. This makes
sure (e.g.) that `@title` is processed before the intro paragraphs,
and that `@param` is processed before `@inheritParams`.
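
As a rough illustration of the ordering step, here is a minimal sketch of a depth-first topological sort over roccer dependencies. The `prereqs` list and the `topo_sort` function are hypothetical names made up for this example; the real code stores dependencies in `base_prereqs` and may sort them differently.

```r
# Hypothetical roccers: each name maps to the names it depends on.
prereqs <- list(
  title = character(),
  intro = "title",
  param = character(),
  inheritParams = "param"
)

# Depth-first topological sort. Stops on an unknown name or a cycle.
topo_sort <- function(prereqs) {
  state <- character()   # named vector: roccer name -> "visiting" or "done"
  sorted <- character()  # roccer names in a valid run order

  visit <- function(name) {
    if (!name %in% names(prereqs)) stop("Unknown roccer: ", name)
    current <- unname(state[name])
    if (identical(current, "done")) return(invisible())
    if (identical(current, "visiting")) stop("Dependency cycle at ", name)
    state[name] <<- "visiting"
    for (dep in prereqs[[name]]) visit(dep)
    state[name] <<- "done"
    sorted <<- c(sorted, name)
    invisible()
  }

  for (name in names(prereqs)) visit(name)
  sorted
}

run_order <- topo_sort(prereqs)
```

Every roccer appears in `run_order` after all of its prerequisites, so `title` comes before `intro` and `param` before `inheritParams`.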

In the future roccers may gain some sort of keyword or tag field, to
make it easier to select subsets of the roccers to run.

## Parser

The parser implements one method: `parse_rocblocks`, which takes a
list of rocblocks as input and returns a list of rocblocks as output.

There are currently three types of parser:

* `null_parser`: does nothing (useful when the text is used as is)

* `roc_parser`: modifies only the roc component of the rocblock. It
has two arguments, `tag` and `one`, both of which are functions: `tag`
is called with just a single tag, and `one` is called with the
entire rocblock (broken down using `do.call` into `roc`, `obj` etc).
Both return a list, which is combined with the original roc using
`modifyList`.

* `rocblock_parser`: has only a single argument, `all`, which is a
function that is called with the list of all rocblocks and should
return a list of rocblocks. This means it can modify anything: it can
add or delete rocblocks, or modify any component of the rocblock.
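
To make the three parser types above more concrete, here is a minimal sketch of how they might look as plain closures. It assumes a rocblock is a list with a `roc` element (a named list of tags) and an `obj` element; that layout, the extra `tag_name` argument to `parse_rocblocks`, and the example tag functions are all assumptions for illustration, not the actual roxygen3 code.

```r
# null_parser: returns the rocblocks unchanged.
null_parser <- function() {
  list(parse_rocblocks = function(rocblocks, tag_name) rocblocks)
}

# roc_parser: `tag` sees one tag's text, `one` sees the whole rocblock
# (spread into arguments with do.call). Both return a list of changes
# that is merged into the roc with modifyList.
roc_parser <- function(tag = NULL, one = NULL) {
  list(parse_rocblocks = function(rocblocks, tag_name) {
    lapply(rocblocks, function(rocblock) {
      if (!is.null(tag) && !is.null(rocblock$roc[[tag_name]])) {
        changes <- tag(rocblock$roc[[tag_name]])
        rocblock$roc <- utils::modifyList(rocblock$roc, changes)
      }
      if (!is.null(one)) {
        changes <- do.call(one, rocblock)
        rocblock$roc <- utils::modifyList(rocblock$roc, changes)
      }
      rocblock
    })
  })
}

# rocblock_parser: `all` receives and returns the full list of rocblocks.
rocblock_parser <- function(all) {
  list(parse_rocblocks = function(rocblocks, tag_name) all(rocblocks))
}

blocks <- list(
  list(roc = list(title = "add two numbers"), obj = list(name = "add")),
  list(roc = list(), obj = list(name = "helper"))
)

# Local action: rewrite the text of a single tag.
upper_title <- roc_parser(tag = function(text) list(title = toupper(text)))
parsed <- upper_title$parse_rocblocks(blocks, "title")

# Global action: drop rocblocks that have no tags at all.
drop_empty <- rocblock_parser(function(rocblocks) {
  Filter(function(b) length(b$roc) > 0, rocblocks)
})
kept <- drop_empty$parse_rocblocks(blocks, "title")
```

The point of the sketch is the difference in scope: the `tag` function never sees anything but its own text, while the `all` function is free to add or remove whole rocblocks.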

This makes it possible for tags to be very flexible: in roxygen2 they
were basically limited to local action, but many tags (like
`@include`, `@family`, and `@inheritParams`) need a more global
perspective. It also makes it possible to write other specialised
parsers that might operate on particular types of object and extract
more information to add to the roc.

The other advantage of multiple parsers is for caching: the more
locally a parser operates the easier it is to cache between subsequent
runs. Globally operating tags depend on all roccers, so will generally
need to be recomputed every time, but roccers that work with a single
tag should only need to be recomputed if that tag changes.
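
One way to picture the caching argument: a parser that only looks at a single tag's text can be memoised on that text. The sketch below is an assumption about how such a cache could work, not a description of what roxygen3 does; `memoise_tag` and `parse_title` are hypothetical names.

```r
# Wrap a single-tag parser so repeated calls with the same text reuse
# the stored result instead of re-running the parser.
memoise_tag <- function(tag) {
  cache <- new.env(parent = emptyenv())
  function(text) {
    key <- paste0("text:", text)  # prefix avoids an empty variable name
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, tag(text), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0  # counts how often the underlying parser really runs
parse_title <- memoise_tag(function(text) {
  calls <<- calls + 1
  list(title = trimws(text))
})

first  <- parse_title("  Add two numbers ")
second <- parse_title("  Add two numbers ")  # served from the cache
third  <- parse_title("Subtract")            # new text, parser runs again
```

A parser that takes the list of all rocblocks has no such small key: any change anywhere could change its result, which is why it would generally have to be recomputed on every run.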

## Rocout

The `rocout` object has three methods:

* `output_build`: this is given a single tag from the rocblock, and
should return an `output` object which describes where and what to
write.

* `output_postproc`: this combines multiple tags and does any other
postprocessing of the data before the final output. (It is mainly
separated out from `output_write` to make it easier to test.)

* `output_write`: this basically calls `format` on the output object
representing each file and then `write_if_different` to provide an
informative message.
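
The write step could look roughly like the sketch below. It assumes an output object is an S3 object with a `format` method that returns the lines of the file; `namespace_output` and `write_if_different_sketch` are hypothetical stand-ins written for this example, and the real `write_if_different` may behave differently.

```r
# Hypothetical output object holding lines for a NAMESPACE-style file.
namespace_output <- function(path, lines) {
  structure(list(path = path, lines = lines), class = "namespace_output")
}

# format() turns the object into the final lines: sorted, no duplicates.
format.namespace_output <- function(x, ...) {
  sort(unique(x$lines))
}

# Write only when the contents changed; returns TRUE if the file was written.
write_if_different_sketch <- function(path, contents) {
  if (file.exists(path) && identical(readLines(path), contents)) {
    return(invisible(FALSE))
  }
  message("Updating ", basename(path))
  writeLines(contents, path)
  invisible(TRUE)
}

# Use a temporary file so the example does not touch a real package.
path <- tempfile("NAMESPACE")
out <- namespace_output(path, c("export(b)", "export(a)", "export(b)"))

wrote_first  <- write_if_different_sketch(out$path, format(out))
wrote_second <- write_if_different_sketch(out$path, format(out))
```

Keeping `format` separate from the write makes it possible to test the generated lines without touching the file system, and skipping unchanged files means a message only appears when something was actually updated.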

## Output objects

Output objects represent the various possible types of output.  There
are currently three:

* rd commands in an rd file
* lines in the `NAMESPACE` file
* fields in the `DESCRIPTION` file

-- 
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/