[Roxygen-devel] S4 implementation of @usage

Thu Aug 30 18:29:07 CEST 2012

> How about this. You can have 3 levels on which a tag can perform an
> action -- a local tag level, an object documentation (block) level,
> and a package level. For each of these actions you have a generic
> dispatched on the *tag* object:
>
> setGeneric("prepareTag", function(tag)  standardGeneric("prepareTag"),
>            useAsDefault = function(tag) tag)
>
> setGeneric("prepareDoc", function(tag, roxydoc)  standardGeneric("prepareDoc"),
>            useAsDefault = function(tag, roxydoc) roxydoc)
>
> setGeneric("preparePackage", function(tag, roxypackage)  standardGeneric("preparePackage"),
>            useAsDefault = function(tag, roxypackage) roxypackage)
>
> So you have 3 core objects: roxyPackage, holding a list of roxyBlocks;
> roxyBlocks, which comprise a roxyDoc and an object; and finally roxyDoc,
> which comprises roxyTags.
>
> preparePackage returns roxyPackage
> prepareDoc returns roxyDoc
> prepareTag returns roxyTag
>
> Pretty simple, isn't it? Only special tags have to declare prepareDoc
> and preparePackage.
>
> So roxygenize will iterate 3 times over all tags and call prepareTag on
> first iteration, then prepareDoc, and finally preparePackage.

That's better, but generally you only need to call preparePackage
once, not once for each instance of that tag.  I was thinking
something like:

setGeneric("roxyProcess", function(input, ...) {
  standardGeneric("roxyProcess")
})

setMethod("roxyProcess", "RoxyPackage", function(input) {
  # Process each block individually for local tags
  input@blocks <- lapply(input@blocks, roxyProcess)

  # Process the global tags, which are given all the blocks
  input@blocks <- lapply(input@globalTags, roxyProcess, input@blocks)

  input
})

setMethod("roxyProcess", "RoxyBlock", function(input) {
  input@tags <- lapply(input@tags, roxyProcess)
  input
})
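
A minimal runnable sketch of the idea above, under assumptions that are not settled anywhere in this thread: the class definitions, the slot names (blocks, globalTags, tags) and the TagUpper and GlobalTag classes are hypothetical stand-ins. It also folds the global tags over the blocks with Reduce(), so that each global tag runs once and sees the blocks left by the previous one; that is one possible reading of the second step, not a fixed design.

```r
library(methods)

# Hypothetical classes; slot names are assumptions for illustration
setClass("Tag", representation(text = "character"))
setClass("TagUpper", contains = "Tag")    # a local tag that rewrites itself
setClass("GlobalTag", contains = "Tag")   # a tag that acts on all blocks
setClass("RoxyBlock", representation(tags = "list"))
setClass("RoxyPackage", representation(blocks = "list", globalTags = "list"))

setGeneric("roxyProcess", function(input, ...) standardGeneric("roxyProcess"))

# Default: a tag is returned unchanged
setMethod("roxyProcess", "Tag", function(input, ...) input)

setMethod("roxyProcess", "TagUpper", function(input, ...) {
  input@text <- toupper(input@text)
  input
})

# A global tag receives the full list of blocks and returns a new list
setMethod("roxyProcess", "GlobalTag", function(input, blocks, ...) {
  rev(blocks)  # placeholder action: reverse the block order
})

setMethod("roxyProcess", "RoxyBlock", function(input, ...) {
  input@tags <- lapply(input@tags, roxyProcess)
  input
})

setMethod("roxyProcess", "RoxyPackage", function(input, ...) {
  # Local tags first, block by block
  input@blocks <- lapply(input@blocks, roxyProcess)
  # Then each global tag once, threading the block list through
  input@blocks <- Reduce(function(blocks, tag) roxyProcess(tag, blocks),
                         input@globalTags, input@blocks)
  input
})

pkg <- new("RoxyPackage",
  blocks = list(
    new("RoxyBlock", tags = list(new("TagUpper", text = "a"))),
    new("RoxyBlock", tags = list(new("Tag", text = "b")))
  ),
  globalTags = list(new("GlobalTag", text = "reverse"))
)
out <- roxyProcess(pkg)
```

Here the package-level step runs once per global tag rather than once per tag instance in every block.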

> (The above is to give an idea and fix the terminology for the
> sequel. The final implementation will only have one roxyPrepare method
> dispatched on both arguments (tag, missing), (Tag, RoxyDoc), (Tag,
> roxypackage). Hopefully I am clear enough here.)

Yup, it makes sense, I just don't think it will work ;)

> Otherwise you need some global exchange. Store roxyBlock globally and
> allow the tag prepareTag method to modify it by side effect. Ugly and
> not an R-ish way.
>
>   > setClass("GlobalTag", contains = "Tag",
>   >   slots = "RocBlock" # pointer back to the rocblock that contained them
>   > )
>
> pointer? That's not that easy in R, is it? You mean dropping to C and
> installing real C pointers to objects?

Well it wouldn't be a dynamic pointer, it would just be a copy.

> An alternative could be to make roxyDoc an S4 environment. And each
> tag can have a @parent slot holding its parent environment. Same for
> roxyPackage object, it can be an environment. And roxyBlock can have a
> slot pointing to parent roxyPackage object.
>
> Then prepareTag can have access to its parent roxyDoc and roxyPackage.

Yes, then you might as well use reference classes.

> As compared to the preparePackage/Doc/Tag approach, the modifications are
> done by side effect. Not that nice IMO.

Agreed.

> Hm,  I was thinking that outRd should return a string containing an Rd
> representation of the tag and should have nothing to do with the file. It's
> a task of a special function (write_rd_file) to aggregate all the tags
> from an roxyDoc object and write them into a file.

I don't think that can work, because combining Rd commands can't work
at the string level, and it varies from command to command.  i.e. if
you have multiple @keywords you do

\keyword{one}
\keyword{two}
\keyword{three}

but for multiple @params you do

\arguments{
\item{one}{desc one}
\item{two}{desc two}
}
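
To make the difference concrete, here is a small sketch of merging at a structured level rather than the string level. All of the names (rd_keyword, rd_param, values_for, format_rd) are hypothetical and not part of roxygen; the point is only that each tag hands over a command name plus values, and the combining rule is chosen per command.

```r
# Hypothetical structured Rd pieces: a command name plus its values
rd_keyword <- function(x) list(command = "keyword", values = x)
rd_param <- function(name, desc) {
  list(command = "arguments", values = setNames(desc, name))
}

# Collect the values of every piece that uses a given command
values_for <- function(pieces, command) {
  keep <- Filter(function(p) p$command == command, pieces)
  unlist(lapply(keep, function(p) p$values))
}

# Each command has its own rule for combining multiple pieces
format_rd <- function(pieces) {
  keywords <- values_for(pieces, "keyword")
  params <- values_for(pieces, "arguments")
  out <- character()
  if (length(keywords) > 0) {
    # Keywords: one \keyword{} line per value
    out <- c(out, paste0("\\keyword{", keywords, "}"))
  }
  if (length(params) > 0) {
    # Params: every item wrapped in a single \arguments{} block
    items <- paste0("\\item{", names(params), "}{", params, "}")
    out <- c(out, "\\arguments{", items, "}")
  }
  out
}

pieces <- list(
  rd_keyword("one"), rd_param("one", "desc one"),
  rd_keyword("two"), rd_param("two", "desc two")
)
rd <- format_rd(pieces)
```

Concatenating pre-rendered strings would have produced two separate \arguments{} blocks here, which is the problem described above.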

> From what you say, it seems that outRd should also have a global
> perspective. This looks like a redundancy to me. The preparation stage
> (prepareTag, prepareDoc and preparePackage) should handle all these
> global dependencies.
>
> Take for example @rdname. The method procesPackage(rdTag, roxypackage)
> should return a new RoxyPackage object, with all RoxyDoc with the same
> @rdname unified in one RoxyDoc object. So that by the end of all the
> procesXXX stages, the roxyPackage contains only one RoxyDoc object per
> output file.

I agree, but it's harder to do that merging at that point than at the
Rd level.  (That's from personal experience doing it both ways - I
don't have any evidence to back it up)

>   > Another possible approach could be:
>
>   > setClass("OutputRd")
>   > setGeneric("makeOutput", function(input, output) standardGeneric("makeOutput"))
>   > setMethod("makeOutput", c("TagParam", "OutputRd"), ...)
>   > setMethod("makeOutput", c("TagImport", "OutputNamespace"), ...)
>   > setMethod("makeOutput", c("TagIncludes", "OutputDescription"), ...)
>
>   > setMethod("makeOutput", c("RoxyBlock", "Output"), function(input, output) {
>   >   lapply(input@tags, makeOutput, output = output)
>   > })
>   > setMethod("makeOutput", c("RoxyPackage", "Output"), function(input, output) {
>   >   unlist(lapply(input@blocks, makeOutput, output = output), recursive = FALSE)
>   > })
>
>   > setGeneric("writeOutput", function(output, data) standardGeneric("writeOutput"))
>   > setMethod("writeOutput", c("OutputNamespace", "list"), function(output, data) {
>   >   lines <- sort(unique(unlist(data)))
>   >   write_if_different(lines, "NAMESPACE")
>   > })
>
>   > That would make it easier for the user to specify which sorts of
>   > output they want because they could just provide a list of Output
>   > objects to roxygenise.
>
> Interesting, but it doesn't feel natural to me. It specifies a *type* of
> an output by the type of an *input* object which you create specifically
> for this purpose (to indicate the type of output). That's tough ;).

If you did it with different methods (outRd, outNamespace,
outDescription), and the user created a new type of output (e.g.
outDemoIndex), how would they tell roxygen to call outDemoIndex
on every block/tag?

This style would also be useful for testing - you could have output
objects that don't actually output anything, they just capture all
their inputs for later inspection.
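
A rough sketch of what such a testing output could look like. The class names follow the ones used earlier in this thread, but the definitions, the `path` and `value` slots, the OutputCapture class and the run() helper are all assumptions made up for illustration.

```r
library(methods)

# Hypothetical tag and output classes, named after the ones in this thread
setClass("Tag", representation(value = "character"))
setClass("TagExport", contains = "Tag")
setClass("Output", representation(path = "character"))
setClass("OutputNamespace", contains = "Output")
# A test double: it writes nothing and only records what it was given
setClass("OutputCapture", contains = "Output")

setGeneric("makeOutput", function(input, output) standardGeneric("makeOutput"))
setGeneric("writeOutput", function(output, data) standardGeneric("writeOutput"))

setMethod("makeOutput", c("TagExport", "OutputNamespace"),
  function(input, output) paste0("export(", input@value, ")"))
# The capture output accepts any tag and keeps it as-is
setMethod("makeOutput", c("Tag", "OutputCapture"),
  function(input, output) input)

setMethod("writeOutput", c("OutputNamespace", "list"), function(output, data) {
  sort(unique(unlist(data)))  # a real version would write these lines to disk
})
setMethod("writeOutput", c("OutputCapture", "list"),
  function(output, data) data)

tags <- list(new("TagExport", value = "b"), new("TagExport", value = "a"))
# Hypothetical helper: push every tag through one output object
run <- function(output) {
  writeOutput(output, lapply(tags, makeOutput, output = output))
}

ns_lines <- run(new("OutputNamespace"))
captured <- run(new("OutputCapture"))
```

A test can then inspect `captured` directly, without touching the file system.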

> I still don't understand why you would need a method to generate a
> namespace and description? Isn't this a global action? That is, the
> write_namespace function should take as input all objects (roxyPackage),
> iterate through all the roxyDoc objects, look into @export field and
> finally write a namespace file? Similarly for description.

Hmmm - I just think it feels more natural for it to be driven by the
individual tags. Each tag knows what its output should be (these
namespace declarations, these entries in the Rd file), and then the
output object knows how to aggregate them all together and write them
to disk.

> And why users might want to modify the default namespace generator?

They might want to not use it - i.e. if you're managing the namespace
by hand, it'd be nice to tell roxygen not to mess with it.

> S4 should be used only for those parts of the package which impose
> different behavior for different objects. All the rest are simple
> functions. It looks to me that you are really over zealous in trying to
> use S4 for everything.

But output does differ between tags?  Some need to output Rd, some
NAMESPACE, etc.

> A conceptual note. Roxygen is a documentation generator, so the output
> is in one-to-one correspondence with the file format (rd, text, html
> etc). Making a namespace or description output is unnatural and seems
> to be an unnecessary confusion.

I don't think that's necessarily true.  e.g. a potentially useful
extension could be to modify @importFrom so that it also lists the
external functions used in the documentation - then it would need to
output to both namespace and Rd.  I think there are lots of potential
use cases for output to multiple locations.

Hmmm, but you list rd, html and text as potential output formats - I
definitely don't see html and text as output types.  That's something
you might generate from the Rd files, but I don't think it's the job
of roxygen to generate anything that R doesn't use directly.  Maybe
that's why we seem to be talking past each other.

I see roxygen being 90% a documentation generator - the other 10% is
automatically generating any other file type that makes package
development easier.  This includes the NAMESPACE, some parts of the
DESCRIPTION, the demo index, ...

> I think this trails back to the roclet concept in the first version of
> roxygen. I could never understand what the point of collate_roclet
> and namespace_roclet is. From the documentation they are objects that
> specify an action. This is confusing, as an action is usually associated
> with a function or a method.

Yes, agreed.

> So instead of
>
>        roxygenize( ..., roclets = c("collate", "namespace", "rd"))
>
> this would have been much simpler:
>
>        roxygenize( ..., collate = TRUE, namespace = TRUE, output = rd)

See I'm thinking

roxygenize(..., output = c(OutputRd(), OutputCollate(), OutputNamespace()))

would be even more useful, because it makes it easier to extend
roxygen with new types of output.
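
A sketch of how a list of output objects might drive things, including one tag feeding two outputs as in the @importFrom idea above. Everything here is hypothetical: the class definitions, the `name` and `value` slots, the run_outputs() driver and the placeholder Rd comment line are assumptions for illustration, not roxygen code.

```r
library(methods)

# Hypothetical classes; none of these exist in roxygen
setClass("Tag", representation(value = "character"))
setClass("TagImportFrom", contains = "Tag")
setClass("Output", representation(name = "character"))
setClass("OutputNamespace", contains = "Output")
setClass("OutputRd", contains = "Output")

setGeneric("makeOutput", function(input, output) standardGeneric("makeOutput"))
# By default a tag contributes nothing to an output
setMethod("makeOutput", c("Tag", "Output"), function(input, output) NULL)
# One tag can feed several outputs by defining several methods
setMethod("makeOutput", c("TagImportFrom", "OutputNamespace"),
  function(input, output) paste0("importFrom(", input@value, ")"))
setMethod("makeOutput", c("TagImportFrom", "OutputRd"),
  function(input, output) paste0("% uses ", input@value))  # placeholder Rd line

# Hypothetical driver: run every tag through every requested output
run_outputs <- function(tags, outputs) {
  res <- lapply(outputs, function(o) {
    unlist(lapply(tags, makeOutput, output = o))
  })
  names(res) <- vapply(outputs, function(o) class(o)[1], character(1))
  res
}

tags <- list(new("TagImportFrom", value = "stats,median"))
both <- run_outputs(tags, list(new("OutputNamespace"), new("OutputRd")))
# Managing NAMESPACE by hand: leave OutputNamespace out of the list
rd_only <- run_outputs(tags, list(new("OutputRd")))
```

Adding a new kind of output would then mean defining one more class plus its makeOutput methods, with no change to the driver.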

> I am definitely not seeing the full picture here, but I can be pretty
> sure that whatever the reason behind those decisions was, it could have
> been done in a standard R-ish way. There is really no need to confuse
> the user with new pseudo class objects like roclets or roccers or
> whatever. Functions, methods, classes and objects, that is the standard R
> language.

Well roclets were standard S3 objects - I think roxygen did work in
a standard R way.  It's more that the design of roxygen didn't match up
to what extensions were actually needed - but it's really hard to
predict what people will want to extend before you actually write the
system.

Hadley

-- 
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/