[Hadoopstreaming-commits] r7 - in pkg: . man
noreply at r-forge.r-project.org
Sat Mar 21 06:29:52 CET 2009
Author: drosen
Date: 2009-03-21 06:29:52 +0100 (Sat, 21 Mar 2009)
New Revision: 7
Modified:
pkg/DESCRIPTION
pkg/man/HadoopStreaming-package.Rd
pkg/man/hsTableReader.Rd
Log:
documentation edits
Modified: pkg/DESCRIPTION
===================================================================
--- pkg/DESCRIPTION 2009-03-21 05:18:54 UTC (rev 6)
+++ pkg/DESCRIPTION 2009-03-21 05:29:52 UTC (rev 7)
@@ -6,9 +6,5 @@
Author: David S. Rosenberg <drosen at sensenetworks.com>
Maintainer: David S. Rosenberg <drosen at sensenetworks.com>
Depends: getopt
-Description: This package facilitates a fairly literal implementation of
- map/reduce streaming in R. It also facilitates operating on data in
- a streaming fashion, without using Hadoop at all. A function is
- provided for handling many of the command line arguments that are
- useful for streaming data.
+Description: Provides a framework for writing map/reduce scripts for use in Hadoop Streaming. Also facilitates operating on data in a streaming fashion, without Hadoop.
License: GPL
Modified: pkg/man/HadoopStreaming-package.Rd
===================================================================
--- pkg/man/HadoopStreaming-package.Rd 2009-03-21 05:18:54 UTC (rev 6)
+++ pkg/man/HadoopStreaming-package.Rd 2009-03-21 05:29:52 UTC (rev 7)
@@ -6,23 +6,21 @@
Functions facilitating Hadoop streaming with R.
}
\description{
- This package facilitates a fairly literal implementation of
- map/reduce streaming in R. It also facilitates operating on data in
- a streaming fashion, without using Hadoop at all. A function is
- provided for handling many of the command line arguments that are
- useful for streaming data.
+Provides a framework for writing map/reduce scripts for use in Hadoop
+Streaming. Also facilitates operating on data in a streaming fashion,
+without Hadoop.
}
\details{
\tabular{ll}{
Package: \tab HadoopStreaming\cr
Type: \tab Package\cr
Version: \tab 0.1\cr
-Date: \tab 2009-03-11\cr
+Date: \tab 2009-03-16\cr
License: \tab GPL \cr
LazyLoad: \tab yes\cr
}
The functions in this package read data in chunks from a file connection
-(STDIN when used with Hadoop streaming), package up the chunks in
+(stdin when used with Hadoop streaming), package up the chunks in
various ways, and pass the packaged versions to user-supplied
functions.
@@ -37,24 +35,24 @@
Only hsTableReader will break the data into chunks comprising all rows
of the same key. This \emph{assumes} that all rows with the same key are
stored consecutively in the input file. This is always the case if
-the input file is taken to be the STDIN pipe to a Hadoop reducer,
+the input file is taken to be the stdin that Hadoop provides to the
+reducer in a streaming job,
since Hadoop guarantees that the rows given to the reducer are sorted
by key. When running from the command line (not in Hadoop), we can
use the sort utility to sort the keys ourselves.
-In addition to the data reading functions, the function hsCmdLineArgs
-offers several default command line arguments for doing things such
-as, specifying an input file, the number of lines of input to read,
-the input and output column separators, etc.
-
-The hsCmdLineArgs function also facilitates packaging both the mapper
+In addition to the data reading functions, the function
+\code{hsCmdLineArgs} offers several default command line arguments for
+doing things such as specifying an input file, the number of lines of
+input to read, the input and output column separators, etc.
+The \code{hsCmdLineArgs} function also facilitates packaging both the mapper
and reducer scripts into a single R script by accepting the arguments
--mapper and --reducer to specify whether the call to the script
should execute the mapper branch or the reducer branch.
-The examples below give some support code for using these functions.
-Details on using the functions themselves can be found in the
-documentation for those functions.
+The examples below give a bit of support code for using the functions in
+this package. Details on using the functions themselves can be found in
+the documentation for those functions.
For a full demo of running a map/reduce script from the command line and
in Hadoop, see the directory <RLibraryPath>/HadoopStreaming/wordCntDemo/
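
To make the single-script mapper/reducer pattern concrete, here is a
minimal sketch of a combined word-count script. It assumes the list
returned by hsCmdLineArgs exposes mapper/reducer flags, open
incon/outcon connections (with openConnections = TRUE), and a chunksize
field, per the defaults described in that function's help page; the
word-count logic itself is illustrative.

library(HadoopStreaming)

opts <- hsCmdLineArgs(openConnections = TRUE)

if (opts$mapper) {
  ## Emit one "word <tab> 1" line per word on the input lines.
  mapper <- function(lines) {
    words <- unlist(strsplit(lines, "[[:space:]]+"))
    words <- words[words != ""]
    if (length(words) > 0)
      cat(paste(words, 1, sep = "\t"), sep = "\n", file = opts$outcon)
  }
  hsLineReader(opts$incon, chunkSize = opts$chunksize, FUN = mapper)
} else if (opts$reducer) {
  ## With singleKey = TRUE, each data.frame holds all rows for one key.
  reducer <- function(d) {
    cat(d$key[1], sum(d$count), sep = "\t", file = opts$outcon)
    cat("\n", file = opts$outcon)
  }
  hsTableReader(opts$incon, cols = list(key = "", count = 0),
                chunkSize = opts$chunksize, sep = "\t", keyCol = "key",
                ignoreKey = FALSE, singleKey = TRUE, FUN = reducer)
}

Outside Hadoop, the same script can be chained with the sort utility
mentioned above, e.g.
cat input.txt | ./script.R --mapper | sort | ./script.R --reducer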
Modified: pkg/man/hsTableReader.Rd
===================================================================
--- pkg/man/hsTableReader.Rd 2009-03-21 05:18:54 UTC (rev 6)
+++ pkg/man/hsTableReader.Rd 2009-03-21 05:29:52 UTC (rev 7)
@@ -3,13 +3,15 @@
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Chunks input data into data frames}
\description{
-This function repeatedly reads a chunk of data from an input connection,
+This function repeatedly reads chunks of data from an input connection,
packages the data as a data.frame, optionally ensures that all the rows
for certain keys are contained in the data.frame, and passes the data.frame to
a handler for processing. This continues until the end of file.
}
\usage{
-hsTableReader(file = "", cols = "character", chunkSize = -1, FUN = print, ignoreKey = TRUE, singleKey = TRUE, skip = 0, sep = "\t", keyCol = "key",PFUN=NULL)
+hsTableReader(file = "", cols = "character", chunkSize = -1,
+FUN = print, ignoreKey = TRUE, singleKey = TRUE,
+ skip = 0, sep = "\t", keyCol = "key",PFUN=NULL)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
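
As a quick illustration of the usage line above, the following sketch
runs hsTableReader on a small in-memory connection. The sample data and
the cols value (a named list of column prototypes, in the style of
scan()) are invented for the example.

library(HadoopStreaming)

## Key-sorted input, as a Hadoop reducer would see it on stdin.
input <- textConnection("apple\t2\napple\t3\nbanana\t1", open = "r")

## With singleKey = TRUE, FUN receives all rows for one key at a time.
hsTableReader(input, cols = list(key = "", count = 0), chunkSize = 2,
              FUN = function(d) cat(d$key[1], sum(d$count), "\n"),
              ignoreKey = FALSE, singleKey = TRUE,
              sep = "\t", keyCol = "key")
close(input)

This should print the per-key sums, i.e. "apple 5" and "banana 1".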