[Subgroup-commits] r61 - in pkg/rsubgroup: . R inst/java man

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Thu Jun 18 16:48:11 CEST 2015


Author: atzmueller
Date: 2015-06-18 16:48:11 +0200 (Thu, 18 Jun 2015)
New Revision: 61

Added:
   pkg/rsubgroup/man/is.pattern.matching.Rd
Removed:
   pkg/rsubgroup/man/is.matching.Rd
Modified:
   pkg/rsubgroup/DESCRIPTION
   pkg/rsubgroup/NEWS
   pkg/rsubgroup/R/classes.R
   pkg/rsubgroup/R/subgroup.R
   pkg/rsubgroup/inst/java/subgroup.jar
   pkg/rsubgroup/man/SDTaskConfig-class.Rd
Log:
  * SDTaskConfig now provides an option mintp for setting the minimal number
  of true positives that a subgroup must contain; this threshold is usually
  very effective for pruning.
  * The Pattern class now contains a list of selection expressions (selectors)
  for the subgroup, not only the description. The is.pattern.matching
  function can now check whether a pattern matches a data instance.
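
  The matching check described in the second entry can be tried without the
  Java backend. The following is only a minimal standalone sketch that mirrors
  the comparison loop of is.pattern.matching from this revision. DemoPattern,
  demo.is.pattern.matching and the attribute names in the example data are
  hypothetical stand-ins for illustration, not part of rsubgroup.

```r
library(methods)

# Hypothetical stand-in for the Pattern class: only the selectors slot.
setClass("DemoPattern", representation(selectors = "list"))

# Mirrors the loop in is.pattern.matching: each selector value is compared
# with the corresponding value of the data row after coercion to character.
demo.is.pattern.matching <- function(pattern, data.list) {
  selectors <- pattern@selectors
  for (sel in names(selectors)) {
    data.value <- as.character(data.list[[sel]])
    pattern.value <- as.character(selectors[[sel]])
    if (isTRUE(data.value != pattern.value)) {
      return(FALSE)
    }
  }
  TRUE
}

# Example pattern with two selectors (attribute names are made up).
p <- new("DemoPattern",
         selectors = list(checking_status = "no checking", housing = "own"))

# A row given as a list; factor values are handled by the coercion.
row.match <- list(checking_status = factor("no checking"), housing = "own", age = 42)
row.other <- list(checking_status = "<0", housing = "own", age = 23)

demo.is.pattern.matching(p, row.match)  # TRUE
demo.is.pattern.matching(p, row.other)  # FALSE
```

  Coercing both sides to character lets factor values from a data frame row
  be compared with the selector values. Note that with this logic an
  attribute missing from data.list produces a zero-length comparison, for
  which isTRUE() is FALSE, so it is not counted as a mismatch.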


Modified: pkg/rsubgroup/DESCRIPTION
===================================================================
--- pkg/rsubgroup/DESCRIPTION	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/DESCRIPTION	2015-06-18 14:48:11 UTC (rev 61)
@@ -2,7 +2,7 @@
 Type: Package
 Title: Subgroup Discovery and Analytics
 Version: 0.7
-Date: 2015-06-09
+Date: 2015-06-18
 Author: Martin Atzmueller
 Maintainer: Martin Atzmueller <martin at atzmueller.net>
 Description: A collection of efficient and effective tools and

Modified: pkg/rsubgroup/NEWS
===================================================================
--- pkg/rsubgroup/NEWS	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/NEWS	2015-06-18 14:48:11 UTC (rev 61)
@@ -9,9 +9,12 @@
   * document setting Java heap space before loading the rsubgroup library.
   * Improve error handling (exception signaling) when running subgroup discovery
   using an ARFF file directly.
+  * SDTaskConfig now provides an option mintp, that allows to set the minimal
+  true positives threshold to be contained in a subgroup, which is usually
+  very effective for pruning.
   * The Pattern class now contains a list of selection expressions (selectors)
-  for the subgroup, not only the description. Using the is.matching function,
-  a match of a pattern and a data instance can be checked now. 
+  for the subgroup, not only the description. Using the is.pattern.matching
+  function, a match of a pattern and a data instance can be checked now. 
 
 * Bug fixes:
   * fix providing attributes=NULL (i.e., automatically include all attributes)

Modified: pkg/rsubgroup/R/classes.R
===================================================================
--- pkg/rsubgroup/R/classes.R	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/R/classes.R	2015-06-18 14:48:11 UTC (rev 61)
@@ -37,14 +37,17 @@
         k           = "numeric",
         minqual     = "numeric",
         minsize     = "numeric",
+        mintp       = "numeric",
         maxlen      = "numeric",
         nodefaults  = "logical",
         relfilter   = "logical",
         postfilter  = "character",
         attributes  = ".vectorOrNull"
     ),
-    prototype(qf="ps", method="sdmap", k = as.integer(20), minqual = as.integer(0), minsize = as.integer(0),
-        maxlen = as.integer(7), nodefaults = FALSE, relfilter = FALSE, postfilter = "", attributes = NULL)
+    prototype(qf="ps", method="sdmap", k = as.integer(20),
+        minqual = as.integer(0), minsize = as.integer(0), mintp = as.integer(0),
+        maxlen = as.integer(7), nodefaults = FALSE, relfilter = FALSE,
+        postfilter = "", attributes = NULL)
 )
 
 SDTaskConfig <- function(...){

Modified: pkg/rsubgroup/R/subgroup.R
===================================================================
--- pkg/rsubgroup/R/subgroup.R	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/R/subgroup.R	2015-06-18 14:48:11 UTC (rev 61)
@@ -95,6 +95,7 @@
   J(task, "setMaxSGCount", as.integer(config@k))
   J(task, "setMinQualityLimit", as.double(config@minqual))
   J(task, "setMinSubgroupSize", as.double(config@minsize))
+  J(task, "setMinTPSupportAbsolute", as.double(config@mintp))
   J(task, "setMaxSGDSize", as.integer(config@maxlen))
   J(task, "setSuppressStrictlyIrrelevantSubgroups", config@relfilter)
   J(task, "setIgnoreDefaultValues", config@nodefaults)
@@ -299,11 +300,13 @@
   invisible()
 }
 
-is.matching <- function(pattern, data.list) {
+is.pattern.matching <- function(pattern, data.list) {
   selectors <- pattern@selectors
   matching <- TRUE
   for (sel in names(selectors)) {
-    if (data.list[[sel]] != selectors[[sel]]) {
+    data.list.selector <- as.character(data.list[[sel]])
+    pattern.selector <- as.character(selectors[[sel]])
+    if (isTRUE(data.list.selector != pattern.selector)) {
       matching <- FALSE
       break
     }

Modified: pkg/rsubgroup/inst/java/subgroup.jar
===================================================================
(Binary files differ)

Modified: pkg/rsubgroup/man/SDTaskConfig-class.Rd
===================================================================
--- pkg/rsubgroup/man/SDTaskConfig-class.Rd	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/man/SDTaskConfig-class.Rd	2015-06-18 14:48:11 UTC (rev 61)
@@ -22,27 +22,32 @@
 	Gain \code{gain},
 	Relative Gain \code{relgain},
 	Weighted Relative Accuracy \code{wracc}.
+	The default is \code{qf = "ps"}.
 	}
     \item{\code{method}:}{A mining method; one of
 	Beam-Search \code{beam},
 	BSD \code{bsd},
 	SD-Map \code{sdmap},
 	SD-Map enabling internal disjunctions \code{sdmap-dis}.
+	The default is \code{method = "sdmap"}.
 	}
     \item{\code{k}:}{The maximum number (top-k) of patterns
-	to discover.}
-	\item{\code{minqual}}{The minimal quality.}
-	\item{\code{minsize}}{The minimal size of a subgroup
-	(minimal coverage of database records).}
-	\item{\code{maxlen}}{The maximal description length of
-	a pattern, i.e., the maximal number of conjunctions.}
-	\item{\code{nodefaults}}{Ignore default values, i.e.,
+	to discover. The default is \code{k = 20}.}
+	\item{\code{minqual}:}{The minimal quality (default \code{minqual = 0}).}
+	\item{\code{minsize}:}{The minimal size of a subgroup
+	(minimal coverage of database records, default \code{minsize = 0}).}
+	\item{\code{mintp}:}{The minimal true positive (tp) threshold
+	(minimal (absolute) number of true positives in a subgroup, relevant for
+	binary target concepts only), defaults to \code{mintp = 0}.}
+	\item{\code{maxlen}:}{The maximal description length of
+	a pattern, i.e., the maximal number of conjunctions (default \code{maxlen = 7}).}
+	\item{\code{nodefaults}:}{Ignore default values, i.e.,
 	do not include the respective first value of each
-	attribute}
-	\item{\code{relfilter}}{Controls, whether irrelevant
+	attribute (default \code{nodefaults=FALSE}, i.e., include all values).}
+	\item{\code{relfilter}:}{Controls, whether irrelevant
 	patterns are filtered during pattern mining; negatively
-	impacts performance.}
-	\item{\code{postfilter}}{Controls, whether a post-processing
+	impacts performance (default \code{relfilter = FALSE}).}
+	\item{\code{postfilter}:}{Controls, whether a post-processing
 	filter is applied; one of:
 	Minimum Improvement (Global) \code{min-improve-global},
 	checks the patterns against all possible generalizations,
@@ -50,16 +55,18 @@
 	checks the patterns against all their generalizations
 	in the result set,
 	Relevancy Filter \code{relevancy}, removes patterns that
-	are strictly irrlevant,
+	are strictly irrelevant,
 	Significant Improvement (Global) \code{sig-improve-global},
 	removes patterns that do not significantly improve
 	(0.05 level) w.r.t. all their possible generalizations,
 	Significant Improvement (Set) \code{sig-improve-set},
 	removes patterns that do not significantly improve
 	(0.05 level) w.r.t. all generalizations in the result set.
+	By default no postfilter is set, i.e., \code{postfilter = ""}.
 	}
-	\item{\code{attributes}}{The list of attributes to consider for mining.
-	Either a vector of attribute names, or NULL, which includes all attributes.}
+	\item{\code{attributes}:}{The list of attributes to consider for mining.
+	Either a vector of attribute names, or NULL (the default),
+	which includes all attributes.}
 	}
  }
 \seealso{

Deleted: pkg/rsubgroup/man/is.matching.Rd
===================================================================
--- pkg/rsubgroup/man/is.matching.Rd	2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/man/is.matching.Rd	2015-06-18 14:48:11 UTC (rev 61)
@@ -1,20 +0,0 @@
-\name{is.matching}
-\alias{is.matching}
-\title{Tests whether a pattern and a data list (row of a data frame) match}
-\description{
-Tests whether a pattern and a data list (row of a data frame) match, e.g.,
-for implementing classification methods.
-}
-\usage{
-is.matching(pattern, data.list)
-}
-\arguments{
-\item{pattern}{An instance of class Pattern, e.g., returned by DiscoverSubgroups.}
-\item{data.list}{A list having the attributes as 'keys', and the values as
-respective values of the list. This corresponds, for example, to a row of a
-data frame.}
-}
-\seealso{
-\code{\link{Pattern-class}}.
-}
-\keyword{test pattern}
\ No newline at end of file

Copied: pkg/rsubgroup/man/is.pattern.matching.Rd (from rev 59, pkg/rsubgroup/man/is.matching.Rd)
===================================================================
--- pkg/rsubgroup/man/is.pattern.matching.Rd	                        (rev 0)
+++ pkg/rsubgroup/man/is.pattern.matching.Rd	2015-06-18 14:48:11 UTC (rev 61)
@@ -0,0 +1,20 @@
+\name{is.pattern.matching}
+\alias{is.pattern.matching}
+\title{Tests whether a pattern and a data list (row of a data frame) match}
+\description{
+Tests whether a pattern and a data list (row of a data frame) match, e.g.,
+for implementing classification methods.
+}
+\usage{
+is.pattern.matching(pattern, data.list)
+}
+\arguments{
+\item{pattern}{An instance of class Pattern, e.g., returned by DiscoverSubgroups.}
+\item{data.list}{A list having the attributes as 'keys', and the values as
+respective values of the list. This corresponds, for example, to a row of a
+data frame.}
+}
+\seealso{
+\code{\link{Pattern-class}}.
+}
+\keyword{test pattern}
\ No newline at end of file
