[Subgroup-commits] r61 - in pkg/rsubgroup: . R inst/java man
noreply at r-forge.r-project.org
Thu Jun 18 16:48:11 CEST 2015
Author: atzmueller
Date: 2015-06-18 16:48:11 +0200 (Thu, 18 Jun 2015)
New Revision: 61
Added:
pkg/rsubgroup/man/is.pattern.matching.Rd
Removed:
pkg/rsubgroup/man/is.matching.Rd
Modified:
pkg/rsubgroup/DESCRIPTION
pkg/rsubgroup/NEWS
pkg/rsubgroup/R/classes.R
pkg/rsubgroup/R/subgroup.R
pkg/rsubgroup/inst/java/subgroup.jar
pkg/rsubgroup/man/SDTaskConfig-class.Rd
Log:
* SDTaskConfig now provides an option mintp, which sets the minimal number of
true positives a subgroup must contain; this is usually very effective for
pruning.
* The Pattern class now contains a list of selection expressions (selectors)
for the subgroup, not only the description. Using the is.pattern.matching
function, a match of a pattern and a data instance can be checked now.
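As a rough illustration of why a minimal true-positive threshold prunes well, the following standalone R sketch counts the true positives of a candidate subgroup and rejects it when the count falls below the threshold. It does not call rsubgroup; the toy data and the helpers `count_tp` and `passes_mintp` are hypothetical. Adding a selector can only shrink the set of covered rows, so a candidate that fails the threshold cannot have a specialization that passes it.

```r
# Toy data with a binary target; all names here are hypothetical.
toy <- data.frame(
  sex    = c("m", "m", "f", "f", "f", "m"),
  smoker = c("y", "n", "y", "y", "n", "y"),
  target = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count the rows covered by all selectors whose target is TRUE.
count_tp <- function(data, selectors, target = "target") {
  covered <- rep(TRUE, nrow(data))
  for (attr in names(selectors)) {
    covered <- covered & (as.character(data[[attr]]) == selectors[[attr]])
  }
  sum(covered & data[[target]])
}

# A candidate survives only if it reaches the true-positive threshold.
passes_mintp <- function(data, selectors, mintp) {
  count_tp(data, selectors) >= mintp
}

general <- list(smoker = "y")             # a one-selector description
special <- list(smoker = "y", sex = "f")  # a specialization of it

tp_general <- count_tp(toy, general)
tp_special <- count_tp(toy, special)
print(c(general = tp_general, special = tp_special))
print(passes_mintp(toy, special, mintp = 3))
```

The specialization never has more true positives than its generalization, which is what lets a search discard whole branches once the threshold is missed.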
Modified: pkg/rsubgroup/DESCRIPTION
===================================================================
--- pkg/rsubgroup/DESCRIPTION 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/DESCRIPTION 2015-06-18 14:48:11 UTC (rev 61)
@@ -2,7 +2,7 @@
Type: Package
Title: Subgroup Discovery and Analytics
Version: 0.7
-Date: 2015-06-09
+Date: 2015-06-18
Author: Martin Atzmueller
Maintainer: Martin Atzmueller <martin at atzmueller.net>
Description: A collection of efficient and effective tools and
Modified: pkg/rsubgroup/NEWS
===================================================================
--- pkg/rsubgroup/NEWS 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/NEWS 2015-06-18 14:48:11 UTC (rev 61)
@@ -9,9 +9,12 @@
* document setting Java heap space before loading the rsubgroup library.
* Improve error handling (exception signaling) when running subgroup discovery
using an ARFF file directly.
+ * SDTaskConfig now provides an option mintp, that allows to set the minimal
+ true positives threshold to be contained in a subgroup, which is usually
+ very effective for pruning.
* The Pattern class now contains a list of selection expressions (selectors)
- for the subgroup, not only the description. Using the is.matching function,
- a match of a pattern and a data instance can be checked now.
+ for the subgroup, not only the description. Using the is.pattern.matching
+ function, a match of a pattern and a data instance can be checked now.
* Bug fixes:
* fix providing attributes=NULL (i.e., automatically include all attributes)
Modified: pkg/rsubgroup/R/classes.R
===================================================================
--- pkg/rsubgroup/R/classes.R 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/R/classes.R 2015-06-18 14:48:11 UTC (rev 61)
@@ -37,14 +37,17 @@
k = "numeric",
minqual = "numeric",
minsize = "numeric",
+ mintp = "numeric",
maxlen = "numeric",
nodefaults = "logical",
relfilter = "logical",
postfilter = "character",
attributes = ".vectorOrNull"
),
- prototype(qf="ps", method="sdmap", k = as.integer(20), minqual = as.integer(0), minsize = as.integer(0),
- maxlen = as.integer(7), nodefaults = FALSE, relfilter = FALSE, postfilter = "", attributes = NULL)
+ prototype(qf="ps", method="sdmap", k = as.integer(20),
+ minqual = as.integer(0), minsize = as.integer(0), mintp = as.integer(0),
+ maxlen = as.integer(7), nodefaults = FALSE, relfilter = FALSE,
+ postfilter = "", attributes = NULL)
)
SDTaskConfig <- function(...){
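The effect of adding a slot together with a prototype default can be sketched with a small stand-in S4 class. `DemoTaskConfig` below is hypothetical and carries only two of the slots; it is not the package's SDTaskConfig class.

```r
library(methods)

# Hypothetical stand-in class: a "numeric" slot with an integer prototype,
# mirroring the style used in the diff above.
setClass("DemoTaskConfig",
  representation(
    minsize = "numeric",
    mintp   = "numeric"
  ),
  prototype(minsize = as.integer(0), mintp = as.integer(0))
)

# Without arguments, every slot takes its prototype value.
cfg_default <- new("DemoTaskConfig")

# Named arguments override individual slots; the rest keep their defaults.
cfg_pruned <- new("DemoTaskConfig", mintp = 10)

print(cfg_default@mintp)
print(cfg_pruned@mintp)
```

Because the prototype supplies a value, code that constructs the configuration without mentioning the new slot keeps working unchanged.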
Modified: pkg/rsubgroup/R/subgroup.R
===================================================================
--- pkg/rsubgroup/R/subgroup.R 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/R/subgroup.R 2015-06-18 14:48:11 UTC (rev 61)
@@ -95,6 +95,7 @@
J(task, "setMaxSGCount", as.integer(config@k))
J(task, "setMinQualityLimit", as.double(config@minqual))
J(task, "setMinSubgroupSize", as.double(config@minsize))
+ J(task, "setMinTPSupportAbsolute", as.double(config@mintp))
J(task, "setMaxSGDSize", as.integer(config@maxlen))
J(task, "setSuppressStrictlyIrrelevantSubgroups", config@relfilter)
J(task, "setIgnoreDefaultValues", config@nodefaults)
@@ -299,11 +300,13 @@
invisible()
}
-is.matching <- function(pattern, data.list) {
+is.pattern.matching <- function(pattern, data.list) {
selectors <- pattern@selectors
matching <- TRUE
for (sel in names(selectors)) {
- if (data.list[[sel]] != selectors[[sel]]) {
+ data.list.selector <- as.character(data.list[[sel]])
+ pattern.selector <- as.character(selectors[[sel]])
+ if (isTRUE(data.list.selector != pattern.selector)) {
matching <- FALSE
break
}
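The comparison used in the loop above can be tried out in isolation. The sketch below restates it as a standalone function that takes the selectors list directly instead of a Pattern object, an assumption made so that it runs without rsubgroup; `matches_selectors` and the sample rows are hypothetical. Converting both sides with `as.character` lets factor, numeric, and character values be compared uniformly, and `isTRUE` keeps `if` from failing on `NA`.

```r
# Standalone restatement of the matching loop (hypothetical helper name).
matches_selectors <- function(selectors, data.list) {
  for (sel in names(selectors)) {
    observed <- as.character(data.list[[sel]])
    expected <- as.character(selectors[[sel]])
    # isTRUE() is FALSE for NA and for zero-length results, so only a
    # definite inequality counts as a mismatch.
    if (isTRUE(observed != expected)) {
      return(FALSE)
    }
  }
  TRUE
}

selectors <- list(smoker = "y", age = "30")

# Factor and numeric values are compared via their character form.
row_match    <- list(smoker = factor("y", levels = c("n", "y")), age = 30)
row_mismatch <- list(smoker = "n", age = 30)
row_missing  <- list(smoker = "y", age = NA)

res_match    <- matches_selectors(selectors, row_match)
res_mismatch <- matches_selectors(selectors, row_mismatch)
res_missing  <- matches_selectors(selectors, row_missing)
print(c(res_match, res_mismatch, res_missing))
```

Note that in this sketch a missing value does not count as a mismatch, since the comparison yields `NA` rather than `TRUE`; callers that need stricter behavior would have to check for `NA` themselves.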
Modified: pkg/rsubgroup/inst/java/subgroup.jar
===================================================================
(Binary files differ)
Modified: pkg/rsubgroup/man/SDTaskConfig-class.Rd
===================================================================
--- pkg/rsubgroup/man/SDTaskConfig-class.Rd 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/man/SDTaskConfig-class.Rd 2015-06-18 14:48:11 UTC (rev 61)
@@ -22,27 +22,32 @@
Gain \code{gain},
Relative Gain \code{relgain},
Weighted Relative Accuracy \code{wracc}.
+ The default is \code{qf = "ps"}.
}
\item{\code{method}:}{A mining method; one of
Beam-Search \code{beam},
BSD \code{bsd},
SD-Map \code{sdmap},
SD-Map enabling internal disjunctions \code{sdmap-dis}.
+ The default is \code{method = "sdmap"}.
}
\item{\code{k}:}{The maximum number (top-k) of patterns
- to discover.}
- \item{\code{minqual}}{The minimal quality.}
- \item{\code{minsize}}{The minimal size of a subgroup
- (minimal coverage of database records).}
- \item{\code{maxlen}}{The maximal description length of
- a pattern, i.e., the maximal number of conjunctions.}
- \item{\code{nodefaults}}{Ignore default values, i.e.,
+ to discover. The default is \code{k = 20}}
+ \item{\code{minqual}:}{The minimal quality (default \code{minqual = 0}).}
+ \item{\code{minsize}:}{The minimal size of a subgroup
+ (minimal coverage of database records, default \code{minsize = 0}).}
+ \item{\code{mintp}:}{The minimal true positive (tp) threshold
+ (minimal (absolute) number of true positives in a subgroup, relevant for
+ binary target concepts only), defaults to \code{mintp = 0}}.
+ \item{\code{maxlen}:}{The maximal description length of
+ a pattern, i.e., the maximal number of conjunctions (default \code{maxlen = 7}).}
+ \item{\code{nodefaults}:}{Ignore default values, i.e.,
do not include the respective first value of each
- attribute}
- \item{\code{relfilter}}{Controls, whether irrelevant
+ attribute (default \code{nodefaults=FALSE}, i.e., include all values).}
+ \item{\code{relfilter}:}{Controls, whether irrelevant
patterns are filtered during pattern mining; negatively
- impacts performance.}
- \item{\code{postfilter}}{Controls, whether a post-processing
+ impacts performance (default \code{relfilter = FALSE})).}
+ \item{\code{postfilter}:}{Controls, whether a post-processing
filter is applied; one of:
Minimum Improvement (Global) \code{min-improve-global},
checks the patterns against all possible generalizations,
@@ -50,16 +55,18 @@
checks the patterns against all their generalizations
in the result set,
Relevancy Filter \code{relevancy}, removes patterns that
- are strictly irrlevant,
+ are strictly irrelevant,
Significant Improvement (Global) \code{sig-improve-global},
removes patterns that do not significantly improve
(0.05 level) w.r.t. all their possible generalizations,
Significant Improvement (Set) \code{sig-improve-set},
removes patterns that do not significantly improve
(0.05 level) w.r.t. all generalizations in the result set.
+ By default no postfilter is set, i.e., \code{postfilter = ""}.
}
- \item{\code{attributes}}{The list of attributes to consider for mining.
- Either a vector of attribute names, or NULL, which includes all attributes.}
+ \item{\code{attributes}:}{The list of attributes to consider for mining.
+ Either a vector of attribute names, or NULL (the default),
+ which includes all attributes.}
}
}
\seealso{
Deleted: pkg/rsubgroup/man/is.matching.Rd
===================================================================
--- pkg/rsubgroup/man/is.matching.Rd 2015-06-12 09:48:04 UTC (rev 60)
+++ pkg/rsubgroup/man/is.matching.Rd 2015-06-18 14:48:11 UTC (rev 61)
@@ -1,20 +0,0 @@
-\name{is.matching}
-\alias{is.matching}
-\title{Tests whether a pattern and a data list (row of a data frame) match}
-\description{
-Tests whether a pattern and a data list (row of a data frame) match, e.g.,
-for implementing classification methods.
-}
-\usage{
-is.matching(pattern, data.list)
-}
-\arguments{
-\item{pattern}{An instance of class Pattern, e.g., returned by DiscoverSubgroups.}
-\item{data.list}{A list having the attributes as 'keys', and the values as
-respective values of the list. This corresponds, for example, to a row of a
-data frame.}
-}
-\seealso{
-\code{\link{Pattern-class}}.
-}
-\keyword{test pattern}
\ No newline at end of file
Copied: pkg/rsubgroup/man/is.pattern.matching.Rd (from rev 59, pkg/rsubgroup/man/is.matching.Rd)
===================================================================
--- pkg/rsubgroup/man/is.pattern.matching.Rd (rev 0)
+++ pkg/rsubgroup/man/is.pattern.matching.Rd 2015-06-18 14:48:11 UTC (rev 61)
@@ -0,0 +1,20 @@
+\name{is.pattern.matching}
+\alias{is.pattern.matching}
+\title{Tests whether a pattern and a data list (row of a data frame) match}
+\description{
+Tests whether a pattern and a data list (row of a data frame) match, e.g.,
+for implementing classification methods.
+}
+\usage{
+is.pattern.matching(pattern, data.list)
+}
+\arguments{
+\item{pattern}{An instance of class Pattern, e.g., returned by DiscoverSubgroups.}
+\item{data.list}{A list having the attributes as 'keys', and the values as
+respective values of the list. This corresponds, for example, to a row of a
+data frame.}
+}
+\seealso{
+\code{\link{Pattern-class}}.
+}
+\keyword{test pattern}
\ No newline at end of file