% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/GE.R
\name{GE}
\alias{GE}
\alias{GE.default}
\alias{GE.formula}
\title{Generalized Edition}
\usage{
\method{GE}{formula}(formula, data, ...)

\method{GE}{default}(x, k = 5, kk = ceiling(k/2), classColumn = ncol(x),
  ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{k}{Size of the neighborhood considered for each instance, counting the instance itself (i.e. its \code{k-1} nearest neighbors plus the instance).}

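\item{kk}{Minimum number of instances of a single class within the neighborhood required to relabel an instance with that class. Instances that do not reach this threshold are removed.}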

\item{classColumn}{Positive integer indicating the column that contains the
(factor of) classes. By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes for
   removed instances (i.e. their row number with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes for
   repaired/relabelled instances (i.e. their row number with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character string with additional
   information not covered by the previous items.
}
}
\description{
Similarity-based filter for removing or repairing label noise in a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
\code{GE} is a generalization of \code{\link{ENN}} that adds the possibility of 'repairing'
('relabeling') instances instead of only 'removing' them. For each instance, \code{GE} considers
its \code{k-1} nearest neighbors together with the instance itself. If at least \code{kk} of these
\code{k} instances belong to the same class, the instance is relabeled with that class (which may be its own). Otherwise, it is removed.
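
The rule above can be illustrated with the following minimal R sketch. It is not the code used by \code{GE}. The helper \code{ge_sketch} is hypothetical, and the sketch makes several assumptions: all attributes are numeric, distances are Euclidean on the raw values, and every decision is based on the original labels rather than on labels already repaired. When more than one class reaches \code{kk}, the sketch picks the most frequent one, and on ties the first.
\preformatted{
# Hypothetical helper, not part of the package. Assumes numeric attributes,
# Euclidean distance on raw values, and decisions based on original labels.
ge_sketch <- function(x, y, k = 5, kk = ceiling(k / 2)) {
  x <- as.matrix(x)
  y <- as.factor(y)
  n <- nrow(x)
  d <- as.matrix(dist(x))        # pairwise Euclidean distances
  newLab <- y
  keep <- logical(n)
  for (i in seq_len(n)) {
    others <- seq_len(n)[-i]
    # the k-1 nearest neighbors of instance i (itself excluded)
    nn <- others[order(d[i, -i])[seq_len(min(k - 1, n - 1))]]
    # class counts over the k instances: neighbors plus instance i
    counts <- table(y[c(i, nn)])
    best <- which.max(counts)    # most frequent class, first on ties
    if (counts[best] >= kk) {
      keep[i] <- TRUE
      newLab[i] <- names(counts)[best]  # may equal the current label
    }
  }
  repIdx <- which(keep & newLab != y)
  list(remIdx = which(!keep), repIdx = repIdx, repLab = newLab[repIdx])
}

# Usage on the numeric attributes of iris
res <- ge_sketch(iris[, 1:4], iris$Species)
length(res$remIdx)   # number of removed instances
length(res$repIdx)   # number of instances whose label changed
}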
}
\examples{
# The next example is not run, to save time
\dontrun{
data(iris)
out <- GE(iris)
summary(out, explicit = TRUE)
# Check that cleanData is consistent with remIdx, repIdx and repLab
irisCopy <- iris
irisCopy[out$repIdx,5] <- out$repLab
cleanData <- irisCopy[setdiff(1:nrow(iris),out$remIdx),]
identical(out$cleanData,cleanData)
}
}
\references{
Koplowitz J., Brown T. A. (1981): On the relation of performance to editing
in nearest neighbor rules. \emph{Pattern Recognition}, 13(3), 251-255.
}

