% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sono.R
\name{sono}
\alias{sono}
\title{SONO (Scores Of Nominal Outlyingness)}
\usage{
sono(
  data,
  probs,
  alpha = 0.01,
  r = 2,
  MAXLEN = 0,
  frequent = FALSE,
  verbose = TRUE
)
}
\arguments{
\item{data}{Dataset; needs to be of class data.frame and consist of factor variables only.}

\item{probs}{List of probability vectors, one for each variable. Each element of the list must
contain as many probabilities as the corresponding variable has levels in the dataset.}

\item{alpha}{Significance level for the simultaneous Multinomial confidence intervals, which determine the
frequency thresholds for itemsets of different lengths used for outlier detection for discrete features. Must be a positive real number no greater than 0.50. A
greater value leads to a much more conservative algorithm. Default value is 0.01.}

\item{r}{Exponent term in the computation of scores. Must be a non-negative number. The greater its value, the smaller the contribution
of longer itemsets to the overall score. It is suggested that this is not much larger than 3. Default value is 2.}

\item{MAXLEN}{Maximum itemset sequence length to be considered. The default value of 0 computes MAXLEN according to a criterion
on the sparsity caused by the total number of combinations that can be encountered as longer sequences are taken into account.
Otherwise, MAXLEN can take any value from 1 up to the total number of discrete variables included in the data set. If the user-supplied MAXLEN is
larger than the estimated value, MAXLEN is set to the latter and a warning message is displayed, so that
redundant computations are avoided.}

\item{frequent}{Logical determining whether highly frequent or highly infrequent itemsets are considered as outliers. Defaults
to FALSE, treating highly infrequent itemsets as outlying.}

\item{verbose}{Defaults to TRUE to print progress messages. Change to FALSE to suppress.}
}
\value{
A list with 4 elements. The first element is the value of MAXLEN. The second element is a data frame
with 2 columns: one with the observation numbers and one with the final scores of outlyingness.
The third and fourth elements are the matrix of variable contributions and the vector of nominal outlyingness depths, respectively.
}
\description{
Function used to compute scores of nominal outlyingness for datasets consisting of nominal features. The
computation is done using the score of \insertCite{costa_novel_2025;textual}{SONO}, defined as follows for an observation \eqn{\boldsymbol{x}_i}:
\deqn{s(\boldsymbol{x}_i)=\sum_{\substack{d \subseteq \boldsymbol{x}_{i}: \\ \text{supp}(d) \notin (\sigma_d, n], \\ \lvert d \rvert \leq \mathrm{MAXLEN}}} \frac{\sigma_d}{\text{supp}(d) \times \lvert d \rvert^r}, \\
r> 0, \ i=1,\dots,n,}
for highly infrequent itemsets and:
\deqn{s(\boldsymbol{x}_i)=\sum_{\substack{d \subseteq \boldsymbol{x}_{i}: \\ \text{supp}(d) \notin [0, \sigma_d), \\ \lvert d \rvert \leq \mathrm{MAXLEN}}} \frac{\text{supp}(d)}{\sigma_d \times \left( \text{MAXLEN} - \lvert d \rvert + 1 \right)^r}, \\
r> 0, \ i=1,\dots,n,}
for highly frequent itemsets.
In the above, \eqn{\text{supp}(d)} is the support of itemset \eqn{d}, \eqn{\sigma_d} is the maximum/minimum support threshold and \eqn{\text{MAXLEN}} is the maximum length of sequences considered, while \eqn{r} is an exponent term to be determined by the user.
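
As a purely illustrative calculation of the first formula, using hypothetical values that are not taken from any data set, let \eqn{r = 2} and \eqn{\text{MAXLEN} \geq 2}, and suppose that an observation \eqn{\boldsymbol{x}_i} contains exactly two highly infrequent itemsets: \eqn{d_1} with \eqn{\lvert d_1 \rvert = 1}, \eqn{\text{supp}(d_1) = 2} and \eqn{\sigma_{d_1} = 5}, and \eqn{d_2} with \eqn{\lvert d_2 \rvert = 2}, \eqn{\text{supp}(d_2) = 1} and \eqn{\sigma_{d_2} = 4}. Both supports lie at or below their thresholds, so both itemsets enter the sum:
\deqn{s(\boldsymbol{x}_i) = \frac{5}{2 \times 1^2} + \frac{4}{1 \times 2^2} = 2.5 + 1 = 3.5.}
With these assumed numbers, the longer itemset contributes less even though it is rarer relative to its threshold, because its term is divided by \eqn{\lvert d_2 \rvert^r = 4}.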
}
\examples{
# Simulate two nominal variables with 2 and 3 levels, respectively
set.seed(1)
dt <- data.frame(
  V1 = factor(sample(1:2, 100, replace = TRUE, prob = c(0.5, 0.5))),
  V2 = factor(sample(1:3, 100, replace = TRUE, prob = c(0.5, 0.3, 0.2)))
)
# One probability vector per variable, with one probability per level
sono(data = dt,
     probs = list(c(0.5, 0.5), c(1/3, 1/3, 1/3)),
     alpha = 0.01,
     r = 2,
     MAXLEN = 0,
     frequent = FALSE)

}
\references{
{
\insertRef{costa_novel_2025}{SONO}
}
}
