% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clusters.R
\name{clusterKmeans}
\alias{clusterKmeans}
\title{Automated K-Means Clustering + PCA/t-SNE}
\usage{
clusterKmeans(
  df,
  k = NULL,
  wss_var = 0,
  limit = 15,
  drop_na = TRUE,
  ignore = NULL,
  ohse = TRUE,
  norm = TRUE,
  algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
  dim_red = "PCA",
  comb = c(1, 2),
  seed = 123,
  quiet = FALSE,
  ...
)
}
\arguments{
\item{df}{Dataframe}

\item{k}{Integer. Number of clusters}

\item{wss_var}{Numeric. Used to pick the \code{k} value automatically
when \code{k} is \code{NULL}, based on WSS variance while considering
up to \code{limit} clusters. Values between 0 and 1 (default is 0).
A value such as 0.05 can be used as a convergence threshold.}

\item{limit}{Integer. How many clusters should be considered?}

\item{drop_na}{Boolean. Should NA rows be removed?}

\item{ignore}{Character vector. Names of columns to ignore.}

\item{ohse}{Boolean. Should one-hot encoding be applied automatically
to non-numerical columns?}

\item{norm}{Boolean. Should the data be normalized?}

\item{algorithm}{character: may be abbreviated.  Note that
    \code{"Lloyd"} and \code{"Forgy"} are alternative names for one
    algorithm.}

\item{dim_red}{Character. Select dimensionality reduction technique.
Pass any of: \code{c("PCA", "tSNE", "all", "none")}.}

\item{comb}{Vector. Which columns do you wish to plot? Select two
variables by name or column position.}

\item{seed}{Numeric. Seed for reproducibility}

\item{quiet}{Boolean. Keep quiet? If not, print messages.}

\item{...}{Additional parameters to pass sub-functions.}
}
\value{
List. If no \code{k} is provided, contains \code{nclusters} and
\code{nclusters_plot} to determine the optimal \code{k} given their WSS
(Within Groups Sum of Squares). If \code{k} is provided, the list
additionally contains:
\itemize{
  \item \code{df} data.frame with original \code{df} plus \code{cluster} column
  \item \code{clusters} integer which is the same as \code{k}
  \item \code{fit} kmeans object used to fit clusters
  \item \code{means} data.frame with means and counts for each cluster
  \item \code{correlations} plot with correlations grouped by clusters
  \item \code{PCA} list with PCA results (when \code{dim_red="PCA"})
  \item \code{tSNE} list with t-SNE results (when \code{dim_red="tSNE"})
}
}
\description{
This function clusters a whole data.frame automatically. The goal of
k-means is to group data points into distinct, non-overlapping
subgroups. If needed, one-hot encoding is applied to categorical
values automatically. Note that the data should be scaled or
standardized before applying k-means. Also, k-means assumes roughly
spherical clusters and does not work well when clusters have other
shapes, such as elliptical clusters.
}
\examples{
Sys.unsetenv("LARES_FONT") # Temporary
data("iris")
df <- subset(iris, select = c(-Species))
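
# Sketch (base R, illustrative only): why scaling matters before k-means.
# Unscaled, the column with the largest spread dominates the Euclidean
# distances; scale() centers each column to mean 0 and standard deviation 1.
# Assumption: how norm = TRUE normalizes internally may differ from scale().
# The names iris_num and iris_scaled are only used for this illustration.
iris_num <- iris[, 1:4]
round(sapply(iris_num, sd), 2) # spread per column before scaling
iris_scaled <- scale(iris_num)
round(apply(iris_scaled, 2, sd), 2) # spread per column after scaling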

# If the dataset has more than 5 columns, consider reducing
# dimensionality with reduce_pca() or reduce_tsne() first

# Find optimal k
check_k <- clusterKmeans(df, limit = 10)
check_k$nclusters_plot
# Or pick k automatically based on WSS variance
check_k <- clusterKmeans(df, wss_var = 0.05, limit = 10)
# You can also use our other functions:
# clusterOptimalK(df) and clusterVisualK(df)
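
# Sketch (base R only, not necessarily how clusterKmeans() works internally):
# total WSS for k = 1..10 and its relative drop from one k to the next.
# This is the kind of quantity a threshold such as wss_var = 0.05 could be
# compared against. The names wss_data, wss and rel_drop are illustrative.
set.seed(123)
wss_data <- scale(iris[, 1:4])
wss <- sapply(1:10, function(k) {
  stats::kmeans(wss_data, centers = k, nstart = 10)$tot.withinss
})
# Relative WSS reduction gained by moving from k to k + 1 clusters
rel_drop <- -diff(wss) / wss[-length(wss)]
round(rel_drop, 3)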

# Run with selected k
clusters <- clusterKmeans(df, k = 3)
names(clusters)

# Cross-Correlations for each cluster
plot(clusters$correlations)

# PCA Results (when dim_red = "PCA")
plot(clusters$PCA$plot_explained)
plot(clusters$PCA$plot)
}
\seealso{
Other Clusters: 
\code{\link{clusterOptimalK}()},
\code{\link{clusterVisualK}()},
\code{\link{reduce_pca}()},
\code{\link{reduce_tsne}()}
}
\concept{Clusters}
