% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/autoDataPrep.R
\name{autoDataprep}
\alias{autoDataprep}
\title{Automatic data preparation for ML algorithms}
\usage{
autoDataprep(data, target = NULL, missimpute = "default",
  auto_mar = FALSE, mar_object = NULL, dummyvar = TRUE,
  char_var_limit = 12, aucv = 0.02, corr = 0.99,
  outlier_flag = FALSE, interaction_var = FALSE,
  frequent_var = FALSE, uid = NULL, onlykeep = NULL, drop = NULL,
  verbose = FALSE)
}
\arguments{
\item{data}{[data.frame | Required] dataframe or data.table}

\item{target}{[character | Required] name of the dependent variable (binary or multiclass)}

\item{missimpute}{[text | Optional] missing value imputation using the mlr impute function. See Details for the available methods}

\item{auto_mar}{[logical | Optional] identify whether variables with missing values are missing completely at random or not (default FALSE). If TRUE, this calls autoMAR()}

\item{mar_object}{[character | Optional] object created from autoMAR function}

\item{dummyvar}{[logical | Optional] categorical feature engineering i.e. one hot encoding (default is TRUE)}

\item{char_var_limit}{[integer | Optional] limit on the number of levels for dummy variable preparation (default 12). Ex: if the gender variable has two distinct values, "M" and "F", then gender has 2 levels}

\item{aucv}{[numeric | Optional] cut-off value for AUC-based variable selection}

\item{corr}{[numeric | Optional] cut-off value for correlation-based variable selection}

\item{outlier_flag}{[logical | Optional] to add outlier features (default is FALSE)}

\item{interaction_var}{[logical | Optional] bulk interactions transformer for numerical features}

\item{frequent_var}{[logical | Optional] Frequent transformer for categorical features}

\item{uid}{[character | Optional] unique identifier column if any to keep in the final data set}

\item{onlykeep}{[character | Optional] only consider selected variables for data preparation}

\item{drop}{[character | Optional] exclude variable list from the data preparation}

\item{verbose}{[logical | Optional] display execution steps on the console (default FALSE)}
}
\value{
A list containing the following objects:

\describe{
  \item{\code{complete_data}}{Complete data set, including new features derived from the functional understanding of the data set}
  \item{\code{master_data}}{Data set filtered according to the input parameters}
  \item{\code{final_var_list}}{list of master variables}
  \item{\code{auc_var}}{list of AUC variables}
  \item{\code{cor_var}}{list of correlation variables}
  \item{\code{overall_var}}{all variables in the dataset}
  \item{\code{zerovariance}}{zero variance variables in the dataset}
}
}
\description{
Final data preparation before applying an ML algorithm. The function returns the final data set together with highlights of the data preparation.
}
\details{
Missing value imputation uses the impute function from the mlr package.

The mlr package provides several methods for imputing missing values. The default methods are listed below:
\itemize{
  \item mean value for integer variable
  \item median value for numeric variable
  \item mode value for character or factor variable
}
Optional: missing values can also be imputed using an ML method. The mlr learners that can handle missing values can be listed with
\code{listLearners("classif", check.packages = TRUE, properties = "missings")[c("class", "package")]}

Feature engineering
\itemize{
  \item Missing-not-completely-at-random variables using the autoMAR function
  \item Date transformer, e.g. year, month, quarter, week
  \item Frequent transformer, which counts each categorical value in the data set
  \item Interaction transformer using multiplication
  \item One-hot dummy coding for categorical values
  \item Outlier flag and capping variables for numerical values
}

Feature reduction
\itemize{
  \item Zero variance using the nearZeroVar function from caret
  \item Pearson's Correlation value
  \item AUC with target variable
}
}
\examples{
# Auto data prep
traindata <- autoDataprep(heart, target = "target_var", missimpute = "default",
  dummyvar = TRUE, aucv = 0.02, corr = 0.98, outlier_flag = TRUE,
  interaction_var = TRUE, frequent_var = TRUE)
train <- traindata$master_data
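
# Illustrative sketch (base R only, not the mlr implementation) of the
# "default" imputation rule described in Details: mean for integer,
# median for numeric, mode for character or factor variables.
# `toy` and `impute_default` are hypothetical names used only here, and
# rounding the integer mean is an assumption of this sketch.
toy <- data.frame(
  age = c(30L, NA, 50L),
  chol = c(200.5, 180.0, NA),
  sex = c("M", NA, "M"),
  stringsAsFactors = FALSE
)
impute_default <- function(x) {
  if (!anyNA(x)) return(x)
  if (is.integer(x)) {
    fill <- as.integer(round(mean(x, na.rm = TRUE)))
  } else if (is.numeric(x)) {
    fill <- median(x, na.rm = TRUE)
  } else {
    # Mode: the most frequent non-missing value
    fill <- names(which.max(table(x)))
  }
  x[is.na(x)] <- fill
  x
}
# Apply column by column while keeping the data frame structure
toy[] <- lapply(toy, impute_default)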

# Print auto data prep object
printautoDataprep(traindata)
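
# Illustrative base R sketch of three transformers listed in Details
# (frequent, interaction, one-hot). This is not the code used inside
# autoDataprep(); `fe` and the derived column names are hypothetical.
fe <- data.frame(
  city = c("A", "B", "A", "C"),
  x1 = c(1, 2, 3, 4),
  x2 = c(10, 20, 30, 40),
  stringsAsFactors = TRUE
)
# Frequent transformer: how often each categorical value occurs
fe$city_freq <- as.integer(table(fe$city)[as.character(fe$city)])
# Interaction transformer: product of two numeric features
fe$x1_x2 <- fe$x1 * fe$x2
# One-hot coding: one indicator column per level (no intercept)
city_dummy <- model.matrix(~ city - 1, data = fe)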

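# Illustrative base R sketch of the feature-reduction filters listed in
# Details. How autoDataprep() applies `corr` and `aucv` internally is
# not shown here; `fr`, `auc_one` and the other names are hypothetical.
fr <- data.frame(
  y = c(0, 0, 0, 1, 1, 1),
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, 6, 8, 10, 12.5),
  k = rep(7, 6)
)
preds <- setdiff(names(fr), "y")
# Zero variance: predictors with a single distinct value
zero_var <- preds[sapply(fr[preds], function(v) length(unique(v)) == 1L)]
# Pearson correlation between two predictors; a pair above the cut-off
# would be a candidate for dropping one of the two
r_ab <- cor(fr$a, fr$b)
# AUC of one predictor against a binary target (rank-based formula)
auc_one <- function(x, y) {
  r <- rank(x)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_a <- auc_one(fr$a, fr$y)
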
}
\seealso{
\code{\link[mlr:impute]{impute}}
}
