% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/forestry.R
\name{multilayer-forestry}
\alias{multilayer-forestry}
\alias{multilayerForestry}
\title{Multilayer forestry}
\usage{
multilayerForestry(
  x,
  y,
  ntree = 500,
  nrounds = 1,
  eta = 0.3,
  replace = FALSE,
  sampsize = nrow(x),
  sample.fraction = NULL,
  mtry = ncol(x),
  nodesizeSpl = 3,
  nodesizeAvg = 3,
  nodesizeStrictSpl = max(round(nrow(x)/128), 1),
  nodesizeStrictAvg = max(round(nrow(x)/128), 1),
  minSplitGain = 0,
  maxDepth = 99,
  splitratio = 1,
  OOBhonest = FALSE,
  doubleBootstrap = if (OOBhonest) TRUE else FALSE,
  seed = as.integer(runif(1) * 1000),
  verbose = FALSE,
  nthread = 0,
  splitrule = "variance",
  middleSplit = TRUE,
  maxObs = length(y),
  linear = FALSE,
  symmetric = rep(0, ncol(x)),
  linFeats = 0:(ncol(x) - 1),
  monotonicConstraints = rep(0, ncol(x)),
  groups = NULL,
  minTreesPerGroup = 0,
  monotoneAvg = FALSE,
  featureWeights = rep(1, ncol(x)),
  deepFeatureWeights = featureWeights,
  observationWeights = NULL,
  overfitPenalty = 1,
  scale = FALSE,
  doubleTree = FALSE,
  reuseforestry = NULL,
  savable = TRUE,
  saveable = TRUE
)
}
\arguments{
\item{x}{A data frame of all training predictors.}

\item{y}{A vector of all training responses.}

\item{ntree}{The number of trees to grow in the forest. The default value is
500.}

\item{nrounds}{Number of iterations used for gradient boosting.}

\item{eta}{Step size shrinkage used in gradient boosting update.}

\item{replace}{An indicator of whether sampling of training data is with
replacement. The default value is FALSE.}

\item{sampsize}{The size of total samples to draw for the training data. If
sampling with replacement, the default value is the length of the training
data. If sampling without replacement, the default value is two-thirds of
the length of the training data.}

\item{sample.fraction}{If this is given, then sampsize is ignored and set to
round(length(y) * sample.fraction). It must be a real number between 0 and 1.}

\item{mtry}{The number of variables randomly selected at each split point.
The default value is the total number of features of the training data, ncol(x).}

\item{nodesizeSpl}{Minimum observations contained in terminal nodes.
The default value is 3.}

\item{nodesizeAvg}{Minimum size of terminal nodes for averaging dataset.
The default value is 3.}

\item{nodesizeStrictSpl}{Minimum observations to follow strictly in terminal nodes.
The default value is max(round(nrow(x)/128), 1).}

\item{nodesizeStrictAvg}{The minimum size of terminal nodes for the averaging data set to follow when predicting.
No splits are allowed that result in nodes with fewer observations than this parameter.
This parameter enforces overlap of the averaging data set with the splitting set when training.
When using honesty, splits that leave fewer than nodesizeStrictAvg averaging
observations in either child node will be rejected, ensuring every leaf node
also has at least nodesizeStrictAvg averaging observations. The default value is
max(round(nrow(x)/128), 1).}

\item{minSplitGain}{Minimum loss reduction to split a node further in a tree.}

\item{maxDepth}{Maximum depth of a tree. The default value is 99.}

\item{splitratio}{Proportion of the training data used as the splitting dataset.
It is a ratio between 0 and 1. If the ratio is 1 (the default), then the splitting
set uses the entire data, as does the averaging set; this is the standard Breiman RF setup.
If the ratio is 0, then the splitting data set is empty, and the entire dataset is used
for the averaging set. A ratio of 0 is not recommended, since no data is available for splitting.}

\item{OOBhonest}{In this version of honesty, the out-of-bag observations for each tree
are used as the honest (averaging) set. This setting also changes how predictions
are constructed. When predicting for observations that are out-of-sample
(using predict(..., aggregation = "average")), all the trees in the forest
are used to construct predictions. When predicting for an observation that was in-sample (using
predict(..., aggregation = "oob")), only the trees for which that observation
was not in the averaging set are used to construct the prediction for that observation.
aggregation="oob" (out-of-bag) ensures that the outcome value for an observation
is never used to construct predictions for a given observation even when it is in sample.
This property does not hold in standard honesty, which relies on an asymptotic
subsampling argument. By default, when OOBhonest = TRUE, the out-of-bag observations
for each tree are resampled with replacement to be used for the honest (averaging)
set. This results in a third set of observations that are left out of both
the splitting and averaging sets; we call these the double out-of-bag (doubleOOB)
observations. In order to get the predictions of only the trees in which each
observation fell into this doubleOOB set, one can run predict(..., aggregation = "doubleOOB").
In order to not do this second bootstrap sample, the doubleBootstrap flag can
be set to FALSE.}

\item{doubleBootstrap}{The doubleBootstrap flag provides the option to resample
with replacement from the out-of-bag observations set for each tree to construct
the averaging set when using OOBhonest. If this is FALSE, the out-of-bag observations
are used as the averaging set. By default this option is TRUE when running OOBhonest = TRUE.
This option increases diversity across trees.}

\item{seed}{Random seed.}

\item{verbose}{Indicator to train the forest in verbose mode.}

\item{nthread}{Number of threads used to train the forest and to predict with it.
The default is 0, which uses all cores.}

\item{splitrule}{The loss function according to which the splits of the random
forest are made. Only "variance" is implemented at this point.}

\item{middleSplit}{Indicator of whether the split value is the average of two feature
values. If FALSE, the split value is drawn from a uniform distribution
between the two feature values. (Default = TRUE)}

\item{maxObs}{The max number of observations to split on.}

\item{linear}{Indicator that enables Ridge penalized splits and linear aggregation
functions in the leaf nodes. This is recommended for data with linear outcomes.
For implementation details, see: https://arxiv.org/abs/1906.06463. Default is FALSE.}

\item{symmetric}{Used for the experimental feature which imposes strict symmetric
marginal structure on the predictions of the forest through only selecting
symmetric splits with symmetric aggregation functions. Should be a vector of size ncol(x) with a single
1 entry denoting the feature to enforce symmetry on. Defaults to all zeroes.
For version >= 0.9.0.83, we experimentally allow more than one feature to
enforce symmetry at a time. This should only be used for a small number of
features as it has a runtime that is exponential in the number of symmetric
features (O(N 2^|S|) where S is the set of symmetric features).}

\item{linFeats}{A vector containing the indices of which features to split
linearly on when using linear penalized splits (defaults to use all numerical features).}

\item{monotonicConstraints}{Specifies monotonic relationships between the continuous
features and the outcome. Supplied as a vector of length p with entries in
1, 0, -1, with 1 indicating an increasing monotonic relationship, -1 indicating
a decreasing monotonic relationship, and 0 indicating no constraint.
Constraints supplied for categorical variables will be ignored.}

\item{groups}{A vector of factors specifying the group membership of each training observation.
These groups are used in the aggregation when doing out-of-bag predictions in
order to predict with only trees where the entire group was not used for aggregation.
This allows the user to specify custom subgroups, so that the prediction for
any observation in a group does not use data from any member of that group.
This can be used to create general custom
resampling schemes, and provide predictions consistent with the Out-of-Group set.}

\item{minTreesPerGroup}{The number of trees which we make sure have been created leaving
out each group. This is 0 by default, so we will not give any special treatment to
the groups when sampling. However, if this is set to a positive integer, we
modify the bootstrap sampling scheme to ensure that exactly that many trees
have the group left out. We do this by, for each group, creating minTreesPerGroup
trees which are built on observations sampled from the set of training observations
which are not in the current group. This means we create at least # groups * minTreesPerGroup
trees for the forest. If ntree > # groups * minTreesPerGroup, we create
max(# groups * minTreesPerGroup, ntree) total trees, in which at least minTreesPerGroup
are created leaving out each group. For debugging purposes, these group sampling
trees are stored at the end of the R forest, in blocks based on the left out group.}

\item{monotoneAvg}{This is a boolean flag that indicates whether or not monotonic
constraints should be enforced on the averaging set in addition to the splitting set.
This flag is meaningless unless both honesty and monotonic constraints are in use.
The default is FALSE.}

\item{featureWeights}{Weights used when subsampling features for nodes above or at interactionDepth.}

\item{deepFeatureWeights}{Weights used when subsampling features for nodes below interactionDepth.}

\item{observationWeights}{Denotes the weights for each training observation
that determine how likely the observation is to be selected in each bootstrap sample.
This option is not allowed when sampling is done without replacement.}

\item{overfitPenalty}{Value to determine how much to penalize the magnitude
of coefficients in ridge regression when using linear splits.}

\item{scale}{A parameter which indicates whether or not we want to scale and center
the covariates and outcome before doing the regression. This can help with
stability. The default is FALSE.}

\item{doubleTree}{Indicator of whether the number of trees is doubled, with the averaging
and splitting data exchanged to create decorrelated trees. (Default = FALSE)}

\item{reuseforestry}{A `forestry` object whose data frame is recycled instead of
creating a new one. This saves some space when working on the
same data set.}

\item{savable}{If TRUE, then the RF is created in such a way that it can be
saved and loaded using save(...) and load(...). However, setting it to TRUE
(default) will take longer and use more memory. When
training many RFs, it makes sense to set this to FALSE to save time and memory.}

\item{saveable}{Deprecated. Do not use.}
}
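\section{Size-dependent defaults}{
The defaults for nodesizeStrictSpl and nodesizeStrictAvg, and the sampsize implied by
sample.fraction, depend on the size of the training data. The base R sketch below only
re-evaluates the expressions given in the usage and argument descriptions. The toy data
and the chosen sample.fraction are illustrative assumptions, and nothing here calls the
package itself.

\preformatted{
# Toy training set: 1000 rows, 4 predictors (illustrative data only).
x <- data.frame(matrix(rnorm(4000), nrow = 1000, ncol = 4))
y <- rnorm(1000)

# Defaults from the usage block: about 1/128 of the rows, never below 1.
nodesizeStrictSpl <- max(round(nrow(x) / 128), 1)
nodesizeStrictAvg <- max(round(nrow(x) / 128), 1)

# When sample.fraction is given, sampsize is ignored and recomputed from it.
sample.fraction <- 0.5
sampsize <- round(length(y) * sample.fraction)
}

With 1000 rows the strict node sizes evaluate to 8, and a sample.fraction of 0.5
gives a sampsize of 500.
}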
\value{
A `multilayerForestry` object.
}
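\section{Tree count with minTreesPerGroup}{
The description of minTreesPerGroup implies that the forest can contain more trees
than ntree requests. The base R sketch below works through that arithmetic for a
made-up grouping. The variable names n_groups and total_trees are hypothetical
helpers for illustration, not part of the package.

\preformatted{
# Six observations in three groups (illustrative data only).
groups <- factor(c("a", "a", "b", "b", "c", "c"))
minTreesPerGroup <- 10
ntree <- 20

# Each group needs minTreesPerGroup trees built without it.
n_groups <- nlevels(groups)

# Total trees as described: max(# groups * minTreesPerGroup, ntree).
total_trees <- max(n_groups * minTreesPerGroup, ntree)
}

Here 3 groups times 10 trees per group exceeds ntree, so 30 trees would be created
rather than 20.
}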
\description{
Construct a gradient boosted ensemble with random forest base learners.
}
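\section{Usage sketch}{
A minimal sketch of fitting a boosted ensemble and predicting from it, using only
arguments listed in the usage block. Two things are assumptions rather than facts
taken from this file: that the function is provided by an installed package named
Rforestry, and that the predict method for the fitted object accepts a newdata
argument alongside the aggregation argument mentioned above.

\preformatted{
# Numeric predictors and response from a built-in data set (illustrative choice).
x <- iris[, 2:4]
y <- iris[, 1]

# Assumption: the package that provides multilayerForestry is named Rforestry.
if (requireNamespace("Rforestry", quietly = TRUE)) {
  # Three boosting rounds (nrounds) with step size eta; ntree is set to 100.
  fit <- Rforestry::multilayerForestry(x, y, ntree = 100, nrounds = 3, eta = 0.3)

  # Assumption: predict() accepts a newdata argument for this object class.
  preds <- predict(fit, newdata = x, aggregation = "average")
}
}

The call is guarded by requireNamespace so that the sketch does nothing when the
assumed package is not installed.
}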
