% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/orsf_pd.R
\name{orsf_pd_oob}
\alias{orsf_pd_oob}
\alias{orsf_pd_inb}
\alias{orsf_pd_new}
\title{ORSF partial dependence}
\usage{
orsf_pd_oob(
  object,
  pred_spec,
  pred_horizon = NULL,
  pred_type = "risk",
  expand_grid = TRUE,
  prob_values = c(0.025, 0.5, 0.975),
  prob_labels = c("lwr", "medn", "upr"),
  boundary_checks = TRUE,
  n_thread = 1,
  ...
)

orsf_pd_inb(
  object,
  pred_spec,
  pred_horizon = NULL,
  pred_type = "risk",
  expand_grid = TRUE,
  prob_values = c(0.025, 0.5, 0.975),
  prob_labels = c("lwr", "medn", "upr"),
  boundary_checks = TRUE,
  n_thread = 1,
  ...
)

orsf_pd_new(
  object,
  pred_spec,
  new_data,
  pred_horizon = NULL,
  pred_type = "risk",
  na_action = "fail",
  expand_grid = TRUE,
  prob_values = c(0.025, 0.5, 0.975),
  prob_labels = c("lwr", "medn", "upr"),
  boundary_checks = TRUE,
  n_thread = 1,
  ...
)
}
\arguments{
\item{object}{(\emph{orsf_fit}) a trained oblique random survival forest
(see \link{orsf}).}

\item{pred_spec}{(\emph{named list} or \emph{data.frame}).
\itemize{
\item If \code{pred_spec} is a named list,
each item in the list should be a vector of values that will be used as
points in the partial dependence function. The name of each item in the
list should indicate which variable will be modified to take the
corresponding values.
\item If \code{pred_spec} is a \code{data.frame}, columns will
indicate variable names, values will indicate variable values, and
partial dependence will be computed using the inputs on each row.
}}

\item{pred_horizon}{(\emph{double}) a value or vector indicating the time(s)
that predictions will be calibrated to. E.g., if you were predicting
risk of incident heart failure within the next 10 years, then
\code{pred_horizon = 10}. \code{pred_horizon} can be \code{NULL} if \code{pred_type} is
\code{'mort'}, since mortality predictions are aggregated over all
event times.}

\item{pred_type}{(\emph{character}) the type of predictions to compute. Valid
options are
\itemize{
\item 'risk' : probability of having an event at or before \code{pred_horizon}.
\item 'surv' : 1 - risk.
\item 'chf' : cumulative hazard function.
\item 'mort' : mortality prediction.
}}

\item{expand_grid}{(\emph{logical}) if \code{TRUE}, partial dependence will be
computed at all possible combinations of inputs in \code{pred_spec}. If
\code{FALSE}, partial dependence will be computed for each variable
in \code{pred_spec}, separately.}

\item{prob_values}{(\emph{numeric}) a vector of values between 0 and 1,
indicating which quantiles will be used to summarize the partial
dependence values at each set of inputs. \code{prob_values} should
have the same length as \code{prob_labels}. The quantiles are calculated
based on predictions from \code{object} at each set of values indicated
by \code{pred_spec}.}

\item{prob_labels}{(\emph{character}) a vector of labels, one for each
value in \code{prob_values}, indicating how the corresponding quantile
should be labelled in summarized outputs. \code{prob_labels} should have
the same length as \code{prob_values}.}

\item{boundary_checks}{(\emph{logical}) if \code{TRUE}, \code{pred_spec} will be checked
to make sure the requested values are between the 10th and 90th
percentiles of the corresponding variables in the object's training data. If \code{FALSE}, these checks are
skipped.}

\item{n_thread}{(\emph{integer}) number of threads to use while computing predictions. Default is one thread. To use the maximum number of threads that your system provides for concurrent execution, set \code{n_thread = 0}.}

\item{...}{Further arguments passed to or from other methods (not currently used).}

\item{new_data}{a \link{data.frame}, \link[tibble:tibble-package]{tibble}, or \link[data.table:data.table]{data.table} to compute predictions in.}

\item{na_action}{(\emph{character}) what should happen when \code{new_data} contains missing values (i.e., \code{NA} values). Valid options are:
\itemize{
\item 'fail' : an error is thrown if \code{new_data} contains \code{NA} values
\item 'omit' : rows in \code{new_data} with incomplete data will be dropped
}}
}
\value{
a \link[data.table:data.table]{data.table} containing
partial dependence values for the specified variable(s) at the
specified prediction horizon(s).
}
\description{
Compute partial dependence for an ORSF model.
Partial dependence (PD) shows the expected prediction from a model as a function of a single predictor or multiple predictors. The expectation is marginalized over the values of all other predictors, giving something like a multivariable adjusted estimate of the model's prediction.
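
The marginalization can be sketched in a few lines of base R. This is a generic illustration under stated assumptions, not the \code{aorsf} implementation: a linear model fitted to the built-in \code{mtcars} data stands in for an ORSF fit, and \code{pd_sketch()} is a hypothetical helper. For each requested value, the chosen variable is set to that value in every row of the data, predictions are computed for the modified data, and those predictions are summarized by their mean and by the quantiles given in \code{prob_values}, labelled with \code{prob_labels}.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Generic partial dependence sketch (not the aorsf implementation).
# A linear model on the built-in mtcars data stands in for a fitted forest.
fit_lm <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: partial dependence of `model` on one variable
pd_sketch <- function(model, data, variable, values,
                      prob_values = c(0.025, 0.5, 0.975),
                      prob_labels = c("lwr", "medn", "upr")) {
  rows <- lapply(values, function(v) {
    data_v <- data
    data_v[[variable]] <- v  # set the variable to v in every row
    preds <- predict(model, newdata = data_v)
    # summarize over the observed values of all other predictors
    out <- data.frame(value = v, mean = mean(preds))
    out[prob_labels] <- as.list(quantile(preds, probs = prob_values,
                                         names = FALSE))
    out
  })
  do.call(rbind, rows)
}

pd_wt <- pd_sketch(fit_lm, mtcars, variable = "wt", values = 2:5)
pd_wt
}\if{html}{\out{</div>}}

Because this linear model has no interactions, consecutive values in the \code{mean} column differ by the \code{wt} coefficient; a forest will generally produce a non-linear curve.
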
You can compute partial dependence three ways using a random forest:
\itemize{
\item using in-bag predictions for the training data
\item using out-of-bag predictions for the training data
\item using predictions for a new set of data
}

See the examples for more details.
}
\details{
Partial dependence has a number of \href{https://christophm.github.io/interpretable-ml-book/pdp.html#disadvantages-5}{known limitations and assumptions} that users should be aware of (see Hooker et al., 2021). In particular, partial dependence is less intuitive when more than two predictors are examined jointly, and it assumes that the feature(s) for which partial dependence is computed are not correlated with other features, an assumption that is often violated in practice. When feature independence is not a plausible assumption, accumulated local effect (ALE) plots can be used instead (see \href{https://christophm.github.io/interpretable-ml-book/ale.html}{here}).
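
The independence assumption matters because partial dependence pairs each requested value with every observed value of the other predictors. The base R sketch below uses simulated data (it does not call \code{aorsf}) to build those synthetic rows for two strongly correlated predictors, \code{x1} and \code{x2}, and to measure how many of them pair \code{x1} and \code{x2} more extremely than any observation in the data.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Simulated example (not aorsf): two strongly correlated predictors
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)

# Partial dependence rows for x1: each requested x1 value is paired
# with every observed value of x2
values_x1 <- quantile(x1, probs = c(0.1, 0.5, 0.9), names = FALSE)
pd_rows <- expand.grid(x1 = values_x1, x2 = x2)

# Largest gap between x2 and x1 among the observed pairs
max_gap_obs <- max(abs(x2 - x1))

# Share of synthetic rows with a larger gap than any observed pair
share_extreme <- mean(abs(pd_rows$x2 - pd_rows$x1) > max_gap_obs)
share_extreme
}\if{html}{\out{</div>}}

A large share of such rows means the partial dependence estimate relies on model predictions in regions with little or no training data.
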
}
\section{Examples}{
Begin by fitting an ORSF ensemble:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

index_train <- sample(nrow(pbc_orsf), 150) 

pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf_train, 
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)
}\if{html}{\out{</div>}}
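
The \code{boundary_checks} argument compares requested \code{pred_spec} values with the 10th and 90th percentiles of the corresponding variable in the training data. A rough base R version of that comparison is sketched below; \code{check_pd_bounds()} is a hypothetical helper, not an \code{aorsf} function, and \code{1:100} stands in for a training variable.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical helper (not part of aorsf): flag requested values that fall
# outside the 10th to 90th percentile range of the training values
check_pd_bounds <- function(train_values, requested, probs = c(0.1, 0.9)) {
  bounds <- quantile(train_values, probs = probs, na.rm = TRUE, names = FALSE)
  data.frame(value     = requested,
             in_bounds = requested >= bounds[1] & requested <= bounds[2])
}

train_values <- 1:100  # stand-in for a training variable such as bili
bounds_check <- check_pd_bounds(train_values, requested = c(5, 50, 95))
bounds_check
}\if{html}{\out{</div>}}

The same helper could be applied to a column such as \code{pbc_orsf_train$bili} when choosing values for \code{pred_spec}.
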
\subsection{Three ways to compute PD and ICE}{

You can compute partial dependence and individual conditional expectation (ICE) three ways with \code{aorsf}:
\itemize{
\item using in-bag predictions for the training data

\if{html}{\out{<div class="sourceCode r">}}\preformatted{pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5))

pd_train
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##    pred_horizon bili      mean        lwr       medn       upr
## 1:      1826.25    1 0.2188047 0.01435497 0.09604722 0.8243506
## 2:      1826.25    2 0.2540831 0.03086042 0.13766124 0.8442959
## 3:      1826.25    3 0.2982917 0.05324065 0.19470910 0.8578131
## 4:      1826.25    4 0.3536969 0.09755193 0.27774884 0.8699063
## 5:      1826.25    5 0.3955249 0.14622431 0.29945708 0.8775099
}\if{html}{\out{</div>}}
\item using out-of-bag predictions for the training data

\if{html}{\out{<div class="sourceCode r">}}\preformatted{pd_oob <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5))

pd_oob
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##    pred_horizon bili      mean        lwr      medn       upr
## 1:      1826.25    1 0.2182691 0.01218789 0.1008030 0.8304537
## 2:      1826.25    2 0.2542021 0.02447359 0.1453580 0.8484741
## 3:      1826.25    3 0.2980946 0.04854875 0.1997769 0.8640601
## 4:      1826.25    4 0.3552203 0.10116417 0.2691853 0.8642393
## 5:      1826.25    5 0.3959143 0.14768055 0.3264149 0.8737186
}\if{html}{\out{</div>}}
\item using predictions for a new set of data

\if{html}{\out{<div class="sourceCode r">}}\preformatted{pd_test <- orsf_pd_new(fit, 
                       new_data = pbc_orsf_test, 
                       pred_spec = list(bili = 1:5))

pd_test
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##    pred_horizon bili      mean        lwr      medn       upr
## 1:      1826.25    1 0.2643662 0.01758300 0.2098936 0.8410357
## 2:      1826.25    2 0.2990578 0.04063388 0.2516202 0.8553218
## 3:      1826.25    3 0.3432503 0.06843859 0.3056799 0.8670726
## 4:      1826.25    4 0.3968111 0.11801725 0.3593064 0.8725208
## 5:      1826.25    5 0.4388962 0.16038177 0.4094224 0.8809027
}\if{html}{\out{</div>}}
}

These three approaches serve different purposes:
\itemize{
\item in-bag partial dependence indicates relationships that the model has
learned during training. This is helpful if your goal is to interpret
the model.
\item out-of-bag partial dependence also reflects relationships that the model
has learned during training, but because each observation is predicted only
by trees that did not use it for training, it simulates application of the
model to new data. This is helpful if you want to test your model's
reliability or fairness in new data but don't have access to a
large testing set.
\item new data partial dependence shows how the model predicts outcomes for
observations it has not seen. This is helpful if you want to test your
model's reliability or fairness.
}
}
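
The \code{expand_grid} argument controls which combinations of \code{pred_spec} values are evaluated. The base R sketch below only illustrates the two layouts for a two-variable \code{pred_spec}; it does not call \code{aorsf}, and the exact structure of the grid that \code{aorsf} builds internally is an assumption here.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{pred_spec <- list(bili = 1:5, age = c(45, 55))

# expand_grid = TRUE: one joint grid with every combination of inputs
joint <- expand.grid(pred_spec)
nrow(joint)             # 10 rows: 5 bili values x 2 age values

# expand_grid = FALSE: a separate set of inputs for each variable
separate <- lapply(names(pred_spec), function(nm) {
  setNames(data.frame(pred_spec[[nm]]), nm)
})
sapply(separate, nrow)  # 5 and 2 rows
}\if{html}{\out{</div>}}

In a brute-force computation, each grid row requires predictions for every row of the data, so the cost of joint partial dependence grows with the product of the number of values per variable, whereas separate computation grows with their sum.
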
}

\references{
Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance. \emph{arXiv e-prints} 2021 Oct; arXiv:1905.03151. URL: https://doi.org/10.48550/arXiv.1905.03151
}
