% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plotting_functions.R
\name{lineplot}
\alias{lineplot}
\title{Lineplot for LLO-adjusted Probability Predictions}
\usage{
lineplot(
  x = NULL,
  y = NULL,
  t_levels = NULL,
  plot_original = TRUE,
  plot_MLE = TRUE,
  df = NULL,
  Pmc = 0.5,
  event = 1,
  return_df = FALSE,
  epsilon = .Machine$double.eps,
  title = "Line Plot",
  ylab = "Probability",
  xlab = "Posterior Model Probability",
  ylim = c(0, 1),
  breaks = seq(0, 1, by = 0.2),
  thin_to = NULL,
  thin_prop = NULL,
  thin_by = NULL,
  thin_percent = deprecated(),
  seed = 0,
  optim_options = NULL,
  nloptr_options = NULL,
  ggpoint_options = list(alpha = 0.35, size = 1.5, show.legend = FALSE),
  ggline_options = list(alpha = 0.25, linewidth = 0.5, show.legend = FALSE)
)
}
\arguments{
\item{x}{a numeric vector of predicted probabilities of an event. Must only
contain values in [0,1].}

\item{y}{a vector of outcomes corresponding to probabilities in \code{x}. Must
only contain two unique values (one for "events" and one for "non-events").
By default, this function expects a vector of 0s (non-events) and 1s
(events).}

\item{t_levels}{Vector of desired level(s) of calibration at which to plot
contours.}

\item{plot_original}{Logical.  If \code{TRUE}, the original probabilities passed
in \code{x} are plotted.}

\item{plot_MLE}{Logical.  If \code{TRUE}, the MLE-recalibrated probabilities are
plotted.}

\item{df}{Dataframe returned by a previous call to \code{lineplot()}, specially
formatted for use in this function. Only used for faster plotting when
making minor cosmetic changes to a previous call.}

\item{Pmc}{The prior model probability for the calibrated model \eqn{M_c}.}

\item{event}{Value in \code{y} that represents an "event".  Default value is 1.}

\item{return_df}{Logical.  If \code{TRUE}, the dataframe used to build this plot
will be returned.}

\item{epsilon}{Amount by which probabilities are pushed away from 0 or 1
boundary for numerical stability. If a value in \code{x} < \code{epsilon}, it will be
replaced with \code{epsilon}.  If a value in \code{x} > \code{1-epsilon}, that value will
be replaced with \code{1-epsilon}.}

\item{title}{Plot title.}

\item{ylab}{Label for y-axis.}

\item{xlab}{Label for x-axis.}

\item{ylim}{Vector with bounds for y-axis, must be in [0,1].}

\item{breaks}{Locations along y-axis at which to draw horizontal guidelines,
passed to \code{scale_y_continuous()}.}

\item{thin_to}{When non-null, the observations in (x,y) are randomly sampled
without replacement to form a set of size \code{thin_to}.}

\item{thin_prop}{When non-null, the observations in (x,y) are randomly
sampled without replacement to form a set that is \code{thin_prop} * 100\% of
the original size of (x,y).}

\item{thin_by}{When non-null, the observations in (x,y) are thinned by
selecting every \code{thin_by}-th observation.}

\item{thin_percent}{This argument is deprecated; use \code{thin_prop} instead.}

\item{seed}{Seed for random thinning.  Set to NULL for no seed.}

\item{optim_options}{List of additional arguments to be passed to \link[stats]{optim}().}

\item{nloptr_options}{List with options to be passed to \code{nloptr()}.}

\item{ggpoint_options}{List with options to be passed to \code{geom_point()}.}

\item{ggline_options}{List with options to be passed to \code{geom_line()}.}
}
\value{
If \code{return_df = TRUE}, a list with the following elements is
returned: \item{\code{plot}}{A \code{ggplot} object showing how the predicted
probabilities change under MLE recalibration and the specified levels of
boldness-recalibration.}
\item{\code{df}}{Dataframe used to create \code{plot}, specially
formatted for use in \code{lineplot()}.}
Otherwise just the \code{ggplot} object of the plot is returned.
}
\description{
Function to visualize how predicted probabilities change under
MLE-recalibration and boldness-recalibration.
}
\details{
This function leverages \code{ggplot()} and related functions from the \code{ggplot2}
package (Wickham, 2016).

The goal of this function is to visualize how predicted probabilities change
under different recalibration parameters. By default this function only shows
how the original probabilities change after MLE recalibration.  Argument
\code{t_levels} can be used to specify a vector of levels of
boldness-recalibration to visualize in addition to MLE recalibration.

While the x-axis shows the posterior model probability of each set of
probabilities, note that these posterior model probabilities are not in
ascending or descending order.  Instead, the sets follow the order in which
one might use the \code{BRcal} package: first looking at the original
predictions, then maximizing calibration, then examining how far the
predictions can be spread out while maintaining calibration with
boldness-recalibration.
}
\section{Reusing underlying dataframe via \code{return_df}}{


While this function is typically fast for moderate sample sizes, it calls
\code{optim()} once for MLE recalibration and \code{nloptr()} once for each level
of boldness-recalibration, which can become a bottleneck.  With this in
mind, users can specify \code{return_df=TRUE} to return the underlying dataframe
used to build the resulting lineplot.  Users can then pass this dataframe
to \code{df} in subsequent calls of \code{lineplot()} to skip the calls to
\code{optim()} and \code{nloptr()} and make only cosmetic changes to the plot.

When \code{return_df=TRUE}, both the plot and the dataframe are returned in a
list. The dataframe contains 6 columns:
\itemize{
\item \code{probs}: the values of each predicted probability under each set
\item \code{outcome}: the corresponding outcome for each predicted probability
\item \code{post}: the posterior model probability of the set as a whole
\item \code{id}: the id of each individual probability used for mapping observations between sets
\item \code{set}: the set to which the probability belongs
\item \code{label}: the label used for the x-axis in the lineplot
}

Essentially, each set of probabilities (original, MLE-, and each level of
boldness-recalibration) and outcomes are "stacked" on top of each other.
The \code{id} tells the plotting function how to connect (with a line) the same
observation as it changes from the original set to MLE- or
boldness-recalibration.
}

\section{Thinning}{


Another strategy to save time when plotting is to thin the amount of data
plotted.  When sample sizes are large, the plot can become overcrowded and
slow to plot.  We provide three options for thinning: \code{thin_to},
\code{thin_prop}, and \code{thin_by}.  By default, all three of these settings are
set to \code{NULL}, meaning no thinning is performed.  Users can only specify
one thinning strategy at a time. Care should be taken in selecting a
thinning approach based on the nature of your data and problem.  Note that
MLE recalibration and boldness-recalibration will be done using the full
set.

Also note that if a thinning strategy is used with \code{return_df=TRUE}, the
returned data frame will \strong{only contain the reduced set} (i.e. the data
\emph{after} thinning).
}

\section{Passing additional arguments to \code{geom_point()} and \code{geom_line()}}{


To make cosmetic changes to the points and lines plotted, users can pass a
list of any desired arguments of \code{geom_point()} and \code{geom_line()} to
\code{ggpoint_options} and \code{ggline_options}, respectively.  These will overwrite
everything passed to \code{geom_point()} or \code{geom_line()} except any aesthetic
arguments in \code{aes()}.
}

\examples{

set.seed(28)
# Simulate 100 predicted probabilities
x <- runif(100)
# Simulate 100 binary event outcomes using x
y <- rbinom(100, 1, x)  # By construction, x is well calibrated.

# Lineplot showing the change in probabilities from the original set to
# MLE-recalibration to the levels of boldness-recalibration given in t_levels
# Return a list that includes the dataframe used to construct the plot via return_df=TRUE
lp1 <- lineplot(x, y, t_levels=c(0.98, 0.95), return_df=TRUE)
lp1$plot
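
# Inspect the dataframe returned alongside the plot. The sets (original, MLE-,
# and each boldness-recalibration level) are stacked on top of each other, as
# described in the section on return_df. The values depend on the simulated
# data, so no expected output is shown here.
str(lp1$df)
table(lp1$df$set)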

# Reusing the previous dataframe to save calculation time
lineplot(df=lp1$df)

# Adjust geom_point cosmetics via ggpoint_options
# Increase point size and change the point shape to a cross (shape 4)
lineplot(df=lp1$df, ggpoint_options=list(size=3, shape=4))

# Adjust geom_line cosmetics via ggline_options
# Increase line width and change transparency
lineplot(df=lp1$df, ggline_options=list(linewidth=2, alpha=0.1))

# Thinning down to 75 randomly selected observations
lineplot(df=lp1$df, thin_to=75)

# Thinning down to 53\% of the data
lineplot(df=lp1$df, thin_prop=0.53)

# Thinning down to every 3rd observation
lineplot(df=lp1$df, thin_by=3)
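
# Sketch of the three thinning strategies applied to row indices, in base R.
# This is an illustration under assumptions, not the package's internal code:
# the rounding used for thin_prop and the starting index used for thin_by are
# guesses, and the object names below are hypothetical.
n_obs <- 100
set.seed(0)
idx_to <- sample(n_obs, size = 75)                     # like thin_to = 75
idx_prop <- sample(n_obs, size = round(0.53 * n_obs))  # like thin_prop = 0.53
idx_by <- seq(1, n_obs, by = 3)                        # like thin_by = 3
# Number of observations kept under each strategy
lengths(list(thin_to = idx_to, thin_prop = idx_prop, thin_by = idx_by))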

# Setting a different seed for thinning
lineplot(df=lp1$df, thin_prop=0.53, seed=47)

# Setting NO seed for thinning (plot will be different every time)
lineplot(df=lp1$df, thin_to=75, seed=NULL)
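
# Sketch of the boundary adjustment described for the epsilon argument, in
# base R. This is an illustration and may differ from the package's internal
# code: values below epsilon are raised to epsilon, and values above
# 1 - epsilon are lowered to 1 - epsilon. The object names are hypothetical.
eps <- .Machine$double.eps
p_raw <- c(0, 0.25, 1)
p_adj <- pmin(pmax(p_raw, eps), 1 - eps)
# All adjusted values now lie strictly inside (0, 1)
all(p_adj > 0 & p_adj < 1)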

}
\references{
Guthrie, A. P., and Franck, C. T. (2024) Boldness-Recalibration
for Binary Event Predictions, \emph{The American Statistician} 1-17.

Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis.
Springer-Verlag New York.
}
