% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggm_compare_estimate.default.R
\name{ggm_compare_estimate}
\alias{ggm_compare_estimate}
\title{GGM Compare: Estimate}
\usage{
ggm_compare_estimate(
  ...,
  formula = NULL,
  type = "continuous",
  mixed_type = NULL,
  analytic = FALSE,
  prior_sd = 0.5,
  iter = 5000,
  impute = TRUE,
  progress = TRUE,
  seed = 1
)
}
\arguments{
\item{...}{Matrices (or data frames) of dimensions \emph{n} (observations) by \emph{p} (variables).
Requires at least two.}

\item{formula}{An object of class \code{\link[stats]{formula}}. This allows for including
control variables in the model (e.g., \code{~ gender}). See the note for further details.}

\item{type}{Character string. Which type of data for \strong{Y}? The options include \code{continuous},
\code{binary}, \code{ordinal}, or \code{mixed}. See the note for further details.}

\item{mixed_type}{Numeric vector. An indicator of length \emph{p} for which variables should be treated as ranks
(1 for rank and 0 to use the 'empirical' or observed distribution). The default is currently to treat all integer variables
as ranks when \code{type = "mixed"} and \code{NULL} otherwise. See note for further details.}

\item{analytic}{Logical. Should the analytic solution be computed (default is \code{FALSE})? This is only available
for continuous data. Note that if \code{type = "mixed"} and \code{analytic = TRUE}, the data will
automatically be treated as continuous.}

\item{prior_sd}{The scale of the prior distribution (centered at zero), in reference to a beta distribution
(defaults to 0.50).
See note for further details.}

\item{iter}{Number of iterations (posterior samples; defaults to 5000).}

\item{impute}{Logical. Should the missing values (\code{NA})
be imputed during model fitting (defaults to \code{TRUE})?}

\item{progress}{Logical. Should a progress bar be included (defaults to \code{TRUE})?}

\item{seed}{An integer for the random seed.}
}
\value{
A list of class \code{ggm_compare_estimate} containing:
 \itemize{
 \item \code{pcor_diffs} partial correlation differences (posterior distribution)
 \item \code{p} number of variables
 \item \code{info} list containing information about each group (e.g., sample size, etc.)
 \item \code{iter} number of posterior samples
 \item \code{call} \code{match.call}
 }
}
\description{
Compare partial correlations that are estimated from any number of groups. This method works for
continuous, binary, ordinal, and mixed data (a combination of categorical and continuous variables).
The approach (i.e., a difference between posterior distributions) was
described in \insertCite{Williams2019;textual}{BGGM}.
}
\details{
This function can be used to compare the partial correlations for any number of groups.
This is accomplished with pairwise comparisons for each relation. In the case of three groups,
for example, group 1 and group 2 are compared, then group 1 and group 3 are compared, and then
group 2 and group 3 are compared. There is a full distribution for each difference that can be
summarized (i.e., \code{\link{summary.ggm_compare_estimate}}) and then visualized
(i.e., \code{\link{plot.summary.ggm_compare_estimate}}). The graph of differences is selected with
\code{\link{select.ggm_compare_estimate}}.


\strong{Controlling for Variables}:

When controlling for variables, it is assumed that \code{Y} includes \emph{only}
the nodes in the GGM and the control variables. Internally, \emph{only} the predictors
that are included in \code{formula} are removed from \code{Y}. This is not the behavior of, say,
\code{\link{lm}}, but was adopted to ensure users do not have to write out each variable that
should be included in the GGM. An example is provided below.

\strong{Mixed Type}:

 The term "mixed" is somewhat of a misnomer, because the method can be used for data including \emph{only}
 continuous or \emph{only} discrete variables. This is based on the ranked likelihood which requires sampling
 the ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationally
 expensive when there are many levels. For example, with continuous data, there are as many ranks
 as data points!

 The option \code{mixed_type} allows the user to determine which variables should be treated as ranks;
 the "empirical" distribution is used otherwise. This is accomplished by specifying an indicator
 vector of length \emph{p}. A one indicates to use the ranks, whereas a zero indicates to "ignore"
 the ranks for that variable and use the observed values. By default all integer variables are handled as ranks.

\strong{Dealing with Errors}:

An error is most likely to arise when \code{type = "ordinal"}. There are two common errors (although still rare):

\itemize{

\item The first is due to sampling the thresholds, especially when the data is heavily skewed.
      This can result in an ill-defined matrix. If this occurs, we recommend first
      decreasing \code{prior_sd} (i.e., a more informative prior). If that does not work, then
      change the data type to \code{type = "mixed"}, which then estimates a copula GGM
      (this method can be used for data containing \strong{only} ordinal variables). This should
      work without a problem.

\item  The second is due to how the ordinal data are categorized. For example, if the error states
       that the index is out of bounds, this indicates that the first category is a zero. This is not allowed, as
       the first category must be one. This is addressed by adding one (e.g., \code{Y + 1}) to the data matrix.

}

\strong{Imputing Missing Values}:

Missing values are imputed with the approach described in \insertCite{hoff2009first;textual}{BGGM}.
The basic idea is to impute the missing values with the respective posterior predictive distribution,
given the observed data, as the model is being estimated. Note that the default is \code{TRUE},
but this is ignored when there are no missing values. If set to \code{FALSE}, and there are missing
values, list-wise deletion is performed with \code{na.omit}.
}
\note{
\strong{Mixed Data}:

The mixed data approach was introduced \insertCite{@in @hoff2007extending;textual}{BGGM}
(our paper describing an extension to Bayesian hypothesis testing is forthcoming).
This is a semi-parametric copula model based on the ranked likelihood. This is computationally
expensive when treating continuous data as ranks. The current default is to treat only integer data as ranks.
This should of course be adjusted for continuous data that is skewed. This can be accomplished with the
argument \code{mixed_type}. A \code{1} in the numeric vector of length \emph{p} indicates to treat that
respective node as a rank (corresponding to the column number) and a zero indicates to use the observed
(or "empirical") data.


It is also important to note that \code{type = "mixed"} is not restricted to mixed data (containing a combination of
categorical and continuous): all the nodes can be ordinal or continuous (but again this will take some time).


\strong{Interpretation of Conditional (In)dependence Models for Latent Data}:

See \code{\link{BGGM-package}} for details about interpreting GGMs based on latent data
(i.e., all data types besides \code{"continuous"}).



\strong{Additional GGM Compare Methods}:

Bayesian hypothesis testing is implemented in \code{\link{ggm_compare_explore}} and
\code{\link{ggm_compare_confirm}} \insertCite{Williams2019_bf}{BGGM}. The latter allows for confirmatory
hypothesis testing. An approach based on a posterior predictive check is implemented in \code{\link{ggm_compare_ppc}}
\insertCite{williams2020comparing}{BGGM}. This provides a 'global' test for comparing the entire GGM and a 'nodewise'
test for comparing each variable in the network \insertCite{Williams2019;textual}{BGGM}.
}
\examples{
\donttest{
# note: iter = 250 for demonstrative purposes

# data
Y <- bfi

# males and females
Ymale <- subset(Y, gender == 1,
                   select = -c(gender,
                               education))[,1:10]

Yfemale <- subset(Y, gender == 2,
                     select = -c(gender,
                                 education))[,1:10]
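
# A hedged pre-check, not part of the original example: as described under
# "Dealing with Errors", the ordinal method expects the first category to be
# one. The matrix below is hypothetical and coded from zero, so every value
# is shifted up by one before it would be passed to the function.
Yzero <- matrix(rep(0:4, times = 4), ncol = 4)  # hypothetical zero-based data
if (min(Yzero, na.rm = TRUE) == 0) Yzero <- Yzero + 1
range(Yzero)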

# fit model
fit <- ggm_compare_estimate(Ymale,  Yfemale,
                           type = "ordinal",
                           iter = 250,
                           prior_sd = 0.25,
                           progress = FALSE)
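
# A sketch of the copula ("mixed") alternative with an explicit mixed_type
# indicator of length p = 10. The choice below (ranks for the first five
# columns, observed values for the last five) is an assumption made only for
# illustration, as is the use of complete cases to sidestep imputation.
Ymale_cc   <- na.omit(Ymale)
Yfemale_cc <- na.omit(Yfemale)

rank_id <- c(rep(1, 5), rep(0, 5))

fit_mixed <- ggm_compare_estimate(Ymale_cc, Yfemale_cc,
                                  type = "mixed",
                                  mixed_type = rank_id,
                                  iter = 250,
                                  progress = FALSE)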

###########################
### example 2: analytic ###
###########################
# only continuous

# fit model
fit <- ggm_compare_estimate(Ymale, Yfemale,
                            analytic = TRUE)

# summary
summ <- summary(fit)

# plot summary
plt_summ <- plot(summary(fit))

# select
E <- select(fit)

# plot select
plt_E <- plot(select(fit))
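
# A hedged sketch of controlling for a variable with formula, assuming the
# behavior described under "Controlling for Variables": each group holds only
# the nodes plus the control variable, and the control variable is removed
# from the network internally. The object names are hypothetical, and
# complete cases are used here only to keep the illustration simple.
keep <- c(colnames(Y)[1:10], "education")

Ymale_ctrl   <- na.omit(Y[Y$gender == 1, keep])
Yfemale_ctrl <- na.omit(Y[Y$gender == 2, keep])

fit_ctrl <- ggm_compare_estimate(Ymale_ctrl, Yfemale_ctrl,
                                 formula = ~ education,
                                 iter = 250,
                                 progress = FALSE)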

}

}
\references{
\insertAllCited{}
}
