% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/negbinBvs.R
\name{negbinBvs}
\alias{negbinBvs}
\title{Bayesian variable selection for the negative binomial model}
\usage{
negbinBvs(y, offset = NULL, X, model = list(), prior = list(),
  mcmc = list(), start = NULL, BVS = TRUE)
}
\arguments{
\item{y}{an integer vector of count data}

\item{offset}{an (optional) offset term; should be \code{NULL} or an integer 
vector of length equal to the number of counts.}

\item{X}{a design matrix (including an intercept term)}

\item{model}{an (optional) list specifying the structure of the model (see 
details)}

\item{prior}{an (optional) list of prior settings and hyper-parameters 
controlling the priors (see details)}

\item{mcmc}{an (optional) list of MCMC sampling options (see details)}

\item{start}{an (optional), numeric vector containing starting values for the
regression effects (including an intercept term); defaults to \code{NULL} 
(i.e. a vector of zeros is used).}

\item{BVS}{if \code{TRUE} (default), Bayesian variable selection is performed
to identify regressors with a non-zero effect; otherwise, an unrestricted 
model is estimated (without variable selection).}
}
\value{
The function returns an object of class "\code{pogit}" with methods
 \code{\link{print.pogit}}, \code{\link{summary.pogit}} and 
 \code{\link{plot.pogit}}. 
 
 The returned object is a list containing the following elements: 
 
 \item{\code{samplesNB}}{a named list containing the samples from the posterior
   distribution of the parameters in the negative binomial model 
   (see also \code{msave}): 
   \describe{
   \item{\code{beta, rho}}{regression coefficients \eqn{\beta} and \eqn{\rho}}
   \item{\code{pdeltaBeta}}{P(\eqn{\delta_\beta}=1)}
   \item{\code{psiBeta}}{scale parameter \eqn{\psi_\beta} of the slab component}
   }}
 \item{\code{data}}{a list containing the data \code{y}, \code{offset} and \code{X}}
 \item{\code{model.nb}}{a list containing details on the model specification, 
 see details for \code{model}}
 \item{\code{mcmc}}{see details for \code{mcmc}}
 \item{\code{prior.nb}}{see details for \code{prior}}
 \item{\code{dur}}{a list containing the total runtime (\code{total}) 
   and the runtime after burn-in (\code{durM}), in seconds}
 \item{\code{acc.rho}}{acceptance rate of parameter \eqn{\rho}}
 \item{\code{BVS}}{see arguments}
 \item{\code{start}}{a list containing starting values, see arguments}
 \item{\code{family}}{"negbin"}
 \item{\code{call}}{function call}
}
\description{
This function performs Bayesian regression modelling of overdispersed count 
data including variable selection via spike and slab priors. 
Posterior inference is based on MCMC sampling techniques.
}
\details{
The method provides Bayesian variable selection in regression modelling 
of overdispersed count data. The negative binomial 
distribution is derived marginally from a Poisson-Gamma (mixture) model, 
which can be interpreted as an overdispersed Poisson model with 
observation-specific random intercept log \eqn{\gamma}, where 
\eqn{\gamma|\rho \sim \Gamma(\rho,\rho)}. A hyper-prior for \eqn{\rho} is
specified as \eqn{\rho \sim \Gamma(c_0,C_0)}, see details for \code{prior} 
below. 
By default, variable selection is incorporated in the model based on mixture 
priors with a spike and a slab component for the regression effects 
\eqn{\beta}. More specifically, a Dirac spike is used, i.e. a point mass at 
zero, and (by default), the slab component is specified as a scale mixture 
of normal distributions, resulting in a Student-t distribution with 
2\code{psi.nu} degrees of freedom.

The MCMC sampling scheme relies on the representation of the conditional 
Poisson model as a Gaussian regression model in auxiliary variables, 
as described in \code{\link{poissonBvs}}. Data augmentation is based
on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter 
et al. (2009). For details concerning the algorithm, see 
Dvorzak and Wagner (2016b), available on request from the authors.

Details for the model specification (see arguments): 
\describe{ 
 \item{\code{model}}{\describe{\item{}{A list:}  
   \item{\code{deltafix}}{an indicator vector of length 
   \code{ncol(X)-1} specifying which regression effects are subject to selection
   (i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector 
   of zeros.} 
 }}
 \item{\code{prior}}{\describe{\item{}{A list:}      
   \item{\code{slab}}{distribution of the slab component, i.e. "\code{Student}" 
   (default) or "\code{Normal}".} 
   \item{\code{psi.nu}}{hyper-parameter of the Student-t slab (used for a 
   "\code{Student}" slab); defaults to 5.} 
   \item{\code{m0}}{prior mean for the intercept parameter; defaults to 0.} 
   \item{\code{M0}}{prior variance for the intercept parameter; defaults to 100.} 
   \item{\code{aj0}}{a vector of prior means for the regression effects 
   (which is encoded in a normal distribution, see notes); defaults to vector of 
   zeros.} 
   \item{\code{V}}{variance of the slab; defaults to 5.} 
   \item{\code{w}}{hyper-parameters of the Beta-prior for the mixture weight
   \eqn{\omega}; defaults to \code{c(wa0=1, wb0=1)}, i.e. a uniform distribution.} 
   \item{\code{c0, C0}}{scale and rate of the gamma prior for the 
   hyper-parameter \eqn{\rho}; defaults to 2 and 1.} 
   \item{\code{eps}}{tuning parameter in the MH-step to sample \eqn{\rho}; 
   defaults to 0.05.} 
 }}
 \item{\code{mcmc}}{\describe{\item{}{A list:}
   \item{\code{M}}{number of MCMC iterations after the burn-in phase; defaults
   to 8000.} 
   \item{\code{burnin}}{number of MCMC iterations discarded as burn-in; defaults 
   to 2000.} 
   \item{\code{thin}}{thinning parameter; defaults to 1.} 
   \item{\code{startsel}}{number of MCMC iterations drawn from the unrestricted 
   model (e.g., \code{burnin/2}); defaults to 1000.} 
   \item{\code{verbose}}{MCMC progress report in each \code{verbose}-th iteration 
   step; defaults to 500. If \code{verbose=0}, no output is generated.} 
   \item{\code{msave}}{returns additional output with variable selection details 
   (i.e. posterior samples for \eqn{\omega}, \eqn{\delta}); defaults to \code{FALSE}.}
 }}}
}
\note{
Alternatively, a Poisson model with observation-specific normal 
random intercept (i.e., a Poisson-log-normal mixture model) can be used 
to deal with overdispersion of count data, which is provided in the 
function \code{\link{poissonBvs}}.

If prior information on the regression parameters is available, this 
information is encoded in a normal distribution instead of the spike 
and slab prior (consequently, \code{BVS} is set to \code{FALSE}).
}
\examples{
\dontrun{
## Examples below should take about 1-2 minutes.

## ------ (use simul_pois1) ------
data(simul_pois1)
y <- simul_pois1$y
X <- as.matrix(simul_pois1[, -1])

# Bayesian variable selection for simulated data set
m1 <- negbinBvs(y = y, X = X)

# print results (check acceptance rate for 'rho')
print(m1)

# re-run with adapted tuning parameter 'eps'
m2 <- negbinBvs(y = y, X = X, prior = list(eps = 0.4)) 

# print and summarize results
print(m2)
summary(m2)

# alternatively, compare results to overdispersed Poisson model with 
# normal random intercept (subject to selection), provided in 'poissonBvs' 

# specify observation-specific random intercept
cID <- seq_along(y)
m3  <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID))

# print, summarize and plot results
print(m3)
summary(m3) 
# note that thetaB is not selected (!)

plot(m3, burnin = FALSE, thin = FALSE)


## ------ (use data set "azdrg112" from package "COUNT") ------

if (!requireNamespace("COUNT", quietly = TRUE)){
 stop("package 'COUNT' is needed for this example to work. 
       Please install it.")
}

library(COUNT)
# load data set 'azdrg112' 
# (Arizona Medicare data for DRG (Diagnostic Related Group) 112)
data(azdrg112) 

y <- as.numeric(azdrg112$los) # hospital length of stay: 1-53 days
X <- as.matrix(azdrg112[,-1]) # covariates (gender, type1, age75)
m4 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000))  

# print results (check acceptance rate for 'rho')
print(m4)
summary(m4)
plot(m4, burnin = FALSE)

# adapte tuning parameter eps (and set BVS to FALSE)
prior <- list(eps = 0.1)
m5 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior, 
                BVS = FALSE)  

# print, summarize and plot results
print(m5)
summary(m5)
plot(m5, burnin = FALSE, thin = FALSE)
plot(m5, type = "acf", lag.max = 50)
}
}
\author{
Michaela Dvorzak <m.dvorzak@gmx.at>
}
\references{
Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for 
overdispersed count data subject to underreporting - An application to
norovirus illness in Germany. (Unpublished) working paper.

Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. 
  (2009). Improved auxiliary mixture sampling for hierarchical models of 
  non-Gaussian data. \emph{Statistics and Computing}, \strong{19}, 479 - 492.
}
\seealso{
\code{\link{poissonBvs}}
}
\keyword{models}

