% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/omicwas.R
\name{ctassoc}
\alias{ctassoc}
\title{Cell-Type-Specific Association Testing}
\usage{
ctassoc(
  X,
  W,
  Y,
  C = NULL,
  test = "full",
  regularize = FALSE,
  num.cores = 1,
  chunk.size = 1000,
  seed = 123
)
}
\arguments{
\item{X}{Matrix (or vector) of traits; samples x traits.}

\item{W}{Matrix of cell type composition; samples x cell types.}

\item{Y}{Matrix (or vector) of bulk omics measurements; markers x samples.}

\item{C}{Matrix (or vector) of covariates; samples x covariates.
X, W, Y, C should be numeric.}

\item{test}{Statistical test to apply; one of \code{"full"}, \code{"marginal"},
\code{"nls.identity"}, \code{"nls.log"}, \code{"nls.logit"} or \code{"reducedrankridge"}.}

\item{regularize}{Whether to apply Tikhonov (i.e., ridge) regularization
to \eqn{\beta_{h j k}}.
The regularization parameter is chosen automatically according to
an unbiased version of (Lawless & Wang, 1976).
Effective for \code{nls.*} tests.}

\item{num.cores}{Number of CPU cores to use.
The \code{full} and \code{marginal} tests run serially, so \code{num.cores}
is ignored for them.}

\item{chunk.size}{Size of the job assigned to each CPU core in one batch.
If you have many cores but limited memory and run out of memory,
decrease \code{num.cores} and/or \code{chunk.size}.}

\item{seed}{Seed for random number generation.}
}
\value{
A list with one element named \code{coefficients}.
The element gives the estimate, statistic and p.value as a tibble.
To transform the estimate of \eqn{\alpha_{h j}} back to the original scale,
apply \code{plogis} for \code{test = "nls.logit"} and
\code{exp} for \code{test = "nls.log"}.
The estimate of \eqn{\beta_{h j k}} from \code{test = "nls.log"} is
the natural logarithm of the fold change, not the log2.
If numerical convergence fails, \code{NA} is returned for that marker.
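
As a small illustration of these back-transformations, the following base R
sketch uses hypothetical estimates (made-up numbers, not actual
\code{ctassoc} output):
\preformatted{
alpha_logit <- qlogis(0.8)   # hypothetical alpha estimate from test = "nls.logit"
plogis(alpha_logit)          # back on the proportion scale

beta_log <- log(1.5)         # hypothetical beta estimate from test = "nls.log"
exp(beta_log)                # fold change per unit increase of the trait
beta_log / log(2)            # the same effect expressed as log2 fold change
}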
}
\description{
Cell-Type-Specific Association Testing
}
\details{
Let the indexes be
\eqn{h} for cell type, \eqn{i} for sample,
\eqn{j} for marker (CpG site or gene),
\eqn{k} for each trait that has cell-type-specific effect,
and \eqn{l} for each trait that has a uniform effect across cell types.
The input data are \eqn{X_{i k}}, \eqn{C_{i l}}, \eqn{W_{i h}} and \eqn{Y_{j i}},
where \eqn{C_{i l}} can be omitted.
\eqn{X_{i k}} and \eqn{C_{i l}} hold the values of two types of traits:
those whose effects are cell-type-specific and those whose effects are not,
respectively.
Thus, calling \eqn{X_{i k}} "traits" and \eqn{C_{i l}} "covariates"
gives a rough idea but is not strictly accurate.
\eqn{W_{i h}} represents the cell type composition and
\eqn{Y_{j i}} represents the marker level,
such as methylation or gene expression.
For each tissue sample, the cell type proportion \eqn{W_{i h}}
is the proportion of each cell type in the bulk tissue,
which is measured or imputed beforehand.
The marker level \eqn{Y_{j i}} in bulk tissue is measured and provided as input.

The parameters we estimate are
the cell-type-specific trait effect \eqn{\beta_{h j k}},
the tissue-uniform trait effect \eqn{\gamma_{j l}},
and the basal marker level \eqn{\alpha_{h j}} in each cell type.

We first describe the conventional linear regression models.
For marker \eqn{j} in sample \eqn{i},
the marker level specific to cell type \eqn{h} is
\deqn{\alpha_{h j} + \sum_k \beta_{h j k} * X_{i k}.}
This is a representative value rather than a mean, because we do not model
a probability distribution for cell-type-specific expression.
The bulk tissue marker level is the average weighted by \eqn{W_{i h}},
\deqn{\mu_{j i} = \sum_h W_{i h} [ \alpha_{h j} + \sum_k \beta_{h j k} * X_{i k} ] +
                  \sum_l \gamma_{j l} C_{i l}.}
The statistical model is
\deqn{Y_{j i} = \mu_{j i} + \epsilon_{j i},}
\deqn{\epsilon_{j i} ~ N(0, \sigma^2_j).}
The error of the marker level is normally distributed with variance
\eqn{\sigma^2_j}, independently among samples.

The \code{full} model is the linear regression
\deqn{Y_{j i} = (\sum_h \alpha_{h j} * W_{i h}) +
                (\sum_{h k} \beta_{h j k} * W_{i h} * X_{i k}) +
                (\sum_l \gamma_{j l} * C_{i l}) +
                error.}
The \code{marginal} model tests the trait association only in one
cell type \eqn{h}, under the linear regression,
\deqn{Y_{j i} = (\sum_{h'} \alpha_{h' j} * W_{i h'}) +
                (\sum_k \beta_{h j k} * W_{i h} * X_{i k}) +
                (\sum_l \gamma_{j l} * C_{i l}) +
                error.}
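
As a rough illustration of the two linear models above, the following base R
sketch simulates one marker and fits both designs with \code{lm}.
The simulated values, the object names and the use of \code{lm} are assumptions
made for illustration; this is not necessarily how \code{ctassoc} is implemented.
\preformatted{
set.seed(1)
n <- 200                                    # samples
W <- cbind(h1 = runif(n, 0.2, 0.6), h2 = runif(n, 0.1, 0.3))
W <- cbind(W, h3 = 1 - rowSums(W))          # three cell types; rows sum to 1
X <- rnorm(n)                               # one trait (k = 1)
C <- rnorm(n)                               # one covariate (l = 1)
alpha <- c(1, 2, 3)                         # basal marker level per cell type
beta <- c(0.5, 0, 0)                        # trait effect in cell type h1 only
WX <- W * X                                 # interaction terms W[i, h] * X[i]
colnames(WX) <- paste0(colnames(W), ".X")
y <- rowSums(sweep(W, 2, alpha, "*")) + rowSums(sweep(WX, 2, beta, "*")) +
  0.3 * C + rnorm(n, sd = 0.1)

# full model: all cell types enter jointly; no intercept, since rows of W sum to 1
full <- lm(y ~ 0 + W + WX + C)
# marginal model for h1: only the h1 interaction term is included
WX1 <- WX[, "h1.X"]
marg <- lm(y ~ 0 + W + WX1 + C)
coef(full)
coef(marg)
}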

The nonlinear model simultaneously analyzes cell type composition on the
linear scale and differential expression/methylation on the log/logit scale.
The normalizing function is the natural logarithm \eqn{f} = log for gene
expression, and \eqn{f} = logit for methylation. Conventional linear regression
can be formulated by defining \eqn{f} as the identity function. The three models
are named \code{nls.log}, \code{nls.logit} and \code{nls.identity}.
We denote the inverse function of \eqn{f} by \eqn{g}; \eqn{g} = exp for
gene expression, and \eqn{g} = logistic for methylation.
The mean normalized marker level of marker \eqn{j} in sample \eqn{i} becomes
\deqn{\mu_{j i} = f(\sum_h W_{i h} g( \alpha_{h j} + \sum_k \beta_{h j k} * X_{i k} )) +
                  \sum_l \gamma_{j l} C_{i l}.}
The statistical model is
\deqn{f(Y_{j i}) = \mu_{j i} + \epsilon_{j i},}
\deqn{\epsilon_{j i} ~ N(0, \sigma^2_j).}
The error of the marker level is normally distributed with variance
\eqn{\sigma^2_j}, independently among samples.
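
To make the nonlinear mean concrete, the base R sketch below evaluates
\eqn{\mu_{j i}} for a single marker and a single trait.
The helper \code{mu_nls} and all numeric values are hypothetical and chosen
only for illustration; they are not part of the package.
\preformatted{
# mu[i] = f( sum_h W[i, h] * g(alpha[h] + beta[h] * X[i]) ) + gamma * C[i]
mu_nls <- function(W, X, C, alpha, beta, gamma, f = qlogis, g = plogis) {
  eta <- outer(X, beta) + rep(alpha, each = length(X))  # samples x cell types
  f(rowSums(W * g(eta))) + gamma * C
}

W <- rbind(c(0.5, 0.3, 0.2), c(0.2, 0.3, 0.5))  # two samples, three cell types
X <- c(0, 1)
C <- c(0, 0)
alpha <- qlogis(c(0.2, 0.5, 0.8))               # basal methylation, logit scale
beta <- c(1, 0, 0)
gamma <- 0

mu <- mu_nls(W, X, C, alpha, beta, gamma)
# Sample 1 has X = 0, so its bulk level on the original scale is the
# W-weighted average of the basal levels: 0.5*0.2 + 0.3*0.5 + 0.2*0.8.
plogis(mu)

# With f and g set to the identity, the mean reduces to the linear model.
lin <- mu_nls(W, X, C, alpha, beta, gamma, f = identity, g = identity)
}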

Ridge regression is used to cope with multicollinearity among
the interaction terms \eqn{W_{i h} * X_{i k}}.
Ridge regression is fit by minimizing the residual sum of squares (RSS) plus
\eqn{\lambda \sum_{h k} \beta_{h j k}^2}, where \eqn{\lambda > 0} is the
regularization parameter.
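
For the linear case, the penalized objective has a closed-form minimizer,
which the following base R sketch computes for a fixed \eqn{\lambda}.
The helper \code{ridge_beta}, the fixed values of \eqn{\lambda}, the omission of
covariates and the simulated data are assumptions for illustration;
\code{ctassoc} chooses \eqn{\lambda} automatically and may be implemented differently.
\preformatted{
# Minimize RSS + lambda * sum(beta^2); only the W * X columns are penalized.
ridge_beta <- function(W, WX, y, lambda) {
  Z <- cbind(W, WX)                                # design matrix
  D <- diag(c(rep(0, ncol(W)), rep(1, ncol(WX))))  # penalize beta only, not alpha
  drop(solve(crossprod(Z) + lambda * D, crossprod(Z, y)))
}

set.seed(1)
n <- 100
W <- cbind(runif(n, 0.2, 0.6), runif(n, 0.1, 0.3))
W <- cbind(W, 1 - rowSums(W))                      # three cell types
X <- rnorm(n)
WX <- W * X
y <- rowSums(sweep(W, 2, c(1, 2, 3), "*")) + 0.5 * WX[, 1] + rnorm(n, sd = 0.1)

b0 <- ridge_beta(W, WX, y, lambda = 0)   # ordinary least squares
b1 <- ridge_beta(W, WX, y, lambda = 1)   # penalized; sum of squared beta shrinks
rbind(b0, b1)                            # columns 1-3: alpha, columns 4-6: beta
}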
}
\examples{
\donttest{
data(GSE42861small)
X = GSE42861small$X
W = GSE42861small$W
Y = GSE42861small$Y
C = GSE42861small$C
result = ctassoc(X, W, Y, C = C)
result$coefficients
}

}
\references{
Lawless, J. F., & Wang, P. (1976). A simulation study of ridge and other
regression estimators.
Communications in Statistics - Theory and Methods, 5(4), 307–323.
\url{https://doi.org/10.1080/03610927608827353}
}
\seealso{
\code{\link{ctcisQTL}}
}
