% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/olink_normalization.R
\name{olink_normalization}
\alias{olink_normalization}
\title{Normalize two Olink datasets}
\usage{
olink_normalization(
  df1,
  df2 = NULL,
  overlapping_samples_df1,
  overlapping_samples_df2 = NULL,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1",
  reference_medians = NULL
)
}
\arguments{
\item{df1}{First dataset to be used for normalization (required).}

\item{df2}{Second dataset to be used for normalization. Required for bridge
and subset normalization.}

\item{overlapping_samples_df1}{Character vector of samples to be used for the
calculation of adjustment factors in \code{df1} (required).}

\item{overlapping_samples_df2}{Character vector of samples to be used for the
calculation of adjustment factors in \code{df2}. Required for subset
normalization.}

\item{df1_project_nr}{Project name of first dataset (required).}

\item{df2_project_nr}{Project name of second dataset. Required for bridge and
subset normalization.}

\item{reference_project}{Project to be used as reference project. Should
be either \code{df1_project_nr} or \code{df2_project_nr}. Required for bridge and
subset normalization.}

\item{reference_medians}{Dataset with columns "OlinkID" and "Reference_NPX".
Required for reference median normalization.}
}
\value{
Tibble or ArrowObject with the normalized dataset.
}
\description{
Normalizes two Olink datasets to each other, or one Olink dataset to a
reference set of median values.
}
\details{
The function handles four different types of normalization:
\itemize{
\item \strong{Bridge normalization}: One of the datasets is adjusted to another
using overlapping samples (bridge samples). Overlapping samples need to have
the same identifiers in both datasets. Normalization is performed using the
median of the pair-wise differences between the bridge samples in the two
datasets. The two datasets are provided as \code{df1} and \code{df2}, and the dataset
used as reference is specified in \code{reference_project}; overlapping
samples are specified in \code{overlapping_samples_df1}. Only
\code{overlapping_samples_df1} should be provided regardless of the dataset used
as \code{reference_project}.
\item \strong{Subset normalization}: One of the datasets is adjusted to another
using a subset of samples from each. Normalization is performed using the
differences of the medians between the subsets from the two datasets. Both
\code{overlapping_samples_df1} and \code{overlapping_samples_df2} need to be provided,
and sample identifiers do not need to be the same.
\itemize{
\item A special case of subset normalization occurs when all samples (except
control samples and samples with QC warnings) from each dataset are used
for normalization; this special case is called intensity normalization. In
intensity normalization all unique sample identifiers from \code{df1} are
provided as input in \code{overlapping_samples_df1} and all unique sample
identifiers from \code{df2} are provided as input in \code{overlapping_samples_df2}.
}
\item \strong{Reference median normalization}: One of the datasets (\code{df1}) is
adjusted to a predefined set of adjustment factors. This is effectively
subset normalization, but using differences of medians to pre-recorded
median values. \code{df1}, \code{overlapping_samples_df1}, \code{df1_project_nr} and
\code{reference_medians} need to be specified. Dataset \code{df1} is normalized using
the differences in median between the overlapping samples and the reference
medians.
\item \strong{Cross-product normalization}: One of the datasets is adjusted to
another using the median of pair-wise differences of overlapping samples
(bridge samples) or quantile smoothing using overlapping
samples as reference to adjust the distributions. Overlapping samples need
to have the same identifiers in both datasets. The two datasets are provided
as \code{df1} and \code{df2}, and the dataset used as reference is specified in
\code{reference_project}. \strong{Note that} in cross-product normalization the
reference project is predefined, and if the argument
\code{reference_project} does not match the expected reference project, an error
is raised. Overlapping samples are specified in
\code{overlapping_samples_df1}. Only \code{overlapping_samples_df1} should be provided
regardless of the dataset used as \code{reference_project}. This functionality
\strong{does not} modify the column with the original quantification values
(e.g. NPX). Instead, it normalizes the values with two different approaches in columns
"MedianCenteredNPX" and "QSNormalizedNPX", and provides a recommendation in
"BridgingRecommendation" about which of the two columns to use.
}

The output dataset is \code{df1} for reference median normalization, or \code{df2}
appended to \code{df1} for bridge, subset and cross-product normalization. The
output dataset contains all columns from the original dataset(s),
and the columns:
\itemize{
\item "Project" and "Adj_factor" in case of reference median, bridge and subset
normalization. The former marks the project of origin based on
\code{df1_project_nr} and \code{df2_project_nr}, and the latter the adjustment factor
that was applied to the non-reference dataset.
\item "Project", "OlinkID_E3072", "MedianCenteredNPX", "QSNormalizedNPX",
"BridgingRecommendation" in case of cross-product normalization. The columns
correspond to the project of origin based on \code{df1_project_nr} and
\code{df2_project_nr}, the assay identifier in the non-reference project, the
bridge-normalized quantification value, the quantile smoothing-normalized quantification
value, and the recommendation about which of the two normalized values is
more suitable for downstream analysis.
}
}
\examples{
\donttest{

# prepare datasets
npx_df1 <- npx_data1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  )
npx_df2 <- npx_data2 |>
  dplyr::mutate(
    Normalization = "Intensity"
  )

# bridge normalization

# overlapping samples - exclude control samples
overlap_samples <- intersect(x = npx_df1$SampleID,
                             y = npx_df2$SampleID) |>
  (\(x) x[!grepl("^CONTROL_SAMPLE", x)])()

# normalize
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = overlap_samples,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)
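
# illustration only: the arithmetic behind a bridge adjustment factor for a
# single assay. This is a base R sketch with made-up NPX values and
# hypothetical object names, not the internal code of olink_normalization().
# Assumption: the factor is added to the non-reference dataset.
bridge_ref_toy <- c(S1 = 5.0, S2 = 6.0, S3 = 7.5) # bridge samples, reference
bridge_new_toy <- c(S1 = 4.5, S2 = 5.0, S3 = 7.0) # same samples, other dataset
# median of the pair-wise differences between matched bridge samples
adj_factor_bridge_toy <- median(
  bridge_ref_toy - bridge_new_toy[names(bridge_ref_toy)]
)
adj_factor_bridge_toy
# shift the non-reference values onto the reference scale
bridge_new_toy + adj_factor_bridge_toy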

# subset normalization

# find a suitable subset of samples from each dataset:
# exclude control samples
# exclude samples that do not pass QC
df1_samples <- npx_df1 |>
  dplyr::group_by(
    dplyr::pick(
      dplyr::all_of("SampleID")
    )
  ) |>
  dplyr::filter(
    all(.data[["QC_Warning"]] == "Pass")
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(
    !grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
  ) |>
  dplyr::pull(
    .data[["SampleID"]]
  ) |>
  unique()
df2_samples <- npx_df2 |>
  dplyr::group_by(
    dplyr::pick(
      dplyr::all_of("SampleID")
    )
  ) |>
  dplyr::filter(
    all(.data[["QC_Warning"]] == "Pass")
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(
    !grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
  ) |>
  dplyr::pull(
    .data[["SampleID"]]
  ) |>
  unique()

# select a subset of samples from each set from above
df1_subset <- sample(x = df1_samples, size = 16L)
df2_subset <- sample(x = df2_samples, size = 20L)

# normalize
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = df1_subset,
  overlapping_samples_df2 = df2_subset,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)

# intensity normalization: special case of subset normalization using all
# samples that pass QC and are not control samples
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = df1_samples,
  overlapping_samples_df2 = df2_samples,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)
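
# illustration only: in subset normalization the factor for one assay is the
# difference of the subset medians, so the two subsets may differ in size and
# in sample identifiers. Base R sketch with made-up NPX values and
# hypothetical object names, not the internal code of olink_normalization().
# Assumption: the factor is added to the non-reference dataset.
subset_ref_toy <- c(5.0, 6.0, 7.5, 6.5) # subset from the reference dataset
subset_new_toy <- c(4.5, 5.0, 7.0)      # subset from the other dataset
adj_factor_subset_toy <- median(subset_ref_toy) - median(subset_new_toy)
adj_factor_subset_toy
# after the shift the two subset medians coincide
median(subset_new_toy + adj_factor_subset_toy) == median(subset_ref_toy)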

# reference median normalization

# For the sake of this example, draw random reference medians between -1 and 1
ref_med_df <- npx_data1 |>
  dplyr::select(
    dplyr::all_of(
      c("OlinkID")
    )
  ) |>
  dplyr::distinct() |>
  dplyr::mutate(
    Reference_NPX = runif(n = dplyr::n(),
                          min = -1,
                          max = 1)
  )

# normalize
olink_normalization(
  df1 = npx_df1,
  overlapping_samples_df1 = df1_subset,
  df1_project_nr = "P1",
  reference_medians = ref_med_df
)
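
# illustration only (base R, made-up values and hypothetical object names;
# not the internal code of olink_normalization()): for one assay, the factor
# is taken here as the reference median minus the median of the selected
# samples. The sign convention is an assumption.
ref_median_toy <- 6.0                 # hypothetical Reference_NPX value
selected_npx_toy <- c(4.5, 5.0, 7.0)  # NPX of the selected df1 samples
adj_factor_refmed_toy <- ref_median_toy - median(selected_npx_toy)
adj_factor_refmed_toy
# the adjusted selected samples have a median equal to the reference median
median(selected_npx_toy + adj_factor_refmed_toy)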

# cross-product normalization

# get reference samples
overlap_samples_product <- intersect(
  x = unique(OlinkAnalyze:::data_ht_small$SampleID),
  y = unique(OlinkAnalyze:::data_3k_small$SampleID)
) |>
  (\(.) .[!grepl("CONTROL", .)])()

# normalize
olink_normalization(
  df1 = OlinkAnalyze:::data_ht_small,
  df2 = OlinkAnalyze:::data_3k_small,
  overlapping_samples_df1 = overlap_samples_product,
  df1_project_nr = "proj_ht",
  df2_project_nr = "proj_3k",
  reference_project = "proj_ht"
)
}

}
