% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/intervalaverage_functions.R
\name{intervalaverage}
\alias{intervalaverage}
\title{Time-weighted average of values measured over intervals}
\usage{
intervalaverage(
  x,
  y,
  interval_vars,
  value_vars,
  group_vars = NULL,
  required_percentage = 100,
  skip_overlap_check = FALSE,
  verbose = FALSE
)
}
\arguments{
\item{x}{A data.table containing values measured over intervals. See the
\code{interval_vars} parameter for how to specify interval columns and
\code{value_vars} for how to specify value columns.
Intervals in \code{x} must be completely non-overlapping within
groups defined by \code{group_vars}. If \code{group_vars} is specified (non-\code{NULL}), \code{x} must
also contain the columns named in \code{group_vars}.}

\item{y}{A data.table containing the intervals over which averages of \code{x} values should be computed.
Unlike the measurement intervals in \code{x}, the averaging intervals in \code{y} may overlap within groups.
If \code{group_vars} is specified (non-\code{NULL}), \code{y} must contain the \code{group_vars} columns,
which allows different averaging periods for each group.}

\item{interval_vars}{A length-2 character vector of column names present in both \code{x} and \code{y}.
These columns define the closed (inclusive) start and end of each interval in \code{x} and \code{y};
the column holding the lower bound (start) must be listed first.
These columns must all be of the same class in \code{x} and \code{y}, either integer or IDate.
The \code{interval_vars} character vector cannot be named; names are reserved for future use allowing
different \code{interval_vars} column names in \code{x} and \code{y}.}

\item{value_vars}{A character vector of column names in \code{x} specifying
the columns to be averaged.}

\item{group_vars}{A character vector of column names present in both \code{x} and \code{y}.
The interaction of these variables defines the groups (e.g., subjects, monitors, or locations)
within which averages of \code{x} values are taken.
By default this is \code{NULL}, in which case averages are taken over the entire \code{x}
dataset for each \code{y} interval.
The \code{group_vars} character vector cannot be named; names are reserved for future use allowing
different \code{group_vars} column names in \code{x} and \code{y}.}

\item{required_percentage}{The percentage of the duration of each (possibly group-specific)
\code{y} interval that must be observed and nonmissing in \code{x} for a given \code{value_var}
in order for the returned table to contain a nonmissing average of that \code{value_var} for that
\code{y} interval. If the percentage of nonmissing \code{value_var} observations is less than
\code{required_percentage}, NA is returned for that average.
The default is 100, meaning that if \emph{any} portion of a \code{y} interval is either not recorded or
missing in \code{x}, the corresponding returned row will contain NA for the average of that
\code{value_var}.}

\item{skip_overlap_check}{Logical, by default \code{FALSE}. Setting this to \code{TRUE} skips
the internal check that \code{x} intervals are non-overlapping within
groups defined by \code{group_vars}.
Intervals in \code{x} must still be non-overlapping,
but you may want to skip this check if you have already verified this, because
the check is computationally intensive for large datasets.}

\item{verbose}{Logical; if \code{TRUE}, print timing information. By default, \code{FALSE}.}
}
\value{
Returns a data.table.
Rows of the returned data.table correspond to intervals from \code{y}; i.e., the number
of rows of the result equals the number of rows of \code{y}.
Columns of the returned data.table are as follows: \cr
\itemize{
\item grouping variables as specified in \code{group_vars} \cr
\item interval columns corresponding to intervals in \code{y}. These columns are named the
same as in \code{x} and \code{y}, i.e., as specified in \code{interval_vars}.
\item value variable columns from \code{x}, averaged to the intervals in \code{y} and
named the same as they were in \code{x} \cr
\item \code{yduration}: the length (as a count) of the interval from \code{y} \cr
\item \code{xduration}: the total length (as a count) of the portions of \code{x} intervals
that fall into this interval from \code{y}. This equals
\code{yduration} if \code{x} fully covers this interval from \code{y}. \cr
\item \code{nobs_<value_vars>}: for each \code{value_var} specified, the total length (as a count)
of nonmissing values from \code{x} that fall into this interval from \code{y}. This equals
\code{xduration} if the \code{value_var} contains no NA values over the \code{y}
interval. If there are NAs in the value variables, \code{nobs_<value_vars>}
will differ from \code{xduration} and will not necessarily be the same
for each \code{value_var}.
\item \code{xminstart}: for each returned interval (i.e., each interval from \code{y}), the minimum of the
start times represented in \code{x}. If the start of the earliest overlapping \code{x} interval is earlier than the start
of the \code{y} interval, the start of the \code{y} interval is returned. Note that this is the minimum start
time in \code{x} matching the \code{y} interval regardless of whether any \code{value_vars} were missing at that time.
If you need nonmissing minimum start times, you could remove intervals with NA values from
\code{x} before calling \code{intervalaverage} (this would need to be done separately for each \code{value_var}).
\item \code{xmaxend}: similar to \code{xminstart}, but the maximum of the end times represented in \code{x}.
Again, this ignores whether the interval in \code{x} had nonmissing \code{value_vars}.
}
}
\description{
\code{intervalaverage} takes values recorded over
non-overlapping intervals and averages them to defined intervals, possibly within
groups (individuals/monitors/locations/etc.). This function can be used to take averages over long
intervals of values measured over short intervals and/or to take short "averages" of values measured over
longer intervals (i.e., upsample to a finer resolution without smoothing). Measurement intervals and
averaging intervals need not align. When an averaging interval overlaps more than one measurement interval,
a weighted average is calculated: each measurement is weighted by the duration of its interval's
overlap with the averaging interval.
}
\details{
All intervals are treated as closed (i.e., inclusive of the start and end values in \code{interval_vars}).

\code{x} and \code{y} are not copied but rather passed by reference to function internals;
the row order of these data.tables is restored on function completion or error.

When \code{required_percentage} is less than 100, \code{xminstart} and \code{xmaxend} may be useful to
determine whether an average meets specified coverage requirements, not just in terms of the percentage
of missingness but also in terms of whether values are represented throughout the range of the \code{y} interval.
}
\examples{
library(data.table)

x <- data.table(start = seq(1L, by = 7L, length.out = 6),
                end = seq(7L, by = 7L, length.out = 6),
                pm25 = c(10, 12, 8, 14, 22, 18))

y <- data.table(start = seq(3L, by = 7L, length.out = 6),
                end = seq(9L, by = 7L, length.out = 6))

z <- intervalaverage(x, y, interval_vars = c("start", "end"),
                     value_vars = c("pm25"))
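
# A hand check of the duration-weighted average (a sketch for illustration,
# not part of the package API). Intervals are closed, so [a, b] covers
# b - a + 1 time units. The first y interval [3, 9] overlaps x interval
# [1, 7] on 3..7 (5 units, pm25 = 10) and x interval [8, 14] on 8..9
# (2 units, pm25 = 12). 'overlap_len' is a hypothetical helper defined here.
overlap_len <- function(s1, e1, s2, e2) pmax(0L, pmin(e1, e2) - pmax(s1, s2) + 1L)
w <- overlap_len(x$start, x$end, y$start[1], y$end[1])  # 5 2 0 0 0 0
sum(w * x$pm25) / sum(w)  # (5 * 10 + 2 * 12) / 7 = 74 / 7, about 10.57
z[1]  # compare with the pm25 average in the first returned row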

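# A sketch of required_percentage, xduration, xminstart and xmaxend: the last
# y interval [38, 44] is only partly covered by x, whose last interval is
# [36, 42], so x covers 38..42 (5 of 7 units, about 71 percent). With the
# default required_percentage = 100 its pm25 average is expected to be NA;
# with a threshold of 70 it should instead equal 18, the only overlapping
# value. xminstart and xmaxend for that row should be 38 and 42, showing that
# the uncovered part of the y interval lies at its end.
z70 <- intervalaverage(x, y, interval_vars = c("start", "end"),
                       value_vars = "pm25", required_percentage = 70)
z[6]
z70[6]
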
# See also the package vignette for more extensive examples.
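
# A sketch of group_vars with two hypothetical monitors, "a" and "b", that
# share the same intervals, where monitor b reads exactly twice monitor a.
# Because the weighted average is linear in the values, each group-b average
# should be twice the matching group-a average (NA where a is NA), and the
# result should have one row per row of yg.
x_a <- copy(x)[, id := "a"]
x_b <- copy(x)[, id := "b"][, pm25 := pm25 * 2]
xg <- rbind(x_a, x_b)
yg <- rbind(copy(y)[, id := "a"], copy(y)[, id := "b"])
zg <- intervalaverage(xg, yg, interval_vars = c("start", "end"),
                      value_vars = "pm25", group_vars = "id")
zg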
}
