% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/selectData.R
\name{selectData}
\alias{selectData}
\title{Select data for analysis from a larger data frame}
\usage{
selectData(
  df,
  dep,
  stat,
  layer = NA,
  transform = TRUE,
  remMiss = TRUE,
  analySpec
)
}
\arguments{
\item{df}{data frame}

\item{dep}{dependent variable}

\item{stat}{station}

\item{layer}{layer (optional)}

\item{transform}{logical field to return log-transformed value (TRUE [default])}

\item{remMiss}{logical field to remove records where dependent
variable, dep, is a missing value (TRUE [default])}

\item{analySpec}{analytical specifications}
}
\value{
A nested list is returned. The first element of the nested list is the down-selected
  data frame. The second element is the list, iSpec, which contains specifications for
  data extraction. See examples for usage and details for further discussion of the data
  processing and components of each element.
}
\description{
Select data for analysis from a larger data frame based on dependent
variable, station, and layer. Depending on the settings, records with missing
values are removed, log-transformations are performed, and a centering date is
added.
}
\details{
The returned data frame will include dyear and cyear. dyear is the decimal
 year computed using smwrBase::baseDay2decimal and smwrBase::baseDay. From
 this, the minimum and maximum 'dyear' are averaged. This averaged value,
 centerYear, is used to compute the centering date, cyear, using cyear =
 dyear - centerYear.
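
 As a minimal sketch of the centering arithmetic described above, assuming a
 hypothetical vector of decimal years rather than values produced by the
 package, the computation can be reproduced as follows:

\preformatted{
dyear      <- c(1999.25, 2004.50, 2010.75)  # hypothetical decimal years
centerYear <- mean(range(dyear))            # average of minimum and maximum dyear
cyear      <- dyear - centerYear            # centered decimal year
}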

 The variable identified by dep is copied to the variable name dep+".orig"
 (e.g., chla.orig) allowing the user to track the original concentrations. A
 new column, recensor, is added. The value of recensor is FALSE unless the
 value of dep.orig is <= 0. In cases where dep.orig is <= 0, recensor is
 set to TRUE and the value of dep is set to "less-than" a small positive
 value, which is stored as iSpec$recensor. If transform=TRUE, the returned
 data frame will also include a variable "ln"+dep (e.g., "lnchla" for
 log-transformed chla).
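
 The flagging rule above can be illustrated with a short sketch. The values
 and the small constant below are hypothetical, and the plain numeric upper
 bound is only an assumption used for illustration; how the function itself
 encodes the "less-than" value is not shown here.

\preformatted{
chla.orig <- c(2.4, 0, -0.1, 5.0)  # hypothetical original values
small     <- 0.0001                # hypothetical stand-in for iSpec$recensor
recensor  <- chla.orig <= 0        # TRUE where the original value is <= 0
# upper bound: the small value for recensored records, else the original value
upper     <- ifelse(recensor, small, chla.orig)
}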

 The data frame will include a column, intervention, which is a factor identifying
 different periods of record such as when different laboratory methods were
 used and is based on the data frame methodsList that is loaded into the
 global environment. This column is set to "A" with only 1 level if the data
 frame methodsList has not been loaded into the global environment.

 The data frame will include a column, lowCensor, to indicate whether the
 data record occurs in a year with a low level of censoring. The function
 gamTest uses this column to identify years of record (i.e., when
 lowCensor==FALSE) that should not be used in analyses.

 If remMiss=TRUE, then the returned data frame will be down-selected by
 removing records where the variable identified in 'dep' is missing;
 otherwise, no down-selection is performed.

 iSpec is a list containing the following components:

dep - name of column where dependent variable is stored, could be "ln"+dep
for variables that will be analyzed after natural log transformation

depOrig - name of original dependent variable, could be same as dep if no
transformation is used

stat - name of station

stationMethodGroup - name of station group that the station belongs to,
derived from station list (stationMasterList) and used to identify interventions
specified in methodsList table

intervenNum - number of interventions found for this station and dependent
variable as derived from methodsList table, a value of 1 is assigned if no
methodsList entry is found

intervenList - data frame of interventions identified by beginning and ending
date and labeled consecutively starting with "A"

layer - layer

layerName - layer name derived from layerLukup

transform - TRUE/FALSE indicating whether log transformations were taken

trendIncrease - an indicator for interpretation of an increasing concentration

logConst - not currently used

recensor - small positive value; observations <=0 are recensored as "less than"
this value

censorFrac - data frame indicating the yearly number of observations and
fraction of observations reported as less than, uncensored, interval
censored, less than zero, and recensored; also includes a 'lowCensor' field
indicating which years will be dropped by gamTest due to high yearly
censoring

yearRangeDropped - year range of data that will be dropped due to censoring

censorFracSum - censoring overall summary

centerYear - centering year

parmName - parameter name

parmNamelc - parameter name in lower case

parmUnits - parameter units

statLayer - station/layer label, e.g., "LE3.1 (S)"

usgsGageID - ID of the USGS gage used for flow adjustments

usgsGageName - name of the USGS gage used for flow adjustments

numObservations - number of observations

dyearBegin - begin date in decimal form

dyearEnd - end date in decimal form

dyearLength - period of record length

yearBegin - period of record begin year

yearend - period of record end year

dateBegin - begin date

dateEnd - end date

The baseDay and baseDay2decimal functions have been added to this package 
from the smwrBase package.
}
\examples{
\dontrun{
dfr    <- analysisOrganizeData(dataCensored)

# retrieve Secchi depth for Station CB5.4, no transformations are applied
dfr1   <- selectData(dfr[["df"]], 'secchi', 'CB5.4', 'S', transform=FALSE,
                    remMiss=FALSE, analySpec=dfr[["analySpec"]])
df1    <- dfr1[[1]]   # data frame of selected data
iSpec1 <- dfr1[[2]]   # metadata about selected data

# retrieve surface corrected chlorophyll-a concentrations for Station CB5.4,
# missing values are removed and transformation applied
dfr2   <- selectData(dfr[["df"]], 'chla', 'CB5.4', 'S', analySpec=dfr[["analySpec"]])
df2    <- dfr2[[1]]   # data frame of selected data
iSpec2 <- dfr2[[2]]   # metadata about selected data
}
}
