#' Country dictionary for standardizing country names and codes
#'
#' @description
#' `country_dictionary` provides a set of lookup tables used to standardize
#' country names and country codes in occurrence datasets.
#'
#' The dictionary is built from `rnaturalearthdata::map_units110`
#' and consolidates a wide variety of country name variants (in several
#' languages and formats), as well as multiple coding systems, into a single
#' suggested standardized name.
#'
#' This object is used internally by functions that clean or harmonize
#' country fields, ensuring that country names in occurrence datasets (e.g.,
#' `"Brasil"`, `"brasil"`, `"BR"`, `"BRA"`, `"République Française"`) are all
#' mapped consistently to a single standardized form (`"brazil"`, `"france"`,
#' etc.).
#'
#' @format
#' A named list of two data frames:
#'
#' \describe{
#'
#'   \item{`country_name`}{A data frame with two columns:
#'     \describe{
#'       \item{`country_name`}{Character. Lowercased and accent-stripped country
#'       name variants (from multiple `rnaturalearthdata` fields such as
#'       *name*, *name_long*, *abbrev*, *formal_en*, and alternative names in
#'       several languages).}
#'       \item{`country_suggested`}{Character. The standardized country name,
#'       derived from the `name` column of `map_units110`, also lowercased and
#'       accent-stripped.}
#'     }}
#'
#'   \item{`country_code`}{A data frame with two columns:
#'     \describe{
#'       \item{`country_code`}{Character. Country codes from several systems,
#'       including ISO-2, ISO-3, FIPS, postal codes, and others, after filtering
#'       invalid or ambiguous codes.}
#'       \item{`country_suggested`}{Character. The standardized country name
#'       corresponding to each code.}
#'     }}
#'
#' }
#'
#' @details
#' The dictionary is generated by:
#' \itemize{
#'   \item extracting multiple name and code fields from
#'   `rnaturalearthdata::map_units110`,
#'   \item converting names to lowercase and removing accents,
#'   \item converting codes to uppercase,
#'   \item removing invalid or ambiguous codes (e.g., `-99`, `"J"`, various
#'   country mismatches),
#'   \item and ensuring uniqueness across all entries.
#' }
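#'
#' As a minimal sketch of how a lookup table of this shape can be applied
#' (the toy dictionary and input values below are hypothetical, and this is
#' not necessarily how the package's own functions are implemented), raw names
#' can be lowercased and matched against `country_name`:
#'
#' ```r
#' # Toy dictionary with the same two columns as country_dictionary$country_name
#' dict <- data.frame(
#'   country_name = c("brasil", "brazil", "republique francaise"),
#'   country_suggested = c("brazil", "brazil", "france"),
#'   stringsAsFactors = FALSE
#' )
#'
#' # Hypothetical raw values; accent removal is omitted in this sketch
#' raw <- c("Brasil", "brazil", "Atlantis")
#'
#' # match() returns NA for names that are absent from the dictionary
#' standardized <- dict$country_suggested[match(tolower(raw), dict$country_name)]
#' ```
#'
#' Names with no match come back as `NA`, which makes them easy to review by
#' hand.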
#'
#'
#' @examples
#' data(country_dictionary)
#'
#' head(country_dictionary$country_name)
#' head(country_dictionary$country_code)
#'
"country_dictionary"


#' States dictionary for standardizing state and province names and codes
#'
#' @description
#' Provides lookup tables used to standardize subnational administrative units
#' (states and provinces) in occurrence datasets.
#'
#' Generated from `rnaturalearth::ne_states()`, it includes a wide range of
#' name variants (in multiple languages, transliterations, and common
#' abbreviations), as well as postal codes for each unit.
#'
#' This dictionary allows consistent mapping of user-provided names such as
#' `"são paulo"`, `"sao paulo"`, `"SP"`, `"illinois"`, `"ill."`, `"bayern"`,
#' `"bavaria"` to a single standardized state or province name.
#'
#' @format
#' A named list with two data frames:
#'
#' \describe{
#'   \item{states_name}{A data frame with columns:
#'     \describe{
#'       \item{state_name}{Character. Name variants of states or provinces
#'         from `ne_states()`, lowercased and accent-stripped.}
#'       \item{state_suggested}{Character. Standardized state/province name,
#'         also lowercased and accent-stripped.}
#'       \item{country}{Character. Country associated with the state/province,
#'         lowercased and accent-stripped.}
#'     }
#'   }
#'   \item{states_code}{A data frame with columns:
#'     \describe{
#'       \item{state_code}{Character. Postal codes from `ne_states()`, cleaned
#'         and converted to uppercase.}
#'       \item{state_suggested}{Character. Standardized state/province name
#'         corresponding to the code.}
#'       \item{country}{Character. Country associated with the code.}
#'     }
#'   }
#' }
#'
#' @details
#' The dictionary is constructed by:
#' \itemize{
#'   \item selecting administrative units of type `"State"` or `"Province"`;
#'   \item extracting multiple name fields, including alternative names and
#'   multilingual fields;
#'   \item normalizing names to lowercase and removing accents;
#'   \item normalizing codes to uppercase;
#'   \item removing duplicates and ambiguous entries;
#'   \item removing rows with missing names or codes.
#' }
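#'
#' Because the same code can occur in more than one country, a lookup may need
#' to take the `country` column into account. The sketch below illustrates
#' that idea with a hypothetical toy table and hypothetical records. Matching
#' on a composite key is an assumption made for illustration, not a
#' description of the package internals:
#'
#' ```r
#' # Toy table with the same columns as states_dictionary$states_code
#' codes <- data.frame(
#'   state_code = c("SP", "IL"),
#'   state_suggested = c("sao paulo", "illinois"),
#'   country = c("brazil", "united states of america"),
#'   stringsAsFactors = FALSE
#' )
#'
#' # Hypothetical records; the second pairs a code with the wrong country
#' rec <- data.frame(
#'   state = c("sp", "il"),
#'   country = c("brazil", "brazil"),
#'   stringsAsFactors = FALSE
#' )
#'
#' # Build a composite key so a code only matches within its own country
#' key_rec <- paste(toupper(rec$state), rec$country)
#' key_dict <- paste(codes$state_code, codes$country)
#' state_std <- codes$state_suggested[match(key_rec, key_dict)]
#' ```
#'
#' In this toy case the second record stays `NA` because `"IL"` is not a code
#' listed for Brazil.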
#'
#' @examples
#' data(states_dictionary)
#' head(states_dictionary$states_name)
#' head(states_dictionary$states_code)
"states_dictionary"


#' Dictionary of terms used to flag cultivated individuals
#'
#' @description
#' `cultivated` is a list of character vectors containing keywords used to
#' identify whether an occurrence record refers to cultivated or
#' non-cultivated individuals.
#'
#' This object is used internally by `flag_cultivated()` to scan occurrence
#' fields (such as notes, habitat descriptions, or remarks) and classify
#' records as *cultivated* or *not cultivated* based on textual patterns.
#'
#' The list combines terms from `plantR` (`plantR:::cultivated` and
#' `plantR:::notCultivated`) with additional multilingual variants commonly
#' found in herbarium metadata.
#'
#' @format
#' A named list with two elements:
#'
#' \describe{
#'   \item{`cultivated`}{Character vector. Terms that indicate an individual is
#'   cultivated. Imported from `plantR:::cultivated`.}
#'
#'   \item{`not_cultivated`}{Character vector. Terms suggesting an individual is
#'   *not* cultivated (e.g., “not cultivated”, “not planted”, “no plantada”,
#'   “no cultivada”), including terms from `plantR:::notCultivated`.}
#' }
#'
#' @details
#' These terms are matched case-insensitively after text cleaning (e.g.,
#' lowercasing and accent removal).
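#'
#' A minimal sketch of this kind of keyword matching is shown below. The term
#' lists and notes are hypothetical, and giving precedence to the negative
#' phrases is an assumption made for illustration rather than a description of
#' how `flag_cultivated()` works:
#'
#' ```r
#' # Hypothetical term lists and record notes
#' terms <- list(
#'   cultivated = c("cultivated", "planted"),
#'   not_cultivated = c("not cultivated", "not planted")
#' )
#' notes <- c("Cultivated in garden", "Native forest, not planted", "Roadside")
#'
#' # Collapse each term list into one regular expression of alternatives
#' has_term <- function(x, words) {
#'   grepl(paste(words, collapse = "|"), tolower(x))
#' }
#'
#' pos <- has_term(notes, terms$cultivated)
#' neg <- has_term(notes, terms$not_cultivated)
#'
#' # Negative phrases win because they also contain the positive terms
#' is_cultivated <- pos & !neg
#' ```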
#'
#' @seealso
#' \code{flag_cultivated}
#'
#' @references de Lima, Renato AF, et al. plantR: An R package and workflow for
#' managing species records from biological collections.
#' **Methods in Ecology and Evolution**, 14.2 (2023): 332-339.
#'
#' @examples
#' data(cultivated)
#'
#' cultivated$cultivated
#' cultivated$not_cultivated
#'
"cultivated"

#' Fake occurrence data for testing coordinate validation functions
#'
#' @description
#' `fake_data` is a synthetic dataset created for testing functions that validate
#' and correct country- or state-level geographic coordinates.
#'
#' Controlled coordinate errors were introduced (e.g., inverted signs,
#' swapped values, combinations of swaps and inversions) to simulate common
#' georeferencing mistakes.
#'
#' This dataset is intended for automated testing of functions such as
#' `check_countries()` and `check_states()`.
#'
#' @format
#' A data frame with the same structure as `all_occ`, containing
#' occurrence records with intentionally manipulated coordinates.
#' An additional column `data_source = "fake_data"` identifies these records.
#'
#' @details
#' The coordinate errors include:
#'
#' \itemize{
#'   \item **Inverted longitude**: multiplying longitude by -1.
#'   \item **Inverted latitude**: multiplying latitude by -1.
#'   \item **Both coordinates inverted**.
#'   \item **Swapped coordinates**: (`lon`, `lat`) → (`lat`, `lon`).
#'   \item **Swapped + inverted** in four combinations:
#'     \itemize{
#'       \item swapped only,
#'       \item swapped + inverted longitude,
#'       \item swapped + inverted latitude,
#'       \item swapped + both inverted.
#'     }
#' }
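#'
#' The error types above can be sketched on a single hypothetical point. The
#' values and row names are illustrative only, and the sketch assumes that, in
#' the combined cases, coordinates are swapped first and the sign change is
#' then applied to the resulting longitude and/or latitude:
#'
#' ```r
#' # Hypothetical correct coordinates
#' lon <- -50
#' lat <- -25
#'
#' # One row per error type; columns hold the manipulated coordinates
#' errors <- rbind(
#'   inverted_lon          = c(lon = -lon, lat = lat),
#'   inverted_lat          = c(lon = lon,  lat = -lat),
#'   inverted_both         = c(lon = -lon, lat = -lat),
#'   swapped               = c(lon = lat,  lat = lon),
#'   swapped_inverted_lon  = c(lon = -lat, lat = lon),
#'   swapped_inverted_lat  = c(lon = lat,  lat = -lon),
#'   swapped_inverted_both = c(lon = -lat, lat = -lon)
#' )
#' ```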
#'
#'
#'
#' @examples
#' data(fake_data)
"fake_data"

#' Color palette for flagged records
#'
#' @description
#' `flag_colors` is a named character vector defining the default colors used
#' by `mapview_here()` to plot flagged occurrence records.
#'
#' @format
#' A named character vector where:
#'
#' \describe{
#'   \item{names}{Flag labels corresponding to categories generated by the
#'   various `flag_*` and checking functions.}
#'
#'   \item{values}{Hex color codes or standard R color names used for plotting.}
#' }
#'
#' @seealso
#' \code{mapview_here}
#'
#' @examples
#' data(flag_colors)
#'
#' # View all flag categories and their colors
#' flag_colors
#'
"flag_colors"

#' Flag name dictionary
#'
#' @description
#' A named character vector used to convert internal flag column names
#' (produced by the package's flagging functions) into human-readable labels.
#'
#' @format
#' A named character vector of length 25.
#' The names correspond to the original flag codes (e.g., `"correct_country"`,
#' `"duplicated_flag"`, `".cen"`, `"consensus_flag"`), and the values are the
#' cleaned, human-readable labels (e.g., `"Wrong country"`, `"Duplicated"`,
#' `"Country/Province centroid"`, `"consensus"`).
#'
#' @details
#' This object is used internally by functions such as `mapview_here()` and
#' `remove_flagged()` to display more intuitive flag names to users.
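#'
#' As a minimal sketch of how a named character vector of this kind converts
#' column names into labels (the two-element vector below is a hypothetical
#' subset built from the examples in the format section, not the full object):
#'
#' ```r
#' # Hypothetical subset: names are flag columns, values are labels
#' labels <- c(
#'   correct_country = "Wrong country",
#'   duplicated_flag = "Duplicated"
#' )
#'
#' # Flag columns found in some dataset, in arbitrary order
#' flag_cols <- c("duplicated_flag", "correct_country")
#'
#' # Indexing by name returns the labels in the order of flag_cols
#' readable <- unname(labels[flag_cols])
#' ```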
#'
#' @usage
#' flag_names
#'
"flag_names"

#' Occurrence records of Yellow Trumpet Tree from BIEN
#'
#' @description
#' A cleaned dataset of occurrence records for Yellow Trumpet Tree
#' (*Handroanthus serratifolius*) retrieved from the BIEN database.
#' The raw data were downloaded using `get_bien()`.
#'
#' The dataset was subsequently processed with the package’s internal
#' flagging workflow (`flag_duplicates()` and `remove_flagged()`) to remove
#' duplicated records.
#'
#' @format
#' A data frame containing spatial coordinates, taxonomic information, and
#' metadata returned by BIEN, after cleaning.
#' Columns include (but may not be limited to):
#' - `scrubbed_species_binomial`: Cleaned species name
#' - `longitude`, `latitude`: Geographic coordinates
#' - `country`, `state_province`, and other political boundary fields
#'
#'
#' @usage
#' occ_bien
#'
#' @examples
#' # View dataset
#' head(occ_bien)
#'
#' # Number of records
#' nrow(occ_bien)
#'
#'
#' @seealso
#' `get_bien()`
"occ_bien"

#' Occurrence records of *Araucaria angustifolia* from GBIF
#'
#' @description
#' A cleaned dataset of occurrence records for *Araucaria angustifolia* (Parana
#' pine) retrieved from GBIF.
#'
#' Records were downloaded using the package’s GBIF workflow
#' (`prepare_gbif_download()`, `request_gbif()`, `import_gbif()`), and then
#' cleaned using the internal flagging workflow (duplicate detection and
#' removal).
#'
#' @format
#' A data frame containing georeferenced GBIF occurrence records for
#' *A. angustifolia* after all cleaning steps.
#'
#' @usage
#' occ_gbif
#'
#' @examples
#' # Preview dataset
#' head(occ_gbif)
#'
#' # Number of cleaned records
#' nrow(occ_gbif)
#'
#' @seealso
#' `prepare_gbif_download()`, `request_gbif()`, `import_gbif()`,
#' `flag_duplicates()`, `remove_flagged()`
"occ_gbif"

#' Occurrence records of azure jay from iDigBio
#'
#' @description
#' A cleaned dataset of occurrence records for azure jay (*Cyanocorax caeruleus*)
#' retrieved from iDigBio using `get_idigbio()`.
#'
#' Records were cleaned using the package's internal duplicate-flagging workflow.
#'
#' @format
#' A data frame containing georeferenced iDigBio occurrence records for
#' *C. caeruleus* after all cleaning steps.
#'
#' @usage
#' occ_idig
#'
#' @examples
#' # First rows
#' head(occ_idig)
#'
#' # Number of cleaned records
#' nrow(occ_idig)
#'
#' @seealso
#' `get_idigbio()`, `flag_duplicates()`, `remove_flagged()`
"occ_idig"

#' Occurrence records of azure jay from SpeciesLink
#'
#' @description
#' A cleaned dataset of occurrence records for azure jay (*Cyanocorax caeruleus*)
#' retrieved from SpeciesLink using `get_specieslink()`.
#'
#' Records were cleaned using the package's internal duplicate-flagging workflow.
#'
#' @format
#' A data frame containing georeferenced SpeciesLink occurrence records for
#' *C. caeruleus* after all cleaning steps.
#'
#' @usage
#' occ_splink
#'
#' @examples
#' # First rows
#' head(occ_splink)
#'
#' # Number of cleaned records
#' nrow(occ_splink)
#'
#' @seealso
#' `get_specieslink()`, `flag_duplicates()`, `remove_flagged()`
"occ_splink"

#' Integrated occurrence dataset for three example species
#'
#' @description
#' A harmonized, multi-source occurrence dataset containing cleaned
#' georeferenced records for three species:
#'
#' - *Araucaria angustifolia* (Parana pine)
#' - *Cyanocorax caeruleus* (Azure jay)
#' - *Handroanthus serratifolius* (Yellow trumpet tree)
#'
#' Records were retrieved from **GBIF**, **speciesLink**, **BIEN**, and
#' **iDigBio**, standardized through the package workflow, merged, and
#' cleaned to remove duplicates.
#'
#' @format
#' A data frame where each row represents a georeferenced occurrence
#' record for one of the three species.
#'
#' Columns correspond to the standardized output of
#' `format_columns()`, including:
#'
#' - `species`: Cleaned binomial species name
#' - `decimalLongitude`, `decimalLatitude`: Coordinates
#' - `year`: Year of collection/observation
#' - Various taxonomic, temporal, locality, and metadata fields
#' - Source identifiers added by `format_columns()` (e.g., `data_source`)
#'
#' @usage
#' occurrences
#'
#' @examples
#' # Show the first rows
#' head(occurrences)
#'
#' # Number of occurrences per species
#' table(occurrences$species)
#'
#' @seealso
#' `format_columns()`, `bind_here()`, `flag_duplicates()`, `remove_flagged()`
"occurrences"

#' Flagged occurrence records of *Araucaria angustifolia*
#'
#' @description
#' A dataset containing the occurrence records of *Araucaria angustifolia*
#' after applying several of the package’s flagging and data-quality
#' assessment functions.
#'
#' @format
#' A data frame where each row corresponds to a georeferenced occurrence
#' of *A. angustifolia*.
#'
#' @usage
#' occ_flagged
#'
#' @examples
#' # First rows
#' head(occ_flagged)
#'
#' # Count flagged vs. unflagged records
#' table(occ_flagged$correct_country)
#'
#'
#' @seealso
#' `occurrences`,
#' `standardize_countries()`, `standardize_states()`,
#' `flag_florabr()`, `flag_wcvp()`, `flag_iucn()`,
#' `flag_cultivated()`, `flag_inaturalist()`,
#' `flag_duplicates()`, `mapview_here()`
"occ_flagged"

#' Metadata templates used internally by `format_columns()`
#'
#' @description
#' A named list of data frames containing metadata templates for the main
#' biodiversity data providers supported by the package (GBIF, SpeciesLink,
#' iDigBio, and BIEN).
#'
#' These templates are used internally by `format_columns()` to harmonize
#' columns.
#'
#' @details
#' Each element of `prepared_metadata` is a single-row data frame where:
#'
#' - **column names** correspond to the package’s standardized output fields
#' - **values in the row** represent the original column names used by each
#'   data provider
#'
#' These mappings allow `format_columns()` to:
#'
#' - rename fields (e.g., `scientificname` → `scientificName`)
#' - identify which variables are missing or provider-specific
#' - coerce classes consistently (e.g., dates, coordinates)
#' - ensure compatibility when combining datasets from different sources
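#'
#' The renaming step can be sketched as follows. The template, the provider
#' column names (`lon`, `lat`), and the data are hypothetical, and the code is
#' an illustration of the mapping idea rather than the implementation of
#' `format_columns()`:
#'
#' ```r
#' # Hypothetical single-row template: names are standardized fields,
#' # values are the provider's original column names
#' template <- data.frame(
#'   scientificName = "scientificname",
#'   decimalLongitude = "lon",
#'   decimalLatitude = "lat",
#'   stringsAsFactors = FALSE
#' )
#'
#' # Hypothetical provider data
#' raw <- data.frame(scientificname = "Puma concolor", lon = -50, lat = -25)
#'
#' # Select provider columns in template order, then apply standardized names
#' out <- raw[, unlist(template[1, ]), drop = FALSE]
#' names(out) <- names(template)
#' ```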
#'
#' @format
#' A named list of four data frames:
#'
#' - **`$gbif`** — template for GBIF dataset.
#' - **`$specieslink`** — template for SpeciesLink dataset.
#' - **`$idigbio`** — template for iDigBio dataset.
#' - **`$bien`** — template for BIEN dataset.
#'
#' @usage
#' prepared_metadata
#'
#' @examples
#' # View template for GBIF records
#' prepared_metadata$gbif
#'
#'
#' @seealso
#' `format_columns()`
"prepared_metadata"

#' Occurrence records of *Puma concolor* from AtlanticR
#'
#' @description
#' A subset of Atlantic mammals records obtained from the
#' `atlanticr::atlantic_mammals` dataset, containing occurrences of
#' *Puma concolor*.
#'
#' This dataset is provided as an example to illustrate how to create
#' user-defined metadata templates for occurrence records from external
#' sources using the package’s `create_metadata()` function.
#'
#' @format
#' A data frame where each row represents a single occurrence record of
#' *Puma concolor*. Columns include species name, location, and other
#' relevant metadata fields provided by the `atlantic_mammals` dataset.
#'
#' @usage
#' puma_atlanticr
#'
#' @examples
#' # Preview first rows
#' head(puma_atlanticr)
#'
#' # Count occurrences per year
#' table(puma_atlanticr$year)
#'
#'
#' @seealso
#' `create_metadata()`,
#' `format_columns()`
"puma_atlanticr"

#' World Countries
#'
#' A `PackedSpatVector` containing country polygons from **Natural Earth**,
#' processed and cleaned for use within the package. Country names were
#' converted to lowercase and had accents removed.
#'
#' @format A `PackedSpatVector` object with country polygons and one attribute:
#' \describe{
#'   \item{name}{Country name.}
#' }
#'
#' @details
#' The dataset is sourced from `rnaturalearthdata::map_units110`, then:
#' \itemize{
#'   \item converted to a `SpatVector` using **terra**,
#'   \item attribute `"name"` cleaned (`tolower()`, `remove_accent()`),
#'   \item wrapped using `terra::wrap()` for robust internal storage.
#' }
#'
#' @source Natural Earth data, via **rnaturalearthdata**.
#'
#' @examples
#' data(world)
#' world <- terra::unwrap(world)
#' terra::plot(world)
"world"

#' Administrative Units (States, Provinces, and Regions)
#'
#' A simplified `PackedSpatVector` containing state-level polygons (e.g.,
#' provinces, departments, regions) for countries worldwide. Names and parent
#' countries (`geonunit`) were cleaned (lowercase, accents removed).
#'
#' @format A `PackedSpatVector` object with polygons of administrative divisions
#' and one attribute:
#' \describe{
#'   \item{name}{State/province/region name.}
#' }
#'
#' @details
#' The dataset was generated from `rnaturalearth::ne_states()`. The following
#' processing steps were applied:
#' \itemize{
#'   \item kept only administrative types: `"Province"`, `"State"`,
#'   `"Department"`, `"Region"`, `"Federal District"`;
#'   \item selected only `"name"` and `"geonunit"` columns;
#'   \item both fields were cleaned via `tolower()` and `remove_accent()`;
#'   \item records where state name = country name were removed;
#'   \item geometries were simplified using `terra::simplifyGeom(tolerance = 0.05)`;
#'   \item wrapped with `terra::wrap()` for internal storage.
#' }
#'
#' @source Natural Earth data, via **rnaturalearth**.
#'
#' @examples
#' data(states)
#' states <- terra::unwrap(states)
#' terra::plot(states)
"states"

#' Bioclimatic Variables from WorldClim (bio_1, bio_7, bio_12)
#'
#' A `PackedSpatRaster` containing three bioclimatic variables from
#' WorldClim, cropped to a region of interest in South America.
#'
#' @format A `PackedSpatRaster` that unwraps (via `terra::unwrap()`) to a
#' `SpatRaster` with 3 layers and the following characteristics:
#' \describe{
#'   \item{Dimensions}{151 rows × 183 columns}
#'   \item{Resolution}{0.08333333° × 0.08333333°}
#'   \item{Extent}{xmin = -57.08333, xmax = -41.83333,
#'                 ymin = -32.08333, ymax = -19.5}
#'   \item{CRS}{WGS84 (EPSG:4326)}
#'   \item{Layers}{
#'     \describe{
#'       \item{bio_1}{Mean Annual Temperature (°C × 10)}
#'       \item{bio_7}{Temperature Annual Range (°C × 10)}
#'       \item{bio_12}{Annual Precipitation (mm)}
#'     }
#'   }
#' }
#'
#' @details
#' This raster corresponds to three standard bioclimatic variables from the
#' **WorldClim 2.1** dataset.
#'
#' @source \url{https://www.worldclim.org/}
#'
#' @examples
#' data(worldclim)
#' bioclim <- terra::unwrap(worldclim)
#' terra::plot(bioclim)
"worldclim"

