% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dumpers.R
\name{dumpers}
\alias{dumpers}
\alias{dump_raw_to_txt}
\alias{dump_to_rds}
\alias{dump_raw_to_db}
\title{Result dumpers}
\usage{
dump_raw_to_txt(res, args, as, file_pattern = "oaidump",
  file_dir = ".", file_ext = ".xml")

dump_to_rds(res, args, as, file_pattern = "oaidump", file_dir = ".",
  file_ext = ".rds")

dump_raw_to_db(res, args, as, dbcon, table_name, field_name, ...)
}
\arguments{
\item{res}{a chunk of results whose type depends on \code{as}; not to be specified by the user}

\item{args}{list, query arguments, not to be specified by the user}

\item{as}{character, type of result to return, not to be specified by the
user}

\item{file_pattern, file_dir, file_ext}{character respectively: initial part of
the file name, directory name, and file extension used to create file
names. These arguments are passed to \code{\link[=tempfile]{tempfile()}} arguments
\code{pattern}, \code{tmpdir}, and \code{fileext} respectively.}

\item{dbcon}{\pkg{DBI}-compliant database connection}

\item{table_name}{character, name of the database table to write into}

\item{field_name}{character, name of the field in database table to write
into}

\item{...}{arguments passed to/from other functions}
}
\value{
Dumpers should return \code{NULL} or a value that will be collected
and returned by the function using the dumper.

\code{dump_raw_to_txt} returns the name of the created file.

\code{dump_to_rds} returns the name of the created file.

\code{dump_raw_to_db} returns \code{NULL}.
}
\description{
Result dumpers are functions that handle the chunks of results from an
OAI-PMH service "on the fly". Handling can include processing and writing to
files, databases, etc.
}
\details{
Often the result of a request to an OAI-PMH service is so large that it is
split into chunks that need to be requested separately using
\code{resumptionToken}. By default, functions like
\code{\link[=list_identifiers]{list_identifiers()}} or \code{\link[=list_records]{list_records()}} request these
chunks under the hood and return them concatenated in a single R object. This
is convenient but insufficient when dealing with large result sets that
might not fit into RAM. A result dumper is a function that is called on
each result chunk. Dumper functions can write chunks to files or databases,
perform initial pre-processing or extraction, and so on.

A result dumper needs to be a function that accepts at least the arguments
\code{res}, \code{args}, and \code{as}. Their values are supplied internally
by the enclosing function. There may be additional arguments, including \code{...}.
Dumpers should return \code{NULL} or a value that will
be collected and returned by the function calling the dumper (e.g.
\code{\link[=list_records]{list_records()}}).
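
As an illustration, here is a minimal sketch of a custom dumper that meets
these requirements. The name \code{count_chars_dumper} is hypothetical, and the
sketch assumes that \code{res} is a character string of XML when
\code{as == "raw"}. It ignores \code{args} and returns the number of
characters in each chunk.

\preformatted{
# Hypothetical custom dumper; assumes res is raw XML text when as = "raw"
count_chars_dumper <- function(res, args, as, ...) {
  if (as != "raw") stop("count_chars_dumper requires as = 'raw'")
  # The returned value is what the calling function collects
  nchar(res)
}

# Calling the dumper by hand on a fake chunk:
n <- count_chars_dumper(res = "<OAI-PMH>chunk</OAI-PMH>", args = list(),
                        as = "raw")

# Possible use with a harvesting function (requires network access):
# list_records(from = "2018-06-01T", until = "2018-06-14T", as = "raw",
#              dumper = count_chars_dumper)
}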

Currently result dumpers can be used with the functions
\code{\link[=list_identifiers]{list_identifiers()}}, \code{\link[=list_records]{list_records()}}, and \code{\link[=list_sets]{list_sets()}}.
To use a dumper with one of these functions you need to:
\itemize{
\item Pass it as an additional argument \code{dumper}
\item Pass optional additional arguments to the dumper function as a list
in the \code{dumper_args} argument
}
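
The way \code{dumper_args} reaches the dumper can be pictured with the
following sketch. \code{apply_dumper} is a hypothetical helper, not the
package's internal code: it calls the dumper once per chunk with \code{res},
\code{args}, and \code{as}, appends the elements of \code{dumper_args} as
further named arguments via \code{do.call()}, and combines the returned
values, dropping any \code{NULL}s.

\preformatted{
# Hypothetical helper, not the package's implementation
apply_dumper <- function(chunks, dumper, dumper_args = list(),
                         as = "raw", args = list()) {
  out <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    # Required arguments first, then the user-supplied dumper_args
    value <- do.call(dumper, c(list(res = chunks[[i]], args = args,
                                    as = as), dumper_args))
    # list() wrapper keeps a NULL result instead of deleting the slot
    out[i] <- list(value)
  }
  # unlist() drops NULL results and combines the rest into one vector
  unlist(out)
}

# A toy dumper with an extra argument, prefix, supplied via dumper_args
toy_dumper <- function(res, args, as, prefix) paste0(prefix, nchar(res))
labels <- apply_dumper(list("ab", "cde"), toy_dumper,
                       dumper_args = list(prefix = "n"))
}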

See Examples. Below we provide more details on the dumpers currently
implemented.

\code{dump_raw_to_txt} writes raw XML to text files. It requires
\code{as == "raw"}. File names are created using \code{\link[=tempfile]{tempfile()}}. By
default the files are written to the current working directory and have names
of the form \code{oaidump*.xml}, where \code{*} is a random hexadecimal string.
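
A dumper with similar behaviour can be sketched in a few lines. The function
\code{my_raw_to_txt} below is hypothetical and is not the package source; it
assumes \code{res} is a character string of raw XML, writes it with
\code{writeLines()}, and returns the file name created by \code{tempfile()}.

\preformatted{
# Hypothetical dumper mimicking the behaviour described above
my_raw_to_txt <- function(res, args, as, file_pattern = "oaidump",
                          file_dir = ".", file_ext = ".xml") {
  if (as != "raw") stop("my_raw_to_txt requires as = 'raw'")
  # With the defaults this gives a path like ./oaidump<hex>.xml
  fname <- tempfile(pattern = file_pattern, tmpdir = file_dir,
                    fileext = file_ext)
  writeLines(res, fname)
  fname
}

f <- my_raw_to_txt("<OAI-PMH/>", args = list(), as = "raw",
                   file_dir = tempdir())
basename(f)  # "oaidump", then random hex digits, then ".xml"
unlink(f)
}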

\code{dump_to_rds} saves each chunk of results to an \code{.rds} file via \code{\link[=saveRDS]{saveRDS()}}.
The type of object being saved is determined by the \code{as} argument. File names
are generated in the same way as by \code{dump_raw_to_txt}, but with the default
extension \code{.rds}.
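
The core of this behaviour is a \code{saveRDS()} call on a file name from
\code{tempfile()}. The sketch below is not the package source; it saves an
illustrative chunk (a small data frame with made-up identifiers) and reads it
back with \code{readRDS()} to show that the object is preserved.

\preformatted{
# Illustrative chunk with made-up values
chunk <- data.frame(identifier = c("oai:example:1", "oai:example:2"),
                    datestamp = c("2018-06-01", "2018-06-02"),
                    stringsAsFactors = FALSE)
fname <- tempfile(pattern = "oaidump", tmpdir = tempdir(), fileext = ".rds")
saveRDS(chunk, fname)
# saveRDS()/readRDS() round-trip the object, so this is TRUE
same <- identical(readRDS(fname), chunk)
unlink(fname)
}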

\code{dump_raw_to_db} writes raw XML to a single text column of a table in a
database. It requires \code{as == "raw"}. The database connection \code{dbcon}
should be a connection object as created by \code{\link[DBI:dbConnect]{DBI::dbConnect()}} from
package \pkg{DBI}, so any database with a \pkg{DBI}-compliant backend can be
used. The records are written to the field \code{field_name} of the table
\code{table_name} using \code{\link[DBI:dbWriteTable]{DBI::dbWriteTable()}}. If the table does not
exist, it is created; if it does, the records are appended. Any additional
arguments are passed to \code{\link[DBI:dbWriteTable]{DBI::dbWriteTable()}}.
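
The create-or-append behaviour can be sketched directly with \pkg{DBI}. This
is an illustration, not the package source, and it assumes \pkg{RSQLite} is
installed. The table \code{foo} and field \code{bar} follow the names used in
the Examples, and each chunk becomes one row.

\preformatted{
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
field_name <- "bar"
for (chunk in c("<OAI-PMH>1</OAI-PMH>", "<OAI-PMH>2</OAI-PMH>")) {
  # One-column data frame whose column is named after field_name
  df <- stats::setNames(data.frame(chunk, stringsAsFactors = FALSE),
                        field_name)
  # append = TRUE creates "foo" on the first call and appends afterwards
  DBI::dbWriteTable(con, "foo", df, append = TRUE)
}
n_rows <- DBI::dbGetQuery(con, "SELECT count(*) AS n FROM foo")$n
DBI::dbDisconnect(con)
}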
}
\examples{
\dontrun{

### Dumping raw XML to text files

# This will write a set of XML files to a temporary directory
fnames <- list_identifiers(from="2018-06-01T",
                           until="2018-06-14T",
                           as="raw",
                           dumper=dump_raw_to_txt,
                           dumper_args=list(file_dir=tempdir()))
# vector of file names created
str(fnames)
all( file.exists(fnames) )
# clean-up
unlink(fnames)


### Dumping raw XML to a database

# Connect to in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname=":memory:")
# Harvest and dump the results into field "bar" of table "foo"
list_identifiers(from="2018-06-01T",
                 until="2018-06-14T",
                 as="raw",
                 dumper=dump_raw_to_db,
                 dumper_args=list(dbcon=con,
                                  table_name="foo",
                                  field_name="bar") )
# Count records, should be 101
DBI::dbGetQuery(con, "SELECT count(*) as no_records FROM foo")

DBI::dbDisconnect(con)
}
}
\references{
OAI-PMH specification
\url{https://www.openarchives.org/OAI/openarchivesprotocol.html}
}
\seealso{
Functions supporting the dumpers:
\code{\link[=list_identifiers]{list_identifiers()}}, \code{\link[=list_sets]{list_sets()}}, and \code{\link[=list_records]{list_records()}}
}
