% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/near_strings.R
\name{near_strings2}
\alias{near_strings2}
\title{Strings of Near Repeats using KDtrees}
\usage{
near_strings2(dat, id, x, y, tim, DistThresh, TimeThresh, k = 300, eps = 1e-04)
}
\arguments{
\item{dat}{data frame}

\item{id}{string for id variable in data frame (should be unique)}

\item{x}{string for variable that has the x coordinates}

\item{y}{string for variable that has the y coordinates}

\item{tim}{string for variable that has the time stamp (should be numeric or datetime)}

\item{DistThresh}{scalar for distance threshold (in whatever units x/y are in)}

\item{TimeThresh}{scalar for time threshold (in whatever units tim is in)}

\item{k}{maximum number of neighbors to retrieve with the \code{nn2()} function in the RANN package}

\item{eps}{small fudge factor: the \code{nn2()} function returns neighbors at distances <= the threshold, so \code{eps} is needed to return only those strictly less than it (like \code{near_strings1()})}
}
\value{
A data frame that contains the ids as row.names, and two columns:
\itemize{
\item \code{CompId}, a unique identifier that lets you collapse original cases together
\item \code{CompNum}, the number of linked cases inside of a component
}
}
\description{
Identifies cases that are near each other in space and time
}
\details{
This function returns strings of cases that are near each other in space and time. It is useful for near-repeat analysis, or to
identify potentially duplicate cases. This particular function uses k-d trees (from the RANN package).
For very large data frames, this will run quite a bit faster than \code{near_strings1()}, although it may still run out of memory.
It is also not guaranteed to find every pair, since at most \code{k} neighbors are retrieved for each case.
In \href{https://andrewpwheeler.com/2017/04/12/identifying-near-repeat-crime-strings-in-r-or-python/}{tests on my machine},
~100k rows take around 2 minutes with this code.
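
The sketch below illustrates the kind of radius search that the \code{k} and \code{eps} arguments relate to. It is only an illustration of \code{RANN::nn2()} on toy data, not this function's actual internals; in particular, shrinking the radius to \code{DistThresh - eps} is an assumption about how the fudge factor might be applied.
\preformatted{
library(RANN)
xy <- cbind(x = 1:5, y = 0)   # toy coordinates, one row per case
DistThresh <- 2
eps <- 1e-04
# A radius search keeps neighbors at distances <= radius, so a slightly
# smaller radius approximates a strict "less than DistThresh" rule
nn <- nn2(xy, k = 3, searchtype = "radius", radius = DistThresh - eps)
nn$nn.idx    # neighbor row indices (0 where fewer than k fall inside the radius)
nn$nn.dists  # the matching distances
}
If a case has more than \code{k} neighbors inside the radius, the extra neighbors are not returned, which is one way pairs could be missed.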
}
\examples{
# Simplified example showing two clusters
s <- c(0,0,0,4,4)
ccheck <- c(1,1,1,2,2)  # expected cluster membership for the five cases
dat <- data.frame(x=1:5,y=0,
                  ti=s,
                  id=1:5)
res1 <- near_strings2(dat,'id','x','y','ti',2,1)
print(res1)
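# A possible sanity check, not part of the original example: cases that share
# a value in ccheck should also share a CompId, whatever labels are used.
# This relies on the ids being the row.names of the result (see Value).
found <- res1[as.character(dat$id), 'CompId']
all(outer(found, found, '==') == outer(ccheck, ccheck, '=='))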

\donttest{
# This runs faster than near_strings1
library(sp)
nyc_shoot$id <- 1:nrow(nyc_shoot)  #incident ID can have dups
print(Sys.time())
res <- near_strings2(nyc_shoot@data,id='id',x='X_COORD_CD',y='Y_COORD_CD',
                     tim='OCCUR_DATE',DistThresh=1500,TimeThresh=3)
print(Sys.time()) #around 4 seconds on my machine
head(res)
}

}
\references{
Wheeler, A. P., Riddell, J. R., & Haberman, C. P. (2021). Breaking the chain: How arrests reduce the probability of near repeat crimes. \emph{Criminal Justice Review}, 46(2), 236-258.
}
\seealso{
\code{\link[=near_strings1]{near_strings1()}}, which uses loops but is guaranteed to get all pairs of cases and should be memory safe.
}
