% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Lineage.R
\name{buildPhylipLineage}
\alias{buildPhylipLineage}
\title{Infer an Ig lineage using PHYLIP}
\usage{
buildPhylipLineage(clone, dnapars_exec, rm_temp = FALSE, verbose = FALSE)
}
\arguments{
\item{clone}{\code{\link{ChangeoClone}} object containing clone data.}

\item{dnapars_exec}{path to the PHYLIP dnapars executable.}

\item{rm_temp}{if \code{TRUE} delete the temporary directory after running dnapars;
if \code{FALSE} keep the temporary directory.}

\item{verbose}{if \code{FALSE} suppress the output of dnapars; 
if \code{TRUE} STDOUT and STDERR of dnapars will be passed to 
the console.}
}
\value{
An igraph \code{graph} object defining the Ig lineage tree. Each unique input 
          sequence in \code{clone} is a vertex of the tree, with additional vertices being
          either the germline (root) sequences or inferred intermediates. The \code{graph} 
          object has the following attributes.
          
          Vertex attributes:
          \itemize{
            \item  \code{name}:      value in the \code{SEQUENCE_ID} column of the \code{data} 
                                     slot of the input \code{clone} for observed sequences. 
                                     The germline (root) vertex is assigned the name 
                                     "Germline" and inferred intermediates are assigned
                                     names with the format {"Inferred1", "Inferred2", ...}.
            \item  \code{sequence}:  value in the \code{SEQUENCE} column of the \code{data} 
                                     slot of the input \code{clone} for observed sequences.
                                     The germline (root) vertex is assigned the sequence
                                     in the \code{germline} slot of the input \code{clone}.
                                     The sequences of inferred intermediates are extracted
                                     from the dnapars output.
            \item  \code{label}:     same as the \code{name} attribute.
          }
          Additionally, each other column in the \code{data} slot of the input 
          \code{clone} is added as a vertex attribute with the attribute name set to 
          the source column name. For the germline and inferred intermediate vertices,
          these additional vertex attributes are all assigned a value of \code{NA}.
          
          Edge attributes:
          \itemize{
            \item  \code{weight}:    Hamming distance between the \code{sequence} attributes
                                     of the two vertices.
            \item  \code{label}:     same as the \code{weight} attribute.
          }
          Graph attributes:
          \itemize{
            \item  \code{clone}:     clone identifier from the \code{clone} slot of the
                                     input \code{ChangeoClone}.
            \item  \code{v_gene}:    V-segment gene call from the \code{v_gene} slot of 
                                     the input \code{ChangeoClone}.
            \item  \code{j_gene}:    J-segment gene call from the \code{j_gene} slot of 
                                     the input \code{ChangeoClone}.
            \item  \code{junc_len}:  junction length (nucleotide count) from the 
                                     \code{junc_len} slot of the input \code{ChangeoClone}.
          }
}
\description{
\code{buildPhylipLineage} reconstructs an Ig lineage via maximum parsimony using the 
dnapars application of the PHYLIP package.
}
\details{
\code{buildPhylipLineage} builds the lineage tree of a set of unique Ig sequences via
maximum parsimony through an external call to the dnapars application of the PHYLIP
package. dnapars is called with default algorithm options, except for the search option, 
which is set to "Rearrange on one best tree". The germline sequence of the clone is used 
as the outgroup. 

Following tree construction using dnapars, the dnapars output is modified to allow
input sequences to appear as internal nodes of the tree. Intermediate sequences 
inferred by dnapars are replaced by children within the tree having a Hamming distance 
of zero from their parent node. The distance calculation allows IUPAC ambiguous 
character matches, where an ambiguous character has distance zero to any character in 
the set of characters it represents. Distance calculation and movement of child nodes 
up the tree is repeated until all parent-child pairs have a distance greater than zero 
between them. The germline sequence (outgroup) is moved to the root of the tree and
excluded from the node replacement processes, which permits the trunk of the tree to be
the only edge with a distance of zero. Edge weights of the resultant tree are assigned 
as the distance between the sequences of the two vertices joined by each edge.
}
\examples{
\dontrun{
# Load example data
file <- system.file("extdata", "ExampleDb.gz", package="alakazam")
df <- readChangeoDb(file)

# Preprocess clone
clone <- subset(df, CLONE == 164)
clone <- makeChangeoClone(clone, text_fields=c("SAMPLE", "ISOTYPE"), num_fields="DUPCOUNT")

# Run PHYLIP and process output
dnapars_exec <- "~/apps/phylip-3.69/dnapars"
graph <- buildPhylipLineage(clone, dnapars_exec, rm_temp=TRUE)
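
# Inspect the returned graph. This is a sketch; the attribute names follow
# the Value section above. The igraph:: prefixes are used because igraph is
# only attached further down in this example.
igraph::vertex_attr_names(graph)     # name, sequence, label, plus data columns
igraph::V(graph)$name                # observed IDs, "Germline", "Inferred1", ...
igraph::E(graph)$weight              # distance along each edge
igraph::graph_attr(graph, "clone")   # clone identifier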

# Plot graph with a tree layout
library(igraph)
ly <- layout_as_tree(graph, root="Germline", circular=FALSE, flip.y=TRUE)
plot(graph, layout=ly)
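
# Illustrative sketch only, not the package implementation: a Hamming
# distance that treats IUPAC ambiguity codes as matches, as described in
# Details. The helper name iupacHamming and the code table are hypothetical
# and exist only for this example; the package computes distances with
# getSeqDistance. Assumptions: two characters match when the base sets they
# represent overlap, and any character missing from the table (such as a
# gap) counts as a mismatch.
iupac <- list(A="A", C="C", G="G", T="T",
              R=c("A", "G"), Y=c("C", "T"), S=c("C", "G"), W=c("A", "T"),
              K=c("G", "T"), M=c("A", "C"),
              B=c("C", "G", "T"), D=c("A", "G", "T"),
              H=c("A", "C", "T"), V=c("A", "C", "G"),
              N=c("A", "C", "G", "T"))
iupacHamming <- function(seq1, seq2) {
    x <- strsplit(seq1, "")[[1]]
    y <- strsplit(seq2, "")[[1]]
    stopifnot(length(x) == length(y))
    # A position is a mismatch only when the two characters share no base
    mismatch <- mapply(function(a, b) {
        length(intersect(iupac[[a]], iupac[[b]])) == 0
    }, x, y)
    sum(mismatch)
}
# N is compatible with any base, so only unambiguous differences are counted
iupacHamming("ATGGC", "ATGGN")
iupacHamming("ATGGC", "ATRGA")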
}

}
\references{
\enumerate{
  \item  Felsenstein J. PHYLIP - Phylogeny Inference Package (Version 3.2). 
           Cladistics. 1989 5:164-166.
  \item  Stern JNH, Yaari G, Vander Heiden JA, et al. B cells populating the multiple 
           sclerosis brain mature in the draining cervical lymph nodes. 
           Sci Transl Med. 2014 6(248):248ra107.
}
}
\seealso{
Takes as input a \code{\link{ChangeoClone}}. 
          Temporary directories are created with \code{\link{makeTempDir}}.
          Distance is calculated using \code{\link{getSeqDistance}}. 
          See \code{\link{igraph}} and \code{\link{igraph.plotting}} for working 
          with igraph \code{graph} objects.
}

