
{\color{red} \bf Warning:}
The findings and conclusions in this article have not been
formally disseminated by the U.S. Food and Drug Administration
and should not be construed to represent any determination or
policy of any University, Institution, Agency, Adminstration
and National Laboratory.

Ranjan Maitra and this research was supported in part by the National Institute of
Biomedical Imaging and Bioengineering (NIBIB) of the National
Institutes of Health (NIH) under its Award No. R21EB016212.
The content of this paper however is solely the responsibility of the 
authors and does not represent the official views of either the
NIBIB or the NIH.

This document is written to explain the main
function of \pkg{MixfMRI}~\citep{Chen2018}, version 0.1-0.
Every effort will be made to ensure future versions are consistent with
these instructions, but features in later versions may not be explained
in this document.



\section[Introduction]{Introduction}
\label{sec:introduction}
\addcontentsline{toc}{section}{\thesection. Introduction}

The main purpose of this vignette is to demonstrate basic usage of
\pkg{MixfMRI} which is prepared to implement  developed methodology,
simulation studies, data analyses in
``Improved Activation Detection in Single-Subject fMRI
Studies''~\citep{ChenMaitra2018}.
The methodology mainly utilizes model-based clustering (unsupervised)
of functional Magnetic Resonance (fMRI) data that identifies regions
of brain activation associated with the performance of a task or the
application of a stimulus. The implemented methods include
2D and 3D unsupervised segmentation analyses for fMRI signals. 
For simplification, only 2D clustering is demonstrated in this vignette.
In this package, the data on fMRI signals are on the form of the
$p$-values at each voxel of a Statistical Parametric Map (SPM).

The clustering and segmentation analyses identify activated
voxels/signals (in terms of small $p$-values) from normal brain
behaviors within a single subject but also use spatial context. 

Note that the $p$-values may be derived from statistical models where
typically a proper experiment design is required.
These $p$-values are our data, and are used for 
% The methods and analyses presented in this document are for
% exploratory purposes only, so that no claims  nor 
analysis and no statements about significance level of $p$-values
associated with activated voxels/signals are needed.
Our analysis
approach allows for the prespecification of {\em a priori} expected
upper bounds for the proportion of activated voxels/signals that can
guide in determining activated voxels. 

For large datasets, the methods and analyses are also implemented in a
distributed manner 
especially using SPMD programming framework.
The package also includes workflows which utilize SPMD techniques.
The workflows serve as examples of data analyses and
large scale simulation studies. Several workflows are also built in to
automatically process clusterings, hypotheses, merging clusters, and
visualizations. See Section~\ref{sec:workflows} and files in
\code{MixfMRI/inst/workflow/} for more information.


\subsection[Dependent Packages]{Dependent Packages}
\addcontentsline{toc}{subsection}{\thesubsection. Dependent Packages}

The \pkg{MixfMRI} package depends on other \proglang{R} packages to be
functional even though they are not always required. This is because
some examples, functions and workflows of the 
\pkg{MixfMRI} may need utilities of those dependent packages.
For instance,
\\
\hspace*{0.5cm}
{\bf Imports:} \pkg{MASS}, \pkg{Matrix}, \pkg{RColorBrewer},
\pkg{fftw}, \pkg{MixSim}, \pkg{EMCluster}.
\\
\hspace*{0.5cm}
{\bf Enhances:} \pkg{pbdMPI}, \pkg{AnalyzeFMRI}, \pkg{oro.nifti}.


\subsection[The Main Function]{The Main Function}
\addcontentsline{toc}{subsection}{\thesubsection. The Main Function}

The main function, \code{fclust()}, implements model-based clustering
using the EM algorithm~\citep{McLachlan1996} for fMRI signal data and
provides unsupervised clustering results that identify activated regions
in the brain. The \code{fclust()} function contains an initialization method and
EM algorithms for clustering fMRI signal data which have two parts:
\begin{itemize}
\item
\code{PV.gbd} for $p$-value of signals associated with voxels, and
\item
\code{X.gbd} for voxel information/locations in either 2D or 3D,
\end{itemize}
where \code{PV.gbd} is of length $N$ (number of voxels) and
\code{X.gbd} is of dimension $N \times 2$ or $N \times 3$ (for 2D or 3D).
Each signal (per voxel) is assumed to follow a mixture distribution
of \code{K} components with mixing proportion \code{ETA}.
Each component has
two independent coordinates (one for each part)
with density functions: Beta and multivariate
Normal distributions, for each part of fMRI signal data.
\\

{\bf Beta Density:}\\
The first component (\code{k = 1})
is restricted by \code{min.1st.prop} and $Beta(1, 1)$ (equivalently,
the standard uniform) distribution. The rest \code{k = 2, 3, ..., K -
  1} components have different 
$Beta(alpha, beta)$ distributions with \code{alpha < 1 < beta} for all
\code{k > 1} components.
This coordinate mainly represents the results of test statistics
for determining activation of voxels (those that have smaller $p$-values).
Note that the test statistics may be developed/smoothed/computed
from a time course model associated with voxel behaviors.
See the main paper \citet{ChenMaitra2018} for information.
\\

{\bf Multivariate Normal Density:}\\
The logarithm of the multivariate normal density is used as a penalty
to provide regularization of the estimated parameters and in the
estimated activation. 
\code{model.X = "I"} is for identity covariance matrix of this
multivariate Normal 
distribution, and \code{"V"} for unstructured covariance matrix.
\code{ignore.X = TRUE} is to ignore \code{X.gbd} and normal density,
{\em i.e.} there is no regularization and only the Beta density is used.
Note that this coordinate (for each axis) is recommended to be
normalized in the $(0, 1)$ scale which is on the same scale of Beta
density. From a modeling perspective, rescaling \code{X.gbd} does not
have an effect. 
\\

In this package, the two parts \code{PV.gbd} and \code{X.gbd} are
assumed to be independent because the latter comes through the
addition of a penalty term to the log likelihood of the voxel-wise data on
$p$-values. % dependences were typically ignored in the designed
            % models (time course of voxel activation associated with the stimula)from which the \code{PV.gbd} are dervided.
The goal of the main function is to provide spatial
clusters (in addition to the \code{PV.gbd}) indicating spatial correlations.

Currently, APECMa~\citep{Chen2011}
and EM algorithms are implemented with EGM algorithm~\citep{Chen2013}
to speed up convergence when MPI and \pkg{pbdMPI}~\citep{Chen2012}
are available.
RndEM initialization~\citep{Maitra2009} with a specific way of choosing
initial seeds is implemented for obtaining
good initial values that has the potential to increase the chances of
convergence. 


\subsection[Datasets]{Datasets}
\addcontentsline{toc}{subsection}{\thesubsection. Datasets}

The package has been built with several datasets including
\begin{itemize}
\item
three 2D phantoms, \code{shepp0fMRI}, \code{shepp1fMRI}, and \code{shepp2fMRI},
\item
one 3D dataset, \code{pstats}, with $p$-values obtained from the SPM
obtained after running the Analysis of Functional Neuroimaging (AFNI)
software~\citep{cox96,coxandhyde97,cox12} on the imagination dataset of
\citet{tabelowandpolzehl11}. 
\item 
two small 2D voxels datasets, \code{pval.2d.complex} and \code{pval.2d.mag},
in $p$-values
\item
two toy examples, \code{toy1} and \code{toy2}.
\end{itemize}


\subsection[Examples]{Examples}
\addcontentsline{toc}{subsection}{\thesubsection. Examples}
The scripts in \code{MixfMRI/demo/} have several examples that demonstrate
the main function, the example datasets and other utilities in this package.
For a quick start,
\begin{itemize}
\item
the scripts \code{MixfMRI/demo/fclust2d.r} and \code{MixfMRI/demo/fclust3d.r}
show the basic usage of the main function \code{fclust()} using the
two toy datasets,
\item
the scripts \code{MixfMRI/demo/maitra_2d.r} and
\code{MixfMRI/demo/shepp.r} show and visualize
examples on how to generate simulated datasets with given overlap levels, and
\item
the scripts \code{MixfMRI/demo/alter_*.r} show alternative methods.
\end{itemize}


\subsection[Workflows]{Workflows}
\label{sec:workflows}
\addcontentsline{toc}{subsection}{\thesubsection. Workflows}
The package also has several workflows established for simulation studies.
The main examples are located in \code{MixfMRI/inst/workflow/simulation/}.
See the file \code{create_simu.txt} that generates scripts for simulations.

The files under \code{MixfMRI/inst/workflow/spmd/} have the main
scripts for the workflows.
Note that MPI and \pkg{pbdMPI} are required for workflows because these
simulations require potentially long computing times.

