% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/choR-package.R
\docType{package}
\name{ChoR}
\alias{ChoR}
\alias{ChoR-package}
\title{Getting started with the ChoR package}
\description{
The Chordalysis algorithm makes it possible to learn the structure of graphical models from datasets with thousands of variables.
The research papers detailing the theory behind Chordalysis are listed at
\url{http://www.francois-petitjean.com/Research}

If you have problems using ChoR, find a bug, or have suggestions, please
contact the package maintainer by email.
Do not write to the general R lists or contact the authors of the original chordalysis software.

If you use the package, please cite the references listed by \code{citation("ChoR")} in your publications.
}
\details{
Chordalysis learns the structure of graphical models from datasets with thousands of variables.
There are three versions of the algorithm: SMT, Budget and MML. SMT, which stands for Subfamilywise Multiple Testing,
is generally the method of choice. It supersedes Budget and is always superior to it, as demonstrated in our KDD'16 paper (see CITATION). Both SMT and Budget
are based on statistical testing, while MML uses information theory to decide upon a model. The objectives of the techniques differ slightly: SMT controls the familywise
error rate (FWER), while MML is a probabilistic method. Our experiments (also in the KDD'16 paper) indicate that SMT is superior to MML
for most datasets.
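
For intuition about why controlling the FWER matters (a general illustration under an independence assumption, not a description of the SMT procedure itself): if \eqn{m} independent tests are each run at level \eqn{\alpha}{alpha} without any correction, the probability of at least one false discovery is
\deqn{\mathrm{FWER} = 1 - (1 - \alpha)^m,}{FWER = 1 - (1 - alpha)^m,}
which for \eqn{\alpha = 0.05}{alpha = 0.05} and \eqn{m = 100} is about 0.994. A procedure that controls the FWER keeps this probability at or below the chosen threshold.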
}
\examples{
# Warning: rJava needs to **copy** your data from R into a JVM.
# If you need extra memory, set this option (here, 4 GB) **before** loading ChoR.
# Note: not needed in our case; kept as an example.
options( java.parameters = "-Xmx4g" )
library(ChoR)

## Check the Java version
# The java.specification.version property is 1.8 for Java 8 and 9, 11, ... for later releases
jv <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.specification.version")
jvn <- as.numeric(jv)
if(is.na(jvn) || jvn < 1.8){ stop("Java 8 is needed for this package but not available") }

# Helper function for graph printing. Requires Rgraphviz (from Bioconductor):
# install.packages("BiocManager")
# BiocManager::install("Rgraphviz")
printGraph = function(x){
  if(requireNamespace("Rgraphviz", quietly=TRUE)){
    attrs <- list(node=list(shape="ellipse", fixedsize=FALSE, fontsize=25))
    Rgraphviz::plot(x, attrs=attrs)
  } else { stop("Rgraphviz required for graph printing.") }
}


###### MUSHROOM #####
# We read the data from the internet: http://repository.seasr.org/Datasets/UCI/csv/mushroom.csv
MR.data =
  read.csv(  "http://repository.seasr.org/Datasets/UCI/csv/mushroom.csv",
              header            = TRUE,             # Here, we have a header
              na.strings        = c("NA","?",""),   # Configure the missing values
              stringsAsFactors  = FALSE,            # Keep strings for now
              check.names       = TRUE              # Replace some special characters
            )

# This file has a special line with types. You can check this with MR.data[1,].
# Let's remove it:
MR.data = MR.data[-1, ]

# Launch the SMT analysis, with:
# ## default pValueThreshold=0.05
# ## computation of attributes cardinality from the data
MR.res = ChoR.SMT(MR.data)

# Access the result:
# ## As a list of cliques:
MR.cl = ChoR.as.cliques(MR.res)
print(MR.cl)
# ## As a formula
MR.fo = ChoR.as.formula(MR.res)
print(MR.fo)
# ## As a graph
if(requireNamespace("graph", quietly=TRUE)){
  MR.gr = ChoR.as.graph(MR.res)
  printGraph(MR.gr)
} else {
  print("'graph' package not installed; skipping 'as graph' example.")
}



###### Titanic #####
T.data =
  read.csv( "https://ww2.amstat.org/publications/jse/datasets/titanic.dat.txt",
            sep               = "",       # White spaces
            header            = FALSE,
            stringsAsFactors  = FALSE
          )

# Give meaningful names
colnames(T.data) = c(   "Class", "Age", "Sex", "Survived" )
# Chordalysis
T.res = ChoR.SMT(T.data, card = c(4, 2, 2, 2))
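
# Illustration (a sketch, not part of the ChoR API): the card argument above
# gives the number of distinct values of each column. The hypothetical helper
# countLevels below shows one way such a vector could be derived from a data
# frame, assuming missing values do not count as a level. Whether this matches
# the cardinality computation done inside ChoR exactly is an assumption.
countLevels = function(df){
  vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1))
}
# Toy data (made up for this illustration) with the same column names as above
toy = data.frame( Class    = c(0, 1, 2, 3),
                  Age      = c(0, 1, 1, 0),
                  Sex      = c(1, 0, 1, 0),
                  Survived = c(0, 0, 1, 1) )
print(countLevels(toy))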

if(requireNamespace("graph", quietly=TRUE)){
  T.gr = ChoR.as.graph(T.res)
  printGraph(T.gr)
}



####### Solar flare #####
#SF.data =
#  read.csv( # "https://archive.ics.uci.edu/ml/machine-learning-databases/solar-flare/flare.data2",
#            "https://raw.githubusercontent.com/jeffheaton/proben1/master/flare/flare.data2",
#            sep               = "",       # White spaces
#            skip              = 1,
#            header            = FALSE,
#            stringsAsFactors  = FALSE,
#            check.names       = TRUE
#          )
#
## Remove last 3 columns (classes, not attributes)
#SF.data = SF.data[-11:-13]
## Give meaningful names
#colnames(SF.data) = c(  "ClassCode", "LSpotSizeCode", "DistCode", "Activity",
#                        "Evolution", "PrevActivity", "HistoricallyComplex", "BecomeComplex",
#                        "Area", "AreaLSpot")
## Chordalysis
#SF.res = ChoR.SMT(SF.data, card = c(7, 6, 4, 2, 3, 3, 2, 2, 2, 2))
#
#if(requireNamespace("graph", quietly=TRUE)){
#  SF.gr = ChoR.as.graph(SF.res)
#  printGraph(SF.gr)
#}
#
}
\references{
See \code{citation("ChoR")}.
}
\keyword{linear-log-analysis}
\keyword{model}
\keyword{package}
