Alternating decision tree

An Alternating Decision Tree (ADTree) is a machine learning methodfor classification. The ADTree data structure and algorithmare a generalization of decision trees and have connections to
boosting. ADTrees were introduced by Yoav Freund and Llew MasonYoav Freund and Llew Mason. TheAlternating Decision Tree Algorithm. Proceedings of the 16thInternational Conference on Machine Learning, pages 124-133 (1999)] .

Motivation

Original boosting algorithms typically combined either decision stumps or decision trees. Boosting decision stumps createsa set of $T$ weighted weak hypotheses (where $T$ is the number of boosting iterations), which can be visualized as aset for reasonable values of $T$ . Boosting decision treescould result in a final combined classifier with thousands (ormillions) of nodes for modest values of $T$ . Both of thesescenarios produce final classifiers in which it is either difficult tovisualize correlations or difficult to visualize at all.Alternating decision trees provide a method for visualizing decisionstumps in an ordered and logical way to demonstrate correlations. Indoing so, they simultaneously generalize decision trees and can beused to essentially grow boosted decision trees in parallel.

Description of the structure

The alternating decision tree structure consists of two components:decision nodes and prediction nodes. Decision nodes specify apredicate condition. Prediction nodes specify a value to add tothe score based on the result of the decision node. Eachdecision node can be seen as a conjunction between aprecondition (the decision node was reached) and the conditionspecified in the decision node.

Perhaps the easiest way to understand the interaction of decision andprediction nodes is through an example.The following example is taken from JBoost performing boosting for 6 iterations on the [http://www.ics.uci.edu/~mlearn/databases/spambase/ spambase dataset] (available from the [http://www.ics.uci.edu/~mlearn/MLRepository.html UCI Machine Learning Repository] ).Positive examples indicate that the message is spam and negativeexamples are not spam. During each iteration, a single node is added to the ADTree. The ADTree determined by the learning algorithm implemented in JBoost is:

center|800px|An ADTree for 6 iterations on theSpambase dataset.

The tree construction algorithm is described below in theDescription of the algorithm section. We now show how tointerpret the tree once it has been constructed. We focuson one specific instance:

For this instance, we obtain a score that determines theclassification of the instance. This score not only acts as aclassification, but also as a measure of confidence. The actual orderthat the ADTree nodes are evaluated will likely be different then theorder in which they were created. That is, the node from iteration 4can be evaluated before the node from iteration 1. There areconstraints to this (e.g. node from iteration 2 must be evaluatedbefore the node from iteration 5). In general, either breadth-firstor depth-first evaluation will yield the correct interpretation.

The following table shows how the score is created (progressivescore) for our above example instance:

There are a few observations that we should make
* The final classification of the example is positive (0.657), meaning that the example is considered to be spam.
* All nodes at depth 1 have their predicate evaluated and one of their prediction nodes contributes to the score. Thus a tree with depth 1 is the equivalent of boosted decision stumps.
* If a decision node is not reached (the node from iteration 5 in the above example) then the node's predicate and subsequent prediction nodes will not be evaluated.

Description of the algorithm

The alternating decision tree learning algorithm is described in theoriginal paper. The general idea involves afew main concepts:
* The root decision node is always TRUE or FALSE
* The tree is grown iteratively. The total number of iterations is generally decided prior to starting the algorithm.
* Each decision node ( $c_2$ ) is selected by the algorithm based on how well it discriminates between positive and negative examples.
* Once a decision node is created, the prediction node is determined by how well the decision node discriminates.

Before the algorithm, we first define some notation. Let $c$ be a predicate, then
* $W_+(c)$ is the weight of all positively labeled examples that satisfy $c$
* $W_-(c)$ is the weight of all negatively labeled examples that satisfy $c$
* $W(c)$ is the weight of all examples that satisfy $c$
* We call $c$ a precondition when it is a conjunction of previous base conditions and negations of previous base conditions

The exact algorithm is:

INPUT: $m$ examples and labels $(x_1,y_1),ldots,(x_m,y_m)$

Set the weight of all examples to $W_1(x_j) = 1/m$

Set the margin of all examples to $r_1(x_j) = 0$

The root decision node is always $c=$ TRUE, with a single prediction node

$a=frac{1}{2} extrm{ln}frac{W_+(T)}{W_-(T)}$

For $t = 1, ldots, T$ do:
* Let $c_1 in P_t$ be a precondition (that is, the node being created can be reached via $c_1$ ) and $c_2$ be a condition (the new node). Then each decision node ( $c_2$ ) is selected by the algorithm based on how well it discriminates between positive and negative examples. The original ADTree algorithm minimizes the criterion $2 left( sqrt{W_+(c_1wedge c_2) W_-(c_1 wedge c_2)} + sqrt{W_+(c_1wedge
eg c_2) W_-(c_1 wedge
eg c_2)}
ight) +W(
eg c_2)$ .
* Once a decision node is created, the prediction nodes are determined by $a=frac{1}{2} extrm{ln}frac{W_+(c_1wedge c_2)}{W_-(c_1 wedge c_2)}$ and $b=frac{1}{2} extrm{ln}frac{W_+(c_1wedge
eg c_2)}{W_-(c_1 wedge
eg c_2)}$
* Add the conditions $c_1 wedge c_2$ and $c_1 wedge
eg c_2$ to the set of possible preconditions $P_{t+1}$
* Update the weights: $W_{t+1}(x_j) = W_{t}(x_j)e^{r_t(x_j)y_j}$

Empirical Results

Figure 6 in the original paper demonstrates thatADTrees are typically as robust as boosted decision trees and boosted decision stumps.

References

External links

* [http://www.cs.ucsd.edu/~aarvey/jboost/presentations/BoostingLightIntro.pdf An introduction to Boosting and ADTrees] (Has many graphical examples of alternating decision trees in practice)

Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

Decision tree learning — This article is about decision trees in machine learning. For the use of the term in decision analysis, see Decision tree. Decision tree learning, used in statistics, data mining and machine learning, uses a decision tree as a predictive model… … Wikipedia
Árbol de decisión alternativo — Un Árbol de decisión alternativo es un método de clasificación proveniente del aprendizaje automático conocido en inglés como Alternating Decision Tree (ADTree). Las estructuras de datos y el algoritmo son una generalización de los árboles de… … Wikipedia Español
National Christmas Tree (United States) — This article is about the Christmas Tree placed on the Ellipse near the White House. For the Christmas Tree placed inside the White House, see White House Christmas tree. For for the giant Sequoia known as the Nation s Christmas Tree , see… … Wikipedia
Computation tree logic — Computation tree logic (CTL) is a branching time logic, meaning that its model of time is a tree like structure in which the future is not determined; there are different paths in the future, any one of which might be an actual path that is… … Wikipedia
BrownBoost — is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning methods.… … Wikipedia
List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… … Wikipedia
Список терминов, относящихся к алгоритмам и структурам данных — Это служебный список статей, созданный для координации работ по развитию темы. Данное предупреждение не устанавливается на информационные списки и глоссарии … Википедия
Список терминов — Список терминов, относящихся к алгоритмам и структурам данных Это сл … Википедия
Liste de problèmes NP-complets — Ceci est une liste des problèmes NP complets les plus connus en théorie de la complexité des algorithmes, exprimés sous la forme d un problème de décision. Puisqu on connaît plus de 3000 problèmes NP complets, cette liste n est pas exhaustive. La … Wikipédia en Français
List of NP-complete problems — Here are some of the more commonly known problems that are NP complete when expressed as decision problems. This list is in no way comprehensive (there are more than 3000 known NP complete problems). Most of the problems in this list are taken… … Wikipedia

Academic Dictionaries and Encyclopedias

Alternating decision tree

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Alternating decision tree

Look at other dictionaries:

Share the article and excerpts

Direct link