Posted in Machine Learning

# Random Forest: intro and an example

* All samples start from the root.
* At each node, one feature is chosen to split the samples.

If you look at tools like Weka, you will find the ID3 (Iterative Dichotomiser 3) algorithm, invented by Ross Quinlan. It builds a decision tree from the top down, one split at a time, and it is a predictive classification algorithm based on information gain (IG).
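To make the IG criterion concrete, here is a minimal Python sketch of entropy and information gain on a toy dataset (the function names and the toy samples are illustrative, not from Weka or any particular library):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(samples, labels, attribute):
    """IG = entropy(parent) - weighted entropy of the children after splitting on `attribute`."""
    parent_entropy = entropy(labels)
    total = len(labels)
    children_entropy = 0.0
    for value in set(s[attribute] for s in samples):
        child_labels = [l for s, l in zip(samples, labels) if s[attribute] == value]
        children_entropy += len(child_labels) / total * entropy(child_labels)
    return parent_entropy - children_entropy

# Toy example: "windy" separates the labels perfectly, "hot" not at all.
samples = [{"hot": 1, "windy": 0}, {"hot": 1, "windy": 1},
           {"hot": 0, "windy": 0}, {"hot": 0, "windy": 1}]
labels = ["no", "yes", "no", "yes"]
print(information_gain(samples, labels, "windy"))  # 1.0
print(information_gain(samples, labels, "hot"))    # 0.0
```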

Key Steps:

Inputs: sample set $S$, attribute set $A$.
Output: ID3 Decision Tree

1. If all attributes have been processed, return; otherwise go to step 2.
2. Find the attribute $a$ with the maximum IG and set it as a node.
If $a$ classifies the samples completely, return; otherwise go to step 3.
3. For every possible value $v$ of $a$, do:
1) Let the subset ${ S }_{ v }$ be the collection of samples that satisfy $a=v$.
2) Generate the reduced attribute set ${ A }_{ T }=A-\{ a\}$.
3) With ${ S }_{ v }$ as the new sample set and ${ A }_{ T }$ as the new attribute set, run ID3 recursively (a compact sketch follows this list).
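Putting steps 1 to 3 together, here is a compact recursive sketch that continues from the `information_gain` snippet above (the nested-dict tree representation and the majority-label fallback are my own simplifications, not part of Quinlan's original description):

```python
def id3(samples, labels, attributes):
    """Recursively build an ID3 tree as nested dicts: {attribute: {value: subtree}}."""
    # Step 1: nothing left to split on, or the node is already pure -> return a leaf label.
    if not attributes or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(samples, labels, a))
    tree = {best: {}}
    # Step 3: split on every value v of the chosen attribute and recurse on S_v with A_T = A - {a}.
    for v in set(s[best] for s in samples):
        subset = [(s, l) for s, l in zip(samples, labels) if s[best] == v]
        sub_samples = [s for s, _ in subset]
        sub_labels = [l for _, l in subset]
        remaining = [a for a in attributes if a != best]
        tree[best][v] = id3(sub_samples, sub_labels, remaining)
    return tree

print(id3(samples, labels, ["hot", "windy"]))
# {'windy': {0: 'no', 1: 'yes'}}
```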

## Statistical Methods for Prediction and Understanding
If we build more than one decision tree from different subsets of the data, the trees can vote on a new input, which raises the accuracy compared with a single tree.
Generally, the decision trees in the forest are independent. When a new sample is given as input, every tree produces a classification; the class that receives the most votes is taken as the final result.
For each tree, the training set is a subset drawn with replacement, which means that within one tree a record may appear more than once, or may never appear at all. When training a node, the attributes are randomly selected without replacement, in a fixed proportion of the total: typically a value on the order of $\sqrt{M}$ (for example $\frac{1}{2}\sqrt{M}$, $\sqrt{M}$, or $2\sqrt{M}$).
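A small numpy sketch of the two sources of randomness described above, a bootstrap sample drawn with replacement and a feature subset drawn without replacement (the array names and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 16                      # N samples, M features
X = rng.normal(size=(N, M))
y = rng.integers(0, 2, size=N)

# Bootstrap: draw N row indices *with* replacement -> some rows repeat, others never appear.
row_idx = rng.integers(0, N, size=N)
X_boot, y_boot = X[row_idx], y[row_idx]

# Feature subsampling: at a node, pick m ~ sqrt(M) features *without* replacement.
m = int(np.sqrt(M))
feat_idx = rng.choice(M, size=m, replace=False)
X_node = X_boot[:, feat_idx]
```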

* High training speed.
* Performs well with many features (no feature selection required).
* Gives conclusions about features (shows their importance).
* Easy to implement.
* Easy to parallelize.

Algorithm:

Inputs:
* Training set: N examples represented by M features.
* Ensemble size T (i.e. the number of trees).
* Reduced dimension size m ≪ M (i.e. the number of features used to calculate the best split at each node).

Steps:

for i = 1 to T:
* Select N samples randomly, with replacement, as the training set.
* Train a decision tree: at each node, randomly choose m features (not all of them) and calculate the best split (by IG value).

Use majority voting over all T trees to make the final prediction (a sketch follows).
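A minimal end-to-end sketch of this loop, assuming scikit-learn's `DecisionTreeClassifier` as the base learner and synthetic data (the hyper-parameter values are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
N, M = X.shape
T = 25                              # ensemble size
m = int(np.sqrt(M))                 # features considered at each split
rng = np.random.default_rng(0)

trees = []
for _ in range(T):
    # Draw a bootstrap sample of size N (with replacement).
    idx = rng.integers(0, N, size=N)
    # max_features=m: each node considers only m randomly chosen features.
    tree = DecisionTreeClassifier(criterion="entropy", max_features=m)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict(x):
    """Majority vote over the T trees for a single example x."""
    votes = [int(t.predict(x.reshape(1, -1))[0]) for t in trees]
    return np.bincount(votes).argmax()

print(predict(X[0]), y[0])
```

Passing `max_features=m` to the base tree is what gives the per-node feature randomness described in the steps above; the bootstrap indexing supplies the per-tree sample randomness.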

To be continued…

## Author: Irene

Keep calm and update blog.