Posted in Machine Learning

Random Forest: intro and an example

About Decision Trees
* All samples will start from the root.
* At each node, one feature will split the samples.

About ID3
If you check tools like Weka, ID3 (Iterative Dichotomiser 3) algo is useful, it builds a decision tree from the top down repeatedly. (By Ross Quinlan). It is a predictive classification algorithm which is IG-based.

Key Steps:

Inputs: Sample set S, Attributes set A.
Output: ID3 Decision Tree

1. If all attributes are processed, return; unless go to step 2.
2. Find out the attribute a which has the max IG, set it as a node.
If a can classify the samples, return; unless go to step 3.
3. For all possible values of a , do :
1) Subset { S }_{ v } is a collection of the samples that meet a=v.
2) Generate attribute set: { A }_{ T }=A-\{ a\} .
3) Set { S }_{ v } as sample set and { A }_{ T } as new inputs, iteratively ran ID3.

About Random Forest

Statistical Methods for Prediction and Understanding.
If we build more than one decision trees using different subsets of the data, then they can vote on a new input, which will raise the accuracy (better than only on dtree).
Generally, the dtrees in the forest are independent. When a new sample is given as an input, we let every tree give a result (classification). Then based on the numbers, choose a max value of that class as the final result.
For each tree, the training set is a subset with replacements. Which means, in one tree, a record will appear more than one times in a tree, or a record never appears in one tree. When training a node, the attributes are randomly selected without replacements following a porportion, might be \sqrt { M },\sqrt[2]{ M } or \sqrt[1/2] { M }.

•High training speed.
•Performs well on high features (no feature selection).
•Gives conclusions on features (shows the importance).
•Easy for implementation.
•Easy to be transferred to parallel process.


• Training set: N examples represented by M features.
• Ensemble size T (i.e. number of trees).
• Reduced dimension size m ≪ M (i.e. number of features used to calculate the best split at each node).


for i = 1 to T:
• Selecting N samples randomly as training set with replacement.
• Train a decision tree. For each node, randomly choose m features (not all of them) and calculate the best split (IG value).
• Use majority voting to make predictions based on all T trees.

References: (Chinese) (maybe not available with out logging in.)

To be continued…


Keep calm and update blog.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s