AI::DecisionTree 0.08 review



License: Perl Artistic License
File size: 25K
Developer: Ken Williams

AI::DecisionTree is a Perl module that automatically learns decision trees.


use AI::DecisionTree;
my $dtree = AI::DecisionTree->new;

# A set of training data for deciding whether to play tennis
$dtree->add_instance
  (attributes => {outlook     => 'sunny',
                  temperature => 'hot',
                  humidity    => 'high'},
   result => 'no');

$dtree->add_instance
  (attributes => {outlook     => 'overcast',
                  temperature => 'hot',
                  humidity    => 'normal'},
   result => 'yes');

# ... repeat for several more instances, then build the tree:
$dtree->train;

# Find results for unseen instances
my $result = $dtree->get_result
  (attributes => {outlook     => 'sunny',
                  temperature => 'hot',
                  humidity    => 'normal'});

The AI::DecisionTree module automatically creates so-called "decision trees" to explain a set of training data. A decision tree is a kind of categorizer that uses a flowchart-like process to categorize new instances. For instance, a learned decision tree might look like the following, which classifies instances for the concept "play tennis":

                   OUTLOOK
                   /  |  \
                  /   |   \
                 /    |    \
           sunny/  overcast \rainy
               /      |      \
          HUMIDITY    |       WIND
            /  \    *no*      /  \
           /    \            /    \
      high/      \normal    /      \
         /        \  strong/        \weak
       *no*      *yes*    /          \
                        *no*        *yes*

(This example, and the inspiration for the AI::DecisionTree module, come directly from Tom Mitchell's excellent book "Machine Learning", available from McGraw Hill.)

A decision tree like this one can be learned from training data, and then applied to previously unseen data to obtain results that are consistent with the training data.
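To make the "applied to previously unseen data" step concrete, here is a minimal sketch of walking such a tree by hand. It is not how AI::DecisionTree stores or evaluates its tree internally: the nested-hash layout, the classify helper, and the attribute name "wind" are assumptions made for illustration, and the leaves simply follow the diagram as printed.

```perl
use strict;
use warnings;

# Hypothetical encoding of the pictured tree (not the module's own format).
# Interior node: { attribute => NAME, children => { VALUE => subtree } }
# Leaf: a plain string holding the result.
my $tree = {
    attribute => 'outlook',
    children  => {
        sunny => {
            attribute => 'humidity',
            children  => { high => 'no', normal => 'yes' },
        },
        overcast => 'no',
        rainy => {
            attribute => 'wind',
            children  => { strong => 'no', weak => 'yes' },
        },
    },
};

# Follow one branch per attribute test until a leaf is reached.
# Returns undef (in scalar context) if the instance has a value
# the tree has no branch for.
sub classify {
    my ($node, $attributes) = @_;
    while (ref $node) {
        my $value = $attributes->{ $node->{attribute} };
        return unless defined $value && exists $node->{children}{$value};
        $node = $node->{children}{$value};
    }
    return $node;
}

my $answer = classify($tree, { outlook => 'sunny', humidity => 'normal' });
print "play tennis? $answer\n";
```

Each lookup only inspects the attributes along one path from the root, which is why the result for a new instance is easy to trace by eye.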

The usual goal of a decision tree is to somehow encapsulate the training data in the smallest possible tree. This is motivated by an "Occam's Razor" philosophy, in which the simplest possible explanation for a set of phenomena should be preferred over other explanations. Also, small trees will make decisions faster than large trees, and they are much easier for a human to look at and understand. One of the biggest reasons for using a decision tree instead of many other machine learning techniques is that a decision tree is a much more scrutable decision maker than, say, a neural network.

The current implementation of this module uses an extremely simple method for creating the decision tree based on the training instances. It uses an Information Gain metric (based on expected reduction in entropy) to select the "most informative" attribute at each node in the tree. This is essentially the ID3 algorithm, developed by J. R. Quinlan in 1986. The idea is that the attribute with the highest Information Gain will (probably) be the best attribute to split the tree on at each point if we're interested in making small trees.
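The attribute-selection step described above can be sketched in a few lines of Perl. This is an illustration of the Information Gain idea only, not the module's actual code: the helper names entropy, information_gain, and best_attribute are hypothetical, and the four training instances are a toy data set invented for the example. Instances use the same attributes/result layout as the synopsis.

```perl
use strict;
use warnings;

# Shannon entropy (in bits) of a list of class labels.
sub entropy {
    my @labels = @_;
    return 0 unless @labels;
    my %count;
    $count{$_}++ for @labels;
    my $n = scalar @labels;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

# Expected reduction in entropy from splitting the instances on $attr.
sub information_gain {
    my ($instances, $attr) = @_;
    my %partition;    # attribute value => list of results seen with it
    push @{ $partition{ $_->{attributes}{$attr} } }, $_->{result}
        for @$instances;
    my $n = scalar @$instances;
    my $remainder = 0;
    for my $labels (values %partition) {
        $remainder += (scalar(@$labels) / $n) * entropy(@$labels);
    }
    return entropy(map { $_->{result} } @$instances) - $remainder;
}

# Pick the attribute with the highest gain (ties go to the first name
# in sorted order).
sub best_attribute {
    my ($instances, @attrs) = @_;
    my ($best, $best_gain);
    for my $attr (sort @attrs) {
        my $gain = information_gain($instances, $attr);
        ($best, $best_gain) = ($attr, $gain)
            if !defined $best_gain || $gain > $best_gain;
    }
    return $best;
}

# Toy data: humidity alone determines the result, outlook tells us nothing.
my @instances = (
    { attributes => { outlook => 'sunny', humidity => 'high'   }, result => 'no'  },
    { attributes => { outlook => 'sunny', humidity => 'normal' }, result => 'yes' },
    { attributes => { outlook => 'rainy', humidity => 'high'   }, result => 'no'  },
    { attributes => { outlook => 'rainy', humidity => 'normal' }, result => 'yes' },
);

printf "gain(%s) = %.3f\n", $_, information_gain(\@instances, $_)
    for qw(humidity outlook);
print "split on: ", best_attribute(\@instances, qw(outlook humidity)), "\n";
```

In this toy set, splitting on humidity leaves two pure groups, so it receives the full one bit of gain and is chosen as the root. An ID3-style learner would then repeat the same selection on each group until every group is pure or no attributes remain.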

