Decision Tree Classifier

The Decision Tree Classifier is a simple and widely used technique that applies a straightforward idea to solve classification problems.

Deep Patel
7 min read · Aug 21, 2021
Working of Decision Tree

The example above depicts how we make decisions in everyday life by looking at different parameters. The decision tree classifier follows the same steps.

Introduction

A Decision Tree is a simple representation for classifying examples. It is a supervised machine learning algorithm in which the data is repeatedly split according to a certain parameter.

The decision tree is among the most popular tools for classification and prediction. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

The decision tree consists of:

  • Root: It is the topmost node in a decision tree. The dataset is first partitioned at this node based on an attribute value, and the partitioning is repeated on each subset in a recursive manner, called recursive partitioning.
  • Branch / Sub-Tree: A subsection of a decision tree is called a branch or sub-tree. It connects two nodes of the tree.
  • Decision Node: When a sub-node splits into further sub-nodes, then it is called a decision node.
  • Leaf/ Terminal Node: Nodes with no children (no further split) are called Leaf or Terminal nodes.

How does the Decision Tree algorithm work?

Working Algorithm

The basic idea behind any decision tree algorithm is as follows:

  1. Select the best attribute (feature) using an Attribute Selection Measure (ASM) to split the records.
  2. Make that attribute a decision node and break the dataset into smaller subsets.
  3. Build the tree by repeating this process recursively for each child until one of the following conditions is met (a minimal code sketch follows this list):
  • All the tuples belong to the same class (target value).
  • There are no more remaining attributes.
  • There are no more instances.
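
To make the recursion concrete, here is a minimal sketch of recursive partitioning in plain Python. It is only an illustration of the three stopping conditions, not how scikit-learn builds trees: the attribute selection is a placeholder, and the toy dataset is made up.

from collections import Counter

def build_tree(rows, attributes, target):
    # Stopping condition 3: there are no more instances.
    if not rows:
        return None
    classes = [row[target] for row in rows]
    # Stopping condition 1: all the tuples belong to the same class.
    if len(set(classes)) == 1:
        return classes[0]
    # Stopping condition 2: there are no more remaining attributes.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]  # majority class
    # Placeholder attribute selection: a real ASM would pick the attribute with
    # the best impurity reduction (information gain / Gini); here we take the first.
    best = attributes[0]
    # Make the attribute a decision node and break the dataset into subsets.
    node = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        node[best][value] = build_tree(subset, remaining, target)
    return node

# Hypothetical toy dataset: decide whether to play outside.
data = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
print(build_tree(data, ["outlook", "windy"], "play"))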

Before diving deeper into the concept, let’s understand impurity and the measures used to quantify it. This will help us understand the algorithm better.

Impurity, as its name suggests, means a mixture of two or more classes (heterogeneous) rather than a single class (homogeneous).

Credit: https://www.datasciencecentral.com

As seen in the figure, the raw data we start with is impure (Fig. A). By creating nodes (Fig. B), we try to make it purer at each level, until finally it is fully categorized (Fig. C).

There are several impurity measures, but in this story we will talk about only two of them:

  1. Entropy
  2. Gini index/ Gini impurity

Entropy

Entropy is the amount of information needed to accurately describe a sample. If a sample is homogeneous (pure), meaning all the elements are similar, the entropy is 0; if a sample is equally divided between classes, the entropy is at its maximum (1 for a two-class problem).

Entropy = -Σ pᵢ log₂(pᵢ), where pᵢ is the proportion of samples belonging to class i and the sum runs over the classes.
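
As a quick sanity check of the formula, here is a minimal sketch that computes the entropy of a sample, assuming the class labels are given as a plain Python list:

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = sum over classes of -p_i * log2(p_i), where p_i is the class proportion.
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

print(entropy(["yes"] * 8))               # 0.0 -> homogeneous (pure) sample
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0 -> equally divided sample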

Gini index / Gini impurity

The Gini index is a measure of inequality in a sample. Its value lies between 0 and 1: a Gini index of 0 means the sample is perfectly homogeneous (pure), with all elements similar, whereas larger values mean greater inequality among the elements (for a two-class problem the maximum is 0.5). It is computed as one minus the sum of the squared probabilities of each class. It is illustrated as,

Gini = 1 - Σ pᵢ², where pᵢ is the proportion of samples belonging to class i and the sum runs over the classes.
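
The same kind of sanity check works for the Gini index (a minimal sketch, again assuming the labels come as a plain Python list):

from collections import Counter

def gini(labels):
    # Gini impurity = 1 - sum(p_i^2) over the classes present in the sample.
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["yes"] * 8))               # 0.0 -> perfectly homogeneous (pure) sample
print(gini(["yes"] * 4 + ["no"] * 4))  # 0.5 -> maximally mixed two-class sample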

Construction of Decision Tree:

Step 1: Calculate the entropy of the target.

Step 2: The dataset is then split on the different attributes. The entropy of each branch is calculated and added proportionally (weighted by branch size) to get the total entropy of the split. The resulting entropy is subtracted from the entropy before the split; the result is the Information Gain, or decrease in entropy.

Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches and repeat the same process on every branch.

Step 4a: A branch with entropy of 0 is a leaf node.

Step 4b: A branch with entropy more than 0 needs further splitting.
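
Steps 1 to 3 can be traced with a small sketch. The dataset below is a made-up play-tennis style example; the information gain of a candidate split is the entropy of the target before the split minus the weighted entropy of the resulting branches:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, branches):
    # Information Gain = entropy before the split - weighted entropy after the split.
    total = len(parent_labels)
    weighted = sum(len(branch) / total * entropy(branch) for branch in branches)
    return entropy(parent_labels) - weighted

# Step 1: entropy of the target (a made-up sample of 9 "yes" and 5 "no").
target = ["yes"] * 9 + ["no"] * 5
print(round(entropy(target), 3))  # ~0.94

# Step 2: a candidate split on a hypothetical attribute with three branches.
branches = [
    ["yes"] * 2 + ["no"] * 3,  # e.g. outlook = sunny
    ["yes"] * 4,               # e.g. outlook = overcast
    ["yes"] * 3 + ["no"] * 2,  # e.g. outlook = rainy
]
print(round(information_gain(target, branches), 3))  # ~0.247

# Step 3: the attribute with the largest information gain becomes the decision node.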

Now that we have covered the necessary intuition behind the Decision Tree algorithm, let’s look at its code. Note that the iris dataset used below is just a stand-in for your own training and test data.

# Importing important libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load an example dataset and split it (iris is only a stand-in for your own data)
X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=42)
# Make a model
model = DecisionTreeClassifier()
# Fit the model with the training data
model.fit(train_x, train_y)
# Depth of the decision tree
print('Depth of the Decision Tree :', model.get_depth())
# Predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)
# Accuracy score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)
# Predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)
# Accuracy score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

Decision Tree — Overfitting

Overfitting is a significant practical difficulty for decision tree models and many other predictive models. Overfitting happens when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error. There are several approaches to avoiding overfitting in building decision trees.

  • Pre-pruning stops growing the tree early, before it perfectly classifies the training set.
  • Post-pruning allows the tree to perfectly classify the training set and then prunes it back.

In practice, the second approach of post-pruning overfit trees is more successful because it is not easy to estimate precisely when to stop growing the tree. Another way to address the overfitting problem of decision trees is to use a different technique altogether, the Random Forest classifier.
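
As a rough sketch of how these two approaches look in scikit-learn (the iris data and the specific parameter values here are just placeholders): pre-pruning corresponds to growth constraints such as max_depth or min_samples_leaf, while post-pruning can be done with cost-complexity pruning via ccp_alpha.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data; iris is only a stand-in for your own training set.
X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=42)

# Pre-pruning: stop growing the tree early via depth / leaf-size constraints.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre_pruned.fit(train_x, train_y)

# Post-pruning: compute the cost-complexity pruning path of the fully grown tree,
# then refit with a chosen ccp_alpha (in practice, pick it by cross-validation).
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(train_x, train_y)
post_pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=42)
post_pruned.fit(train_x, train_y)

print('Pre-pruned depth  :', pre_pruned.get_depth())
print('Post-pruned depth :', post_pruned.get_depth())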

Advantages and Disadvantages of Decision Trees in Machine Learning

Decision Trees are used to solve both classification and regression problems, but their main drawback is that they tend to overfit the data. Let us discuss their advantages and disadvantages in detail.

Advantages of Decision Tree

1. Clear Visualization: The algorithm is simple to understand, interpret and visualize, since it mirrors the way we make decisions in our daily lives. The output of a Decision Tree can be easily interpreted by humans.

2. Simple and easy to understand: A Decision Tree reads like simple if-else statements, which are very easy to understand (see the sketch after this list for a fitted tree printed as such rules).

3. Decision Trees can be used for both classification and regression problems.

4. Decision Trees can handle both continuous and categorical variables.

5. No feature scaling required: No feature scaling (standardization and normalization) is required in the case of Decision Tree as it uses a rule-based approach instead of distance calculation.

6. Decision Trees can handle missing values in some implementations (for example, CART with surrogate splits), though support for this depends on the library.

7. Decision Tree is usually robust to outliers and can handle them automatically.

8. Less Training Period: The training period is shorter compared to Random Forest because it generates only one tree, unlike the forest of trees in a Random Forest.
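
To make the first two advantages concrete, scikit-learn can print a fitted tree as nested if-else rules (a minimal sketch; the iris dataset and max_depth=2 are just placeholders):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on an example dataset and print it as human-readable rules.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)
print(export_text(model, feature_names=list(iris.feature_names)))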

Disadvantages of Decision Tree

1. Overfitting: This is the main problem of the Decision Tree. It generally leads to overfitting of the data, which ultimately leads to wrong predictions. In order to fit the data (even noisy data), it keeps generating new nodes, and ultimately the tree becomes too complex to interpret. In this way, it loses its generalization capability: it performs very well on the training data but starts making a lot of mistakes on unseen data.

2. High variance: As mentioned in point 1, a Decision Tree generally leads to overfitting of the data. Overfitting drives the bias on the training data toward zero, but at the cost of very high variance in the output, which leads to many errors in the final estimation and high inaccuracy in the results.

3. Unstable: Adding a new data point can lead to regeneration of the overall tree, so all nodes may need to be recalculated and recreated.

4. Affected by noise: A little noise can make the tree unstable, which leads to wrong predictions.

5. Not suitable for large datasets: If the data size is large, then one single tree may grow complex and lead to overfitting. So in this case, we should use Random Forest instead of a single Decision Tree.

To overcome the limitations of the Decision Tree, we can use a Random Forest, which does not rely on a single tree. It creates a forest of trees and takes its decision based on the vote count. Random Forest is based on the bagging method, which is one of the ensemble learning techniques; we will discuss it in a future blog.

I hope this article helped you understand the working behind the Decision Tree classifier. Comment your thoughts, feedback or suggestions below. The next blog will be on Random Forest, hope to see you soon! ✌️
