Research models used by software engineers to design and analyse ML, including:
decision trees
Think about how you make decisions in your daily life. If it's raining, you take an umbrella. If it's sunny, you put on sunscreen. These "if-then" decisions form a branching pattern that looks remarkably like a tree when drawn out - and that's exactly what a decision tree is in machine learning.
A decision tree is a model that makes predictions by following a series of questions and answers, branching at each step until it reaches a conclusion.
Let's look at a simple example of a decision tree that predicts whether someone will play tennis based on weather conditions:
In this tree:
Each internal node represents a question about a feature (like "Is it sunny?")
Each branch represents an answer to that question
Each leaf node represents the final decision or prediction
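The tennis example maps directly onto nested branching in code. Here is a minimal sketch of one such tree in Python (the specific questions and outcomes are illustrative, based on the classic "play tennis" teaching example, not a real dataset):

```python
def will_play_tennis(outlook, humidity, wind):
    """Toy decision tree for the 'play tennis' example.

    Each if-statement is an internal node asking a question about
    one feature; each return statement is a leaf holding the
    final prediction.
    """
    if outlook == "sunny":            # root node: question about Outlook
        if humidity == "high":        # internal node: question about Humidity
            return "don't play"
        return "play"
    elif outlook == "overcast":       # leaf: overcast days always predict "play"
        return "play"
    else:                             # remaining branch: rainy days
        if wind == "strong":          # internal node: question about Wind
            return "don't play"
        return "play"

print(will_play_tennis("sunny", "high", "weak"))       # don't play
print(will_play_tennis("overcast", "high", "strong"))  # play
```

Notice that making a prediction just means following one path from the root to a leaf, answering one question at each branch.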
The "Slice of ML" interactive tool from Google lets you see how a decision tree classifies animals based on their features. Click through it and observe how the tree makes decisions at each step:
As you explore, notice how:
The tree asks questions about features like "Does it have fins?"
Each question splits the animals into more specific groups
Eventually, the tree can identify the type of animal based on its features
When a software engineer designs a decision tree for a machine learning application, they follow a process that looks something like this:
Gather and Prepare Data: First, they collect data with features (attributes) and known outcomes. For example, in a loan approval system, features might include income, credit score, and employment history, with outcomes of "approved" or "denied".
Feature Selection: Next, they determine which features are most relevant to the prediction task. Not all features are equally useful - using too many can lead to overcomplicated trees.
Tree Construction: Using algorithms like ID3, C4.5, or CART, they build the tree by determining which questions to ask at each node. These algorithms generally work by:
Calculating which feature provides the most information gain (helps most with classification)
Creating a node that splits on that feature
Recursively continuing this process for each resulting subset of data
Pruning: To prevent overfitting (when a model works well on training data but poorly on new data), engineers often "prune" branches that don't significantly improve predictions.
Validation and Testing: Finally, they test the tree on new data to ensure it generalizes well to cases it hasn't seen before.
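The "information gain" idea from the tree construction step can be made concrete with a short calculation. The sketch below computes entropy and information gain in the ID3 style on a tiny made-up dataset (the feature values and labels are purely illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction achieved by splitting on one feature."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the subsets after the split
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return base - remainder

# Tiny illustrative dataset: (outlook, wind) -> play?
rows = [("sunny", "weak"), ("sunny", "strong"),
        ("rainy", "weak"), ("rainy", "strong")]
labels = ["yes", "yes", "no", "no"]

print(information_gain(rows, labels, 0))  # gain from splitting on Outlook: 1.0
print(information_gain(rows, labels, 1))  # gain from splitting on Wind: 0.0
```

Here Outlook perfectly separates the classes (gain of 1.0 bit) while Wind tells us nothing (gain of 0.0), so a greedy algorithm would place Outlook at the root and then recurse on each subset, exactly as described above.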
Decision trees aren't just theoretical constructs - they're used in numerous real-world applications:
In finance:
Loan approval systems that assess creditworthiness
Fraud detection tools that flag suspicious transactions
Customer segmentation for targeted financial products
In healthcare:
Diagnostic support systems that help identify diseases based on symptoms
Treatment recommendation systems
Patient risk assessment tools
In customer service and support:
Troubleshooting workflows that guide technicians through repair processes
Chatbots that direct customers to appropriate resources
In science and conservation:
Species identification based on observed characteristics
Environmental risk assessment
Like any tool, decision trees have their strengths and limitations that software engineers need to consider:
Strengths:
Transparency: You can follow the logic of a decision tree step by step, making it easy to explain and interpret
Minimal data preparation: They can handle numerical and categorical data without much preprocessing
Intuitive representation: Non-technical stakeholders can understand them
Handle missing values: Many decision tree algorithms can work around missing data
Limitations:
Overfitting: They can create overly complex trees that don't generalize well to new data
Instability: Small changes in data can result in completely different trees
Bias in splitting: Common split criteria can favour features with many distinct values, and imbalanced data can skew predictions toward the dominant class
Limited for complex relationships: They may struggle with tasks that involve subtle patterns or complex interactions
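The overfitting limitation - which the pruning step described earlier is designed to counter - can be seen even without a full tree learner. Here is a toy sketch (all data is synthetic, and the "models" are deliberately extreme stand-ins) comparing a model that memorises every training row against a single-question "stump":

```python
import random

random.seed(0)

def make_data(n):
    """x in [0, 1); the true label is (x > 0.5), with 20% label noise."""
    data = []
    for _ in range(n):
        x = random.random()
        label = (x > 0.5) != (random.random() < 0.2)  # noisy ground truth
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# "Unpruned tree": memorises every training point exactly.
memory = {x: y for x, y in train}
def deep(x):
    return memory.get(x, False)  # unseen points fall back to a default

# "Pruned tree": a single split at x = 0.5 (a decision stump).
def stump(x):
    return x > 0.5

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(deep, train), accuracy(stump, train))  # memoriser wins on training data
print(accuracy(deep, test), accuracy(stump, test))    # stump generalises far better
```

The memorising model scores 100% on its training data but falls apart on new data, while the simpler stump scores roughly the same on both - which is exactly why engineers prune trees rather than let them grow to fit every training example.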
To see how decision trees work in a more complex scenario, try this interactive game that shows how decision trees can be used in medical diagnosis:
https://learn.concord.org/resources/1241/trees-in-a-diagnosis-game
Partner up and plan out a decision tree of your own. Then use branching (if/elif/else statements) in Python to implement it.