Supervised learning is a cornerstone of machine learning where labeled data (data paired with the correct answer) is used to train a model to make predictions on unseen data.
Think of it as teaching a child with examples.
You give a computer lots of labeled data (for instance, pictures labeled “cat” or “dog”), so it can learn what makes each category different.
Over time, the computer figures out the patterns in these labeled examples.
Then, when it sees new data—like a picture it’s never seen—it can guess the correct label based on what it learned.
Essentially, supervised learning means the computer is “supervised” by having the right answers during its training, just like a student learning with the help of a teacher’s marked examples.
In this section, we will walk you through the end-to-end process—from gathering and preparing your data to choosing a model and training it with code.
Acquiring Data
You might download a publicly available dataset (e.g., from Kaggle) or collect your own.
Data can arrive in various formats: CSVs, Excel, databases, or even text files.
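As a minimal sketch, pandas can read most of these formats with a single call. The inline CSV below is a made-up placeholder so the snippet runs as-is; in practice you would point `read_csv` at your own file path or URL.

```python
import io
import pandas as pd

# A small inline CSV stands in for a real file so this example runs on its own.
csv_text = """size_sqft,bedrooms,price
1200,3,250000
950,2,180000
1800,4,420000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Other common loaders (paths shown are placeholders):
# df = pd.read_excel("data.xlsx")
# df = pd.read_sql("SELECT * FROM houses", connection)

print(df.head())
print(df.info())
```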
Handling Missing Values
Detect: Check for NaN (Not a Number) or blank entries.
Deal With Them: Drop the affected rows or fill (impute) the gaps, because some ML models can’t handle missing data at all, while others degrade in accuracy if large chunks are missing.
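Here is a small, hedged sketch of both steps using pandas; the column names and values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy DataFrame with a few gaps
df = pd.DataFrame({
    "size_sqft": [1200, np.nan, 950, 1800],
    "price": [250000, 310000, np.nan, 420000],
})

# Detect: count missing entries per column
print(df.isna().sum())

# Option 1: drop any row that contains a missing value
df_dropped = df.dropna()

# Option 2: impute, e.g., fill each column's gaps with that column's mean
df_imputed = df.fillna(df.mean(numeric_only=True))
```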
Categorical Encoding
If you have columns like “City” or “Color,” you can’t directly feed them as strings into most models.
Label Encoding: Assign each category a numeric code (e.g., Red=0, Blue=1, Green=2).
One-Hot Encoding: Create separate binary columns (e.g., is_Red, is_Blue, is_Green), each set to 1 or 0. This approach prevents the model from assuming a numeric relationship (like Green > Red).
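A quick sketch of both encodings in pandas, reusing the Red/Blue/Green example from above (the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue"]})

# Label encoding: each category becomes an integer code (assigned in order of appearance)
df["Color_code"] = pd.factorize(df["Color"])[0]

# One-hot encoding: one binary column per category (is_Blue, is_Green, is_Red)
df_onehot = pd.get_dummies(df["Color"], prefix="is")

print(df)
print(df_onehot)
```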
Train-Test Split
Why: Holding back part of the data lets you evaluate the model on examples it never saw during training.
How: Often a simple 80/20 or 70/30 split. In Python, you can do:
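For example, using scikit-learn's train_test_split (Iris is used here only as a stand-in dataset; any feature matrix X and target vector y will work):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.2 gives the common 80/20 split; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```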
Linear Regression
Use Case: Predicting a continuous value (e.g., house price, sales revenue).
How It Works: Finds a line (or hyperplane in multiple dimensions) that best fits your data, minimizing the difference between predicted and actual values.
Advantages: Simple, fast, easily interpretable.
Limitations: Not ideal for highly complex relationships or data with lots of nonlinear patterns.
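As a minimal illustration of that interpretability, here is a tiny fit on made-up house-size data; the numbers are invented and only meant to show how the slope and intercept read off the fitted line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (sqft) vs. price
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([150000, 185000, 220000, 275000, 330000])

model = LinearRegression()
model.fit(X, y)

# The fitted line is easy to interpret: one slope, one intercept
print("price per extra sqft:", model.coef_[0])
print("baseline price:", model.intercept_)
print("predicted price for 1,300 sqft:", model.predict([[1300]])[0])
```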
Decision Trees
Use Case: Can do both regression (predict a number) and classification (predict a category).
How It Works: Splits the data into branches based on feature values, creating a tree structure that aims to group similar outcomes together.
Advantages: Easy to visualize, handles nonlinear data well, and works with both numerical and categorical features.
Limitations: Prone to overfitting if the tree grows too large.
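A short sketch of a decision tree classifier, again using Iris as a stand-in dataset; note the explicit max_depth, which is one way to keep the tree from growing too large.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_depth limits how far the tree can grow, which helps curb overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
```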
If you suspect your data has a linear trend (e.g., weight vs. height), a linear model might suffice.
If your data is more complex or has clear splits (e.g., if-then rules), a decision tree could capture the patterns better.
Let’s illustrate a basic regression approach with the popular Iris dataset—though Iris is commonly used for classification, you can adapt a similar workflow for house prices or any other regression/classification tasks.
Note: If using a house price dataset, simply replace the loading steps. For the Iris dataset, we’ll pretend we want to predict the petal_length based on other features.
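A sketch of that workflow is below. It loads Iris through scikit-learn (where the column is named "petal length (cm)" rather than petal_length), splits the data, fits a linear regression, and reports the mean squared error.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load Iris as a DataFrame
iris = load_iris(as_frame=True)
df = iris.frame

# Target: petal length; features: every other measurement
y = df["petal length (cm)"]
X = df.drop(columns=["petal length (cm)", "target"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
```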
Look at the MSE value. Lower is better. Compare it against the scale of your target variable. For example, if your petal_length ranges from 1 to 6, an MSE of 0.2 might be decent, while 5.0 would be terrible.
If using Decision Trees, consider tuning parameters like max_depth or min_samples_split to avoid overfitting.
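One common way to tune those parameters is a small grid search with cross-validation; the parameter values below are arbitrary starting points, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Reuse the petal-length regression setup from above
iris = load_iris(as_frame=True)
df = iris.frame
y = df["petal length (cm)"]
X = df.drop(columns=["petal length (cm)", "target"])

# Try several depth and split-size settings to limit overfitting
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_split": [2, 5, 10]}
search = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
```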