Supervised Machine Learning is a fundamental aspect of artificial intelligence, involving the training of models on labeled datasets to generate predictions. The two primary categories of supervised learning tasks are Regression and Classification. Although they share the same foundational concepts, they have different objectives and employ different algorithms.
Regression in Supervised Learning
Regression is a technique for predictive modeling that aims to estimate continuous outcomes. It analyzes the relationship between independent variables (features) and a dependent variable (target). The main features of regression include:
- The output is a continuous variable (such as temperature, sales, or stock prices).
- It predicts numerical values based on the provided input features.
The following are commonly used regression algorithms (a short code sketch follows this list):
- Linear Regression: Finds the best-fit line for the data.
Equation:
\[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon\]where,
\(y\implies\) Dependent variable (e.g., house price)
\(x_i\implies\) Independent variables (e.g., size, location)
\(\beta_i\implies\) Coefficients/weights for the features
\(\epsilon\implies\) Error term (residual)
Example: Predicting house prices based on factors like size and location.
- Polynomial Regression: Models non-linear relationships by using polynomial terms, enhancing Linear Regression by adding higher-degree terms.
Equation:
\[y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + \dots + \beta_nx^n + \epsilon\]Example: Predicting sales growth over time.
- Support Vector Regression (SVR): Extends support vector machines for regression tasks.
SVR minimizes
\[\frac{1}{2}||w||^2 + C\sum_{i=1}^{n}\max(0,|y_i - (\langle {w,x_i} \rangle + b)| - \epsilon)\]where,
\(w\implies\) Weight vector
\(x_i\implies\) Input features
\(y_i\implies\) Target Values
\(b\implies\) Bias term
\(\epsilon\implies\) Epsilon-insensitive tube defining the tolerance for error
\(C\implies\) Regularization parameter controlling margin flexibility
Example: Predicting stock prices.
- Decision Trees/Random Forest Regressor: Splits data into decision nodes based on feature thresholds to predict a continuous outcome, choosing each split to maximize the reduction in variance:
\[\text{Variance Reduction} = \text{Variance(Parent Node)} - \sum_{i}\left(\frac{\text{Samples in Child Node i}}{\text{Samples in Parent Node}}\cdot\text{Variance(Child Node i)}\right)\]
Random Forest averages multiple decision trees to make a final prediction.
Example: Predicting rainfall levels.
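To make these algorithms concrete, here is a minimal, self-contained sketch (assuming scikit-learn and NumPy are installed; the synthetic data and hyperparameter values are purely illustrative, not from any dataset mentioned above) that fits Linear Regression, Polynomial Regression, and SVR on the same one-dimensional data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

# Illustrative synthetic data: a noisy quadratic trend
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = 2.0 + 1.5 * X.ravel() + 0.8 * X.ravel() ** 2 + rng.normal(0, 1.0, 80)

models = {
    "Linear Regression": LinearRegression(),
    # Polynomial Regression = polynomial features + a linear model
    "Polynomial Regression (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    # SVR with an epsilon-insensitive tube; C and epsilon are illustrative
    "SVR (RBF kernel)": SVR(kernel="rbf", C=10.0, epsilon=0.5),
}

for name, model in models.items():
    model.fit(X, y)                # learn weights from the training data
    pred = model.predict([[2.5]])  # predict a continuous value
    print(f"{name}: prediction at x = 2.5 -> {pred[0]:.2f}")
```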
Evaluation Metrics for Regression
- Mean Absolute Error (MAE): This metric assesses the average size of errors, disregarding their direction. \[MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y_i}|\]
- Mean Squared Error (MSE): This metric emphasizes larger errors by squaring the differences, thus imposing a greater penalty on them compared to smaller errors. \[MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y_i})^2\]
- Root Mean Squared Error (RMSE): This metric offers a value that is expressed in the same units as the target variable, facilitating interpretation. \[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y_i})^2}\]
- R-Squared (Coefficient of Determination): This statistic reflects the extent to which the model accounts for the variability in the target variable. \[R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y_i})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}\]
- Adjusted R-Squared: Adjusted R-squared penalizes the addition of non-significant predictors. If new predictors improve the model, Adjusted R-squared increases; otherwise, it decreases.
\[R_{adj}^{2} = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\]
where,
\(R^2\implies\) Coefficient of determination
\(n\implies\) Total no. of observations
\(k\implies\) Number of independent variables (predictors) in the model
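As a quick illustration (a minimal sketch with made-up actual and predicted values; scikit-learn is assumed), the first four metrics are one function call each, with Adjusted R-squared derived by hand from the formula above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative actual vs. predicted house prices ($)
y_true = np.array([350_000, 250_000, 500_000, 420_000, 300_000, 275_000])
y_pred = np.array([340_000, 262_000, 485_000, 431_000, 312_000, 260_000])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Adjusted R-squared: n observations, k predictors (k = 3 is an assumption,
# e.g., size, location score, and amenities score)
n, k = len(y_true), 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"MAE = {mae:.0f}, RMSE = {rmse:.0f}, R2 = {r2:.3f}, Adj. R2 = {adj_r2:.3f}")
```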
Example of Regression
Suppose we want to predict housing prices based on size, location, and amenities. A dataset could be structured as follows:
| House Size (sq ft) | Location Score | Amenities Score | Price ($) |
|---|---|---|---|
| 2000 | 8 | 7 | 350,000 |
| 1500 | 6 | 6 | 250,000 |
| 3000 | 9 | 8 | 500,000 |
Using Linear Regression, the model outputs a numerical price prediction for new input data.
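A minimal sketch of that workflow (scikit-learn assumed; the three training rows come directly from the table above, so the fit is purely illustrative, and the new house is a hypothetical input):

```python
from sklearn.linear_model import LinearRegression

# Features: [house size (sq ft), location score, amenities score]
X = [[2000, 8, 7],
     [1500, 6, 6],
     [3000, 9, 8]]
y = [350_000, 250_000, 500_000]  # price ($)

model = LinearRegression().fit(X, y)

# Predict the price of a new, unseen house (hypothetical values)
new_house = [[2200, 7, 8]]
print(f"Predicted price: ${model.predict(new_house)[0]:,.0f}")
```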
Regression Visualization:
A scatter plot of house size versus price, with a red regression line indicating how the predicted price rises with size.
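A plot like this can be reproduced in a few lines of matplotlib (a sketch; the data points are the three rows from the table above):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

sizes = np.array([2000, 1500, 3000]).reshape(-1, 1)  # house size (sq ft)
prices = np.array([350_000, 250_000, 500_000])       # price ($)

line = LinearRegression().fit(sizes, prices)
grid = np.linspace(1400, 3100, 50).reshape(-1, 1)

plt.scatter(sizes, prices, label="Houses")
plt.plot(grid, line.predict(grid), color="red", label="Regression line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($)")
plt.legend()
plt.show()
```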
Classification in Supervised Learning
Classification is a technique in predictive modeling that categorizes input data into established classes or categories. The main features of Classification within Supervised Learning include:
- The output is a categorical variable (for instance, spam/not spam or disease/healthy).
- It predicts discrete labels.
The following are commonly used classification algorithms (a short code sketch follows this list):
- Logistic Regression: This method estimates probabilities and assigns binary classes through the sigmoid function.
Equation:
\[P(y = 1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n)}}\]where,
\(P(y = 1|x)\implies\) Probability of class 1 (e.g., spam)
A decision threshold (e.g., 0.5) then determines the class.
Example: Predicting whether an email is spam or not.
- Support Vector Machines (SVM): This algorithm identifies the hyperplane that effectively separates different classes of data.
Equation:
Maximizes margins between classes:
\[\text{Maximize: } \frac{2}{||w||}\]Subject to:
\[y_i(\langle {w,x_i} \rangle + b) \geq 1\]where,
\(w\implies\) Weight vector
\(b\implies\) Bias term
\(y_i\implies\) Class label (+1 or -1)
Example: Classifying different types of flowers.
- Decision Trees/Random Forest Classifier: These algorithms build tree-like structures for classification, splitting data into nodes to maximize a purity criterion such as information gain (based on entropy) or the reduction in Gini impurity:
\[\text{Information Gain} = \text{Entropy(Parent Node)} - \sum_{i}\left(\frac{\text{Samples in Child Node i}}{\text{Samples in Parent Node}}\cdot\text{Entropy(Child Node i)}\right)\]
The Random Forest approach enhances classification accuracy by averaging the results of multiple decision trees.
Example: Predicting customer churn.
- K-Nearest Neighbors (KNN): This method classifies data points based on the majority vote from the k nearest neighbors.
Distance metrics (e.g., Euclidean distance):
\[d(x,x^{'}) = \sqrt{\sum_{i=1}^{n}(x_i - x_i^{'})^2}\]Example: Classifying handwritten digits.
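To ground these algorithms, here is a minimal sketch (scikit-learn assumed; the synthetic dataset and hyperparameters are illustrative) that trains Logistic Regression, SVM, and KNN on the same data and compares their test accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Illustrative synthetic binary-classification data
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "KNN (k=5, Euclidean)": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_train, y_train)        # learn from labeled examples
    acc = model.score(X_test, y_test)  # fraction of correctly labeled samples
    print(f"{name}: test accuracy = {acc:.2f}")
```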
Evaluation Metrics for Classification
- Accuracy: This metric assesses the overall correctness of the classification model. \[Accuracy = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}\]
- Precision: This indicates the proportion of predicted positive instances that are truly positive. \[Precision = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}\]
- Recall (Sensitivity or True Positive Rate): This evaluates the model's capability to detect all actual positive instances. \[Recall = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}\]
- F1-Score: This represents the harmonic mean of precision and recall, providing a balance between the two metrics. \[\text{F1-Score} = 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}\]
- Receiver Operating Characteristic (ROC) Curve and Area Under Curve (AUC): The ROC curve illustrates the relationship between the True Positive Rate and the False Positive Rate, while the AUC measures the model's overall effectiveness in classifying different categories.
- Log Loss (Cross-Entropy Loss): This metric penalizes the discrepancies between actual and predicted probabilities. \[\text{Log Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i \log(\hat{y_i}) + (1 - y_i)\log(1-\hat{y_i})\right)\]
- Confusion Matrix: This is a comprehensive table that summarizes true positives, true negatives, false positives, and false negatives, facilitating the calculation of various metrics.
- Specificity (True Negative Rate): This measures the model's ability to accurately identify negative instances. \[Specificity = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}\]
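As a brief illustration (made-up labels and probabilities; scikit-learn assumed), most of these metrics are one function call each:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             log_loss, precision_score, recall_score)

# Illustrative true labels and predictions (1 = spam, 0 = not spam)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3]  # predicted P(spam)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("Log Loss :", log_loss(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```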
Example of Classification
To classify emails as either spam or not spam, we could utilize a dataset that appears as follows:
| Email Length (words) | Links Count | Contains "Free"? | Spam/Not Spam |
|---|---|---|---|
| 120 | 2 | Yes | Spam |
| 50 | 0 | No | Not Spam |
| 200 | 5 | Yes | Spam |
Using Logistic Regression, the model outputs the probability of an email being spam, assigning it to a class based on a threshold (e.g., 0.5).
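A minimal sketch of that pipeline (the three training rows come from the table above, with Contains "Free"? encoded as 1/0; the new email is a hypothetical input, and such a tiny dataset is for illustration only):

```python
from sklearn.linear_model import LogisticRegression

# Features: [email length (words), links count, contains "Free" (1/0)]
X = [[120, 2, 1],
     [50, 0, 0],
     [200, 5, 1]]
y = [1, 0, 1]  # 1 = Spam, 0 = Not Spam

model = LogisticRegression().fit(X, y)

# Probability that a new, hypothetical email is spam
new_email = [[90, 3, 1]]
p_spam = model.predict_proba(new_email)[0][1]
print(f"P(spam) = {p_spam:.2f} -> {'Spam' if p_spam >= 0.5 else 'Not Spam'}")
```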
Classification Visualization:
A 2D scatter plot of the two categories, "Spam" and "Not Spam", separated by a black decision boundary, with shaded regions marking each classification zone.
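Such a plot can be sketched with matplotlib (synthetic 2D data; the two features stand in for illustrative email attributes):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two illustrative features (e.g., email length vs. links count)
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=7)
clf = LogisticRegression().fit(X, y)

# Evaluate the classifier over a grid to shade the two decision regions
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.2)                    # classification zones
plt.contour(xx, yy, zz, levels=[0.5], colors="black")  # decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y)                     # the two categories
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
```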
Conclusion
Regression and Classification are fundamental components of supervised learning, each tailored for distinct types of problems. Regression focuses on predicting continuous outcomes, whereas Classification is concerned with categorizing data points into specific labels. Grasping the distinctions between these methods is crucial for choosing the right strategy for your machine learning projects. By gaining proficiency in these techniques and their related algorithms, you can address a diverse range of practical challenges, such as predicting sales trends or identifying fraudulent activities.