-
Feature scaling is a technique used to standardize or normalize the range of independent variables (features) in a dataset.
-
It is an essential step in the preprocessing pipeline for many machine learning algorithms, particularly those that use distance-based calculations (e.g., k-NN, SVM, logistic regression, and neural networks).
-
When features are on different scales (e.g., age might range from 0-100, while income ranges in the thousands), models may be biased toward the higher magnitude features.
-
To avoid this, feature scaling brings all the features to a common scale without distorting differences in the ranges of values.
Common Methods of Feature Scaling
- Min-Max Scaling (Normalization): This technique scales the data between a fixed range, usually 0 and 1.
Formula: $$ X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} $$
Implementation:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
- Z-score Normalization (Standardization): This method scales the data such that the mean is 0 and the standard deviation is 1. It is also known as standard scaling or Z-score normalization.
Formula: $$ X_{\text{scaled}} = \frac{X - \mu}{\sigma} $$
When to use: Best when features follow a Gaussian distribution (normal distribution).
Implementation:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
- MaxAbs Scaling: This scales the data by its maximum absolute value, preserving the sign of the data (useful for data that is already centered around 0).
Formula: $$ X_{\text{scaled}} = \frac{X}{|X_{\max}|} $$
Implementation:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
- Robust Scaling: This technique is less sensitive to outliers as it uses the median and interquartile range (IQR) instead of the mean and standard deviation for scaling.
Formula: $$ X_{\text{scaled}} = \frac{X - \text{median}}{\text{IQR}} $$
Implementation:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
Example of Feature Scaling:
Here’s an example using StandardScaler and MinMaxScaler:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Example dataset
data = {'Age': [25, 45, 35, 50],
'Income': [50000, 64000, 58000, 62000]}
df = pd.DataFrame(data)
# Using StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)
# Using MinMaxScaler
scaler = MinMaxScaler()
df_normalized = scaler.fit_transform(df)
print("Standard Scaled Data:\n", df_scaled)
print("Min-Max Scaled Data:\n", df_normalized)
When to Use Feature Scaling:
Algorithms based on distance: Algorithms like k-NN, K-Means, SVM, and PCA rely on the distance between points and can be influenced by the scale of features.
Gradient-based algorithms: Algorithms like gradient descent (used in linear regression, logistic regression, and neural networks) converge faster with feature scaling.
Neural networks: Feature scaling improves performance because it ensures faster convergence and avoids issues with large weight updates.
When Not to Use Feature Scaling:
Tree-based algorithms: Algorithms like Decision Trees, Random Forests, and XGBoost do not require feature scaling because they are not distance-based and can handle features with varying magnitudes.
Feature scaling ensures that your model performs optimally, treating each feature equally and helping distance- or gradient-based algorithms perform well.