feature-scaling | TNPSC Fuhrer Notes

Feature scaling is a technique used to standardize or normalize the range of independent variables (features) in a dataset.
It is an essential step in the preprocessing pipeline for many machine learning algorithms, particularly those that rely on distance calculations (e.g., k-NN and SVM) or on gradient-based optimization (e.g., logistic regression and neural networks).
When features are on different scales (e.g., age might range from 0-100, while income ranges in the thousands), models may be biased toward the higher-magnitude features.
To avoid this, feature scaling brings all features to a comparable scale while preserving the relative differences between values within each feature.
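
To make the bias concrete, the sketch below computes the Euclidean distance between two hypothetical people before and after scaling. The values and the assumed feature ranges (age 0-100, income 0-100000) are illustrative only and do not come from a real dataset.

```python
import numpy as np

# Two hypothetical people: [age in years, income in currency units]
a = np.array([25.0, 50000.0])
b = np.array([50.0, 50500.0])

# Unscaled Euclidean distance: the income gap (500) swamps the age gap (25)
raw_distance = np.linalg.norm(a - b)

# Min-max scale each feature using the assumed ranges
mins = np.array([0.0, 0.0])
maxs = np.array([100.0, 100000.0])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
scaled_distance = np.linalg.norm(a_scaled - b_scaled)

print(raw_distance)     # about 500.6, almost entirely from income
print(scaled_distance)  # about 0.25, almost entirely from age
```

After scaling, the 25-year age gap dominates the distance, whereas before scaling it was nearly invisible next to a small income difference.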

Common Methods of Feature Scaling

Min-Max Scaling (Normalization): This technique scales the data between a fixed range, usually 0 and 1.

Formula: $$ X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} $$

Implementation:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
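
As a quick check of the formula, the following sketch applies it by hand with NumPy and compares the result with `MinMaxScaler`. The array `X` is a made-up single column of ages, used here only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature column (ages), shaped (n_samples, 1)
X = np.array([[25.0], [45.0], [35.0], [50.0]])

# Apply (X - X_min) / (X_max - X_min) column by column
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Same computation through scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual.ravel())                  # values: 0.0, 0.8, 0.4, 1.0
print(np.allclose(X_manual, X_sklearn))  # True
```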

Z-score Normalization (Standardization): This method rescales each feature so that its mean is 0 and its standard deviation is 1. It is also known as standard scaling.

Formula: $$ X_{\text{scaled}} = \frac{X - \mu}{\sigma} $$

When to use: Works well when features are roughly Gaussian (normally distributed). Unlike min-max scaling, the result is not bounded to a fixed range.

Implementation:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
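
The sketch below reproduces the formula by hand and checks it against `StandardScaler`. The data is a made-up column of ages. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default, whereas pandas' `.std()` defaults to `ddof=1`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature column, shaped (n_samples, 1)
X = np.array([[25.0], [45.0], [35.0], [50.0]])

mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population standard deviation (ddof=0)

# Apply (X - mu) / sigma column by column
X_manual = (X - mu) / sigma
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))         # True
print(X_manual.mean(axis=0), X_manual.std(axis=0))  # approximately 0 and 1
```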

MaxAbs Scaling: This scales each feature by its maximum absolute value, so the results fall in the range [-1, 1]. The data is not shifted, so the sign of each value is preserved (useful for data that is already centered around 0).

Formula: $$ X_{\text{scaled}} = \frac{X}{\max(|X|)} $$

Implementation:

from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
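
A small sketch, using made-up values, of dividing by the largest absolute value. The example deliberately puts the largest magnitude on a negative number to show that the divisor is the maximum of the absolute values, not the absolute value of the maximum.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical column whose largest magnitude is negative
X = np.array([[-4.0], [2.0], [1.0]])

# Divide each column by the largest absolute value in that column (here 4)
X_manual = X / np.abs(X).max(axis=0)
X_sklearn = MaxAbsScaler().fit_transform(X)

print(X_manual.ravel())                  # values: -1.0, 0.5, 0.25
print(np.allclose(X_manual, X_sklearn))  # True
```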

Robust Scaling: This technique is less sensitive to outliers as it uses the median and interquartile range (IQR) instead of the mean and standard deviation for scaling.

Formula: $$ X_{\text{scaled}} = \frac{X - \text{median}}{\text{IQR}} $$

Implementation:

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
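
To illustrate the behaviour with an outlier, the sketch below applies the median/IQR formula by hand to a made-up column containing one extreme value and compares it with `RobustScaler` at its default settings (25th to 75th percentile range).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical column with one extreme outlier (100)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

median = np.median(X, axis=0)                  # 3
q1, q3 = np.percentile(X, [25, 75], axis=0)    # 2 and 4
iqr = q3 - q1                                  # 2

# Apply (X - median) / IQR column by column
X_manual = (X - median) / iqr
X_sklearn = RobustScaler().fit_transform(X)

print(X_manual.ravel())                  # values: -1.0, -0.5, 0.0, 0.5, 48.5
print(np.allclose(X_manual, X_sklearn))  # True
```

The outlier still maps to a large value, but it does not affect the median or IQR, so the four ordinary points keep a sensible spread.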

Example of Feature Scaling:

Here’s an example using StandardScaler and MinMaxScaler:

import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Example dataset
data = {'Age': [25, 45, 35, 50],
        'Income': [50000, 64000, 58000, 62000]}

df = pd.DataFrame(data)

# Using StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Using MinMaxScaler
scaler = MinMaxScaler()
df_normalized = scaler.fit_transform(df)

print("Standard Scaled Data:\n", df_scaled)
print("Min-Max Scaled Data:\n", df_normalized)
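
Note that `fit_transform` returns a NumPy array, so the column names are lost. One possible way to keep them, sketched here with the same example data, is to wrap the result back into a DataFrame. The variable name `df_minmax` is an illustrative choice.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same example dataset as above
df = pd.DataFrame({'Age': [25, 45, 35, 50],
                   'Income': [50000, 64000, 58000, 62000]})

# Wrap the scaled array in a DataFrame to keep column names and index
scaled_array = MinMaxScaler().fit_transform(df)
df_minmax = pd.DataFrame(scaled_array, columns=df.columns, index=df.index)

print(df_minmax)
```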

When to Use Feature Scaling:

Algorithms based on distance or variance: Algorithms like k-NN, K-Means, and SVM rely on the distance between points, and PCA relies on the variance of each feature, so all of them are influenced by the scale of features.

Gradient-based algorithms: Models trained with gradient descent (such as linear regression, logistic regression, and neural networks) converge faster when features are scaled.

Neural networks: Feature scaling typically speeds up convergence and helps avoid unstable, overly large weight updates.
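
The convergence claim can be illustrated with a rough sketch: plain batch gradient descent on a linear model, run for the same number of steps on unscaled and standardized data. The synthetic data, the helper `gradient_descent_mse`, and both learning rates are assumptions chosen for this illustration, not a benchmark.

```python
import numpy as np

def gradient_descent_mse(X, y, lr, steps):
    """Run batch gradient descent on a linear model and return the final MSE."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        residual = X @ w + b - y
        w -= lr * (2.0 / n_samples) * (X.T @ residual)
        b -= lr * (2.0 / n_samples) * residual.sum()
    return float(np.mean((X @ w + b - y) ** 2))

# Synthetic data (illustrative assumption): age and income drive a noisy target
rng = np.random.default_rng(0)
age = rng.uniform(20, 60, size=200)
income = rng.uniform(30_000, 90_000, size=200)
y = 0.5 * age + 0.0001 * income + rng.normal(0, 1, size=200)
X_raw = np.column_stack([age, income])

# Standardize each column: (X - mean) / std
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# The unscaled data needs a tiny learning rate to stay stable, while the
# standardized data tolerates a much larger one. Both are hand-picked here.
mse_raw = gradient_descent_mse(X_raw, y, lr=1e-10, steps=1000)
mse_std = gradient_descent_mse(X_std, y, lr=0.1, steps=1000)

print("Final MSE, unscaled:", mse_raw)
print("Final MSE, standardized:", mse_std)
```

With unscaled inputs, the large income values force a learning rate so small that the age weight and the intercept barely move in 1000 steps, so the error stays well above the noise level that the standardized run reaches.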

When Not to Use Feature Scaling:

Tree-based algorithms: Algorithms like Decision Trees, Random Forests, and XGBoost do not require feature scaling. They split on one feature at a time using thresholds, so the relative magnitudes of different features do not change which splits are chosen.
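
A minimal sketch of this insensitivity: the same decision tree is trained on raw and on standardized versions of a synthetic dataset, and the test predictions are compared. The dataset, the inflated first column, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; inflate one column so the features differ widely in magnitude
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X[:, 0] *= 1000.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler().fit(X_train)

tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Fraction of test points where both trees agree; expected to be at or near 1.0
agreement = np.mean(pred_raw == pred_scaled)
print("Prediction agreement:", agreement)
```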

Feature scaling puts all features on a comparable footing so that no single feature dominates because of its units, which helps distance- and gradient-based algorithms perform well.
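
As a closing illustration for a distance-based model, the sketch below compares cross-validated k-NN accuracy with and without standardization on scikit-learn's built-in wine dataset, whose features span very different ranges. Using `make_pipeline` is one reasonable way to make sure the scaler is fitted only on each training fold. The choice of dataset and `n_neighbors=5` is an assumption for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# k-NN on raw features versus k-NN preceded by standardization
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Mean accuracy over 5 cross-validation folds
acc_raw = cross_val_score(knn_raw, X, y, cv=5).mean()
acc_scaled = cross_val_score(knn_scaled, X, y, cv=5).mean()

print("k-NN accuracy, unscaled:", acc_raw)
print("k-NN accuracy, standardized:", acc_scaled)
```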