linear-scaling

Linear scaling (more commonly shortened to just scaling) means converting floating-point values from their natural range into a standard range—usually 0 to 1 or -1 to +1.

Linear scaling is a good choice when all of the following conditions are met:

  • The lower and upper bounds of your data don’t change much over time.
  • The feature contains few or no outliers, and those outliers aren’t extreme.
  • The feature is approximately uniformly distributed across its range. That is, a histogram would show roughly even bars for most values.

Suppose human age is a feature. Linear scaling is a good normalization technique for age because:

  • The approximate lower and upper bounds are 0 to 100.
  • age contains a relatively small percentage of outliers. Only about 0.3% of the population is over 100.
  • Although certain ages are somewhat better represented than others, a large dataset should contain sufficient examples of all ages.

Note :

  • Most real-world features do not meet all of the criteria for linear scaling.
  • [[Z-score]] scaling is typically a better normalization choice than linear scaling.