Scaling and outlier adjustments of clustering features
The range of values for clustering features can vary substantially. For example, Seasonality varies within a narrow range between 0 and 1, while Spikiness can reach values as large as 5 × 10^8. When features have such disparate ranges, the features with the largest values can dominate how similar two time series appear. The clustering algorithm then struggles to group the time series meaningfully and produces poor clustering results. To address this problem, the algorithm adjusts outliers and scales the feature scores.
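The following sketch shows why unscaled ranges cause trouble. It assumes a distance-based method such as k-means with Euclidean distance (this section does not name the clustering algorithm), and the feature values for the three series are hypothetical.

```python
import math

# Hypothetical (Seasonality, Spikiness) feature vectors for three series.
series_a = (0.05, 2.0e8)
series_b = (0.95, 2.0e8)  # almost the full 0-1 Seasonality range away from A
series_c = (0.05, 2.1e8)  # same Seasonality as A, Spikiness only 5% higher

dist_ab = math.dist(series_a, series_b)  # about 0.9
dist_ac = math.dist(series_a, series_c)  # about 1e7

# Unscaled, A looks vastly closer to B than to C: the large Seasonality
# difference is invisible next to the raw Spikiness magnitude.
print(f"A-B: {dist_ab:.2f}, A-C: {dist_ac:.3g}")
```

Relative to each feature's own range, C is the series most similar to A, yet the raw distances say the opposite. Scaling the features is what removes this distortion.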
The following features can contain outliers with extremely low or high values:
- Spikiness
- Lumpiness
- Variance
- Variance Change
- Peak
- Trough
For these features, outliers are adjusted before scaling is performed: values below the 10th percentile (the 0.1 quantile) are raised to that level, and values above the 90th percentile (the 0.9 quantile) are lowered to that level.
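A minimal sketch of this capping step (often called winsorization) follows. The helper names `quantile` and `cap_outliers` are hypothetical, and the quantile rule (linear interpolation between sorted values, the same default that NumPy uses) is an assumption, since the section does not say how the 10% and 90% levels are computed.

```python
import math

def quantile(values, q):
    """Return the q-quantile of values, interpolating linearly between
    neighbouring sorted values (assumed rule, matches NumPy's default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def cap_outliers(values, lower_q=0.10, upper_q=0.90):
    """Clip every value into the range [lower_q quantile, upper_q quantile]."""
    low = quantile(values, lower_q)
    high = quantile(values, upper_q)
    return [min(max(v, low), high) for v in values]

# Hypothetical Spikiness scores for ten time series; the last one is an outlier.
spikiness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
capped = cap_outliers(spikiness)
# The smallest value is raised to about 1.9 and the outlier 1000 is
# lowered to about 108.1; the values in between are unchanged.
print(capped)
```

Capping rather than deleting outliers keeps every time series in the clustering input while limiting how far any single extreme value can stretch a feature's range.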
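Once outliers are adjusted for these features, all feature scores are scaled to the standard normal form (z-scores): the feature's mean is subtracted from each value and the result is divided by the feature's standard deviation, so every feature ends up with a mean of 0 and a standard deviation of 1. For features that are approximately normally distributed, about 99.73% of the scaled values then lie between -3 and 3. Because all features are placed on a comparable scale, no single feature dominates the clustering, and these adjustments significantly improve clustering accuracy.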
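The sketch below illustrates the scaling step on its own. The function names are hypothetical, and using the population standard deviation and mapping a constant feature to all zeros are assumptions. It also shows that the 99.73% figure holds only for roughly normal data: a highly skewed feature can leave more of its scaled values outside the range -3 to 3.

```python
import random
import statistics

def standardize(values):
    """Scale values to z-scores: subtract the mean, divide by the std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        # A constant feature carries no information; map it to all zeros.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def share_within(z_scores, bound=3.0):
    """Fraction of scaled values that lie in [-bound, bound]."""
    return sum(-bound <= z <= bound for z in z_scores) / len(z_scores)

# Roughly normal feature: close to 99.73% of scaled values land in [-3, 3].
random.seed(0)
normal_feature = [random.gauss(50.0, 10.0) for _ in range(20_000)]
print(share_within(standardize(normal_feature)))

# Highly skewed feature: one extreme value among 100 scales to a z-score
# of almost 10, so only 99% of the values land in [-3, 3].
skewed_feature = [0.0] * 99 + [1.0]
print(share_within(standardize(skewed_feature)))
```

In practice the outlier capping that precedes this step limits how skewed the features can be, which is why the two adjustments are applied together.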
Last modified: Thursday December 19, 2024