Demand Clustering
Available for licensed Demand Guru users only, the Demand Guru Clustering action allows you to run cluster definitions created in Demand Guru’s Time Series Forecasting Workbench and save the output data generated from those cluster definitions.
When you use Demand Guru to create demand clusters, the cluster definitions are saved within the workbench. However, the output generated when you run those cluster definitions in Demand Guru is held only in memory and is not saved. Use this action to run the cluster definitions as configured in Demand Guru and generate output that can be saved in Data Guru.
On the Connections tab:
- Provide a name and description.
- Select the Demand Guru Workbench that contains the cluster definitions.
- Select the clusters to be run when the action is executed.
- Select output options, including the prefix to be used for output table names.
The most common application for this action is to use the demand cluster output as input to other actions in Data Guru.
Execute Demand Guru cluster definition
- Drag the Demand Guru Clustering icon onto the design surface.
- Enter a Name and a Description to identify this action.
- For Input, select a Workbench created in Demand Guru that contains cluster definitions. Note that all workbenches are listed here, regardless of whether they are configured to use clusters.
- For Cluster Definitions, choose the cluster definitions to be included in the output. The list includes all cluster definitions for the selected workbench, with two additional choices that are always available here:
- The Select All option is a checkbox that reflects one of three states, depending on how many clusters in the list are selected -
- Selected (traditional check mark) means that all clusters in the list are selected.
- Semi-selected (solid square) means that some, but not all, clusters in the list are selected.
- Unselected (empty) means that none of the clusters in the list are selected.
As you select or clear clusters in this list, the state of the Select All checkbox updates to reflect your current selections.
When you click this checkbox -
- If the checkbox is already selected or semi-selected, all clusters in the list become unselected.
- If the checkbox is unselected, all clusters in the list become selected.
- The Automatic cluster definition option allows you to let the application determine the relevant features and number of clusters.
To choose the cluster definitions to be included in the output -
- Select Run All Cluster Definitions to generate output data for all clusters in the workbench.
- Select Run Selected Cluster Definitions to run specific clusters, then check the box next to each cluster to be included. If you elect to run selected clusters, those clusters are selected by default the next time you open the action.
- For Output, choose your output database and table options:
- For Database Connection, select the database to which the output tables will be written.
- For Output Table Prefix, enter a string that is prepended to the names of the output tables. The tables created are described in Demand Guru Clustering output tables.
- For Output Mode, indicate whether the output tables should be deleted after macro execution.
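The Select All behavior described under Cluster Definitions is effectively a small state machine. The following Python sketch models it for illustration only; `SelectAllState`, `select_all_state`, and `click_select_all` are hypothetical names that are not part of Data Guru, and clusters are represented as plain strings.

```python
from enum import Enum


class SelectAllState(Enum):
    """The three states of the Select All checkbox."""
    SELECTED = "selected"            # traditional check mark
    SEMI_SELECTED = "semi-selected"  # solid square
    UNSELECTED = "unselected"        # empty


def select_all_state(selected: set[str], clusters: list[str]) -> SelectAllState:
    """Derive the checkbox state from the clusters currently selected."""
    chosen = sum(1 for c in clusters if c in selected)
    if clusters and chosen == len(clusters):
        return SelectAllState.SELECTED
    if chosen > 0:
        return SelectAllState.SEMI_SELECTED
    return SelectAllState.UNSELECTED


def click_select_all(selected: set[str], clusters: list[str]) -> set[str]:
    """Return the new selection after the Select All checkbox is clicked."""
    state = select_all_state(selected, clusters)
    if state in (SelectAllState.SELECTED, SelectAllState.SEMI_SELECTED):
        return set()        # selected or semi-selected: clear every cluster
    return set(clusters)    # unselected: select every cluster
```

For example, with hypothetical clusters `["A", "B", "C"]` and only `"A"` selected, `select_all_state` reports `SEMI_SELECTED`, and `click_select_all` returns an empty set, matching the rule that clicking a semi-selected checkbox clears every cluster.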
Demand Guru Clustering output tables
When you execute the Demand Guru Clustering action, you create a set of output tables similar to those created in Coupa’s Demand Guru. This output is based on the most recently saved clusters in the Demand Guru Workbench referenced by this action.
This table assigns each feature a score that estimates its relative individual importance, as a proxy for its importance in the overall clustering model.
Cluster Model
The name of the cluster definition.
Time Stamp
The date and time at which the cluster definition was run by this action.
Feature
The clustering feature that was extracted during execution.
Importance Score
The relative individual importance of the feature, used as a proxy for its importance in the overall clustering model.
Selected
Indicates whether the feature was chosen as part of the final clustering model.
Model ID
The internal ID of the model.
This table displays an overall quality score for each cluster run by this action, indicating how well the cluster represents the time series it contains.
Cluster Model
The name of the cluster definition.
Time Stamp
The date and time at which the cluster definition was run by this action.
Cluster Quality Score
The overall score for the cluster.
Model ID
The internal ID of the model.
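This article does not say how the Cluster Quality Score is calculated. To show what a clustering quality score can measure, the sketch below computes the mean silhouette coefficient, a widely used general-purpose measure; it is a stand-in chosen for illustration, not Demand Guru's formula. Each time series is assumed to be represented by a list of its feature values, and `silhouette_score` is a hypothetical helper name.

```python
import math
from statistics import mean


def silhouette_score(points: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient of a clustering, using Euclidean distance.

    Each point is one time series described by its feature values.
    The result lies in [-1, 1]; higher means each series sits closer to
    its own cluster than to the nearest other cluster.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the same cluster
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)
```

Two well-separated groups of series score close to 1, while an assignment that mixes the groups scores below 0.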
The Demand Guru Clustering action outputs two cluster feature tables with the same columns:
- Scaled - The data is normalized between -1 and 1.
- Unscaled - The data is not normalized.
Cluster Model
The name of the cluster definition.
Time Stamp
The date and time at which the cluster definition was run by this action.
Time Series
The user-specified name of the time series; for example, SKU identifiers, customer identifiers, or any of the "Group by" tags used in the data being clustered.
Cluster
The cluster to which the time series belongs after execution of the cluster model.
Representative Time Series
Indicates if the time series is representative of the cluster in which it is classified.
Seasonality
A coefficient based on the periodicity of the most prominent seasonal period in a time series. Low values indicate low-frequency (long-period) seasonality, while high values indicate high-frequency (short-period) seasonality. A time series with no seasonality is treated as having extremely high frequency (noisy data) and is assigned a high value.
Trend
An index of the strength of a trend. High positive values indicate strong upward trend, high negative values indicate a strong downward trend, and values close to zero indicate that the trend is flat.
Mean
Mean of the demand values in a time series.
Variance
Variance of the demand values in a time series.
Auto Correlation
The extent to which demand values depend on past demand values. The score considers 10 lags: the more strongly the time series depends on its previous 10 values, the higher the auto correlation score.
Lumpiness
The variability of the variance across sections of a time series. Conceptually, divide the time series into multiple sections, calculate the variance of each section, and then calculate the variance of these variance values. Low values indicate that the variance of the time series changes little across its sections, while high values indicate that the variance changes substantially from section to section.
Level Shift
The maximum absolute value of the mean of slices of a time series, calculated with a rolling window that moves one period at a time. For seasonal time series, the slice length equals the most prominent seasonal period; for non-seasonal time series, the slice length is a fixed constant.
Starting from the first slice, the window is rolled forward one period at a time (a step size of 1), and the maximum absolute value of the means of these rolled slices gives the value of the level shift.
Intuitively, this value represents the maximum “level” in a section of a time series.
Variance Change
The maximum absolute value of the variance of slices of a time series, calculated with a rolling window that moves one period at a time. For seasonal time series, the slice length equals the most prominent seasonal period; for non-seasonal time series, the slice length is a fixed constant.
Starting from the first slice, the window is rolled forward one period at a time (a step size of 1), and the maximum absolute value of the variances of these rolled slices gives the value of the variance change.
Intuitively, this value represents the maximum variance in a section of a time series.
Crossing Points
The number of times a time series crosses the midpoint of its range, where the range is the maximum value minus the minimum value.
Linearity
The strength of the linear component of a time series. Values with a large magnitude indicate strong linearity, and positive values indicate an upward-trending time series.
Curvature
The strength of the curvature component of the trend in a time series. Positive values indicate a convex-shaped time series, while negative values indicate a concave-shaped time series.
Peak
Strength of the highest point on the seasonal component of a time series.
Trough
Value of the lowest point on the seasonal component of a time series.
Entropy
A measure of the forecastability of a time series. It indicates how difficult a time series is to forecast, based only on its demand values. Low values indicate that the time series is relatively easy to forecast, and higher values indicate increased difficulty.
Spikiness
The strength of spikes in the residuals of a time series. The seasonal and trend components are removed first, and the value is then calculated over the remaining residual component.
Flat Spots
The length of flat spots in a time series. To arrive at this value, the time series is discretized into multiple levels. Then, the number of periods for which the time series stays at the same level is determined. The overall length of these periods provides the value for this feature. A lower value means that the time series changes its discretized level more often, and a higher value means that it changes level less often.
Model ID
The internal ID of the cluster model.
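Several of the features above are defined procedurally. The Python sketch below implements a few of them from the descriptions in this table, using only the standard library. It is illustrative and is not Demand Guru's implementation. Its assumptions are flagged in the comments: the Auto Correlation score is taken as the sum of squared autocorrelations over lags 1 to 10, Flat Spots uses 10 equal-width levels and reports the longest run (both conventions of the R tsfeatures package), variances are population variances, and the Scaled table is assumed to use per-feature min-max scaling to [-1, 1]. The slice or section length is passed in by the caller, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def acf_score(y: list[float], max_lag: int = 10) -> float:
    """Auto Correlation: sum of squared autocorrelations over lags 1..max_lag.

    ASSUMPTION: the 10 lags are combined as a sum of squares (as in R's
    tsfeatures x_acf10); Demand Guru may combine them differently.
    """
    m = mean(y)
    denom = sum((v - m) ** 2 for v in y)
    if denom == 0:
        return 0.0  # a constant series has no autocorrelation structure
    total = 0.0
    for lag in range(1, min(max_lag, len(y) - 1) + 1):
        r = sum((y[t] - m) * (y[t - lag] - m) for t in range(lag, len(y))) / denom
        total += r * r
    return total


def crossing_points(y: list[float]) -> int:
    """Crossing Points: times the series crosses the midpoint of its range."""
    mid = (max(y) + min(y)) / 2  # equals min + (max - min) / 2
    above = [v > mid for v in y]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


def lumpiness(y: list[float], width: int) -> float:
    """Lumpiness: variance of the variances of non-overlapping sections."""
    if len(y) < width:
        raise ValueError("series shorter than one section")
    sections = [y[i:i + width] for i in range(0, len(y) - width + 1, width)]
    return pvariance([pvariance(s) for s in sections])


def _rolling_max_abs(y, width, stat):
    """Roll a slice of length `width` forward one period at a time; keep max |stat|."""
    if len(y) < width:
        raise ValueError("series shorter than one slice")
    return max(abs(stat(y[i:i + width])) for i in range(len(y) - width + 1))


def level_shift(y: list[float], width: int) -> float:
    """Level Shift: maximum absolute mean over the rolled slices."""
    return _rolling_max_abs(y, width, mean)


def variance_change(y: list[float], width: int) -> float:
    """Variance Change: maximum variance over the rolled slices."""
    return _rolling_max_abs(y, width, pvariance)


def flat_spots(y: list[float], levels: int = 10) -> int:
    """Flat Spots: longest run of periods that stay in one discretized level.

    ASSUMPTION: 10 equal-width levels and the longest run, as in R's
    tsfeatures; Demand Guru's discretization may differ.
    """
    lo, hi = min(y), max(y)
    if hi == lo:
        return len(y)  # the whole series sits on one level
    bins = [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in y]
    longest = run = 1
    for prev, cur in zip(bins, bins[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def scale_to_unit_range(values: list[float]) -> list[float]:
    """Scaled table: min-max scale one feature column across series to [-1, 1].

    ASSUMPTION: the article only says the data is normalized between -1 and 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]
```

For instance, the alternating series `[1, 5, 1, 5, 1, 5]` with a slice length of 2 has 5 crossing points and a lumpiness of 0, because every section has the same variance.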
Last modified: Thursday December 19, 2024