Machine Learning model explainability
Tree-based models like XGBoost use multiple decision trees to make predictions. Depending on the number of features, such models can become very complex, and understanding their results may be challenging. We use the Shapley Additive Explanations (SHAP) approach to explain the prediction paths and to quantify the magnitude and direction of each feature's contribution to the predicted cost. These contributions are referred to as the "scores" of a feature.
SHAP
SHAP stands for Shapley Additive Explanations, a method that explains model predictions using Shapley values from game theory. The downside of this approach is its computational cost when the model has a large number of features. To overcome this, we use FastTreeSHAP, an efficient implementation for computing feature contributions; more information is available in the FastTreeSHAP documentation.
For the default Machine Learning model (XGBoost), FastTreeSHAP is used to compute the local feature scores for every path.
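Below is a minimal sketch of this setup, assuming the open-source fasttreeshap package (whose TreeExplainer mirrors the shap API); the feature names, data, and model parameters are illustrative placeholders, not the actual implementation.

```python
# Sketch: fit the default model on a log-transformed cost target and build a
# FastTreeSHAP explainer. All data and column names below are hypothetical.
import numpy as np
import pandas as pd
import xgboost
import fasttreeshap  # pip install fasttreeshap

X = pd.DataFrame({
    "Truckload_Distance": [3406.24, 120.5, 890.0],
    "NumberSitesMFG":     [1, 2, 1],
    "TotalFlowUnitQty":   [0.222, 5.1, 1.7],
})
cost = np.array([5200.0, 310.0, 1450.0])  # transportation cost per path

# Log-transform the business objective before fitting (see the note on
# transformations below).
model = xgboost.XGBRegressor(n_estimators=100).fit(X, np.log(cost))

# FastTreeSHAP exposes the same interface as shap.TreeExplainer.
explainer = fasttreeshap.TreeExplainer(model)
```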
Local Scores
Local scores are the contributions of individual features to the predicted cost of a single path. By default, they are calculated for the XGBoost model using the FastTreeSHAP approach.
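Continuing the sketch above, the local scores for a single path are the row of SHAP values corresponding to that path; the index 0 used here is an arbitrary illustration.

```python
# Local scores: per-feature SHAP values for one path (one row of X), in log-space.
shap_values = explainer.shap_values(X)  # shape: (n_paths, n_features)
local_scores_path = pd.Series(shap_values[0], index=X.columns)
print(local_scores_path)  # contribution of each feature for the first path
```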
Interpretation of the score
Below is an example showing the local scores of features contributing to the transportation cost for Path ID “3”.
| Path ID | Feature Name | Local Score | Feature Value |
| --- | --- | --- | --- |
| 3 | Truckload_Distance | 0.7323 | 3406.24 |
| 3 | NumberSitesMFG | 0.246 | 1 |
| 3 | TotalFlowUnitQty | -0.03 | 0.222 |
By default, a logarithmic transformation is applied to the business objective before fitting the Machine Learning model. For interpretability, the local scores generated for the features are reported in log-space, while the true value of the business objective and the Machine Learning model's predicted value are converted back to the original scale by applying the inverse transformation (exponentiation).

A positive local score contributes to an increase in cost; a negative local score contributes to a reduction in cost. In this example, larger truckload distances increase transportation cost, whereas larger flow unit quantities are associated with lower transportation cost.

To derive the model's predicted value from the local scores, the following calculation is performed:
Model Predicted Value = Exp(Sum of Individual Local Scores) × Mean Prediction
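A minimal sketch of this reconstruction, continuing the example above, is shown below. Treating "Mean Prediction" as the exponentiated SHAP base value (the model's average prediction in log-space) is an assumption made for illustration.

```python
# Reconstruct the original-scale prediction for one path from its local scores.
base_value = explainer.expected_value          # average prediction in log-space
mean_prediction = np.exp(base_value)           # assumed "Mean Prediction"

predicted_cost = np.exp(shap_values[0].sum()) * mean_prediction
# Equivalent to exponentiating the full log-space prediction:
# np.exp(base_value + shap_values[0].sum())
```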
Global Scores
An aggregated score of feature contributions across all paths of a model is referred to as a Global Score. This is calculated by grouping the individual local feature scores and averaging them across all paths, so the sign of a global score indicates a feature's overall direction of impact.
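A minimal sketch of this aggregation, continuing the example above, is shown below; averaging the signed local scores (rather than their absolute values) is assumed here so that the sign carries direction, consistent with the table that follows.

```python
# Global scores: average each feature's local scores across all paths,
# keeping the sign so the direction of impact is preserved.
global_scores = pd.DataFrame(shap_values, columns=X.columns).mean(axis=0)
print(global_scores)
```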
Interpretation of the score
| Feature Name | Global Score |
| --- | --- |
| UniqueModeNumber | 1.10 |
| CustomerServiceDistance | -1.03 |
| NumberSitesMFG | 1.001 |
A positive global score indicates that a feature, on average, contributes to an increase in cost, and a negative global score indicates a contribution to a decrease in cost.