
Support pred_contrib argument at predict #38

Open
Fish-Soup opened this issue Sep 18, 2024 · 4 comments


Fish-Soup commented Sep 18, 2024

Hi,

I've been learning how to use this great package. One thing I like using in LightGBM is the pred_contrib=True argument at prediction time.

I notice that if we run model.booster.predict(data, pred_contrib=True), it does generate output of the expected shape: (n_features + 1) * (n_distributional_parameters) columns.

However, when I try to sum the contributions from each feature, the total does not match the result generated when we call predict.

I believe this is due to the logic within the distribution class that runs after booster.predict.

EDIT: On further reading, I realize that it is not always possible to decompose the prediction this way, because a response function is applied to each distributional parameter, and the sum of the function evaluated on each component is not the same as the function evaluated on the sum of the components.

However, for the identity response function the decomposition does hold. In that case the difference seems to come from the init score, which is applied before the distribution function is called.
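To make the additivity point concrete, here is a small sketch with made-up contribution values. It shows that the identity response preserves the per-feature decomposition, while an exponential response (typically used for a scale parameter) does not:

```python
import numpy as np

# Hypothetical per-feature contributions in raw (link) space for one sample;
# the last entry stands for the constant term plus init score.
contribs = np.array([0.3, -0.1, 0.5, 1.2])

raw_pred = contribs.sum()  # raw-space prediction: 1.9

# Identity response (e.g. the Gaussian loc parameter):
# summing the per-component responses equals the response of the sum.
assert np.isclose(np.sum(contribs), raw_pred)

# Exponential response (e.g. the Gaussian scale parameter):
# the per-feature decomposition no longer holds.
assert not np.isclose(np.sum(np.exp(contribs)), np.exp(raw_pred))
```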

In LightGBM this is also an issue with classifiers, where the prediction contributions are returned in logit space.

Therefore LightGBMLSS could support pred_contrib, with the caveat that the distributional response functions are not applied to the returned contributions.

@StatMixedML (Owner)

Hi @Fish-Soup,

Thanks for the interest in the project.

Can you please provide a minimal working example so that I can look into it? Thanks!


Fish-Soup commented Sep 19, 2024

Sure, I managed to get something working in a slightly inelegant way: I extended the LightGBMLSS class and overrode the prediction function.

Here is the part that handles the pred_contrib argument:


```python
import numpy as np
import pandas as pd
import torch

# Inside the overridden predict method:
y_pred = self.booster.predict(X, pred_contrib=True, raw_score=True)

feature_columns = X.columns.tolist() + ["Const"]

y_pred = pd.DataFrame(
    y_pred,
    columns=pd.MultiIndex.from_product(
        [self.dist.distribution_arg_names, feature_columns],
        names=["distribution_arg", "FeatureContribution"],
    ),
    index=X.index,
)

init_score_pred = torch.tensor(
    np.ones(shape=(X.shape[0], 1)) * self.start_values,
    dtype=torch.float32,
)

init_score_pred = pd.DataFrame(
    init_score_pred,
    columns=pd.MultiIndex.from_product(
        [self.dist.distribution_arg_names, ["Const"]],
        names=["distribution_arg", "FeatureContribution"],
    ),
    index=X.index,
)

# Add the init score to the "Const" contribution returned by LightGBM.
y_pred[init_score_pred.columns] = y_pred[init_score_pred.columns] + init_score_pred

return y_pred
```

This gives a MultiIndex column with two levels: one for the distribution arg name (e.g. loc and scale for a Gaussian), and one for the feature contributions, of length n_features + 1 (for the constant column).

However, it seems the better place to do this is in the distribution class itself, possibly when pred_type == "contribution".

Generally speaking, we cannot apply the response functions in self.param_dict, as the contribution columns need to be summed before those functions are applied.
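To illustrate the required order of operations, here is a minimal self-contained sketch. The contribution frame and the loc/scale response functions are made up for illustration, not LightGBMLSS's actual param_dict: sum the raw contributions per distributional parameter first, then apply the response functions to the sums.

```python
import numpy as np
import pandas as pd

# Toy contribution frame: 2 samples, 2 features + constant, for a Gaussian
# parameterised by loc (identity response) and scale (exp response).
cols = pd.MultiIndex.from_product(
    [["loc", "scale"], ["x1", "x2", "Const"]],
    names=["distribution_arg", "FeatureContribution"],
)
y_pred = pd.DataFrame(
    [[0.1, 0.2, 1.0, -0.3, 0.1, 0.5],
     [0.0, -0.1, 1.0, 0.2, 0.0, 0.5]],
    columns=cols,
)

# Step 1: sum the raw contributions within each distributional parameter.
raw_params = y_pred.T.groupby(level="distribution_arg").sum().T

# Step 2: only now apply the response functions
# (identity for loc, exp for scale in this toy parameterisation).
params = raw_params.assign(scale=np.exp(raw_params["scale"]))
```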

The above also lets us compute a feature importance metric over a prediction dataset, for each distributional parameter, as follows:

```python
feature_importance = y_pred.abs().sum().unstack("distribution_arg")
```

For example, we get each feature's contribution to both loc and scale for a Gaussian.
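As a concrete illustration, here is that importance computation run on a made-up contribution frame with the same two-level column structure:

```python
import pandas as pd

# Made-up contribution frame: 2 samples, features x1/x2 plus the constant,
# for the loc and scale parameters of a Gaussian.
cols = pd.MultiIndex.from_product(
    [["loc", "scale"], ["x1", "x2", "Const"]],
    names=["distribution_arg", "FeatureContribution"],
)
y_pred = pd.DataFrame(
    [[0.1, 0.2, 1.0, -0.3, 0.1, 0.5],
     [0.0, -0.1, 1.0, 0.2, 0.0, 0.5]],
    columns=cols,
)

# Sum of absolute contributions per feature, per distributional parameter:
# rows are features (plus "Const"), columns are "loc" and "scale".
feature_importance = y_pred.abs().sum().unstack("distribution_arg")
```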

@Fish-Soup (Author)

I'm writing a PR for review.

@Fish-Soup (Author)

Hi @StatMixedML, I've opened a PR with the change, including a small unit test.

#39
