
Support pred_contrib argument at predict #38

Open
Fish-Soup opened this issue Sep 18, 2024 · 4 comments


Fish-Soup commented Sep 18, 2024

Hi,

I've been learning how to use this great package. One thing I like using in LightGBM is the pred_contrib=True argument at prediction time.

I notice that if we run model.booster.predict(data, pred_contrib=True), it does generate output of the expected shape: (n_features + 1) * (n_distributional_parameters) columns.

However, when I try to sum the contributions from each feature, the total does not match the result generated when we call predict.

I believe this is due to the logic within the distribution class that runs after booster.predict.

EDIT: On further reading, I realize that it is not always possible to decompose the prediction this way, because a response function is applied to each distributional parameter, and the sum of the function evaluated on each component is not the same as the function evaluated on the sum of the components.

However, for the identity response function the decomposition does hold. In that case the difference seems to come from the init score, which is applied before the distribution function is called.
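To make the additivity point concrete, here is a small sketch with made-up contribution values. It shows that the identity response preserves the per-feature decomposition, while an exponential response (typically used for a scale parameter) does not:

```python
import numpy as np

# Hypothetical per-feature contributions in raw (link) space for one sample;
# the last entry stands for the constant term plus init score.
contribs = np.array([0.3, -0.1, 0.5, 1.2])

raw_pred = contribs.sum()  # raw-space prediction: 1.9

# Identity response (e.g. the Gaussian loc parameter):
# summing the per-component responses equals the response of the sum.
assert np.isclose(np.sum(contribs), raw_pred)

# Exponential response (e.g. the Gaussian scale parameter):
# the per-feature decomposition no longer holds.
assert not np.isclose(np.sum(np.exp(contribs)), np.exp(raw_pred))
```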

In LightGBM this is also an issue with classifiers, where the prediction contributions are returned in logit space.

Therefore LightGBMLSS could support pred_contrib, with the caveat that the distributional response functions are not applied to the returned contributions.

@StatMixedML (Owner)

Hi @Fish-Soup,

Thanks for the interest in the project.

Can you please provide a minimal working example so that I can look into it? Thanks!


Fish-Soup commented Sep 19, 2024

Sure, I managed to get something working in a slightly inelegant way: I extended the LightGBMLSS class and overrode the prediction function.

Here is the part that handles the pred_contrib argument:


```python
import numpy as np
import pandas as pd
import torch

# Inside the overridden predict method:
y_pred = self.booster.predict(X, pred_contrib=True, raw_score=True)

feature_columns = X.columns.tolist() + ["Const"]

y_pred = pd.DataFrame(
    y_pred,
    columns=pd.MultiIndex.from_product(
        [self.dist.distribution_arg_names, feature_columns],
        names=["distribution_arg", "FeatureContribution"],
    ),
    index=X.index,
)

init_score_pred = torch.tensor(
    np.ones(shape=(X.shape[0], 1)) * self.start_values,
    dtype=torch.float32,
)

init_score_pred = pd.DataFrame(
    init_score_pred,
    columns=pd.MultiIndex.from_product(
        [self.dist.distribution_arg_names, ["Const"]],
        names=["distribution_arg", "FeatureContribution"],
    ),
    index=X.index,
)

# Add the init score to the "Const" contribution returned by LightGBM.
y_pred[init_score_pred.columns] = y_pred[init_score_pred.columns] + init_score_pred

return y_pred
```

This gives a MultiIndex column with two levels: one for the distribution arg name (e.g. loc and scale for a Gaussian), and one for the feature contributions, of length n_features + 1 (for the constant column).

However, it seems the better place to do this is in the distribution class itself, possibly when pred_type == "contribution".

Generally speaking, we cannot apply the response functions in self.param_dict, as the contribution columns need to be summed before those functions are applied.
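To illustrate the required order of operations, here is a minimal self-contained sketch. The contribution frame and the loc/scale response functions are made up for illustration, not LightGBMLSS's actual param_dict: sum the raw contributions per distributional parameter first, then apply the response functions to the sums.

```python
import numpy as np
import pandas as pd

# Toy contribution frame: 2 samples, 2 features + constant, for a Gaussian
# parameterised by loc (identity response) and scale (exp response).
cols = pd.MultiIndex.from_product(
    [["loc", "scale"], ["x1", "x2", "Const"]],
    names=["distribution_arg", "FeatureContribution"],
)
y_pred = pd.DataFrame(
    [[0.1, 0.2, 1.0, -0.3, 0.1, 0.5],
     [0.0, -0.1, 1.0, 0.2, 0.0, 0.5]],
    columns=cols,
)

# Step 1: sum the raw contributions within each distributional parameter.
raw_params = y_pred.T.groupby(level="distribution_arg").sum().T

# Step 2: only now apply the response functions
# (identity for loc, exp for scale in this toy parameterisation).
params = raw_params.assign(scale=np.exp(raw_params["scale"]))
```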

The above also lets us compute a feature importance metric over a prediction dataset, for each distributional parameter, as follows:

```python
feature_importance = y_pred.abs().sum().unstack("distribution_arg")
```

For example, we get each feature's contribution to both loc and scale for a Gaussian.
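As a concrete illustration, here is that importance computation run on a made-up contribution frame with the same two-level column structure:

```python
import pandas as pd

# Made-up contribution frame: 2 samples, features x1/x2 plus the constant,
# for the loc and scale parameters of a Gaussian.
cols = pd.MultiIndex.from_product(
    [["loc", "scale"], ["x1", "x2", "Const"]],
    names=["distribution_arg", "FeatureContribution"],
)
y_pred = pd.DataFrame(
    [[0.1, 0.2, 1.0, -0.3, 0.1, 0.5],
     [0.0, -0.1, 1.0, 0.2, 0.0, 0.5]],
    columns=cols,
)

# Sum of absolute contributions per feature, per distributional parameter:
# rows are features (plus "Const"), columns are "loc" and "scale".
feature_importance = y_pred.abs().sum().unstack("distribution_arg")
```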

@Fish-Soup (Author)

I'm writing a PR for review.

@Fish-Soup (Author)

Hi @StatMixedML, I've opened a PR with the change, including a small unit test.

#39
