
Multiple Treatments with Econml #930

Open
turankeles opened this issue Nov 25, 2024 · 8 comments

@turankeles

Hi,
I greatly enjoy the EconML library. However, there is an issue with multiple treatments that I could not figure out, and I would really appreciate your help.
Here is a brief summary of my problem:

I have two binary treatment columns (email_campaign, social_media_ad), an X variable, and a binary outcome. I ran CausalForestDML once with a combined treatment and also ran a separate CausalForestDML for each treatment. Why do I get different ATE results? When running with multiple treatments, if I set T0=0, T1=1 (or T1=2), why is the ATE different from running a separate model with only social_media_ad (or email_campaign) as the treatment? The combined treatment column is 0 when email_campaign and social_media_ad are both 0, 1 when social_media_ad is 1 and email_campaign is 0, 2 when email_campaign is 1 and social_media_ad is 0, and 3 when both are 1. A sample of the data is:

[screenshot: sample of the data]

import pandas as pd
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
np.random.seed(123)

# Sample data (replace with your actual data)
data = pd.DataFrame({
    'Customer ID': range(1, 1001),
    'Sales': np.random.randint(0, 1000, 1000),
    'churn': np.random.randint(0, 2, 1000),
    'Email Campaign': np.random.randint(0, 2, 1000),
    'Social Media Ad': np.random.randint(0, 2, 1000)
})

# Create the combined treatment variable
data['Combined Treatment'] = data['Email Campaign'] * 2 + data['Social Media Ad']
data.columns = data.columns.str.lower().str.replace(' ', '_')

# Define features and target variable
X = data[['sales']]
T = data['combined_treatment']
Y = data['churn']

# Initialize the CausalForestDML model
est = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)

# Fit the model
model_est = est.fit(Y, T, X=X)

The ATE result for each treatment:
est.ate(X, T0=0, T1=1) --> -0.0016  (social_media_ad, combined_treatment == 1)
est.ate(X, T0=0, T1=2) --> -0.033   (email_campaign, combined_treatment == 2)
est.ate(X, T0=1, T1=2) --> -0.032
est.ate(X, T0=0, T1=3) --> -0.051

Email:
est_mail = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)

est_mail.fit(Y, data["email_campaign"], X=X)
est_mail.ate(X) --> -0.019

In the example above, T0=0, T1=2 corresponds to the email_campaign treatment. My question is: why does the multiple-treatment model yield different results from the separate single-treatment models, and how should the multiple-treatments approach be used in EconML?
Social media ad:
est_social_media_ad = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)

est_social_media_ad.fit(Y, data["social_media_ad"], X=X)
est_social_media_ad.ate(X) --> 0.010

In the example above, T0=0, T1=1 corresponds to the social_media_ad treatment. The result from the multiple-treatment model is negative, but the result from the single-treatment model is positive. Why?

Notes:
1. I even get contrasting (negative vs. positive) results when running on different datasets.
2. I get inconsistent results even when the two treatment variables are completely independent, i.e. when each customer receives only one treatment.

Best

@kbattocchi
Collaborator

At least with the sample data in your example, the confidence intervals are pretty wide (e.g. (-0.37, 0.37) for est.ate_interval(X, T0=0, T1=1)), so the point estimates from each estimator are well within the confidence intervals of the other; I wouldn't worry about it.

It's not surprising that the point estimates aren't exactly the same: we stratify on treatment when creating samples for cross-fitting, so the estimators aren't seeing exactly the same samples, and the treatment models will behave slightly differently since they're predicting different things (email vs. not email in one case, as opposed to distinguishing between all of None, Email, Social, Both in the other).
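
For example, a minimal sketch of this kind of check, reusing the est and est_mail estimators fitted above (illustrative only; the values it prints are not output quoted in this thread):

lb, ub = est.ate_interval(X, T0=0, T1=2)   # CI for the email contrast in the combined-treatment model
ate_single = est_mail.ate(X)               # point estimate from the email-only model
print(f"combined-model CI: ({lb:.3f}, {ub:.3f}); single-model ATE: {ate_single:.3f}")
print("single-model estimate inside combined-model CI:", lb <= ate_single <= ub)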

@fhz-3722

Hi, I guess this example will answer your question https://github.com/py-why/EconML/blob/main/notebooks/Double%20Machine%20Learning%20Examples.ipynb

@turankeles
Author

Thank you very much!
I have a few other questions, though:
1. For a binary (discrete) treatment and a binary outcome, should model_t and model_y both be classifiers?
2. Does a negative ATE mean the treatment decreases churn, and a positive ATE that it increases churn?
3. How should the ATE be interpreted? As the probability of churn?

@turankeles
Author

> Hi, I guess this example will answer your question https://github.com/py-why/EconML/blob/main/notebooks/Double%20Machine%20Learning%20Examples.ipynb

Thanks!

@kbattocchi
Collaborator

> Thank you very much!
> I have a few other questions, though:
> 1. For a binary (discrete) treatment and a binary outcome, should model_t and model_y both be classifiers?
> 2. Does a negative ATE mean the treatment decreases churn, and a positive ATE that it increases churn?
> 3. How should the ATE be interpreted? As the probability of churn?

1. Yes, pass discrete_treatment=True and discrete_outcome=True and then use classifiers for both models (a minimal sketch follows below).
2. A negative ATE would mean that on average the treatment decreases the likelihood of the 'high' outcome. If your outcome is churn, then yes, a negative ATE would mean that it decreases churn.
3. The ATE is the average change in the probability of the outcome if the treatment goes from 0 to 1.
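
A minimal sketch of the setup from point 1, reusing Y, T, and X from the example above (this assumes an EconML version recent enough that CausalForestDML accepts discrete_outcome):

from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier

est_bin = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),  # classifier for the discrete treatment
    model_y=RandomForestClassifier(random_state=123),  # classifier for the binary outcome
    discrete_treatment=True,
    discrete_outcome=True,
    random_state=123
)
est_bin.fit(Y, T, X=X)
# ATE of moving from control (0) to email-only (2), as an average change in P(churn)
print(est_bin.ate(X, T0=0, T1=2))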

@turankeles
Author

Following the previous questions: I have encoded four treatments into one column, combined_treatment, whose values range from 0 to 15. I am running CausalForestDML with XGBClassifier, as shown below. However, some of the point estimates are greater than 1 or less than -1, and I get similar results for all of the treatment interactions. If the output of this CausalForestDML model is the probability of the outcome (churn, binary), why do I get point estimates greater than 1 or less than -1?
Switching from XGBClassifier to other algorithms such as RandomForestClassifier reduces the number of point estimates outside the (-1, 1) range, but some remain.

model = CausalForestDML(
    model_t=XGBClassifier(),
    model_y=XGBClassifier(),
    discrete_treatment=True
)

est_model = model.dowhy.fit(Y, combined_treatment, X=X, W=W)

output1 = est_model.effect_inference(X_test, T0=0, T1=1)
output2 = est_model.effect_inference(X_test, T0=0, T1=2)
output3 = est_model.effect_inference(X_test, T0=0, T1=3)
...
output15 = est_model.effect_inference(X_test, T0=0, T1=15)

All of these outputs include some point estimates outside the (-1, 1) range.
If the results are probabilities of the outcome, how can they be interpreted or justified? If they are not probabilities of the outcome, how should they be interpreted?

This is the output for T0=0, T1=12:
[screenshot: effect_inference output for T0=0, T1=12]

Really appreciate your input!

@turankeles
Author

Setting discrete_treatment=False does not help either.
CausalForestDML and LinearDML do not have discrete_outcome, so I cannot set discrete_outcome=True.

I tried the wrapper class here, #334 (comment), but it doesn't change the results either.

@kbattocchi
Collaborator

I'm a bit confused by your last statement - both CausalForestDML and LinearDML do have discrete_outcome arguments to their initializers (and as a side note, if your treatment is discrete you might want to use the DRLearner subclasses instead of DML ones anyway, though this same issue can also happen there).

The basic issue that can cause this type of result is just a kind of extrapolation. Imagine a setting where there's a binary treatment and we've learned first stage models where P(treatment=1) = 0.4 and P(outcome=1) = 0.2 for a given set of characteristics (e.g. for some rare combination of Xs). Then imagine that when we're training our final model, we have only one data point with this set of Xs, and it has treatment=1, outcome=1. Then the "surprise" portion of the outcome is 1-0.2=0.8, and the "surprise" portion of the treatment is 1-0.4=0.6, so the resulting treatment effect we'd calculate for this one-element subset would be 0.8/0.6 > 1.
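
A toy illustration of that arithmetic (plain Python, not EconML code; the numbers are the ones from the paragraph above):

p_treatment = 0.4    # first-stage prediction P(treatment=1 | X) for this rare X cell
p_outcome = 0.2      # first-stage prediction P(outcome=1 | X) for the same cell
t_obs, y_obs = 1, 1  # the single observed data point in the cell

t_resid = t_obs - p_treatment  # 0.6, the "surprise" in the treatment
y_resid = y_obs - p_outcome    # 0.8, the "surprise" in the outcome

effect = y_resid / t_resid     # residual-on-residual slope for this one-element subset
print(effect)                  # 1.33..., outside [-1, 1] even though the outcome is binary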

As your sample size increases, this problem should become more and more rare (assuming your first stage models get arbitrarily accurate) - as the distribution of observed (treatment, outcome) pairs approaches the true density, it becomes mathematically guaranteed that the computed effect will be in [-1,1].
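
Regarding the side note above about the DRLearner subclasses, here is a minimal sketch of that alternative, reusing Y, T, and X from the earlier example (the first-stage model choices are illustrative assumptions, not a recommendation from this thread):

from econml.dr import LinearDRLearner
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

dr_est = LinearDRLearner(
    model_propensity=RandomForestClassifier(random_state=123),  # models P(T | X, W)
    model_regression=RandomForestRegressor(random_state=123),   # models E[Y | T, X, W]
    random_state=123
)
dr_est.fit(Y, T, X=X)
print(dr_est.ate(X, T0=0, T1=2))  # email-only vs. control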
