Multiple Treatments with Econml #930
Comments
At least with the sample data in your example, the confidence intervals are pretty wide (e.g. (-0.37, 0.37) for …).

It's not surprising that the point estimates aren't exactly the same: we stratify on treatment when creating samples for cross-fitting, so the estimators aren't seeing exactly the same samples. The treatment models will also behave slightly differently, since they're predicting different things: email vs. not email in one case, as opposed to distinguishing among all of None, Email, Social, and Both in the other.
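A minimal sketch of the point about noise, assuming data simulated the way the original post does (churn drawn independently of both treatments, so every true effect is zero). It uses plain difference-in-means contrasts with a normal-approximation standard error instead of CausalForestDML, and `diff_in_means` is a hypothetical helper, so its numbers will not match the post's. It only illustrates how large sampling noise is at n = 1000.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 1000

# Simulate data the way the post does: churn is independent of both treatments,
# so every true treatment effect is exactly zero.
churn = rng.integers(0, 2, n)
email = rng.integers(0, 2, n)
social = rng.integers(0, 2, n)
combined = email * 2 + social  # 0 = neither, 1 = social only, 2 = email only, 3 = both


def diff_in_means(y, treated, control):
    """Difference in mean outcome between two groups, with its standard error."""
    y1, y0 = y[treated], y[control]
    diff = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return diff, se


# "Email" as a level of the combined treatment: email only (2) vs. neither (0)
combined_email = diff_in_means(churn, combined == 2, combined == 0)
# "Email" as a separate binary treatment: email vs. no email, pooled over social
separate_email = diff_in_means(churn, email == 1, email == 0)

for name, (diff, se) in [("combined 2 vs 0", combined_email),
                         ("separate email", separate_email)]:
    low, high = diff - 1.96 * se, diff + 1.96 * se
    print(f"{name}: {diff:+.3f}  95% CI ({low:+.3f}, {high:+.3f})")
```

The combined 2-vs-0 contrast uses only the customers in those two cells (roughly half the sample), while the pooled contrast uses everyone, so the two estimates differ even on identical data, and with a true effect of zero either one can land on either side of zero.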
Hi, I guess this example will answer your question: https://github.com/py-why/EconML/blob/main/notebooks/Double%20Machine%20Learning%20Examples.ipynb
Thank you very much!
Thanks!
Setting discrete_treatment=False does not help either. I also tried the wrapper class from #334 (comment), and it doesn't change the results either.
I'm a bit confused by your last statement: both CausalForestDML and LinearDML do have …

The basic issue that can cause this type of result is a kind of extrapolation. Imagine a setting with a binary treatment where we've learned first-stage models with P(treatment=1) = 0.4 and P(outcome=1) = 0.2 for some set of characteristics (e.g. a rare combination of Xs). Now imagine that when training the final model we have only one data point with this set of Xs, and it has treatment=1, outcome=1. Then the "surprise" portion of the outcome is 1 - 0.2 = 0.8 and the "surprise" portion of the treatment is 1 - 0.4 = 0.6, so the treatment effect we'd compute for this one-element subset would be 0.8/0.6 ≈ 1.33 > 1. As the sample size increases, this problem should become rarer and rarer (assuming the first-stage models become arbitrarily accurate): as the distribution of observed (treatment, outcome) pairs approaches the true density, the computed effect is mathematically guaranteed to lie in [-1, 1].
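The arithmetic above can be sketched as a residual-on-residual regression, the final stage of a partially linear DML model with a constant effect. This is a simplified illustration, not EconML's implementation: `residual_on_residual` is a hypothetical helper, the first-stage predictions are assumed to be known exactly (P(treatment=1) = 0.4, P(outcome=1) = 0.2), and the outcome probabilities used in the simulation (0.10 untreated, 0.35 treated, so a true effect of 0.25) are hypothetical values chosen so that the marginal P(outcome=1) is 0.2.

```python
import numpy as np


def residual_on_residual(y, t, y_hat, t_hat):
    """Final-stage estimate: regress outcome residuals on treatment residuals
    (no intercept, constant effect)."""
    y_res = np.asarray(y, dtype=float) - y_hat
    t_res = np.asarray(t, dtype=float) - t_hat
    return np.sum(y_res * t_res) / np.sum(t_res ** 2)


# One data point with treatment=1, outcome=1: (1 - 0.2) / (1 - 0.4) = 0.8 / 0.6
single = residual_on_residual([1], [1], y_hat=0.2, t_hat=0.4)
print(f"one point: {single:.3f}")  # 1.333, outside [-1, 1]

# Many points from a model consistent with the same first-stage values:
# P(T=1) = 0.4, P(Y=1 | T=0) = 0.10, P(Y=1 | T=1) = 0.35,
# so P(Y=1) = 0.6 * 0.10 + 0.4 * 0.35 = 0.2 and the true effect is 0.25.
rng = np.random.default_rng(0)
n = 200_000
t = rng.binomial(1, 0.4, n)
y = rng.binomial(1, np.where(t == 1, 0.35, 0.10))
many = residual_on_residual(y, t, y_hat=0.2, t_hat=0.4)
print(f"{n} points: {many:.3f}")  # close to 0.25
```

With a single observation the ratio can exceed 1, while with many observations drawn from the assumed model the estimate settles near the true effect of 0.25, inside [-1, 1].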
Hi,
I greatly enjoy the EconML library. However, regarding multiple treatments, there is an issue I could not figure out. I would really appreciate your help.
Here is a brief description of my problem:
I have two binary treatment columns (email_campaign, social_media_ad), one X variable, and a binary outcome. I fit a single CausalForestDML on a combined treatment, and I also fit a separate CausalForestDML for each treatment. Why do I get different ATE results? In the multiple-treatment model, why is the ATE for T0=0, T1=2 (email only) different from the ATE of a separate model that uses only email_campaign as the treatment? The combined treatment column is 0 when both email_campaign and social_media_ad are 0, 1 when social_media_ad is 1 and email_campaign is 0, 2 when email_campaign is 1 and social_media_ad is 0, and 3 when both are 1. A sample of the data is:
```python
import pandas as pd
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

np.random.seed(123)

# Sample data (replace with your actual data)
data = pd.DataFrame({
    'Customer ID': range(1, 1001),
    'Sales': np.random.randint(0, 1000, 1000),
    'churn': np.random.randint(0, 2, 1000),
    'Email Campaign': np.random.randint(0, 2, 1000),
    'Social Media Ad': np.random.randint(0, 2, 1000)
})

# Create the combined treatment variable:
# 0 = neither, 1 = social media ad only, 2 = email only, 3 = both
data['Combined Treatment'] = data['Email Campaign'] * 2 + data['Social Media Ad']
data.columns = data.columns.str.lower().str.replace(' ', '_')

# Define features, treatment, and outcome
X = data[['sales']]
T = data['combined_treatment']
Y = data['churn']

# Initialize the CausalForestDML model
est = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)

# Fit the model (fit returns the fitted estimator)
model_est = est.fit(Y, T, X=X)
```
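A small sketch of the combined encoding, mirroring the `data['Email Campaign'] * 2 + data['Social Media Ad']` line of the sample code. `LEVELS`, `encode`, and `describe_contrast` are hypothetical names for illustration, not part of EconML; the sketch only spells out which comparison each `T0`/`T1` pair makes.

```python
# Hypothetical helper names; the encoding mirrors
# data['Email Campaign'] * 2 + data['Social Media Ad'] from the sample code.
LEVELS = {0: "neither", 1: "social media ad only", 2: "email only", 3: "both"}


def encode(email: int, social: int) -> int:
    """Combine two binary treatments into one categorical treatment code."""
    return email * 2 + social


def describe_contrast(t0: int, t1: int) -> str:
    """Spell out what est.ate(X, T0=t0, T1=t1) compares."""
    return f"{LEVELS[t1]} vs. {LEVELS[t0]}"


for email in (0, 1):
    for social in (0, 1):
        code = encode(email, social)
        print(f"email={email}, social={social} -> {code} ({LEVELS[code]})")

print(describe_contrast(0, 2))  # email only vs. neither
print(describe_contrast(0, 1))  # social media ad only vs. neither
```

In particular, no single contrast of the combined model means "email vs. no email": T0=0, T1=2 holds the social media ad fixed at 0.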
The ATE of each contrast:

```python
est.ate(X, T0=0, T1=1)  # -0.0016: social_media_ad (combined_treatment == 1)
est.ate(X, T0=0, T1=2)  # -0.033: email_campaign (combined_treatment == 2)
est.ate(X, T0=1, T1=2)  # -0.032
est.ate(X, T0=0, T1=3)  # -0.051
```
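A sketch of how these contrasts relate, assuming the final stage is linear in a one-hot encoding of the treatment with level 0 as the baseline (which matches how EconML's DML estimators handle a discrete treatment). Under that assumption every contrast is a difference of per-level effects, so ate(T0=1, T1=2) should equal ate(T0=0, T1=2) - ate(T0=0, T1=1). The `theta` values are the rounded ATEs from the post, used only for the arithmetic; `one_hot` and `contrast` are hypothetical helpers.

```python
import numpy as np

# Per-level effects relative to baseline level 0, taken from the rounded ATEs
# in the post: level 1 = social only, 2 = email only, 3 = both.
theta = {1: -0.0016, 2: -0.033, 3: -0.051}


def one_hot(level: int, n_levels: int = 4) -> np.ndarray:
    """One-hot encode a treatment level, dropping the baseline level 0."""
    vec = np.zeros(n_levels - 1)
    if level > 0:
        vec[level - 1] = 1.0
    return vec


def contrast(t0: int, t1: int) -> float:
    """Effect of moving from level t0 to level t1 under a linear final stage."""
    coefs = np.array([theta[k] for k in (1, 2, 3)])
    return float(coefs @ (one_hot(t1) - one_hot(t0)))


print(f"{contrast(1, 2):+.4f}")  # -0.0314 (the post reports -0.032)
```

The implied -0.0314 is consistent with the reported -0.032 once the two-significant-figure rounding of the posted values is taken into account.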
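Email:

```python
est_mail = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)
# Assumed fit call (not shown in the post): same Y and X, email_campaign alone as the treatment
est_mail.fit(Y, data['email_campaign'], X=X)
est_mail.ate(X, T0=0, T1=1)
```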
In the example above, T0=0, T1=2 corresponds to the email_campaign treatment. My question is: why does the multiple-treatment model give a different result from the separate single-treatment model? How should the multiple-treatment approach in EconML be used?
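A sketch of why the two models answer different questions, using hypothetical cell means (not estimates from the post) in which the email and social effects interact. The combined model's T0=0, T1=2 contrast compares email only with neither, holding the ad at 0, whereas a separate email model compares email with no email, averaging over whether the customer also saw the ad. Assuming the two treatments are assigned independently and without confounding, the pooled email effect can be rebuilt from the combined model as a share-weighted average of the 0-to-2 and 1-to-3 contrasts; `mu` and `p_social` are illustrative assumptions.

```python
# Hypothetical mean churn in each cell of the combined treatment
# (0 = neither, 1 = social only, 2 = email only, 3 = both).
# Here the ad removes the email effect, i.e. the treatments interact.
mu = {0: 0.20, 1: 0.15, 2: 0.15, 3: 0.15}

p_social = 0.5  # assumed share of customers who saw the ad, independent of email

# Combined model, T0=0, T1=2: email only vs. neither (ad held at 0)
email_vs_neither = mu[2] - mu[0]

# Separate email model: email vs. no email, averaged over ad exposure
mean_email = (1 - p_social) * mu[2] + p_social * mu[3]
mean_no_email = (1 - p_social) * mu[0] + p_social * mu[1]
pooled_email = mean_email - mean_no_email

# The same pooled effect, rebuilt from two contrasts of the combined model
rebuilt = (1 - p_social) * (mu[2] - mu[0]) + p_social * (mu[3] - mu[1])

print(f"email only vs. neither: {email_vs_neither:+.3f}")
print(f"pooled email effect:    {pooled_email:+.3f}")
print(f"rebuilt from contrasts: {rebuilt:+.3f}")
```

With the fitted combined model from the post, the analogous quantity would be roughly `(1 - p) * est.ate(X, T0=0, T1=2) + p * est.ate(X, T0=1, T1=3)`, where `p` is the share of customers who received the social media ad. When the effects do not interact, the two questions have the same answer in expectation, and any remaining gap is estimation noise.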
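Social media ad:

```python
est_social_media_ad = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)
# Assumed fit call (not shown in the post): same Y and X, social_media_ad alone as the treatment
est_social_media_ad.fit(Y, data['social_media_ad'], X=X)
est_social_media_ad.ate(X, T0=0, T1=1)
```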
In the example above, T0=0, T1=1 corresponds to the social_media_ad treatment. The result from the multiple-treatment model is negative, but in the single-treatment model it is positive. Why?
Notes:
1. I get even contrasting results (negative vs. positive) when running on different datasets.
2. I get inconsistent results even when the two treatment variables are totally independent, meaning when each customer receives only one treatment.
Best