0.3.3
The changes are summarized below:
- add `sample_action` method to `obp.policy.IPWLearner`, which trains an offline bandit policy that samples a non-repetitive set of actions for new data. Thus, it can be used in practice even when the action interface has a list structure (see the usage sketch below)
- fix a bug in the `fit_predict` method of `obp.ope.RegressionModel`
- complete the benchmark experiments on a wide variety of OPE estimators using the full size of the Open Bandit Dataset. The detailed results and discussions will be available in a forthcoming arXiv update.
  - https://github.com/st-tech/zr-obp/tree/master/benchmark/ope
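
For reference, here is a minimal usage sketch of the new `sample_action` method, assuming the `IPWLearner` interface around this release; the synthetic logged data and the exact keyword arguments (`pscore`, `position`, `random_state`) below are illustrative assumptions and may differ slightly across versions.

```python
# minimal sketch, assuming the IPWLearner interface around this release;
# the synthetic logged bandit feedback below is illustrative only
import numpy as np
from sklearn.linear_model import LogisticRegression

from obp.policy import IPWLearner

n_rounds, n_actions, len_list, dim_context = 1000, 5, 3, 4
rng = np.random.default_rng(12345)

# synthetic logged bandit feedback from a uniform logging policy
context = rng.normal(size=(n_rounds, dim_context))
action = rng.integers(n_actions, size=n_rounds)
reward = rng.integers(2, size=n_rounds)
pscore = np.full(n_rounds, 1.0 / n_actions)
position = rng.integers(len_list, size=n_rounds)

# train an offline bandit policy via the IPW objective
ipw_learner = IPWLearner(
    n_actions=n_actions,
    len_list=len_list,
    base_classifier=LogisticRegression(random_state=12345),
)
ipw_learner.fit(
    context=context,
    action=action,
    reward=reward,
    pscore=pscore,
    position=position,
)

# sample a non-repetitive set of actions (one per slot) for new contexts
sampled_actions = ipw_learner.sample_action(context=context, random_state=12345)
print(sampled_actions.shape)  # (n_rounds, n_actions, len_list)
```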