0.3.3
The changes are summarized below:
- add `sample_action` method to `obp.policy.IPWLearner`, which trains an offline bandit policy that samples a non-repetitive set of actions for new data. Thus, it can be used in practice even when the action interface has a list structure (see the usage sketch below)
- fix a bug in the `fit_predict` method of `obp.ope.RegressionModel`
- complete the benchmark experiments on a wide variety of OPE estimators using the full size of the Open Bandit Dataset. The detailed results and discussions will be available in a forthcoming arXiv update.
  - https://github.com/st-tech/zr-obp/tree/master/benchmark/ope
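
For reference, here is a minimal usage sketch of the new `sample_action` method, assuming the `IPWLearner` interface around this release; the synthetic logged data and the exact keyword arguments (`pscore`, `position`, `random_state`) below are illustrative assumptions and may differ slightly across versions.

```python
# minimal sketch, assuming the IPWLearner interface around this release;
# the synthetic logged bandit feedback below is illustrative only
import numpy as np
from sklearn.linear_model import LogisticRegression

from obp.policy import IPWLearner

n_rounds, n_actions, len_list, dim_context = 1000, 5, 3, 4
rng = np.random.default_rng(12345)

# synthetic logged bandit feedback from a uniform logging policy
context = rng.normal(size=(n_rounds, dim_context))
action = rng.integers(n_actions, size=n_rounds)
reward = rng.integers(2, size=n_rounds)
pscore = np.full(n_rounds, 1.0 / n_actions)
position = rng.integers(len_list, size=n_rounds)

# train an offline bandit policy via the IPW objective
ipw_learner = IPWLearner(
    n_actions=n_actions,
    len_list=len_list,
    base_classifier=LogisticRegression(random_state=12345),
)
ipw_learner.fit(
    context=context,
    action=action,
    reward=reward,
    pscore=pscore,
    position=position,
)

# sample a non-repetitive set of actions (one per slot) for new contexts
sampled_actions = ipw_learner.sample_action(context=context, random_state=12345)
print(sampled_actions.shape)  # (n_rounds, n_actions, len_list)
```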