Bandits-using-UCB-algorithm

Thompson Sampling for Bandits using UCB policy

This is an instance of the multi-armed bandit problem; see https://en.wikipedia.org/wiki/Multi-armed_bandit

Suppose there are K slot machines, each of which provides rewards according to its own distribution. A gambler has to decide which arm to pull at each step, and in what order, to maximize the total reward he/she will receive.

The UCB algorithm specifies that at time t, we pull the arm a_t that maximizes (observed mean reward of arm a + upper confidence bound of arm a).
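
A minimal sketch of that selection rule, assuming the standard UCB1 bonus sqrt(2 ln t / n_a); the bound and arm probabilities used in this repository may differ:

```python
import math
import random

def ucb1_select(counts, means, t):
    """Pick the arm maximizing observed mean + confidence bonus (UCB1)."""
    # Play each arm once before the bound is well defined.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))

# Simulate K = 5 Bernoulli arms (illustrative probabilities).
true_probs = [0.1, 0.3, 0.5, 0.7, 0.9]
counts = [0] * 5
means = [0.0] * 5
for t in range(1, 10001):
    a = ucb1_select(counts, means, t)
    reward = 1 if random.random() < true_probs[a] else 0
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]  # incremental mean update
```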

This program assumes K = 5, and the reward of each arm follows a Bernoulli distribution. From a Bayesian point of view, the prior belief is that the success probability of each arm follows a Beta(1, 1) distribution (i.e. the prior is uniform for each arm).
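
A minimal Thompson Sampling sketch under those assumptions (Beta(1, 1) priors on K = 5 Bernoulli arms; the arm probabilities below are illustrative, not values from this repository):

```python
import random

K = 5
true_probs = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypothetical arm success probabilities
alpha = [1] * K                          # Beta(1, 1) prior: 1 + number of successes
beta = [1] * K                           # Beta(1, 1) prior: 1 + number of failures

for t in range(10000):
    # Sample a success probability for each arm from its posterior, pull the best.
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(K)]
    a = max(range(K), key=lambda i: samples[i])
    reward = 1 if random.random() < true_probs[a] else 0
    # Conjugate update: posterior becomes Beta(alpha + reward, beta + 1 - reward).
    alpha[a] += reward
    beta[a] += 1 - reward
```

Because the Beta distribution is conjugate to the Bernoulli likelihood, the posterior after each pull stays a Beta distribution and the update is a simple counter increment.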
