Statistical Machine Learning - Multi-Armed Bandits

1 minute read

This project implements famous MAB algorithms and evaluates them on the basis of their performance - EpsilonGreedy, UCB, BetaThompson, LinUCB, LinThompson.

Github Source Link: https://github.com/abhinavcreed13/Multi-armed-bandits-MAB

Algorithms Implemented:

  • EpsGreedy
  • UCB
  • BetaThompson: Shipra Agrawal and Navin Goyal, ‘Analysis of Thompson sampling for the multi-armed bandit problem’, in Proceedings of the Conference on Learning Theory (COLT 2012), 2012. http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
  • LinUCB: Lihong Li, Wei Chu, John Langford, Robert E. Schapire, ‘A Contextual-Bandit Approach to Per- sonalized News Article Recommendation’, in Proceedings of the Nineteenth International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, 2010. https://arxiv.org/pdf/1003.0146.pdf
  • LinThompson: Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. ‘Unbiased offline evaluation of contextual- bandit-based news article recommendation algorithms.’ In Proceedings of the Fourth ACM Interna- tional Conference on Web Search and Data Mining (WSDM’2011), pp. 297-306. ACM, 2011. https://arxiv.org/pdf/1003.5956.pdf

Offline evaluation for algorithms without parameter tuning

....
# Run offline evaluation for algorithms
# EpsilonGreedy
mab = EpsGreedy(10, params['eps_greedy'])
results_EpsGreedy = offlineEvaluate(mab, arms, rewards, contexts, T)
# UCB
mab = UCB(10, params['ucb'])
results_UCB = offlineEvaluate(mab, arms, rewards, contexts, T)
# BetaThompson
mab = BetaThompson(10, params['beta_thompson'][0], params['beta_thompson'][1])
results_BetaThompson = offlineEvaluate(mab, arms, rewards, contexts, T)
# LinUCB
mab = LinUCB(10, 10, params['lin_ucb'])
results_LinUCB = offlineEvaluate(mab, arms, rewards, contexts, T)
# LinThompson
mab = LinThompson(10, 10, params['lin_thompson'])
results_LinThompson = offlineEvaluate(mab, arms, rewards, contexts, T)
....

Algorithms Evaluation

GridSearch for LinUCB

# generate 20 numbers for alpha between 10^-5 to 10^1
alphas = np.logspace(-5,1,20)

# set grid parameters with alphas
grid_parameters = {'narms': 10, 'ndims': 10, 'param_range': alphas,'rounds': 800}

# create gridsearch object with strategy and scoring function
grid_linucb = GridSearchMAB(mab=LinUCB, param_grid=grid_parameters, scoring=offlineEvaluate, strategy='naive',verbose=1)

# run gridsearch on provided data
grid_linucb.fit(arms,rewards,contexts)

LinUCB gridsearch

GridSearch for LinThompson

# generate 20 numbers for alpha between 10^-5 to 10^1
vparams = np.logspace(-5,1,20)

# set grid parameters with alphas
grid_parameters = {'narms': 10, 'ndims': 10, 'param_range': vparams,'rounds': 800}

# create gridsearch object with strategy and scoring function
grid_linthompson = GridSearchMAB(mab=LinThompson, param_grid=grid_parameters, scoring=offlineEvaluate, strategy='naive',verbose=1)

# run gridsearch on provided data
grid_linthompson.fit(arms,rewards,contexts)

LinThompson gridsearch

Offline evaluation for algorithms with parameter tuning

evaluation(T=800,params={'eps_greedy':grid_epsgreedy.best_param_, 'ucb':grid_ucb.best_param_, 'beta_thompson':[1.0,1.0], 'lin_ucb':grid_linucb.best_param_, 'lin_thompson':grid_linthompson.best_param_})

Evaluation

Cheers!

Leave a comment