
Notes on bandit algorithm resources and papers

A while back I think I wrote that deep learning is fine, but I also need to keep up with the theory side; now it looks like I actually need to solve a bandit problem, so here are some quick notes. There is still a lot of the theory I want to understand, though.

Introduction

バンディットアルゴリズム入門と実践 (Introduction and Practice of Bandit Algorithms)

www.slideshare.net

I’m a bandit

This blog, whose title reads like "I'm the bandit," unsurprisingly lays out the equations in full.

blogs.princeton.edu

Thompson Sampling

Algorithm

qiita.com
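To make the algorithm concrete, here is a minimal sketch of Thompson sampling for Bernoulli-reward arms; the arm probabilities, round count, and the Beta(1, 1) prior below are illustrative assumptions, not taken from any of the linked articles.

```python
import random

def thompson_sampling(arm_probs, n_rounds, seed=0):
    """Bernoulli Thompson sampling: sample from each arm's Beta posterior,
    play the arm with the highest sample, then update that arm's posterior."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    # Beta(1, 1) prior for every arm; alpha/beta count successes/failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one posterior sample per arm and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        i = max(range(n_arms), key=lambda k: samples[k])
        reward = 1 if rng.random() < arm_probs[i] else 0  # simulated Bernoulli pull
        alpha[i] += reward
        beta[i] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

# Hypothetical arms with success probabilities 0.2, 0.5, 0.8.
reward, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
```

Because the posterior of the best arm concentrates, nearly all pulls end up on the 0.8 arm after enough rounds.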

バンディット問題の各定式化について (On the Various Formulations of the Bandit Problem)


PDF: here

Slides: Junya Honda, Assistant Professor, Graduate School of Frontier Sciences, The University of Tokyo. FIT2013.

Introduction to Bandits: Algorithms and Theory

ICML Tutorial on bandits

Tutorial material from ICML 2011, Bellevue (WA), USA. The main topics are:

  • Stochastic bandits
  • Adversarial bandits
  • Many-armed bandit problem
  • Linear bandits
  • Lipschitz bandits
  • Bandits in trees
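As a concrete instance of the stochastic setting in the list above, here is a rough UCB1 sketch; the Bernoulli arms, round count, and exploration constant follow the textbook form of the algorithm and are assumptions for illustration, not taken from the tutorial slides.

```python
import math
import random

def ucb1(arm_probs, n_rounds, seed=0):
    """UCB1 for stochastic Bernoulli bandits: play each arm once, then
    always play the arm maximizing (empirical mean + sqrt(2 ln t / n_i))."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            i = t - 1  # initialization: play each arm once
        else:
            i = max(range(n_arms),
                    key=lambda k: sums[k] / counts[k]
                    + math.sqrt(2 * math.log(t) / counts[k]))
        reward = 1 if rng.random() < arm_probs[i] else 0  # simulated pull
        counts[i] += 1
        sums[i] += reward
    return counts, sums

# Hypothetical arms with success probabilities 0.2, 0.5, 0.8.
counts, sums = ucb1([0.2, 0.5, 0.8], n_rounds=2000)
```

The confidence bonus shrinks as an arm is pulled more, so suboptimal arms are revisited only logarithmically often.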

Application example: recommendation

Recommendations with Thompson Sampling - RichRelevance Engineering Blog

Papers

In chronological order; I picked the ones that seem necessary, at a minimum, to follow the blog above.

Some aspects of the sequential design of experiments (1952)

Robbins, Herbert. "Some aspects of the sequential design of experiments." Herbert Robbins Selected Papers. Springer New York, 1985. 169-177.

Adversarial multi-armed bandit, Auer, Cesa-Bianchi, Freund and Schapire (1995)

Auer, Peter, et al. "Gambling in a rigged casino: The adversarial multi-armed bandit problem." Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. IEEE, 1995.
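The Exp3 algorithm introduced for this adversarial setting can be sketched roughly as follows; the `reward_fn` interface, the exploration rate `gamma=0.1`, and the example reward are illustrative assumptions, not the paper's experimental setup.

```python
import math
import random

def exp3(n_arms, reward_fn, n_rounds, gamma=0.1, seed=0):
    """Exp3 sketch: exponential weights mixed with uniform exploration,
    updated with an importance-weighted estimate of the observed reward."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        i = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(i, t)  # reward in [0, 1]; may be chosen adversarially
        # Dividing by probs[i] keeps the reward estimate unbiased,
        # since only the played arm's reward is observed.
        weights[i] *= math.exp(gamma * reward / (probs[i] * n_arms))
    return weights

# Hypothetical adversary that always rewards arm 1.
weights = exp3(3, lambda i, t: 1.0 if i == 1 else 0.0, n_rounds=500)
```

Because only the played arm's weight is updated, the importance weighting is what makes the regret bound hold even against an adaptive adversary.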

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems (2006)

Even-Dar, Eyal, Shie Mannor, and Yishay Mansour. "Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems." Journal of machine learning research 7.Jun (2006): 1079-1105.

Bayesian multi-armed bandit (2010)

Scott, Steven L. "A modern Bayesian look at the multi‐armed bandit." Applied Stochastic Models in Business and Industry 26.6 (2010): 639-658.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (2012)

Agrawal, Shipra, and Navin Goyal. "Analysis of Thompson Sampling for the Multi-armed Bandit Problem." COLT. 2012.

Bandits With Heavy Tail (2013)

Bubeck, Sébastien, Nicolò Cesa-Bianchi, and Gábor Lugosi. "Bandits with heavy tail." IEEE Transactions on Information Theory 59.11 (2013): 7711-7717.

Matroid Bandits: Fast Combinatorial Optimization with Learning (2014)

Kveton, Branislav, et al. "Matroid bandits: Fast combinatorial optimization with learning." arXiv preprint arXiv:1403.5045 (2014).