- Introduction
- Papers
- Some aspects of the sequential design of experiments
- Adversarial multi-armed bandit, Auer, Cesa-Bianchi, Freund and Schapire (1995)
- Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems (2006)
- Bayesian multi-armed bandit (2010)
- Analysis of Thompson Sampling for the Multi-armed Bandit Problem (2012)
- Bandits With Heavy Tail (2013)
- Matroid Bandits: Fast Combinatorial Optimization with Learning (2014)
Quite a while ago I think I wrote that deep learning is fine but I also need to work on the theory side, and since it now looks like I actually need to solve a bandit problem, here is a quick memo. For the moment I just want to understand the theory, though there is a lot of it.
Introduction
バンディットアルゴリズム入門と実践
www.slideshare.net
I’m a bandit
This blog, titled "I'm a bandit," does, as you would expect from the name, present the equations in full detail.
Thompson Sampling
Algorithm
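To make the idea concrete, here is a minimal sketch of Beta-Bernoulli Thompson Sampling: sample a success probability for each arm from its Beta posterior and play the argmax. The three-arm toy environment and its success probabilities are assumptions for illustration, not taken from the resources above.

```python
import random

def thompson_sampling(pull, n_arms, n_rounds):
    """Beta-Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior."""
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample per arm from its posterior Beta(1 + s_i, 1 + f_i)
        samples = [random.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = pull(arm)  # Bernoulli reward in {0, 1}
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures

# Toy simulation: 3 arms with (assumed) success probabilities
probs = [0.2, 0.5, 0.8]
random.seed(0)
reward, s, f = thompson_sampling(
    lambda a: int(random.random() < probs[a]), n_arms=3, n_rounds=2000)
```

Because the posterior for the best arm concentrates, the sampler ends up pulling arm 2 most of the time while still occasionally exploring the others.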
On the various formulations of the bandit problem
pdf: here
Slides: Junya Honda, Assistant Professor, Graduate School of Frontier Sciences, The University of Tokyo. FIT2013
Introduction to Bandits: Algorithms and Theory
Tutorial slides from ICML 2011, Bellevue (WA), USA. The main topics are:
- Stochastic bandits
- Adversarial bandits
- Many-armed bandit problem
- Linear bandits
- Lipschitz bandits
- Bandits in trees
Application example: recommendation
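For the stochastic setting at the top of that list, a minimal UCB1 sketch may help fix the idea: play the arm maximizing the empirical mean plus a confidence bonus. The two-arm Bernoulli environment is an assumption for the toy run, not from the tutorial itself.

```python
import math
import random

def ucb1(pull, n_arms, n_rounds):
    """UCB1: play arm argmax_i mean_i + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(n_rounds):
        if t < n_arms:
            arm = t  # play each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means

# Toy simulation: two Bernoulli arms with (assumed) success probabilities
random.seed(1)
probs = [0.3, 0.6]
counts, means = ucb1(
    lambda a: int(random.random() < probs[a]), n_arms=2, n_rounds=3000)
```

The confidence bonus shrinks as an arm is pulled more, so the suboptimal arm is only sampled logarithmically often while the better arm dominates the play count.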
Papers
Listed in chronological order; I picked the ones that seem to be the minimum needed to read the blog above.
Some aspects of the sequential design of experiments
Robbins, Herbert. "Some aspects of the sequential design of experiments." Herbert Robbins Selected Papers. Springer New York, 1985. 169-177.
Adversarial multi-armed bandit, Auer, Cesa-Bianchi, Freund and Schapire (1995)
Auer, Peter, et al. "Gambling in a rigged casino: The adversarial multi-armed bandit problem." Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. IEEE, 1995.
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems (2006)
Even-Dar, Eyal, Shie Mannor, and Yishay Mansour. "Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems." Journal of machine learning research 7.Jun (2006): 1079-1105.
Bayesian multi-armed bandit (2010)
Scott, Steven L. "A modern Bayesian look at the multi-armed bandit." Applied Stochastic Models in Business and Industry 26.6 (2010): 639-658.
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (2012)
Agrawal, Shipra, and Navin Goyal. "Analysis of Thompson Sampling for the Multi-armed Bandit Problem." COLT. 2012.
Bandits With Heavy Tail (2013)
Bubeck, Sébastien, Nicolo Cesa-Bianchi, and Gábor Lugosi. "Bandits with heavy tail." IEEE Transactions on Information Theory 59.11 (2013): 7711-7717.
Matroid Bandits: Fast Combinatorial Optimization with Learning (2014)
Kveton, Branislav, et al. "Matroid bandits: Fast combinatorial optimization with learning." arXiv preprint arXiv:1403.5045 (2014).