書きかけ、後ほど各アルゴリズムの説明や実験など追加予定。 →水曜夜あたり→土日あたり

前回

UCB,その前はε-greedy＋softmax。

Contextual Bandit

スライド

Contexual bandit @TokyoWebMining from 正志坪坂

www.slideshare.net

リクルートの方のスライド。

説明

それぞれのアーム・もしくはアームを引く人に特徴ベクトルがついた multi-armed bandit problem と見て取れる。問題設定の積報酬の最大化はこれまでと同じ。アイテムのレコメンドの文脈では商品＝アーム、ユーザー＝アームを引く人（それぞれが年齢などのコンテクストを持つ）といった形式。

時刻tにk個の行動を選択する環境の下で、コンテキスト（特徴と呼ばれる時もある）X_t が与えられる時のバンディットでありざっくり以下のように書ける。

for t:
  - コンテキストX_tを取得
  - Contextual banditのアルゴリズムに X_t を入力し取るアーム a を決定
  - a を引いて報酬 r を得る
  - (a, r, X_t)を用いてアルゴリズムの内部状態を更新

各アルゴリズム外観

少ししか調べてない＋全くの専門外なので漏れがあるかも、という言い訳します。

EXP4

Auer, Peter, et al. "Gambling in a rigged casino: The adversarial multi-armed bandit problem." Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. IEEE, 1995.

Epoch-Greedy

Langford, John, and Tong Zhang. "The epoch-greedy algorithm for multi-armed bandits with side information." Advances in neural information processing systems. 2008.

LinUCB

Chu, Wei, et al. "Contextual Bandits with Linear Payoff Functions." AISTATS. Vol. 15. 2011.

Thompson sampling for Contextual Bandits

Thompson, William R. "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples". Biometrika, 25(3–4):285–294, 1933.

Agrawal, Shipra, and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." ICML (3). 2013.

HyperTS・HyperTSFB

Tang, Liang, et al. "Ensemble contextual bandits for personalized recommendation." Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 2014.