Bandit

Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the …

Dueling Bandits with Qualitative Feedback

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) …

A Fully Adaptive Algorithm for Pure Exploration in Linear Bandits

We propose the first fully-adaptive algorithm for pure exploration in linear bandits—the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly. While existing methods partially or entirely fix sequences …