Online algorithm for unsupervised sequential selection with contextual information
Files
Date
2020
DOI
Authors
Verma, Arun
Hanawal, Manjesh K.
Szepesvári, Csaba
Saligrama, Venkatesh
Version
Published version
OA Version
Citation
Arun Verma, Manjesh K Hanawal, Csaba Szepesvári, Venkatesh Saligrama. 2020. "Online Algorithm for Unsupervised Sequential Selection with Contextual Information." Advances in Neural Information Processing Systems.
Abstract
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a
new variant of the stochastic contextual bandits problem where the loss of an arm
cannot be inferred from the observed feedback. In our setup, arms are associated
with fixed costs and are ordered, forming a cascade. In each round, a context is
presented, and the learner selects the arms sequentially till some depth. The total
cost incurred by stopping at an arm is the sum of fixed costs of arms selected and
the stochastic loss associated with the arm. The learner’s goal is to learn a decision
rule that maps contexts to arms with the goal of minimizing the total expected loss.
The problem is challenging as we are faced with an unsupervised setting as the
total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm
can be inferred (explicitly or implicitly) from the problem structure. We observe
that learning is still possible when the problem instance satisfies the so-called
‘Contextual Weak Dominance’ (CWD) property. Under CWD, we propose an
algorithm for the contextual USS problem and demonstrate that it has sub-linear
regret. Experiments on synthetic and real datasets validate our algorithm.