Multi-armed Bandit Allocation Indices
eBook - ePub


About this book

In 1989 the first edition of this book set out Gittins' pioneering index solution to the multi-armed bandit problem and his subsequent investigation of a wide range of sequential resource allocation and stochastic scheduling problems. Since then there has been a remarkable flowering of new insights, generalizations and applications, to which Glazebrook and Weber have made major contributions.

This second edition brings the story up to date. There are new chapters on the achievable region approach to stochastic optimization problems, the construction of performance bounds for suboptimal policies, Whittle's restless bandits, and the use of Lagrangian relaxation in the construction and evaluation of index policies. Some of the many varied proofs of the index theorem are discussed along with the insights that they provide. Many contemporary applications are surveyed, and over 150 new references are included.

Over the past 40 years the Gittins index has helped theoreticians and practitioners to address a huge variety of problems within chemometrics, economics, engineering, numerical analysis, operational research, probability, statistics and website design. This new edition will be an important resource for others wishing to use this approach.


Information

Publisher
Wiley
Year
2011
Print ISBN
9780470670026
eBook ISBN
9781119990215
Chapter 1
Introduction or Exploration
This book is about mathematical models for optimizing in a sequential manner the allocation of effort between a number of competing projects. The effort and the projects may take a variety of forms. Examples are: an industrial processor and jobs waiting to be processed; a server with a queue of customers; an industrial laboratory with research projects; any busy person with jobs to do; a stream of patients and alternative treatments (yes, the patients do correspond to effort—see Problem 5 later in this chapter); a searcher who may look in different places. In every case effort is treated as being homogeneous, and the problem is to allocate it between the different projects so as to maximize the expected total reward which they yield. It is a sequential problem, as effort is allowed to be reallocated in a feedback manner, taking account of the pattern of rewards so far achieved. The reallocations are assumed to be costless, and to take a negligible time, since the alternative is to impose a travelling-salesman-like feature, thereby seriously complicating an already difficult problem.
The techniques which come under the heading of dynamic programming have been devised for sequential optimization problems. The key idea is a recurrence equation relating the expected total reward (call this the payoff) at a given decision time to the distribution of its possible values at the next decision time (see equation (2.2)). Sometimes this equation may be solved analytically. Otherwise a recursive numerical solution may, at any rate in principle, be carried out. This involves making an initial approximation to the payoff function, and then successive further approximations by substituting in the right-hand side of the recurrence equation. As Bellman (1957), for many years the chief protagonist of this methodology, pointed out, using the recurrence equation involves less computing than a complete enumeration of all policies and their corresponding payoffs, but none the less soon runs into the sands of intractable storage and processing requirements as the number of variables on which the payoff function depends increases.
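The successive-approximation scheme just described can be sketched in a few lines. The following is a minimal illustration, not code from the book: it iterates the dynamic programming recurrence V(s) ← maxₐ [r(s, a) + β Σ P(s′|s, a) V(s′)] on an invented two-state, two-action example until the payoff function stops changing.

```python
# A minimal sketch of recursive numerical solution of the dynamic programming
# recurrence (value iteration). The states, actions, rewards and transition
# probabilities below are illustrative inventions, not taken from the book.

def value_iteration(states, actions, reward, transition, beta=0.9, tol=1e-8):
    """Iterate V(s) <- max_a [ r(s,a) + beta * sum_s' P(s'|s,a) * V(s') ]."""
    V = {s: 0.0 for s in states}          # initial approximation to the payoff
    while True:
        V_new = {
            s: max(
                reward(s, a) + beta * sum(p * V[s2]
                                          for s2, p in transition(s, a).items())
                for a in actions(s)
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# Two-state example: 'stay' is safe, 'risk' pays more but may change the state.
states = ['good', 'bad']
actions = lambda s: ['stay', 'risk']
reward = lambda s, a: {('good', 'stay'): 1, ('good', 'risk'): 2,
                       ('bad', 'stay'): 0, ('bad', 'risk'): 0}[(s, a)]

def transition(s, a):
    if a == 'stay':
        return {s: 1.0}                   # staying keeps the current state
    return {'good': 0.5, 'bad': 0.5}      # risking randomizes the next state

V = value_iteration(states, actions, reward, transition)
```

Even on this toy example the storage requirement is a table indexed by the full state, which is exactly what makes the method intractable as the number of projects, and hence state variables, grows.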
For the problem of allocating effort to projects, the number of variables is at least equal to the number of projects. An attractive idea, therefore, is to establish priority indices for each project, depending on its past history but not that of any other project, and to allocate effort at each decision time only to the project with the highest current index value. To calculate these indices it should be possible to calibrate a project in a given state against some set of standard projects with simple properties. If this could be done we should have a reasonable policy without having to deal with any function of the states of more than one project.
That optimal policies for some problems of effort allocation are expressible in terms of such indices is well known. The first three problems described in this chapter are cases in point. Chapters 2, 3 and 4 set out a general theory which puts these results into context. They include several theorems asserting the optimality of index policies under different conditions. In fact five of the seven problems described in this chapter may be solved fairly rapidly by using the index theorem (Theorem 2.1), and its Corollary 3.5. Problem 4 requires Theorem 3.1, which is a continuous-time version of the index theorem.
Since they may change as more effort is allocated, these priority indices may aptly be termed dynamic allocation indices. The policies that they define include, in relevant circumstances, the cμ-rule, Smith's rule, and the shortest processing time rule (SPT). The main aim of this chapter is to exhibit their range of application by describing a number of particular instances. A second aim is to give an informal introduction to the main methods available for determining indices. These are by (i) interchange arguments; (ii) exploiting any special features of the bandit processes concerned, in particular those which lead to the optimality of myopic policies; (iii) calibration by reference to standard bandit processes, often involving iteration using the dynamic programming recurrence equation; and (iv) using the fact that a dynamic allocation index may be regarded as a maximized equivalent constant reward rate.
Problem 1 Single-Machine Scheduling (see also Section 2.12, Section 3.2 and Exercise 4.1)
There are n jobs ready to be processed on a single machine. A cost ci is incurred per unit of time until job i has been completed, and the service time (or processing time) for job i is si. In what order should the jobs be processed so as to minimize the total cost?
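For this problem the index policy is Smith's rule, mentioned later in this chapter: process jobs in decreasing order of cᵢ/sᵢ. A small sketch, with invented job data, shows the rule and the total cost it minimizes.

```python
# Sketch of Smith's rule for Problem 1: with cost c_i per unit time until
# job i completes and service time s_i, the total cost is minimized by
# processing jobs in decreasing order of the index c_i / s_i.
# The job data below are invented for illustration.
from itertools import permutations

def smiths_rule(jobs):
    """jobs: list of (name, c, s); return jobs in optimal processing order."""
    return sorted(jobs, key=lambda job: job[1] / job[2], reverse=True)

def total_cost(order):
    """Sum of c_i times the completion time of job i under this order."""
    t, cost = 0.0, 0.0
    for name, c, s in order:
        t += s                 # job finishes at time t
        cost += c * t          # it accrued cost c per unit time until then
    return cost

jobs = [('A', 3, 1), ('B', 1, 2), ('C', 2, 1)]
order = smiths_rule(jobs)      # indices 3, 0.5, 2 -> order A, C, B
best = min(total_cost(p) for p in permutations(jobs))
```

A pairwise interchange argument, the first of the index-derivation methods listed above, is the standard way to verify that no other order can do better.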
Problem 2 Gold-Mining (see also Section 3.5.2)
A woman owns n gold mines and a gold-mining machine. Each day she must assign the machine to one of the mines. When the machine is assigned to mine i there is a probability pi that it extracts a proportion qi of the gold left in the mine, and a probability 1 − pi that it extracts no gold and breaks down permanently. To what sequence of mines on successive days should the machine be assigned so as to maximize the expected amount of gold mined before it breaks down?
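The classical solution (developed in Section 3.5.2) is again an index policy: with xᵢ gold remaining in mine i, mine i has index pᵢqᵢxᵢ/(1 − pᵢ), and the machine goes each day to the mine with the largest index. A sketch, with invented mine data:

```python
# Sketch of the index policy for Problem 2: a mine with success probability p,
# extraction fraction q and x gold remaining has index p*q*x / (1 - p), and
# each day the machine is assigned to the mine with the largest index.
# The mine data below are invented for illustration.

def gold_index(p, q, x):
    """Expected gold from one more day, divided by the probability of breakdown."""
    return p * q * x / (1 - p)

def next_mine(mines):
    """mines: dict name -> (p, q, gold remaining); pick the highest index."""
    return max(mines, key=lambda i: gold_index(*mines[i]))

mines = {'1': (0.9, 0.5, 100.0),   # index 0.9*0.5*100 / 0.1 = 450
         '2': (0.5, 0.9, 100.0)}   # index 0.5*0.9*100 / 0.5 = 90
choice = next_mine(mines)
```

Note that the index of a mine falls as its gold is depleted, so the optimal policy may switch between mines over time, which is exactly the dynamic character of these allocation indices.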
Problem 3 Search (see also Section 3.5.4 and Exercise 3.6)
A stationary object is hidden in one of n boxes. The probability that a search of box i finds the object if it is in box i is qi. The probability that the object is in box i is pi, and changes by Bayes' theorem as successive boxes are searched. The cost of a single search of box i is ci. How shoul...
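The index policy that emerges for this search problem (see Section 3.5.4) is to search the box maximizing pᵢqᵢ/cᵢ, recomputing the location probabilities by Bayes' theorem after each unsuccessful search. A sketch with invented numbers:

```python
# Sketch of the index policy for Problem 3: search the box with the largest
# p_i * q_i / c_i, then apply Bayes' theorem after an unsuccessful search.
# The probabilities and costs below are invented for illustration.

def next_box(p, q, c):
    """Index of the box with the largest p_i * q_i / c_i."""
    return max(range(len(p)), key=lambda i: p[i] * q[i] / c[i])

def bayes_update(p, q, j):
    """Posterior location probabilities after box j is searched without success."""
    miss = 1.0 - p[j] * q[j]                     # probability the search fails
    return [(p[i] * (1 - q[j]) if i == j else p[i]) / miss
            for i in range(len(p))]

p = [0.5, 0.3, 0.2]    # prior probability the object is in each box
q = [0.6, 0.9, 0.9]    # detection probability given the object is there
c = [1.0, 1.0, 2.0]    # cost of a single search of each box
j = next_box(p, q, c)  # indices: 0.30, 0.27, 0.09 -> box 0
p = bayes_update(p, q, j)
```

After the failed search, probability mass shifts away from the searched box, so its index falls and another box may become the best to search next.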

Table of contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Foreword
  5. Foreword to the First Edition
  6. Preface
  7. Preface to the First Edition
  8. Chapter 1: Introduction or Exploration
  9. Chapter 2: Main Ideas: Gittins Index
  10. Chapter 3: Necessary Assumptions for Indices
  11. Chapter 4: Superprocesses, Precedence Constraints and Arrivals
  12. Chapter 5: The Achievable Region Methodology
  13. Chapter 6: Restless Bandits and Lagrangian Relaxation
  14. Chapter 7: Multi-Population Random Sampling (Theory)
  15. Chapter 8: Multi-Population Random Sampling (Calculations)
  16. Chapter 9: Further Exploitation
  17. References
  18. Tables
  19. Index

Multi-armed Bandit Allocation Indices, by John Gittins, Kevin Glazebrook and Richard Weber, is available in PDF and ePUB format under Mathematics, Probability & Statistics.