Deep Reinforcement Learning Hands-On
eBook - ePub

Deep Reinforcement Learning Hands-On

Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Maxim Lapan

  1. 546 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

Key Features

  • Explore deep reinforcement learning (RL), from the first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Book Description

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have seen unprecedented progress made towards training agents to solve complex problems in a human-like way. Google's use of algorithms to play and defeat the well-known Atari arcade games has propelled the field to prominence, and researchers are generating new ideas at a rapid pace.

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including Cross-entropy and policy gradients, before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

What you will learn

  • Understand the DL context of RL and implement complex DL models
  • Learn the foundation of RL: Markov decision processes
  • Evaluate RL methods including Cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots

Who this book is for

Some fluency in Python is assumed. Basic deep learning (DL) approaches should be familiar to readers and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

Frequently asked questions

Is Deep Reinforcement Learning Hands-On available as an online PDF/ePub?
Yes, you have access to Deep Reinforcement Learning Hands-On by Maxim Lapan in PDF and/or ePub format, as well as to other popular books in Computer Science & Artificial Intelligence (AI) & Semantics. Over 1 million books are available to you in our catalog.

Information


Table of Contents

Deep Reinforcement Learning Hands-On
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. What is Reinforcement Learning?
Learning – supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality – wrappers and monitors
Wrappers
Monitor
Summary
3. Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue – loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example – GAN on Atari images
Summary
4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
DQN model
Training
Running and performance
Your model in action
Summary
7. DQN Extensions
The PyTorch Agent Net library
Agent
Agent's experience
Experience buffer
Gym env wrappers
Basic DQN
N-step DQN
Implementation
Double DQN
Implementation
Results
Noisy networks
Implementation
Results
Prioritized replay buffer
Implementation
Results
Dueling DQN
Implementation
Results
Categorical DQN
Implementation
Results
Combining everything
Implementation
Results
Summary
References
8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Models
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
9. Policy Gradients – An Alternative
Values and policy
Why policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradients variance
Exploration
Correlation between samples
PG on CartPole
Results
PG on Pong
Results
Summary
10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Learning rate
Entropy beta
Count of environments
Batch size
Summary
11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C – data parallelism
Results
A3C – gradients parallelism
Results
Summary
12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Recurrent Neural Networks
Embeddings
Encoder-Decoder
Training of seq2seq
Log-likelihood training
Bilingual evaluation understudy (BLEU) score
RL in seq2seq
Self-critical sequence training
The chatbot example
The example structure
Modules: cornell.py and data.py
BLEU score and utils.py
Model
Training: cross-entropy
Running the training
Checking the data
Testing the trained model
Training: SCST
Running the SCST training
Results
Telegram bot
Summary
13. Web Navigation
Web navigation
Browser automation and RL
Mini World of Bits benchmark
OpenAI Universe
Installation
Actions and observations
Environment creation
MiniWoB stability
Simple clicking approach
Grid actions
Example overview
Model
Training code
Starting containers
Training process
Checking the learned policy
Issues with simple clicking
Human demonstrations
Recording the demonstrations
Recording format
Training using demonstrations
Results
TicTacToe problem
Adding text description
Results
Things to try
Summary
14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Implementation
Results
Using models and recording videos
Deterministic policy gradients
Exploration
Implementation
Results
Recording videos
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
15. Trust Regions – TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Results
Videos recording
Proximal Policy Optimization
Implementation
Results
Trust Region Policy Optimization
Implementation
Results
A2C using ACKTR
Implementation
Results
Summary
16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
Results
ES on HalfCheetah
Results
Genetic algorithms
GA on CartPole
Results
GA tweaks
Deep GA
Novelty search
GA on Cheetah
Results
Summary
References
17. Beyond Model-Free – Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
The environment model
The rollout policy
The rollout encoder
Paper results
I2A on Atari Breakout
The baseline A2C agent
EM training
The imagination agent
The I2A model
The Rollout encoder
Training of I2A
Experiment results
The baseline agent
Training EM weights
Training with the I2A model
Summary
References
18. AlphaGo Zero
Board games
The AlphaGo Zero method
Overview
Monte-Carlo Tree Search
Self-play
Training and evaluation
Connect4 bot
Game model
Implementing MCTS
Model
Training
Testing and comparison
Connect4 results
Summary
References
Book summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index

Deep Reinforcement Learning Hands-On

Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisition Editors: Frank Pohlmann, Suresh Jain
Project Editor: Kishor Rit
Technical Editor: Nidhisha Shetty
Proofreader: Tom Jacob
Indexer: Tejal Daruwale Soni
Graphics: Sandip Tadge
Production Coordinator: Shantanu Zagade
First published: June 2018
Production reference: 1150618
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78883-424-7
www.packtpub.com
Deep Reinforcement Learning Hands-On
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and mor...
