Deep Reinforcement Learning Hands-On

Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Maxim Lapan

  • 546 pages
  • English
  • ePUB (mobile friendly)

About This Book

This practical guide will teach you how deep reinforcement learning can be used to solve complex real-world problems.

Key Features

  • Explore deep reinforcement learning (RL), from first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Book Description

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have driven unprecedented progress in training agents to solve complex problems in a human-like way. Google DeepMind's use of deep Q-networks to play and defeat well-known Atari arcade games propelled the field to prominence, and researchers are generating new ideas at a rapid pace.

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest deep RL tools and their limitations. You will evaluate methods, including cross-entropy and policy gradients, before applying them to real-world environments, taking on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents that can take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.
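
To give a taste of the hands-on approach, below is a minimal sketch of the agent/environment loop that every method in the book plugs into: a random agent on CartPole, written against the classic OpenAI Gym API (the four-value step() return is an assumption of a pre-0.26 Gym release):

    # Minimal sketch of the Gym agent/environment loop: a random CartPole agent.
    # Assumes the classic Gym API, where env.step returns (obs, reward, done, info).
    import gym

    env = gym.make("CartPole-v0")
    obs = env.reset()
    total_reward, steps = 0.0, 0

    while True:
        action = env.action_space.sample()       # random action, no learning yet
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break

    print("Episode done in %d steps, total reward %.2f" % (steps, total_reward))

Every method in the book, from cross-entropy to AlphaGo Zero, amounts to replacing the random action choice above with a learned policy.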

What you will learn

  • Understand the DL context of RL and implement complex DL models
  • Learn the foundation of RL: Markov decision processes
  • Evaluate RL methods including cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG and others (a tabular Q-learning sketch follows this list)
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using deep Q-networks
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots
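
As an example of the early tabular material, the Q-learning on 'grid world' environments mentioned above boils down to repeatedly applying the Bellman update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)). Here is a minimal sketch on FrozenLake; the hyperparameters and the purely random exploration are illustrative assumptions, not the book's exact code:

    # Tabular Q-learning sketch on a 'grid world' environment.
    # Assumes the classic Gym API with integer states and actions.
    import collections
    import gym

    ALPHA, GAMMA = 0.2, 0.9                  # illustrative hyperparameters
    env = gym.make("FrozenLake-v0")
    Q = collections.defaultdict(float)       # Q[(state, action)] -> value estimate

    state = env.reset()
    for _ in range(10000):
        action = env.action_space.sample()   # pure random exploration, for brevity
        new_state, reward, done, _ = env.step(action)
        best_next = max(Q[(new_state, a)] for a in range(env.action_space.n))
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = env.reset() if done else new_state

The greedy policy derived from Q (picking the action with the highest Q-value in each state) is what the agent actually plays; deep Q-networks replace the table with a neural network.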

Who this book is for

Some fluency in Python is assumed. Readers should be familiar with basic deep learning (DL) approaches, and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

Information

Year: 2018
ISBN: 9781788839303

Deep Reinforcement Learning Hands-On


Table of Contents

Deep Reinforcement Learning Hands-On
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. What is Reinforcement Learning?
Learning – supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality – wrappers and monitors
Wrappers
Monitor
Summary
3. Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue – loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example – GAN on Atari images
Summary
4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
DQN model
Training
Running and performance
Your model in action
Summary
7. DQN Extensions
The PyTorch Agent Net library
Agent
Agent's experience
Experience buffer
Gym env wrappers
Basic DQN
N-step DQN
Implementation
Double DQN
Implementation
Results
Noisy networks
Implementation
Results
Prioritized replay buffer
Implementation
Results
Dueling DQN
Implementation
Results
Categorical DQN
Implementation
Results
Combining everything
Implementation
Results
Summary
References
8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Models
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
9. Policy Gradients – An Alternative
Values and policy
Why policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradients variance
Exploration
Correlation between samples
PG on CartPole
Results
PG on Pong
Results
Summary
10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Learning rate
Entropy beta
Count of environments
Batch size
Summary
11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C – data parallelism
Results
A3C – gradients parallelism
Results
Summary
12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Recurrent Neural Networks
Embeddings
Encoder-Decoder
Training of seq2seq
Log-likelihood training
Bilingual evaluation understudy (BLEU) score
RL in seq2seq
Self-critical sequence training
The chatbot example
The example structure
Modules: cornell.py and data.py
BLEU score and utils.py
Model
Training: cross-entropy
Running the training
Checking the data
Testing the trained model
Training: SCST
Running the SCST training
Results
Telegram bot
Summary
13. Web Navigation
Web navigation
Browser automation and RL
Mini World of Bits benchmark
OpenAI Universe
Installation
Actions and observations
Environment creation
MiniWoB stability
Simple clicking approach
Grid actions
Example overview
Model
Training code
Starting containers
Training process
Checking the learned policy
Issues with simple clicking
Human demonstrations
Recording the demonstrations
Recording format
Training using demonstrations
Results
TicTacToe problem
Adding text description
Results
Things to try
Summary
14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Implementation
Results
Using models and recording videos
Deterministic policy gradients
Exploration
Implementation
Results
Recording videos
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
15. Trust Regions – TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Results
Videos recording
Proximal Policy Optimization
Implementation
Results
Trust Region Policy Optimization
Implementation
Results
A2C using ACKTR
Implementation
Results
Summary
16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
Results
ES on HalfCheetah
Results
Genetic algorithms
GA on CartPole
Results
GA tweaks
Deep GA
Novelty search
GA on Cheetah
Results
Summary
References
17. Beyond Model-Free – Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
The environment model
The rollout policy
The rollout encoder
Paper results
I2A on Atari Breakout
The baseline A2C agent
EM training
The imagination agent
The I2A model
The Rollout encoder
Training of I2A
Experiment results
The baseline agent
Training EM weights
Training with the I2A model
Summary
References
18. AlphaGo Zero
Board games
The AlphaGo Zero method
Overview
Monte-Carlo Tree Search
Self-play
Training and evaluation
Connect4 bot
Game model
Implementing MCTS
Model
Training
Testing and comparison
Connect4 results
Summary
References
Book summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index

Deep Reinforcement Learning Hands-On

Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisition Editors: Frank Pohlmann, Suresh Jain
Project Editor: Kishor Rit
Technical Editor: Nidhisha Shetty
Proofreader: Tom Jacob
Indexer: Tejal Daruwale Soni
Graphics: Sandip Tadge
Production Coordinator: Shantanu Zagade
First published: June 2018
Production reference: 1150618
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78883-424-7
www.packtpub.com
Deep Reinforcement Learning Hands-On
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
