eBook - ePub

Deep Reinforcement Learning Hands-On

Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Maxim Lapan

546 pages
English
ePUB (mobile friendly)
Available on iOS and Android

About this book

This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

Key Features

Explore deep reinforcement learning (RL), from first principles to the latest algorithms
Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
Keep up with the very latest industry developments, including AI-driven chatbots

Book Description

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have brought unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to play and defeat the well-known Atari arcade games has propelled the field to prominence, and researchers are generating new ideas at a rapid pace.

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including cross-entropy and policy gradients, before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

What you will learn

Understand the DL context of RL and implement complex DL models
Learn the foundation of RL: Markov decision processes
Evaluate RL methods including cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG, and others
Discover how to deal with discrete and continuous action spaces in various environments
Defeat Atari arcade games using the value iteration method
Create your own OpenAI Gym environment to train a stock trading agent
Teach your agent to play Connect4 using AlphaGo Zero
Explore the very latest deep RL research on topics including AI-driven chatbots

Who this book is for

Some fluency in Python is assumed. Basic deep learning (DL) approaches should be familiar to readers and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

Information

Publisher

Packt Publishing

Year

2018

ISBN

9781788839303

Topic

Computer Science

Topic

Artificial Intelligence (AI) & Semantics

Table of Contents

Deep Reinforcement Learning Hands-On

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is Searching for Authors Like You

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

1. What is Reinforcement Learning?

Learning – supervised, unsupervised, and reinforcement

RL formalisms and relations

Reward

The agent

The environment

Actions

Observations

Markov decision processes

Markov process

Markov reward process

Markov decision process

Summary

2. OpenAI Gym

The anatomy of the agent

Hardware and software requirements

OpenAI Gym API

Action space

Observation space

The environment

Creation of the environment

The CartPole session

The random CartPole agent

The extra Gym functionality – wrappers and monitors

Wrappers

Monitor

Summary

3. Deep Learning with PyTorch

Tensors

Creation of tensors

Scalar tensors

Tensor operations

GPU tensors

Gradients

Tensors and gradients

NN building blocks

Custom layers

Final glue – loss functions and optimizers

Loss functions

Optimizers

Monitoring with TensorBoard

TensorBoard 101

Plotting stuff

Example – GAN on Atari images

Summary

4. The Cross-Entropy Method

Taxonomy of RL methods

Practical cross-entropy

Cross-entropy on CartPole

Cross-entropy on FrozenLake

Theoretical background of the cross-entropy method

Summary

5. Tabular Learning and the Bellman Equation

Value, state, and optimality

The Bellman equation of optimality

Value of action

The value iteration method

Value iteration in practice

Q-learning for FrozenLake

Summary

6. Deep Q-Networks

Real-life value iteration

Tabular Q-learning

Deep Q-learning

Interaction with the environment

SGD optimization

Correlation between steps

The Markov property

The final form of DQN training

DQN on Pong

Wrappers

DQN model

Training

Running and performance

Your model in action

Summary

7. DQN Extensions

The PyTorch Agent Net library

Agent

Agent's experience

Experience buffer

Gym env wrappers

Basic DQN

N-step DQN

Implementation

Double DQN

Implementation

Results

Noisy networks

Implementation

Results

Prioritized replay buffer

Implementation

Results

Dueling DQN

Implementation

Results

Categorical DQN

Implementation

Results

Combining everything

Implementation

Results

Summary

References

8. Stocks Trading Using RL

Trading

Data

Problem statements and key decisions

The trading environment

Models

Training code

Results

The feed-forward model

The convolution model

Things to try

Summary

9. Policy Gradients – An Alternative

Values and policy

Why policy?

Policy representation

Policy gradients

The REINFORCE method

The CartPole example

Results

Policy-based versus value-based methods

REINFORCE issues

Full episodes are required

High gradients variance

Exploration

Correlation between samples

PG on CartPole

Results

PG on Pong

Results

Summary

10. The Actor-Critic Method

Variance reduction

CartPole variance

Actor-critic

A2C on Pong

A2C on Pong results

Tuning hyperparameters

Learning rate

Entropy beta

Count of environments

Batch size

Summary

11. Asynchronous Advantage Actor-Critic

Correlation and sample efficiency

Adding an extra A to A2C

Multiprocessing in Python

A3C – data parallelism

Results

A3C – gradients parallelism

Results

Summary

12. Chatbots Training with RL

Chatbots overview

Deep NLP basics

Recurrent Neural Networks

Embeddings

Encoder-Decoder

Training of seq2seq

Log-likelihood training

Bilingual evaluation understudy (BLEU) score

RL in seq2seq

Self-critical sequence training

The chatbot example

The example structure

Modules: cornell.py and data.py

BLEU score and utils.py

Model

Training: cross-entropy

Running the training

Checking the data

Testing the trained model

Training: SCST

Running the SCST training

Results

Telegram bot

Summary

13. Web Navigation

Web navigation

Browser automation and RL

Mini World of Bits benchmark

OpenAI Universe

Installation

Actions and observations

Environment creation

MiniWoB stability

Simple clicking approach

Grid actions

Example overview

Model

Training code

Starting containers

Training process

Checking the learned policy

Issues with simple clicking

Human demonstrations

Recording the demonstrations

Recording format

Training using demonstrations

Results

TicTacToe problem

Adding text description

Results

Things to try

Summary

14. Continuous Action Space

Why a continuous space?

Action space

Environments

The Actor-Critic (A2C) method

Implementation

Results

Using models and recording videos

Deterministic policy gradients

Exploration

Implementation

Results

Recording videos

Distributional policy gradients

Architecture

Implementation

Results

Things to try

Summary

15. Trust Regions – TRPO, PPO, and ACKTR

Introduction

Roboschool

A2C baseline

Results

Videos recording

Proximal Policy Optimization

Implementation

Results

Trust Region Policy Optimization

Implementation

Results

A2C using ACKTR

Implementation

Results

Summary

16. Black-Box Optimization in RL

Black-box methods

Evolution strategies

ES on CartPole

Results

ES on HalfCheetah

Results

Genetic algorithms

GA on CartPole

Results

GA tweaks

Deep GA

Novelty search

GA on Cheetah

Results

Summary

References

17. Beyond Model-Free – Imagination

Model-based versus model-free

Model imperfections

Imagination-augmented agent

The environment model

The rollout policy

The rollout encoder

Paper results

I2A on Atari Breakout

The baseline A2C agent

EM training

The imagination agent

The I2A model

The Rollout encoder

Training of I2A

Experiment results

The baseline agent

Training EM weights

Training with the I2A model

Summary

References

18. AlphaGo Zero

Board games

The AlphaGo Zero method

Overview

Monte-Carlo Tree Search

Self-play

Training and evaluation

Connect4 bot

Game model

Implementing MCTS

Model

Training

Testing and comparison

Connect4 results

Summary

References

Book summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Deep Reinforcement Learning Hands-On

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Acquisition Editors: Frank Pohlmann, Suresh Jain

Project Editor: Kishor Rit

Technical Editor: Nidhisha Shetty

Proofreader: Tom Jacob

Indexer: Tejal Daruwale Soni

Graphics: Sandip Tadge

Production Coordinator: Shantanu Zagade

First published: June 2018

Production reference: 1150618

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78883-424-7

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and mor...
