Real-World SRE
eBook - ePub

Real-World SRE

The Survival Guide for Responding to a System Outage and Maximizing Uptime

  1. 340 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

This hands-on survival manual will give you the tools to confidently prepare for and respond to a system outage.

Key Features

  • Proven methods for keeping your website running
  • A survival guide for incident response
  • Written by an ex-Google SRE expert

Book Description

Real-World SRE is the go-to survival guide for the software developer in the middle of a catastrophic website failure. Site Reliability Engineering (SRE) has emerged on the frontline as businesses strive to maximize uptime. This book is a step-by-step framework to follow when your website is down and the countdown is on to fix it. Nat Welch has battle-hardened experience in reliability engineering at some of the biggest outage-sensitive companies on the internet. Arm yourself with his tried-and-tested methods for monitoring modern web services, setting up alerts, and evaluating your incident response.

Real-World SRE goes beyond just reacting to disaster: uncover the tools and strategies needed to safely test and release software, plan for long-term growth, and foresee future bottlenecks. Real-World SRE gives you the capability to set up your own robust plan of action to see you through a company-wide website crisis.

The final chapter of Real-World SRE is dedicated to acing SRE interviews, whether you are after a first job or a valued promotion.

What you will learn

  • Monitor for approaching catastrophic failure
  • Alert your team to an outage emergency
  • Dissect your incident response strategies
  • Test automation tools and build your own software
  • Predict bottlenecks and fight for user experience
  • Eliminate the competition in an SRE interview

Who this book is for

Real-World SRE is aimed at software developers who are facing a website crisis or who want to improve the reliability of their company's software. Newcomers to Site Reliability Engineering who want to succeed at interviews will also find it invaluable.



Table of Contents

Real-World SRE
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. Introduction
A brief history
What is SRE?
What is in the book?
SRE as a framework for new projects
Summary
References
2. Monitoring
Why monitoring?
Instrumenting an application
What should we measure?
A short introduction to SLIs, SLOs, and error budgets
Service levels
Error budgets
Collecting and saving monitoring data
Polling applications
Nagios
Prometheus
Cacti
Sensu
Push applications
StatsD
Telegraf
ELK
Displaying monitoring information
Arbitrary queries
Graphs
Dashboards
Chatbots
Managing and maintaining monitoring data
Communicating about monitoring
Do they even know there is monitoring?
References and related reading
Future reading
Summary
3. Incident Response
What is an incident?
What is incident response?
Alerting
When do you alert?
How do you alert?
Alerting services
What is in an alert?
Who do you alert?
Being on call
Communication
Incident Command System (ICS)
Where do you communicate?
Recovering the system
Calling all clear
Summary
4. Postmortems
What is a postmortem?
Why write a postmortem?
When to write a postmortem document
Carrying out incident analysis
How to write a postmortem document
Summary
Impact
Timeline
Root cause
Action items
Postmortems without action items
Appendix
Blameless postmortems
Holding a postmortem meeting
Analyzing past postmortems
MTTR and MTBF
Alert fatigue
Discussing past outages
Summary
References
5. Testing and Releasing
Testing
What do you test?
Testing code
Code reviews
Unit, feature, and integration tests
Unit tests
Feature tests
Integration tests
Testing infrastructure
Testing processes
Releasing
When to release
Releasing to production
Validating your release
Rollbacks
Automation
Continuous everything
Summary
6. Capacity Planning
A quick introduction to business finance
Why plan?
Managing risk and managing expectations
Defining a plan
What is our current capacity?
When are we going to run out of capacity?
How should we change our capacity?
State and concurrency
Is your service limited by another service?
Scaling for events
Unpredictable growth–user-generated content
Preplanned versus autoscaling
Delivering
Execute the plan
Architecture–where performance changes come from
Tech as a profit center and procurement
Summary
7. Building Tools
Finding projects
Defining projects
RDD
Example
Design documents
Planning projects
Example
Retrospectives and standups
Allocation
Building projects
Advice for writing code
Separation of concerns
Long-term work
Example OKRs
Notebooks
Documenting and maintaining projects
Summary
8. User Experience
An introduction to design and UX
Real-world interaction design
User testing
Picking an experience
Designing the test
Finding people to test
Developer experience
Experience of tools
Performance budgets
Security
Authentication
Authorization
Risk profile
Phishing
ACM code of ethics
Summary
References
9. Networking Foundations
The internet
Sending an HTTP request
DNS
dig
Ethernet and TCP/IP
Ethernet
IP
CIDR notation
ICMP
UDP
TCP
HTTP
curl and wget
Tools for watching the network
netstat
nc
tcpdump
Summary
References
10. Linux and Cloud Foundations
Linux fundamentals
Everything is a file
Files, directories, and inodes
Permissions
Sockets
Devices
/proc
Filesystem layout
What is a process?
Zombies
Orphans
What is nice?
syscalls
How to trace
Watching processes
Load averages
Build your own
Cloud fundamentals
VMs
Containers
Load balancing
Autoscaling
Storage
Queues and Pub/Sub
Units of scale
Example architecture interview
Summary
References
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index

Real-World SRE

Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisition Editors: Ben Renow-Clarke, Suresh Jain
Project Editor: Veronica Pais
Technical Editor: Nidhisha Shetty
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Sandip Tadge
Production Coordinator: Sandip Tadge
First published: August 2018
Production reference: 2040918
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78862-888-4
www.packtpub.com

mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  • Learn better with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Mapt is fully searchable
  • Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer you are entitled to a discount on the eBook copy. Get in touch with us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Nat Welch is a software developer based in the US. Since 2005 he has been building websites and keeping them running. He has always had a deep love of infrastructure and building to support the creative efforts of others. In 2012, Nat became a Site Reliability Engineer at Google and fell in love with the specialty. Since then, he has worked at companies of all sizes trying to promote reliability and help developers build reliable systems.
