eBook - ePub

Machine Learning with the Elastic Stack

Name: Machine Learning with the Elastic Stack
ISBN: 9781801078467

Gain valuable insights from your data with Elastic Stack's machine learning features, 2nd Edition

Rich Collier, Camilla Montonen, Bahaaldine Azarmi

450 pages
English
ePUB (mobile friendly)

About this book

Discover expert techniques for combining machine learning with the analytic capabilities of Elastic Stack and uncover actionable insights from your data

Key Features

Integrate machine learning with distributed search and analytics
Preprocess and analyze large volumes of search data effortlessly
Operationalize machine learning in a scalable, production-worthy way

Book Description

Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis and for classification, regression, and outlier detection.

The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with.

By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning in your distributed search and data analysis platform.

What you will learn

Find out how to enable the ML commercial feature in the Elastic Stack
Understand how Elastic machine learning is used to detect different types of anomalies and make predictions
Apply effective anomaly detection to IT operations, security analytics, and other use cases
Utilize the results of Elastic ML in custom views, dashboards, and proactive alerting
Train and deploy supervised machine learning models for real-time inference
Discover various tips and tricks to get the most out of Elastic machine learning

Who this book is for

If you're a data professional looking to gain insights into Elasticsearch data without having to rely on a machine learning specialist or custom development, then this Elastic Stack machine learning book is for you. You'll also find this book useful if you want to integrate machine learning with your observability, security, and analytics applications. Working knowledge of the Elastic Stack is needed to get the most out of this book.

Publisher: Packt Publishing
Year: 2021
Edition: 2nd
eBook ISBN: 9781801078467
Topic: Computer Science
Subtopic: Computer Science General
Index: Computer Science

Section 1 – Getting Started with Machine Learning with Elastic Stack

This section provides an intuitive understanding of the way Elastic ML works – from the perspective of not only what the algorithms are doing but also the logistics of the operation of the software within the Elastic Stack.

This section covers the following chapters:

Chapter 1, Machine Learning for IT
Chapter 2, Enabling and Operationalization

Chapter 1: Machine Learning for IT

A decade ago, the idea of using machine learning (ML)-based technology in IT operations or IT security seemed a little like science fiction. Today, however, it is one of the most common buzzwords used by software vendors. Clearly, there has been a major shift in both the perception of the need for the technology and the capabilities that the state-of-the-art implementations of the technology can bring to bear. This evolution is important to fully appreciate how Elastic ML came to be and what problems it was designed to solve.

This chapter is dedicated to reviewing the history and concepts behind how Elastic ML works. It also discusses the different kinds of analysis that can be done and the kinds of use cases that can be solved. Specifically, we will cover the following topics:

Overcoming the historical challenges in IT
Dealing with the plethora of data
The advent of automated anomaly detection
Unsupervised versus supervised ML
Using unsupervised ML for anomaly detection
Applying supervised ML to data frame analytics

Overcoming the historical challenges in IT

IT application support specialists and application architects have a demanding job with high expectations. Not only are they tasked with moving new and innovative projects into place for the business, but they also have to keep currently deployed applications up and running as smoothly as possible. Today's applications are significantly more complicated than ever before—they are highly componentized, distributed, and possibly virtualized/containerized. They could be developed using Agile, or by an outsourced team. Plus, they are most likely constantly changing. Some DevOps teams claim they can typically make more than 100 changes per day to a live production system. Trying to understand a modern application's health and behavior is like a mechanic trying to inspect an automobile while it is moving.

IT security operations analysts have similar struggles in keeping up with day-to-day operations, but they obviously have a different focus of keeping the enterprise secure and mitigating emerging threats. Hackers, malware, and rogue insiders have become so ubiquitous and sophisticated that the prevailing wisdom is that it is no longer a question of whether an organization will be compromised—it's more of a question of when they will find out about it. Clearly, knowing about a compromise as early as possible (before too much damage is done) is preferable to learning about it for the first time from law enforcement or the evening news.

So, how can they be helped? Is the crux of the problem that application experts and security analysts lack access to data to help them do their job effectively? Actually, in most cases, it is the exact opposite. Many IT organizations are drowning in data.

Dealing with the plethora of data

IT departments have invested in monitoring tools for decades, and it is not uncommon to have a dozen or more tools actively collecting and archiving data that can be measured in terabytes, or even petabytes, per day. The data can range from rudimentary infrastructure- and network-level data to deep diagnostic data and/or system and application log files.

Business-level key performance indicators (KPIs) could also be tracked, sometimes including data about the end user's experience. The sheer depth and breadth of data available is, in some ways, more comprehensive than it has ever been. To detect emerging problems or threats hidden in that data, there have traditionally been several main approaches to distilling the data into informational insights:

Filter/search: Some tools allow the user to define searches to help trim down the data into a more manageable set. While extremely useful, this capability is most often used in an ad hoc fashion once a problem is suspected. Even then, the success of this approach usually hinges on the user's ability to know what they are looking for and on their level of experience, both in having lived through similar past situations and in their expertise with the search technology itself.
Visualizations: Dashboards, charts, and widgets are also extremely useful to help us understand what data has been doing and where it is trending. However, visualizations are passive and require being watched for meaningful deviations to be detected. Once the number of metrics being collected and plotted surpasses the number of eyeballs available to watch them (or even the screen real estate to display them), visual-only analysis becomes less and less useful.
Thresholds/rules: To get around the requirement that data be physically watched in order to be proactive, many tools allow the user to define rules or conditions that get triggered upon known conditions or known dependencies between items. However, it is unlikely that you can realistically define all appropriate operating ranges or model all of the actual dependencies in today's complex and distributed applications. Plus, the amount and velocity of changes in the application or environment could quickly render any static rule set useless. Analysts find themselves chasing down many false positive alerts, setting up a boy-who-cried-wolf paradigm that leads to resentment of the tools generating the alerts and skepticism of the value that alerting could provide.

Ultimately, there needed to be a different approach—one that wasn't necessarily a complete repudiation of past techniques, but could bring a level of automation and empirical augmentation of the evaluation of data in a meaningful way. Let's face it, humans are imperfect—we have hidden biases and limitations of capacity for remembering information and we are easily distracted and fatigued. Algorithms, if used correctly, can easily make up for these shortcomings.

The advent of automated anomaly detection

ML, while a very broad topic that encompasses everything from self-driving cars to game-winning computer programs, was a natural place to look for a solution. If you realize that most of the requirements of effective application monitoring or security threat hunting are merely variations on the theme of find me something that is different from normal, then the discipline of anomaly detection emerges as the natural place to begin using ML techniques to solve these problems for IT professionals.
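
To make the idea of "different from normal" concrete, the following is a minimal Python sketch that contrasts a static threshold with a simple baseline learned from recent history. It is an illustration only and is not how Elastic ML models data: the function names, the trailing window of five samples, and the limit of three standard deviations are all assumptions chosen for this example.

```python
from statistics import mean, stdev


def static_threshold_alerts(values, limit):
    """Return indices where a value exceeds a fixed, hand-picked limit."""
    return [i for i, v in enumerate(values) if v > limit]


def baseline_alerts(values, window=5, z_limit=3.0):
    """Return indices where a value deviates from the mean of the
    preceding `window` samples by more than `z_limit` standard deviations.

    Hypothetical helper for illustration; a rolling z-score is a
    stand-in technique, not the model used by Elastic ML.
    """
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if values[i] != mu:
                alerts.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


if __name__ == "__main__":
    # Made-up requests-per-minute series with a sudden drop at index 7.
    requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 20, 101, 99]
    print("static:", static_threshold_alerts(requests_per_minute, limit=150))
    print("baseline:", baseline_alerts(requests_per_minute))
```

In this made-up series, a rule that only watches for values above 150 never fires, because the problem is a drop rather than a spike. The baseline check flags the drop without anyone having defined a lower limit in advance, which is the property that makes anomaly detection attractive when the normal operating range is unknown or keeps changing.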

The science of anomaly detection is certainly nothing new. Many very smart people have researched and employed a variety of algorithms and techniques for many years. However, the practical application of anomaly detection for IT data poses some interesting constraints that make the otherwise academically worthy algorithms inappropriate for the job. These include the following:

Timeliness: Notification of an outage, breach, or other significant anomalous situation should be known as quickly as possible to mitigate it. The cost of downtime or the risk of a continued security compromise is minimized if remedied or contained quickly. Algorithms that cannot keep up with the real-time nature of today's IT data have limited value.
Scalability: As mentioned earlier, the volume, velocity, and variation of IT data continue to explode in modern IT environments. Algorithms that inspect this vast data must be able to scale linearly with the data to be usable in a practical sense.
Efficiency: IT budgets are often highly scrutinized for wasteful spending, and many organizations are constantly being asked to do more with less. Tacking on an additional fleet of super-computers to run algorithms is not practical. Rather, modest commodity hardware with typical specifications must be able to be employed as part of the solution.
Generalizability: While highly specialized data science is often the best way to solve a specific information problem, the diversity of data in IT environments drives a need for something that can be broadly applicable across most use cases. Reusability of the same techniques is much more cost-effective in the long run.
Adaptability: Ever-changing IT environments will quickly render a brittle algorithm useless. Training and retraining the ML model would only introduce yet another time-wasting venture that cannot be afforded.
Accuracy: We already know that alert fatigue from legacy threshold and rule-based systems is a r...

Machine Learning with the Elastic Stack
Second Edition
Preface
Section 1 – Getting Started with Machine Learning with Elastic Stack
Chapter 1: Machine Learning for IT
Chapter 2: Enabling and Operationalization
Section 2 – Time Series Analysis – Anomaly Detection and Forecasting
Chapter 3: Anomaly Detection
Chapter 4: Forecasting
Chapter 5: Interpreting Results
Chapter 6: Alerting on ML Analysis
Chapter 7: AIOps and Root Cause Analysis
Chapter 8: Anomaly Detection in Other Elastic Stack Apps
Section 3 – Data Frame Analysis
Chapter 9: Introducing Data Frame Analytics
Chapter 10: Outlier Detection
Chapter 11: Classification Analysis
Chapter 12: Regression
Chapter 13: Inference
Appendix: Anomaly Detection Tips
Other Books You May Enjoy

