Scalable Data Analytics with Azure Data Explorer
eBook - ePub

  1. 364 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Write efficient and powerful KQL queries to query and visualize your data, and implement best practices to improve KQL execution performance.

Key Features
  • Apply Azure Data Explorer best practices to manage your data at scale and reduce KQL execution time
  • Discover how to query and visualize your data using the powerful KQL
  • Manage cluster performance and monthly costs by understanding how to size your ADX cluster correctly

Book Description
Azure Data Explorer (ADX) enables developers and data scientists to make data-driven business decisions. This book will help you rapidly explore and query your data at scale and secure your ADX clusters.
The book begins by introducing you to ADX, its architecture, core features, and benefits. You'll learn how to securely deploy ADX instances and navigate through the ADX web UI, cover data ingestion, and discover how to query and visualize your data using the powerful Kusto Query Language (KQL). Next, you'll get to grips with KQL operators and functions to efficiently query and explore your data, as well as perform time series analysis and search for anomalies and trends in your data. As you progress through the chapters, you'll explore advanced ADX topics, including deploying your ADX instances using Infrastructure as Code (IaC). The book also shows you how to manage your cluster performance and monthly ADX costs by handling cluster scaling and data retention periods.
Finally, you'll understand how to secure your ADX environment by restricting access, and learn best practices for improving your KQL query performance.
By the end of this Azure book, you'll be able to securely deploy your own ADX instance, ingest data from multiple sources, rapidly query your data, and produce reports with KQL and Power BI.

What you will learn
  • Become well-versed with the core features of the Azure Data Explorer architecture
  • Discover how ADX can help manage your data at scale on Azure
  • Get to grips with deploying your ADX environment and ingesting and analyzing your data
  • Explore KQL and learn how to query your data
  • Query and visualize your data using the ADX UI and Power BI
  • Ingest structured and unstructured data types from an array of sources
  • Understand how to deploy, scale, secure, and manage ADX

Who this book is for
This book is for data analysts, data engineers, and data scientists who are responsible for analyzing and querying their team's large volumes of data on Azure. SRE and DevOps engineers who deploy, maintain, and secure infrastructure will also find this book useful. Prior knowledge of Azure and basic data querying will help you to get the most out of this book.


Section 1: Introduction to Azure Data Explorer

This section introduces you to Azure Data Explorer (ADX) by discussing its core features and benefits, such as low-latency data ingestion, its architecture, and how to quickly deploy your own ADX instance via the Azure portal, PowerShell, and ARM templates. The final chapter of this section presents an overview of the ADX web UI, where you will spend most of your time analyzing your data. By the end of this section, you will understand the core features of ADX, be able to deploy your own ADX instances, and be comfortable navigating and using the ADX web UI. This section lays the foundations for Section 2, where you will begin to ingest and analyze data.
This section consists of the following chapters:
  • Chapter 1, Introducing Azure Data Explorer
  • Chapter 2, Building Your Azure Data Explorer Environment
  • Chapter 3, Exploring the Azure Data Explorer UI

Chapter 1: Introducing Azure Data Explorer

Welcome to Scalable Data Analytics with Azure Data Explorer! More than 90% of today's data is digital, and most of that data is considered unstructured, such as text messages and other forms of free text. So how can we analyze all our data? The answer is data analytics and Azure Data Explorer (ADX). Data analytics is a complex topic, and Microsoft Azure provides a comprehensive selection of data analytics services, which can seem overwhelming when you are first starting your journey into data analytics.
In this chapter, we begin by introducing the data analytics pipeline and learning about each of its steps. Together, these steps take raw data and turn it into reports and visuals that present the results of your analysis, and understanding them will help you understand the workflow that ADX follows.
Next, we will introduce some of the popular Azure data services and understand where they fit in the data analytics pipeline. Some of these services, such as Azure Event Hubs, will be used in later chapters when we learn about data ingestion.
We will also learn what ADX is, which features make it a powerful data exploration platform, and how it is built, including key components such as the engine cluster. We will then look at some of its use cases, such as IoT monitoring, telemetry, and log analysis. Finally, we will get our feet wet by running our first Kusto Query Language (KQL) query in the Data Explorer UI.
In this chapter, we are going to cover the following main topics:
  • Introducing the data analytics pipeline
  • What is Azure Data Explorer?
  • Azure Data Explorer use cases
  • Running your first query

Technical requirements

If you do not already have an Azure account, head over to https://azure.microsoft.com/en-us/free/search/ and sign up. Microsoft provides 12 months of popular services for free and a $200 credit, which is enough to cover the cost of our Azure Data Explorer journey with this book. Microsoft also provides a free-to-use cluster (https://help.kusto.windows.net/) that is already populated with data. We will use this free cluster and also create our own clusters throughout this book.
Please remember to clone or download the Git repository that accompanies the book from https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer. All the code and query samples listed in the book are available in our repository. Download the latest version of Git from https://git-scm.com if you have not already installed the command-line tools.
Important Note
When developing and cloning repositories, I create a development folder in my home directory. On Windows, this is C:\Users\jason\development. On macOS, this is /Users/jason/development. When referencing specific code examples, I will refer to the repository's parent directory as ${HOME}, for example, ${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/Chapterxx/file.kql.
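If you want to turn the book's `${HOME}/...` references into concrete paths in your own scripts, a minimal Python sketch using the standard library's `string.Template` could look like the following. `Chapterxx` is kept exactly as the book writes it; the function name `resolve_example` and the `${CHAPTER}` and `${FILE}` placeholders are hypothetical additions for this illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath
from string import Template

# Pattern from the note above. ${HOME} is the repository's parent directory
# (the development folder), not necessarily your operating system's home directory.
# ${CHAPTER} and ${FILE} are hypothetical placeholders added for this sketch.
EXAMPLE_PATH = Template(
    "${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/${CHAPTER}/${FILE}"
)


def resolve_example(dev_dir: str, chapter: str, file_name: str, windows: bool = False) -> str:
    """Expand the path pattern and render it with the target OS's separators."""
    raw = EXAMPLE_PATH.substitute(HOME=dev_dir, CHAPTER=chapter, FILE=file_name)
    # PureWindowsPath accepts both / and \ and renders backslashes;
    # PurePosixPath keeps forward slashes.
    return str(PureWindowsPath(raw) if windows else PurePosixPath(raw))


# macOS layout from the note, then the Windows layout.
print(resolve_example("/Users/jason/development", "Chapterxx", "file.kql"))
print(resolve_example(r"C:\Users\jason\development", "Chapterxx", "file.kql", windows=True))
```

Using pure path classes keeps the sketch independent of the machine it runs on, so the same function can render either layout.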

Introducing the data analytics pipeline

Before diving into ADX, it is worth spending some time understanding the data analytics pipeline. Whenever I am learning something new that is large and complex in scope, such as data analytics, I break the topic down into smaller chunks to help with learning and measuring my progress. Understanding the stages of the data analytics pipeline will likewise help you see how ADX takes raw data and turns the results of analytical tasks, such as time series analysis, into reports and visuals.
Figure 1.1 illustrates the stages of the data analytics pipeline required to take data from a data source, perform some analysis, and present the results of the analysis as visuals, such as tables, reports, and graphs:
Figure 1.1 – Data analytics pipeline
In the spirit of breaking a complex subject into smaller chunks, let's look at each stage in detail:
  1. Data: The first stage of the pipeline is the data sources. In Chapter 4, Ingesting Data in Azure Data Explorer, we will discuss the different types of data. For now, suffice it to say that there are three categories of data: structured, semi-structured, and unstructured. Data can range from structured, such as tables, through semi-structured, such as JSON documents, to unstructured, such as free-form text.
  2. Ingestion: Once the data sources have been identified, the data needs to be ingested by the pipeline. The primary purpose of the ingestion stage is to take the raw data, perform some Extract-Transform-Load (ETL) operations to format the data in a way that helps with your analysis, and send the data to the storage stage. The data can be ingested using tools and services such as Apache Kafka, Azure Event Hubs, and IoT Hub. Chapter 4, Ingesting Data in Azure Data Explorer, discusses the different ingestion methods, such as streaming versus batch, and demonstrates how to ingest data using multiple services, such as Azure Event Hubs and Azure Blob storage.
  3. Store: Once ingested, ADX natively compresses and stores the data in a proprietary format. The data is then cached locally on the cluster based on the hot cache settings....
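The three stages above can be sketched as a toy pipeline in Python. This is a simplified model for intuition only, not how ADX ingests or stores data: the `(timestamp, device, temperature)` schema, the sample records, the use of `zlib` in place of ADX's proprietary compressed format, and the `is_hot` rule (data counts as hot while its age is within a configured hot period) are all assumptions made for this illustration.

```python
from __future__ import annotations

import csv
import io
import json
import zlib
from datetime import datetime, timedelta, timezone

# Hypothetical schema for this sketch: every row becomes (timestamp, device, temperature).


def extract(raw: str, fmt: str) -> list[dict]:
    """Extract: parse raw source data into a list of dictionaries."""
    if fmt == "json":  # semi-structured: one JSON document per line
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if fmt == "csv":  # structured: delimited rows with a header
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")


def transform(records: list[dict]) -> list[tuple]:
    """Transform: coerce every record to the same (timestamp, device, temperature) shape."""
    return [
        (datetime.fromisoformat(r["timestamp"]), str(r["device"]), float(r["temperature"]))
        for r in records
    ]


def load(rows: list[tuple]) -> bytes:
    """Load/store: serialize rows as CSV-like text and compress them.

    zlib only stands in for ADX's proprietary compressed format; it is not that format.
    """
    text = "\n".join(",".join(str(value) for value in row) for row in rows)
    return zlib.compress(text.encode("utf-8"))


def is_hot(ingested_at: datetime, now: datetime, hot_period: timedelta) -> bool:
    """Simplified hot cache rule: data younger than the hot period stays cached locally."""
    return now - ingested_at <= hot_period


# Usage: one semi-structured source and one structured source feed the same pipeline.
json_source = '{"timestamp": "2024-01-01T00:00:00+00:00", "device": "d1", "temperature": 21.5}'
csv_source = "timestamp,device,temperature\n2024-01-01T00:05:00+00:00,d2,19.0\n"

rows = transform(extract(json_source, "json")) + transform(extract(csv_source, "csv"))
blob = load(rows)
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(len(rows), is_hot(rows[0][0], now, timedelta(days=31)))  # 2 True
```

Normalizing both sources to one row shape during ingestion is what lets the store stage treat them uniformly. A real pipeline would also need to handle malformed records and late-arriving data, which this sketch ignores.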

Table of contents

  1. Scalable Data Analytics with Azure Data Explorer
  2. Foreword
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Preface
  7. Section 1: Introduction to Azure Data Explorer
  8. Chapter 1: Introducing Azure Data Explorer
  9. Chapter 2: Building Your Azure Data Explorer Environment
  10. Chapter 3: Exploring the Azure Data Explorer UI
  11. Section 2: Querying and Visualizing Your Data
  12. Chapter 4: Ingesting Data in Azure Data Explorer
  13. Chapter 5: Introducing the Kusto Query Language
  14. Chapter 6: Introducing Time Series Analysis
  15. Chapter 7: Identifying Patterns, Anomalies, and Trends in your Data
  16. Chapter 8: Data Visualization with Azure Data Explorer and Power BI
  17. Section 3: Advanced Azure Data Explorer Topics
  18. Chapter 9: Monitoring and Troubleshooting Azure Data Explorer
  19. Chapter 10: Azure Data Explorer Security
  20. Chapter 11: Performance Tuning in Azure Data Explorer
  21. Chapter 12: Cost Management in Azure Data Explorer
  22. Chapter 13: Assessment
  23. Other Books You May Enjoy

Scalable Data Analytics with Azure Data Explorer is written by Jason Myerscough and Arunee Singhchawla and is available in PDF and ePUB formats, in the Computer Science and Data Modelling & Design categories.