Cloud Analytics with Microsoft Azure
eBook - ePub

Transform your business with the power of analytics in Azure, 2nd Edition

Has Altaiar, Jack Lee, Michael Peña

  1. 184 pages
  2. English

About This Book

Learn to extract actionable insights from your big data in real time using a range of Microsoft Azure features

Key Features

  • Updated with the latest features and new additions to Microsoft Azure
  • Master the fundamentals of cloud analytics using Azure
  • Learn to use Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) to derive real-time customer insights

Book Description

Cloud Analytics with Microsoft Azure serves as a comprehensive guide for big data analysis and processing using a range of Microsoft Azure features. This book covers everything you need to build your own data warehouse and learn numerous techniques to gain useful insights by analyzing big data.

The book begins by introducing you to the power of data with big data analytics, the Internet of Things (IoT), machine learning, artificial intelligence, and DataOps. You will learn about cloud-scale analytics and the services Microsoft Azure offers to empower businesses to discover insights. You will also be introduced to the new features and functionalities added to the modern data warehouse.

Finally, you will look at two real-world business use cases to demonstrate high-level solutions using Microsoft Azure. The aim of these use cases will be to illustrate how real-time data can be analyzed in Azure to derive meaningful insights and make business decisions. You will learn to build an end-to-end analytics pipeline on the cloud with machine learning and deep learning concepts.

By the end of this book, you will be proficient in analyzing large amounts of data with Azure and using it effectively to benefit your organization.

What you will learn

  • Explore the concepts of modern data warehouses and data pipelines
  • Discover unique design considerations while applying a cloud analytics solution
  • Design an end-to-end analytics pipeline on the cloud
  • Differentiate between structured, semi-structured, and unstructured data
  • Choose a cloud-based service for your data analytics solutions
  • Use Azure services to ingest, store, and analyze data of any scale

Who this book is for

This book is designed to benefit software engineers, Azure developers, cloud consultants, and anyone who is keen to learn the process of deriving business insights from huge amounts of data using Azure.

Though not necessary, a basic understanding of data analytics concepts such as data streaming, data types, the machine learning life cycle, and Docker containers will help you get the most out of the book.


Information

Year: 2021
ISBN: 9781800200289
Edition: 2

1. Introducing analytics on Azure

According to a survey by Dresner Advisory Services in 2019, an all-time high of 48% of organizations say business intelligence in the cloud is either critical or very important in conducting their business operations. The Cloud Computing and Business Intelligence Market Study also showed that sales and marketing teams get the most value out of analytics.
As businesses grow, they generate massive amounts of data every day. This data comes from different sources, such as mobile phones, Internet of Things (IoT) sensors, and various Software as a Service (SaaS) products such as Customer Relationship Management (CRM) systems. To stay competitive in their respective industries, enterprises and businesses need to scale and modernize their data architecture and infrastructure to cope with this demand.
Having cloud-scale analytics capabilities is the go-to strategy for achieving this growth. Instead of managing your own data center, harnessing the power of the cloud allows your business to be more accessible to your users. With the help of a cloud service provider such as Microsoft Azure, you can accelerate your data analytics practice without the limitations of your IT infrastructure. The game has changed in terms of maintaining IT infrastructures, as data lakes and cloud data warehouses are capable of storing and maintaining massive amounts of data.
Simply gathering data does not add value to your business; you need to derive insights from it and help your business grow using data analytics. Otherwise, what you have gathered is just a data swamp. Azure is more than just a hub for gathering data; it is an invaluable resource for data analytics. Data analytics provides you with the ability to understand your business and customers better. By applying various data science concepts, such as machine learning (ML), regression analysis, classification algorithms, and time series forecasting, you can test your hypotheses and make data-driven decisions for the future. However, one of the challenges that organizations continuously face is how to derive these analytical modeling capabilities quickly when processing billions of data rows. This is where having a modern data warehouse and data pipeline can help (more on this in the next sections).
There are a number of ways in which data analytics can help your business thrive. In the case of retail, if you understand your customers better, you will have a better idea of what products you should sell, where to sell them, when to sell them, and how to sell them. In the financial sector, data analytics is helping authorities fight crime by detecting fraudulent transactions and providing more informed risk assessments based on historical criminal intelligence.
This chapter will cover fundamental topics on the power of data with:
  • Big data analytics
  • IoT
  • Machine Learning (ML)
  • Artificial Intelligence (AI)
  • DataOps
You will also learn why Microsoft Azure is the platform of choice for performing analytics on the cloud. Lastly, you will study the fundamental concepts of a modern data warehouse and data pipelines.

The power of data

As a consumer, you have seen how the advent of data has influenced our daily activities. Most popular entertainment applications, such as YouTube, now provide a customized user experience with features such as video recommendations based on our interests and search history. It is now child's play to discover new content that's similar to our preferred content, and also to find new and popular trending content.
Due to the major shift in wearable technology, it has also become possible to keep track of our health statistics by monitoring heart rates, blood pressure, and so on. These devices then formulate a tailored recommendation based on the averages of these statistics. But these personalized health stats are only a sample of the massive data collection happening every day on a global scale, to which we actively contribute.
Millions of people all over the world use social networking platforms and search engines every day. Internet giants such as Facebook, Instagram, and Google use clickstream data to come up with innovations and improve their services. Data collection is also carried out extensively under projects such as The Great Elephant Census and eBird that aim to boost wildlife conservation. Data-driven techniques have been adopted for tiger conservation projects in India. Data also plays an invaluable role in global efforts to compile evidence, causes, and possible responses to climate change: to understand sea surface temperature, analyze natural calamities such as coastal flooding, and highlight global warming patterns in a collective effort to save the ecosystem.
Organizations such as Global Open Data for Agriculture and Nutrition (GODAN), whose open data can be used by farmers, ranchers, and consumers alike, contribute to this tireless data collection as well.
Furthermore (as with the advent of wearable technology), data analysis is contributing to pioneering advancements in the healthcare sector. Patient datasets are analyzed to identify patterns and early symptoms of diseases in order to devise better solutions to known problems.
The scale of data being talked about here is massive. Hence, the popular term big data is used to describe harnessing the power of data at this scale.

Note

You can read more about open data at https://www.data.gov/.

Big data analytics

The term "big data" is often used to describe massive volumes of data that traditional tools cannot handle. It can be characterized by the five Vs:
  • Volume: This indicates the volume of data that needs to be analyzed for big data analytics. We are now dealing with larger datasets than ever before. This has been made possible because of the availability of electronic products such as mobile devices and IoT sensors that have been widely adopted all over the globe for commercial purposes.
  • Velocity: This refers to the rate at which data is being generated. Devices and platforms, such as those just mentioned, constantly produce data on a large scale and at rapid speed. This makes collecting, processing, analyzing, and serving data at rapid speeds necessary.
  • Variety: This refers to the structure of data being produced. Data sources are inconsistent, having a mix of structured, unstructured, and some semi-structured data (you will learn more about this in the Bringing your data together section).
  • Value: This refers to the value of the data being extracted. Accessible data may not always be valuable. With the right tools, you can derive value from the data in a cost-effective and scalable way.
  • Veracity: This is the quality or trustworthiness of data. A raw dataset will usually contain a lot of noise and bias and will need cleaning. Having a large dataset is not useful if most of the data is not accurate.
Big data analytics is the process of finding patterns, trends, and correlations in unstructured data to derive meaningful insights that shape business decisions. This unstructured data is usually large in file size (images, videos, and social graphs, for instance).
This does not mean that relational databases are not relevant for big data. In fact, modern data warehouse platforms such as Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) support structured and semi-structured data (such as JSON) and can scale to support terabytes to petabytes of data. Using Microsoft Azure, you have the flexibility to choose any platform. These technologies can complement each other to achieve a robust data analytics pipeline.
Here are some of the best use cases of big data analytics:
  • Social media analysis: Through social media sites such as Twitter, Facebook, and Instagram, companies can learn what customers are saying about their products and services. Social media analysis helps companies to target their audiences by utilizing user preferences and market trends. The challenges here are the massive amount of data and the unstructured nature of tweets and posts.
  • Fraud prevention: This is one of the most familiar use cases of big data. One of the prominent features of big data analytics when used for fraud prevention is the ability to detect anomalies in a dataset. Validating credit card transactions by understanding transaction patterns such as location data and categories of purchased items is an example of this. The biggest challenge here is ensuring that the AI/ML models are clean and unbiased. For example, a model trained mainly on a single parameter, such as a user's country of origin, will focus on patterns in the user's location and might miss out on other parameters.
  • Price optimization: Using big data analytics, you can predict what price points will yield the best results based on historical market data. This allows companies to ensure that they do not price their items too high or too low. The challenge here is that many factors can affect prices. Focusing on just a specific factor, such as a competitor's price, might eventually train your model to just focus on that area, and may disregard other factors such as weather and traffic data.
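The anomaly detection idea behind the fraud prevention use case can be illustrated with a minimal sketch. It is not a production fraud model: it looks only at transaction amounts, uses a median-based "modified z-score" chosen here for illustration, and the sample data, function name, and threshold are all assumptions.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # values are (nearly) identical, so nothing stands out
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]

# Hypothetical card transaction amounts for a single customer.
transactions = [42.0, 38.5, 45.0, 40.0, 39.9, 41.2, 950.0, 43.1]
suspicious = flag_anomalies(transactions)
print(suspicious)  # index of the 950.0 transaction
```

Relying on a single feature such as the amount is exactly the kind of narrow focus the fraud prevention bullet warns about; a realistic model would combine it with signals such as location and purchase category.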
Big data for businesses and enterprises is usually accompanied by the concept of having an IoT infrastructure, where hundreds, thousands, or even millions of devices are connected to a network that constantly sends data to a server.

Internet of Things (IoT)

IoT plays a vital role in scaling your application to go beyond your current data sources. IoT is simply an interconnection of devices, each embedded in an object around us to serve a single purpose, that send and receive data. IoT allows us to constantly gather data about "things" without manually encoding it into a database.
A smartwatch is a good example of an IoT device that constantly measures your body's vital signs. Instead of taking a reading with a measuring device and encoding it into a system, a smartwatch allows you to record your data automatically. Another good example is a device tracker for an asset that captures location, temperature, and humidity information. This allows logistics companies to monitor their items in transit, ensuring the quality and efficiency of their services.
At scale, these IoT devices generate anywhere from gigabytes to terabytes of data. This data is usually stored in a data lake in a raw, unstructured format, and is later analyzed to derive business insights. A data lake is a centralized repository of all structured, semi-structured, and unstructured data. In the example of the logistics company mentioned previously, patterns (such as the best delivery routes) could be generated. The data could also be used to understand anomalies such as data leakage or suspected fraudulent activities.
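As a rough illustration of the asset tracker example, the sketch below parses raw JSON messages into flat records and flags temperature readings outside an acceptable range. The message layout, field names, and temperature limits are assumptions made for this example, not a format defined by Azure or by any particular device.

```python
import json

# Hypothetical raw messages as a tracker might send them; field names are assumed.
raw_messages = [
    '{"device_id": "tracker-01", "lat": 14.60, "lon": 120.98, "temp_c": 4.2, "humidity": 61}',
    '{"device_id": "tracker-01", "lat": 14.61, "lon": 120.99, "temp_c": 9.8, "humidity": 64}',
    '{"device_id": "tracker-02", "lat": 14.55, "lon": 121.02, "temp_c": 3.9}',
]

SAFE_TEMP_C = (2.0, 8.0)  # assumed acceptable range for the cargo

def parse(message):
    """Turn one raw JSON message into a flat record, tolerating missing fields."""
    data = json.loads(message)
    return {
        "device_id": data["device_id"],
        "position": (data.get("lat"), data.get("lon")),
        "temp_c": data.get("temp_c"),
        "humidity": data.get("humidity"),  # None if the sensor did not report it
    }

def out_of_range(record):
    """True when a temperature was reported and falls outside the safe range."""
    low, high = SAFE_TEMP_C
    temp = record["temp_c"]
    return temp is not None and not (low <= temp <= high)

records = [parse(m) for m in raw_messages]
alerts = [r for r in records if out_of_range(r)]
```

Because the third message omits the humidity field, the example also shows why such data is often described as semi-structured: records from the same kind of device do not always share the same shape.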

Machine learning

As your data grows in size, it opens a lot of opportunities for businesses to go beyond understanding business trends and patterns. Machine learning and artificial intelligence are examples of innovations that you can exploit with your data. Building your artificial intelligence and ML capabilities is relatively easy now because of the availability of the requisite technologies and the ability to scale your storage and compute on the cloud.
Machine learning and artificial intelligence are terms that are often mixed up. In a nutshell, machine learning is a subset (or application) of artificial intelligence. Machine learning aims to allow systems to learn from past datasets and adapt automatically without human assistance. This is made possible by a series of algorithms applied to the dataset; these algorithms analyze the data in near real time and then come up with possible actions based on accuracy or confidence derived from previous experience.
The word "learning" indicates that the program is constantly learning from data fed to it. The aim of machine learning is to strive for accuracy rather than success. There are three main categories of machine learning algorithms: supervised, unsupervised, and reinforcement.
Supervised machine learning algorithms create a mapping function that maps input variables to an output variable. The algorithm uses existing labeled datasets to train itself to predict the output. Classification is a form of supervised ML that can be used in applications such as image categorization or customer segmentation, which is used for targeted marketing campaigns.
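To make the idea of learning a mapping from labeled examples concrete, here is a minimal sketch of one simple classifier, 1-nearest-neighbor, applied to customer segmentation. The features, labels, and data points are hypothetical, and this technique is chosen only for brevity; it is not the method used by any specific Azure service.

```python
import math

# Hypothetical labeled data: (monthly_visits, average_basket_value) -> segment.
training = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((12, 80.0), "loyal"),
    ((15, 95.0), "loyal"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest-neighbor)."""
    _, label = min(training, key=lambda pair: math.dist(pair[0], features))
    return label

print(predict((14, 90.0)))  # closest labeled example is a "loyal" customer
print(predict((1, 10.0)))   # closest labeled example is an "occasional" customer
```

In practice, the features would usually be scaled before distances are compared, so that a field with large values (such as basket value) does not dominate the result.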
Unsupervised machine learning, on the other hand, is when you let a program find a pattern of its own without any labels. A good example is understanding customer purchase patterns when buying products. You get inherent groupings (clustering) according to purchasing behaviors, and the program can associate customers and products according to patterns of purchase. For instance, you may discern that customers who buy Product A tend to buy Product B too. This is an example of a user-based recommendation algorithm and market basket analysis. What it would eventually mean for users is that when they buy a particular item, such as a book, they are also encouraged to buy other books that belong to the same series, genre, or category.
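The "customers who buy Product A tend to buy Product B" pattern can be sketched by counting how often products appear together, a simple form of what is usually called market basket analysis. No labels are involved; the baskets and product names below are placeholders, and the confidence function is a helper defined only for this illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; product names are placeholders.
baskets = [
    {"Product A", "Product B"},
    {"Product A", "Product B", "Product C"},
    {"Product A", "Product C"},
    {"Product B", "Product D"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under one consistent key.
    pair_counts.update(combinations(sorted(basket), 2))

def confidence(bought, also):
    """Share of baskets containing `bought` that also contain `also`."""
    pair = tuple(sorted((bought, also)))
    return pair_counts[pair] / item_counts[bought]

# Two of the three baskets with Product A also contain Product B.
print(confidence("Product A", "Product B"))
```

A recommendation feature could then suggest, for each purchased item, the companion items with the highest confidence values.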
Reinforcement Learning (RL) provides meaningful insights and actions based on rewards and punishment. The main difference between this and supervised learning is that it does not need labeled input and output as part of the algorithm. An excellent example of this is the new financial trend for "robo-advisors." Robo-advisors run using agent...
