eBook - ePub

Data Modeling for Azure Data Services

Name: Data Modeling for Azure Data Services
Author: Peter ter Braake

Peter ter Braake,

428 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Data Modeling for Azure Data Services

Peter ter Braake,

About this book

Choose the right Azure data service and correct model design for successful implementation of your data model with the help of this hands-on guideKey Features• Design a cost-effective, performant, and scalable database in Azure• Choose and implement the most suitable design for a database• Discover how your database can scale with growing data volumes, concurrent users, and query complexityBook DescriptionData is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse and explore dimensional modeling, data vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.What you will learn• Model relational database using normalization, dimensional, or Data Vault modeling• Provision and implement Azure SQL DB and Azure Synapse SQL Pools• Discover how to model a Data Lake and implement it using Azure Storage• Model a NoSQL database and provision and implement an Azure Cosmos DB• Use Azure Data Factory to implement ETL/ELT processes• Create a star schema model using dimensional modelingWho this book is forThis book is for business intelligence developers and consultants who work on (modern) cloud data warehousing and design and implement databases. Beginner-level knowledge of cloud data management is expected.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

Perlego offers two plans: Essential and Complete

Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.

Both plans are available with monthly, semester, or annual billing cycles.

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.

Yes, you can access Data Modeling for Azure Data Services by Peter ter Braake in PDF and/or ePUB format, as well as other popular books in Ciencia de la computación & Modelado y diseño de datos. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Packt Publishing

Year

2021

eBook ISBN

9781801076708

Edition

Topic

Ciencia de la computación

Subtopic

Modelado y diseño de datos

Section 1 – Operational/OLTP Databases

Databases are used in everyday applications such as webshops, CRM systems, and financial systems. These are our Line Of Business (LOB) applications. In this section, we will learn how to choose between a SQL database and a Cosmos DB database. We will also learn how to design a database for optimal performance and scalability.

This section comprises the following chapters:

Chapter 1, Introduction to Databases
Chapter 2, Entity Analysis
Chapter 3, Normalizing Data
Chapter 4, Provisioning and Implementing an Azure SQL DB
Chapter 5, Designing a NoSQL Database
Chapter 6, Provisioning and Implementing an Azure Cosmos DB Database

Chapter 1: Introduction to Databases

Data has become increasingly important over the last few years. Almost all applications use data, whether the application is a Customer Relationship Management (CRM) system at work or a social media app on your phone. All that data is stored in databases. Since the 1980s, almost all those databases have been relational databases. Nowadays, with the advent of big data, there are different ways to store and process huge amounts of data. Some of them can be classified as so-called NoSQL databases. NoSQL stands for "not only" SQL. This means that we are seeing other types of databases emerge and being used alongside relational databases. NoSQL databases are important in the area of big data. The "SQL" in NoSQL stands for Structured Query Language. This is the programming language of relational databases and has become the "equivalent" of relational databases.

In this chapter, you will learn the basics of databases. A lot of the theory discussed in this chapter stems from relational databases, although the majority is applicable to other database systems as well.

We will discuss the following topics in this chapter:

Overview of relational databases
Introduction to Structured Query Language
Impact of intended usage patterns on database design
Understanding relational theory
Keys
Types of workload

Overview of relational databases

Databases hadn't yet been invented when we first started programming computer applications. All data had to be stored in files. Oftentimes, those files were simple comma-separated value files (CSV files). An example of a CSV file can be seen in the following screenshot:

Figure 1.1 – Person.csv

As you can see, it is just some data without anything else.

Files

Using files to store data for use in applications entailed a number of issues. After trying file formats other than CSV files, developers started using databases instead of plain files. Plain files or flat files are files with just data stored in them. Figure 1.1 is an example of a flat file. Let's look into the issues that using flat files posed.

From the header of the screenshot in Figure 1.1, it is clear that the file is called Person.csv. We may infer that the data in the files represents persons. However, it is not clear whether those people are patients, customers, employees, or even someone completely different. Furthermore, you cannot ascertain that extra information from the file or its content.

Drawbacks

The use of these types of flat files to store data comes with three drawbacks:

You cannot infer from the file itself what the data is about.
It is not flexible from a programming perspective and is bad for performance when working with the data.
It is (almost) impossible for multiple persons to work with flat files simultaneously.

We will now examine each of these drawbacks in turn.

Drawback 1 – You cannot infer from the file itself what the data is about

It is clear from looking at the screenshot that each line has two commas, meaning that there are three columns per row. The second column very likely holds a first name. This is a reasonable assumption based on our knowledge of names, although you may require a knowledge of Dutch names to make this assumption. The third column is more difficult to guess. It could be the year of birth of the person in question, but it could also be a postal code or perhaps a monthly salary.

The file only stores the actual data and not the metadata. It may be that you can guess what the values mean, but you cannot infer it from the file itself. Metadata is the data describing the data. Column names are an example of metadata. Column names such as PatientID, Patient_FirstName, and PostalCode would already render it more readable. That is why we often add those column names as a first row in flat files.

Note

Metadata is data that describes the "actual" data.

There is even more to ascertain regarding this data. You cannot perform calculations with postal codes, such as adding up two postal codes (it may be that you can, but it doesn't make any sense). A postal code is an alphanumeric code that you cannot perform computations with. When the last column in Figure 1.1 is a salary and not a postal code, you do want (and need) to be able to perform calculations on this column, for instance, to calculate an annual salary from the monthly salaries. In this case, the column would have been numerical. In other words, it would be beneficial to know a column's data type. Generally speaking, data can be numerical, alphanumerical (text), or dates. Nowadays, of course, there are a lot of variations, such as binary data for pictures.

With data stored in flat files, the data itself and the metadata are stored separately.

Today, we have overcome some of these issues by not using flat files but storing data as XML or as JSON files. Both file types allow you to store metadata with the actual data in the file itself. In the (recent) past, this was too expensive to do. Only recently has storage become cheap enough and compute power plentiful enough to work with text files by storing data and metadata in the way that JSON does.

Drawback 2 – It is not flexible from a programming perspective and is bad for performance when working with the data

It gets nastier when we start using (old-fashioned) program code to work with the data. Suppose you need to know the postal code of the person called Janneke. Your code would now look something like this:

Read a line.
Read the second column.
If the value you read equals Janneke, then return the third column.
Repeat lines 1 to 3 until there are no more lines in the file.

With only two lines in the file, this is pretty fast. This code will become more problematic, however, when a file contains many, many rows. It will become really slow.

It gets even worse when someone changes the file structure. Suppose we add a new column, storing the patient's family name between the second and third columns. The code we just wrote will break because it assumes that the postal code is the third column. However, following the change, it is the fourth column. Retrieving the postal code should be independent of which column it actually is.

Drawback 3 – It is (almost) impossible for multiple persons to work with flat files simultaneously

In most applications, there will be multiple users working with the data simultaneously. What if your webshop could only have one visitor at a time? A database should make it easy for multiple people or processes to work with the same data at the same time. In the case of relational databases, this is part of their core. A relational database has what is known as the ACID properties to cater to multi-user workloads. You will learn more about the ACID properties in Chapter 5, Designing a NoSQL Database. Without a database system, whether relational or not, multiple users working with the same data would not be i...

Data Modeling for Azure Data Services
Contributors
Preface
Section 1 – Operational/OLTP Databases
Chapter 1: Introduction to Databases
Chapter 2: Entity Analysis
Chapter 3: Normalizing Data
Chapter 4: Provisioning and Implementing an Azure SQL DB
Chapter 5: Designing a NoSQL Database
Chapter 6: Provisioning and Implementing an Azure Cosmos DB Database
Section 2 – Analytics with a Data Lake and Data Warehouse
Chapter 7: Dimensional Modeling
Chapter 8: Provisioning and Implementing an Azure Synapse SQL Pool
Chapter 9: Data Vault Modeling
Chapter 10: Designing and Implementing a Data Lake Using Azure Storage
Section 3 – ETL with Azure Data Factory
Chapter 11: Implementing ETL Using Azure Data Factory
Other Books You May Enjoy

About this book

Frequently asked questions

Information

Section 1 – Operational/OLTP Databases

Chapter 1: Introduction to Databases

Overview of relational databases

Files

Drawbacks

Drawback 1 – You cannot infer from the file itself what the data is about

Drawback 2 – It is not flexible from a programming perspective and is bad for performance when working with the data

Drawback 3 – It is (almost) impossible for multiple persons to work with flat files simultaneously

Table of contents