Data Modeling for Azure Data Services
eBook - ePub

Data Modeling for Azure Data Services

Peter ter Braake

Share book
  1. 428 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Data Modeling for Azure Data Services

Peter ter Braake

Book details
Book preview
Table of contents
Citations

About This Book

Choose the right Azure data service and correct model design for successful implementation of your data model with the help of this hands-on guideKey Featuresā€¢ Design a cost-effective, performant, and scalable database in Azureā€¢ Choose and implement the most suitable design for a databaseā€¢ Discover how your database can scale with growing data volumes, concurrent users, and query complexityBook DescriptionData is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse and explore dimensional modeling, data vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.What you will learnā€¢ Model relational database using normalization, dimensional, or Data Vault modelingā€¢ Provision and implement Azure SQL DB and Azure Synapse SQL Poolsā€¢ Discover how to model a Data Lake and implement it using Azure Storageā€¢ Model a NoSQL database and provision and implement an Azure Cosmos DBā€¢ Use Azure Data Factory to implement ETL/ELT processesā€¢ Create a star schema model using dimensional modelingWho this book is forThis book is for business intelligence developers and consultants who work on (modern) cloud data warehousing and design and implement databases. Beginner-level knowledge of cloud data management is expected.

Frequently asked questions

How do I cancel my subscription?
Simply head over to the account section in settings and click on ā€œCancel Subscriptionā€ - itā€™s as simple as that. After you cancel, your membership will stay active for the remainder of the time youā€™ve paid for. Learn more here.
Can/how do I download books?
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
What is the difference between the pricing plans?
Both plans give you full access to the library and all of Perlegoā€™s features. The only differences are the price and subscription period: With the annual plan youā€™ll save around 30% compared to 12 months on the monthly plan.
What is Perlego?
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, weā€™ve got you covered! Learn more here.
Do you support text-to-speech?
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Is Data Modeling for Azure Data Services an online PDF/ePUB?
Yes, you can access Data Modeling for Azure Data Services by Peter ter Braake in PDF and/or ePUB format, as well as other popular books in Computer Science & Databases. We have over one million books available in our catalogue for you to explore.

Information

Year
2021
ISBN
9781801076708
Edition
1

Section 1 ā€“ Operational/OLTP Databases

Databases are used in everyday applications such as webshops, CRM systems, and financial systems. These are our Line Of Business (LOB) applications. In this section, we will learn how to choose between a SQL database and a Cosmos DB database. We will also learn how to design a database for optimal performance and scalability.
This section comprises the following chapters:
  • Chapter 1, Introduction to Databases
  • Chapter 2, Entity Analysis
  • Chapter 3, Normalizing Data
  • Chapter 4, Provisioning and Implementing an Azure SQL DB
  • Chapter 5, Designing a NoSQL Database
  • Chapter 6, Provisioning and Implementing an Azure Cosmos DB Database

Chapter 1: Introduction to Databases

Data has become increasingly important over the last few years. Almost all applications use data, whether the application is a Customer Relationship Management (CRM) system at work or a social media app on your phone. All that data is stored in databases. Since the 1980s, almost all those databases have been relational databases. Nowadays, with the advent of big data, there are different ways to store and process huge amounts of data. Some of them can be classified as so-called NoSQL databases. NoSQL stands for "not only" SQL. This means that we are seeing other types of databases emerge and being used alongside relational databases. NoSQL databases are important in the area of big data. The "SQL" in NoSQL stands for Structured Query Language. This is the programming language of relational databases and has become the "equivalent" of relational databases.
In this chapter, you will learn the basics of databases. A lot of the theory discussed in this chapter stems from relational databases, although the majority is applicable to other database systems as well.
We will discuss the following topics in this chapter:
  • Overview of relational databases
  • Introduction to Structured Query Language
  • Impact of intended usage patterns on database design
  • Understanding relational theory
  • Keys
  • Types of workload

Overview of relational databases

Databases hadn't yet been invented when we first started programming computer applications. All data had to be stored in files. Oftentimes, those files were simple comma-separated value files (CSV files). An example of a CSV file can be seen in the following screenshot:
Figure 1.1 ā€“ Person.csv
Figure 1.1 ā€“ Person.csv
As you can see, it is just some data without anything else.

Files

Using files to store data for use in applications entailed a number of issues. After trying file formats other than CSV files, developers started using databases instead of plain files. Plain files or flat files are files with just data stored in them. Figure 1.1 is an example of a flat file. Let's look into the issues that using flat files posed.
From the header of the screenshot in Figure 1.1, it is clear that the file is called Person.csv. We may infer that the data in the files represents persons. However, it is not clear whether those people are patients, customers, employees, or even someone completely different. Furthermore, you cannot ascertain that extra information from the file or its content.

Drawbacks

The use of these types of flat files to store data comes with three drawbacks:
  • You cannot infer from the file itself what the data is about.
  • It is not flexible from a programming perspective and is bad for performance when working with the data.
  • It is (almost) impossible for multiple persons to work with flat files simultaneously.
We will now examine each of these drawbacks in turn.

Drawback 1 ā€“ You cannot infer from the file itself what the data is about

It is clear from looking at the screenshot that each line has two commas, meaning that there are three columns per row. The second column very likely holds a first name. This is a reasonable assumption based on our knowledge of names, although you may require a knowledge of Dutch names to make this assumption. The third column is more difficult to guess. It could be the year of birth of the person in question, but it could also be a postal code or perhaps a monthly salary.
The file only stores the actual data and not the metadata. It may be that you can guess what the values mean, but you cannot infer it from the file itself. Metadata is the data describing the data. Column names are an example of metadata. Column names such as PatientID, Patient_FirstName, and PostalCode would already render it more readable. That is why we often add those column names as a first row in flat files.
Note
Metadata is data that describes the "actual" data.
There is even more to ascertain regarding this data. You cannot perform calculations with postal codes, such as adding up two postal codes (it may be that you can, but it doesn't make any sense). A postal code is an alphanumeric code that you cannot perform computations with. When the last column in Figure 1.1 is a salary and not a postal code, you do want (and need) to be able to perform calculations on this column, for instance, to calculate an annual salary from the monthly salaries. In this case, the column would have been numerical. In other words, it would be beneficial to know a column's data type. Generally speaking, data can be numerical, alphanumerical (text), or dates. Nowadays, of course, there are a lot of variations, such as binary data for pictures.
With data stored in flat files, the data itself and the metadata are stored separately.
Today, we have overcome some of these issues by not using flat files but storing data as XML or as JSON files. Both file types allow you to store metadata with the actual data in the file itself. In the (recent) past, this was too expensive to do. Only recently has storage become cheap enough and compute power plentiful enough to work with text files by storing data and metadata in the way that JSON does.

Drawback 2 ā€“ It is not flexible from a programming perspective and is bad for performance when working with the data

It gets nastier when we start using (old-fashioned) program code to work with the data. Suppose you need to know the postal code of the person called Janneke. Your code would now look something like this:
  1. Read a line.
  2. Read the second column.
  3. If the value you read equals Janneke, then return the third column.
  4. Repeat lines 1 to 3 until there are no more lines in the file.
With only two lines in the file, this is pretty fast. This code will become more problematic, however, when a file contains many, many rows. It will become really slow.
It gets even worse when someone changes the file structure. Suppose we add a new column, storing the patient's family name between the second and third columns. The code we just wrote will break because it assumes that the postal code is the third column. However, following the change, it is the fourth column. Retrieving the postal code should be independent of which column it actually is.

Drawback 3 ā€“ It is (almost) impossible for multiple persons to work with flat files simultaneously

In most applications, there will be multiple users working with the data simultaneously. What if your webshop could only have one visitor at a time? A database should make it easy for multiple people or processes to work with the same data at the same time. In the case of relational databases, this is part of their core. A relational database has what is known as the ACID properties to cater to multi-user workloads. You will learn more about the ACID properties in Chapter 5, Designing a NoSQL Database. Without a database system, whether relational or not, multiple users working with the same data would not be i...

Table of contents