Data Mapping for Data Warehouse Design
eBook - ePub

Data Mapping for Data Warehouse Design

  1. 180 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Data Mapping for Data Warehouse Design

About this book

Data mapping in a data warehouse is the process of creating a link between two distinct data models' (source and target) tables/attributes. Data mapping is required at many stages of DW life-cycle to help save processor overhead; every stage has its own unique requirements and challenges. Therefore, many data warehouse professionals want to learn data mapping in order to move from an ETL (extract, transform, and load data between databases) developer to a data modeler role. Data Mapping for Data Warehouse Design provides basic and advanced knowledge about business intelligence and data warehouse concepts including real life scenarios that apply the standard techniques to projects across various domains. After reading this book, readers will understand the importance of data mapping across the data warehouse life cycle. - Covers all stages of data warehousing and the role of data mapping in each - Includes a data mapping strategy and techniques that can be applied to many situations - Based on the author's years of real-world experience designing solutions

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Data Mapping for Data Warehouse Design by Qamar Shahbaz in PDF and/or ePUB format, as well as other popular books in Informatique & Sciences générales de l'informatique. We have over one million books available in our catalogue for you to explore.
Chapter 1

Introduction

Abstract

Data mapping is the most important design step in the data warehouse lifecycle and impacts project success or failure. The process links the design and implementation phase of the project. The outcome of the process is the data mapping document, which is the main tool for communication between project designers and developers.

Keywords

data mapping; source matrix (SMX); transformation design document
Data mapping is the most important design step in the data warehouse lifecycle, and it impacts project success or failure. The process links the design and implementation phases of the project. The outcome of the process is the data mapping document, which is the main tool for communication between project designers and developers.
The data mapping document provides detailed steps in the data mapping process and provides a guide that a data mapper can use to successfully complete his or her task. The document also provides data mapping scenarios explaining different approaches to a problem and their pros and cons.

Definition

Data mapping in a data warehouse is the process of creating a link between two distinct data models’ (source and target) tables or attributes.
Chapter 2

Data Mapping Stages

Abstract

Data mapping is required at many stages of the data warehouse lifecycle; every stage has its own unique requirements and challenges. A data mapper’s biggest challenge is to understand how data will flow from the source system to the final graphical user interface; this flow will determine how data should be transformed to achieve the end goal.

Keywords

access layer; extract; transform; load (ETL); landing area; load-ready area; staging
Data mapping is required at many stages of the data warehouse life cycle; every stage has its own unique requirements and challenges. A data mapper’s biggest challenge is to understand how data will flow from the source system to the final graphical user interface; this flow will determine how data should be transformed to achieve the end goal.

Mapping from the Source to the Data Warehouse Landing Area

This kind of mapping is usually one to one, but may sometimes include transformations that can be done inside the source database engine. Such mapping helps by saving processing overhead toward the technology end of the architecture.

Mapping from the Landing Area to the Staging Database

Mapping from the landing area to the staging area is done by:
1. Selecting a subset of columns from the complete source file
2. Splitting a single column into multiple columns
3. Using information coming from the header or trailer for different purposes to cast the timestamp or date values to match the Target Database formats and so on.

Mapping from the Staging Database to the Load Ready or Target Database

In this stage of the data warehouse lifecycle, source data is transformed into data warehouse data; data from here onward will be treated as information. This is why maximum importance, resources, and time should be given to this stage of data mapping. This book highlights various data mapping techniques for this stage.
The rules at this point can be complex and may involve multiple tables or sources. The rules here are governed by the vision that the data modeler has in mind about the data in a logical data model (LDM). All kinds of data integrations, history handling, data joining, lookups, reference data population, data-type conversion, and so on should be documented here. Usually this kind of data mapping is referred to as source matrix (SMX) or detail transformation design (DTD).

Mapping from Logical Data Model to the Semantic or Access Layer

Data mapping LDM to the semantic, access, or PL layer involves data transformation to bring data into a state where business users can run reports and use the data as information. We will discuss this kind of data mapping in this chapter; however, Chapter 12 maps data for a scenario involving PL attributes in LDM.
Chapter 3

Data Mapping Types

Abstract

There are two types of data mapping done in any data warehouse project. The high-level logical data mapping is part of the data modeling process, and implementation-oriented physical data mapping is used to document the transformation rule in detail.

Keywords

logical data mapping; physical data mapping
There are two types of data mapping done in any data warehouse project. High-level logical data mapping is part of the data modeling process, and implementation-oriented physical data mapping is used to document the transformation rule in detail.

Logical Data Mapping

After the logical data model (LDM) is complete, the data mapper will start mapping source elements to LDM. This is high-level mapping and provides a baseline for more detailed physical mappings; the rules written are related more to logical concepts than to implementation.

Physical Data Mapping

When the physical data model is complete, the data mapper will write physical mappings or would physicalize the logical mappings written earlier. Here more detailed rules are needed that convey the mapper’s vision of the data to the ETL (extract, transform, load) developer.
In this book, we will only discuss physical data mapping.
Chapter 4

Data Models

Abstract

The logical data model defines how data will be stored or linked in the data warehouse. The process gives a single picture of an organization’s business and is the key process in the data warehouse lifecycle.

Keywords

logical data model (LDM); data modeling; physical data model (PDM); primary key; ER (Entity Relationship) Diagram
Before we go into a detailed discussion about data mapping, we need to understand the work that has been done before data mapping starts. The sequence in which a project usually flows is described in Figure 4.1.
image

Figure 4.1 Data warehouse design steps.
A data mapper needs to understand the data model of the project to be able to make correct data mappings. The model might contain different forms of modeling techniques and require special considerations for certain entities or tables.
The data modeler starts by modeling the client’s real-life objects into high-level concepts. This will result in a conceptual model; it can be high level or a relatively detailed one. Here the data modeler will not add any columns and might club many tables into a single concept. For example, the data modeler can group an employee, personal details, history, and so on into one concept employee (Figure 4.2).
image

Figure 4.2 Data model types for data warehouse.
The next step in the modeling process is to identify logical entities in the client’s business, add attributes for each entity, and create relationships among entities. This generates a logical data model of the business that represents all entities of the business. This model will have complete information from a logical representation perspec...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Chapter 1. Introduction
  7. Chapter 2. Data Mapping Stages
  8. Chapter 3. Data Mapping Types
  9. Chapter 4. Data Models
  10. Chapter 5. Data Mapper’s Strategy and Focus
  11. Chapter 6. Uniqueness of Attributes and its Importance
  12. Chapter 7. Prerequisites of Data Mapping
  13. Chapter 8. Surrogate Keys versus Natural Keys
  14. Chapter 9. Data Mapping Document Format
  15. Chapter 10. Data Analysis Techniques
  16. Chapter 11. Data Quality
  17. Chapter 12. Data Mapping Scenarios
  18. Glossary and Nomenclature List
  19. Bibliography