A Manager's Guide to Data Warehousing
eBook - ePub

A Manager's Guide to Data Warehousing

  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android
eBook - ePub

A Manager's Guide to Data Warehousing

About this book

Aimed at helping business and IT managers clearly communicate with each other, this helpful book addresses concerns straight-on and provides practical methods to building a collaborative data warehouse. You'll get clear explanations of the goals and objectives of each stage of the data warehouse lifecycle while learning the roles that both business managers and technicians play at each stage. Discussions of the most critical decision points for success at each phase of the data warehouse lifecycle help you understand ways in which both business and IT management can make decisions that best meet unified objectives.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access A Manager's Guide to Data Warehousing by Laura Reeves in PDF and/or ePUB format, as well as other popular books in Informatique & Bases de données. We have over one million books available in our catalogue for you to explore.

Information

Publisher
Wiley
Year
2009
Print ISBN
9780470176382
eBook ISBN
9780470549384
Edition
1
PART I
The Essentials of Data Warehousing
CHAPTER 1
Gaining Data Warehouse Success
You’ve heard about them. You may have used one. You may be asked to pay for one. But what is a data warehouse and why should you invest any time, energy, and money on one? The short answer is that a data warehouse can help your organization to be more profitable, run more efficiently, and meet the challenges of today’s marketplace. Yet it is not a quick, simple, or inexpensive undertaking to build a data warehouse.
There is often a disconnect between the technical side that builds and maintains a data warehouse and the business side that will use it. This book will help bridge that gap. Both business managers and IT managers will learn what is involved with building and deploying a successful data warehouse. Executives and senior managers will also find this book helpful, especially Part 1, in order to be able to provide effective oversight and support. This book will also be beneficial for all business and technical personnel involved with a data warehouse, providing a common foundation for better communication. Managers on both sides need the knowledge and information that will enable them to help their organization build and use a data warehouse most effectively, and this book is the path to that knowledge.
This chapter explains the value of a data warehouse and highlights what is needed for success. To help frame the discussion, the chapter begins with some definitions.

The Essentials of Data Warehousing

Data warehousing is not new. Most large organizations have been investing in data warehousing for years. Currently, cost-effective technology is creating more possibilities for small and medium-size companies to build and deploy data warehouse solutions too. There are many stories about wild successes, and just as many about failed projects. With so much buzz about data warehousing, it is often assumed that everyone already knows the basics. However, many people are being exposed to these concepts for the first time. To ensure a common understanding, it is worth taking the time to boil things down to the essence of data warehousing.

What Is a Data Warehouse?

A data warehouse (DW) is the collection of processes and data whose overarching purpose is to support the business with its analysis and decision-making. In other words, it is not one thing per se, but a collection of many different parts. Before looking more closely at the specific parts of a data warehouse environment, it is helpful to compare the characteristics and purpose of a data warehouse with an operational application system.

Differences Between Operational and DW Systems

Applications that run the business are called online transaction processing systems (OLTPs). OLTP systems are geared toward functions such as processing incoming orders, getting products shipped out, and transferring funds as requested. These applications must ensure that transactions are handled accurately and efficiently. No one wants to wait minutes to get cash from an automated teller machine, or to enter sales orders into a company’s system.
In contrast, the purpose and characteristics of a data warehousing environment are to provide data in a format easily understood by the business community in order to support decision-making processes. The data warehouse supports looking at the business data over time to identify significant trends in buying behavior, customer retention, or changes in employee productivity. Table 1-1 lays out the primary differences between these two types of systems.
The inherent differences between the functions performed in OLTP and DW systems result in methodology, architecture, tool, and technology differences. Data warehousing emerged as an outgrowth of necessity, but has blossomed into a full-fledged industry that serves a valuable function in the business community.
Now that the differences between data warehouse and OLTP systems have been reviewed, it is time to look deeper into the makeup of the data warehouse itself.

The Data Warehousing Environment

There are many different parts of a data warehouse environment, which encompasses everything from where the data lives today through where it is ultimately used on reports and for analysis. Each of the main parts of the data warehousing environment, shown in Figure 1-1, are described in the following sections. This figure indicates how the data flows throughout the environment.
Table 1-1 Comparison of Online Transaction Processing and Data Warehousing Systems
002
Figure 1-1 Basic data warehousing environment
003
Source systems, shown on the left side of Figure 1-1, are where data is created or collected by operational application systems that run the business. These are often large applications that have been in place for a long time. Examples of source systems include the following:
• Order processing
• Production scheduling
• Financial trading systems
• Policy administration
• Claims handling
• Accounts payable/receivable
• Employee payroll
The entire midsection of Figure 1-1 is devoted to the preparation and organization of data. First, the data must be extracted from the source systems. Next, the data needs to be transformed to prepare it for business use. It must be cleansed, validated, integrated together, and reorganized. Finally, the data is loaded into structures that are designed to deliver it to the business community. The entire process is referred to as the extract, transform, and load (ETL) process.
The database in which the data is organized to support the business is called a data mart. A data mart includes all of the data that is loaded into a single database and used together for analysis. Data marts are often developed to meet the needs of a business group such as marketing or finance. The key to a successful data mart is to create it in an integrated manner. It is also recommended that data be loaded into only one data mart and then shared across the organization to ensure data consistency.
Finally, an application or reporting layer is provided to facilitate access and analysis of the data. This is where business users access reports, dashboards, and analytical applications. Collections of these reports and analyses are called business intelligence.
There is one more critical concept that warrants some attention: the mechanism used to help organize data, which is called a data model.

What Is a Data Model?

A data model is an abstraction of how individual data elements relate to each other. It visually depicts how the data is to be organized and stored in a database. A data model provides the mechanism for documenting and understanding how data is organized.
There are many different types of data modeling, each with a specific goal and purpose. As organizations modified how data was structured to support reporting and analysis, a new data modeling technique, now called dimensional modeling, emerged. Ralph Kimball, a pioneer in data warehousing, can be credited with crystallizing these techniques and publishing them for the benefit of the industry. (For more information about dimensional modeling, see Chapter 7.)
The data and processes to perform the work shown in Figure 1-1 are collectively called the data warehouse. These basic concepts have been fine-tuned and relabeled by many different players in the data warehousing field. The two most common philosophies are discussed in the next section.

Understanding Industry Perspectives

At the end of the day, everyone faces the same challenge: getting the data into the hands of the business user to turn it into information that can be used to make decisions. The definitions provided so far provide the backdrop for how terms are used in this book. There are many brilliant and talented people in the data warehousing industry, many of whom have different philosophies about how to design and build a data warehouse. It is worthwhile to look more closely at two of the most frequently used philosophies.
The first is from Ralph Kimball and colleagues, as described in The Data Warehouse Lifecycle Toolkit, Second Edition (Wiley, 2008). Ralph Kimball is a clear thought leader in the data warehousing industry and has written several books that provide detailed information essential for practitioners. That book describes the enterprise data warehouse as:
The complete end-to-end data warehouse and business intelligence system (DW/BI System). Although some would argue that you can theoretically deliver business intelligence without a data warehouse and vice versa, that is ill-advised from our perspective. Linking the two together in the DW/BI acronym reinforces their dependency. Independently, we refer to the queryable data in your DW/BI system as the enterprise data warehouse, and value-add analytics as BI (business intelligence) applications.
A second definition worth looking at is from Bill Inmon, a prolific author and another leader in the data warehousing industry, from Building the Data Warehouse, Fourth Edition (Wiley, 2005):
The data warehouse is a collection of integrated subject-oriented databases designed to support the DSS (decision support system) function, where each unit of data is relevant to some moment in time. The data warehouse contains atomic data and lightly summarized data.
Bill’s definition is also described and expanded by Claudia Imhoff and colleagues in Mastering Data Warehouse Design (Wiley, 2003): . . . It [the DW] is the central point of data integration for business intelligence and is the source of data for the data marts, delivering a common view of enterprise data.
This second viewpoint is incomplete without also including their definition of a data mart. Again, expanding on Bill Inmon’s definition, Claudia states in Mastering Data Warehouse Design:
A data mart is a departmentalized structure of data feeding from the data warehouse where data is denormalized [organized] based on the department’s need for information. It utilizes a common enterprise view of strategic data and provides business units with more flexibility, control, and responsibility. The data mart may or may not be on the same server or location as the data warehouse.
To bring this second viewpoint into the proper context, Mastering Data Warehouse Design further defines business intelligence:
Business intelligence is the set of processes and data structures used to analyze data and information used in strategic decision support. The components of Business Intelligences are the data warehouse, data marts, the DSS (decision support system) interface and the processes to ‘get data in’ to the data warehouse and to ‘get information out’.
The single definition provided by Ralph Kimball is comprehensive. You must look at the full set of definitions set forth by Bill Inmon and Claudia Imhoff to fully understand their perspective. There is much more common ground between these differing philosophies than perceived at first glance. While there are distinct differences, the common theme is that data warehousing must provide the method to prepare and deliver data to the business community to support reporting and analysis. Chapter 9 provides a comprehensive discussion about these different approaches to data warehousing.
The key point here is that there are multiple ways these terms can be interpreted. Understanding which definition is being used is critical to being able to understand what is being discussed and worked on. An organization can avoid confusion by selecting one set of definitions to be used, which enables everyone to use a common language.
Regardless of labels and terminology, all data warehouse initiatives are trying to accomplish the same thing. Now that the basic parts of the data warehouse have been defined, it is time to look at the order in which they are created.

Design and Development Sequence

Earlier in this chapter, you looked at how data flows through the data warehouse environment. While this correctly illustrates how data flows in the completed environment, this is not the recommended sequence for designing and developing a data warehouse. A better way to design the environment is to start from the business user perspective. Figure 1-2 shows the correct order to successfully design and implement a data warehousing environment. Both the technical and business team members play a role throughout. Chapter 4 describes the different roles and responsibilities. Each step in the design process is described as follows:
1. An understanding of what the business is trying to accomplish and how success is measured should be the foundation for all data warehousing initiatives. The starting point for designing the data warehouse is with the business community. Chapter 6 covers what you need to know to effectively provide business requirements.
2. Once the business requirements are understood, the data in the underlying source systems needs to be studied. Many business people have a vision for what they want to do, but it is not always tied to the reality of the organization’s actual data. In preparation for modeling data, Chapter 7 introduces techniques to help you understand your data.
Figure 1-2 Optimal data warehouse design and development sequence
004
3. The foundation for successful data warehousing, now and into the future, is properly structuring the data. Data must be organized to support the business perspective. This provides ease of use and improved query performance. This design is created based on a knowledge of the business requirements, as well as the reality of the existing data. Chapter 7 focuses on how the business and technical team members can work together to develop appropriate data models for this data delivery layer.
4. After defining how the data will be organized, the design for getting the data from the source systems to the database can be created. Decisions about the architecture and tools needed to prepare the data can be made in the pro...

Table of contents

  1. Title Page
  2. Copyright Page
  3. About the Author
  4. Credits
  5. Acknowledgements
  6. Introduction
  7. PART I - The Essentials of Data Warehousing
  8. PART II - The Business Side of Data Warehousing
  9. PART III - Dealing with the Data
  10. PART IV - Building the Project
  11. PART V - Next Steps—Expanding on Success
  12. Glossary
  13. Index