Managing Data Quality
eBook - ePub

A practical guide

  1. 158 pages
  2. English

About this book

Data is an increasingly important business asset and enabler for organisational activities. Data quality is a key aspect of data management and failure to understand it increases organisational risk and decreases efficiency and profitability. This book explains data quality management in practical terms, focusing on three key areas - the nature of data in enterprises, the purpose and scope of data quality management, and implementing a data quality management system, in line with ISO 8000-61.

By Tim King and Julian Schwarzenbach. Subject area: Computer Science, Computer Science General.

PART I
THE CHALLENGE OF ENTERPRISE DATA

This first part of the book will help you to understand better the nature of the data asset and why it can be difficult to manage, particularly in an enterprise or organisational context. Generic behaviours of people relating to data will be explored to help understand how people can affect data quality. Finally, some real-life examples and case studies of data quality problems will be used to help you understand some of the impacts of data that have poor quality.

1 THE DATA ASSET

This chapter describes the differences between data and information, and how these relate to most business activities. We then consider the nature of the data asset and the generic life cycles of data and explain what is meant by the term ‘data quality’. Finally, we introduce the objectives of data quality management.

WHAT ARE DATA?

Before going much further, there are some key terms and concepts that need to be defined and clarified to help ensure consistent understanding as you read this book.
The title of this book is Managing Data Quality. Three related terms need to be clarified because they so often appear together when discussing the impact of computer technology on organisations: data, information and knowledge.
When you have more than one data professional in a room, it is likely that there will be fierce debate about these terms. Even the ISO Online Browsing Platform (a place where all ISO definitions are gathered together) has numerous different definitions for these terms.
As the subject of this book is data, we can establish a solid foundation for our understanding by referring to the definition for data in ISO 8000-2:
Data: ‘reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing’.
In the case of definitions for information and knowledge, making a choice is more controversial, not least because potential definitions often use the other two terms and any single collection of definitions becomes recursive. However, we believe the following key observations provide sufficient understanding to read the remainder of this book (while we leave more detailed discussion to others):
  • Use of the term ‘information’ suggests richness of meaning, and typically takes an end-user view of the value of data to organisations in enabling decision making.
  • Use of the term ‘knowledge’ suggests an understanding acquired through experience or education, putting knowledge outside the scope of this book; for example, it doesn’t matter how many books you read about cycling, it is only when you have ridden a bike that you have knowledge of how to cycle!
Another complication is use of the terms ‘structured data’ and ‘unstructured data’. These terms have been a handy tool for marketing teams who are promoting particular software functionality (typically to extract meaning from unstructured data), but the two terms hide the reality that no data set in digital form is either fully structured or fully unstructured.
Structured data contain explicit, discrete elements (e.g. the tables, columns and keys within a relational database or the tags within an XML file) to represent meaning. These elements enable automation to generate insight and foresight from the meaning (e.g. being able to identify all the children in a hospital database by filtering the rows where age is less than 18).
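The hospital example can be sketched in a few lines of Python. This is only an illustration: the Patient record, its field names and the sample rows are hypothetical, and a real hospital system would hold these data in a database rather than in a list. The point is that an explicit, discrete age element lets a simple, repeatable rule select the children.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout: each field is an explicit, discrete element.
    patient_id: int
    name: str
    age: int


def children(patients, adult_age=18):
    """Return the patients whose age is below adult_age."""
    return [p for p in patients if p.age < adult_age]


# Invented sample rows, for illustration only.
patients = [
    Patient(1, "Patient A", 7),
    Patient(2, "Patient B", 42),
    Patient(3, "Patient C", 17),
    Patient(4, "Patient D", 18),
]

child_ids = [p.patient_id for p in children(patients)]
print(child_ids)  # [1, 3]
```

The filter works only because age is stored as a number in its own element; the same information buried in a free-text note could not be selected this simply.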
Unstructured data are fundamentally text and images, which provide meaning in a way that requires either human expertise or artificial intelligence methods to process the meaning (e.g. a doctor reviews the medical scan that is the content of an image file).
In these examples, though, the database will typically also include unstructured elements (e.g. a free-text field to capture observational notes) and the digital file of the MRI scan will also include structured data in the form of metadata (e.g. the creation date) to support management of all the images.
Furthermore, a spreadsheet is essentially semi-structured, sitting somewhere between a database and an image file, because the rows and columns provide some structure but without the full richness of a relational database or an XML file.
In summary, no data set is ever entirely structured or unstructured. Structure is definitely important to data quality, though, because it captures a more precise, controllable set of requirements for the data. Requirements for unstructured data are less easy to enforce by definitive, repeatable computer-based algorithms.
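As a minimal sketch of this difference, the following Python function applies rules to one record that mixes structured and unstructured elements. The field names (age, admitted, notes) and the limits used are assumptions made for illustration, not requirements taken from any standard. The structured fields can be tested against precise rules, whereas the free-text field can only be tested weakly (here, that it is not empty).

```python
from datetime import date


def check_record(record):
    """Return a list of rule violations for one record (hypothetical rules)."""
    problems = []

    # Structured element: a precise, repeatable range rule is possible.
    age = record.get("age")
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 130:
        problems.append("age must be a whole number between 0 and 130")

    # Structured element: the value must parse as an ISO calendar date.
    try:
        date.fromisoformat(record.get("admitted", ""))
    except (TypeError, ValueError):
        problems.append("admitted must be an ISO date (YYYY-MM-DD)")

    # Unstructured element: an algorithm can confirm that something was
    # written, but not that the observation is correct or complete.
    if not str(record.get("notes", "")).strip():
        problems.append("notes are empty")

    return problems


good = {"age": 7, "admitted": "2024-01-31", "notes": "Observed overnight."}
bad = {"age": "seven", "admitted": "31/01/2024", "notes": "  "}

print(check_record(good))  # []
print(len(check_record(bad)))  # 3
```

Judging whether the content of the notes is accurate would still need human expertise or artificial intelligence methods.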

Data as part of business activities

Any business activity should support the strategy of the organisation (and may have some part to play in developing this strategy). There should be governance in place to ensure that there is suitable senior or executive control and monitoring of this activity. Business activity in this context is not just applicable to commercial organisations, but refers to the activity by which any organisation delivers its core mission. Figure 1.1 illustrates this relationship.
Figure 1.1 The components of a business activity
The four core components of a typical business activity are:
  • The process, which defines the individual steps to be undertaken and, importantly, should ensure that the end-to-end process is effective in delivering the desired outcomes.
  • Data, which include inputs to and outputs from the process, and flows through it.
  • Software and hardware systems, which automate the process by storing and manipulating the data, although not every process will be automated by software.
  • People, who are the ‘actors’ in the process, undertaking key process steps and ensuring suitable organisational outcomes.
Despite data being a key enabler for any process, in many organisations there is a greater management focus on the technology elements, particularly when undertaking business change projects involving software. The software product is likely to be expensive, have a recognised name and be a core part of the project, and so it attracts much of the attention.
In typical situations, however, the data that will be used to enable the technology to deliver the required outcomes are the data in one or more existing software systems. These data will need to be migrated to the new software tool, but the data migration process is typically a high-risk part of the overall project and, if not undertaken correctly, will actually degrade the quality of the data.
If the quality of existing data is poor then, no matter how good a new software tool is and how well it has been implemented, the outcomes of the system will be limited by the quality of the data. Poor-quality data can also make data migration far more challenging and expensive, and may even make it infeasible.
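One way to detect degradation during migration is to reconcile the migrated records against the source records. The sketch below is a simplified illustration under stated assumptions: both systems are represented as lists of dictionaries sharing an id key, and the asset records shown are invented. A real migration would also need to allow for intended transformations between the two systems.

```python
def reconcile(source, target, key="id"):
    """Compare source and migrated records; report what was lost or altered."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    return {
        # Records present before migration but absent afterwards.
        "missing": sorted(src.keys() - tgt.keys()),
        # Records that appeared in the target without a source record.
        "unexpected": sorted(tgt.keys() - src.keys()),
        # Records present in both systems whose content differs.
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Invented sample data: record 2 has a truncated date, record 3 was lost.
source = [
    {"id": 1, "name": "Pump A", "installed": "2001-05-01"},
    {"id": 2, "name": "Pump B", "installed": "2003-11-17"},
    {"id": 3, "name": "Valve C", "installed": "1998-02-09"},
]
target = [
    {"id": 1, "name": "Pump A", "installed": "2001-05-01"},
    {"id": 2, "name": "Pump B", "installed": "2003"},
]

report = reconcile(source, target)
print(report)  # {'missing': [3], 'unexpected': [], 'changed': [2]}
```

A check like this only shows whether the migration preserved the data; it says nothing about whether the source data were of suitable quality to begin with.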
We have come across instances where organisations have been using a spreadsheet-based performance dashboard. Concerns about the quality and integrity of its outputs triggered these organisations to spend significant money implementing a ‘best of breed’ analysis and dashboard tool to deliver performance dashboards. However, the data sources were not changed and thus, although the outputs looked far more impressive, the data quality was the same, leading to a false perception of the reliability of the performance indicators shown by the dashboard.
Also, don’t forget that at some point in the future the ‘new’ software tool will be replaced by another tool. Where will the data come from for this even newer software? Well, it will be the data that you currently have (which in turn have been migrated from several different previous systems). This means that out of the four components of the business activity, the one that lasts the longest and will have a massive effect on outputs is the data.

Data are an asset

Data are being created at a faster rate than ever before (however conservatively you forecast future data growth) and data are now more important than they have ever been. As the world becomes a more data-driven place, smart businesses can gain competitive advantages by exploiting data more effectively. This vast data explosion brings newer, different challenges to businesses; it is one thing to store lots of data, but the benefits will only come if the data are of suitable quality and reach the right people at the right time in order to deliver better organisational outcomes. A mindset of treating data as an asset will help your organisation to achieve this.
Many larger organisations, such as those in the utilities and transport sectors, are developing management systems that provide more effective and sustainable management of their assets and activities. Managing data requires a similar mindset.
An asset is a resource with value that can deliver benefit to an organisation. Data, therefore, warrant being treated in the same way as a physical asset. Like physical assets, data:
  • can have high value for your organisation;
  • can be assessed for quality;
  • can drive up business performance and safety by enabling better informed decisions;
  • have legal or regulatory requirements to be managed effectively;
  • have a life cycle – from conception, to capture, to operation and renewal;
  • can increase business costs if not managed effectively (and therefore reduce efficiency and profitability).
Unlike physical assets, data support strategic decision making; get this wrong and you will end up making incorrect, potentially expensive, decisions that could have long-term impact for the organisation. Also, unlike physical assets, when the data...

Table of contents

  1. Front Cover
  2. Half-Title Page
  3. BCS, THE CHARTERED INSTITUTE FOR IT
  4. Title Page
  5. Copyright Page
  6. Contents
  7. List of figures and tables
  8. Authors
  9. Acknowledgements
  10. Abbreviations
  11. Glossary
  12. Preface
  13. Part I: The Challenge of Enterprise Data
  14. Part II: A Framework For Data Quality Management
  15. Part III: Implementing Data Quality Management
  16. Conclusions
  17. Bibliography
  18. Index
  19. Back Cover