Managing Data Quality

A practical guide

Tim King, Julian Schwarzenbach

  1. 158 pages
  2. English

About This Book

Data is an increasingly important business asset and enabler for organisational activities. With growth in data sets and data volumes, it's becoming ever harder to manage. Data quality - the fitness for purpose of data - is a key aspect of data management and failure to understand it increases organisational risk and decreases efficiency and profitability. This book explains data quality management in practical terms, focusing on three key areas - the nature of data in enterprises, the purpose and scope of data quality management, and implementing a data quality management system, in line with ISO 8000-61.

PART I
THE CHALLENGE OF ENTERPRISE DATA

This first part of the book will help you to understand better the nature of the data asset and why it can be difficult to manage, particularly in an enterprise or organisational context. Generic behaviours of people relating to data will be explored to help understand how people can affect data quality. Finally, some real-life examples and case studies of data quality problems will be used to help you understand some of the impacts of data that have poor quality.

1 THE DATA ASSET

This chapter describes the differences between data and information, and how these relate to most business activities. We then consider the nature of the data asset and the generic life cycles of data and explain what is meant by the term ‘data quality’. Finally, we introduce the objectives of data quality management.

WHAT ARE DATA?

Before going much further, there are some key terms and concepts that need to be defined and clarified to help ensure consistent understanding as you read this book.
The title of this book is Managing Data Quality, and, because they so often appear together when discussing the impact of computer technology on organisations, there are three important relevant terms that need to be clarified: data, information and knowledge.
When you have more than one data professional in a room, it is likely that there will be fierce debate about these terms. Even the ISO Online Browsing Platform (a place where all ISO definitions are gathered together) has numerous different definitions for these terms.
As the subject of this book is data, we can establish a solid foundation for our understanding by referring to the definition for data in ISO 8000-2:
Data: ‘reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing’.
In the case of definitions for information and knowledge, making a choice is more controversial, not least because potential definitions often use the other two terms and any single collection of definitions becomes recursive. However, we believe the following key observations provide sufficient understanding to read the remainder of this book (while we leave more detailed discussion to others):
  • Use of the term ‘information’ suggests richness of meaning, and typically takes an end-user view of the value of data to organisations in enabling decision making.
  • Use of the term ‘knowledge’ suggests an understanding acquired through experience or education, putting knowledge outside the scope of this book; for example, it doesn’t matter how many books you read about cycling, it is only when you have ridden a bike that you have knowledge of how to cycle!
Another complication is use of the terms ‘structured data’ and ‘unstructured data’. These terms have been a handy tool for marketing teams who are promoting particular software functionality (typically to extract meaning from unstructured data), but the two terms hide the reality that no data set in digital form is either fully structured or fully unstructured.
Structured data contain explicit, discrete elements (e.g. the tables, columns and keys within a relational database or the tags within an XML file) to represent meaning. These elements enable automation to generate insight and foresight from the meaning (e.g. being able to identify all the children in a hospital database by filtering the rows where age is less than 18).
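As a minimal sketch of the hospital example, the filter can be expressed as a single query once age is held in its own column. The table name `patient`, its columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical in-memory hospital table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO patient (patient_id, name, age) VALUES (?, ?, ?)",
    [(1, "Patient A", 7), (2, "Patient B", 42), (3, "Patient C", 17), (4, "Patient D", 18)],
)


def find_children(connection, adult_age=18):
    """Return (patient_id, name, age) rows where age is below adult_age."""
    cursor = connection.execute(
        "SELECT patient_id, name, age FROM patient WHERE age < ? ORDER BY patient_id",
        (adult_age,),
    )
    return cursor.fetchall()


children = find_children(conn)
```

The query works only because age is a discrete, typed element; the same question asked of free-text notes would need human reading or a text-analysis method.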
Unstructured data are fundamentally text and images, which provide meaning in a way that requires either human expertise or artificial intelligence methods to process the meaning (e.g. a doctor reviews the medical scan that is the content of an image file).
In these examples, though, the database will typically also include unstructured elements (e.g. a free-text field to capture observational notes) and the digital file of the MRI scan will also include structured data in the form of metadata (e.g. the creation date) to support management of all the images.
Furthermore, a spreadsheet is essentially semi-structured, sitting somewhere between a database and an image file, because the rows and columns provide some structure but without the full richness of a relational database or an XML file.
In summary, no data set is ever entirely structured or unstructured. Structure is definitely important to data quality, though, because it captures a more precise, controllable set of requirements for the data. Requirements for unstructured data are less easy to enforce by definitive, repeatable computer-based algorithms.
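The contrast can be sketched in a few lines, assuming a hypothetical record with two structured fields (`age`, `admission_date`) and one free-text field (`notes`). The field names and the rules themselves are illustrative assumptions, not requirements taken from ISO 8000.

```python
from datetime import date


def check_record(record):
    """Return a list of rule violations for one record (an empty list means it passes)."""
    problems = []

    # Structured field: a precise, repeatable rule is easy to state.
    age = record.get("age")
    if not isinstance(age, int) or not (0 <= age <= 120):
        problems.append("age: expected an integer from 0 to 120")

    # Structured field: the format can be checked mechanically.
    try:
        date.fromisoformat(record.get("admission_date", ""))
    except (TypeError, ValueError):
        problems.append("admission_date: expected an ISO date (YYYY-MM-DD)")

    # Free-text field: a program can confirm that something was written,
    # but not whether the observation is accurate or complete.
    if not str(record.get("notes", "")).strip():
        problems.append("notes: expected some text")

    return problems


good = {"age": 7, "admission_date": "2023-03-14", "notes": "Patient reports mild pain."}
bad = {"age": 240, "admission_date": "14/03/2023", "notes": "  "}
```

Here `check_record(good)` returns an empty list, while `check_record(bad)` reports all three fields. The rule for `notes` is deliberately weak, which is the point: the requirement that can be enforced on unstructured content is much less precise.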

Data as part of business activities

Any business activity should support the strategy of the organisation (and may have some part to play in developing this strategy). There should be governance in place to ensure that there is suitable senior or executive control and monitoring of this activity. Business activity in this context is not just applicable to commercial organisations, but refers to the activity by which any organisation delivers its core mission. Figure 1.1 illustrates this relationship.
Figure 1.1 The components of a business activity
The four core components of a typical business activity are:
  • The process, which defines the individual steps to be undertaken and, importantly, should ensure that the end-to-end process is effective in delivering the desired outcomes.
  • Data, which include inputs to and outputs from the process, and flows through it.
  • Software and hardware systems, which automate the process by storing and manipulating the data, although not every process will be automated by software.
  • People, who are the ‘actors’ in the process, undertaking key process steps and ensuring suitable organisational outcomes.
Despite data being a key enabler for any process, in many organisations there is a greater management focus on the technology elements, particularly when undertaking business change projects involving software. The software product is likely to be expensive, have a recognised name and be a core part of the project, and so it attracts much of the attention.
In typical situations, however, the data that will be used to enable the technology to deliver the required outcomes are the data in one or more existing software systems. These data will need to be migrated to the new software tool, but the data migration process is typically a high-risk part of the overall project and, if not undertaken correctly, will actually degrade the quality of the data.
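One simple way to notice this kind of degradation is to reconcile records before and after the migration. The sketch below assumes each record is a dictionary with a unique `id` key; the function name `reconcile` and the sample asset records are hypothetical, and a real migration would usually need mapping rules between the old and new structures as well.

```python
def reconcile(source_rows, target_rows, key="id"):
    """Compare source and target records by key and report the differences."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    return {
        # Records present before the migration but absent afterwards.
        "missing": sorted(source.keys() - target.keys()),
        # Records that appear only in the target.
        "unexpected": sorted(target.keys() - source.keys()),
        # Records present in both whose content differs.
        "changed": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }


legacy = [
    {"id": 1, "name": "Pump A", "installed": "2009-05-01"},
    {"id": 2, "name": "Pump B", "installed": "2011-11-23"},
    {"id": 3, "name": "Valve C", "installed": "2015-02-10"},
]
migrated = [
    {"id": 1, "name": "Pump A", "installed": "2009-05-01"},
    {"id": 2, "name": "Pump B", "installed": None},  # value lost during migration
]

report = reconcile(legacy, migrated)
```

In this example the report lists record 3 as missing and record 2 as changed. A check like this detects loss or alteration during the move; it says nothing about whether the source data were of suitable quality in the first place.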
If the quality of existing data is perceived to be poor then, no matter how good a new software tool is and how well it has been implemented, the outcomes of the system will be limited by the quality of the data. Poor-quality data can also make data migration far more challenging and expensive, and may even make it infeasible.
We have come across instances where organisations have been using spreadsheet-based performance dashboards. Concerns about the quality and integrity of the outputs triggered these organisations to spend significant money implementing a ‘best of breed’ analysis and dashboard tool to deliver performance dashboards. However, the data sources were not changed and thus, although the outputs looked far more impressive, the data quality was the same, leading to a false perception of the reliability of the performance indicators shown by the dashboard.
Also, don’t forget that at some point in the future the ‘new’ software tool will be replaced by another tool. Where will the data come from for this even newer software? Well, it will be the data that you currently have (which in turn have been migrated from several different previous systems). This means that, of the four components of the business activity, the one that lasts the longest and will have a massive effect on outputs is the data.

Data are an asset

Data are being created at a faster rate than ever before (however conservatively you forecast future data growth) and data are now more important than they have ever been. As the world becomes a more data-driven place, smart businesses can gain competitive advantages by exploiting data more effectively. This vast data explosion brings newer, different challenges to businesses; it is one thing to store lots of data, but the benefits will only come if the data are of suitable quality and reach the right people at the right time in order to deliver better organisational outcomes. A mindset of treating data as an asset will help your organisation to achieve this.
Many larger organisations, such as those in the utilities and transport sectors, are developing management systems that provide more effective and sustainable management of their assets and activities. Managing data requires a similar mindset.
An asset is a resource with value that can deliver benefit to an organisation. Data, therefore, warrant being treated in the same way as a physical asset. Like physical assets, data:
  • can have high value for your organisation;
  • can be assessed for quality;
  • can drive up business performance and safety by enabling better informed decisions;
  • have legal or regulatory requirements to be managed effectively;
  • have a life cycle – from conception, to capture, to operation and renewal;
  • can increase business costs if not managed effectively (and therefore reduce efficiency and profitability).
Unlike physical assets, data support strategic decision making; get this wrong and you will end up making incorrect, potentially expensive, decisions that could have long-term impact for the organisation. Also, unlike physical assets, when the data...
