Managing Data Quality

A practical guide

Tim King, Julian Schwarzenbach

  1. 158 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Data is an increasingly important business asset and enabler for organisational activities. With growth in data sets and data volumes, it's becoming ever harder to manage. Data quality - the fitness for purpose of data - is a key aspect of data management and failure to understand it increases organisational risk and decreases efficiency and profitability. This book explains data quality management in practical terms, focusing on three key areas - the nature of data in enterprises, the purpose and scope of data quality management, and implementing a data quality management system, in line with ISO 8000-61.

PART I
THE CHALLENGE OF ENTERPRISE DATA

This first part of the book will help you to understand better the nature of the data asset and why it can be difficult to manage, particularly in an enterprise or organisational context. Generic behaviours of people relating to data will be explored to help understand how people can affect data quality. Finally, some real-life examples and case studies of data quality problems will be used to help you understand some of the impacts of data that have poor quality.

1 THE DATA ASSET

This chapter describes the differences between data and information, and how these relate to most business activities. We then consider the nature of the data asset and the generic life cycles of data and explain what is meant by the term ‘data quality’. Finally, we introduce the objectives of data quality management.

WHAT ARE DATA?

Before going much further, there are some key terms and concepts that need to be defined and clarified to help ensure consistent understanding as you read this book.
The title of this book is Managing Data Quality, and, because they so often appear together when discussing the impact of computer technology on organisations, there are three important relevant terms that need to be clarified: data, information and knowledge.
When you have more than one data professional in a room, it is likely that there will be fierce debate about these terms. Even the ISO Online Browsing Platform1 (a place where all ISO definitions are gathered together) has numerous different definitions for these terms.
As the subject of this book is data, we can establish a solid foundation for our understanding by referring to the definition for data in ISO 8000-2:
Data: ‘reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing’.
In the case of definitions for information and knowledge, making a choice is more controversial, not least because potential definitions often use the other two terms and any single collection of definitions becomes recursive. However, we believe the following key observations provide sufficient understanding to read the remainder of this book (while we leave more detailed discussion to others):
  • Use of the term ‘information’ suggests richness of meaning, and typically takes an end-user view of the value of data to organisations in enabling decision making.
  • Use of the term ‘knowledge’ suggests an understanding acquired through experience or education, putting knowledge outside the scope of this book; for example, it doesn’t matter how many books you read about cycling, it is only when you have ridden a bike that you have knowledge of how to cycle!
Another complication is use of the terms ‘structured data’ and ‘unstructured data’. These terms have been a handy tool for marketing teams who are promoting particular software functionality (typically to extract meaning from unstructured data), but the two terms hide the reality that no data set in digital form is either fully structured or fully unstructured.
Structured data contain explicit, discrete elements (e.g. the tables, columns and keys within a relational database or the tags within an XML file) to represent meaning. These elements enable automation to generate insight and foresight from the meaning (e.g. being able to identify all the children in a hospital database by filtering the rows where age is less than 18).
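As a minimal sketch of the hospital example, the following Python uses the standard-library sqlite3 module to filter a table by age. The table name `patient`, its columns, the function `find_children` and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3


def find_children(rows, adult_age=18):
    """Return the names of patients younger than adult_age, sorted by name.

    rows is an iterable of (name, age) tuples. The table layout is a
    hypothetical illustration, not a real hospital schema.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # The explicit columns are the structure that makes automation possible.
        conn.execute(
            "CREATE TABLE patient (name TEXT NOT NULL, age INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO patient (name, age) VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM patient WHERE age < ? ORDER BY name", (adult_age,)
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


# Made-up sample data for the sketch.
patients = [("Asha", 7), ("Ben", 42), ("Chloe", 17), ("Dev", 18)]
children = find_children(patients)
print(children)  # ['Asha', 'Chloe']
```

The query works only because age is held as a discrete, typed element; the same question asked of free-text notes would need a person or a text-analysis method to answer.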
Unstructured data are fundamentally text and images, which provide meaning in a way that requires either human expertise or artificial intelligence methods to process the meaning (e.g. a doctor reviews the medical scan that is the content of an image file).
In these examples, though, the database will typically also include unstructured elements (e.g. a free-text field to capture observational notes) and the digital file of the MRI scan will also include structured data in the form of metadata (e.g. the creation date) to support management of all the images.
Furthermore, a spreadsheet is essentially semi-structured, sitting somewhere between a database and an image file, because the rows and columns provide some structure but without the full richness of a relational database or an XML file.
In summary, no data set is ever entirely structured or unstructured. Structure is definitely important to data quality, though, because it captures a more precise, controllable set of requirements for the data. Requirements for unstructured data are less easy to enforce by definitive, repeatable computer-based algorithms.
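To illustrate why structure makes requirements easier to enforce, here is a small sketch in Python. The field names (`age`, `admitted`, `notes`), the age range and the function `check_record` are assumptions made for this example, not rules taken from any standard.

```python
from datetime import date


def check_record(record, today):
    """Return a list of rule violations for one patient record (a dict).

    The rules are illustrative assumptions only.
    """
    problems = []

    # Structured fields: precise, repeatable rules are easy to state.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be a whole number between 0 and 120")

    admitted = record.get("admitted")
    if not isinstance(admitted, date) or admitted > today:
        problems.append("admitted must be a date that is not in the future")

    # Free text: a simple rule can only confirm that something was entered,
    # not that the note is meaningful or accurate.
    notes = record.get("notes") or ""
    if not str(notes).strip():
        problems.append("notes must not be empty")

    return problems


today = date(2024, 6, 1)
good = {"age": 9, "admitted": date(2024, 1, 5), "notes": "Settled overnight."}
bad = {"age": 190, "admitted": date(2030, 1, 1), "notes": "asdf"}

print(check_record(good, today))  # []
print(check_record(bad, today))   # two violations: age and admitted
```

Note that the meaningless note "asdf" passes the check: the rules catch the implausible age and the future admission date, but judging the free text would need human review or a more sophisticated method.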

Data as part of business activities

Any business activity should support the strategy of the organisation (and may have some part to play in developing this strategy). There should be governance in place to ensure that there is suitable senior or executive control and monitoring of this activity. Business activity in this context is not just applicable to commercial organisations, but refers to the activity by which any organisation delivers its core mission. Figure 1.1 illustrates this relationship.
Figure 1.1 The components of a business activity
The four core components of a typical business activity are:
  • The process, which defines the individual steps to be undertaken and, importantly, should ensure that the end-to-end process is effective in delivering the desired outcomes.
  • Data, which include inputs to and outputs from the process, and flows through it.
  • Software and hardware systems, which automate the process by storing and manipulating the data, although not every process will be automated by software.
  • People, who are the ‘actors’ in the process, undertaking key process steps and ensuring suitable organisational outcomes.
Despite data being a key enabler for any process, in many organisations there is a greater management focus on the technology elements, particularly when undertaking business change projects involving software. The software product is likely to be expensive, have a recognised name and be a core part of the project, and therefore attracts much attention.
In typical situations, however, the data that will be used to enable the technology to deliver the required outcomes are the data in one or more existing software systems. These data will need to be migrated to the new software tool, but the data migration process is typically a high-risk part of the overall project and, if not undertaken correctly, will actually degrade the quality of the data.
If the quality of existing data is poor then, no matter how good a new software tool is and how well it has been implemented, the outcomes of the system will be limited by the quality of the data. Poor-quality data can also make data migration far more challenging and expensive, and may even make it infeasible.
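One way to spot degradation introduced by a migration is to reconcile the records before and after the move. The sketch below is a simplified illustration in Python, assuming each record is a dict with a unique `id` field; the function `reconcile` and the sample asset records are hypothetical.

```python
def reconcile(source, target, key="id"):
    """Compare records before and after a migration.

    Returns (missing, changed): missing lists the keys of source records
    absent from the target; changed maps a key to the sorted names of the
    fields whose values differ. Assumes key values are unique.
    """
    target_by_key = {row[key]: row for row in target}
    missing = []
    changed = {}
    for row in source:
        migrated = target_by_key.get(row[key])
        if migrated is None:
            missing.append(row[key])
            continue
        diffs = sorted(field for field in row if migrated.get(field) != row[field])
        if diffs:
            changed[row[key]] = diffs
    return missing, changed


# Made-up records: one is lost and one loses a value during migration.
source = [
    {"id": 1, "name": "Pump A", "installed": "2009-03-14"},
    {"id": 2, "name": "Pump B", "installed": "2011-07-02"},
    {"id": 3, "name": "Valve C", "installed": "2015-11-30"},
]
target = [
    {"id": 1, "name": "Pump A", "installed": "2009-03-14"},
    {"id": 2, "name": "Pump B", "installed": None},
]

missing, changed = reconcile(source, target)
print(missing)  # [3]
print(changed)  # {2: ['installed']}
```

A check like this only confirms that the migration preserved what was in the source; it says nothing about whether the source data were fit for purpose in the first place.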
We have come across instances where organisations have been using a spreadsheet-based performance dashboard. Concerns about the quality and integrity of its outputs triggered these organisations to spend significant sums implementing a ‘best of breed’ analysis and dashboard tool to deliver performance dashboards. However, the data sources were not changed and thus, although the outputs looked far more impressive, the data quality was the same, leading to a false perception of the reliability of the performance indicators shown by the dashboard.
Also, don’t forget that at some point in the future the ‘new’ software tool will be replaced by another tool. Where will the data come from for this even newer software? Well, it will be the data that you currently have (which in turn have been migrated from several different previous systems). This means that, of the four components of the business activity, the one that lasts the longest and will have a massive effect on outputs is the data.

Data are an asset

Data are being created at a faster rate than ever before (however conservatively you forecast future data growth) and data are now more important than they have ever been. As the world becomes a more data-driven place, smart businesses can gain competitive advantages by exploiting data more effectively. This vast data explosion brings newer, different challenges to businesses; it is one thing to store lots of data, but the benefits will only come if the data are of suitable quality and reach the right people at the right time in order to deliver better organisational outcomes. A mindset of treating data as an asset will help your organisation to achieve this.
Many larger organisations, such as those in the utilities and transport sectors, are developing management systems that provide more effective and sustainable management of their assets and activities. Managing data requires a similar mindset.
An asset is a resource with value that can deliver benefit to an organisation. Data, therefore, warrant being treated in the same way as a physical asset. Like physical assets, data:
  • can have high value for your organisation;
  • can be assessed for quality;
  • can drive up business performance and safety by enabling better informed decisions;
  • have legal or regulatory requirements to be managed effectively;
  • have a life cycle – from conception, to capture, to operation and renewal;
  • can increase business costs if not managed effectively (and therefore reduce efficiency and profitability).
Unlike physical assets, data support strategic decision making; get this wrong and you will end up making incorrect, potentially expensive, decisions that could have long-term impact for the organisation. Also, unlike physical assets, when the data...
