Data Quality
Data quality refers to the accuracy, completeness, consistency, and reliability of data. In computer science, it is crucial for ensuring that data is suitable for its intended use, analysis, and decision-making. High data quality is essential for effective data-driven applications and systems.
Written by Perlego with AI-assistance
11 Key excerpts on "Data Quality"
- eBook - ePub
Quality in the Era of Industry 4.0
Integrating Tradition and Innovation in the Age of Data and AI
- Kai Yang(Author)
- 2023(Publication Date)
- Wiley(Publisher)
This often encompasses factors such as accuracy, completeness, consistency, reliability, timeliness, and relevancy. Accuracy refers to whether the data is correct and free from errors. Completeness is about whether all necessary data is present. Consistency pertains to whether data is uniform across different data sets, while reliability considers whether the data is trustworthy and dependable. Timeliness addresses whether the data is up-to-date and available when needed, and relevancy examines whether the data fits the needs of the current context or task. In Section 6.3, I will give some detailed information about how to measure the Data Quality quantitatively, given the actual data set. High-quality data is not just about having error-free data; it is about having the right data in the right form at the right time for the right purpose. Ensuring Data Quality, therefore, is a multi-faceted process requiring continuous monitoring and improvement to adapt to changing needs and circumstances.

6.2.2 Categories of Data
Data Quality study generally encompasses various types of data. Here is a list of key data categories that can be involved: Metadata: This includes data that provides information about other data. For instance, it might indicate when and by whom a particular set of data was collected, and for what purpose.

Example 6.1 Metadata
Metadata is often referred to as "data about data." It provides information about a certain set of data, including its means of creation, purpose, time and date of creation, creator or author, location on a computer network where the data was created, and standards used. For instance, consider a photograph taken with a digital camera. The photograph itself (the digital file) is the primary data, but the digital camera also records metadata about the photograph
- eBook - PDF
Data Governance
Creating Value from Information Assets
- Neera Bhansali(Author)
- 2013(Publication Date)
- Auerbach Publications(Publisher)
characteristics of an entity that bear on its ability to satisfy stated and implied needs." Therefore, we can define Data Quality as the satisfaction of the requirements stated in a particular specification, which reflects the implied needs of the user. An acceptable level of quality has been achieved if the data conforms to a defined specification that correctly reflects the intended use. Data Quality is reflected by multiple factors, such as the time and the way of data collection, the formats, and the types of data collected. High Data Quality should contain many dimensions like accuracy, completeness, integrity, consistency, timeliness, and traceability. As indicated in Figure 4.1, Data Quality is demonstrated by multiple data dimensions. Each data dimension will be addressed below.

Data Accuracy: Accuracy of data is the degree to which data correctly reflects the real-world object or verifiable sources. All data values should be within the value domains specified by the business. In many cases, accuracy is measured by how the values agree with an identified source of correct information. The measurable characteristics of accuracy can include value precision (each value conforms to the defined level of precision), value acceptance (each value belongs to the allowed set of values for the observed attributes), and value accuracy (each data value is correct when assessed against a system of records).

Data Completeness: Completeness of data is the extent to which the expected attributes of data are provided.
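As a rough illustration of how the measurable characteristics above might be computed, the sketch below scores value acceptance, value accuracy, and completeness for a tiny tabular data set. The column names, allowed value set, and "system of records" are assumptions invented for the example, not taken from the excerpt.

```python
# Sketch: per-column value acceptance, value accuracy, and completeness checks.
# Column names, allowed domains, and the reference "system of records" are illustrative assumptions.
records = [
    {"id": 1, "status": "active",  "country": "UK"},
    {"id": 2, "status": "dormant", "country": "ZZ"},   # value disagrees with the reference source
    {"id": 3, "status": None,      "country": "US"},   # missing expected attribute
]
allowed_status = {"active", "dormant", "closed"}        # allowed set for value acceptance
system_of_records = {1: "UK", 2: "US", 3: "US"}         # reference source for value accuracy

def value_acceptance(rows, field, allowed):
    """Share of non-missing values that fall inside the allowed set."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(v in allowed for v in values) / len(values)

def value_accuracy(rows, field, reference):
    """Share of values that agree with the identified system of records."""
    return sum(r[field] == reference[r["id"]] for r in rows) / len(rows)

def completeness(rows, field):
    """Share of rows in which the expected attribute is actually provided."""
    return sum(r[field] is not None for r in rows) / len(rows)

print(value_acceptance(records, "status", allowed_status))   # 1.0
print(value_accuracy(records, "country", system_of_records)) # ~0.67
print(completeness(records, "status"))                       # ~0.67
```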
- eBook - PDF
Data Quality
Dimensions, Measurement, Strategy, Management, and Governance
- Rupa Mahanti(Author)
- 2019(Publication Date)
- ASQ Quality Press(Publisher)
The quality of master data impacts the quality of transactional data. The quality of metadata impacts the quality of master data, transactional data, reference data, and historical data.

Data Quality: AN OVERVIEW
Data Quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. Data Quality is an insight into or an evaluation of data's fitness to serve their purpose in a given context. Data Quality is accomplished when a business uses data that are complete, relevant, and timely. The general definition of Data Quality is "fitness for use," or more specifically, to what extent some data successfully serve the purposes of the user (Tayi and Ballou 1998; Cappiello et al. 2003; Lederman et al. 2003; Watts et al. 2009). From a business perspective, Data Quality is all about whether the data meet the needs of the information consumer (Scarisbrick-Hauser and Rouse 2007). Redman (2001) comes to the following definition based on Joseph Juran (Juran and Godfrey 1999): "Data are of high quality if they are fit for their intended uses in operations, decision-making, and planning. Data are fit for use if they are free of defects and possess desired features." Table 1.1 illustrates the desired characteristics for data that make them fit for use.

From an assessment perspective, Data Quality has two aspects: intrinsic Data Quality and contextual Data Quality. Intrinsic Data Quality is based on the data elements themselves, independent of the context in which they are used. Examples include the accuracy, representation, and accessibility of the data (Fisher et al. 2003; Strong et al. 1997; Jarke et al. 2000). For example, data elements such as age, salary, and product dimensions should have a numerical value and cannot be less than zero; the customer name should be spelled correctly.
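The intrinsic checks mentioned here (numeric, non-negative values for fields such as age, salary, and product dimensions) can be written as context-independent rules. This is a minimal sketch under assumed field names; it is not drawn from the book itself.

```python
# Sketch of intrinsic Data Quality rules: checks that hold regardless of how the data is used.
# Field names and the sample record are illustrative assumptions.
def is_non_negative_number(value) -> bool:
    """Intrinsic rule: the value must be numeric and not less than zero."""
    return isinstance(value, (int, float)) and value >= 0

intrinsic_rules = {
    "age": is_non_negative_number,
    "salary": is_non_negative_number,
    "product_length_cm": is_non_negative_number,
}

record = {"age": 34, "salary": -100, "product_length_cm": 12.5}

violations = [field for field, rule in intrinsic_rules.items() if not rule(record[field])]
print(violations)  # ['salary'] -- a negative salary fails the intrinsic check
```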
- eBook - PDF
Quality in the Era of Industry 4.0
Integrating Tradition and Innovation in the Age of Data and AI
- Kai Yang(Author)
- 2023(Publication Date)
- Wiley(Publisher)
Each of these categories has unique characteristics and applications, and thus, their quality needs to be assessed in accordance with their intended purpose and use. These data categories will be described in detail later. Metadata, for instance, provides information about other data, such as when and by whom the data was collected and for what purpose. Transactional data, on the other hand, involves the data generated during transactions like customer purchase history or web browsing history. Master data refers to the core data about an organization's business entities, while reference data is used to categorize other data. Each of these categories, and others not listed here, are integral to the overall data ecosystem of an organization and have specific Data Quality requirements that need to be addressed.

6.2.1.2 Definition of Data Quality
Data Quality refers to the degree to which a set of data fulfills the requirements of its intended use. This often encompasses factors such as accuracy, completeness, consistency, reliability, timeliness, and relevancy. Accuracy refers to whether the data is correct and free from errors. Completeness is about whether all necessary data is present. Consistency pertains to whether data is uniform across different data sets, while reliability considers whether the data is trustworthy and dependable. Timeliness addresses whether the data is up-to-date and available when needed, and relevancy examines whether the data fits the needs of the current context or task. In Section 6.3, I will give some detailed information about how to measure the Data Quality quantitatively, given the actual data set. High-quality data is not just about having error-free data; it is about having the right data in the right form at the right time for the right purpose. Ensuring Data Quality, therefore, is a multi-faceted process requiring continuous monitoring and improvement to adapt to changing needs and circumstances.
- eBook - PDF
Data Quality
The Accuracy Dimension
- Jack E. Olson(Author)
- 2003(Publication Date)
- Morgan Kaufmann(Publisher)
With proper attention, great returns can be realized through improvements in the quality of data. The primary value to the corporation for getting their information systems into a state of high Data Quality and maintaining them there is that it gives them the ability to quickly and efficiently respond to new business model changes. This alone will justify Data Quality assurance initiatives many times over. Data Quality assurance initiatives are becoming more popular as organizations are realizing the impact that improving quality can have on the bottom line. The body of qualified experts, educational information, methodologies, and software tools supporting these initiatives is increasing daily. Corporations are searching for the right mix of tools, organization, and methodologies that will give them the best advantage in such programs. Data accuracy is the foundation of Data Quality. You must get the values right first. The remainder of this book focuses on data accuracy: what it means, what is possible, methods for improving the accuracy of data, and the return you can expect for instituting data accuracy assurance programs.

CHAPTER 2 Definition of Accurate Data
To begin the discussion of data accuracy, it is important to first establish where accuracy fits into the larger picture of Data Quality.

2.1 Data Quality Definitions
Data Quality is defined as follows: data has quality if it satisfies the requirements of its intended use. It lacks quality to the extent that it does not satisfy the requirement. In other words, Data Quality depends as much on the intended use as it does on the data itself. To satisfy the intended use, the data must be accurate, timely, relevant, complete, understood, and trusted. Some examples will help in understanding the notion of Data Quality in the context of intended use. The sections that follow explore examples of the previously mentioned aspects of data integrity.
- Chee-yong Chan, Sanjay Chawla, Shazia W Zhou(Authors)
- 2009(Publication Date)
- World Scientific(Publisher)
As an automatic quality rating solution, our approach is distinguished, especially for large-scale datasets. Theory and experiments show that our approach performs well for quality rating.

1. Introduction
Data Quality is a problem as follows. Given a set of data sources which describe the same entities, we are to give each record or each data source a numerical score which indicates how good that record or data source is with regard to its corresponding entity, under certain measurements such as accuracy and completeness. This problem is a fundamental problem in a number of domains, including data integration, data warehouses, and distributed databases. Solving the Data Quality problem in a Cooperative Information System (CIS) is a pressing concern [3, 13]. A CIS is an information system that interconnects a number of data sources. These data sources are independently developed and represent the same entities [9]. For example, in an e-government scenario, public administrations, citizen bureaus, and enterprises all have data related to citizens. In such systems, it is important to figure out which record or data source is good or not, since the data sources are independently built and many unintentional errors such as typos and misspellings often occur during their development. The goodness of a data source can be measured in terms of many dimensions, including accuracy, completeness, consistency, minimality, and timeliness [1, 10]. In this paper, we focus on accuracy and completeness since they are the most important measurements. A possible way to solve the above problem is to invite a human expert to identify the perfect representation for each entity.
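The rating task described above can be pictured with a toy scoring function. This is not the authors' algorithm, just a minimal sketch in which completeness is the share of filled attributes and accuracy is agreement with an assumed reference representation of the entity.

```python
# Toy sketch (not the paper's method): score each record describing the same entity
# for completeness (share of filled attributes) and accuracy (agreement with a reference).
entity_reference = {"name": "Jane Doe", "city": "Edinburgh", "phone": "0131 555 0100"}  # assumed ground truth

records = [
    {"name": "Jane Doe", "city": "Edinburgh", "phone": None},             # incomplete but accurate
    {"name": "Jane Do",  "city": "Edinburgh", "phone": "0131 555 0100"},  # complete but with a typo
]

def completeness_score(record):
    """Fraction of attributes that are actually filled in."""
    return sum(v is not None for v in record.values()) / len(record)

def accuracy_score(record, reference):
    """Fraction of filled attributes that agree with the reference representation."""
    filled = [(k, v) for k, v in record.items() if v is not None]
    return sum(v == reference[k] for k, v in filled) / len(filled)

for r in records:
    print(round(completeness_score(r), 2), round(accuracy_score(r, entity_reference), 2))
# prints: 0.67 1.0  then  1.0 0.67
```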
- eBook - PDF
- Suad Kunosic, Enver Zerem(Authors)
- 2019(Publication Date)
- IntechOpen(Publisher)
Such data systems are created according to specific goals and use purposes of individual organizations, which reflects their specific nature and the surrounding context in which they operate. However, over time these data systems, institutions as well as the research ecosystem at large have evolved, thereby potentially threatening the quality of the collected data and the resulting data analyses, particularly if no formal Data Quality management policy is being implemented. This chapter introduces the readers to the concept of Data Quality and provides methods to assess and improve Data Quality, in order to obtain data that can be used as a reliable source for quantitative and qualitative measurements of research.

2. Definition of Data Quality
In general, data can be considered of high quality if the data is fit to serve a purpose in a given context, for example, in operations, decision making and/or planning [1]. Although this definition of Data Quality seems to be straightforward, many other definitions exist that differ in terms of the qualitative or quantitative approach towards defining the concept of Data Quality.

2.1 Qualitative approach
In the qualitative approach, specific attention is drawn to defining Data Quality in terms of the different aspects, also termed dimensions. In 1996, Wang and Strong developed a Data Quality framework based on a two-stage survey on Data Quality aspects important to data consumers, and captured these dimensions in a hierarchical manner [2]. This model clusters 20 different Data Quality dimensions into four major categories, that is, intrinsic, contextual, representational, and access Data Quality. Although the basis of this model still stands, some minor changes have been made over the years, resulting in the model depicted in Table 1 [3].
- eBook - ePub
Business Intelligence
The Savvy Manager's Guide
- David Loshin(Author)
- 2012(Publication Date)
- Morgan Kaufmann(Publisher)
Numerous attempts have been made to assign some objective cost to data errors, usually through the presumption of increased costs. Many of these "ROI calculators" are founded on commercial costs of poor Data Quality related to incorrect names and addresses. Although maintaining high-quality identifying information is critical, there are many types of data errors with different types of information that can impact all sorts of business processes, leading to increased costs as well as decreased revenues, increased risk, or decreased productivity. In other words, Data Quality is more than just names and addresses.

In Chapter 2, we discussed the value drivers for an organization, and how those guided the development of relevant performance metrics. We can similarly consider how data errors can impact both our perception of business performance (based on incorrect results) as well as the performance itself (when data flaws impair business process success).

That suggests that the question of Data Quality is not one of standardized names and addresses, but rather of fitness for the potentially many purposes to which that data will be put to use. In practicality, almost everyone has a different understanding of what Data Quality is; each definition is geared toward the individual's view of what is fit and what is not. This leads to the conclusion that there is no hard and fast definition of Data Quality, nor can there be a single source of "truth." Rather, Data Quality is defined in terms of how each data consumer wishes to use the data, and to this end we must discuss some dimensions across which Data Quality can be measured.

Dimensions of Data Quality
A dimension of Data Quality effectively describes a method for quantifying the level of compliance with defined expectations. There are many possible dimensions of Data Quality that are relevant within particular business activities, and this is related to the ways that the data will be used downstream. Some of the more typical dimensions include: Completeness
- eBook - PDF
Information Technology and Data in Healthcare
Using and Understanding Data
- David Hartzband(Author)
- 2019(Publication Date)
- Productivity Press(Publisher)
Chapter 4 Data Quality
DOI: 10.4324/9780429061219-4
What Is Data Quality?
The ISO 9000:2015* definition of Data Quality would be "Data Quality can be defined as the degree to which a set of characteristics of data fulfills requirements."† Remember that data means facts that are verifiable, so how does this definition align with the idea of quality? Let's explore the idea of quality first.

*https://en.wikipedia.org/wiki/ISO_9000
†https://en.wikipedia.org/wiki/Data_quality

There are many attempts to determine the one list of data characteristics that should be used in this definition of quality. The one I have found to work best for healthcare data, and that I am currently using, is from Cai and Zhu.‡ They propose five dimensions of Data Quality and characteristics for each dimension. The dimensions they describe are availability, usability, reliability, relevance, and presentation quality. For instance, if we look at the dimension of reliability, we see that the proposed characteristics are: accuracy, integrity, consistency, completeness, and auditability. These meet our sniff test (or at least mine) of how reliability might be characterized. If we then had definitions for how to measure each of these characteristics, we could produce an integrated measure of reliability for a specific data set. (See Figure 4.1.)

‡L. Cai and Y. Zhu, 2015, The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Science Journal, 14(2): 1–10. doi: http://dx.doi.org/10.5334/dsj-2015-002

Figure 4.1 Data Quality framework.

I deliberately chose reliability as this first example because each of the characteristics proposed can be measured (with the possible exception of integrity), that is, each can produce facts as a consequence of examination. Not all of Cai and Zhu's dimensions are so clean in this manner, but I'll take that up shortly. The question now becomes, "how are we to produce and evaluate these facts?" Let's start with reliability …
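One way to picture the "integrated measure of reliability" the author alludes to is to score each measurable characteristic on a 0 to 1 scale and combine the scores with weights. The characteristic scores and equal weights below are invented for illustration; the excerpt does not prescribe an aggregation formula.

```python
# Sketch: combining per-characteristic scores (0-1) into a single reliability index.
# The scores and weights are illustrative assumptions; the book does not fix a formula.
characteristic_scores = {
    "accuracy": 0.96,
    "integrity": 0.90,
    "consistency": 0.88,
    "completeness": 0.93,
    "auditability": 0.80,
}
weights = {  # equal weights here; a real assessment would need to justify these choices
    name: 1 / len(characteristic_scores) for name in characteristic_scores
}

reliability = sum(characteristic_scores[name] * weights[name] for name in characteristic_scores)
print(f"Integrated reliability score: {reliability:.3f}")  # 0.894
```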
- eBook - PDF
- Wenfei Fan, Floris Geerts(Authors)
- 2022(Publication Date)
- Springer(Publisher)
Indeed, the market for Data Quality tools is growing at 16% annually, way above the 7% average forecast for other IT segments [Gartner, 2011]. As an example, Data Quality tools deliver "an overall business value of more than 600 million GBP" each year at British Telecom [Otto and Weber, 2009]. Data Quality management is also a critical part of big data management, master data management (MDM) [Loshin, 2009], customer relationship management (CRM), enterprise resource planning (ERP), and supply chain management (SCM), among other things.

1.2 CENTRAL ISSUES OF Data Quality
We highlight five central issues in connection with Data Quality, namely, data consistency, data deduplication, data accuracy, information completeness, and data currency.

1.2.1 DATA CONSISTENCY
Data consistency refers to the validity and integrity of data representing real-world entities. It aims to detect inconsistencies or conflicts in the data. In a relational database, inconsistencies may exist within a single tuple, between different tuples in the same relation (table), and between tuples across different relations. As an example, consider tuples t1, t2, and t3 in Figure 1.1. There are discrepancies and conflicts within each of these tuples, as well as inconsistencies between different tuples. (1) It is known that in the UK (when CC = 44), if the area code is 131, then the city should be Edinburgh (EDI). In tuple t1, however, CC = 44 and AC = 131, but city ≠ EDI. That is, there exist inconsistencies between the values of the CC, AC, and city attributes of t1; similarly for tuple t2. These tell us that tuples t1 and t2 are erroneous. (2) Similarly, in the U.S. (CC = 01), if the area code is 908, the city should be Murray Hill (MH). Nevertheless, CC = 01 and AC = 908 in tuple t3, whereas its city is not MH. This indicates that tuple t3 is not quite correct. (3) It is also known that in the UK, zip code uniquely determines street.
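The conditional rules in this example (country code plus area code determining the city) translate directly into executable checks. The sample tuples below are invented to mirror the described errors; they are not the actual rows of the book's Figure 1.1.

```python
# Sketch: flag tuples that violate conditional consistency rules of the form
# "if CC and AC take these values, then city must take this value".
# The sample tuples are invented to mirror the errors described in the excerpt.
rules = [
    ({"CC": "44", "AC": "131"}, {"city": "EDI"}),  # UK, area code 131 -> Edinburgh
    ({"CC": "01", "AC": "908"}, {"city": "MH"}),   # US, area code 908 -> Murray Hill
]

tuples = [
    {"name": "t1", "CC": "44", "AC": "131", "city": "LDN"},  # violates the first rule
    {"name": "t3", "CC": "01", "AC": "908", "city": "NYC"},  # violates the second rule
    {"name": "t4", "CC": "44", "AC": "131", "city": "EDI"},  # consistent
]

def violations(row, rules):
    """Return the rules whose condition matches the row but whose conclusion fails."""
    found = []
    for condition, conclusion in rules:
        if all(row.get(k) == v for k, v in condition.items()) and \
           any(row.get(k) != v for k, v in conclusion.items()):
            found.append((condition, conclusion))
    return found

for row in tuples:
    print(row["name"], "inconsistent" if violations(row, rules) else "consistent")
# t1 inconsistent / t3 inconsistent / t4 consistent
```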
- W.H. Inmon, Bonnie O'Neil, Lowell Fryman(Authors)
- 2010(Publication Date)
- Morgan Kaufmann(Publisher)
If you consider data to be raw facts, it makes more sense to discuss information quality than Data Quality. Information takes into account the context of data, and quality makes sense only in regards to a context. Remember our discussion in prior chapters, especially Chapter 1, about the number 7. Without any context, it is just a number, and a number by itself has no quality associated with it. Therefore, although "Data Quality" is a common term used in the discipline of data management, this chapter will hereafter refer to "information quality." Here is IAIDQ's definition of information quality (three separate definitions are given, one having two components):

Information quality: (1) Consistently meeting all knowledge worker and end-customer expectations in all quality characteristics of the information products and services required to accomplish the enterprise mission (internal knowledge worker) or personal objectives (end customer). (2) The degree to which information consistently meets the requirements and expectations of all knowledge workers who require it to perform their processes. (Larry English, noted data and information quality expert and author)

Information quality: The fitness for use of information; information that meets the requirements of its authors, users, and administrators. (Martin Eppler, co-founder of IAIDQ) (IAIDQ, December 25, 2006)

The notion of "meeting business requirements" is common to all three of these definitions. The business ultimately defines the information quality requirements.

10.3 Information Quality as Business Metadata
If the purpose of information quality is to meet business expectations, then it is important that the business should be kept apprised of information quality and its progress toward meeting the requirements set by the business. Therefore, there is an ongoing obligation of information delivery.
Index pages curate the most relevant extracts from our library of academic textbooks. Each has been created using an in-house natural language model (NLM), adding context and meaning to key research topics.