
Data Quality

Data quality refers to the accuracy, completeness, consistency, and reliability of data. In computer science, it is crucial for ensuring that data is suitable for its intended use, analysis, and decision-making. High data quality is essential for effective data-driven applications and systems.

Written by Perlego with AI assistance

7 Key excerpts on "Data Quality"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), with each extract adding context and meaning to key research topics.
  • Business Analysis for Business Intelligence

    ...Data quality for BI purposes is defined and gauged with reference to fitness for purpose as defined by the analytical use of the data, and complying with three levels of data quality, as defined by: [Level 1] database administrators, [Level 2] data warehouse architects, [Level 3] business intelligence analysts.

    On level 1, data quality is narrowed down to data integrity, or the degree to which the attributes of an instance describe the instance accurately and whether the attributes are valid, that is, comply with defined ranges or definitions managed by the business users. This definition remains very close to the transaction view.

    On level 2, data quality is expressed as the percentage completeness and correctness of the analytical perspectives. In other words, to what degree is each dimension, each fact table, complete enough to produce significant information for analytical purposes? Issues such as sparsity and spreads in the data values are harder to tackle. Timeliness and consistency need to be controlled and managed on the data warehouse level.

    On level 3, data quality is the measure in which the available data are capable of adequately answering the business questions. Some use the criterion of accessibility with regard to the usability and clarity of the data. Although this seems a somewhat vague definition, it is most relevant to anyone with some analytical mileage on his odometer. I remember a vast data-mining project in a mail-order company producing the following astonishing result: 99.9% of all dresses sold were bought by women!

    Although there is no 100% data quality possible on this planet, and although we defined the fit-for-purpose quality approach as the leading criterion, this does not dismiss us from striving toward the optimum solution, namely the breakeven point between data quality prevention costs and the cost of poor data quality.

    ROI Approach to Data Quality

    Data quality does not come cheap...
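
    To make levels 1 and 2 concrete, here is a minimal Python sketch assuming a toy customer table; the field names, the valid_ranges rules and the completeness rule are illustrative assumptions, not something defined in the excerpt.

```python
# Minimal sketch (illustrative assumptions, not from the excerpt): a level-1
# integrity check against business-defined ranges, and a level-2 completeness
# percentage for a small set of records.

records = [
    {"customer_id": 1, "age": 34, "segment": "retail"},
    {"customer_id": 2, "age": -5, "segment": "retail"},   # out-of-range value
    {"customer_id": 3, "age": 51, "segment": None},       # missing value
]

# Level 1: validity rules ("defined ranges or definitions managed by the business users").
valid_ranges = {
    "age": lambda v: v is not None and 0 <= v <= 120,
    "segment": lambda v: v in {"retail", "corporate"},
}

def integrity_violations(record):
    """Return the attributes of a record that break their validity rule."""
    return [attr for attr, rule in valid_ranges.items() if not rule(record.get(attr))]

# Level 2: completeness, here the share of records with no missing attribute values.
complete = sum(all(v is not None for v in r.values()) for r in records)
completeness_pct = 100.0 * complete / len(records)

for r in records:
    print(r["customer_id"], integrity_violations(r))
print(f"completeness: {completeness_pct:.1f}%")
```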

  • Marketing Value Metrics

    A New Metrics Model to Measure Marketing Effectiveness

    • Malcolm McDonald, Peter Mouncey, Stan Maklan (Authors)
    • 2014 (Publication Date)
    • Kogan Page (Publisher)

    ...Where data are critical to measuring the performance of key business functions, the board could agree to this responsibility being given to the internal audit function. Defining what is meant by data quality is a key issue. ‘Fit for purpose’, rather than absolute quality, should be the aim. For example, some gaps and inaccuracies may be acceptable within a data-set used for modelling, but the standard would need to be much higher where transactions data and records of customer contact history, through all channels, are used in real time to support a service call centre or a self-service website.

    ‘Fit for purpose’ may also be defined by the need to meet regulatory requirements (eg Basel 2 requirements within financial services organizations) and legal requirements (eg European data protection legislation – keeping data accurate and up to date; meeting subject access requirements; being able to differentiate between SMEs and domestic customers or differentiate personal data from non-personal data held about business contacts, etc; safety legislation, such as being able to contact car owners to recall vehicles to rectify safety defects, etc). According to a survey conducted by Privacy Laws & Business International in 2004, many organizations are failing to take data privacy issues seriously, and QCi (QCi Assessments Ltd, 2002) found that only 37 per cent of the companies they had assessed had adequate plans in place to meet the requirements of the 1998 Act.

    Finally, ‘fit for purpose’ considerations also apply to the issues affecting the capture of source data and the user situation. For example, the competence of employees involved in the capture of data and those who have access to it needs to be taken into account. Data quality also covers the need to ensure that critical data items are identified and appropriate strategies are developed to ensure that any deficiencies are addressed...

  • Translating Systems Thinking into Practice

    A Guide to Developing Incident Reporting Systems

    • Natassia Goode, Paul M. Salmon, Michael Lenne, Caroline Finch (Authors)
    • 2018 (Publication Date)
    • CRC Press (Publisher)

    ...The implementation trial undertaken to evaluate the prototype UPLOADS is used to practically illustrate the process.

    10.2 What Is Data Quality?

    Data quality refers to the completeness and validity of recorded data (German et al., 2001). There are five characteristics that are relevant to assessing data quality in an incident reporting system: data completeness, positive predictive value, sensitivity, specificity and representativeness (see Table 10.1). These characteristics provide important information about whether the data, and resulting analyses, are accurate and valid reflections of the frequency and causes of incidents within the specific context.

    Table 10.1 Characteristics of Data Quality in an Incident Reporting System
      • Data completeness: A consistent amount of data is provided about every reported incident.
      • Positive predictive value: Incident reports provide an accurate description of the incident.
      • Sensitivity: All relevant incidents that occur are reported.
      • Specificity: No irrelevant incidents are reported.
      • Representativeness: The incident rates accurately represent how frequently incidents (as defined in the scope) are occurring over time relative to the frequency of exposure.

    It is important to note that data quality is closely tied to: (a) the usability of a system; and (b) the strategies used to ensure that incidents are consistently and accurately reported and analyzed. If an incident reporting system is hard to use, or if it requires a significant amount of time to complete an incident report, then end users will be unlikely to submit incident reports. Similarly, sufficient time, training, and resources must be provided during implementation to enable end users to consistently and accurately report and analyze incidents...
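
    One common way to put numbers on three of the Table 10.1 characteristics is to borrow the standard formulas used when evaluating surveillance and reporting systems. The Python sketch below, with invented counts, is an added interpretation rather than a calculation from the book.

```python
# Hedged sketch: quantifying positive predictive value, sensitivity and
# specificity with the usual formulas (an interpretation of Table 10.1, not the
# book's own calculation). The counts below are invented for illustration.
# tp = relevant incidents that were reported
# fp = irrelevant incidents that were reported anyway
# fn = relevant incidents that were never reported
# tn = irrelevant incidents correctly left unreported

def positive_predictive_value(tp, fp):
    """Share of submitted reports that describe a genuinely relevant incident."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Share of relevant incidents that actually got reported."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of irrelevant incidents that were (correctly) not reported."""
    return tn / (tn + fp)

tp, fp, fn, tn = 80, 5, 20, 95
print(f"PPV={positive_predictive_value(tp, fp):.2f}",
      f"sensitivity={sensitivity(tp, fn):.2f}",
      f"specificity={specificity(tn, fp):.2f}")
```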

  • Data and Analytics Strategy for Business

    Unlock Data Assets and Increase Innovation with a Results-Driven Data Strategy

    • Simon Asplen-Taylor (Author)
    • 2022 (Publication Date)
    • Kogan Page (Publisher)

    ...08 Data Quality

    From day one, any data project must make data quality a priority. When data quality improves, so does its effectiveness and the trust put in it. Some improvements are swift, some extremely resource- and time-intensive.

    KEY CONCEPTS
      • Data quality
      • Data quality leader

    Introduction

    Data quality is the yang to governance’s yin. One of the most effective ways to raise data quality is, of course, to make sure it is governed correctly. But this is not sufficient. There are other aspects to data quality that go far beyond governance.

    Data quality: a measure of how well data represents the real-world phenomena it describes for the purpose of the business. The four dimensions of quality are that it should be accurate, valid, accessible and timely.

    It’s not hard to find anecdotes about the problems caused by bad data quality. A decade ago, in his book Information Quality Applied, Larry English, an expert on the topic who is the president of a consultancy called Data Quality International, was already able to list 122 business failures due to poor-quality data that have become public (English, 2009). The combined cost to the businesses involved was $1.2 trillion.

    For example, British Gas. It spent £300 million on an overhaul of its billing systems in 2006 that led to thousands of problems with overbilling; the systems might have been exemplary, but clearly the data in them was not. By 2007 the company was getting three times as many complaints as all the other gas and electricity suppliers in the UK put together. In 2007, managing director Phil Bentley described this as ‘teething problems’, which he said were being sorted out (Daily Mirror, 2007). This was far more than teething. The company wrote off £200 million in 2008 after customer complaints due to overcharging and lost a million of its 17 million customers...
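
    As a rough illustration of the four dimensions named in the definition (accurate, valid, accessible, timely), the following Python sketch scores a single record; the reference data, field names and thresholds are hypothetical, not taken from the book.

```python
# Illustrative scorecard for the four dimensions named above: accurate, valid,
# accessible and timely. All field names, rules and thresholds are hypothetical.
from datetime import datetime, timedelta

reference_postcodes = {"SW1A 1AA", "M1 1AE"}   # assumed trusted reference source

def score_record(record, now):
    return {
        # Accurate: agrees with an external source we trust.
        "accurate": record["postcode"] in reference_postcodes,
        # Valid: has the expected shape (a non-empty string here).
        "valid": isinstance(record["postcode"], str) and bool(record["postcode"].strip()),
        # Accessible: held in a system its consumers can actually read from.
        "accessible": record.get("source_system") in {"crm", "billing"},
        # Timely: refreshed recently enough for the intended use.
        "timely": now - record["last_updated"] < timedelta(days=30),
    }

record = {"postcode": "SW1A 1AA", "source_system": "crm",
          "last_updated": datetime(2022, 1, 1)}
print(score_record(record, now=datetime(2022, 1, 15)))
```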

  • Ensuring the Integrity of Electronic Health Records

    The Best Practices for E-Records Compliance

    ...On some attributes, simple examples are incorporated. These examples came from https://medium.com/@merwanehamadi/what-is-data-quality-55b7737f1b6e

    Data Accuracy

    Data accuracy refers to whether the data values stored for an object are the correct values. It describes the real-world context it refers to. To be correct, data values must be the right value and must be represented in a consistent and unambiguous form. One of the dimensions in data accuracy is data reliability.

    Example: the email address of Paul
      Database: [email protected]
      Reality: [email protected]

    Data accuracy is designed to decrease the risks of not preserving the content and meaning of the data. It includes the built-in checks for the correct and secure entry and processing of data (9). EU Annex 11 paragraphs associated with data accuracy are 4.8, 6, 7.2, 10, and 11 (10). Data accuracy is an element of a workflow that verifies the correctness of the collected data. During the Project Stage in the system life cycle, these accuracy-related workflows are tested, and they are periodically verified during the Operational Stage in the SLC as part of the inputs and outputs (I/Os) verifications (11).

    Data Auditability

    The changes to a set of data need to be traceable. The history of updates is important to track what data edits were made, when, and by whom. The EU Annex 11 paragraph associated with data auditability is 9 (12).

    Data Conformity

    Conformity means the data are following a set of standard data definitions related to data type, size, and format.

    Example:
      Name    Unsubscribed
      Paul    True
      John    True
      Sam     False

    This workflow is designed as part of the Project Stage in the SLC and executed during the transformation that occurs subsequently, when the signals from the sensor(s) are captured and the data are in transient mode. During the Project Stage in the SLC, these workflows are tested...
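
    A short Python sketch of the conformity and auditability ideas above: a check against standard data definitions (type, size, format) and a minimal record of what was edited, when, and by whom. The definitions, field names and in-memory audit log are illustrative assumptions only, not a compliance-ready implementation.

```python
# Hedged sketch of data conformity and data auditability. The definitions,
# regex, and in-memory audit log are illustrative assumptions.
import re
from datetime import datetime, timezone

definitions = {
    "email": {"type": str, "max_len": 254, "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "unsubscribed": {"type": bool},
}

def conforms(field, value):
    """Check a value against its standard definition: data type, size, format."""
    d = definitions[field]
    if not isinstance(value, d["type"]):
        return False
    if "max_len" in d and len(value) > d["max_len"]:
        return False
    if "pattern" in d and not re.match(d["pattern"], value):
        return False
    return True

audit_trail = []

def record_edit(record_id, field, old, new, user):
    """Keep the history of updates: what changed, when, and by whom."""
    audit_trail.append({"record_id": record_id, "field": field, "old": old,
                        "new": new, "user": user,
                        "at": datetime.now(timezone.utc).isoformat()})

print(conforms("email", "paul@example.com"))    # True
print(conforms("email", "paul.example.com"))    # False: wrong format
record_edit(42, "email", "paul@example.com", "paul@example.org", "jsmith")
print(audit_trail[-1])
```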

  • Social Research Methods

    Qualitative, Quantitative and Mixed Methods Approaches

    ...In this sense, it could be argued that the data are generated through a form of social construction. This means that the data can be perceived as a product, and just like other types of products, social science data can be of varying quality. We are concerned with ensuring that our data is of the best quality possible. Good quality data are necessary to ensure that the results of the analysis are valid and useful.

    This chapter will first discuss what is meant by quality of social science data, and what criteria are used to assess the quality. Then the two most important quality criteria, reliability and validity, are examined. Different types of reliability and validity are described, and it is discussed how reliability and validity can be assessed. Finally, it is shown how data quality can be improved.

    The quality of social science data cannot be assessed in an entirely general manner. The quality must be seen in connection with the intended use of the data. The purpose of a set of data is that it should be used to shed light on specific research questions. The quality of the data is higher if it is more suitable for shedding light on these problems. The quality of the same data set can therefore vary depending on the research questions to be examined. Data that could be considered to be of high quality for one type of research question could be of low quality for other types of research question. For example, we can collect data that are very suitable for describing the living conditions of different groups in the population, but these data are not necessarily good enough to explain the differences in living conditions between the different groups. The extent to which data are suitable for answering the research questions to be examined in a particular study will depend on a variety of conditions...

  • Information-Driven Business

    How to Manage Data and Information for Maximum Advantage

    • Robert Hillard (Author)
    • 2010 (Publication Date)
    • Wiley (Publisher)

    ...Chapter 13 Information and Data Quality

    The quality of information is paramount. Ask business executives whether they have enough information to do their job and they will say no. Ask the same executives whether the information they do get is entirely trustworthy and they will also say no. Even more dramatic, consider the success and failure of major information technology initiatives. Systems that have been implemented with poor user interfaces but high-quality content are generally regarded as successful, whereas systems that have failed to correctly migrate data, even with the best user interfaces, are regarded as abject failures. In other words, in both business management and technology implementation, there is a direct causal relationship between the quality of information and successful outcomes.

    While everyone agrees that data quality is important, there is very little that is truly agreed about either measuring or improving data quality. The problem seems to relate to a common misunderstanding about how to measure or manage the quality of information. Some of the techniques are sophisticated while others simply require a logical and consistent approach.

    SPREADSHEETS

    As you know, yesterday Fannie Mae filed a Form 8-K/A with the SEC amending our third quarter press release to correct computational errors in that release. There were honest mistakes made in a spreadsheet used in the implementation of a new accounting standard.
    —Jayne Shontell, Fannie Mae Senior Vice President for Investor Relations, 2003

    Shontell’s admission of an error of more than $1 billion due to a spreadsheet error is becoming increasingly typical, with research from academics such as Raymond Panko showing that between 20 percent and 40 percent of all spreadsheets contain errors, and up to 90 percent of spreadsheets with more than 150 rows containing errors...
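
    To see why large spreadsheets are so rarely error-free, here is a small compounding calculation in Python; the 1 per cent per-formula error rate is an assumed figure for illustration, not one of Panko's published numbers.

```python
# Illustrative compounding calculation (the per-formula error rate is assumed,
# not taken from Panko's research): even a small chance of error per formula
# makes an error-free large spreadsheet unlikely.
per_formula_error_rate = 0.01   # assume 1% of formulas are wrong

for n_formulas in (10, 50, 150, 500):
    p_any_error = 1 - (1 - per_formula_error_rate) ** n_formulas
    print(f"{n_formulas:>4} formulas -> P(at least one error) = {p_any_error:.0%}")
```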