Computer Science

Data Mining

Data mining is the process of discovering patterns and extracting useful information from large datasets. It involves using various techniques such as machine learning, statistical analysis, and database systems to uncover hidden insights and make predictions. Data mining is widely used in areas such as business intelligence, marketing, and scientific research to gain valuable knowledge from complex data.

Written by Perlego with AI-assistance

8 Key excerpts on "Data Mining"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Process Mining Techniques for Pattern Recognition
    eBook - ePub
    • Vikash Yadav, Anil Kumar Dubey, Harivans Pratap Singh, Gaurav Dubey, Erma Suryani(Authors)
    • 2022(Publication Date)
    • CRC Press
      (Publisher)

    ...Data Mining involves pattern recognition technologies, along with statistical as well as mathematical schemes [2]. Definition provided by SAS Institute: Data Mining is a methodology to discover anomalies, patterns as well as correlations among huge datasets for estimating outcomes/results. Using a range of schemes, this information may then be used to increase revenues, cut costs, improve customer relationships, minimize risks/dangers and so on [3]. Data Mining may be considered as the computational methodology to identify patterns in huge data sets by including techniques of machine learning, statistics and artificial intelligence as well as database systems [4]. Data Mining techniques as well as algorithms include Bayesian networks, cluster analysis, nearest neighbor strategy, artificial neural networks (ANN), data visualization strategies, genetic algorithms as well as evolutionary programming, decision trees, support vector machines, regression analysis, symbolic rules, linear regression and so on. Mathematical models are the main constituent of analysis in the Data Mining concept. These strategies can be applied to different concrete problems thanks to the availability of suitable software as well as hardware. FIGURE 1.1 Data Mining strategies along with allied scientific fields. FIGURE 1.2 Data Mining steps. 2.2 BASIC RULES AND TECHNIQUES FOR PROCESS MINING Research in process mining is in its early stage at the present time...

  • A-Z of Digital Research Methods
    • Catherine Dawson(Author)
    • 2019(Publication Date)
    • Routledge
      (Publisher)

    ...CHAPTER 12: Data Mining. Overview. Data Mining is the process that is used to turn raw data into useful information that will help to uncover hidden patterns, relationships and trends. It is a data-driven technique in which data are examined to determine which variables and their values are important and understand how variables are related to each other. It involves applying algorithms to the extraction of hidden information with the aim of building an effective predictive or descriptive model of data for explanation and/or generalisation. The focus is on data sourcing, pre-processing, data warehousing, data transformation, aggregation and statistical modelling. The availability of big data (extremely large and complex datasets, some of which are free to use, re-use, build on and redistribute, subject to stated conditions and licence: see Chapter 3) and technological advancement in software and tools (see below), has led to rapid growth in the use of Data Mining as a research method. There are various sub-categories of Data Mining, including (in alphabetical order):
      • Distributed Data Mining that involves mining distributed data with the intention of obtaining global knowledge from local data at distributed sites. Zeng et al. (2012) provide a useful overview of distributed Data Mining.
      • Educational Data Mining (EDM) that is used to observe how people learn and how they behave when learning, without disturbing their learning (see Chapter 17). It enables researchers to answer educational research questions, develop theory and better support learners. Lee (2019) provides a good example of how EDM is used in research and Chapter 24 discusses learning analytics, which is closely related to EDM.
      • Ethno-mining that enables researchers to study human behaviour and culture through combining ethnography with Data Mining tools and techniques (see Chapter 18).
      • Link mining that applies Data Mining techniques to linkage data...

  • Predictive Analytics for Marketers
    eBook - ePub

    Using Data Mining for Business Advantage

    • Barry Leventhal(Author)
    • 2018(Publication Date)
    • Kogan Page
      (Publisher)

    ...This process of converting data into useful information is known as Data Mining. More formally, Data Mining can be thought of as a process of discovering and interpreting patterns in data to solve business problems. The data-mining process converts data into information in the sense of creating model predictions that you can put into action. Data Mining is often associated with large amounts of data, containing millions of records and thousands of attributes held for each record. Sophisticated software is necessary to identify meaningful relationships and harness them to create analytical models. The related term ‘knowledge discovery’ (KD) is sometimes also used – knowledge discovery is the process of discovering potentially useful information from a collection of data. KD encompasses all types of data and does not necessarily involve analytical modelling, while Data Mining is taken to include creating predictive or descriptive models. Frawley et al (1992) present an overview of the knowledge discovery process, approaches to KD and associated issues. While we are going through definitions, another term you will sometimes meet is ‘business intelligence’. Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyse an organization’s raw data. As a discipline, BI consists of several related activities, including database querying, reporting tools and Data Mining. It tends to be mainly used by IT departments, in the context of a company’s analytical systems. Who are the stakeholders? Data Mining invariably implies a team effort because it involves different functions in your company – typically business operations, analytics and IT...

  • Business Analytics
    eBook - ePub

    An Introduction

    ...When someone was called a data miner, it was meant to be a derogatory term applied to a person who tortured data until it told the preconceived story the researcher wanted to tell. Data Mining today concerns analyzing databases, data warehouses, and data marts that already exist for the purpose of solving some problem, discovering new relationships, or to answer some pressing question. Data Mining is the extraction of useful information from large databases; it is about extracting knowledge or information from large amounts of data. Data Mining has come to be referenced by a few similar terms; in most cases they all refer to much the same set of techniques that we refer to as Data Mining in this chapter: exploratory data analysis; business intelligence; data driven discovery; Knowledge Discovery in Databases (KDD). Data Mining is quite separate from database management. Keogh points out that in database management, queries are well defined; we even have a language to write these queries (Structured Query Language, or SQL, pronounced as “sequel”). A query in database management might take the form of “find all the customers in South Bend,” or “find all the customers who have missed a recent payment.” Data Mining, however, uses very different queries; they tend to be less structured and are sometimes quite vague. For example: “Find all the customers likely to purchase recreational vehicle insurance in the next six months,” or “group all the customers with similar buying habits.” In one sense, Data Mining is like statistical forecasting in that we are forward looking in an attempt to obtain better information about future likely events. We could probably consider Data Mining an extension, or an advanced form, of OLAP. That would probably be incorrect. Both Data Mining and OLAP look at large amounts of data; it is not the absolute size of the data, however, that distinguishes one from the other...
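
The contrast drawn above between a well-defined database query and a vaguer mining request can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer table, its column names and all values are invented, and plain k-means (seeded with the first k rows, on unscaled features) stands in for whatever grouping technique a real project would choose.

```python
import sqlite3

# Hypothetical customer records: (name, city, orders, spend). All values invented.
customers = [
    ("Ana", "South Bend", 2, 40.0),
    ("Ben", "South Bend", 25, 900.0),
    ("Cal", "Elkhart", 3, 55.0),
    ("Dee", "Goshen", 24, 870.0),
    ("Eli", "Elkhart", 1, 30.0),
    ("Fay", "Goshen", 27, 950.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, orders INTEGER, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", customers)

# Database-management style: a precise question with one exact answer.
south_bend = [row[0] for row in conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("South Bend",))]


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Mining style: "group all the customers with similar buying habits".
# No group definition is supplied; the algorithm proposes one from the data.
features = [(orders, spend) for _, _, orders, spend in customers]
labels = kmeans(features, k=2)
groups = {}
for (name, *_), label in zip(customers, labels):
    groups.setdefault(label, []).append(name)

print("Query result:", south_bend)
print("Discovered groups:", sorted(groups.values()))
```

The SQL query returns exactly the rows that match its condition, whereas the grouping depends on choices the analyst has to make, such as the number of clusters and how the features are scaled.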

  • Big Data Mining and Complexity

    ...4 What Is Data Mining? Chapter Overview: A bit of Data Mining history; The Data Mining process; The ‘black box’ of Data Mining; Validity and reliability; Further Reading. If big data is all about the development and emergence of networks of complex data sets (global or otherwise), then Data Mining is all about pattern recognition and knowledge extraction from these complex data sets. Or, that is at least how it is defined today. A bit of Data Mining history: Back in the late 1980s and early 1990s, when Data Mining first emerged, it was defined as a complete (and entirely circular) approach to data management and analysis, from data collection and preparation to pattern recognition and knowledge extraction to further data collection and analysis and so forth. For example, as shown in Figure 4.1, the actual analysis of data (which is what Data Mining amounts to today) originally only involved two of Data Mining’s eight major steps, or three if you count the formulation of questions. (Figure 4.1: Data Mining steps.) When Data Mining first emerged, it also had a slightly different name, which was much more explicit about the fact that Data Mining is a process. It was called knowledge discovery in databases, shortened to KDD. Later, as the field developed, the acronym was dropped and the term Data Mining was used instead. The other reason KDD was used initially is because, in the 1980s – when it was always argued that research should be guided by rigorous hypotheses and concise research questions – Data Mining had a rather negative connotation, akin to other related terms such as data fishing or data dredging, both of which conjured images of untrustworthy and unethical researchers combing through their databases to find something (including anything) of significance...

  • Healthcare Fraud
    eBook - ePub

    Auditing and Detection Guide

    • Rebecca S. Busch(Author)
    • 2012(Publication Date)
    • Wiley
      (Publisher)

    ...N., Introduction to Data Mining and its applications [2006]) Data Mining, also known as knowledge discovery in databases (KDD), has been defined as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.” (Frawley, W., Piatetsky-Shapiro, G., and Matheus, C., “Knowledge Discovery in Databases: An Overview,” AI [Fall 1992], pp. 213–228) “Data Mining is the process of extracting knowledge hidden from large volumes of raw data.” (www.megaputer.com/data_mining.php) Data Mining in Healthcare Data Mining is the extraction of hidden predictive information and hidden patterns of actual occurrences from large sets of multiple databases. The technology-based audit tool of Data Mining allows the opportunity to encapsulate data from a highly segmented and fragmented marketplace. It creates proactive decision-making tools from clinical (HIP), financial (ARP), operational (OFA), product (PMA), service (SMA), and consumer (CMA) perspectives. It gives HCC model players (P-HCC, S-HCC, I-HCC, C-HCC, T-HCC, R-HCC) the potential to collect and design comprehensive data warehouses through the appropriate mapping function. The design of algorithms applied to a set of data impacts the ability and quality of intelligence that is derived from that data. The information may provide insight on future trends, behaviors, and intelligence-based pipeline decisions. The retrospective analysis tools also help in designing an audit and detecting anomalies. This, in turn, facilitates the ability to implement appropriate internal controls within all pipelines among all HCC Model (P-HCC, S-HCC, I-HCC, C-HCC, T-HCC, R-HCC) players. Components of the Data Mining Process within the HCC Model (P-HCC, S-HCC, I-HCC, C-HCC, T-HCC, R-HCC) Players Once the data map of a market player has been defined and then created, a segmented Data Mining process should be initiated. This may be followed by aggregating data among one or more market players...
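
The excerpt above mentions retrospective analysis for detecting anomalies but does not name a method. As one possible illustration, the sketch below flags unusually large billed amounts with the modified z-score (median and median absolute deviation, with the commonly used 3.5 cut-off). The claim identifiers and amounts are invented, and this technique is an assumption of the example, not something taken from the book.

```python
from statistics import median

# Hypothetical billed amounts per claim for one provider; all values are invented.
claims = {
    "C001": 120.0, "C002": 135.0, "C003": 128.0, "C004": 131.0,
    "C005": 990.0, "C006": 125.0, "C007": 122.0,
}


def flag_outliers(amounts, threshold=3.5):
    """Return the claim ids whose modified z-score exceeds the threshold."""
    values = list(amounts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that a single extreme
    # value cannot inflate the way it inflates a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [cid for cid, v in amounts.items()
            if 0.6745 * abs(v - med) / mad > threshold]


flagged = flag_outliers(claims)
print("Claims to review:", flagged)
```

A flag of this kind is only a prompt for an auditor to look more closely; it says nothing by itself about whether a claim is fraudulent.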

  • It's All Analytics!
    eBook - ePub

    The Foundations of AI, Big Data and Data Science Landscape for Professionals in Healthcare, Business, and Government

    ...However, the abilities of AI to change our world are rapidly increasing in scope and capacity (Miner et al., 2019). There is another reason for the rise in AI research and usage in the last 10 years – computing power. If you look at the history of AI, there was conception in the 1950s, a big spike in the 1980s when computers became more powerful and now in the last ten years we have tremendous computing power and tons of data! More on AI in our next chapter. Data Mining vs Data Dredging (Data Fishing or Data Snooping) Years ago, there was resistance by certain camps, especially statisticians, against Data Mining. The misunderstanding arose from the objective of the process. It is true that if you look at a large dataset you can find relationships that are statistically significant, but are purely by chance (see an example of “Causation vs Correlation” in Chapter 4). Some researchers were misusing the process; they were data dredging by looking at hundreds of relationships, finding one that was statistically significant and reporting the results with the associated statistics – claiming that they had found some new insight. Your 2nd author (Gary Miner) was attending the JSM (Joint Statistical Meetings) annual meeting in Canada in the year 2000, and attended the President’s Banquet and talk one evening during that week. The President got up to give his talk and started by saying: You statisticians are going to have to do an “about face” – in this new century we are in a new age of “Modern Data Analysis – Predictive Analytics – Data Mining” and unless those of you who earn a living as “consulting statisticians” make changes you will soon lose your clients!!! That made me take notice, as I had already committed myself to putting my efforts into Data Mining and text mining...
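
The data dredging problem described above can be reproduced with a small simulation. The sketch below is illustrative only: it generates an outcome and 200 candidate predictors that are all pure random noise, then counts how many predictors nonetheless cross the conventional significance threshold. The sample size, the number of predictors and the seed are arbitrary choices, and 0.361 is the approximate two-sided 5% critical value of Pearson's r for 30 observations.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
n_rows, n_predictors = 30, 200

# Outcome and predictors are independent noise: no real relationship exists.
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
predictors = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_predictors)]

# For n = 30, |r| > 0.361 corresponds roughly to p < 0.05 (two-sided).
CRITICAL_R = 0.361
hits = [i for i, col in enumerate(predictors)
        if abs(pearson(col, outcome)) > CRITICAL_R]

print(f"{len(hits)} of {n_predictors} pure-noise predictors look 'significant'")
```

With a 5% threshold, roughly one noise predictor in twenty is expected to pass by chance alone, which is why reporting only the "significant" relationship out of hundreds tested is misleading.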

  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...Walter L. Leite and Zachary K. Collier. Data Mining, pp. 458-461. Data Mining is a series of methods that aim to discover knowledge from data by applying algorithms. The algorithms for Data Mining are very diverse, depending on their intended objectives and the computational demand of the problem. Data Mining methods have been developed at the intersection of the academic areas of statistics and computer science. Data Mining methods can also be classified broadly into supervised and unsupervised learning. In this entry, methods for supervised learning used for prediction are reviewed first, followed by methods for unsupervised learning. Supervised learning consists of methods applicable to data in which there is an outcome that can be used to determine whether the learning process was successful. The outcome is also commonly referred to as a dependent variable or response variable. Supervised learning methods can be used for prediction and learning about relationships between predictors and the outcome. Examples of methods of supervised learning include generalized linear models, classification and regression trees, random forests, and neural networks (NNs). Methods of supervised learning have found several applications in educational research, such as identifying students at risk of failing to reach achievement milestones or identifying the effects of educational interventions...
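
The defining feature of supervised learning noted above, an outcome that shows whether learning succeeded, can be illustrated with a deliberately tiny model. The sketch below fits a one-split classification tree (a "decision stump") to invented student records and then checks its predictions against held-out outcomes. The data, the feature names and the assumption that low values indicate risk are all hypothetical; a real study would use a full tree or one of the other methods the entry lists.

```python
# Hypothetical records: (attendance rate, quiz average, at_risk label). Invented data.
train = [
    (0.95, 88, 0), (0.90, 75, 0), (0.85, 80, 0), (0.92, 91, 0),
    (0.60, 52, 1), (0.55, 61, 1), (0.70, 48, 1), (0.65, 58, 1),
]
test = [(0.88, 79, 0), (0.58, 50, 1), (0.93, 85, 0), (0.62, 63, 1)]


def fit_stump(rows):
    """Find the single feature/threshold split with the fewest training errors.

    Simplifying assumption: rows at or below the threshold are predicted
    at risk (1) and all other rows are predicted not at risk (0).
    """
    best = None
    n_features = len(rows[0]) - 1  # the last column is the outcome
    for f in range(n_features):
        for threshold in sorted({r[f] for r in rows}):
            errors = sum((1 if r[f] <= threshold else 0) != r[-1] for r in rows)
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    return best[1], best[2]


def predict(stump, row):
    """Apply a fitted (feature index, threshold) stump to one record."""
    f, threshold = stump
    return 1 if row[f] <= threshold else 0


stump = fit_stump(train)

# The known outcome in the held-out rows is what makes the check possible.
accuracy = sum(predict(stump, r) == r[-1] for r in test) / len(test)
print("Chosen split (feature index, threshold):", stump)
print("Held-out accuracy:", accuracy)
```

Unsupervised methods have no such outcome column, so there is no equivalent accuracy figure to compute against known answers.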