Computer Science
Data Mining
Data mining is the process of discovering patterns and extracting useful information from large datasets. It involves using various techniques such as machine learning, statistical analysis, and database systems to uncover hidden insights and make predictions. Data mining is widely used in areas such as business intelligence, marketing, and scientific research to gain valuable knowledge from complex data.
Written by Perlego with AI-assistance
12 Key excerpts on "Data Mining"
- Neha Kaul(Author)
- 2019(Publication Date)
- Arcler Press(Publisher)
By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data Mining depends on effective data collection and warehousing as well as computer processing (Staff, 2018). Data Mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data Mining tools allow enterprises to predict future trends (“What is Data Mining? – Definition from WhatIs.com,” 2018). Data Mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems (“Data Mining,” 2018). Data Mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce financial and operational risks and more (“What Is Data Mining?,” 2018).
In the process of Data Mining, smart methods are applied to large quantities of data so as to extract useful information from the large sets of data. The final objective is to extract useful data that can be further manipulated and employed for future use. Several methods of Data Mining have been defined so far. The complete detailed process of Data Mining consists of several steps which are discussed in the upcoming sections. The term ‘Data Mining’ is a misnomer, because the real goal of this process is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself (Han et al., 2000). A better name that was proposed for Data Mining is ‘Knowledge mining from data’ as we actually mine/dig up knowledge from large quantities of data.
- eBook - ePub
Data Warehouse and Data Mining
Concepts, techniques and real life applications (English Edition)
- Dr. Jugnesh Kumar(Author)
- 2024(Publication Date)
- BPB Publications(Publisher)
CHAPTER 4
Data Mining Definition and Task
Introduction
Data Mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data Mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions. Data Mining is a key part of data analytics and one of the core disciplines in data science, which uses advanced analytics techniques to find useful information in data sets.
Structure
The chapter discusses the following topics:
- Introduction to Data Mining
- Data Mining tasks
- Data Mining functionality
- Knowledge Discovery in Databases versus Data Mining
- Data Mining techniques
- Text mining
- Data Mining tools and applications
Data Mining is the process of discovering patterns, relationships, and insights from large volumes of data. It involves applying various techniques and algorithms to extract meaningful and actionable information from datasets. The goal of Data Mining is to uncover hidden patterns, identify trends, and make predictions or decisions based on the discovered knowledge.
Introduction to Data Mining
The first step in Data Mining, defining business objectives or problem statements, is pivotal for project success. It establishes a solid foundation, aligns efforts with organizational goals, and ensures that subsequent steps are purposeful and impactful.
Defining business objectives/problem definition
The key features of defining business objectives with respect to Data Mining are discussed ahead.
Importance
The importance of defining business objectives is as follows:
- Alignment with organizational goals: Understanding the broader business context is essential. It ensures that Data Mining efforts are directly contributing to the organization’s strategic objectives.
- Clarity in purpose:
- eBook - PDF
Decomposition Methodology for Knowledge Discovery and Data Mining
Theory and Applications
- Oded Maimon, Lior Rokach(Authors)
- 2005(Publication Date)
- WSPC(Publisher)
An analyst (e.g., a statistician) used the available domain knowledge to select the variables to be collected. The number of variables selected was usually small and the collection of their values could be done manually (e.g., utilizing hand-written records or oral interviews). In the case of computer-aided analysis, the analyst had to enter the collected data into a statistical computer package or an electronic spreadsheet. Due to the high cost of data collection, people learned to make decisions based on limited information. However, since the information age, the accumulation of data has become easier and storing it inexpensive. It has been estimated that the amount of stored information doubles every twenty months [Frawley et al. (1991)]. Unfortunately, as the amount of machine readable information increases, the ability to understand and make use of it does not keep pace with its growth. Data Mining is a term coined to describe the process of sifting through large databases in search of interesting patterns and relationships. Practically, Data Mining provides tools by which large quantities of data can be automatically analyzed. Some of the researchers consider the term Data Mining as misleading and prefer the term Knowledge Mining as it provides a better analogy to gold mining [Klosgen and Zytkow (2002)]. The Knowledge Discovery in Databases (KDD) process has been defined by many; for instance, [Fayyad et al. (1996)] define it as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Friedman (1997a)] considers the KDD process as an automatic exploratory data analysis of large databases. [Hand (1998)] views it as a secondary data analysis of large databases. The term Secondary emphasizes the fact that the primary purpose of the database was not data analysis.
- eBook - PDF
- Monte F. Hancock, Jr.(Author)
- 2011(Publication Date)
- Auerbach Publications(Publisher)
Chapter 1
What Is Data Mining and What Can It Do?
Purpose
The purpose of this chapter is to provide the reader with grounding in the fundamental philosophical principles of Data Mining as a technical practice. The reader is then introduced to the wide array of practical applications that rely on Data Mining technology. The issue of computational complexity is addressed in brief.
Goals
After you have read this chapter, you will be able to define Data Mining from both philosophical and operational perspectives, and enumerate the analytic functions Data Mining performs. You will know the different types of data that arise in practice. You will understand the basics of computational complexity theory. Most importantly, you will understand the difference between data and information.
1.1 Introduction
Our study of Data Mining begins with two semi-formal definitions:
Definition 1. Data Mining is the principled detection, characterization, and exploitation of actionable patterns in data.
Table 1.1 explains what is meant by each of these components.
Table 1.1 Definitive Data Mining Attributes
Attribute | Connotations
Principled | Rational, empirical, objective, repeatable
Detection | Sensing and locating
Characterization | Consistent, efficient, tractable symbolic representation that does not alter information content
Exploitation | Decision making that facilitates action
Actionable Pattern | Conveys information that supports decision making
Taking this view of what Data Mining is, we can formulate a functional definition that tells us what individuals engaged in Data Mining do.
Definition 2. Data Mining is the application of the scientific method to data to obtain useful information.
The heart of the scientific approach to problem-solving is rational hypothesis testing guided by empirical experimentation. What we call science today was referred to as natural philosophy in the 15th century.
- Sankar K. Pal, Pabitra Mitra(Authors)
- 2004(Publication Date)
- Chapman and Hall/CRC(Publisher)
1.5.2 Statistical perspective
The statistical perspective views Data Mining as computer automated exploratory data analysis of (usually) large complex data sets [79, 92]. The term Data Mining existed in statistical data analysis literature long before its current definition in the computer science community. However, the abundance and massiveness of data has provided impetus to development of algorithms which, though rooted in statistics, lay more emphasis on computational efficiency. Presently, statistical tools are used in all the KDD tasks like preprocessing (sampling, outlier detection, experimental design), data modeling (clustering, expectation maximization, decision trees, regression, canonical correlation etc.), model selection, evaluation and averaging (robust statistics, hypothesis testing) and visualization (principal component analysis, Sammon’s mapping).
The advantages of the statistical approach are its solid theoretical background, and ease of posing formal questions. Tasks such as classification and clustering fit easily into this approach. What seems to be lacking are ways for taking into account the iterative and interactive nature of the data mining process. Also scalability of the methods to very large, especially tertiary memory data, is still not fully achieved.
1.5.3 Pattern recognition perspective
At present, pattern recognition and machine learning provide the most fruitful framework for Data Mining [109, 161]. Not only do they provide a wide range of models (linear/non-linear, comprehensible/complex, predictive/descriptive, instance/rule based) for Data Mining tasks (clustering, classification, rule discovery), methods for modeling uncertainties (probabilistic, fuzzy) in the discovered patterns also form part of PR research. Another aspect that makes pattern recognition algorithms attractive for Data Mining is their capability of learning or induction.
- eBook - PDF
Understanding Complex Datasets
Data Mining with Matrix Decompositions
- David Skillicorn(Author)
- 2007(Publication Date)
- Chapman and Hall/CRC(Publisher)
For example, the advent of computerized cash registers meant that many businesses had access to unprecedented detail about the purchasing patterns of their customers. It seemed clear that these patterns had implications for the way in which selling was done and, in particular, suggested a way of selling to each individual customer in the way that best suited him or her, a process that has come to be called mass customization and customer relationship management. Initial successes in the business context also stimulated interest in other domains where data was plentiful. For example, data about highway traffic flow could be examined for ways to reduce congestion; and if this worked for real highways, it could also be applied to computer networks and the Internet. Analysis of such data has become common in many different settings over the past twenty years. The name ‘Data Mining’ derives from the metaphor of data as something that is large, contains far too much detail to be used as it is, but contains nuggets of useful information that can have value. So Data Mining can be defined as the extraction of the valuable information and actionable knowledge that is implicit in large amounts of data.
The data used for customer relationship management and other commercial applications is, in a sense, quite simple. A customer either did or did not purchase a particular product, make a phone call, or visit a web page. There is no ambiguity about a value associated with a particular person, object, or transaction. It is also usually true in commercial applications that a particular kind of value associated to a customer or transaction, which we call an attribute, plays a similar role in understanding every customer.
- Deyi Li, Yi Du(Authors)
- 2007(Publication Date)
- Chapman and Hall/CRC(Publisher)
The discovered knowledge may be applied to information management, query optimization, decision support, and process control, etc., and it can be used for data self-maintenance as well. It is necessary to point out that all the discovered knowledge is relative, confined by some specific preconditions and constraints and oriented to some specific fields. Data Mining is also a repeatedly interactive process between human and machine. Based on the discovery task and a user’s background knowledge, the user will, as a first step, conduct research on the data to adopt and design appropriate mining tools for automatic knowledge extraction. Thereafter, the data will be modified and the mining tools will be adjusted according to the mining result, so as to obtain better or different results. These two steps will iterate until satisfactory results are generated. In general, the process of Data Mining and knowledge discovery consists of the following basic steps:
1. The description of discovery task: describes and determines the mining task according to the background knowledge of the user and the data property of the application field.
2. Data collection: selects all the data sets related to the discovery task.
3. Data preprocessing: checks the integrity and consistency of the data, filters out noise, corrects mistakes, recovers lost data, and transforms the data, including a large quantity of attributes or features, to a low-dimensional space, which is the most representative of the internal relationship within the data.
4. Data Mining: designs and adopts effective Data Mining algorithms and develops implementation according to the discovery task.
5. Explanation and evaluation: explains the discovered knowledge and its correctness, verifies and evaluates the effectiveness of the knowledge, the consistency with the initial data, and the novelty for the user.
- eBook - PDF
- Christoph Helma(Author)
- 2005(Publication Date)
- CRC Press(Publisher)
A recent review (3) covers DM applications in toxicology. Another recommended reading is Advances in Knowledge Discovery and Data Mining by Fayyad et al. (4). First, we will have to clarify the meaning of DM and its relation to other terms frequently used in this area, namely knowledge discovery in databases (KDD) and machine learning (ML). Common definitions (2,5–8) are:
Knowledge discovery (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable structure in data.
Data Mining (DM) is the actual data analysis step within this process. It consists of the application of statistics, machine learning, and database techniques to the dataset at hand.
Machine learning (ML) is the study of computer algorithms that improve automatically through experience. One ML task of particular interest in DM is classification; that is, to classify new unseen instances on the basis of known training instances.
This means that knowledge discovery is the process of supporting humans in their enterprise to make sense of massive amounts of data; Data Mining is the application of techniques to achieve this goal; and machine learning is one of the techniques suitable for this task. Other DM techniques originate from diverse fields, such as statistics, visualization, and database research. The focus in this chapter will be primarily on DM techniques based on machine learning. In practice, many of these terms are not used in their strict sense. In this chapter, we will also sometimes use the popular term DM when we mean KDD or ML. Table 1 shows the typical KDD process as described by Fayyad et al. (6). In the following, we will sketch the adapted process for the task of extracting structure–activity
Table 1 The Knowledge Discovery (KDD) Process According to Fayyad et al.
1. Definition of the goals of the KDD process
2. Creation or selection of a data set
3.
- eBook - PDF
- Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, Vipin Kumar, Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, Vipin Kumar(Authors)
- 2008(Publication Date)
- Chapman and Hall/CRC(Publisher)
Different methods have been developed in this domain by integration and extension of the methods developed in machine learning, Data Mining, pattern recognition, and statistics. For example, a statistical analysis (such as hypothesis testing) approach [LFY+06] can be performed on program execution traces to isolate the locations of bugs that distinguish program success runs from failing runs. Despite its limited success, it is still a rich domain for data miners to research and further develop sophisticated, scalable, and real-time data-mining methods.
1.3 Conclusions
Science and engineering are fertile lands for Data Mining. In the last two decades, science and engineering have evolved to a stage where gigantic amounts of data are constantly being generated and collected, and Data Mining and knowledge discovery have become the essential scientific discovery process. We have proceeded to the era of data science and data engineering. In this chapter, we have examined a few important research challenges in science and engineering Data Mining. There are still several interesting research issues not covered in this short abstract. One such issue is the development of invisible Data Mining functionality for science and engineering that builds data-mining functions as an invisible process in the system (e.g., rank the results based on the relevance and some sophisticated, preprocessed evaluation functions) so that users may not even sense that Data Mining has been performed beforehand or is being performed and that their browsing and mouse clicking are simply using, or further exploring, the results of Data Mining. Another research issue is privacy-preserving Data Mining, which aims to perform effective Data Mining without disclosure of private or sensitive information to outsiders.
- eBook - PDF
- Jovan Pehcevski(Author)
- 2023(Publication Date)
- Arcler Press(Publisher)
Considering the importance of Data Mining for today’s companies, this paper discusses benefits and challenges of Data Mining for e-commerce companies. Furthermore, it reviews the process of Data Mining in e-commerce together with the common types of database and cloud computing in the field of e-commerce.
Data Mining
Data Mining is the process of discovering meaningful patterns and correlations by sifting through large amounts of data stored in repositories. There are several tools for this data generation, which include abstractions, aggregations, summarization and characteristics of data [6]. In the past decade, Data Mining has changed the e-commerce business. Data Mining is not specific to one type of data. Data Mining can be germane to any type of information source; however, algorithms and tactics may differ when applied to different kinds of data. The challenges presented by different types of data vary. Data Mining is being used in many forms of databases, such as flat files, data warehouses, and object-oriented databases. This paper concentrates on relational databases. A relational database consists of a set of tables containing either values of entity attributes or values of attributes from entity relationships. Tables have columns and rows, where columns represent attributes and rows represent tuples. A tuple in a relational table corresponds to either an object or a relationship between objects and is identified by a set of attribute values representing a unique key [6]. The most commonly used query language for relational databases is SQL, which allows users to manipulate and retrieve data stored in the tables. Data Mining algorithms using relational databases can be more versatile than data mining algorithms specifically written for flat files. Data Mining can benefit from SQL for data selection, transformation and consolidation [7].
- eBook - PDF
Handbook Of Software Engineering And Knowledge Engineering, Vol 1: Fundamentals
Volume I: Fundamentals
- Shi-kuo Chang(Author)
- 2001(Publication Date)
- World Scientific(Publisher)
A pattern’s measure of interestingness is a quantitative indicator used in pattern evaluation. Only interesting patterns are knowledge. A pattern is interesting if it is new, non-trivial, and useful. Data Mining is the process of pattern discovery in a data set from which noise has been previously eliminated and which has been transformed in such a way to enable the pattern discovery process. Data Mining is always based on a data-mining algorithm.
1.3. The context and resources for KDD
Figure 1 illustrates the context and computational resources needed to perform KDD [20]. The necessary assumptions are that there exists a database with its data dictionary, and that the user wants to discover some patterns in it.
Fig. 1. The context and resources for KDD (after [20]).
There must also exist an application through which the user can select (from the database) and prepare a data set for KDD, adjust DM parameters, start and run the KDD process, and access and manipulate discovered patterns. KDD/DM systems usually let the user choose among several KDD methods. Each method enables preparation of a data set for automatic analysis, searching that set in order to discover/generate patterns (i.e., applying a certain kind of DM over that set), as well as pattern evaluation in terms of certainty and interestingness. KDD methods often make it possible to use domain knowledge to guide and control the process and to help evaluate the patterns. In such cases domain knowledge must be represented using an appropriate knowledge representation technique (such as rules, frames, decision trees, and the like). Discovered knowledge may be used directly for database query from the application, or it may be included into another knowledge-based program (e.g. an expert system in that domain), or the user may just save it in a desired form.
- eBook - PDF
Microsoft Data Mining
Integrated Business Intelligence for e-Commerce and Knowledge Management
- Barry de Ville(Author)
- 2001(Publication Date)
- Digital Press(Publisher)
2.12 Collaborative Data Mining: the confluence of Data Mining and knowledge management
Similarly, as the enterprise becomes increasingly adept at the capture and analysis of business and engineering processes, its ability to operate the business, in a passive and reactive sense, begins to change to an ability to drive the business in a proactive and predictive sense. Data Mining and KDD are important facilitators in the unfolding evolution of the enterprise toward higher levels of decision-making maturity. Whereas the identification of tacit knowledge, or know-how, is an essentially difficult task, we can expect greater and greater increases in our ability to let data speak for themselves. Data Mining and the associated data manipulation maturity involved in Data Mining mean that data, and the implicit knowledge that data contain, can be more readily deployed to drive the enterprise to greater market success and higher levels of decision-making effectiveness. The topic of intellectual capital development is taken up further in Chapter 7. You can also read more about it in Intellectual Capital (Thomas Stewart).
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.











