eBook - ePub

Data Mining and Knowledge Discovery for Geoscientists

Name: Data Mining and Knowledge Discovery for Geoscientists
ISBN: 9780124104754

Guangren Shi,

376 pages
English


About this book

Currently there are major challenges in data mining applications in the geosciences. This is due primarily to the fact that there is a wealth of available mining data amid an absence of the knowledge and expertise necessary to analyze and accurately interpret the same data. Most geoscientists have no practical knowledge or experience using data mining techniques. For the few that do, they typically lack expertise in using data mining software and in selecting the most appropriate algorithms for a given application. This leads to a paradoxical scenario of "rich data but poor knowledge".

The true solution is to apply data mining techniques in geosciences databases and to modify these techniques for practical applications. Authored by a global thought leader in data mining, Data Mining and Knowledge Discovery for Geoscientists addresses these challenges by summarizing the latest developments in geosciences data mining and arming scientists with the ability to apply key concepts to effectively analyze and interpret vast amounts of critical information.

- Focuses on 22 of data mining's most practical algorithms and popular application samples
- Features 36 case studies and end-of-chapter exercises unique to the geosciences to underscore key data mining applications
- Presents a practical and integrated system of data mining and knowledge discovery for geoscientists
- Rigorous yet broadly accessible to geoscientists, engineers, researchers and programmers in data mining
- Introduces widely used algorithms, their basic principles and conditions of applications, diverse case studies, and suggests algorithms that may be suitable for specific applications


Chapter 1

Introduction

Abstract

This chapter includes four sections. Section 1 describes the motivity of data mining, its objectives and scope, the classification of data mining systems, and major issues in data mining for geosciences, pointing out how underground data and the methods for processing them differ from those in other fields. Section 2 introduces the database, data warehouse, and data bank, which are the data systems usable by data mining. Section 3 discusses linear and nonlinear algorithms, error analysis of calculation results, differences between regression and classification algorithms, the nonlinearity of a studied problem, and the solution accuracy of a studied problem. These topics are shared by the regression and classification algorithms introduced in Chapters 2 through 6 and Chapter 10; for the last two topics, five ranks are presented. Section 4 introduces the functions, flowchart, and data preprocessing of data mining systems and summarizes the algorithms and case studies in this book. Finally, 10 exercises are provided.

Keywords

data mining; data mining system; geosciences particularities; regression algorithms; classification algorithms; linear algorithms; nonlinear algorithms; error analysis; nonlinearity ranking; solution accuracy ranking

Outline

1.1. Introduction to Data Mining

1.1.1. Motivity of Data Mining

1.1.2. Objectives and Scope of Data Mining

1.1.2.1. Generalization

1.1.2.2. Association

1.1.2.3. Classification and Clustering

1.1.2.4. Prediction

1.1.2.5. Deviation

1.1.3. Classification of Data Mining Systems

1.1.3.1. To Classify According to the Mined DB Type

1.1.3.2. To Classify According to the Mined Knowledge Type

1.1.3.3. To Classify According to the Available Techniques Type

1.1.3.4. To Classify According to the Application

1.1.4. Major Issues in Data Mining for Geosciences

1.2. Data Systems Usable by Data Mining

1.2.1. Databases

1.2.1.1. Database Types

1.2.1.2. Data Properties

1.2.1.3. Development Phases

1.2.1.4. Commonly Used Databases

1.2.2. Data Warehousing

1.2.2.1. Data Storage

1.2.2.2. Construction Step

1.2.3. Data Banks

1.3. Commonly Used Regression and Classification Algorithms

1.3.1. Linear and Nonlinear Algorithms

1.3.2. Error Analysis of Calculation Results

1.3.3. Differences between Regression and Classification Algorithms

1.3.4. Nonlinearity of a Studied Problem

1.3.5. Solution Accuracy of Studied Problem

1.4. Data Mining System

1.4.1. System Functions

1.4.2. System Flowcharts

1.4.3. Data Preprocessing

1.4.3.1. Data Cleaning

1.4.3.2. Data Integration

1.4.3.3. Data Transformation

1.4.3.4. Data Reduction

1.4.4. Summary of Algorithms and Case Studies

Exercises

References

In the early 21st century, data mining (DM) was predicted to be “one of the most revolutionary developments of the next decade” and was chosen as one of 10 emerging technologies that will change the world (Hand et al., 2001; Larose, 2005; Larose, 2006). In fact, in the past 20 years, the field of DM has seen enormous success, both in terms of broad-ranging application achievements and in terms of scientific progress and understanding. DM is the computerized process of extracting previously unknown, important, and actionable information and knowledge from a database (DB). This knowledge can then be used to make crucial decisions by leveraging the individual’s intuition and experience to objectively generate opportunities that might otherwise go undiscovered. DM is therefore also called knowledge discovery in databases (KDD). It has been widely used in some fields of business and science (Hand et al., 2001; Tan et al., 2005; Witten and Frank, 2005; Han and Kamber, 2006; Soman et al., 2006), but the application of DM to geosciences is still in its initial stage (Wong, 2003; Zangl and Hannerer, 2003; Aminzadeh, 2005; Mohaghegh, 2005; Shi, 2011). This is because geosciences differ from other fields: the data types are miscellaneous, the quantities are huge, the measuring precision varies, and the data mining results carry considerable uncertainty.

With the establishment of numerous DBs for geosciences, including data banks, data warehouses, and so on, the question of how to find new and important information and knowledge in large amounts of data becomes urgent once a data bank has been constructed. Facing such large amounts of geoscientific data, people can use the DB management system for conventional applications (such as query, search, and simple statistical analysis) but cannot obtain the knowledge inherent in the data, and so fall into the puzzle of “rich data but poor knowledge.” The only solution is to develop DM techniques for geoscientific databases.

We need to stress here that the terms attribute and variable mean the same thing in this book; attribute is the term used in datalogy, whereas variable is the term used in mathematics. Both are called parameters when they relate to applications, so the three terms are fully interchangeable. Each of them comes in two types. One is the continuous or real type, in which the sample values are real numbers that generally differ from one another; the other is the discrete or integer type, in which the sample values are integers such as 1, 2, 3, and so on. Continuous and discrete are words from datalogy, as in continuous attribute, discrete attribute, continuous variable, and discrete variable; real type and integer type are terms from software, as in real attribute, integer attribute, real variable, and integer variable.
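The distinction between the continuous (real) type and the discrete (integer) type can be illustrated with a minimal Python sketch. It is not taken from the book: the function name `attribute_type`, the sample values, and the rule "all whole numbers means discrete" are assumptions made for illustration. In practice the type is usually known from how the attribute was measured, since a real-valued attribute may happen to contain only whole numbers.

```python
from typing import Sequence, Union

Number = Union[int, float]


def attribute_type(values: Sequence[Number]) -> str:
    """Label an attribute from its sample values.

    Assumed rule (illustrative only): if every sample value is a whole
    number, call the attribute 'discrete'; otherwise call it 'continuous'.
    """
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"


# Hypothetical sample values, invented for this example.
porosity = [12.4, 9.87, 15.02, 11.3]  # real-valued measurements
rock_class = [1, 2, 3, 2]             # integer class labels

print(attribute_type(porosity))    # continuous
print(attribute_type(rock_class))  # discrete
```

In the terms used above, `porosity` would be a continuous (real) attribute and `rock_class` a discrete (integer) attribute.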

1.1 INTRODUCTION TO DATA MINING

1.1.1 Motivity of Data Mining

As its name implies, data mining involves digging useful information out of a large amount of data. With the wider application of computers, large amounts of data pile up each year. It is possible to mine “gold” from these data by applying DM techniques.

We are living in an era in which telecommunications, computers, and network technology are changing human beings and society. However, large amounts of information introduce large numbers of problems while bringing convenience to people. For example, it is hard to digest the excessive amounts of information, to identify true and false information, to ensure information safety, and to deal with inconsistent forms of information.

On the other hand, with the rapid development of DB techniques and the wide application of DB management systems, the amount of data that people accumulate keeps growing. A great deal of important information is hidden behind this growing body of data. It is our hope to analyze this information at a higher level so as to make better use of these data.

Current DB systems can efficiently record, query, and compute statistics on data, but they cannot discover the relationships and rules that exist in the data, and they cannot predict future trends from the available data.

The phenomenon of rich data but poor knowledge results from the lack of effective means to mine the knowledge hidden in the data. To meet this challenge, DM techniques were introduced and have become vital. The prediction of DM is the next hotpoint technique following netw...

Cover image
Title page
Table of Contents
Copyright
Preface
Chapter 1. Introduction
Chapter 2. Probability and Statistics
Chapter 3. Artificial Neural Networks
Chapter 4. Support Vector Machines
Chapter 5. Decision Trees
Chapter 6. Bayesian Classification
Chapter 7. Cluster Analysis
Chapter 8. Kriging
Chapter 9. Other Soft Computing Algorithms for Geosciences
Chapter 10. A Practical Software System of Data Mining and Knowledge Discovery for Geosciences
Appendix 1. Table of Unit Conversion
Appendix 2. Answers to Exercises
Index
