Data Mining and Knowledge Discovery for Geoscientists
eBook - ePub

  1. 376 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Currently there are major challenges in data mining applications in the geosciences. This is due primarily to the fact that there is a wealth of available data but an absence of the knowledge and expertise necessary to analyze and accurately interpret it. Most geoscientists have no practical knowledge of or experience with data mining techniques, and the few who do typically lack expertise in using data mining software and in selecting the most appropriate algorithms for a given application. This leads to a paradoxical scenario of "rich data but poor knowledge". The true solution is to apply data mining techniques to geosciences databases and to modify these techniques for practical applications. Authored by a global thought leader in data mining, Data Mining and Knowledge Discovery for Geoscientists addresses these challenges by summarizing the latest developments in geosciences data mining and equipping scientists to apply key concepts to analyze and interpret vast amounts of critical information effectively.
- Focuses on 22 of data mining's most practical algorithms and popular application samples
- Features 36 case studies and end-of-chapter exercises unique to the geosciences to underscore key data mining applications
- Presents a practical and integrated system of data mining and knowledge discovery for geoscientists
- Rigorous yet broadly accessible to geoscientists, engineers, researchers, and programmers in data mining
- Introduces widely used algorithms, their basic principles and conditions of application, and diverse case studies, and suggests algorithms that may be suitable for specific applications

Data Mining and Knowledge Discovery for Geoscientists is written by Guangren Shi and is catalogued under Computer Science & Databases.

Chapter 1

Introduction

Abstract

This chapter includes four sections. Section 1 describes the motivation for data mining, the objectives and scope of data mining, the classification of data mining systems, and the major issues in data mining for geosciences, highlighting how underground data and the methods for processing them differ from those in other fields. Section 2 introduces databases, data warehouses, and data banks, the data systems usable by data mining. Section 3 discusses topics shared by the regression and classification algorithms introduced in Chapters 2 through 6 and Chapter 10: linear and nonlinear algorithms, error analysis of calculation results, differences between regression and classification algorithms, the nonlinearity of a studied problem, and the solution accuracy of a studied problem; for the last two topics, five ranks are presented. Section 4 introduces the functions, flowchart, and data preprocessing of data mining systems and summarizes the algorithms and case studies in this book. Finally, 10 exercises are provided.

Keywords

data mining; data mining system; geosciences particularities; regression algorithms; classification algorithms; linear algorithms; nonlinear algorithms; error analysis; nonlinearity ranking; solution accuracy ranking
Outline
1.1. Introduction to Data Mining
1.1.1. Motivity of Data Mining
1.1.2. Objectives and Scope of Data Mining
1.1.2.1. Generalization
1.1.2.2. Association
1.1.2.3. Classification and Clustering
1.1.2.4. Prediction
1.1.2.5. Deviation
1.1.3. Classification of Data Mining Systems
1.1.3.1. To Classify According to the Mined DB Type
1.1.3.2. To Classify According to the Mined Knowledge Type
1.1.3.3. To Classify According to the Available Techniques Type
1.1.3.4. To Classify According to the Application
1.1.4. Major Issues in Data Mining for Geosciences
1.2. Data Systems Usable by Data Mining
1.2.1. Databases
1.2.1.1. Database Types
1.2.1.2. Data Properties
1.2.1.3. Development Phases
1.2.1.4. Commonly Used Databases
1.2.2. Data Warehousing
1.2.2.1. Data Storage
1.2.2.2. Construction Step
1.2.3. Data Banks
1.3. Commonly Used Regression and Classification Algorithms
1.3.1. Linear and Nonlinear Algorithms
1.3.2. Error Analysis of Calculation Results
1.3.3. Differences between Regression and Classification Algorithms
1.3.4. Nonlinearity of a Studied Problem
1.3.5. Solution Accuracy of Studied Problem
1.4. Data Mining System
1.4.1. System Functions
1.4.2. System Flowcharts
1.4.3. Data Preprocessing
1.4.3.1. Data Cleaning
1.4.3.2. Data Integration
1.4.3.3. Data Transformation
1.4.3.4. Data Reduction
1.4.4. Summary of Algorithms and Case Studies
Exercises
References
In the early 21st century, data mining (DM) was predicted to be “one of the most revolutionary developments of the next decade” and was chosen as one of 10 emerging technologies that will change the world (Hand et al., 2001; Larose, 2005; Larose, 2006). In fact, over the past 20 years the field of DM has seen enormous success, both in broad-ranging application achievements and in scientific progress and understanding. DM is the computerized process of extracting previously unknown, important, and actionable information and knowledge from a database (DB). This knowledge can then support crucial decisions, complementing the individual’s intuition and experience and objectively revealing opportunities that might otherwise go undiscovered. For this reason, DM is also called knowledge discovery in databases (KDD). It has been widely used in several fields of business and science (Hand et al., 2001; Tan et al., 2005; Witten and Frank, 2005; Han and Kamber, 2006; Soman et al., 2006), but its application to the geosciences is still at an early stage (Wong, 2003; Zangl and Hannerer, 2003; Aminzadeh, 2005; Mohaghegh, 2005; Shi, 2011). This is because the geosciences differ from other fields: their data are of miscellaneous types, huge in quantity, and measured with differing precision, and data mining results carry many uncertainties.
With the establishment of numerous geoscience DBs, including data banks and data warehouses, finding new and important information and knowledge in large amounts of data has become an urgent task once a data bank has been constructed. Faced with such large amounts of geoscientific data, people can use a DB management system for conventional applications (such as queries, searches, and simple statistical analysis) but cannot obtain the useful knowledge inherent in the data, leaving them in the predicament of “rich data but poor knowledge.” The only solution is to develop DM techniques for geoscientific databases.
We need to stress here that the terms attribute and variable are used interchangeably in this book: attribute is the term used in datalogy, whereas variable is the mathematical term. In the context of applications, both are called parameters, so the three terms mean exactly the same thing. Each comes in two types: the continuous or real type, whose sample values are many distinct real numbers, and the discrete or integer type, whose sample values are integers such as 1, 2, 3, and so on. Continuous and discrete are datalogy terms (continuous attribute, discrete attribute, continuous variable, discrete variable), whereas real type and integer type are software terms (real attribute, integer attribute, real variable, integer variable).
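The distinction between the two types can be illustrated with a minimal Python sketch. The `attribute_type` helper, its whole-number heuristic, and the sample values below are hypothetical illustrations, not the book's procedure; in practice, deciding whether an attribute is discrete also depends on what the values mean, not only on how they happen to be stored.

```python
from numbers import Real


def attribute_type(values):
    """Label an attribute as "discrete" (integer type) or "continuous" (real type).

    Heuristic assumed for illustration only: if every sample value is a
    whole number, treat the attribute as discrete; otherwise, continuous.
    """
    if not values:
        raise ValueError("need at least one sample value")
    for v in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, Real):
            raise TypeError(f"non-numeric sample value: {v!r}")
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"


# Hypothetical sample values for two attributes.
porosity = [12.3, 15.7, 9.81, 18.02]  # many unequal real numbers
layer_class = [1, 2, 3, 2, 1]         # integer class labels

print(attribute_type(porosity))     # continuous
print(attribute_type(layer_class))  # discrete
```

Note that a value stored as `2.0` counts as a whole number under this heuristic, which is one reason the label should be checked against domain knowledge.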

1.1 INTRODUCTION TO DATA MINING

1.1.1 Motivity of Data Mining

As the name implies, data mining involves digging useful information out of large amounts of data. As computers are applied more widely, large amounts of data pile up every year, and DM techniques make it possible to mine “gold” from them.
We are living in an era in which telecommunications, computers, and network technology are changing human beings and society. However, while large amounts of information bring convenience, they also introduce many problems: it is hard to digest excessive amounts of information, to distinguish true from false information, to ensure information security, and to deal with information in inconsistent forms.
On the other hand, with the rapid development of DB techniques and the wide application of DB management systems, the amount of data people accumulate keeps growing. A great deal of important information is hidden in these growing data, and we hope to analyze it at a higher level so as to make better use of the data.
Current DB systems can efficiently handle data recording, querying, and statistics, but they cannot discover the relationships and rules that exist in the data, nor can they predict future trends from the available data.
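To make the contrast concrete, the following minimal Python sketch first runs a query-style filter and an average, the kind of work a DB system handles well, and then fits a straight line by ordinary least squares to discover a relationship and extrapolate from it. The records, the `fit_line` helper, and the choice of a simple linear model are hypothetical illustrations, not the book's method; the later chapters cover far more capable regression and classification algorithms.

```python
from statistics import mean

# Hypothetical records: (x, y) pairs for two attributes of the same samples.
records = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

# What a conventional DB system does well: queries and simple statistics.
selected = [r for r in records if r[0] >= 2.0]  # like SELECT ... WHERE x >= 2
avg_y = mean(y for _, y in records)             # like SELECT AVG(y)


def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


# What DM adds: a relationship discovered from the data, usable for prediction.
slope, intercept = fit_line(records)
predicted = slope * 5.0 + intercept

print(len(selected), avg_y)          # 3 6.0
print(slope, intercept, predicted)   # 2.0 1.0 11.0
```

The query and the average only restate what is already in the table, whereas the fitted line is new knowledge (here, y = 2x + 1) that can be applied to a sample not yet in the DB.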
The phenomenon of rich data but poor knowledge results from the lack of effective means to mine the knowledge hidden in the data. DM techniques have been introduced to meet this challenge and appear to be vital. DM is predicted to be the next hot technique following netw...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Preface
  6. Chapter 1. Introduction
  7. Chapter 2. Probability and Statistics
  8. Chapter 3. Artificial Neural Networks
  9. Chapter 4. Support Vector Machines
  10. Chapter 5. Decision Trees
  11. Chapter 6. Bayesian Classification
  12. Chapter 7. Cluster Analysis
  13. Chapter 8. Kriging
  14. Chapter 9. Other Soft Computing Algorithms for Geosciences
  15. Chapter 10. A Practical Software System of Data Mining and Knowledge Discovery for Geosciences
  16. Appendix 1. Table of Unit Conversion
  17. Appendix 2. Answers to Exercises
  18. Index