eBook - ePub

Data Mining Models, Second Edition

Name: Data Mining Models, Second Edition
Author: David L. Olson

182 pages
English

About the book

Data mining has become the fastest growing topic of interest in business programs in the past decade. This book describes the benefits of data mining in business, the process and typical business applications, and the workings of basic data mining models, and demonstrates each with widely available free software. The book focuses on demonstrating common business data mining applications. It provides exposure to the data mining process, including problem identification, data management, and available modeling tools. The book takes the approach of demonstrating typical business data sets with open source software. KNIME is a very easy-to-use tool and is used as the primary means of demonstration. R is much more powerful and is a commercially viable data mining tool. We also demonstrate WEKA, a highly useful academic tool, although the difficulty of manipulating test sets and new cases makes it problematic for commercial use.

Information

Publisher: Business Expert Press
Year: 2018
ISBN: 9781948580502
Subject: Economics
Category: Statistics for business and economics

Chapter 1

Data Mining in Business

Introduction

Data mining refers to the analysis of large quantities of data that are stored in computers. Bar coding has made checkout very convenient for us and provides retail establishments with masses of data. Grocery stores and other retail stores are able to quickly process our purchases and use computers to accurately determine the product prices. These same computers can help the stores with their inventory management, by instantaneously determining the quantity of items of each product on hand. Computers allow the store’s accounting system to more accurately measure costs and determine the profit that store stockholders are concerned about. All of this information is available based on the bar coding information attached to each product. Along with many other sources of information, information gathered through bar coding can be used for data mining analysis.

The era of big data is here, with many sources pointing out that more data were created over the past year or two than were generated throughout all prior human history. Big data involves datasets so large that traditional data analytic methods no longer work due to data volume. Davenport¹ gave the following features of big data:

Data too big to fit on a single server
Data too unstructured to fit in a row-and-column database
Data flowing too continuously to fit into a static data warehouse
Lack of structure is the most important aspect (even more than the size)
The point is to analyze, converting data into insights, innovation, and business value

Big data has been said to be more about analytics than about the data itself. The era of big data is expected to emphasize knowing what (based on correlation) rather than the traditional obsession with causality. The emphasis will be on discovering patterns offering novel and useful insights.² Data will become a raw material for business, a vital economic input and source of value. Cukier and Mayer-Schönberger³ cite the following impacts of big data on the body of statistical theory established in the 20th century: (1) There is so much data available that sampling is usually not needed (n = all). (2) Precise accuracy of data is thus less important, as inevitable errors are compensated for by the mass of data (any one observation is flooded by others). (3) Correlation is more important than causality: most data mining applications involving big data are interested in what is going to happen, and you don’t need to know why. Automatic trading programs need to detect trend changes, not figure out that the Greek economy collapsed or that the Chinese government will devalue the Renminbi (RMB). The programs in vehicles need to detect that an axle bearing is getting hot, the vehicle is vibrating, and the wheel should be replaced, not whether this is due to a bearing failure or a housing rusting out.
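Point (2) above, that individual errors are compensated for by the mass of data, can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the true value, the noise level, and the sample sizes are invented for illustration, and the errors are assumed to be random and unbiased. A systematic measurement error would not average out this way.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

TRUE_VALUE = 100.0  # hypothetical quantity being measured
NOISE_SD = 5.0      # hypothetical, unbiased measurement noise


def noisy_mean(n):
    """Average n noisy observations of TRUE_VALUE."""
    total = sum(random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n))
    return total / n


# Error of the estimate from a small sample versus a very large one.
small_error = abs(noisy_mean(10) - TRUE_VALUE)
large_error = abs(noisy_mean(100_000) - TRUE_VALUE)

print(f"error with n=10:      {small_error:.3f}")
print(f"error with n=100000:  {large_error:.3f}")
```

With 100,000 observations the standard error of the mean is about 5 / sqrt(100000), roughly 0.016, so any single badly measured observation has almost no influence on the result.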

There are many sources of big data.⁴ Internal to the corporation, e-mails, blogs, enterprise systems, and automation lead to structured, unstructured, and semistructured information within the organization. External data is also widely available, much of it free over the Internet, and much of it from commercial vendors. Data is also obtainable from social media.

Data mining is not limited to business. Both major parties in the U.S. elections utilize data mining of potential voters.⁵ Data mining has been heavily used in the medical field, from diagnosis based on patient records to identification of best practices.⁶ Business use of data mining is also impressive. Toyota used data mining of its data warehouse to determine more efficient transportation routes, reducing the time to deliver cars to their customers by an average of 19 days. Data warehouses are very large scale database systems capable of systematically storing all transactional data generated by a business organization, such as Walmart. Toyota was also able to identify sales trends faster and to identify the best locations for new dealerships.

Data mining is widely used by banking firms in soliciting credit card customers, by insurance and telecommunication companies in detecting fraud, by manufacturing firms in quality control, and in many other applications. Data mining is being applied to improve food product safety, criminal detection, and tourism. Micromarketing targets small groups of highly responsive customers. Consumer and lifestyle data are widely available, enabling customized individual marketing campaigns. This is enabled by customer profiling, identifying those subsets of customers most likely to be profitable to the business, as well as targeting, determining the characteristics of the most profitable customers.

Data mining involves statistical and artificial intelligence (AI) analysis, usually applied to large-scale datasets. There are two general types of data mining studies. Hypothesis testing involves expressing a theory about the relationship between actions and outcomes. This approach is referred to as supervised. In a simple form, it can be hypothesized that advertising will yield greater profit. This relationship has long been studied by retailing firms in the context of their specific operations. Data mining is applied to identifying relationships based on large quantities of data, which could include testing the response rates to various types of advertising on the sales and profitability of specific product lines. However, there is more to data mining than the technical tools used. The second form of data mining study is knowledge discovery. Data mining involves a spirit of knowledge discovery (learning new and useful things). Knowledge discovery is referred to as unsupervised. In this form of analysis, a preconceived notion may not be present, but rather relationships can be identified by looking at the data. This may be supported by visualization tools, which display data, or through fundamental statistical analysis, such as correlation analysis. Much of this can be accomplished through automatic means, as we will see in decision tree analysis, for example. But data mining is not limited to automated analysis. Knowledge discovery by humans can be enhanced by graphical tools and identification of unexpected patterns through a combination of human and computer interaction.
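The contrast between the two study types can be sketched with the correlation analysis mentioned above. This is a minimal illustration, not the book's own procedure: the monthly figures and the variable names (`advertising`, `profit`, `store_temp`) are hypothetical, and a correlation coefficient is only a first check of association, not a full hypothesis test.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures, invented for illustration only.
data = {
    "advertising": [10, 12, 15, 18, 20, 25],
    "profit": [40, 44, 52, 57, 63, 75],
    "store_temp": [21, 19, 22, 20, 21, 20],
}

# Hypothesis testing (supervised): the relationship of interest is
# stated in advance, here "advertising is associated with profit".
r_hypothesis = pearson(data["advertising"], data["profit"])

# Knowledge discovery (unsupervised): no preconceived notion; scan
# every pair of variables and report the strongest association.
pair_scores = {
    (a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)
}
strongest = max(pair_scores, key=lambda pair: abs(pair_scores[pair]))

print(f"advertising vs profit: r = {r_hypothesis:.3f}")
print(f"strongest pair found by scanning: {strongest}")
```

In the first case the analyst chooses the variables to relate; in the second the data suggest which relationship deserves a closer look, which a human can then examine with visualization tools.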

Requirements for Data Mining

Data mining requires identification of a problem, along with the collection of data that can lead to better understanding, and computer models to provide statistical or other means of analysis. A variety of analytic computer models have been used in data mining. In later sections, we will discuss various types of these models. Also required is access to data. Quite often, systems including data warehouses and data marts are used to manage large quantities of data. Other data mining analyses are done with smaller sets of data, such as can be organized in online analytic processing systems.

Masses of data generated from cash registers, scanning, and topic-specific databases throughout the company are explored, analyzed, reduced, and reused. Searches are performed across different models proposed for predicting sales, marketing response, and profit. The classical statistical approaches are fundamental to data mining. Automated AI methods are also used. However, a systematic exploration through classical statistical methods is still the basis of data mining. ...

Citation styles for Data Mining Models, Second Edition

APA 6 Citation

Olson, D. (2018). Data Mining Models, Second Edition ([edition unavailable]). Business Expert Press. Retrieved from https://www.perlego.com/book/744490/data-mining-models-second-edition-pdf (Original work published 2018)

Chicago Citation

Olson, David. (2018) 2018. Data Mining Models, Second Edition. [Edition unavailable]. Business Expert Press. https://www.perlego.com/book/744490/data-mining-models-second-edition-pdf.

Harvard Citation

Olson, D. (2018) Data Mining Models, Second Edition. [edition unavailable]. Business Expert Press. Available at: https://www.perlego.com/book/744490/data-mining-models-second-edition-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Olson, David. Data Mining Models, Second Edition. [edition unavailable]. Business Expert Press, 2018. Web. 14 Oct. 2022.