eBook - ePub

Data Warehousing in the Age of Big Data

Name: Data Warehousing in the Age of Big Data
Author: Krish Krishnan


370 pages
English
ePUB (mobile-friendly)
Available on iOS and Android


About this book

Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse.

As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data–ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory.

Ultimately, this book will help you navigate the complex layers of Big Data and data warehousing, and it shows how to think effectively about using these technologies and architectures to design the next-generation data warehouse.

Learn how to leverage Big Data by effectively integrating it into your data warehouse.
Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies.
Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure that matches your data processing workloads and requirements.


Information

Publisher

Morgan Kaufmann

Year

2013

ISBN

9780124059207

Topic

Computer Science

Topic

Databases

Part 1

Big Data

Chapter 1 Introduction to Big Data

Chapter 2 Working with Big Data

Chapter 3 Big Data Processing Architectures

Chapter 4 Introducing Big Data Technologies

Chapter 5 Big Data Driving Business Value

Chapter 1

Introduction to Big Data

Introduction

The biggest phenomenon to capture the attention of the modern computing industry since the “Internet” is “Big Data.” The term was first popularized in a paper on the subject by McKinsey & Co., and the foundational definition was first popularized by Doug Laney of Gartner.

The fundamental reason “Big Data” is popular today is that the technology platforms that have emerged along with it can process data of multiple formats and structures without the constraints associated with traditional systems and database platforms.

Big Data

Data represents the lowest, rawest form of information or knowledge. In the computing world, we commonly refer to data as rows and columns of organized values that represent one or more entities and their attributes. Long before the age of computing or of information management with electronic processing aids, data emerged with the advent of counting and trade, preceding the Greeks. Simply put, it is the assignment of values to numerals and the use of those numerals to record monetary value, population, calendars, taxes, and more. Many historical instances provide ample evidence of the human mind’s fascination with data and with knowledge acquisition and management.

Information or data management, according to a series of studies by Carnegie Mellon University, entails the process of organizing, acquiring, storing, retrieving, and managing data. Data collected from different processes is used to make decisions that fit the understanding and requirements of those executing and consuming the results of the process. This administrative behavior was the underlying theme of Herbert Simon’s view of bounded rationality¹, or the limited field of vision of the human mind, when applied to data management. The argument presented for decision-making and administrative behaviors makes complete sense: we limit the data in the process of modeling and applying algorithms, and we have always sought discrete relationships within the data as opposed to the whole picture.

In reality, however, decision making has always transcended the traditional systems used to aid the process. For example, patient treatment and management are not confined to computers and programs. But the data generated by doctors, nurses, lab technicians, emergency personnel, and medical devices within a hospital for each patient can now, through the use of unstructured data integration techniques and algorithms, be collected and processed electronically to gain mathematical or statistical insights. These insights reveal patterns that can be useful in improving the quality of care for a given set of diseases.

Data warehousing evolved to support the decision-making process by collecting, storing, and managing data, and by applying traditional and statistical methods of measurement to create a reporting and analysis platform. The data collected within a data warehouse was highly structured in nature, with minimal flexibility to change as the data evolved. The underlying premise for this comes from the transactional databases that were the sources of data for a data warehouse. This concept applies very well when we talk of transactional models based on activity generated by consumers in retail, financial, or other industries. For example, a movie ticket sale is a simple transaction, and the success of a movie is based on the revenue it can generate in the opening and following weeks, followed at a later stage by sales from audio (vinyl to cassette tapes, CDs, and various digital formats), video (DVDs and other digital formats), and merchandise across multiple channels. When sales revenue was reported, population demographics, sentiments, reviews, and feedback were often not reported, or at least were not considered a visible part of decision making in a traditional computing environment. The reason was the rigidity of traditional computing architectures and their associated models, which could not readily integrate unstructured, semi-structured, or other forms of data, even though these artifacts were used in analysis and internal organizational reporting for revenue activities from a movie.
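The contrast between a structured transaction and unstructured feedback can be made concrete with a minimal Python sketch. It is an illustration only and not taken from the book: the `TicketSale` record, its field names, and the sample values are all hypothetical. It shows why revenue reporting fits a fixed schema easily, while a free-text review has no columns to aggregate on.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical fixed schema for one ticket-sale transaction (illustrative names).
@dataclass(frozen=True)
class TicketSale:
    movie: str
    week: int      # week number since release
    tickets: int   # number of tickets in this transaction
    price: float   # price per ticket


def revenue_by_week(sales):
    """Sum revenue per (movie, week); possible because every row shares the same columns."""
    totals = defaultdict(float)
    for sale in sales:
        totals[(sale.movie, sale.week)] += sale.tickets * sale.price
    return dict(totals)


# Made-up sample rows, for illustration only.
sales = [
    TicketSale("Example Movie", 1, 100, 10.0),
    TicketSale("Example Movie", 1, 50, 12.0),
    TicketSale("Example Movie", 2, 40, 10.0),
]

# A free-text review carries sentiment but has no fields that map onto the schema above;
# using it in reporting would need a separate extraction step.
review = "Loved the soundtrack, but the second half dragged."

totals = revenue_by_week(sales)
print(totals)
```

The design point is that the aggregation relies entirely on the shared columns of `TicketSale`; nothing in the same function can consume `review` without first turning the text into structured attributes.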

Looking at these examples from medicine and entertainment business management, we realize that decision support has always been an aid to the decision-making process and not the end state itself, although the two are often confused.

If we consider all the data, the associated processes, and the metrics used in any decision-making situation within any organization, we realize that we have used information (volumes of data) in a variety of formats and varying degrees of complexity, and have derived decisions from that data in nontraditional software processes. Before we get to Big Data, let us look at a few important events in computing history.

In the late 1980s, we were introduced to the concept of decision support and data warehousing. This wave of new capabilities (creating trends, performing historical analysis, and providing predictive analytics and highly scalable metrics) created a series of solutions, companies, and an industry in itself.

In 1995, with the clearance to create a commercial Internet, we saw the advent of the “dot-com” world and got the first taste of being able to communicate peer to peer in a consumer world. With the advent of this capability, we also saw a significant increase in the volume and variety of data.

In the following five to seven years, we saw a number of advancements driven by web commerce or e-commerce, which rapidly changed the business landscape for an organization. New models emerged and became rapidly adopted standards, including the business-to-consumer direct buying/selling (website), consumer-to-consumer marketplace trading (eBay and Amazon), and business-to-business-to-consumer selling (Amazon). This entire flurry of activity drove up data volumes more than ever before. Along with the volume, we began to see the emergence of additional data, such as consumer reviews, feedback on experience, peer surveys, and the emergence of word-of-mouth marketing. This newer and additional data brought subtle layers of complexity to data processing and integration.

Along the way, between 1997 and 2002, we saw the definition and redefinition of mobility solutions. Cellular phones became ubiquitous, and the use of voice and text to share sentiments, opinions, and trends among people became widespread. This increased the ability to communicate and create a crowd-based affinity for products and services, which has significantly driven the last decade of technology innovation, leading to even more disruptions in the business landscape and in data management in terms of data volumes, velocity, variety, complexity, and usage.

The years 2000 to 2010 were a defining period in the history of data, with the emergence of search engines (Google, Yahoo), the personalization of music (iPod), tablet computing (iPad), bigger mobile solutions (smartphones, 3G networks, mobile broadband, Wi-Fi), and the emergence of social media (driven by Facebook, MySpace, Twitter, and Blogger). All these entities have contributed to the consumerization of data from the perspectives of data creation, acquisition, and consumption.

The business models and opportunities that came with the large-scale growth of data drove the need to create powerful metrics to tap into the knowledge of the crowd that was driving them, and in return to offer personalized services that address the need of the moment. This challenge was not limited to technology companies; large multinational organizations like P&G and Unilever wanted solutions that could address data processing, and additionally wanted to integrate the output from large-scale data processing into their existing analytics platforms.

Google, Yahoo, Facebook, and several other companies invested in technology solutions for data management, allowing us to consume large volumes of data in a short amount of time across many formats with varying degrees of complexity to create a powerful decision support platform. These technologies and their implementation are discussed in detail in later chapters in th...