
eBook - ePub
Obtaining Value from Big Data for Service Systems, Volume II
Big Data Technology
- 124 pages
- English
By Stephen H. Kaisler, Frank Armour, J. Alberto Espinosa, and William H. Money
About this book
Volume II of this series discusses the technology used to implement a big data analysis capability within a service-oriented organization: the technical architecture necessary to implement such a capability, the issues and challenges in big data analysis and utilization that an organization will face, and how to capture value from big data.
It will help readers understand what technology is required for a basic capability and what the expected benefits are from establishing a big data capability within their organization.
CHAPTER 1
Big Data Infrastructure: A Technical Architecture Overview
Four elements compose a technical infrastructure: processing capability, storage capacity, data transport capability, and visualization capability. These are provided by a combination of hardware systems and analytical software techniques, which constitute the basic infrastructure of big data from our perspective. We will address each of these components in this chapter.
First, we view data processing as having two basic paradigms: batch and stream processing. Batch processing operates on an accumulated data set and therefore has high latency: a terabyte or more of data cannot be processed all at once in less than a second. Stream processing, by contrast, analyzes small amounts of data as they arrive; it has low latency, although, depending on the arrival rate, volume can mount up very quickly. Small amounts of data can thus be processed very fast, even on the fly.
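To make the contrast concrete, here is a minimal sketch (our illustration, not from the book) of the same running-average task in both paradigms; the data values and task are assumptions chosen for brevity.

```python
# Minimal sketch: the same running-average task, batch vs. stream style.

def batch_average(records):
    # Batch: wait until the whole data set is available, then process it
    # in one pass. Latency is bounded below by the time to collect and
    # scan all records.
    return sum(records) / len(records)

def stream_average(record_iter):
    # Stream: update an incremental result as each record arrives.
    # Each update is O(1), so a current answer is available immediately.
    count, total = 0, 0.0
    for record in record_iter:
        count += 1
        total += record
        yield total / count  # current estimate after every arrival

if __name__ == "__main__":
    data = [3.0, 5.0, 4.0, 8.0]
    print(batch_average(data))            # one answer, after all data is in
    for estimate in stream_average(iter(data)):
        print(estimate)                   # an answer after every record
```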
As big data scales upward to exabytes and larger volumes, it is clear that single processors, and even small multiprocessors, cannot provide the computational power to process all the data. Large multiprocessor systems, such as grid architectures and cloud-based systems, have evolved to handle the large volumes of data. But having powerful computers providing trillions of instructions per second is not enough. The computer system must therefore be balanced across processing capability and the second and third components, storage capacity and data transport bandwidth, to ensure that big data can be processed in a time interval consonant with the time available to decide on and utilize the extracted and derived information.
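As a rough back-of-envelope illustration of the balance argument (the 500 MB/s channel bandwidth is an assumed, round number, not a figure from the book):

```python
# Back-of-envelope: raw I/O time to scan 1 TB, one channel vs. 1,000 channels.
DATA_BYTES = 1_000_000_000_000      # 1 TB
CHANNEL_BW = 500_000_000            # 500 MB/s per disk channel (assumption)

single_node = DATA_BYTES / CHANNEL_BW             # ~2,000 s on one channel
cluster_1000 = DATA_BYTES / (CHANNEL_BW * 1000)   # ~2 s across 1,000 channels

print(f"1 node:      {single_node:,.0f} s")
print(f"1,000 nodes: {cluster_1000:,.1f} s")
```

No amount of processor speed helps if the data cannot reach the processors; adding nodes scales bandwidth and storage together with compute.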
Additionally, a big data processing system must incorporate visualization techniques to provide the user with the ability to understand and navigate through the data and the resulting information derived from the data by analytics. These four elements, along with the systems and analytical software suites, constitute the basic infrastructure of a big data computing capability.
Data and Information Processing
Data processing infrastructure has evolved through several generations since the first mainframes were developed in the 1950s. The most recent manifestations have been threefold: (1) cluster computing, (2) cloud computing, and (3) processing stacks. Cluster computing and cloud computing are focused on scaling the computational infrastructure as an organization's needs evolve. Processing stacks provide open source software (OSS) frameworks for developing applications to support a business data analytics capability. In addition, an organization must decide on a suite of programming systems, tools, and languages in order to develop custom applications compatible with the analytic suites that it may purchase or obtain through OSS.
Big data success will ultimately depend on a scalable and extensible architecture and foundation for data, information, and analytics. This foundation must support the acquisition, storage, computational processing, and visualization of big data and the delivery of results to the clients and decision makers.
Service-Oriented Architecture
Service-oriented architecture (SOA) is a paradigm for designing distributed, usually interactive, systems. An SOA is essentially a collection of services running on one or more hardware-software platforms. These services communicate with each other through established protocols, which is why we say the services are interoperable. The communication can involve simple data passing, or it can involve multiple services coordinating some activity. The services are connected to each other through software mechanisms supported by the software infrastructure.
SOAs evolved from transaction processing systems into a general software architecture. A service is a self-contained software unit that performs one or a few functions. Here, by service, we mean the software module that implements the service concept defined earlier in the "Defining Services" section. It is designed and implemented to ensure that it can exchange information with any other service in the network without human interaction and without the need to make changes to the underlying program itself. Thus, services are usually autonomous, platform-independent software modules that can be described, published, discovered, and loosely coupled within a single platform or across multiple platforms. Services adhere to well-defined protocols for constructing and parsing messages, using description metadata.
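A minimal sketch of what a self-contained, self-describing service module might look like (the service name, message format, and fixed rate are invented for illustration; the book does not prescribe an implementation):

```python
# Illustrative sketch: a self-describing service that exchanges plain
# dictionaries, so callers depend only on the published message contract,
# never on the service's internals or platform.

class CurrencyConversionService:
    """One self-contained function: convert an amount between currencies."""

    # Description metadata that a registry could publish for discovery.
    DESCRIPTION = {
        "name": "currency-conversion",
        "input": {"amount": "float", "from": "str", "to": "str"},
        "output": {"amount": "float", "currency": "str"},
    }

    RATES = {("USD", "EUR"): 0.9}  # hypothetical fixed rate for the demo

    def handle(self, message: dict) -> dict:
        rate = self.RATES[(message["from"], message["to"])]
        return {"amount": message["amount"] * rate, "currency": message["to"]}

# A client needs only the published description to form a valid message.
service = CurrencyConversionService()
print(service.handle({"amount": 100.0, "from": "USD", "to": "EUR"}))
```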
Web Services
One implementation of SOA is known as web services because the services are delivered over the web. The advantages of web services are interoperability, functional encapsulation and abstraction, loose coupling, reusability, and composability. Because two web service modules communicate over standard web protocols such as HTTP, with messages encoded in the Extensible Markup Language (XML), the communication is independent of any particular messaging system. The message definition is embedded in the message itself, so any receiver that knows how to parse XML can readily understand the message contents. This allows any service to interoperate with any other service without human intervention and thus provides a capability to compose multiple services into complex business operations.
Services can be reused by many different services without having to implement a variation for each pair of interacting business transactions. Functional encapsulation and abstraction mean that functions performed on the client and server sides are independent of each other. Loose coupling, in which the client sends a message and the server receives it sometime later, allows the client and server to operate independently of each other and, more importantly, to reside on geographically and physically separate platforms. Web services are built on a number of components, as described in Table 1.1.

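To illustrate how an embedded message definition lets any receiver recover the contents, here is a minimal sketch using Python's standard XML parser; the element names are invented for this example:

```python
# Illustrative sketch: the receiver recovers the message contents knowing
# only how to parse XML; element and attribute names are invented here.
import xml.etree.ElementTree as ET

message = """
<order>
    <customer id="C42">Acme Corp</customer>
    <item sku="X-100" quantity="3"/>
</order>
"""

root = ET.fromstring(message)
customer = root.find("customer")
item = root.find("item")
print(customer.get("id"), customer.text)        # C42 Acme Corp
print(item.get("sku"), item.get("quantity"))    # X-100 3
```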
Cluster Computing
Cluster computing is an outgrowth of the distributed processing architectures of the 1980s but achieved its major impetus from the high-performance computing community as it was applied to very large-scale scientific processing requiring trillions of computational cycles. Cluster machines can be connected through fast local area networks, but are sometimes geographically distributed. Each node runs its own instance of an operating system. The machines in a cluster may be homogeneous or heterogeneous.
As with parallel processors and cloud computing, effective and efficient use of cluster systems requires careful attention to software architecture and to the distribution of work across multiple machines. Middleware is software that sits atop the operating systems and allows users to "see" the cluster of machines as essentially a single, multinode machine. One common approach is to build Beowulf clusters from commodity hardware and OSS modules.
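As an illustration of middleware presenting a cluster as a single machine, here is a minimal sketch using the MPI bindings from mpi4py (an assumed environment: mpi4py and an MPI runtime installed on each node; the workload is a toy example):

```python
# Minimal MPI sketch (assumes mpi4py and an MPI runtime are installed).
# Each node runs its own copy; the middleware makes them one logical machine.
# Run with, e.g.:  mpiexec -n 4 python cluster_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # this node's id within the cluster
size = comm.Get_size()    # number of nodes

# Each node processes its own slice of the (toy) workload...
local_result = sum(range(rank, 1000, size))

# ...and the middleware combines the partial results on node 0.
total = comm.reduce(local_result, op=MPI.SUM, root=0)
if rank == 0:
    print("cluster-wide total:", total)   # sum(range(1000)) == 499500
```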
Cloud Computing
Cloud computing is a maturing technology in which an IT user does not have to physically access, control (operate), or own any computing infrastructure other than, perhaps, workstations, routers, switches, and, more recently, mobile client devices. Rather, the user "rents" or "leases" computational resources (time, bandwidth, storage, etc.), in part or in whole, from some external entity. The resources are accessed and managed through logical and electronic means. A cloud architecture can be visualized physically as an arrangement of large to massive numbers of computers in distributed data centers that deliver applications and services via a utility model. In a true physical sense, many of these servers may actually run on a single high-capacity blade in a single data center.
Rather than providing the user with a permanent server to connect to when application execution is required, cloud computing provides "virtualized servers" chosen from a pool of servers implemented as virtual machines at one of the available data centers. A user's request to execute a web application is directed to one of the available servers that has the required operating environment, tools, and application locally installed. Within a data center, almost any application can be run on any server. The user knows neither the physical server nor, in many cases, where it is physically located; in general, the location is irrelevant. However, the rise of data tariffs, strict regulations in the European Union and other jurisdictions, and the ability of service providers to move data from data center to data center mean that users must now be aware of the location of their data under certain conditions.
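The selection idea can be sketched as a toy model (our illustration; real providers use far richer placement and scheduling logic, and all names below are hypothetical):

```python
# Toy model of virtual-server selection: route a request to any pooled
# server whose installed environment matches the application's requirement.

POOL = [  # hypothetical virtual machines across two data centers
    {"vm": "vm-17", "site": "dc-east", "env": {"python3", "spark"}},
    {"vm": "vm-09", "site": "dc-west", "env": {"java11"}},
    {"vm": "vm-23", "site": "dc-west", "env": {"python3", "spark"}},
]

def place(required: set) -> dict:
    # The user never chooses (or sees) the physical host: any match will do.
    for server in POOL:
        if required <= server["env"]:
            return server
    raise RuntimeError("no server with the required environment")

print(place({"python3", "spark"}))   # e.g. vm-17 in dc-east; location is opaque
```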
Confusion still exists about the nature of cloud computing. Gartner asserts that a key characteristic is that it is "massively scalable" (Desisto, Plummer, and Smith 2008). Originally, cloud computing was proposed as a solution for delivering large-scale computing resources to the scientific community, whose individual users could not afford to make huge investments in permanent infrastructure or specialized tools, nor lease the needed infrastructure and computing services. It evolved rapidly into a medium of storage and computation for Internet users that offers economies of scale in several areas. Within the past 10 years, a plethora of applications based on cloud computing have emerged, including e-mail services (Hotmail, Gmail, etc.), personal photo storage (Flickr), social networking sites (Facebook, MySpace), and instant communication (Skype Chat, Twitter, Instagram).
While there are public cloud service providers (Amazon, IBM, Microsoft, Google, to name a few) that have received the majority of attention, large corporations are beginning to develop "private" clouds to host their own applications in order to protect their corporate data and proprietary applications while still capturing significant economies of scale in hardware, software, or support services.
Types of Cloud Computing
Clouds can be classified as public, private, or hybrid. A public cloud is a set of services provided by a vendor to any customer generally without restrictions. Public clouds rely on the service provider for security services, depending on the type of implementation. A private cloud is provided by an organization solely for the use of its employees, and sometimes for its suppliers. Private clouds are protected behind an organizationâs firewalls and security mechanisms. A hybrid cloud is distributed across both public and private cloud services.
Many individuals use cloud computing without realizing it: services such as Facebook, Pinterest, Tumblr, and Gmail all rely on cloud computing infrastructure for performance and scalability. Many organizations use a hybrid approach in which publicly available information is stored in the public cloud while proprietary and protected information is stored in a private cloud.
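That routing rule can be stated almost directly in code; a minimal sketch with invented field names (real data-classification policies are far richer):

```python
# Sketch of the hybrid-cloud rule described above (field names are invented):
# public data goes to the public cloud, proprietary data stays private.

def choose_cloud(record: dict) -> str:
    return "private-cloud" if record.get("proprietary") else "public-cloud"

records = [
    {"id": 1, "body": "press release", "proprietary": False},
    {"id": 2, "body": "customer list", "proprietary": True},
]
for r in records:
    print(r["id"], "->", choose_cloud(r))
# 1 -> public-cloud
# 2 -> private-cloud
```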
Implementations of Cloud Computing
The original perspective on cloud computing was defined by the National Institute of Standards and Technology (NIST) as software-as-a-service (SaaS), platform-as-a-service (PaaS), or infrastructure-as-a-service (IaaS) (Mell and Grance 2011). As cloud computing concepts have evolved, additional perspectives have emerged. Linthicum (2009) identified those presented in Table 1.2.

Access to Data in a Cloud
There are three generally accepted...
Table of contents
- Cover
- Half Title Page
- Title Page
- Copyright
- Contents
- Purpose
- Acknowledgments
- List of Acronyms
- Chapter 1 Big Data Infrastructure: A Technical Architecture Overview
- Chapter 2 Issues and Challenges in Big Data and Analytics
- Chapter 3 Conclusion
- Appendix
- References
- Further Reading
- Glossary
- About the Contributors
- Index