Software Architecture for Big Data and the Cloud is designed to be a single resource that brings together research on how software architectures can solve the challenges imposed by building big data software systems. The challenges of big data on the software architecture can relate to scale, security, integrity, performance, concurrency, parallelism, and dependability, amongst others. Big data handling requires rethinking architectural solutions to meet functional and non-functional requirements related to volume, variety and velocity.

The book's editors have varied and complementary backgrounds in requirements and architecture, specifically in software architectures for cloud and big data, as well as expertise in software engineering for cloud and big data. This book brings together work across different disciplines in software engineering, including work expanded from conference tracks and workshops led by the editors.

- Discusses systematic and disciplined approaches to building software architectures for cloud and big data with state-of-the-art methods and techniques
- Presents case studies involving enterprise, business, and government service deployment of big data applications
- Shares guidance on theory, frameworks, methodologies, and architecture for cloud and big data

Preface

Ivan Mistrik

Nour Ali

Rami Bahsoon

Maritta Heisel

Bruce R. Maxim

The relation of software architectures to functional and quality requirements is of particular importance in cloud computing. Requirements are the basis for deriving software architectures. Furthermore, the environment in which the software will operate is an important aspect to consider in developing high-quality software. That environment has to be taken into account explicitly. One and the same software may be appropriate (e.g., secure) in one environment, but inadequate (e.g., not sufficiently secure) in a different environment. While these considerations are important for every software development task, there are many challenges specific to cloud and big data. In this book, our goal is to collect chapters on systems and architectures for cloud and big data and, more specifically, how software architectures can manage challenges in advanced big data processing.

Introduction

Software architecture is the earliest design artifact that realizes the requirements of a software system. It is the manifestation of the earliest design decisions, which comprise the architectural structure (i.e., components and interfaces), the architectural topology (i.e., the architectural style), the architectural infrastructure (e.g., the middleware), the relationship among them, and their relation to other software artifacts (e.g., detailed design and implementation) and the environment. The architecture of a system can also guide the evolution of qualities such as security, reliability, availability, scalability and real-time performance over time. The properties of a particular architecture, whether structural or behavioral, can have global impacts on the software system. Poor architectural realization can threaten the trustworthiness of a system and slow down its evolution. The architectural properties of a system also determine the extent to which it can meet its business and strategic objectives. Consistent with this view is the trend toward focusing software architecture documentation on meeting stakeholder needs and communicating how the software solution addresses their concerns and the business objectives.

Big data is about extracting valuable information from data in order to use it in decision making in business, science, and society. Big data is an emerging paradigm applied to datasets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Such datasets often come from various sources (Variety) and are largely unstructured, such as social media, sensors, scientific applications, surveillance, video and image archives, Internet texts and documents, Internet search indexing, medical records, business transactions and web logs; they are large (Volume) and have fast data in/out (Velocity). More importantly, big data has to be of high value (Value), and it must be trustworthy enough to support business decision making (Veracity). Various technologies are being discussed to support the handling of big data, such as massively parallel processing databases, scalable storage systems, cloud computing platforms, and MapReduce. Innovative software architectures play a key role in advanced big data processing.

As the new-generation distributed computing platform, cloud computing environments offer high efficiency and low cost for data-intensive computation in big data applications. Cloud resources and services are available in pay-as-you-go mode, which brings extraordinary flexibility and cost-effectiveness as well as zero investment in the customer's own computing infrastructure. However, these advantages come at a price – people no longer have direct control over their own data. Based on this view, data security becomes a major concern in the adoption of cloud computing.

Why a New Book on Software Architecture for Big Data and the Cloud?

We believe that cloud architecture is an emerging and important topic right now. As a growing number of applications migrate to mobile devices, the use of cloud storage and cloud software services will see a marked increase, but we are not sure software designers have thought through the changes needed to their applications to use cloud capabilities wisely. Likewise, big data is on everyone's radar right now. Working with big data is tricky once you get past the knowledge discovery tasks. The real challenge in including big data in a data architecture is structuring the data to allow for efficient searching, sorting, and updating (especially if parallel hardware or parallel algorithms are involved).

The area of cloud and big data is rapidly developing, with conferences/workshops exploring opportunities that cloud and big data technology offers to software engineering, both in practice and in research. However, most of these are focused on the challenges imposed by building big data software systems. There is no single resource that brings together research on how software architectures can solve these challenges. The editors of this book have varied and complementary backgrounds in requirements and architecture, specifically in software architectures for cloud and big data. They also have expertise in software engineering for cloud and big data. This book aims to collect together work across different disciplines in software engineering for cloud and big data.

This new book makes a valuable contribution to this existing body of knowledge in terms of state-of-the-art techniques, methodologies, tools, best practices, and guidelines for software quality assurance, and it points out directions for future software engineering research and practice. This book discusses systematic and disciplined approaches to building software architectures for cloud and big data. We invited chapters on all aspects of software architecture for cloud and big data, including novel and high-quality research-related approaches on innovative software development environments and tools for big data processing.

The book provides opportunities for further dissemination of state-of-the-art methods and techniques for representing and evaluating these systems. We asked authors to ensure that all of their chapters will consider the practical application of the topic through case studies, experiments, empirical validation, or systematic comparisons with other approaches already in practice. Topics of interest included, but were not limited to: innovative software architecture for big data processing; theory, frameworks, methodologies, and architecture for cloud and big data; big data technologies; big data visualization and software architectures; innovative software development environments and tools for big data processing; cloud software as a service; software security, privacy with big data; new programming models for big data processing; software simulation and debugging environments for big data processing; research challenges in software architecture for cloud and big data; architecture refactoring for cloud and big data; modeling the software architecture of big data-oriented software systems; and architectures for organizing big data in clouds.

Book Outline

In Chapter 1 we present an overview of software architecture for big data and the cloud. The cloud has revolutionized the way we look at software architectures. The emergence of the cloud and its “as-service” layers (e.g., software, platform, databases, and infrastructure as services) has significantly influenced the architecture of software systems. Cloud marketplaces, multitenancies, federation, elastic and on-demand access have enabled new modalities in the way we incept, compose, architect, deploy, maintain and evolve architectures of software systems. Properties related to dynamic access of resources; resource pooling; rapid elasticity and utility service provision; economies of scale; dynamicity and multitenancy are arguably the emergent “cloud-architecture significant properties.” These properties have influenced not only the behavior of the software systems benefiting from the cloud, but also their structure, style, and topology. The cloud has also moved architecting practices towards architecting for uncertainty, where architecture design decisions are more complex and require us to anticipate the extent to which they can operate in dynamic environments and cope with operational uncertainties and continuous changes. More interestingly, the cloud business model has also moved architecting towards economics-driven architecting, where utilities, risk avoidance, utilization, technical debt monitoring, and optimizing for Service Level Agreements (SLA) are among the business objectives. In this context, architecting in/for the cloud has become an exercise that requires continuous alignments between enterprise and technical objectives. Several architecture styles and architecture-centric development processes that leverage the benefits of the cloud and big data have emerged. The fundamentals of these styles cannot be understood in isolation from what we term “cloud-architecturally significant requirements.” The chapter will review these requirements and explain their implications on architecting for/in the cloud in the presence of big data. It will also outline a roadmap of opportunities for researchers and practitioners in software architecture for cloud and big data.

We have divided the rest of the book into five key parts, grouping chapters by their link to these key themes: concepts and models, evaluation of architecture models, big data technologies, resource management, and future directions. Part I examines concepts and models; its five chapters provide a broad outline of software architectural concepts as applied to big data and the cloud.

Part I: Concepts and Models

Part I of this book consists of five chapters focusing on concepts and models which are useful for understanding big data and cloud computing architectures. Chapter 2, by Ian Gorton, discusses issues related to hyperscalability and the changing face of software architecture. Hyperscale systems are pushing the limits of software engineering knowledge on multiple fronts. To address this explosion of data and processing requirements, we need to build systems that can be scaled rapidly with controllable costs and schedules. Hyperscalable systems can grow their capacity and processing capabilities exponentially to serve a potentially global user base, while the resources and costs needed to deliver and operate the system scale only linearly. Successful solutions are not confined to the software architecture and algorithms that comprise an application. Approaches to data architectures and deployment platforms are indelibly intertwined with the software design, and all these dimensions must be considered together in order to meet system scalability requirements. This chapter describes some of the basic principles that underpin system design at scale.

Chapter 3, by Mandy Chessell, Dan Wolfson, and Tim Vincent, discusses the different types of systems involved in a big data architecture, how data flows between them, how these data flows intersect with the analytics lifecycle, and how to provide self-service access to data, backed by information governance that creates the trust and confidence needed both to share and to consume data. As an industry we need to improve the time to value and success rate of big data projects. This is going to take: better architecture methods that support different big data arenas; tools that automatically manage the metadata and context data necessary to pass data between processing zones; and standard structures for this data to allow for interoperability between cloud services and on-premises systems.

Domain-driven design of big data systems based on reference architectures is discussed by Cigdem Avci Salma, Bedir Tekinerdogan, and Ioannis N. Athanasiadis in Chapter 4. Big data has become a very important driver for innovation and growth for various application domains. These application domains impose different requirements on the big data system. The design of such a big data system therefore needs careful consideration if the system is to realize its business goals. In this chapter, the authors have adopted a domain-driven design approach in which they provide a family feature model and reference architecture based on a domain analysis process. The family feature model covers the common and variant features of a broad set of applications, while the reference architecture provides a reusable architecture for deriving concrete application architectures. The authors illustrate their approach by considering Facebook and Twitter as case studies.

Robert Heinrich, Reiner Jung, Christian Zirkelbach, Wilhelm Hasselbring, and Ralf Reussner consider architectural run-time models for quality-aware DevOps in cloud applications in Chapter 5. Cloud-based software applications are designed to change often and rapidly during operations to provide constant quality of service. As a result the boundary between development and operations is becoming increasingly blurred. DevOps is a set of practices for the integrated consideration of developing and operating software. Software architecture is a central artifact in DevOps practices. Architectural information must be available during operations. Existing architectural models used in the development phase differ from those used in the operation phase in terms of abstraction, purpose and content. This chapter presents the iObserve approach to address these differences and allow for phase-spanning usage of architectural models.

In Chapter 6, Tao Chen and Rami Bahsoon present novel ideas for facilitating cloud autoscaling. They examine the similarities between a cloud ecosystem, represented by a collection of cloud-based services, and a natural ecosystem. They investigate how the ecological view can be adopted to explain how cloud-based services evolve, and explore the key factors that drive stable and sustainable cloud-based services. To achieve this goal they discuss how to transpose ecological principles, theories and models into autoscaling cloud analogues that spontaneously improve long-term stability and sustainability of a cloud ecosystem.

Part II: Analyzing and Evaluating

The four chapters that make up Part II of this book focus on the analysis and evaluation of several big data and cloud architectural models. The production, processing, and consumption of big data require that all the agents involved in those operations be able to authenticate each other reliably. Authentication of servers and services on the Internet is a surprisingly hard problem. Much research has been done to enhance certificate management in order to create more secure and reliable cloud architectures. However, none of it has yet been widely adopted. Chapter 7, written by Jiangshan Yu and Mark Ryan, provides a survey with critical analysis of the existing proposals for managing public key certificates. Of the three solution categories reviewed, they argue that solutions based on transparent public logs have had the most success in the real world. They present an evaluation framework which should be helpful for future research on designing alternative certificate management systems to secure the Internet.
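To make the idea of a transparent public log more concrete, the following is a minimal sketch of an append-only log summarized by a Merkle tree, so that a client can check that a certificate is included without downloading the whole log. It is not taken from Chapter 7. The hashing conventions (a 0x00 prefix for leaves, a 0x01 prefix for interior nodes, and a split at the largest power of two below the tree size) follow the style of RFC 6962, but the proof format with explicit left/right flags, the function names, and the sample entries are simplifying assumptions made for illustration.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # Leaves and interior nodes use different prefixes so that one
    # can never be mistaken for the other.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly smaller than n (for n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list[bytes]) -> bytes:
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = _split(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_proof(entries: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    # Returns (sibling_hash, sibling_is_left) pairs from the leaf up to the root.
    # The explicit flag is an illustrative simplification.
    if len(entries) == 1:
        return []
    k = _split(len(entries))
    if index < k:
        return inclusion_proof(entries[:k], index) + [(merkle_root(entries[k:]), False)]
    return inclusion_proof(entries[k:], index - k) + [(merkle_root(entries[:k]), True)]


def verify_inclusion(entry: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Recompute the path from the leaf to the root and compare with the published root.
    current = leaf_hash(entry)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root
```

A client that trusts only the published root can call `verify_inclusion(entry, proof, root)` with a proof supplied by the log. Changing or forging an entry changes the recomputed hash, so the check fails.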

Performance monitoring in cloud-based big data systems is an important challenge that has not yet been fully solved. In Chapter 8, Bedir Tekinerdogan and Alp Oral discuss several potential solutions, including caching and scalability. However, none of these approaches solves the problem of disruptive tenants that impede the performance of other tenants. Problems that are difficult to solve using the conventional caching and scalability approaches can be addressed using performance isolation. The authors discuss several performance isolation strategies and describe how the Tork application framework can be used to integrate performance isolation mechanisms found in existing cloud-based big data systems. In this chapter, they propose a framework and a systematic approach for performance isolation in cloud-based big data systems. They present an architectural design for a cloud-based big data system and discuss the integration of feasible performance isolation approaches. They evaluate their approach using PublicFeed, a social media application that is based on a cloud-based big data platform.
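One common way to keep a disruptive tenant from starving others is to give each tenant its own request budget. The sketch below illustrates that general idea with a per-tenant token bucket. It is not the mechanism of the Tork framework or of Chapter 8, and the class names, rates, and tenant identifiers are hypothetical. The clock is passed in explicitly so the behavior is easy to reason about.

```python
class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per tenant (hypothetical illustration)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant: str, now: float) -> bool:
        # Create the tenant's bucket lazily on its first request.
        if tenant not in self._buckets:
            self._buckets[tenant] = TokenBucket(self.rate, self.capacity, now)
        return self._buckets[tenant].allow(now)
```

Because every tenant draws from its own bucket, a burst from one tenant exhausts only that tenant's budget; requests from other tenants are still admitted. A real system would also need to isolate shared resources such as CPU, memory, and I/O, which this sketch does not attempt.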

Anastasija Efremovska and Patricia Lago discuss the risks and benefits found in software cloud migration in Chapter 9. Multiple factors need to be considered when migrating to the cloud, which include financial, legal, security, organizational, technical risks and benefits, as well as ...

Cover image
Title page
Table of Contents
Copyright
Contributors
About the Editors
Foreword by Mandy Chessell
Foreword by Ian Gorton
Preface
Chapter 1: Introduction. Software Architecture for Cloud and Big Data: An Open Quest for the Architecturally Significant Requirements
Part 1: Concepts and Models
Chapter 2: Hyperscalability – The Changing Face of Software Architecture
Chapter 3: Architecting to Deliver Value From a Big Data and Hybrid Cloud Architecture
Chapter 4: Domain-Driven Design of Big Data Systems Based on a Reference Architecture
Chapter 5: An Architectural Model-Based Approach to Quality-Aware DevOps in Cloud Applications
Chapter 6: Bridging Ecology and Cloud: Transposing Ecological Perspective to Enable Better Cloud Autoscaling
Part 2: Analyzing and Evaluating
Chapter 7: Evaluating Web PKIs
Chapter 8: Performance Isolation in Cloud-Based Big Data Architectures
Chapter 9: From Legacy to Cloud: Risks and Benefits in Software Cloud Migration
Chapter 10: Big Data: A Practitioners Perspective
Part 3: Technologies
Chapter 11: A Taxonomy and Survey of Stream Processing Systems
Chapter 12: Architecting Cloud Services for the Digital Me in a Privacy-Aware Environment
Chapter 13: Reengineering Data-Centric Information Systems for the Cloud – A Method and Architectural Patterns Promoting Multitenancy
Chapter 14: Exploring the Evolution of Big Data Technologies
Chapter 15: A Taxonomy and Survey of Fault-Tolerant Workflow Management Systems in Cloud and Distributed Computing Environments
Part 4: Resource Management
Chapter 16: The HARNESS Platform: A Hardware- and Network-Enhanced Software System for Cloud Computing
Chapter 17: Auditable Version Control Systems in Untrusted Public Clouds
Chapter 18: Scientific Workflow Management System for Clouds
Part 5: Looking Ahead
Chapter 19: Outlook and Future Directions
Glossary
Author Index
Subject Index

