Ivan Mistrik
Nour Ali
Rami Bahsoon
Maritta Heisel
Bruce R. Maxim
The relation of software architectures to functional and quality requirements is of particular importance in cloud computing. Requirements are the basis for deriving software architectures. Furthermore, the environment in which the software will operate is an important aspect to consider in developing high-quality software. That environment has to be taken into account explicitly. One and the same software may be appropriate (e.g., secure) in one environment, but inadequate (e.g., not sufficiently secure) in a different environment. While these considerations are important for every software development task, there are many challenges specific to cloud and big data. In this book, our goal is to collect chapters on systems and architectures for cloud and big data and, more specifically, how software architectures can manage challenges in advanced big data processing.
Introduction
Software architecture is the earliest design artifact, which realizes the requirements of a software system. It is the manifestation of the earliest design decisions, which comprise the architectural structure (i.e., components and interfaces), the architectural topology (i.e., the architectural style), the architectural infrastructure (e.g., the middleware), the relationship among them, and their relation to other software artifacts (e.g., detailed design and implementation) and the environment. The architecture of a system can also guide the evolution of qualities such as security, reliability, availability, scalability and real-time performance over time. The properties of a particular architecture, whether structural or behavioral, can have global impacts on the software system. Poor architectural realization can threaten the trustworthiness of a system and slow down its evolution. The architectural properties of a system also determine the extent to which it can meet its business and strategic objectives. Consistent with this view is the trend toward focusing software architecture documentation on meeting stakeholder needs and communicating how the software solution addresses their concerns and the business objectives.
Big data is about extracting valuable information from data in order to use it in decision making in business, science, and society. Big data is an emerging paradigm applied to datasets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Such datasets often come from various, frequently unstructured sources (Variety), such as social media, sensors, scientific applications, surveillance, video and image archives, Internet texts and documents, Internet search indexing, medical records, business transactions and web logs; they are of large size (Volume) and have fast data in/out (Velocity). More importantly, big data has to be of high value (Value) and trustworthy enough to support business decision making (Veracity). Various technologies are being discussed to support the handling of big data, such as massively parallel processing databases, scalable storage systems, cloud computing platforms, and MapReduce. Innovative software architectures play a key role in advanced big data processing.
As the new-generation distributed computing platform, cloud computing environments offer high efficiency and low cost for data-intensive computation in big data applications. Cloud resources and services are available in pay-as-you-go mode, which brings extraordinary flexibility and cost-effectiveness as well as zero investment in the customer's own computing infrastructure. However, these advantages come at a price – people no longer have direct control over their own data. Based on this view, data security becomes a major concern in the adoption of cloud computing.
Why a New Book on Software Architecture for Big Data and the Cloud?
We believe that cloud architecture is an emerging and important topic right now. With the increasing number of applications migrating to mobile devices, cloud storage and cloud software services will see a marked increase in usage, but we are not sure software designers have thought through the changes needed to their applications to use cloud capabilities wisely. Likewise, big data is on everyone's radar right now. Working with big data is tricky once you get past the knowledge discovery tasks. The real challenge to including big data in data architecture is structuring the data to allow for efficient searching, sorting, and updating (especially if parallel hardware or parallel algorithms are involved).
The area of cloud and big data is rapidly developing, with conferences/workshops exploring opportunities that cloud and big data technology offers to software engineering, both in practice and in research. However, most of these are focused on the challenges imposed by building big data software systems. There is no single resource that brings together research on how software architectures can solve these challenges. The editors of this book have varied and complementary backgrounds in requirements and architecture, specifically in software architectures for cloud and big data. They also have expertise in software engineering for cloud and big data. This book aims to collect together work across different disciplines in software engineering for cloud and big data.
This new book makes a valuable contribution to this existing body of knowledge in terms of state-of-the-art techniques, methodologies, tools, best practices, and guidelines for software quality assurance and points out directions for future software engineering research and practice. This book discusses systematic and disciplined approaches to building software architectures for cloud and big data. We invited chapters on all aspects of software architecture for cloud and big data, including novel, high-quality, research-related approaches to innovative software development environments and tools for big data processing.
The book provides opportunities for further dissemination of state-of-the-art methods and techniques for representing and evaluating these systems. We asked authors to ensure that their chapters consider the practical application of the topic through case studies, experiments, empirical validation, or systematic comparisons with other approaches already in practice. Topics of interest included, but were not limited to: innovative software architecture for big data processing; theory, frameworks, methodologies, and architecture for cloud and big data; big data technologies; big data visualization and software architectures; innovative software development environments and tools for big data processing; cloud software as a service; software security and privacy with big data; new programming models for big data processing; software simulation and debugging environments for big data processing; research challenges in software architecture for cloud and big data; architecture refactoring for cloud and big data; modeling the software architecture of big data-oriented software systems; and architectures for organizing big data in clouds.
Book Outline
In Chapter 1 we present an overview of software architecture for big data and the cloud. The cloud has revolutionized the way we look at software architectures. The emergence of the cloud and its “as-a-service” layers (e.g., software, platform, databases, and infrastructure as services) has significantly influenced the architecture of software systems. Cloud marketplaces, multitenancy, federation, and elastic, on-demand access have enabled new modalities in the way we incept, compose, architect, deploy, maintain and evolve architectures of software systems. Properties related to dynamic access of resources; resource pooling; rapid elasticity and utility service provision; economies of scale; dynamicity and multitenancy are arguably the emergent “cloud-architecture significant properties.” These properties have influenced not only the behavior of the software systems benefiting from the cloud, but also their structure, style, and topology. They have also moved architecting practices towards architecting for uncertainty, where architecture design decisions are more complex and require us to anticipate the extent to which they can operate in dynamic environments and cope with operational uncertainties and continuous changes. More interestingly, the cloud business model has also moved architecting towards economics-driven architecting, where utilities, risk avoidance, utilization, technical debt monitoring, and optimizing for Service Level Agreements (SLA) are among the business objectives. In this context, architecting in/for the cloud has become an exercise that requires continuous alignment between enterprise and technical objectives. Several architecture styles and architecture-centric development processes that leverage the benefits of the cloud and big data have emerged.
The fundamentals of these styles cannot be understood in isolation from what we term “cloud-architecturally significant requirements.” The chapter reviews these requirements and explains their implications for architecting for/in the cloud in the presence of big data. It also provides a roadmap of opportunities for researchers and practitioners in software architecture for cloud and big data.
We have divided the rest of the book into five key parts, grouping chapters by their link to these key themes: concepts and models, evaluation of architecture models, big data technologies, resource management, and future directions. Part I papers examine concepts and models. Here the five chapters provide a broad outline of the area of software architectural concepts as applied to big data and the cloud.
Part I: Concepts and Models
Part I of this book consists of five chapters focusing on concepts and models which are useful for understanding big data and cloud computing architectures. Chapter 2, by Ian Gorton, discusses issues related to hyperscalability and the changing face of software architecture. Hyperscale systems are pushing the limits of software engineering knowledge on multiple horizons. To address this explosion of data and processing requirements, we need to build systems that can be scaled rapidly with controllable costs and schedules. Hyperscalable systems can grow their capacity and processing capabilities exponentially to serve a potentially global user base, while the resources and costs needed to deliver and operate the system scale only linearly. Successful solutions are not confined to the software architecture and algorithms that comprise an application. Approaches to data architectures and deployment platforms are indelibly intertwined with the software design, and all these dimensions must be considered together in order to meet system scalability requirements. This chapter describes some of the basic principles that underpin system design at scale.
Chapter 3, by Mandy Chessell, Dan Wolfson, and Tim Vincent, discusses the different types of systems involved in big data architecture, how the data flows between them, and how these data flows intersect with the analytics lifecycle, providing self-service access to data backed by information governance that creates the trust and confidence needed both to share and to consume data. As an industry, we need to improve the time to value and success rate of big data projects. This is going to take: better architecture methods that support different big data arenas; tools that automatically manage the metadata and context data necessary to pass data between processing zones; and standard structures for this data to allow for interoperability between cloud services and on-premises systems.
Domain-driven design of big data systems based on reference architectures is discussed by Cigdem Avci Salma, Bedir Tekinerdogan, and Ioannis N. Athanasiadis in Chapter 4. Big data has become a very important driver for innovation and growth for various application domains. These application domains impose different requirements on the big data system. The design of a big data system therefore needs to be carefully considered to realize the system's business goals. In this chapter, the authors have adopted a domain-driven design approach in which they provide a family feature model and reference architecture based on a domain analysis process. The family feature model covers the common and variant features of a broad set of applications, while the reference architecture provides a reusable architecture for deriving concrete application architectures. The authors illustrate their approach by considering Facebook and Twitter as case studies.
Robert Heinrich, Reiner Jung, Christian Zirkelbach, Wilhelm Hasselbring, and Ralf Reussner consider architectural run-time models for quality-aware DevOps in cloud applications in Chapter 5. Cloud-based software applications are designed to change often and rapidly during operations to provide constant quality of service. As a result, the boundary between development and operations is becoming increasingly blurred. DevOps is a set of practices for the integrated consideration of developing and operating software. Software architecture is a central artifact in DevOps practices. Architectural information must be available during operations. Existing architectural models used in the development phase differ from those used in the operation phase in terms of abstraction, purpose and content. This chapter presents the iObserve approach to address these differences and allow for phase-spanning usage of architectural models.
In Chapter 6, Tao Chen and Rami Bahsoon present novel ideas for facilitating cloud autoscaling. They examine the similarities between a cloud ecosystem, represented by a collection of cloud-based services, and a natural ecosystem. They investigate how the ecological view can be adopted to explain how cloud-based services evolve, and explore the key factors that drive stable and sustainable cloud-based services. To achieve this goal, they discuss how to transpose ecological principles, theories and models into autoscaling cloud analogues that spontaneously improve long-term stability and sustainability of a cloud ecosystem.
Part II: Analyzing and Evaluating
The four chapters that make up Part II of this book focus on the analysis and evaluation of several big data and cloud architectural models. The production, processing, and consumption of big data require that all the agents involved in those operations be able to authenticate each other reliably. Authentication of servers and services on the Internet is a surprisingly hard problem. Much research has been done to enhance certificate management in order to create more secure and reliable cloud architectures. However, none of it has been widely adopted yet. Chapter 7, written by Jiangshan Yu and Mark Ryan, provides a survey with critical analysis of the existing proposals for managing public key certificates. Of the three solution categories reviewed, they argue that solutions based on transparent public logs have had the most success in the real world. They present an evaluation framework which should be helpful for future research on designing alternative certificate management systems to secure the Internet.
Performance monitoring in cloud-based big data systems is an important challenge that has not yet been fully solved. In Chapter 8, Bedir Tekinerdogan and Alp Oral discuss several potential solutions, including caching and scalability. However, none of these approaches solves the problem of disruptive tenants that impede the performance of other tenants. Problems that are difficult to solve using the conventional caching and scalability approaches can be addressed using performance isolation. The authors discuss several performance isolation strategies and describe how the Tork application framework can be used to integrate performance isolation mechanisms found in existing cloud-based big data systems. In this chapter, they propose a framework and a systematic approach for performance isolation in cloud-based big data systems. They present an architectural design for a cloud-based big data system and discuss the integration of feasible performance isolation approaches. They evaluate their approach using PublicFeed, a social media application that is based on a cloud-based big data platform.
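To make the idea of performance isolation concrete, the following minimal sketch shows one generic strategy: a fixed per-tenant request quota within an accounting window. It is an illustration only and does not represent the Tork framework or the approach proposed in Chapter 8; the class name TenantQuotaLimiter, the function simulate, and the quota values are all hypothetical.

```python
from collections import defaultdict


class TenantQuotaLimiter:
    """Admit at most `quota` requests per tenant in each accounting window.

    Hypothetical illustration of quota-based performance isolation.
    """

    def __init__(self, quota: int) -> None:
        if quota < 1:
            raise ValueError("quota must be at least 1")
        self.quota = quota
        self._used = defaultdict(int)  # tenant id -> requests admitted this window

    def admit(self, tenant: str) -> bool:
        """Return True if the tenant still has budget in the current window."""
        if self._used[tenant] >= self.quota:
            return False  # over budget: rejected without affecting other tenants
        self._used[tenant] += 1
        return True

    def reset_window(self) -> None:
        """Start a new accounting window (assumed to be driven by a timer)."""
        self._used.clear()


def simulate(requests, quota):
    """Count the requests admitted per tenant within a single window."""
    limiter = TenantQuotaLimiter(quota)
    admitted = defaultdict(int)
    for tenant in requests:
        if limiter.admit(tenant):
            admitted[tenant] += 1
    return dict(admitted)


if __name__ == "__main__":
    # A disruptive tenant sends ten requests; a quiet tenant sends two.
    workload = ["noisy"] * 10 + ["quiet"] * 2
    print(simulate(workload, quota=3))
```

Because each tenant is charged against its own budget, the disruptive tenant in this workload is capped at three admitted requests while the quiet tenant's two requests are both served. A production system would more likely meter resources such as CPU time or I/O than count requests.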
Anastasija Efremovska and Patricia Lago discuss the risks and benefits found in software cloud migration in Chapter 9. Multiple factors need to be considered when migrating to the cloud, which include financial, legal, security, organizational, technical risks and benefits, as well as ...