Agile Data Warehousing for the Enterprise

A Guide for Solution Architects and Project Leaders

Ralph Hughes

562 pages, English, ePUB
About the book

Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines:

  • Requirements management benefits from streamlined templates that not only define projects quickly, but ensure nothing essential is overlooked.
  • Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs.
  • Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines.

Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way.

  • Learn how to quickly define scope and architecture before programming starts
  • Includes techniques of process and data engineering that enable iterative and incremental delivery
  • Demonstrates how to plan and execute quality assurance plans and includes a guide to continuous integration and automated regression testing
  • Presents program management strategies for coordinating multiple agile data mart projects so that over time an enterprise data warehouse emerges
  • Use the provided 120-day road map to establish a robust, agile data warehousing program


Year: 2015
ISBN: 9780123965189
Chapter 1

Solving Enterprise Data Warehousing’s “Fundamental Problem”

Data warehouses used to be too expensive and take far too long to build. Agile data warehousing techniques honed during the past 15 years have solved this problem. At their root, agile data warehousing methods incorporate practices such as Scrum or Kanban to accelerate programming, but this strategy alone is not enough because poor software engineering practices in the phases leading up to or following application coding can fatally undermine an enterprise data warehousing (EDW) project. Agile EDW teams must also utilize new, incremental approaches to requirements management, data modeling, and quality assurance. Generic agile techniques will not suffice in these areas because EDW applications have multi-layer data architectures and encounter cross-organizational challenges while defining the company’s data and metadata. EDW requirements must represent stakeholders from at least three levels within the company; data modeling must draw upon hyper modeled designs; and quality assurance must address a large matrix of test types, architectural layers, and stakeholder groupings.

Keywords

Enterprise data warehousing; business intelligence and data analytics; requirements management; data modeling; quality assurance; agile methods; Scrum and XP; Kanban; lean software development; Rational Unified Process
Let me open this book with an extraordinary claim: After 30 years, we have finally solved the fundamental problem of enterprise data warehousing. This fundamental problem can be stated simply as “In theory, an enterprise data warehouse can be extremely valuable to the sponsoring organization, but in practice one cannot be implemented quickly enough or at a cost that company executives consider reasonable.” People like the idea of an enterprise data warehouse (EDW): a shared repository of standardized and trustworthy information on company events and circumstances, integrated across the many business units within the corporation. What they do not like is that they must wait the better part of a year and invest millions of dollars, only to receive a disappointingly small subset of the capabilities they expected. When pursued with a traditional software engineering approach, enterprise data warehouses simply take too long and cost too much to build. With the agile techniques presented in this book, I believe that we have solved that problem.
I have been working in data warehousing since the early 1980s, in roles ranging from extract, transform, and load (ETL) programmer to business intelligence (BI) developer, integration tester, lead designer, project manager, and, more recently, program architect. During the first 15 years of my career, the EDW projects I joined or led were managed using traditional project management techniques. Like many software efforts in that era, these data warehousing projects proved to be so protracted and stressful that they disappointed both the developers and the customers when many of the promised features had to be dropped to meet time and budget constraints. Though my teammates suggested that all large projects naturally experience such challenges, I wondered why we as an industry were not improving our performance as the years went by. Project managers were certainly introducing far more monitoring and control into the methods we employed, but if anything, the project outcomes were getting worse.
I started to see that the EDW development profession had fallen into a negative feedback loop, and that this downward spiral was actually the cause of data warehousing’s fundamental problem. As shown in Figure 1.1, this feedback loop begins with the perception that EDW applications are large, complex, and therefore risky to build. We fear failure, so we adopt a plethora of extremely risk-averse engineering and project management practices that make our developers’ task lists considerably longer. The tasks themselves become more difficult to complete due to all the audits and reporting steps that project management requires in order to know that the process is on track. Unfortunately, these longer task lists make the EDW development project even more complex and all the more likely to fail. The higher price tag of the task list and the increasing failure rates heighten the EDW’s perceived risk, driving everyone involved into another lap around the fear circle. After a few cycles of this negative feedback, the development process has become so riddled with controls and audits that one wonders how the programmers will be able to get any significant work completed at all.

Figure 1.1 The negative feedback loop present in most traditionally managed projects.

The Agile Solution in a Nutshell

The agile software development movement that started in the early 2000s solved a very similar problem, though it was geared toward the programming of transaction capture systems—that is, non-data warehousing applications. The highlights of the generic agile software development strategy consist of the following:
  • Progressive decomposition of requirements to generate a simple list of programming tasks
  • Co-located, self-organized teams of developers
  • Iterative programming techniques that deliver small slices of the application every couple of weeks
  • Frequent review of those small slices by one or more members of the end-user community
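The first highlight, progressive decomposition, can be pictured as a tree of requirements whose leaves become the team's task list. The following is a minimal sketch under that assumption; the `Requirement` class, the `to_task_list` function, and the example requirement titles are hypothetical and not taken from any particular agile method's tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Requirement:
    """A node in a hypothetical requirements tree (e.g. epic -> feature -> story)."""
    title: str
    children: list[Requirement] = field(default_factory=list)


def to_task_list(req: Requirement) -> list[str]:
    """Flatten the tree depth-first: requirements with no children are the tasks."""
    if not req.children:
        return [req.title]
    tasks: list[str] = []
    for child in req.children:
        tasks.extend(to_task_list(child))
    return tasks


epic = Requirement("Revenue reporting", [
    Requirement("Load sales facts", [
        Requirement("Stage daily sales extract"),
        Requirement("Transform sales into fact table"),
    ]),
    Requirement("Build revenue dashboard"),
])
print(to_task_list(epic))
# ['Stage daily sales extract', 'Transform sales into fact table', 'Build revenue dashboard']
```

Each level of the tree can be detailed only when the team is about to work on it, which is what keeps the decomposition "progressive" rather than a complete up-front specification.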
Many data warehousing teams attempted to utilize this incremental delivery approach, but for a long time they struggled to perform as well as agile developers building transaction-capture systems. This early difficulty was largely due to the fact that data warehouses differ from transaction-capture applications in two crucial ways. First, they have data architectures with two to four times as many layers as transaction systems, often with each layer requiring its own data modeling strategy, a different flavor of data transforms, and even a unique development tool set. It turns out that constructing a data warehouse is like building three to eight separate transaction systems at once.
Second, an EDW’s data repositories amass billions if not trillions of records. The initial data load required to put the data warehouse into production usage often runs for many days or weeks. When the warehouse’s design must change, the development team can be forced to scrap large portions of the data already captured and repeat the long initial data load. Moreover, if the source for that data is no longer available, the team must then invest hundreds of hours writing, running, and validating conversion scripts that retrofit millions of data records to comply with the warehouse’s new design. Evolving an existing data warehouse is like dragging a ball and chain through a swamp.
This double challenge of building and evolving a data warehouse lies at the heart of the fear-driven failure cycle in our profession. Because a single oversight in requirements and design could invalidate months of programming or require weeks of frantic data conversion, data warehousing professionals believed they could not employ agile’s iterative and incremental approach. All requirements had to be identified before design work could begin, and the design had to be complete and bulletproof before programming could start. Without an incremental delivery strategy, the EDW profession remained mired in the negative feedback loops that agile teams building transaction-capture systems escaped long ago.
The solution to the data warehousing predicament emerged only in the past few years with the advent of incremental data modeling techniques. This new approach to designing a warehouse’s data schemas allows large data repositories to be adapted to new designs after they have been initially loaded, without requiring expensive reloads or conversion scripting. These new data modeling techniques worked from the inside out, making the entirety of agile data warehousing suddenly feasible. Once a team could economically evolve a data warehouse, it was free to design incrementally, and consequently its analysts could detail requirements a chunk at a time. The big, complete, and perfect up-front specification was no longer necessary. Although a good overall vision for the project is still needed, by and large data warehousing teams can program and deliver an enterprise business intelligence application one piece at a time. They can readily steer their programming efforts to address many more of their customers’ short-term goals, making EDWs far more responsive to business needs and, in fact, agile. Considerable thought and innovation are still required to adapt all of the software engineering processes besides programming to the peculiarities of data warehousing. However, that remaining work proves to be fairly straightforward now that the “data engineering” component has been solved.
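To illustrate what evolving a loaded schema without a reload can look like, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a Data Vault-style hub-and-satellite layout, one well-known incremental modeling pattern; this excerpt does not establish whether it matches the "hyper modeling" techniques the book describes. The table and column names (`hub_customer`, `sat_customer_name`, `sat_customer_credit`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hub: one row per business key; its structure does not change once loaded.
cur.execute("CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY, load_ts TEXT)")
# Satellite: descriptive attributes hang off the hub, versioned by load timestamp.
cur.execute("""CREATE TABLE sat_customer_name (
    customer_key TEXT REFERENCES hub_customer(customer_key),
    load_ts TEXT, name TEXT,
    PRIMARY KEY (customer_key, load_ts))""")

cur.executemany("INSERT INTO hub_customer VALUES (?, ?)",
                [("C1", "2015-01-01"), ("C2", "2015-01-01")])
cur.executemany("INSERT INTO sat_customer_name VALUES (?, ?, ?)",
                [("C1", "2015-01-01", "Acme"), ("C2", "2015-01-01", "Globex")])

# A new requirement arrives (hypothetical): track each customer's credit tier.
# Rather than altering and reloading sat_customer_name, add a new satellite.
cur.execute("""CREATE TABLE sat_customer_credit (
    customer_key TEXT REFERENCES hub_customer(customer_key),
    load_ts TEXT, credit_tier TEXT,
    PRIMARY KEY (customer_key, load_ts))""")
cur.execute("INSERT INTO sat_customer_credit VALUES ('C1', '2015-03-01', 'A')")

# Previously loaded rows are untouched; queries join whichever satellites they need.
rows = cur.execute("""
    SELECT h.customer_key, n.name, c.credit_tier
    FROM hub_customer h
    JOIN sat_customer_name n ON n.customer_key = h.customer_key
    LEFT JOIN sat_customer_credit c ON c.customer_key = h.customer_key
    ORDER BY h.customer_key""").fetchall()
print(rows)  # [('C1', 'Acme', 'A'), ('C2', 'Globex', None)]
```

The design choice here is additive change: a new requirement becomes a new table keyed to the existing hub, so the rows already loaded need neither conversion scripts nor a reload.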

Five Legs to Stand Upon

In the past 15 years, I have worked with agile teams that have steadily adapted iterative, incremental development techniques to meet the demands of large, data-driven applications such as enterprise data warehouses. These adapted agile practices have certainly accelerated EDW delivery speeds, frequently by a factor of two or three. More importantly, these new agile techniques for EDW have kept the business sponsors and project stakeholders solidly “in the loop,” providing frequent reviews of crucial design decisions as each new component is coded. Such frequent business reviews regularly catch misconceptions regarding requirements and design, keeping the development effort intently focused on the features essential for project success and eliminating ill-conceived programming objectives that would have only wasted time and resources. By largely eliminating the risk within large EDW projects, the techniques remove the fear that used to drive us to the specification- and process-heavy project management styles that formerly doomed our applications to failure.
Unfortunately, thousands of data warehousing programs throughout the world still suffer from the waste and frustration forced on them by fear-driven death spirals. The mission of this book is to illustrate the alternative strategies and techniques that agile enterprise data warehousing teams utilize for building large, data-driven applications. I hope that, with the agile EDW approach well documented, sponsors, stakeholders, and development team leaders can successfully advocate that their companies switch to an incremental, risk-mitigating approach for their next data warehousing project.
The full practice of agile enterprise data warehousing is a large assembly of principles and techniques. The practitioners of agile enterprise data warehousing derived this collection over many years by borrowing pieces from four different agile methods: Scrum, XP, Kanban, and RUP. We also incorporated a few old-school disciplines from management information science, such as requirements management and quality assurance. By merging these multiple influences and sharing our experiences with each other, our community of DW/BI professionals has arrived at what I consider a baseline approach to enterprise agile data warehousing.
This baseline approach consists of five major elements, as illustrated by the mind map in Figure 1.2. These adapted software engineering disciplines represent the five “legs” that the full agile EDW method stands upon:
1. Iterative, incremental application coding (AC) techniques that provide not only faster delivery speeds but also significant risk mitigation
2. Streamlined requirements management (RM) that makes the work of defining a project quick and focused
3. Adaptive data engineering (DE) skills that allow a warehouse’s data repository to be built incrementally, then economically revised as requirements change, even after it has been loaded with data
4. Balanced quality assurance (QA) efforts that instill test-led development at all levels of project work
5. Several productivity tools organized into a repeatable “value cycle” (VC) for creating incremental subreleases that amplifies the ability of the other four elements to accelerate deliveries and mitigate risk

Figure 1.2 The five major components to agile enterprise data warehousing.
This book steps the reader through each of these components and thus serves as a field manual for DW/BI development teams, both those that are just getting started and those that are seeking ways to bring new life to a struggling project. Putting these five legs to work gives even the largest enterprise data warehousing programs incredible traction against the challenges they must conquer—challenges such as uninvolved business partners, incomplete and inconsistent project definitions, rigid data models, and poorly coded application modules. Incorporating these five adapted disciplines allows agile EDW teams to steadily chip away at the unknowns in both business and technical requirements, translate them into lists of actionable development tasks, and steadily deliver a growing collection of user-validated features and performance capabilities.
These agile practices convert the entire EDW development experience into a far more understandable and predictable process for everyone involved, including project sponsors and business stakeholders. The net result is a spiral that runs in the opposite direction from the cycle diagrammed previously. As depicted in Figure 1.3, an agile EDW project experiences a positive feedback loop. The desired application is still large and complex, but instead of specifying every last detail of the application before coding begins, the team decomposes the work into small increments that can be easily accomplished sequentially. As the team develops the modules for each increment, the business can validate both the new features they offer and how they integrate into an overall system. The enormity of the project transforms into a list of components that both the business and IT readily understand and that can be delivered one after the other without incurring serious risk. With such clarity and low risk, sponsors and project managers can lighten up on the audits and process controls, letting the programmers work far more quickly while judging the project’s progress by the working modules created.

Figure 1.3 Agile EDW practices switch projects to a positive feedback loop.

The Agile EDW Alternative is Ready to Deploy

This book is designed with two audiences in mind: EDW sponsors and EDW project leaders. By “EDW sponsors,” I mean the executives and the representatives of a company department that is funding the development of a data integration application, perhaps with a BI or data analytics front end. These folks on the business side of a project need to realize that an agile alternative to traditional, failure-prone development methods exists. Understanding the nature and advantages of the agile alternative will empower these sponsors to insist that the development teams and project managers who work for them employ an incremental delivery approach.
When referring to “EDW project leaders,” I am thinking of the members of the development group other than the programmers who build the data integration and BI components. This group includes roles that go by many names, including solutions architects, project architects, business analysts, data architects,...
