Agile Data Warehousing for the Enterprise
eBook - ePub


A Guide for Solution Architects and Project Leaders

Ralph Hughes

  1. 562 pages
  2. English
  3. ePUB

About the book

Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines:

  • Requirements management benefits from streamlined templates that not only define projects quickly but also ensure nothing essential is overlooked.
  • Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs.
  • Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines.

Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way.

  • Learn how to quickly define scope and architecture before programming starts
  • Includes techniques of process and data engineering that enable iterative and incremental delivery
  • Demonstrates how to plan and execute quality assurance plans and includes a guide to continuous integration and automated regression testing
  • Presents program management strategies for coordinating multiple agile data mart projects so that over time an enterprise data warehouse emerges
  • Use the provided 120-day road map to establish a robust, agile data warehousing program


Information

Year
2015
ISBN
9780123965189

Chapter 1

Solving Enterprise Data Warehousing’s “Fundamental Problem”

Data warehouses used to be too expensive and take far too long to build. Agile data warehousing techniques honed during the past 15 years have solved this problem. At their root, agile data warehousing methods incorporate practices such as Scrum or Kanban to accelerate programming, but this strategy alone is not enough because poor software engineering practices in the phases leading up to or following application coding can fatally undermine an enterprise data warehousing (EDW) project. Agile EDW teams must also utilize new, incremental approaches to requirements management, data modeling, and quality assurance. Generic agile techniques will not suffice in these areas because EDW applications have multi-layer data architectures and encounter cross-organizational challenges while defining the company’s data and metadata. EDW requirements must represent stakeholders from at least three levels within the company; data modeling must draw upon hyper modeled designs; and quality assurance must address a large matrix of test types, architectural layers, and stakeholder groupings.

Keywords

Enterprise data warehousing; business intelligence and data analytics; requirements management; data modeling; quality assurance; agile methods; Scrum and XP; Kanban; lean software development; Rational Unified Process
Let me open this book with an extraordinary claim: After 30 years, we have finally solved the fundamental problem of enterprise data warehousing. This fundamental problem can be stated simply as “In theory, an enterprise data warehouse can be extremely valuable to the sponsoring organization, but in practice one cannot be implemented quickly enough or at a cost that company executives consider reasonable.” People like the idea of an enterprise data warehouse (EDW), a shared repository of standardized and trustworthy information on company events and circumstances, integrated across the many business units within the corporation. What they do not like is that they must wait the better part of a year and invest millions of dollars, only to receive a disappointingly small subset of the capabilities they expected. When pursued with a traditional software engineering approach, enterprise data warehouses simply take too long and cost too much to build. With the agile techniques presented in this book, I believe that we have solved that problem.
I have been working in data warehousing since the early 1980s, in roles ranging from extract, transform, and load (ETL) programmer to business intelligence (BI) developer, integration tester, lead designer, project manager, and, more recently, program architect. During the first 15 years of my career, the EDW projects I joined or led were managed using traditional project management techniques. Like many software efforts in that era, these data warehousing projects proved to be so protracted and stressful that they disappointed both the developers and the customers when many of the promised features had to be dropped to meet time and budget constraints. Though my teammates suggested that all large projects naturally experience such challenges, I wondered why we as an industry were not improving our performance as the years went by. Project managers were certainly introducing far more monitoring and control into the methods we employed, but if anything, the project outcomes were getting worse.
I started to see that the EDW development profession had fallen into a negative feedback loop, and that this downward spiral was actually the cause of data warehousing’s fundamental problem. As shown in Figure 1.1, this feedback loop begins with the perception that EDW applications are large, complex, and therefore risky to build. We fear failure, so we adopt a plethora of extremely risk-averse engineering and project management practices that make our developers’ task lists considerably longer. The tasks themselves become more difficult to complete due to all the audits and reporting steps that project management requires in order to know that the process is on track. Unfortunately, these longer task lists make the EDW development project even more complex and all the more likely to fail. The higher price tag of the task list and the increasing failure rates heighten the EDW’s perceived risk, driving everyone involved into another lap around the fear circle. After a few cycles of this negative feedback, the development process has become so riddled with controls and audits that one wonders how the programmers will be able to get any significant work completed at all.

Figure 1.1 The negative feedback loop present in most traditionally managed projects.

The Agile Solution in a Nutshell

The agile software development movement that started in the early 2000s solved a very similar problem, though it was geared toward the programming of transaction capture systems—that is, non-data warehousing applications. The highlights of the generic agile software development strategy consist of the following:
  • Progressive decomposition of requirements to generate a simple list of programming tasks
  • Co-located, self-organized teams of developers
  • Iterative programming techniques that deliver small slices of the application every couple of weeks
  • Frequent review of those small slices by one or more members of the end-user community
Many data warehousing teams attempted to utilize this incremental delivery approach, but for a long time they struggled to perform as well as agile developers building transaction-capture systems. This early difficulty was largely due to the fact that data warehouses differ from transaction-capture applications in two crucial ways. First, they have data architectures with two to four times as many layers as transaction systems, often with each layer requiring its own data modeling strategy, a different flavor of data transforms, and even a unique development tool set. It turns out that constructing a data warehouse is like building three to eight separate transaction systems at once.
Second, an EDW’s data repositories amass billions if not trillions of records. The initial data load required to put the data warehouse into production usage often runs for many days or weeks. When the warehouse’s design must change, the development team can be forced to scrap large portions of the data already captured and repeat the long initial data load. Moreover, if the source for that data is no longer available, the team must then invest hundreds of hours writing, running, and validating conversion scripts that retrofit millions of data records to comply with the warehouse’s new design. Evolving an existing data warehouse is like dragging a ball and chain through a swamp.
This double challenge of building and evolving a data warehouse lies at the heart of the fear-driven failure cycle in our profession. Because a single oversight in requirements and design could invalidate months of programming or require weeks of frantic data conversion, data warehousing professionals believed they could not employ agile’s iterative and incremental approach. All requirements had to be identified before design work could begin, and the design had to be complete and bulletproof before programming could start. Without an incremental delivery strategy, the EDW profession remained mired in the negative feedback loops that agile teams building transaction-capture systems escaped long ago.
The solution to the data warehousing predicament emerged only in the past few years with the advent of incremental data modeling techniques. This new approach to designing a warehouse’s data schemas allowed large data repositories to be adapted to new designs even after their initial load, without requiring expensive reloads or conversion scripting. These new data modeling techniques worked from the inside out to make the entirety of agile data warehousing suddenly feasible. Once a team could economically evolve a data warehouse, it was then free to design incrementally, and consequently its analysts could detail requirements a chunk at a time. The big, complete, and perfect specification up-front was no longer necessary. Although a good overall vision for the project is still necessary, by and large data warehousing teams can program and deliver an enterprise business intelligence application one piece at a time. They can readily steer their programming efforts to address many more of their customers’ short-term goals, making EDWs far more responsive to business needs and, in fact, agile. Considerable thought and innovation are still required to adapt all of the software engineering processes besides programming to the peculiarities of data warehousing. However, that remaining work proves to be fairly straightforward now that the “data engineering” component has been solved.

Five Legs to Stand Upon

In the past 15 years, I have worked with agile teams that have steadily adapted iterative, incremental development techniques to meet the demands of large, data-driven applications such as enterprise data warehouses. These adapted agile practices have certainly accelerated EDW delivery speeds, frequently by a factor of two or three. More importantly, these new agile techniques for EDW have kept the business sponsors and project stakeholders solidly “in the loop,” providing frequent reviews of crucial design decisions as each new component is coded. Such frequent business reviews regularly catch misconceptions regarding requirements and design, keeping the development effort intently focused on the features essential for project success and eliminating ill-conceived programming objectives that would have only wasted time and resources. By largely eliminating the risk within large EDW projects, the techniques remove the fear that used to drive us to the specification- and process-heavy project management styles that formerly doomed our applications to failure.
Unfortunately, thousands of data warehousing programs throughout the world still suffer from the waste and frustration forced on them by fear-driven death spirals. The mission of this book is to illustrate the alternative strategies and techniques that agile enterprise data warehousing teams utilize for building large, data-driven applications. I hope that, with the agile EDW approach well documented, sponsors, stakeholders, and development team leaders can successfully advocate that their companies switch to an incremental, risk-mitigating approach for their next data warehousing project.
The full practice of agile enterprise data warehousing is a large assembly of principles and techniques. The practitioners of agile enterprise data warehousing derived this collection over many years by borrowing pieces from four different agile methods: Scrum, XP, Kanban, and RUP. We also incorporated a few old-school disciplines from management information science, such as requirements management and quality assurance. By merging these multiple influences and sharing our experiences with each other, our community of DW/BI professionals has arrived at what I consider a baseline approach to enterprise agile data warehousing.
This baseline approach consists of five major elements, as illustrated by the mind map in Figure 1.2. These adapted software engineering disciplines represent the five “legs” that the full agile EDW method stands upon:
1. Iterative, incremental application coding (AC) techniques that provide not only faster delivery speeds but also significant risk mitigation
2. Streamlined requirements management (RM) that makes the work of defining a project quick and focused
3. Adaptive data engineering (DE) skills that allow a warehouse’s data repository to be built incrementally, then economically revised as requirements change, even after it has been loaded with data
4. Balanced quality assurance (QA) efforts that instill test-led development at all levels of project work
5. Several productivity tools organized into a repeatable “value cycle” (VC) for creating incremental subreleases that amplifies the ability of the other four elements to accelerate deliveries and mitigate risk

Figure 1.2 The five major components to agile enterprise data warehousing.
This book steps the reader through each of these components and thus serves as a field manual for DW/BI development teams, both those that are just getting started and those that are seeking ways to bring new life to a struggling project. Putting these five legs to work gives even the largest enterprise data warehousing programs incredible traction against the challenges they must conquer—challenges such as uninvolved business partners, incomplete and inconsistent project definitions, rigid data models, and poorly coded application modules. Incorporating these five adapted disciplines allows agile EDW teams to steadily chip away at the unknowns in both business and technical requirements, translate them into lists of actionable development tasks, and steadily deliver a growing collection of user-validated features and performance capabilities.
These agile practices convert the entire EDW development experience into a far more understandable and predictable process for everyone involved, including project sponsors and business stakeholders. The net result is a spiral that operates in the reverse manner of the cycle diagrammed previously. As depicted in Figure 1.3, an agile EDW project experiences a positive feedback loop. The desired application is still large and complex, but instead of specifying every last detail of the application before coding begins, the team decomposes the work into small increments that can be easily accomplished sequentially. As the team develops the modules for each increment, the business can validate both the new features they offer and how they integrate into an overall system. The enormity of the project transforms into a list of components that both the business and IT readily understand and that can be delivered one after the other without incurring serious risk. With such clarity and low risk, sponsors and project managers can lighten up on the audits and process controls, allowing the programmers to work far more quickly, and can instead judge the project’s progress by the working modules created.

Figure 1.3 Agile EDW practices switch projects to a positive feedback loop.

The Agile EDW Alternative is Ready to Deploy

This book is designed with two audiences in mind: EDW sponsors and EDW project leaders. By “EDW sponsors,” I mean the executives and the representatives of a company department that is funding the development of a data integration application, perhaps with a BI or data analytics front end. These folks on the business side of a project need to realize that an agile alternative to traditional, failure-prone development methods exists. Understanding the nature and advantages of the agile alternative will empower these sponsors to insist that the development teams and project managers who work for them employ an incremental delivery approach.
When referring to “EDW team leaders,” I am thinking of the members of the development group other than the programmers who build the data integration and BI components. This group includes roles that go by many names, including solutions architects, project architects, business analysts, data architects,...
