Taming The Big Data Tidal Wave

Finding Opportunities in Huge Data Streams with Advanced Analytics

About this book

You receive an e-mail. It contains an offer for a complete personal computer system. It seems like the retailer read your mind since you were exploring computers on their web site just a few hours prior….

As you drive to the store to buy the computer bundle, you get an offer for a discounted coffee from the coffee shop you are getting ready to drive past. It says that since you're in the area, you can get 10% off if you stop by in the next 20 minutes….

As you drink your coffee, you receive an apology from the manufacturer of a product that you complained about yesterday on your Facebook page, as well as on the company's web site….

Finally, once you get back home, you receive notice of a special armor upgrade available for purchase in your favorite online video game. It is just what is needed to get past some spots you've been struggling with….

Sound crazy? Are these things that can only happen in the distant future? No. All of these scenarios are possible today! Big data. Advanced analytics. Big data analytics. It seems you can't escape such terms today. Everywhere you turn people are discussing, writing about, and promoting big data and advanced analytics. Well, you can now add this book to the discussion.

What is real and what is hype? Such attention can lead one to the suspicion that perhaps the analysis of big data is something that is more hype than substance. While there has been a lot of hype over the past few years, the reality is that we are in a transformative era in terms of analytic capabilities and the leveraging of massive amounts of data. If you take the time to cut through the sometimes-over-zealous hype present in the media, you'll find something very real and very powerful underneath it. With big data, the hype is driven by genuine excitement and anticipation of the business and consumer benefits that analyzing it will yield over time.

Big data is the next wave of new data sources that will drive the next wave of analytic innovation in business, government, and academia. These innovations have the potential to radically change how organizations view their business. The analysis that big data enables will lead to decisions that are more informed and, in some cases, different from what they are today. It will yield insights that many can only dream about today. As you'll see, there are many consistencies between the requirements to tame big data and what has always been needed to tame new data sources. However, the additional scale of big data necessitates utilizing the newest tools, technologies, methods, and processes. The old way of approaching analysis just won't work. It is time to evolve the world of advanced analytics to the next level. That's what this book is about.

Taming the Big Data Tidal Wave isn't just the title of this book, but rather an activity that will determine which businesses win and which lose in the next decade. By preparing and taking the initiative, organizations can ride the big data tidal wave to success rather than being pummeled underneath the crushing surf. What do you need to know and how do you prepare in order to start taming big data and generating exciting new analytics from it? Sit back, get comfortable, and prepare to find out!

Taming The Big Data Tidal Wave by Bill Franks is available in PDF and/or ePUB format.

Information

Publisher: Wiley
Year: 2012
Print ISBN: 9781118208786
eBook ISBN: 9781118241172
Edition: 1
PART ONE: The Rise of Big Data
CHAPTER 1
What Is Big Data and Why Does It Matter?
Perhaps nothing will have as large an impact on advanced analytics in the coming years as the ongoing explosion of new and powerful data sources. When analyzing customers, for example, the days of relying exclusively on demographics and sales history are past. Virtually every industry has at least one completely new data source coming online soon, if it isn’t here already. Some of the data sources apply widely across industries; others are primarily relevant to a very small number of industries or niches. Many of these data sources fall under a new term that is receiving a lot of buzz: big data.
Big data is sprouting up everywhere and using it appropriately will drive competitive advantage. Ignoring big data will put an organization at risk and cause it to fall behind the competition. To stay competitive, it is imperative that organizations aggressively pursue capturing and analyzing these new data sources to gain the insights that they offer. Analytic professionals have a lot of work to do! It won’t be easy to incorporate big data alongside all the other data that has been used for analysis for years.
This chapter begins with some background on big data and what it is all about. Then it will cover a number of considerations in terms of how an organization can make use of big data. Readers will need to understand what is in this chapter as much as or more than anything else in the book if they are to tame the big data tidal wave successfully.

WHAT IS BIG DATA?

There is not a consensus in the marketplace as to how to define big data, but there are a couple of consistent themes. Two sources have done a good job of capturing the essence of what most would agree big data is all about. The first definition is from Gartner’s Merv Adrian in a Q1, 2011 Teradata Magazine article. He said, “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.”1 Another good definition is from a paper by the McKinsey Global Institute in May 2011: “Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.”2
These definitions imply that what qualifies as big data will change over time as technology advances. What was big data historically or what is big data today won’t be big data tomorrow. This aspect of the definition of big data is one that some people find unsettling. The preceding definitions also imply that what constitutes big data can vary by industry, or even organization, if the tools and technologies in place vary greatly in capability. We will talk more about this later in the chapter in the section titled “Today’s Big Data Is Not Tomorrow’s Big Data.”
A few interesting facts in the McKinsey paper help bring into focus how much data is out there today:
  • $600 today can buy a disk drive that will store all of the world’s music.
  • There are 30 billion pieces of information shared on Facebook each month.
  • Fifteen of 17 industry sectors in the United States have more data per company on average than the U.S. Library of Congress.3
THE “BIG” IN BIG DATA ISN’T JUST ABOUT VOLUME
While big data certainly involves having a lot of data, big data doesn’t refer to data volume alone. Big data also has increased velocity (i.e., the rate at which data is transmitted and received), complexity, and variety compared to data sources of the past.
Big data isn’t just about the size of the data in terms of how much data there is. According to the Gartner Group, the “big” in big data also refers to several other characteristics of a big data source.4 These aspects include not just increased volume but increased velocity and increased variety. These factors, of course, lead to extra complexity as well. What this means is that you aren’t just getting a lot of data when you work with big data. It’s also coming at you fast, it’s coming at you in complex formats, and it’s coming at you from a variety of sources.
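As a rough illustration of these characteristics, the sketch below summarizes a small, made-up batch of events by volume (record count), velocity (records per second), and variety (distinct sources). The `Event` type, the sample data, and these simple measures are assumptions chosen for illustration; they are not Gartner's definitions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str        # hypothetical feed name, e.g. "web_log"
    timestamp: float   # seconds since the start of the batch


def summarize(events: list[Event]) -> dict:
    """Return simple volume, velocity, and variety measures for a batch."""
    if not events:
        return {"volume": 0, "velocity": 0.0, "variety": 0}
    volume = len(events)
    span = max(e.timestamp for e in events) - min(e.timestamp for e in events)
    # Records per second; fall back to the raw count if all events share a timestamp.
    velocity = volume / span if span > 0 else float(volume)
    variety = len({e.source for e in events})
    return {"volume": volume, "velocity": velocity, "variety": variety}


batch = [
    Event("web_log", 0.0),
    Event("sensor", 0.5),
    Event("social_text", 1.0),
    Event("web_log", 2.0),
]
print(summarize(batch))
# {'volume': 4, 'velocity': 2.0, 'variety': 3}
```

Real feeds would need far richer measures, but even this toy summary shows that a source can grow along one dimension without growing along the others.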
It is easy to see why the wealth of big data coming toward us can be likened to a tidal wave and why taming it will be such a challenge! The analytics techniques, processes, and systems within organizations will be strained up to, or even beyond, their limits. It will be necessary to develop additional analysis techniques and processes utilizing updated technologies and methods in order to analyze and act upon big data effectively. We will talk about all these topics before the book is done with the goal of demonstrating why the effort to tame big data is more than worth it.

IS THE “BIG” PART OR THE “DATA” PART MORE IMPORTANT?

It is already time to take a brief quiz! Stop for a minute and consider the following question before you read on: What is the most important part of the term big data? Is it (1) the “big” part, (2) the “data” part, (3) both, or (4) neither? Take a minute to think about it and once you’ve locked in your answer, proceed to the next paragraph. In the meantime, imagine the “contestants are thinking” music from a game show playing in the background.
Okay, now that you’ve locked in your answer, let’s find out if you got it right. The answer to the question is choice (4). Neither the “big” part nor the “data” part is the most important part of big data. Not by a long shot. What organizations do with big data is what is most important. The analysis your organization does against big data, combined with the actions that are taken to improve your business, is what matters.
Having a big source of data does not in and of itself add any value whatsoever. Maybe your data is bigger than mine. Who cares? In fact, having any set of data, however big or small it may be, doesn’t add any value by itself. Data that is captured but not used for anything is of no more value than some of the old junk stored in an attic or basement. Data is irrelevant without being put into context and put to use. As with any source of data big or small, the power of big data is in what is done with that data. How is it analyzed? What actions are taken based on the findings? How is the data used to make changes to a business?
Reading a lot of the hype around big data, many people are led to believe that just because big data has high volume, velocity, and variety, it is somehow better or more important than other data. This is not true. As we will discuss later in the chapter in the section titled “Most Big Data Doesn’t Matter,” many big data sources have a far higher percentage of useless or low-value content than virtually any historical data source. By the time you trim down a big data source to what you actually need, it may not even be so big any more. But that doesn’t really matter, because whether it stays big or whether it ends up being small when you’re done processing it, the size isn’t important. It’s what you do with it.
IT ISN’T HOW BIG IT IS. IT’S HOW YOU USE IT!
We’re talking about big data of course! Neither the fact that big data is big nor the fact that it is data adds any inherent value. The value is in how you analyze and act upon the data to improve your business.
The first critical point to remember as we start into the book is that big data is both big and data. However, that’s not what’s going to make it exciting for you and your organization. The exciting part comes from all the new and powerful analytics that will be possible as the data is utilized. We’re going to talk about a number of those new analytics as we proceed.

HOW IS BIG DATA DIFFERENT?

There are some important ways that big data is different from traditional data sources. Not every big data source will have every feature that follows, but most big data sources will have several of them.
First, big data is often automatically generated by a machine. Instead of a person being involved in creating new data, it’s generated purely by machines in an automated way. If you think about traditional data sources, there was always a person involved. Consider retail or bank transactions, telephone call detail records, product shipments, or invoice payments. All of those involve a person doing something in order for a data record to be generated. Somebody had to deposit money, or make a purchase, or make a phone call, or send a shipment, or make a payment. In each case, there is a person who is taking action as part of the process of new data being created. This is not so for big data in many cases. A lot of sources of big data are generated without any human interaction at all. A sensor embedded in an engine, for example, spits out data about its surroundings even if nobody touches it or asks it to.
Second, big data is typically an entirely new source of data. It is not simply an extended collection of existing data. For example, with the use of the Internet, customers can now execute a transaction with a bank or retailer online. But the transactions they execute are not fundamentally different transactions from what they would have done traditionally. They’ve simply executed the transactions through a different channel. An organization may capture web transactions, but they are really just more of the same old transactions that have been captured for years. However, actually capturing browsing behaviors as customers execute a transaction creates fundamentally new data which we’ll discuss in detail in Chapter 2.
Sometimes “more of the same” can be taken to such an extreme that the data becomes something new. For example, your power meter has probably been read manually each month for years. An argument can be made that automatic readings every 15 minutes by a smart meter are more of the same. It can also be argued that it is so much more of the same, and that it enables such a different, more in-depth level of analytics, that such data is really a new data source. We’ll discuss this data in Chapter 3.
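To put the meter comparison in numbers, here is a minimal sketch that counts the readings one meter produces under each schedule. It assumes a 30-day month for simplicity; the function name and constants are illustrative and do not come from the book.

```python
# Illustrative arithmetic only: the 30-day month is an assumption.
MINUTES_PER_DAY = 24 * 60


def readings_per_month(interval_minutes: int, days_in_month: int = 30) -> int:
    """Number of automatic readings one meter produces in a month."""
    return (MINUTES_PER_DAY // interval_minutes) * days_in_month


manual = 1  # one manual reading per month
automatic = readings_per_month(15)
print(automatic)            # 2880
print(automatic // manual)  # 2880 times as many readings per meter
```

Going from 1 reading to roughly 2,880 readings per meter per month is the kind of jump that makes "more of the same" behave like a new data source.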
Third, many big data sources are not designed to be friendly. In fact, some of the sources aren’t designed at all! Take text streams from a social media site. There is no way to ask users to follow certain standards of grammar, or sentence ordering, or vocabulary. You are going to get what you get when people make a posting. It can be difficult to work with such data at best and very, very ugly at worst. We’ll discuss text data in Chapters 3 and 6. Most traditional data sources were designed up-front to be friendly. Systems used to capture transactions, for example, provide data in a clean, preformatted template that makes the data easy to load and use. This was driven in part by the historical need to be highly efficient with space. There was no room for excess fluff.
BIG DATA CAN BE MESSY AND UGLY
Traditional data sources were very tightly defined up-front. Every bit of data had a high level of value or it would not be included. With the cost of storage space becoming almost negligible, big data sources are not always tightly defined up-front and typically capture everything that may be of use. This can lead to having to wade through messy, junk-filled data when doing an analysis.
Last, large swaths of big data streams may not have much value. In fact, much of the data may even be close to worthless. Within a web log, there is information that is very powerful. There is also a lot of information that doesn’t have much value at all. It is necessary to weed through and pull out the valuable and relevant pieces. Traditional data sources were defined up-front to be 100 percent relevant. This is because of the scalability limitations that were present. It was far too expensive to have anything included in a data feed that wasn’t critical. Not only were data records predefined, but every piece of data in them was high-value. Storage space is no longer a primary constraint. This has led to the default with big data being to capture everything possible and worry later about what matters. This ensures nothing will be missed, but also can make the process of analyzing big data more painful.
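The weeding-out step described above can be sketched as a simple filter over web log lines. The three-field log format, the sample lines, and the rule that static-asset requests are low-value are all assumptions made for illustration; a real web log and a real relevance rule would differ.

```python
# Hypothetical log format and relevance rule, for illustration only.
from typing import Iterable, Iterator

# Assumption: requests for static assets carry little analytic value.
LOW_VALUE_SUFFIXES = (".css", ".js", ".png", ".gif", ".ico")


def relevant_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only page-view records, dropping malformed and low-value lines."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:   # malformed line: skip it
            continue
        timestamp, customer_id, path = parts
        if path.lower().endswith(LOW_VALUE_SUFFIXES):
            continue          # static asset: not needed for this analysis
        yield {"timestamp": timestamp, "customer_id": customer_id, "path": path}


sample_log = [
    "2012-03-01T10:00:00 c42 /products/laptop-bundle",
    "2012-03-01T10:00:01 c42 /static/site.css",
    "2012-03-01T10:00:01 c42 /static/logo.png",
    "garbled line",
    "2012-03-01T10:02:17 c42 /cart/add",
]

kept = list(relevant_records(sample_log))
print(len(kept), "of", len(sample_log), "lines kept")  # 2 of 5 lines kept
```

Because the filter is a generator, it processes one line at a time and never needs the whole log in memory, which suits a capture-everything feed where most lines are discarded.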

HOW IS BIG DATA MORE OF THE SAME?

As with any new topic getting a lot of attention, there are all sorts of claims about how big data is going to fundamentally change everything about how analysis is done and how it is used. If you take the time to think about it, however, it really isn’t the case. It is an example where the hype is going beyond the reality.
The fact that big data is big and poses scalability issues isn’t new. Most new data sources were considered big and difficult when they first came into use. Big data is just the next wave of new, bigger data that pushes current limits. Analysts were able to tame past data sources, given the constraints at the time, and big data will be tamed as well. After all, analysts have been at the forefront of exploring new data sources for a long time. That’s going to continue.
Who first started to analyze call detail records within telecom companies? Analysts did. I was doing churn analysis against mainframe tapes at my first job. At the time, the data was mind-bogglingly big. Who first started digging into retail point-of-sale data to figure out what nuggets it held? Analysts did. Originally, the thought of analyzing data about tens to hundreds of thousands of products across thousands of stores was considered a huge problem. Today, not so much.
The analytical professionals who first dipped their toe into such sources were dealing with what at the time were unthinkably large amounts of data. They had to figure out how to analyze it and make use of it within the constraints in place at the time. Many people doubted it was possible, and some even questioned the value of such data. That sounds a lot like big data today, doesn’t it?
Big data really isn’t going to change what analytic professionals are trying to do or why they are doing it. Even as some begin to define themselves as data scientists, rather than analysts, the goals and objectives are the same. Certainly the problems addressed will evolve with big data, just as they have always evolved. But at the end of the day, analysts and data scientists will simply be exploring new and unthinkably large data sets to uncover valuable trends and patterns as they have always done. For the purposes o...

Table of contents

  1. Cover
  2. Additional praise for Taming the Big Data Tidal Wave
  3. Wiley & SAS Business Series
  4. Title page
  5. Copyright page
  6. Dedication
  7. Foreword
  8. Preface
  9. Acknowledgments
  10. PART ONE: The Rise of Big Data
  11. PART TWO: Taming Big Data: The Technologies, Processes, and Methods
  12. PART THREE: Taming Big Data: The People and Approaches
  13. PART FOUR: Bringing It Together: The Analytics Culture
  14. Conclusion: Think Bigger!
  15. About the Author
  16. Index