Avoiding Data Pitfalls
eBook - ePub

Avoiding Data Pitfalls

How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations

Ben Jones

About This Book

Avoid data blunders and create truly useful visualizations

Avoiding Data Pitfalls is a reputation-saving handbook for those who work with data, designed to help you avoid the all-too-common blunders that occur in data analysis, visualization, and presentation. Plenty of data tools exist, along with plenty of books that tell you how to use them—but unless you truly understand how to work with data, each of these tools can ultimately mislead and cause costly mistakes. This book walks you step by step through the full data visualization process, from calculation and analysis through accurate, useful presentation. Common blunders are explored in depth to show you how they arise, how they have become so common, and how you can avoid them from the outset. Then and only then can you take advantage of the wealth of tools that are out there—in the hands of someone who knows what they're doing, the right tools can cut down on the time, labor, and myriad decisions that go into each and every data presentation.

Workers in almost every industry are now commonly expected to analyze and present data effectively, even with little or no formal training. There are many pitfalls (some might say chasms) in the process, and no one wants to be the source of a data error that costs money or even lives. This book provides a full walk-through of the process to help you ensure a truly useful result.

  • Delve into the "data-reality gap" that grows with our dependence on data
  • Learn how the right tools can streamline the visualization process
  • Avoid common mistakes in data analysis, visualization, and presentation
  • Create and present clear, accurate, effective data visualizations

To err is human, but in today's data-driven world, the stakes can be high and the mistakes costly. Don't rely on "catching" mistakes; avoid them from the outset with the expert instruction in Avoiding Data Pitfalls.

Information

Publisher: Wiley
Year: 2019
ISBN: 9781119278177
Edition: 1

Chapter One
The Seven Types of Data Pitfalls

“You need to give yourself permission to be human.”
Joyce Brothers
Data pitfalls. Anyone who has worked with data has fallen into them many, many times. I certainly have. It's as if we've used data to pave the way for a better future, but the road we've made is filled with craters we just don't seem to notice until we're at the bottom looking up. Sometimes we fall into them and don't even know it. Finding out about it much later can be quite humbling.
If you've worked with data before, you know the feeling. You're giving an important presentation, your data is insightful beyond belief, your charts and graphs are impeccable and Tufte-compliant, the build to your grand conclusion is unassailable and awe-inspiring. And then that one guy in the back of the room – the guy with folded arms and furrowed brow – waits until the very end to ask you if you're aware that the database you're working with is fundamentally flawed, pulling the rug right out from underneath you, and plunging you to the bottom of yet another data pitfall. It's enough to make a poor data geek sweat bullets.
The nature of data pitfalls is that we have a particular blindness to them. It makes sense if you think about it. The human race hasn't needed to work with billions of records of data in the form of zeros and ones until the second half of the last century. Just a couple of decades later, though, our era is characterized by an ever-increasing abundance of data and a growing array of incredibly powerful tools. In many ways, our brains just haven't quite caught up yet.
These data pitfalls don't doom our every endeavor, though. Far from it. We've accomplished great things in this new era of data. We've mapped the human genome and begun to understand the complexity of the human brain, how its neurons interact so as to stimulate cognition. We've charted vast galaxies out there and we've come to a better understanding of geological and meteorological patterns right here on our own planet. Even in the simpler endeavors of life like holiday shopping, recommendation engines on e-commerce sites have evolved to be incredibly helpful. Our successes with data are too numerous to list.
But our slipups with data are mounting as well. Misuse of data has led to great harm and loss. From the colossal failure of Wall Street quants and their models in the financial crisis of the previous decade to the parable of Google Flu Trends and its lesson in data-induced hubris,[1] our use of data isn't always so successful. In fact, sometimes it's downright disastrous.
Why is that? Simply because we have a tendency to make certain kinds of mistakes time and time again. Noticing those mistakes early in the process is quite easy – just as long as it's someone else who's making them. When I'm the one committing the blunder, it seems I don't find out until that guy in the back of the room launches his zinger.
And like our good friend and colleague, we're all quite adept at spotting the screw-ups of other people, aren't we? I had an early lesson in this haphazard trade. In my seventh-grade science fair exhibition, a small group of budding student scientists had a chance to walk around with the judges and explain our respective science fair projects while the other would-be blue-ribbon winners listened along. The judges, wanting to encourage dialogue and inquisitiveness, encouraged the students to also ask questions after each presentation. In spite of the noble intention behind this prompting, we basically just used the opportunity to poke holes in the methods and analysis of our competition. Kids can be cruel.
I don't do science fair projects anymore, unlike many other parents at my sons' schools, but I do work with data a lot. And I work with others who work with data a lot, too. In all of my data wrangling, data remixing, data analyzing, data visualizing, and data surmising, I've noticed that there are specific types of pitfalls that exist on the road to data paradise.
In fact, in my experience, I've found that the pitfalls we fall into can be grouped into one of seven categories.

Seven Types of Data Pitfalls

Pitfall 1: Epistemic Errors: How We Think About Data

What can data tell us? Maybe even more importantly, what can't it tell us? Epistemology is the field of philosophy that deals with the theory of knowledge – what's a reasonable belief versus what is just opinion. We often approach data with the wrong mind-set and assumptions, leading to errors all along the way, regardless of what chart type we choose, such as:
  • Assuming that the data we are using is a perfect reflection of reality
  • Forming conclusions about the future based on historical data only
  • Seeking to use data to verify a previously held belief rather than to test it to see whether it's actually false
Avoiding epistemic errors and making sure we are thinking clearly about what's reasonable and what's unreasonable is an important foundation for successful data analysis.

Pitfall 2: Technical Traps: How We Process Data

Once we've decided to use data to help solve a particular problem, we have to gather it, store it, join it with other data sets, transform it, clean it up, and get it in the right shape. Doing so can result in:
  • Dirty data with mismatching category levels and data entry typos
  • Units of measurement or date fields that aren't consistent or compatible
  • Bringing together disparate data sets and getting nulls or duplicated rows that skew analysis
These steps can be complex and messy, but accurate analysis depends on doing them right. Sometimes the truth contained within data gets “lost in translation,” and it's possible to plow ahead and make decisions without even knowing we're dealing with a seriously flawed data set.
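
To make the join problem concrete, here is a minimal sketch in plain Python. The orders, the customer lookup table, and the `left_join` helper are all hypothetical, invented for illustration only. The lookup has one duplicated key and one missing key, which is enough to produce both a duplicated row and a null after the join.

```python
# Hypothetical data: three orders and a customer lookup table.
orders = [
    {"order_id": 1, "customer_id": "A", "amount": 100.0},
    {"order_id": 2, "customer_id": "B", "amount": 50.0},
    {"order_id": 3, "customer_id": "C", "amount": 25.0},
]
customers = [
    {"customer_id": "A", "region": "East"},
    {"customer_id": "A", "region": "east"},  # duplicate key caused by a data entry variant
    {"customer_id": "B", "region": "West"},
    # customer "C" is missing from the lookup entirely
]


def left_join(left, right, key):
    """Naive left join: one output row per matching right row,
    or a single row padded with None when nothing matches."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_fields = sorted({f for row in right for f in row if f != key})

    joined = []
    for lrow in left:
        matches = index.get(lrow[key])
        if matches:
            for rrow in matches:
                joined.append({**lrow, **rrow})
        else:
            joined.append({**lrow, **{f: None for f in right_fields}})
    return joined


joined = left_join(orders, customers, "customer_id")

# Simple sanity checks that expose the problem before any analysis is done.
total_before = sum(r["amount"] for r in orders)
total_after = sum(r["amount"] for r in joined)
extra_rows = len(joined) - len(orders)
unmatched = [r["order_id"] for r in joined if r["region"] is None]

print(f"rows: {len(orders)} -> {len(joined)} ({extra_rows} extra)")
print(f"total amount: {total_before} -> {total_after}")
print(f"orders with no region: {unmatched}")
```

With this made-up data the join turns three orders into four rows, the summed amount rises from 175.0 to 275.0, and order 3 ends up with no region. Comparing row counts and totals before and after a join is one inexpensive way to notice this kind of problem early.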

Pitfall 3: Mathematical Miscues: How We Calculate Data

Working with data almost always involves calculations – doing math with the quantitative data we have at our disposal:
  • Summing at various levels of aggregation
  • Calculating rates or ratios
  • Working with proportions and percentages
  • Dealing with different units
These are just a few examples of how we take data fields that exist and create new data fields out of them. Just like in grade school, it's very possible to get the math wrong. These mistakes can be quite costly – an error of this type led to the loss of a $125 million Mars orbiter in 1999.[2] That was more like falling into a black hole than a pitfall.
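
Two of the calculations listed above can be sketched in a few lines of Python. The store figures, trip legs, and variable names are hypothetical and chosen only to make the arithmetic easy to follow. The first part contrasts averaging rates with computing a rate from summed totals. The second contrasts adding raw numbers recorded in different units with converting them to a common unit first.

```python
# Part 1: rates at different levels of aggregation (hypothetical store data).
stores = [
    {"store": "Small", "conversions": 9, "visitors": 10},
    {"store": "Large", "conversions": 100, "visitors": 1000},
]

# Averaging the per-store rates treats both stores as equally important.
rates = [s["conversions"] / s["visitors"] for s in stores]
mean_of_rates = sum(rates) / len(rates)

# Dividing summed conversions by summed visitors weights each visitor equally.
overall_rate = (
    sum(s["conversions"] for s in stores) / sum(s["visitors"] for s in stores)
)

print(f"mean of store rates: {mean_of_rates:.3f}")
print(f"overall rate:        {overall_rate:.3f}")

# Part 2: mixed units (hypothetical trip legs).
KM_PER_MILE = 1.609344  # exact, by definition of the international mile

legs = [
    {"distance": 10.0, "unit": "km"},
    {"distance": 10.0, "unit": "mi"},
]

# Adding the raw numbers silently mixes kilometers and miles.
naive_total = sum(leg["distance"] for leg in legs)

# Converting everything to kilometers first gives a meaningful total.
total_km = sum(
    leg["distance"] * KM_PER_MILE if leg["unit"] == "mi" else leg["distance"]
    for leg in legs
)

print(f"naive total:  {naive_total:.2f} (unit unclear)")
print(f"total in km:  {total_km:.2f}")
```

In this sketch the mean of the two store rates is about 0.5, while the overall rate is about 0.108. Neither number is wrong in itself. They answer different questions, and the miscue is reporting one when the other was meant.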

Pitfall 4: Statistical Slipups: How We Compare Data

“There are lies, damned lies, and statistics.” This saying usually implies that someone is fudging the numbers to mislead others, but we can just as often be lying to ourselves when it comes to statistics. Whether we're talking about descriptive or inferential statistics, the pitfalls abound:
  • Are the measures of central tendency or variation that we're using leading us astray?
  • Are the samples we're working with representative of the population we wish to study?
  • Are the means of comparison we're using valid and statistically sound?
These pitfalls are numerous and particularly hard to spot on the horizon, because they deal with a way of thinki...
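
As a small illustration of the first question in the list above, the following sketch uses Python's standard `statistics` module on a hypothetical set of salaries that includes one extreme value. The figures are invented for illustration and do not come from the book.

```python
import statistics

# Hypothetical annual salaries in thousands of dollars; the last is an outlier.
salaries = [40.0, 42.0, 45.0, 47.0, 50.0, 400.0]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Count how many people earn less than the "average" salary.
below_mean = sum(1 for s in salaries if s < mean_salary)

print(f"mean:   {mean_salary}")
print(f"median: {median_salary}")
print(f"{below_mean} of {len(salaries)} salaries fall below the mean")
```

Here the mean is 104.0 while the median is 46.0, and five of the six salaries fall below the mean. Reporting only the mean as the "typical" salary would describe almost nobody in this group.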
