Avoiding Data Pitfalls

How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations

Ben Jones

About the book

Avoid data blunders and create truly useful visualizations

Avoiding Data Pitfalls is a reputation-saving handbook for those who work with data, designed to help you avoid the all-too-common blunders that occur in data analysis, visualization, and presentation. Plenty of data tools exist, along with plenty of books that tell you how to use them—but unless you truly understand how to work with data, each of these tools can ultimately mislead and cause costly mistakes. This book walks you step by step through the full data visualization process, from calculation and analysis through accurate, useful presentation. Common blunders are explored in depth to show you how they arise, how they have become so common, and how you can avoid them from the outset. Then and only then can you take advantage of the wealth of tools that are out there—in the hands of someone who knows what they're doing, the right tools can cut down on the time, labor, and myriad decisions that go into each and every data presentation.

Workers in almost every industry are now commonly expected to effectively analyze and present data, even with little or no formal training. There are many pitfalls (some might say chasms) in the process, and no one wants to be the source of a data error that costs money or even lives. This book provides a full walk-through of the process to help you ensure a truly useful result.

  • Delve into the "data-reality gap" that grows with our dependence on data
  • Learn how the right tools can streamline the visualization process
  • Avoid common mistakes in data analysis, visualization, and presentation
  • Create and present clear, accurate, effective data visualizations

To err is human, but in today's data-driven world, the stakes can be high and the mistakes costly. Don't rely on "catching" mistakes; avoid them from the outset with the expert instruction in Avoiding Data Pitfalls.

Information

Publisher: Wiley
Year: 2019
ISBN: 9781119278177
Edition: 1
Category: Business

Chapter One
The Seven Types of Data Pitfalls

“You need to give yourself permission to be human.”
Joyce Brothers
Data pitfalls. Anyone who has worked with data has fallen into them many, many times. I certainly have. It's as if we've used data to pave the way for a better future, but the road we've made is filled with craters we just don't seem to notice until we're at the bottom looking up. Sometimes we fall into them and don't even know it. Finding out about it much later can be quite humbling.
If you've worked with data before, you know the feeling. You're giving an important presentation, your data is insightful beyond belief, your charts and graphs are impeccable and Tufte-compliant, the build to your grand conclusion is unassailable and awe-inspiring. And then that one guy in the back of the room – the guy with folded arms and furrowed brow – waits until the very end to ask you if you're aware that the database you're working with is fundamentally flawed, pulling the rug right out from underneath you, and plunging you to the bottom of yet another data pitfall. It's enough to make a poor data geek sweat bullets.
The nature of data pitfalls is that we have a particular blindness to them. It makes sense if you think about it. The human race hasn't needed to work with billions of records of data in the form of zeros and ones until the second half of the last century. Just a couple of decades later, though, our era is characterized by an ever-increasing abundance of data and a growing array of incredibly powerful tools. In many ways, our brains just haven't quite caught up yet.
These data pitfalls don't doom our every endeavor, though. Far from it. We've accomplished great things in this new era of data. We've mapped the human genome and begun to understand the complexity of the human brain, how its neurons interact so as to stimulate cognition. We've charted vast galaxies out there and we've come to a better understanding of geological and meteorological patterns right here on our own planet. Even in the simpler endeavors of life like holiday shopping, recommendation engines on e-commerce sites have evolved to be incredibly helpful. Our successes with data are too numerous to list.
But our slipups with data are mounting as well. Misuse of data has led to great harm and loss. From the colossal failure of Wall Street quants and their models in the financial crisis of the previous decade to the parable of Google Flu Trends and its lesson in data-induced hubris [1], our use of data isn't always so successful. In fact, sometimes it's downright disastrous.
Why is that? Simply because we have a tendency to make certain kinds of mistakes time and time again. Noticing those mistakes early in the process is quite easy – just as long as it's someone else who's making them. When I'm the one committing the blunder, it seems I don't find out until that guy in the back of the room launches his zinger.
And like our good friend and colleague, we're all quite adept at spotting the screw-ups of other people, aren't we? I had an early lesson in this haphazard trade. In my seventh-grade science fair exhibition, a small group of budding student scientists had a chance to walk around with the judges and explain our respective science fair projects while the other would-be blue-ribbon winners listened along. The judges, wanting to encourage dialogue and inquisitiveness, invited the students to ask questions after each presentation as well. In spite of the noble intention behind this prompting, we basically just used the opportunity to poke holes in the methods and analysis of our competition. Kids can be cruel.
I don't do science fair projects anymore, unlike many other parents at my sons' schools, but I do work with data a lot. And I work with others who work with data a lot, too. In all of my data wrangling, data remixing, data analyzing, data visualizing, and data surmising, I've noticed that there are specific types of pitfalls that exist on the road to data paradise.
In fact, in my experience, I've found that each of the pitfalls we fall into belongs to one of seven categories.

Seven Types of Data Pitfalls

Pitfall 1: Epistemic Errors: How We Think About Data

What can data tell us? Maybe even more importantly, what can't it tell us? Epistemology is the field of philosophy that deals with the theory of knowledge – what's a reasonable belief versus what is just opinion. We often approach data with the wrong mind-set and assumptions, and regardless of what chart type we choose, this leads to errors all along the way, such as:
  • Assuming that the data we are using is a perfect reflection of reality
  • Forming conclusions about the future based on historical data only
  • Seeking to use data to verify a previously held belief rather than to test it to see whether it's actually false
Avoiding epistemic errors and making sure we are thinking clearly about what's reasonable and what's unreasonable is an important foundation for successful data analysis.

Pitfall 2: Technical Traps: How We Process Data

Once we've decided to use data to help solve a particular problem, we have to gather it, store it, join it with other data sets, transform it, clean it up, and get it in the right shape. Doing so can result in:
  • Dirty data with mismatching category levels and data entry typos
  • Units of measurement or date fields that aren't consistent or compatible
  • Nulls or duplicated rows that appear when disparate data sets are brought together and that skew analysis
These steps can be complex and messy, but accurate analysis depends on doing them right. Sometimes the truth contained within data gets “lost in translation,” and it's possible to plow ahead and make decisions without even knowing we're dealing with a seriously flawed data set.
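As a minimal sketch of how a join can silently distort totals, the Python example below uses invented order and customer records (every name and value here is hypothetical, not taken from the book). It assumes a simple left join on a customer name, with one typo in the orders and one duplicated row in the lookup table, and then checks the result for the symptoms listed above.

```python
from collections import Counter

# Hypothetical data: one row per order, plus a customer lookup table.
orders = [
    {"order_id": 1, "customer": "acme", "amount": 100.0},
    {"order_id": 2, "customer": "Acme ", "amount": 50.0},    # entry typo: case and trailing space
    {"order_id": 3, "customer": "globex", "amount": 75.0},
    {"order_id": 4, "customer": "initech", "amount": 20.0},  # has no match in the lookup
]
customers = [
    {"customer": "acme", "region": "West"},
    {"customer": "acme", "region": "West"},   # duplicated key in the lookup table
    {"customer": "globex", "region": "East"},
]


def normalize(name):
    """Trim whitespace and lowercase so that 'Acme ' and 'acme' compare equal."""
    return name.strip().lower()


def left_join(left, right, key):
    """Naive left join: each matching right-hand row yields one output row."""
    joined = []
    for left_row in left:
        matches = [r for r in right if normalize(r[key]) == normalize(left_row[key])]
        if not matches:
            # Unmatched rows are kept, with a null in the joined column.
            joined.append({**left_row, "region": None})
        for right_row in matches:
            joined.append({**left_row, "region": right_row["region"]})
    return joined


joined = left_join(orders, customers, "customer")

# Sanity checks that expose the damage done by the join.
total_before = sum(o["amount"] for o in orders)
total_after = sum(row["amount"] for row in joined)
id_counts = Counter(row["order_id"] for row in joined)
duplicated_ids = [order_id for order_id, n in id_counts.items() if n > 1]
null_regions = [row["order_id"] for row in joined if row["region"] is None]

print(total_before, total_after)  # 245.0 395.0
print(duplicated_ids)             # [1, 2]
print(null_regions)               # [4]
```

Comparing row counts and totals before and after a join is a cheap check; here the duplicated lookup row inflates revenue from 245.0 to 395.0 without raising any error.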

Pitfall 3: Mathematical Miscues: How We Calculate Data

Working with data almost always involves calculations – doing math with the quantitative data we have at our disposal:
  • Summing at various levels of aggregation
  • Calculating rates or ratios
  • Working with proportions and percentages
  • Dealing with different units
These are just a few examples of how we take data fields that exist and create new data fields out of them. Just like in grade school, it's very possible to get the math wrong. These mistakes can be quite costly – an error of this type led to the loss of a $125 million Mars orbiter in 1999 [2]. That was more like falling into a black hole than a pitfall.
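Two of the calculations listed above can be sketched in a few lines of Python. The example is illustrative only: the regions, visitor counts, and impulse readings are invented, and the unit labels are assumptions chosen for the demonstration. The first part contrasts averaging rates with computing one rate from summed totals; the second shows what happens when values in different units are added without conversion.

```python
# Part 1: rates at different levels of aggregation (hypothetical numbers).
regions = [
    {"name": "North", "visitors": 100, "purchases": 50},      # 50% conversion
    {"name": "South", "visitors": 10_000, "purchases": 500},  # 5% conversion
]


def rate(purchases, visitors):
    """Conversion rate as a fraction between 0 and 1."""
    return purchases / visitors


# Miscue: a plain average of the regional rates ignores region size.
mean_of_rates = sum(rate(r["purchases"], r["visitors"]) for r in regions) / len(regions)

# Sounder: sum numerators and denominators first, then divide once.
overall_rate = rate(
    sum(r["purchases"] for r in regions),
    sum(r["visitors"] for r in regions),
)

print(round(mean_of_rates, 4))  # 0.275
print(round(overall_rate, 4))   # 0.0545

# Part 2: mixed units (hypothetical readings tagged with a unit label).
LBF_S_TO_N_S = 4.4482216152605  # newton-seconds per pound-force second

readings = [(10.0, "N*s"), (10.0, "lbf*s")]


def to_newton_seconds(value, unit):
    """Convert an impulse reading to newton-seconds, rejecting unknown units."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_S_TO_N_S
    raise ValueError(f"unknown unit: {unit}")


naive_total = sum(value for value, _ in readings)  # silently mixes units
converted_total = sum(to_newton_seconds(v, u) for v, u in readings)

print(naive_total)                # 20.0
print(round(converted_total, 3))  # 54.482
```

Carrying the unit alongside each value, and refusing to proceed on an unknown unit, turns a silent error into a loud one.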

Pitfall 4: Statistical Slipups: How We Compare Data

“There are lies, damned lies, and statistics.” This saying usually implies that someone is fudging the numbers to mislead others, but we can just as often be lying to ourselves when it comes to statistics. Whether we're talking about descriptive or inferential statistics, the pitfalls abound:
  • Are the measures of central tendency or variation that we're using leading us astray?
  • Are the samples we're working with representative of the population we wish to study?
  • Are the means of comparison we're using valid and statistically sound?
These pitfalls are numerous and particularly hard to spot on the horizon, because they deal with a way of thinki...
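The first question in the list above, whether a measure of central tendency is leading us astray, can be illustrated with a small Python sketch. The salary figures are invented for the example, and the choice of mean versus median is an assumption about which measures are being compared.

```python
import statistics

# Hypothetical salaries: five typical values and one extreme outlier.
salaries = [40_000, 42_000, 45_000, 47_000, 50_000, 1_000_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle of the sorted values

# How many people actually earn at least the "average" salary?
at_or_above_mean = sum(1 for s in salaries if s >= mean_salary)

print(mean_salary)       # 204000
print(median_salary)     # 46000.0
print(at_or_above_mean)  # 1
```

With a single outlier in the data, the mean describes almost nobody in the group, while the median stays close to the typical value.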
