Data Science Without Makeup
eBook - ePub

Data Science Without Makeup

A Guidebook for End-Users, Analysts, and Managers

Mikhail Zhilkin

  1. 178 pages
  2. English
  3. ePUB (mobile-friendly)
  4. Available on iOS and Android

About this book

Mikhail Zhilkin, a data scientist who has worked on projects ranging from Candy Crush games to Premier League football players' physical performance, shares his strong views on some of the best and, more importantly, worst practices in data analytics and business intelligence. Why data science is hard, what pitfalls analysts and decision-makers fall into, and what everyone involved can do to give themselves a fighting chance—the book examines these and other questions with the skepticism of someone who has seen the sausage being made.

Honest and direct, full of examples from real life, Data Science Without Makeup: A Guidebook for End-Users, Analysts and Managers will be of great interest to people who aspire to work with data, people who already work with data, and people who work with people who work with data—from students to professional researchers and from early-career to seasoned professionals.

Mikhail Zhilkin is a data scientist at Arsenal FC. He has previously worked on the popular Candy Crush mobile games and in sports betting.

Information

Year
2021
ISBN
9781000464856

II

a new hope

DOI: 10.1201/9781003057420-5
Just because data science is hard, and our brain was not exactly designed to do it, we should not stop trying our best. The hype around data science is not entirely unfounded. When done right, it can, indeed, transform businesses and disrupt industries (and even create new ones). “When” is the operative word here.
A lot of the material on how to do data science the right way falls into two categories:
  • Articles full of truisms, such as “Data quality is important,” which you cannot and will not argue with, but which leave you none the wiser as to how to actually do things better.
  • Books and workshops on tools and workflows that enable one to do data analysis, but do not necessarily cover the chasm between using a tool and achieving the desired result.
This book tries to fill in the blanks. Online courses and bootcamps can prepare you for the role of a junior data scientist; saying things like “Deliver relevant results” and “Empower business stakeholders” may work when interviewing for a managerial position; but we will focus on the stuff in between—the magical transformation of data science efforts into something useful.
This part of the book will deal with three major areas:
  • data science for people, not for its own sake (doing the right thing),
  • quality assurance (doing it correctly),
  • automation (never having to do it again).
It will all be essentially about data science best practices.
Just like any best practices, rules of thumb and other kinds of guidelines, data science best practices do not emerge from nowhere. It is important to understand that the process is as follows:
  1. Practitioners try to solve a certain problem.
  2. Some approaches work better than others.
  3. Success and failure stories are distilled into dos and don’ts—a best practice is born.
When writing about best practices, it is difficult to avoid words like “should” and “must.” The reader would do well to remember that there are no sacred texts in data science. And even if there were, this would not be one of them.
“Should” and “must” are only convenient shortcuts. “Thou shalt not lie” is easier to say and remember than “Telling lies is detrimental to one’s character and destroys mutual trust, which is a crucial resource for a group of people with a shared goal.” Similarly, “Always comment your code” is a shortcut for “Commenting your code will make it easier for you and others to maintain it and, if necessary, reuse it in the future.”
If someone tells you that you should do this, or that you must never do that, and you are not sure where they are coming from, it is a good idea to ask: “What will happen if I do? What if I don’t?” Writing about best practices, I will try to make it clear what good things will happen if you follow them, and what bad things will happen if you don’t.
There are no hard rules beyond the laws of physics (and even they are just our best guess for the moment), but experience shows that it is better to start out with known best practices and only deviate from them once you know the ins and outs.

4

data science for people

DOI: 10.1201/9781003057420-6
When discussing data science best practices, it is important to note that there is a hierarchy to them. A best practice never exists without context, and for it to make sense, one or more higher-level best practices may need to be in place first.
For example, there is no point in arguing which machine learning technique to use if the data the model will be trained on is rubbish. Thus, data-related best practices come before those specific to machine learning. Improving the data will have a bigger impact on model accuracy than picking a more sophisticated machine learning algorithm.
This chapter will attempt to outline the most general of data science best practices in a meaningful order:
  1. Align data science efforts with business needs.
  2. Mind the data science hierarchy of needs.
  3. Make it simple, reproducible, and shareable.
I do not know about you, but when I am looking at these I cannot help thinking, “Aren’t they all extremely obvious?” Who does not want to align data science efforts with business needs? Who wants to make it unnecessarily complicated? Who does not want to automate everything that can be automated, and save time and money? But then, if these principles were adhered to by—not even all—most organizations, I would not feel the urge to write this book in the first place.
Let’s go through these best practices one by one and try to understand why they are ignored more often than not.

align data science efforts with business needs

In any organization that aspires to be data-driven, the first thing to look at is the alignment of data science efforts with business needs. This may sound obvious, but I have been in and observed situations where data science efforts were primarily driven not by what the business needed, but by one or both of the following:
  • what data scientists wanted to do,
  • what people working with data scientists thought the business needed.
Let’s address the first one: data scientists doing what they want rather than what the business needs.
As science is concerned with seeking truth, data science is concerned with seeking truth in data. The two main reasons to seek truth are:
  1. Curiosity: you want to understand something for the sake of understanding. This is what often drives data scientists. They want to do an exploratory analysis, run an A/B test or master a new tool not because it will necessarily create business value, but simply because they are curious.
  2. Pragmatism: whatever your goal, you can get closer to it by better understanding the domain. In the case of a business, you may, for example, hope to increase revenue by better understanding your customers’ needs and behaviors.
A fundamental challenge of creating a data-driven organization is the marriage of these two: curious people working towards pragmatic goals. The optimal proportion of curiosity and pragmatism will vary from company to company. A research data scientist working on pushing the boundaries of deep learning may do well to be 95% curious and 5% pragmatic, whereas a business analyst supporting a small chain of retail stores is likely to benefit from being only 5% curious and 95% pragmatic. Data analysts in most companies will be at their most productive when combining curiosity and pragmatism in reasonable proportions.
The vast majority of data analysts I have had the privilege to work with had enough curiosity for two people. Some were more interested in statistical analysis, some in writing efficient code, others in building data pipelines, but all of them had a pre-existing interest in doing something data-related for its own sake.
The same could not be said about every analyst’s passion for meeting quarterly business objectives. Most of them, especially those just breaking into the field, would look for an interesting project first and think about its value for the business later, if ever. In an ideal world, it would be the other way around.
This challenge is best addressed top down: a business-minded data science team manager will have a shot at aligning less business-minded data scientists and making sure they deliver business value. It is difficult to imagine a bottom-up approach succeeding.
I have personally worked with a variety of data science managers, with widely varying degrees of business-mindedness and tech-savviness. My experience is that a manager who understands business needs but knows nothing about data science will generally outperform a tech-savvy manager of comparable general intelligence who has lost touch with business goals.
In a smaller data science organization, it can be the data scientists themselves who determine the overall direction of research.
I once got a question from a data analyst who had just joined a sports club. He wanted my input on how to start off with the data and what questions he should be looking to answer.
While I did my best to answer in a friendly and constructive manner, I could not help thinking, “I am not the person you should be talking to. Your job is to help people running the club. Ask them what they need, not me.”
For a data scientist, it may be useful to know what your peers in other companies work on. If they happen to have solved a problem similar to one you are working on, you may be able to learn from their experience (and you can certainly learn from their mistakes). However, at the end of the day, everything you do, you do in the context of your organization, and you are best positioned to find out what needs to be done. That is arguably the most important part of your job. You cannot outsource understanding the needs of the business.
One well-known management methodology that can help align what the data science team does with what the business needs is objectives and key results (OKR). The idea behind this goal-setting framework, popularized by Google, is to ensure that the company focuses efforts on the same important issues throughout the organization. When OKR is applied correctly, anything a data scientist (or any employee, for that matter) does should be connected to an overarching company objective. Conversely, if a task cannot be connected to such an objective, it can and should be dropped.
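The rule that every task must trace back to a company objective can be sketched in a few lines of Python. This is only an illustration: the `Task` class, the `split_by_alignment` helper, and the objectives and tasks listed are all hypothetical, invented here rather than taken from any OKR tool or from a real organization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    # Name of the company objective this task supports, if any (hypothetical field).
    objective: Optional[str]


def split_by_alignment(tasks, company_objectives):
    """Separate tasks tied to a known company objective from those that are not."""
    aligned, unaligned = [], []
    for task in tasks:
        if task.objective in company_objectives:
            aligned.append(task)
        else:
            unaligned.append(task)
    return aligned, unaligned


# Hypothetical objectives and tasks, made up purely for illustration.
objectives = {"Increase season-ticket renewals", "Reduce player injuries"}
tasks = [
    Task("Churn model for ticket holders", "Increase season-ticket renewals"),
    Task("Training-load dashboard", "Reduce player injuries"),
    Task("Try a new deep learning library", None),
]

aligned, unaligned = split_by_alignment(tasks, objectives)
print("Keep:", [t.name for t in aligned])
print("Question or drop:", [t.name for t in unaligned])
```

The check itself is trivial. The hard part, which no code can do, is filling in the `objective` field honestly.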
Unfortunately, as often happens with methodologies and frameworks, OKR can be applied in theory while being very much ignored in practice. A certain kind of cargo cult takes place: meetings are held, presentations are shown around, to-do lists are created, but when the dust settles, it is business as usual, with people doing what they have always been doing.
Without a change in organization culture and everyone’s mindset, a management framework is just a yoga mat that was bought and put away in the loft.