Managing Data Science
eBook - ePub

Effective strategies to manage data science projects and build a sustainable team

Kirill Dubovikov

  1. 290 pages
  2. English

About This Book

Understand data science concepts and methodologies to manage and deliver top-notch solutions for your organization

Key Features

  • Learn the basics of data science and explore its possibilities and limitations
  • Manage data science projects and assemble teams effectively even in the most challenging situations
  • Understand management principles and approaches for data science projects to streamline the innovation process

Book Description

Data science and machine learning can transform any organization and unlock new opportunities. However, employing the right management strategies is crucial to guide the solution from prototype to production. Traditional approaches often fail as they don't entirely meet the conditions and requirements necessary for current data science projects. In this book, you'll explore the right approach to data science project management, along with useful tips and best practices to guide you along the way.

After understanding the practical applications of data science and artificial intelligence, you'll see how to incorporate them into your solutions. Next, you will go through the data science project life cycle, explore the common pitfalls encountered at each step, and learn how to avoid them. Any data science project requires a skilled team, and this book will offer the right advice for hiring and growing a data science team for your organization. Later, you'll be shown how to efficiently manage and improve your data science projects through the use of DevOps and ModelOps.

By the end of this book, you will be well versed in various data science solutions and have gained practical insights into tackling the different challenges that you'll encounter on a daily basis.

What you will learn

  • Understand the underlying problems of building a strong data science pipeline
  • Explore the different tools for building and deploying data science solutions
  • Hire, grow, and sustain a data science team
  • Manage data science projects through all stages, from prototype to production
  • Learn how to use ModelOps to improve your data science pipelines
  • Get up to speed with the model testing techniques used in both development and production stages

Who this book is for

This book is for data scientists, analysts, and program managers who want to use data science for business productivity by incorporating data science workflows efficiently. Some understanding of basic data science concepts will be useful to get the most out of this book.

Information

Year
2019
ISBN
9781838824563

Section 1: What is Data Science?

Before diving into the management issues of building systems around machine learning algorithms, we need to explore the topic of data science itself. What are the main concepts behind data science and machine learning? How do you build and test a model? What are the common pitfalls in this process? What kinds of models are there? What tasks can we solve using machine learning?
This section contains the following chapters:
  • Chapter 1, What You Can Do with Data Science
  • Chapter 2, Testing Your Models
  • Chapter 3, Understanding AI

What You Can Do with Data Science

I once told a friend who works as a software developer about one of the largest European data science conferences. He showed genuine interest and asked whether we could go together. "Sure," I said. "Let's broaden our knowledge together. It will be great to talk to you about machine learning." Several days later, we were sitting in the middle of a large conference hall. The first speaker came on stage and told us about some technical tricks he had used to win several data science competitions. When the next speaker talked about tensor algebra, I noticed a weary look in my friend's eyes.
"What's up?" I asked.
"I'm just wondering when they'll show us the robots."
To avoid having incorrect expectations, we need to inform ourselves. Before building a house, you'd better know how a hammer works. Having basic knowledge of the domain you manage is vital for any kind of manager. A software development manager needs to understand computer programming. A factory manager needs to know the manufacturing processes. A data science manager is no exception. The first part of this book gives simple explanations of the main concepts behind data science. We will dissect and explore it bit by bit.
Data science has become popular, and many business people and technical professionals have an increasing interest in understanding data science and applying it to solve their problems. People often form their first opinions about data science from information they pick up in the background: news sites, social networks, and so on. Unfortunately, most of those sources mislead rather than give a realistic picture of data science and machine learning.
Instead of explaining, the media describes ultimate magical tools that easily solve all our problems. The technological singularity is near. A universal income economy is around the corner. Well, that would be true only if machines learned and thought like humans. In fact, we are far from creating general-purpose, self-learning, and self-improving algorithms.
This chapter explores current possibilities and modern applications of the main tools of data science: machine learning and deep learning.
In this chapter, we will cover the following topics:
  • Defining AI
  • Introduction to machine learning
  • Introduction to deep learning
  • Deep learning use case
  • Introduction to causal inference

Defining AI

Media and news use AI as a substitute buzzword for any technology related to data analysis. In fact, AI is a sub-field of computer science and mathematics. It all started in the 1950s, when several researchers started asking whether computers can learn, think, and reason. Roughly 70 years later, we still do not know the answer. However, we have made significant progress in a specific kind of AI that solves thoroughly specified narrow tasks: weak AI.
Science fiction novels tell of machines that can reason and think like humans. In scientific language, they are described as strong AI. Strong AI can think like a human, and its intellectual abilities may be much more advanced. The creation of strong AI remains the main long-term dream of the scientific community. However, practical applications are all about weak AI. While strong AI tries to solve the problem of general intelligence, weak AI is focused on solving one narrow cognition task, such as vision, speech, or listening. Examples of weak AI tasks are diverse: speech recognition, image classification, and customer churn prediction. Weak AI plays an important role in our lives, changing the way we work, think, and live. We can find successful applications of weak AI in every area of our lives. Medicine, robotics, marketing, logistics, art, and music all benefit from recent advances in weak AI.

Defining data science

How does AI relate to machine learning? What is deep learning? And how do we define data science? These popular questions are better answered graphically:
This diagram includes all the technical topics that will be discussed in this book:
  • AI is a general scientific field that covers everything related to weak and strong AI. We won't focus much on AI, since most practical applications come from its subfields, which we define and discuss through the rest of Section 1: What is Data Science?
  • Machine learning is a subfield of AI that studies algorithms that can adapt their behavior based on incoming data without explicit instructions from a programmer.
  • Deep learning is a subfield of machine learning that studies a specific kind of machine learning model called deep neural networks.
  • Data science is a multidisciplinary field that uses a set of tools to extract knowledge from data and support decision making. Machine learning and deep learning are among the main tools of data science.
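The machine learning definition above, adapting behavior based on incoming data without explicit instructions from a programmer, can be illustrated with a minimal sketch. The overheating scenario, the sample readings, and the names `hand_written_rule`, `learn_threshold`, and `SAMPLES` are all hypothetical and chosen only for illustration; real projects would use far more data and an established library.

```python
# Hypothetical labelled readings: (temperature in Celsius, did the part overheat?)
SAMPLES = [(60, False), (65, False), (70, False), (75, True), (82, True), (90, True)]


def hand_written_rule(temperature_c):
    # Explicit instruction: the programmer guessed the threshold and fixed it in code.
    return temperature_c > 80.0


def learn_threshold(samples):
    """Return the threshold that misclassifies the fewest labelled samples."""
    values = sorted(value for value, _ in samples)
    # Candidate thresholds are the midpoints between neighbouring readings.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # Count samples where "value above threshold" disagrees with the label.
        return sum((value > threshold) != label for value, label in samples)

    return min(candidates, key=errors)


learned = learn_threshold(SAMPLES)
print("learned threshold:", learned)
print("fixed rule flags 75 C:", hand_written_rule(75))
print("learned rule flags 75 C:", 75 > learned)
```

The fixed rule stays the same no matter what the data says, whereas the learned threshold moves whenever the labelled readings change. That difference is the sense in which a machine learning algorithm adapts to data.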
The ultimate goal of data science is to solve problems by extracting knowledge from data and giving support for complex decisions. The first part of solving a problem is getting a good understanding of its domain. You need to understand the insurance business before using data science for risk analysis. You need to know the details of the goods manufacturing process before designing an automated quality assurance process. First, you understand the domain. Then, you find a problem. If you skip this part, you have a good chance of solving the wrong problem.
After coming up with a good problem definition, you seek a solution. Suppose that you have created a model that solves a task. A machine learning model in a vacuum is rarely of interest to anyone, so on its own it is not useful. To make it useful, we need to wrap our models into something that can be seen and acted upon. In other words, we need to create software around models. Data science always comes hand-in-hand with creating software systems. Any machine learning model needs software. Without software, models would just lie in computer memory, not helping anyone.
So, data science is never only about science. Business knowledge and software development are also important. Without them, no solution would be complete.

The influence of data science

Data science has huge potential. It already affects our daily lives. Healthcare companies are learning to diagnose and predict major health issues. Businesses use it to find new strategies for winning new customers and personalize their services. We use big data analysis in genetics and particle physics. Thanks to advances in data science, self-driving cars are now a reality.
Thanks to the internet and global computerization, we create vast amounts of data daily. Ever-increasing volumes of data allow us to automate human labor.
Sadly, for each use case that improves our lives, we can easily find two that make them worse. To give you a disturbing example, let's look at China. The Chinese government is experimenting with a new social credit system. It uses surveillance cameras to track the daily lives of its citizens on a grand scale. Computer vision systems can recognize and log every action that you make while commuting to work, waiting in lines at a government office, or going home after a party. A special social score is then calculated based on your monitored actions. This score affects the lives of real people. In particular, public transport fees can change depending on your score; low scores can prohibit you from interviewing for a range of government jobs.
On the other hand, this same technology can be used to help people. For example, it can be used to track criminals in large crowds. The way you apply this new technology can bring the world closer to George Orwell's 1984, or make it a safer place. The general public must be more conscious of these choices, as they might have lasting effects on their lives.
Another disturbing example is businesses that adopted hiring algorithms based on machine learning, only to discover months later that the algorithms introduced bias against women. It is becoming clear that we do not give the right amount of attention to the ethics of data science. While companies such as Google create internal ethics boards, there is still no governmental control over the unethical use of modern technology. Until such controls arrive, I strongly encourage you to consider the ethical implications of using data science. We all want a better world to live in. Our future, and the future of our children, depends on small decisions we make each day.

Limitations of data science

Like any set of tools, data science has its limitations. Before diving into a project with ambitious ideas, it is important to consider the current limits of possibility. A task that seems easily solvable may be unsolvable in practice.
Insufficient understanding of the technical side of data science can lead to serious problems in your projects. You can start a project only to discover that you cannot solve the task at all. Even worse, you can find out that nothing works as intended only after deployment. Depending on your use case, it can affect real people. Understanding the main principles behind data science will rid you of many technical risks that predetermine a project's fate before it has even started.

Introduction to machine learning

Machine learning is by far the most important tool of a data scientist. It allows us to create algorithms that discover patterns in data with thousands of variables. We will now explore different types and capabilities of machine learning algorithms.
Machine learning is a scientific field that studies algorithms that can learn to perform tasks without specific instructions, relying on patterns discovered in data. For example, we can use algorithms to predict the likelihood of having a disease or assess the risk of failure in complex manufacturing equipment. Every machine learning algorithm follows a simple formula. In the following diagram, you can see a high-level decision process that is based on a machine learning algorithm. Each machine learning model consumes data to produce information that can support human decisions or fully automate them:
We will explore the meaning of each block in more detail in the next section.
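As a rough sketch of the flow described above (data goes in, a model produces information, and that information either supports a human decision or automates it), consider the equipment-failure example. Everything here is an assumption made for illustration: the vibration levels, the history records, the thresholds, and the function names `fit_failure_rates` and `decide` are hypothetical, and the "model" is just an observed failure rate per level.

```python
from collections import defaultdict

# Hypothetical maintenance log: (vibration level, did the machine fail soon after?)
HISTORY = [
    ("low", False), ("low", False), ("low", False), ("low", False),
    ("medium", True), ("medium", False),
    ("high", True), ("high", True), ("high", True), ("high", False),
]


def fit_failure_rates(history):
    """Data -> model: estimate the failure rate observed for each level."""
    counts = defaultdict(lambda: [0, 0])  # level -> [failures, total records]
    for level, failed in history:
        counts[level][0] += int(failed)
        counts[level][1] += 1
    return {level: failures / total for level, (failures, total) in counts.items()}


def decide(rates, level, automate_above=0.75, review_above=0.25):
    """Information -> decision: automate clear cases, hand unclear ones to a person."""
    risk = rates.get(level)
    if risk is None:
        return "flag for engineer review"  # no data for this level, so a human decides
    if risk >= automate_above:
        return "schedule maintenance"      # fully automated decision
    if risk >= review_above:
        return "flag for engineer review"  # the model only supports a human decision
    return "no action"


rates = fit_failure_rates(HISTORY)
for level in ("low", "medium", "high"):
    print(level, rates[level], decide(rates, level))
```

The two thresholds mark the boundary between supporting a decision and automating it. Where to place them is a business choice rather than a property of the model.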

Decisions and insights provided by a machine learning model

When solving a task using machine learning, you generally want to automate a decision-making process or get insights to support your decision. For example, you may want an ...
