1 What is deep learning?
This chapter covers
- High-level definitions of fundamental concepts
- Timeline of the development of machine learning
- Key factors behind deep learning's rising popularity and future potential
In the past few years, artificial intelligence (AI) has been a subject of intense media hype. Machine learning, deep learning, and AI come up in countless articles, often outside of technology-minded publications. We're promised a future of intelligent chatbots, self-driving cars, and virtual assistants: a future sometimes painted in a grim light and other times as utopian, where human jobs will be scarce and most economic activity will be handled by robots or AI agents. For a future or current practitioner of machine learning, it's important to be able to recognize the signal amid the noise, so that you can tell world-changing developments from overhyped press releases. Our future is at stake, and it's a future in which you have an active role to play: after reading this book, you'll be one of those who develop those AI systems. So let's tackle these questions: What has deep learning achieved so far? How significant is it? Where are we headed next? Should you believe the hype?
This chapter provides essential context around artificial intelligence, machine learning, and deep learning.
1.1 Artificial intelligence, machine learning, and deep learning
First, we need to define clearly what we're talking about when we mention AI. What are artificial intelligence, machine learning, and deep learning (see figure 1.1)? How do they relate to each other?
Figure 1.1 Artificial intelligence, machine learning, and deep learning
1.1.1 Artificial intelligence
Artificial intelligence was born in the 1950s, when a handful of pioneers from the nascent field of computer science started asking whether computers could be made to "think", a question whose ramifications we're still exploring today.
While many of the underlying ideas had been brewing in the years and even decades prior, "artificial intelligence" finally crystallized as a field of research in 1956, when John McCarthy, then a young Assistant Professor of Mathematics at Dartmouth College, organized a summer workshop under the following proposal:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
At the end of the summer, the workshop concluded without having fully solved the riddle it set out to investigate. Nevertheless, it was attended by many people who would move on to become pioneers in the field, and it set in motion an intellectual revolution that is still ongoing to this day.
Concisely, AI can be described as the effort to automate intellectual tasks normally performed by humans. As such, AI is a general field that encompasses machine learning and deep learning, but that also includes many more approaches that may not involve any learning. Consider that until the 1980s, most AI textbooks didn't mention "learning" at all! Early chess programs, for instance, only involved hardcoded rules crafted by programmers, and didn't qualify as machine learning. In fact, for a fairly long time, most experts believed that human-level artificial intelligence could be achieved by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge stored in explicit databases. This approach is known as symbolic AI. It was the dominant paradigm in AI from the 1950s to the late 1980s, and it reached its peak popularity during the expert systems boom of the 1980s.
Although symbolic AI proved suitable to solve well-defined, logical problems, such as playing chess, it turned out to be intractable to figure out explicit rules for solving more complex, fuzzy problems, such as image classification, speech recognition, or natural language translation. A new approach arose to take symbolic AI's place: machine learning.
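To make the idea of handcrafted rules concrete, here is a minimal sketch of what a symbolic-AI-style program looks like. It is a hypothetical toy spam filter, invented purely for illustration: every rule is written explicitly by the programmer, and nothing is learned from data.

# A handcrafted, rule-based classifier: symbolic AI in miniature.
# The keyword list and both rules are invented for illustration only.
SPAM_KEYWORDS = {"winner", "free", "prize", "urgent"}

def is_spam(message: str) -> bool:
    """Classify a message using rules written entirely by a human."""
    words = set(message.lower().split())
    if words & SPAM_KEYWORDS:      # rule 1: contains a suspicious keyword
        return True
    if message.count("!") > 3:     # rule 2: too many exclamation marks
        return True
    return False

print(is_spam("You are a winner! Claim your free prize"))  # True
print(is_spam("Meeting moved to 3pm tomorrow"))            # False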
1.1.2 Machine learning
In Victorian England, Lady Ada Lovelace was a friend and collaborator of Charles Babbage, the inventor of the Analytical Engine: the first-known general-purpose mechanical computer. Although visionary and far ahead of its time, the Analytical Engine wasn't meant as a general-purpose computer when it was designed in the 1830s and 1840s, because the concept of general-purpose computation was yet to be invented. It was merely meant as a way to use mechanical operations to automate certain computations from the field of mathematical analysis, hence the name Analytical Engine. As such, it was the intellectual descendant of earlier attempts at encoding mathematical operations in gear form, such as the Pascaline, or Leibniz's step reckoner, a refined version of the Pascaline. Designed by Blaise Pascal in 1642 (at age 19!), the Pascaline was the world's first mechanical calculator: it could add, subtract, multiply, or even divide digits.
In 1843, Ada Lovelace remarked on the invention of the Analytical Engine,
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. . . . Its province is to assist us in making available what we're already acquainted with.
Even with 178 years of historical perspective, Lady Lovelace's observation remains arresting. Could a general-purpose computer "originate" anything, or would it always be bound to dully execute processes we humans fully understand? Could it ever be capable of any original thought? Could it learn from experience? Could it show creativity?
Her remark was later quoted by AI pioneer Alan Turing as "Lady Lovelace's objection" in his landmark 1950 paper "Computing Machinery and Intelligence," which introduced the Turing test as well as key concepts that would come to shape AI. Turing was of the opinion, highly provocative at the time, that computers could in principle be made to emulate all aspects of human intelligence.
The usual way to make a computer do useful work is to have a human programmer write down rules (a computer program) to be followed to turn input data into appropriate answers, just like Lady Lovelace writing down step-by-step instructions for the Analytical Engine to perform. Machine learning turns this around: the machine looks at the input data and the corresponding answers, and figures out what the rules should be (see figure 1.2). A machine learning system is trained rather than explicitly programmed. It's presented with many examples relevant to a task, and it finds statistical structure in these examples that eventually allows the system to come up with rules for automating the task. For instance, if you wished to automate the task of tagging your vacation pictures, you could present a machine learning system with many examples of pictures already tagged by humans, and the system would learn statistical rules for associating specific pictures to specific tags.
Figure 1.2 Machine learning: a new programming paradigm
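Here is a minimal sketch of this flipped paradigm, assuming scikit-learn is installed; the messages and labels form a toy dataset invented purely for illustration. Instead of writing the rules by hand, we present the system with example inputs together with the expected answers, and let it find the statistical rules on its own.

# The machine learning paradigm: rules are inferred from data and answers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Input data points and the expected answers (1 = spam, 0 = not spam).
messages = [
    "You are a winner! Claim your free prize now",
    "Urgent: your account needs verification",
    "Meeting moved to 3pm tomorrow",
    "Lunch on Friday?",
]
labels = [1, 1, 0, 0]

# Turn raw text into numeric features the learning algorithm can work with.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Training: the model finds statistical structure relating features to labels.
model = LogisticRegression()
model.fit(X, labels)

# The learned "rules" can now be applied to data the system has never seen.
new_message = vectorizer.transform(["Free prize, claim now!"])
print(model.predict(new_message))  # likely [1], i.e. spam

Compare this with the handcrafted rules of the earlier sketch: here the relationship between words and labels is discovered from the examples themselves, not written down by a programmer.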
Although machine learning only started to flourish in the 1990s, it has quickly become the most popular and most successful subfield of AI, a trend driven by the availability of faster hardware and larger datasets. Machine learning is related to mathematical statistics, but it differs from statistics in several important ways, in the same sense that medicine is related to chemistry but cannot be reduced to chemistry, as medicine deals with its own distinct systems with their own distinct properties. Unlike statistics, machine learning tends to deal with large, complex datasets (such as a dataset of millions of images, each consisting of tens of thousands of pixels) for which classical statistical analysis such as Bayesian analysis would be impractical. As a result, machine learning, and especially deep learning, exhibits comparatively little mathematical theory (maybe too little) and is fundamentally an engineering discipline. Unlike theoretical physics or mathematics, machine learning is a very hands-on field driven by empirical findings and deeply reliant on advances in software and hardware.
1.1.3 Learning rules and representations from data
To define deep learning and understand the difference between deep learning and other machine learning approaches, first we need some idea of what machine learning algorithms do. We just stated that machine learning discovers rules for executing a data processing task, given examples of what's expected. So, to do machine learning, we need three things:
- Input data points: For instance, if the task is speech recognition, these data points could be sound files of people speaking. If the task is image tagging, they could be pictures.
- Examples of the expected output: In a speech-recognition task, these could be human-generated transcripts of sound files. In an image task, expected outputs could be tags...