1.1 Deep Learning: A Historical Perspective
In the early 1940s, Warren McCulloch and Walter Pitts simulated intelligent behavior by emulating how the brain works, using a simple electrical circuit called a "threshold logic unit" [179]. This first model of a neuron took several inputs and generated an output of 0 when their "weighted sum" fell below a threshold and 1 otherwise; it later became the basis of all neural architectures. The weights were set by hand rather than learned. In his book The Organization of Behaviour (1949), Donald Hebb laid the foundation of complex neural processing by proposing how neural pathways with multiple neurons can fire together and strengthen over time [108]. Frank Rosenblatt, in his seminal work, extended the McCulloch–Pitts neuron, referring to it as the "Mark I Perceptron"; given the inputs, it generated outputs using linear thresholding logic [212].
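To make the threshold rule concrete, the sketch below implements such a unit in Python; the weights and threshold values are illustrative assumptions, not details taken from the original model.

# A minimal sketch of a McCulloch-Pitts-style threshold logic unit.
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# Example: a two-input unit whose hand-set weights make it behave like a logical AND gate.
print(threshold_unit([1, 1], weights=[1, 1], threshold=2))  # -> 1
print(threshold_unit([1, 0], weights=[1, 1], threshold=2))  # -> 0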
The weights in the perceptron were "learned" by repeatedly passing the inputs through the unit and reducing the difference between the predicted output and the desired output, thus giving birth to the basic neural learning algorithm; a sketch of this rule follows below. Marvin Minsky and Seymour Papert later published the book Perceptrons, which revealed the inability of perceptrons to learn even the simple exclusive-or (XOR) function, thus prompting what became known as the first AI winter [186].
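The following is a minimal sketch of that learning rule; the learning rate, epoch count, and toy dataset are illustrative choices rather than details from Rosenblatt's work.

# Sketch of the perceptron learning rule: nudge weights toward the desired output.
def train_perceptron(samples, n_epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(n_epochs):
        for x, target in samples:
            weighted_sum = weights[0] * x[0] + weights[1] * x[1] + bias
            predicted = 1 if weighted_sum >= 0 else 0
            error = target - predicted  # difference between desired and predicted output
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# The AND function is linearly separable, so the rule converges on it;
# XOR is not, which is the limitation Minsky and Papert highlighted.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_data))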
John Hopfield introduced "Hopfield networks", one of the first recurrent neural networks (RNNs), which serve as a content-addressable memory system [117].
In 1986, David Rumelhart, Geoff Hinton, and Ronald Williams published the seminal work "Learning representations by back-propagating errors" [217]. Their work demonstrated how a multi-layered neural network with "hidden" layers can overcome the weakness of perceptrons and learn complex patterns with a relatively simple training procedure. The building blocks for this work had been laid over the years by the research of S. Linnainmaa, P. Werbos, K. Fukushima, D. Parker, and Y. LeCun [91, 149, 164, 196, 267].
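As a rough illustration of the idea, not the paper's exact formulation, the sketch below trains a one-hidden-layer network with backpropagation on the XOR function that a single perceptron cannot learn; the hidden-layer width, learning rate, and iteration count are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input layer -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden layer -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    # Forward pass through the hidden layer and the output unit.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back to every weight.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

# Predictions approach [[0], [1], [1], [0]]; more iterations may be needed
# depending on the random initialization.
print(np.round(out, 2))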
The research and implementation of LeCun et al. led to the first widespread application of neural networks: recognizing the handwritten digits on mail processed by the U.S. Postal Service [150]. This work is a critical milestone in deep learning history, proving the utility of convolution operations and weight sharing for learning features in computer vision.
Backpropagation, the key optimization technique, suffered from a number of issues such as vanishing gradients, exploding gradients, and the inability to learn long-term information, to name a few [115]. Hochreiter and Schmidhuber, with their "long short-term memory" (LSTM) architecture, demonstrated how the problems with long-term dependencies and the shortcomings of backpropagation through time could be overcome [116].
Hinton et al. published a breakthrough paper in 2006 titled "A fast learning algorithm for deep belief nets"; it was one of the catalysts of the resurgence of deep learning [113]. The research highlighted the effectiveness of layer-by-layer training using unsupervised methods followed by supervised "fine-tuning" to achieve state-of-the-art results in character recognition. Bengio et al., in their seminal work following this, offered deep insights into why deep networks with multiple layers can learn features hierarchically, in contrast to shallow neural networks [27]. In their research, Bengio and LeCun emphasized the advantages of deep learning through architectures such as convolutional neural networks (CNNs), restricted Boltzmann machines (RBMs), and deep belief networks (DBNs), and through techniques such as unsupervised pre-training with fine-tuning, thus inspiring the next wave of deep learning [28]. Fei-Fei Li, head of the artificial intelligence lab at Stanford University, along with other researchers, launched ImageNet, which resulted in the most extensive collection of images and, for the first time, highlighted the usefulness of data in learning essential tasks such as object ...