In recent years, machine learning has attracted a great deal of interest. Due to advances in processor technology and the availability of large amounts of data, machine learning techniques have produced astounding results in areas such as object recognition and natural language processing. New approaches such as deep learning have provided groundbreaking outcomes in fields such as multimedia mining and voice recognition. Machine learning is now used in virtually every domain, and deep learning algorithms are present in many devices such as smartphones, cars, drones, healthcare equipment, and smart home devices. The Internet, cloud computing, and the Internet of Things produce a tsunami of data, and machine learning provides the methods to analyze the data effectively and discover actionable knowledge.
This book describes the most common machine learning techniques, such as Bayesian models, support vector machines, decision tree induction, regression analysis, and recurrent and convolutional neural networks. It first gives an introduction to the principles of machine learning. It then covers the basic methods, including the mathematical foundations. The largest part of the book presents common machine learning algorithms and their applications. Finally, the book gives an outlook on some future developments and possible new research areas of machine learning and artificial intelligence in general.
This book is meant to be an introduction to machine learning. It does not require prior knowledge in this area. It covers some basic mathematical principles but is intended to be understandable even without a background in mathematics. The chapters can be read independently, so the book remains comprehensible even when not read from the beginning. Finally, it is also intended to serve as a reference book.
Key Features:
Describes real-world problems that can be solved using Machine Learning
Provides methods for directly applying Machine Learning techniques to concrete real-world problems
Demonstrates how to apply Machine Learning techniques using different frameworks such as TensorFlow, MALLET, and R
Real-world data is seldom in a form suitable for machine learning algorithms. It is typically noisy, contains redundant and irrelevant data, or is too large to be processed efficiently in its original form. Data pre-processing extracts the relevant information, the features, from a data set and transforms it into a form that can be used as input for a machine learning algorithm, usually a feature vector. Data pre-processing reduces the cost of feature measurement, increases the efficiency of the learner, and allows for higher classification accuracy.
Machine learning algorithms are domain independent, whereas data pre-processing is highly domain specific and usually requires domain knowledge. Data pre-processing comprises a large number of different techniques, and often several training data sets are created using different pre-processing techniques. The learner is then trained with each of the pre-processed training sets, and the best-performing set is used for the final model. Some tools offer feature filtering, so the same data set can be tested with different features.
Typical pre-processing tasks include data deduplication, outlier removal, and stop word removal, where irrelevant words or characters are removed from the original text. Sometimes the data is enriched with synthetic data or synthetic features, especially for sparse data sets.
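For concreteness, here is a minimal sketch of these three steps in plain Python; the toy reviews, sensor readings, stop word list, and the two-standard-deviation cutoff are all assumptions made up for illustration, not prescriptions from the text:

```python
import re
from statistics import mean, pstdev

# Toy inputs, assumed for illustration only.
reviews = [
    "This car is great, really great!",
    "This car is great, really great!",  # exact duplicate
    "Terrible mileage and a bad engine.",
]
readings = [21.3, 22.1, 20.8, 21.7, 20.5, 22.4, 21.0, 21.9, 20.7, 95.0]

# Deduplication: drop exact duplicates while preserving order.
deduped = list(dict.fromkeys(reviews))

# Outlier removal: keep readings within two standard deviations of the
# mean (with this toy data, the 95.0 reading is dropped).
m, s = mean(readings), pstdev(readings)
cleaned = [x for x in readings if abs(x - m) <= 2 * s]

# Stop word removal: strip punctuation and irrelevant words.
stop_words = {"this", "is", "a", "and", "really"}
tokens = re.findall(r"[a-z]+", deduped[0].lower())
filtered = [t for t in tokens if t not in stop_words]

print(deduped)   # two distinct reviews
print(cleaned)   # readings without the outlier
print(filtered)  # ['car', 'great', 'great']
```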
3.1 Feature extraction
Feature extraction, also called feature selection or feature engineering, is one of the most important tasks for finding hidden knowledge or business insights. In machine learning, the whole initial data set is rarely used as input for a learning algorithm. Instead, it is reduced to a subset of data that is expected to be useful and relevant for the subsequent machine learning tasks. Feature extraction is the process of selecting values from the initial data set. Features are distinctive properties of the input data and serve as representative, descriptive attributes. In the literature, a distinction is often made between feature selection and feature extraction [43]. Feature selection means reducing the feature set to one or more smaller feature sets, because not all features are useful for a specific task. Feature extraction means converting the original feature set into a new feature set that can perform a data mining task better or faster. Here, we treat feature extraction and feature selection interchangeably. Feature selection and, as mentioned before, data pre-processing in general are highly domain specific, whereas machine learning algorithms are not. Feature selection is also independent of the machine learning algorithm used.
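The distinction can be made concrete with a short scikit-learn sketch, one of several frameworks that could be used; the iris data set and the choice of two features/components are arbitrary illustrations. SelectKBest keeps a subset of the original features, whereas PCA constructs new ones:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 instances, 4 features

# Feature selection: keep the 2 most informative original features.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: derive 2 new features from all 4 original ones.
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # (150, 2) (150, 2)
```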
Features do not need to be directly observable. For instance, a large set of directly observable features might be reduced, using dimensionality reduction techniques, to a smaller set of indirectly observable features, also called latent variables or hidden variables. Feature extraction can thus be seen as a form of dimensionality reduction. Often features are weighted, so that not every feature contributes equally to the result. Usually the weights are presented to the learner in a separate vector.
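As a minimal illustration of such weighting (the feature and weight values below are assumptions for illustration), the learner receives the feature vector and the weight vector separately; here they are simply combined element-wise:

```python
import numpy as np

features = np.array([1.0, 2.0, 0.0, 3.0])  # toy feature vector
weights = np.array([0.5, 1.0, 1.0, 0.1])   # separate weight vector

# Element-wise weighting: the first and last features now contribute
# less to the result than the others.
weighted = features * weights               # [0.5, 2.0, 0.0, 0.3]
```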
A sample feature extraction task is word frequency counting. For instance, reviews of a consumer product contain certain words more often depending on whether the review is positive or negative. A positive review of a new car typically contains words such as “good”, “great”, or “excellent” more often than a negative review. Here, feature extraction means defining the words to count and counting how often each positive or negative word is used in the review. The resulting feature vector is a list of words with their frequencies. A feature vector is an n-dimensional vector of numerical features, so the actual word is not part of the feature vector itself. The feature vector can contain the hash of the word, as shown in Figure 3.1, or the word is identified by its index, its position in the vector.
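A minimal sketch of such a count in plain Python follows (the word list and review text are made up for illustration; a real sentiment lexicon would be far larger). Each word is identified by its index in the vocabulary, though storing a hash of the word, as in Figure 3.1, would serve the same purpose:

```python
# Hand-picked positive words; position in the list is the feature index.
vocabulary = ["good", "great", "excellent"]

def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

review = "great car great mileage good value"
print(to_feature_vector(review, vocabulary))  # [1, 2, 0]
```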
Another example of feature extraction is object recognition in an image. The region containing the object of interest has to be detected in the image, and the shape has to be isolated; the background of the region is omitted. The relevant region is then converted into a pixel array, which can be used as the input vector for a learning scheme.
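A sketch of the last step with NumPy, assuming the bounding box of the object has already been detected (the image contents and region coordinates here are made up):

```python
import numpy as np

# Stand-in for a 64x64 grayscale image; real input would come from a file.
image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

top, left, height, width = 16, 16, 32, 32  # hypothetical detected region
region = image[top:top + height, left:left + width]  # background omitted

feature_vector = region.flatten()  # 1-D pixel array for the learner
print(feature_vector.shape)        # (1024,)
```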
As with machine learning schemes, there are many different feature extraction techniques, and there are no simple recipes. Much research has been conducted on automatic feature extraction, also called feature learning. Since feature extraction is a laborious and time-consuming task, automatic feature selection can potentially speed up a data mining project considerably. In recent years, great advances have been made using deep learners, which can extract features automatically. Deep learners have learned to play games such as chess without being taught the rules first; they learned the rules through trial and error. Deep learners such as convolutional neural networks can also extract the relevant areas of an image by themselves, without any image pre-processing. Alongside the astoundingly good results of deep learners in tasks such as multimedia mining and natural language processing, automatic feature extraction is one of the most exciting advancements in machine learning.
Figure 3.1: Feature vector.
3.2 Sampling
Data sampling is the process of selecting a data subset, since analyzing the whole data set is often too expensive. Sampling reduces the number of instances to a subset that is processed instead of the whole data set. Samples can be selected randomly, so that every instance has the same probability of being selected. The selection process should guarantee that the sample is representative of the distribution that governs the data, thereby ensuring that results obtained on the sample are close to those obtained on the whole data set [43].
There are different ways data can be sampled. When using random sampling, each instance has the same probability of being selected. When sampling with replacement, an instance can be selected several times. If each selected instance is instead removed from the data set (sampling without replacement), the sample contains no duplicates.
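Both variants can be sketched with Python's standard library (the data set and sample size are arbitrary toy values):

```python
import random

random.seed(0)            # fixed seed for a reproducible toy example
data = list(range(100))   # stand-in data set of 100 instances

# With replacement: an instance may appear several times in the sample.
with_replacement = random.choices(data, k=10)

# Without replacement: each selected instance is effectively removed,
# so the sample contains no duplicates.
without_replacement = random.sample(data, k=10)
```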
Another sampling method uses stratification. Stratification is used to avoid overrepresentation of a class label in the sample and to provide a sample with an even distribution of labels. For instance, a sample might contain an overrepresentation of male or female instances. Stratification first divides the data into homogeneous subgroups before sampling. The subgroups should be mutually exclusive. Inst...