Unsupervised Learning Approaches for Dimensionality Reduction and Data Visualization

B.K. Tripathy, Anveshrithaa Sundareswaran, Shrusti Ghela
About This Book

Unsupervised Learning Approaches for Dimensionality Reduction and Data Visualization describes algorithms such as Locally Linear Embedding (LLE), Laplacian Eigenmaps, Isomap, Semidefinite Embedding, and t-SNE that address the problem of dimensionality reduction when there are nonlinear relationships within the data. The underlying mathematical concepts, derivations, and proofs are discussed with logical explanations for each algorithm, including its strengths and limitations. The book highlights important use cases of these algorithms and provides examples along with visualizations. A comparative study of the algorithms is presented to give a clear idea of how to select the most suitable algorithm for a given dataset for efficient dimensionality reduction and data visualization.

FEATURES

  • Demonstrates how unsupervised learning approaches can be used for dimensionality reduction
  • Neatly explains algorithms with a focus on the fundamentals and underlying mathematical concepts
  • Describes a comparative study of the algorithms and discusses when and where each algorithm is best suited for use
  • Provides use cases, illustrative examples, and visualizations of each algorithm
  • Helps visualize and create compact representations of high dimensional and intricate data for various real-world applications and data analysis

This book is aimed at professionals, graduate students, and researchers in Computer Science and Engineering, Data Science, Machine Learning, Computer Vision, Data Mining, Deep Learning, Sensor Data Filtering, Feature Extraction for Control Systems, and Medical Instruments Input Extraction.

Information

Publisher: CRC Press
Year: 2021
ISBN: 9781000438451

1

Introduction to Dimensionality Reduction

1.1 INTRODUCTION

The world is currently witnessing unprecedented technological advancements that are changing it in every way possible. We are already in the era of data, where the amount of data available and the rate at which it is generated are beyond the realm of our imagination. The amount of data created every second is growing at an exponential rate, resulting in a data explosion, and this growth is expected to keep increasing. With exponentially more data than ever, we are moving toward a data-centric world, where every field is becoming data dominated. Data being one of the most valuable resources today, we are working toward deriving more value from it, as a wealth of information can be extracted from this data that can have a profound impact on shaping the world in many respects.
On the other hand, the evolution of machine learning is happening at a tremendous pace. The advent of machine learning and AI has brought about one of the greatest technological transformations, one that is revolutionizing the world. The idea that machines can learn from data with minimal human interference has become so prevalent that machine learning has a ubiquitous influence in almost every field. Furthermore, with the rise of high-performance computing leading to the increased availability of cheaper, more powerful computational capacity and affordable data storage, it is possible to make extensive progress in various fields by leveraging machine learning to make good use of the available data. This is also leading to major breakthroughs in science and research. For instance, understanding biological data by exploiting machine learning is greatly helping in finding answers to some of the important questions on the existence of life on earth, the causes of diseases, and the effect of microorganisms on the human body; it is also leading to remarkable findings in cancer research and drug discovery. Furthermore, analysis of space data has paved the way for noteworthy and fascinating findings in the field of space exploration and is greatly helping in solving some of the mysteries that are still beyond human imagination. In the finance industry, analysis of large amounts of data plays a vital role, as it is crucial to identify important insights in data to learn about trends in stock prices, spot investment and trade opportunities, identify high-risk profiles, and prevent fraud. Another sector that is extensively using massive amounts of data to derive value from it is the healthcare industry. Large volumes of real-time data from sensors and wearable devices are analyzed to identify trends, diagnose diseases, and even treat them. The list is endless, as data science plays an important role in almost every domain, ranging from computational genomics to environmental science to even art.
Today, businesses rely heavily on their data for their sustainability. Data has become an indispensable aspect of every business, with the growing demand to analyze it to derive important insights for better decision making and improved business strategies. The adoption of machine learning by enterprises for analyzing their data and answering questions from it has accelerated over the last few years. Enterprises are capitalizing on data to accelerate their businesses and increase their efficiency, productivity, and profit, and companies are investing heavily in unleashing the potential of their data to improve their business value.
Owing to this situation, data visualization and exploratory data analysis have gained tremendous importance. But extracting information from real-world datasets, which are generally large in volume and high in dimensionality, and discovering compact representations of such high dimensional data is an intricate task. Sometimes the data can have extremely many dimensions, on the order of thousands or even millions, which makes analysis infeasible, as many algorithms perform poorly on very high dimensional data. Given that the volume of data itself is very large, the problem of high dimensionality is not trivial, making it necessary to find an approximation to the original data that has far fewer dimensions but at the same time retains the structure of and relationships within the data.
The main goal of unsupervised learning is to create compact representations of the data by detecting its intrinsic and hidden structure for applications in data visualization and exploratory data analysis. The need to analyze large volumes of multivariate data for tasks like pattern recognition, analysis of gene expression data, and time series analysis across a wide range of fields raises the fundamental issue of discovering compact representations of high dimensional data. This difficulty in extracting information from high dimensional data is the principal motivation behind the renewed interest in formulating the problem of dimensionality reduction. The idea behind dimensionality reduction is to map data in a higher dimensional space to a lower dimensional space as a method for data visualization and to extract the key low dimensional features. While classical unsupervised dimensionality reduction approaches like Principal Component Analysis (PCA) are limited to linear data in terms of effectiveness, as they overlook correlations in the data that are higher than second order, relatively new approaches like Locally Linear Embedding (LLE), Laplacian Eigenmaps, Isomap, Semidefinite Embedding, and t-distributed stochastic neighbor embedding (t-SNE) have been proposed in the recent past as solutions to the problem of dimensionality reduction in the case of nonlinear relationships within the data.
Linear methods such as PCA and Multidimensional Scaling (MDS) are some of the fundamental spectral methods for linear dimensionality reduction whose underlying geometric intuitions form the basis for many other nonlinear dimensionality reduction techniques.
In Chapter 2, we discuss one of the oldest and most widely known dimensionality reduction techniques, PCA, which is based on linear projection. It is a fundamental, classical approach to dimensionality reduction, but its major limitation is its inability to handle nonlinear data. The technique projects data to a lower dimensional space by maximizing the variance of the projection. The principal components are obtained by solving an eigenproblem, that is, by performing singular value decomposition on the data matrix. An algorithm that is a slight variation of classical PCA, called Dual PCA, is discussed in Chapter 3.
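As a quick, hands-on illustration of this idea (not the book's own tutorial code), the following sketch assumes scikit-learn and NumPy are available and projects synthetic 50-dimensional data onto its top two principal components.

# Minimal PCA sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))        # 500 synthetic points in 50 dimensions

pca = PCA(n_components=2)             # keep the two directions of maximum variance
X_2d = pca.fit_transform(X)           # computed internally via singular value decomposition

print(X_2d.shape)                     # (500, 2)
print(pca.explained_variance_ratio_)  # fraction of the variance captured by each component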
Since PCA is restricted to linear dimensionality reduction, where the high dimensional data lies on a linear subspace, a kernel-based method that is a nonlinear generalization of PCA was introduced to handle nonlinear data. This method is based on the "kernel trick," in which inner products are replaced with a kernel. The kernel function can be considered a nonlinear similarity measure, and many linear approaches can be generalized to nonlinear methods by exploiting the "kernel trick." This variant of PCA uses kernels to compute the principal components. It works well on nonlinear data and overcomes the limitation of PCA in addressing the nonlinearity of the data. The technique came to be known as kernel PCA and is extensively discussed in Chapter 4.
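As a brief sketch of the same idea (again assuming scikit-learn rather than the book's code), an RBF kernel lets kernel PCA separate two concentric circles that linear PCA cannot:

# Kernel PCA sketch on nonlinear data (illustrative; assumes scikit-learn).
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_linear = PCA(n_components=2).fit_transform(X)             # linear projection: the circles stay entangled
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)  # RBF kernel as a nonlinear similarity measure
X_kpca = kpca.fit_transform(X)                              # the two circles become linearly separable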
While PCA is an efficient dimensionality reduction technique, in cases where the dimensionality of the data is very high, PCA becomes computationally expensive. Chapter 5 discusses another dimensionality reduction technique, called random projection, that uses a projection matrix whose entries are randomly sampled from a distribution to project the data to a low dimensional space, while guaranteeing approximate pairwise distance preservation in that space. This method outperforms PCA when the dimensionality of the data is so high that PCA becomes computationally expensive.
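A minimal sketch of the idea, assuming scikit-learn's Gaussian random projection (an illustrative choice, not necessarily the book's), checks how well a few pairwise distances survive the projection:

# Random projection sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10000))                    # very high dimensional data

rp = GaussianRandomProjection(n_components=1000, random_state=0)
X_low = rp.fit_transform(X)                          # projection matrix entries sampled from a Gaussian

# Compare a few pairwise distances before and after projection.
d_high = pairwise_distances(X[:10])
d_low = pairwise_distances(X_low[:10])
iu = np.triu_indices(10, k=1)                        # off-diagonal pairs only
print(np.max(np.abs(d_low[iu] / d_high[iu] - 1.0)))  # small relative distortion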
In Chapter 6, we discuss Canonical Correlation Analysis, which is used as a dimensionality reduction technique in the multi-view setting, that is, when there are two or more views of the same data. The goal is to generate low dimensional representations of the points in each view that retain the information shared (redundant) across the different views, computed by singular value decomposition.
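The sketch below, which assumes scikit-learn's CCA implementation and a synthetic two-view dataset of our own construction, shows the two views being projected onto maximally correlated low dimensional coordinates:

# CCA sketch for a two-view setting (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(300, 2))                   # latent signal present in both views
X = np.hstack([shared, rng.normal(size=(300, 8))])   # view 1: shared signal plus noise dimensions
Y = np.hstack([shared @ rng.normal(size=(2, 2)), rng.normal(size=(300, 6))])  # view 2: mixed shared signal plus noise

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)                   # low dimensional, maximally correlated representations
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])       # first canonical correlation is close to 1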
Similar to PCA, MDS is yet another classical approach to dimensionality reduction; it attempts to preserve the pairwise distances between the data points. Like PCA, MDS is a linear dimensionality reduction technique and is limited to linear data. This method of mapping data points from a high dimensional space to a low dimensional space is elaborately explained in Chapter 7.
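As a hedged illustration (scikit-learn's MDS uses an iterative SMACOF solver rather than the classical eigendecomposition, but the goal of preserving pairwise distances is the same):

# MDS sketch: embed points so that pairwise distances are preserved as well as possible (illustrative; assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.manifold import MDS

X = load_iris().data                        # 150 points in 4 dimensions
mds = MDS(n_components=2, random_state=0)   # metric MDS on Euclidean distances
X_2d = mds.fit_transform(X)
print(mds.stress_)                          # residual mismatch between the two sets of pairwise distances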
Linear methods such as PCA and MDS produce good low dimensional representations when the geometry of the original input data is confined to a low dimensional subspace; however, when the input data is sampled from a low dimensional submanifold of the input space, making it nonlinear, these linear methods fail to perform well. Many powerful graph-based methods have been proposed for nonlinear dimensionality reduction. These manifold learning approaches construct graphs representing the data points and the relationships between them, from which matrices are formed whose spectral decompositions give low dimensional representations of the nonlinear data. Some of these graph-based manifold learning approaches to nonlinear dimensionality reduction are discussed in the subsequent chapters.
Just like PCA, MDS has also been extended to solve the problem of nonlinear dimensionality reduction. This approach, one of the earliest manifold learning techniques, attempts to unfold the intrinsically low dimensional manifold on which the data points lie and uses the geodesic distances between data points, rather than Euclidean distances, to find a low dimensional mapping of the points such that the geodesic distances between pairs of data points are preserved. This nonlinear generalization of MDS, called Isomap, is discussed in Chapter 8.
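A minimal sketch, assuming scikit-learn's Isomap and its synthetic Swiss-roll generator, unrolls the manifold by approximating geodesic distances with shortest paths on a k-nearest-neighbor graph:

# Isomap sketch on the Swiss roll (illustrative; assumes scikit-learn).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1500, random_state=0)   # 3-D points lying on a rolled-up 2-D sheet
iso = Isomap(n_neighbors=12, n_components=2)                 # neighborhood graph approximates the manifold
X_2d = iso.fit_transform(X)                                  # geodesic-distance-preserving 2-D embedding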
The method of LLE, proposed in 2000 by Sam T. Roweis and Lawrence K. Saul, is a nonlinear dimensionality reduction technique that identifies the nonlinear structure in the data and finds a neighborhood-preserving mapping into a low dimensional space. A detailed discussion of how this algorithm works and how the local distances are preserved is presented in Chapter 9.
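A short sketch, again assuming scikit-learn, reconstructs each point from its neighbors and preserves those local relationships in two dimensions:

# LLE sketch (illustrative; assumes scikit-learn).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)            # neighborhood-preserving 2-D coordinates
print(lle.reconstruction_error_)       # how well the local linear relationships were preserved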
Clustering techniques are used to identify groups in the data whose members are similar to one another; one such algorithm is the spectral clustering method, which is discussed in detail in Chapter 10. Though it is not a dimensionality reduction technique by itself, it has a close connection with another technique called the Laplacian eigenmap, a computationally efficient approach to nonlinear dimensionality reduction that uses the concept of graph Laplacians to find locality-preserving embeddings in the feature space. This algorithm and its association with spectral clustering are elaborated upon in Chapter 11.
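The following sketch, assuming scikit-learn, makes the connection concrete: SpectralEmbedding computes the Laplacian-eigenmap coordinates of a graph built on the data, and SpectralClustering clusters in essentially that same spectral space:

# Laplacian eigenmap and spectral clustering share the graph-Laplacian machinery (illustrative; assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import SpectralClustering

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)   # two interleaving half-moons

embedding = SpectralEmbedding(n_components=2, affinity="nearest_neighbors")
X_le = embedding.fit_transform(X)                              # locality-preserving (Laplacian eigenmap) coordinates

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0).fit_predict(X)     # clusters found in the spectral space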
While kernel PCA is the nonlinear generalization of PCA that uses a kernel function in place of the inner product, the choice of kernel is very important in that it determines the low dimensional mappings. Unlike kernel PCA, where the kernel to be used is chosen beforehand, Maximum Variance Unfolding, also known as Semidefinite Embedding, is an algorithm that learns the optimal kernel by semidefinite programming. This form of kernel PCA, which differs from other manifold learning methods like LLE and Isomap, is elaborately dealt with in Chapter 12.
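A toy sketch of the underlying semidefinite program is given below; it assumes the cvxpy modelling library (with its bundled SDP solver) and a small synthetic dataset, and is meant only to illustrate the optimization, not to be an efficient implementation:

# Maximum Variance Unfolding toy sketch (illustrative; assumes cvxpy, NumPy, and scikit-learn).
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 40))
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * rng.normal(size=t.size)])  # small noisy arc in 3-D
n = X.shape[0]

G = kneighbors_graph(X, n_neighbors=4, mode="connectivity").toarray()
G = np.maximum(G, G.T)                                     # symmetric neighborhood graph

K = cp.Variable((n, n), PSD=True)                          # Gram matrix of the unfolded points (the learned kernel)
constraints = [cp.sum(K) == 0]                             # center the embedding
for i in range(n):
    for j in range(i + 1, n):
        if G[i, j]:                                        # preserve distances between neighbors only
            d2 = float(np.sum((X[i] - X[j]) ** 2))
            constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2)

cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()  # "unfold" by maximizing total variance

# As in kernel PCA, embed using the top eigenvectors of the learned kernel.
vals, vecs = np.linalg.eigh(K.value)
Y = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))       # 2-D maximum variance unfolding coordinates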
A relatively new method for visualizing high dimensional data is t-SNE, proposed by Laurens van der Maaten and Geoffrey Hinton in 2008. The algorithm is a variation of an existing algorithm called stochastic neighbor embedding (SNE) and was proposed in an attempt to overcome the limitations of SNE. Compared with SNE, t-SNE is much easier to optimize and yields significantly better embeddings. While many algorithms attempt to preserve the local geometry of the data, most of them are not capable of capturing both the local and global structure. t-SNE was able to resolve this issue and provide better visualizations than other techniques. An extensive discussion of how t-SNE works is presented in Chapter 13.
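A minimal usage sketch, assuming scikit-learn's TSNE and its bundled handwritten-digits dataset, produces the kind of 2-D map typically shown for this algorithm:

# t-SNE sketch: embed 64-dimensional digit images in 2-D for visualization (illustrative; assumes scikit-learn).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                         # 1797 images, 64 pixel features each
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)                                # clusters of digits emerge when X_2d is plotted, colored by y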
In a nutshell, each chapter elaborately discusses one of these unsupervised learning approaches to the problem of dimensionality reduction, building the intuition behind the algorithm through theoretical and mathematical explanations. We also evaluate each of these algorithms for its strengths and weaknesses. The practical use cases of the algorithms are briefly discussed. Additionally, illustrative examples with visualizations of data are provided for each technique, along with a hands-on tutorial that includes a step-by-step explanation with code for dimensionality reduction and visualization of data. Finally, a section exclusively devoted to a comparative analysis of all these algorithms is presented to help in assessing the performance of each algorithm on a given dataset and choosing the most suitable one for the purpose of dimensionality reduction.

2

Principal Component Analysis (PCA)

2.1 EXPLANATION AND WORKING

Principal Component Analysis (PCA), a feature extraction method, is one of the most popular dimensionality reduction techniques. We want to reduce the number of features of the dataset (the dimensionality of the dataset) while preserving as much of the information in the original dataset as possible. PCA solves this problem by combining the input variables into a smaller set of orthogonal (uncorrelated) variables that capture most of the data's variability [1].
Let the dataset contain a set of n data points denoted by x1, x2, …, xn where each xi is a d-dimensional vector. PCA finds a p-dimensional linear subspace (where p < d, and often p ≪ d) in a way that the original data points lie mainly on this p-dimensional linear subspace. In practice, we do not usually find a reduced subspace where all the points lie precisely in that subspace. Instead, we try to find the approximate subspace which retains most of the variability of data. Thus, PCA tries to find the linear subspace in which the data approximately lies.
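To make this concrete, here is a hedged from-scratch sketch in the chapter's notation (n points in d dimensions projected to a p-dimensional subspace), using NumPy's singular value decomposition; it is an illustration, not the book's tutorial code:

# From-scratch PCA via SVD, in the chapter's notation (illustrative; assumes NumPy).
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 500, 20, 3
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))    # n correlated d-dimensional points

X_centered = X - X.mean(axis=0)                          # PCA operates on mean-centered data
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:p]                                      # top-p principal directions
X_p = X_centered @ components.T                          # coordinates in the p-dimensional subspace

explained = (S[:p] ** 2).sum() / (S ** 2).sum()          # fraction of total variance retained
print(X_p.shape, round(float(explained), 3))             # (500, 3) and the retained-variance ratio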
Th...
