
eBook - ePub
Computational Learning Approaches to Data Analytics in Biomedical Applications
- 310 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
Computational Learning Approaches to Data Analytics in Biomedical Applications provides a unified framework for biomedical data analysis using varied machine learning and statistical techniques. It offers insights into biomedical data processing, innovative clustering algorithms and techniques, and the connections between statistical analysis and clustering. The book introduces and discusses the major problems in data analytics, reviews influential and state-of-the-art learning algorithms for biomedical applications, surveys cluster validity indices and how to select the appropriate index, and presents an overview of statistical methods that can increase confidence in the clustering framework and in the analysis of the results obtained.
- Includes an overview of data analytics in biomedical applications and current challenges
- Updates on the latest research in supervised learning algorithms and applications, clustering algorithms and cluster validation indices
- Provides complete coverage of computational and statistical analysis tools for biomedical data analysis
- Presents hands-on training on the use of Python libraries, MATLAB® tools, WEKA, SAP HANA and R/Bioconductor
Computational Learning Approaches to Data Analytics in Biomedical Applications, by Khalid Al-Jabery, Tayo Obafemi-Ajayi, Gayla Olbricht, and Donald Wunsch, is available in PDF and ePUB formats and is catalogued under Technology & Engineering & Biology.
1. Introduction
Abstract
Chapter 1 introduces the book, illustrating the importance of data analytics in the technological transformation of biomedical applications. The chapter provides a historical overview of the effects of data science on industry in general, with a focus on its effects in the medical field.
The chapter then discusses the importance of automating data analysis, along with the challenges and the most important factors that shape and guide the development of the field, such as high-performance computers and large-scale data sets.
The second half of the chapter explores the contents of the book, discussing the topics and approaches covered and providing brief overviews of all the remaining chapters.
Keywords
Brute-force solutions; Cluster validation indices; Computational Intelligence; Data analytics; Graphics processing units; Visualization
Data analytics has become a transformational technology, fueled by advances in computer and sensor hardware, storage, cloud computing, pervasive data collection, and advances in algorithms, particularly machine learning. No single one of these would have fueled the advance nearly so much as their convergence has. It has fundamentally changed the way many industries do business, and it is even revolutionizing industries that were not formerly dominated by data, such as transportation. In biomedical applications, although data does not play the central role, learning from it plays an increasingly important one. The ability to visualize data has arguably been one of the most curative improvements in the history of medicine, but that visualization has been dominated by humans. Similar comments can be made about other tasks created by biomedical data. “Manual,” that is, human-dominated, data analysis is no longer sustainable; these tasks will increasingly be delegated to computers.
The techniques in this book are a large part of the reason for this shift. It is increasingly becoming possible, even necessary, for machines to do much of the analytics for us. The sheer growth in the volume of data is one reason for this. For example, healthcare data were predicted to increase from approximately 500 petabytes in 2012 to 25,000 petabytes by 2020 (Roski, Bo-Linn, & Andrews, 2014). This represents a faster increase than Moore's Law, even as that helpful phenomenon ends (Chien & Karamcheti, 2013; Theis & Wong, 2017). The situation would be hopeless were it not for increased automation in analysis.
Fortunately, automated analysis has improved dramatically. Some approaches were previously difficult because of the computational complexity of solving for the parameters of large systems. The increased amount of data, although a challenge, can actually be helpful: some of the most sophisticated techniques demand huge datasets because, without them, systems with many parameters are subject to overfitting. Having enough data would have been cold comfort when most of these techniques were invented, however, because of the high computational cost of solving for the parameters. The dramatic advances in high-performance computing have mitigated this impediment. This began long ago, when the availability of parallel computing tools accelerated with the expanded use of Graphics Processing Units (GPUs). If you see a person addicted to electronic games, take a moment to thank him or her. That industry has grown larger than the movie and music industries combined (Nath, 2016). At the time, it was much larger than the market for high-performance computing and was a major driver of cost reductions. The visibility of Computational Intelligence methods has increased to the point that our industry has legs of its own, but the transformational cost reductions originated in the gaming industry, which continues to be an important driver of innovation.
However, not all of the techniques we cover are computationally intensive. Some are linear or log-linear in computational complexity, that is, very fast. These are suitable for embedded applications such as next-generation smart sensors, wearable and monitoring technology, and much more; they include selected neural network and statistical clustering techniques. Some of these techniques are classics, which makes sense, since earlier computing hardware was limited in performance. The faster of the recent innovations described herein also offer surprising computational efficiency.
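To make the cost argument concrete, the sketch below implements one step of streaming (online) k-means, a classic statistical clustering technique whose per-sample cost is linear in the number of clusters and dimensions. It is only a minimal illustration of the kind of lightweight method meant here; the function names and the toy data stream are ours, not the book's.

```python
import numpy as np

def online_kmeans_step(centroids, counts, x):
    """One O(k*d) streaming k-means update: assign sample x to its
    nearest centroid and nudge that centroid toward x (running mean)."""
    j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    counts[j] += 1
    centroids[j] += (x - centroids[j]) / counts[j]
    return j

# Toy usage: cluster a stream of 2-D samples into k = 3 groups,
# touching each sample exactly once, as an embedded sensor might.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(3, 2))   # crude random initialization
counts = np.zeros(3)
for x in rng.normal(size=(1000, 2)):
    online_kmeans_step(centroids, counts, x)
```

Because each sample is processed once and then discarded, memory stays constant regardless of stream length, which is what makes such methods viable on small devices.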
A related issue is the emergence of non-volatile memory technology (Bourzac, 2017; Merrikh-Bayat, Shouraki, & Rohani, 2011). This will lower the cost point and, more importantly, the energy consumption of embedded computing solutions. Much more of the burden of computation can now be shifted to cheaper, energy-efficient memory, so memory-intensive solutions to biomedical problems will become much more pervasive. This alone would be a game-changer, but there's more. The technology also offers an opportunity to implement learning algorithms directly on almost any device or object (Versace, Kozma, & Wunsch, 2012). For the smallest, cheapest devices, these may tend to be the simpler algorithms, for example, a neural network with a very small number of layers and/or neurons. However, this simplicity will be balanced by the sheer number of devices and objects that can benefit from an embedded neural network. In the future, it will be much easier to determine whether medications have been taken, whether objects have been used, patterns of usage of just about anything, and much more. As valuable as these innovations will be, the use of such information in aggregate offers even greater potential. Unleashing the potential of the resulting massive datasets will be enhanced by the ability to process at “the edge,” that is, close to the point where the data are generated. Without this capability, even the most powerful systems will choke on the growing volume of data.
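For a sense of scale, the toy forward pass below shows how little computation and state such a small embedded network needs: four inputs, one three-neuron hidden layer, one output. The architecture, names, and random placeholder weights are ours for illustration; a real device would load trained weights, for instance from non-volatile memory.

```python
import numpy as np

# Toy single-hidden-layer network: 4 sensor inputs -> 3 hidden -> 1 output.
# Random weights stand in for trained values a device would store.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

def predict(x):
    h = np.tanh(W1 @ x + b1)                      # hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output in (0, 1)

print(predict(np.array([0.2, 0.5, 0.1, 0.9])))    # e.g., "was the pill taken?"
```

The entire model is 19 numbers, small enough to live comfortably in the memory of a microcontroller.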
Nevertheless, methods of extraordinary computational complexity are no longer prohibitively difficult. The advances in low-cost, high-performance computing power are only part of the solution. As the business potential of improved computational intelligence capabilities has become obvious, more investment has followed the opportunity. The willingness of major corporations such as Google (D'Orbino, 2017) to invest billions in this technology has enabled brute-force solutions to even the most intractable and difficult problems (Silver et al., 2016). It has become increasingly apparent that the companies that win the computational intelligence competition will dominate their industries; almost no investment is too large to achieve that objective. This includes high-performance computers and even application-specific integrated circuits optimized for machine learning applications (Jouppi, 2016). The resources that can be targeted at a problem are limited only by the importance of the problem. This is an especially significant opportunity for medical applications.
Another crucial factor is that the human learning curve has improved. Many of the techniques we describe formerly required experts to implement them. Now, many tools have been devised that allow rapid prototyping or even complete solutions by relatively new practitioners. This trend is accelerating. For example, TensorFlow, Python machine learning libraries, Weka, MATLAB toolboxes, open-source repositories at many research labs, and several other products enable more practitioners to implement algorithms and techniques.
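As a small illustration of that accessibility (our own example, not one from the book), a complete clustering pipeline in scikit-learn takes only a few lines; the dataset and parameter choices below are ours:

```python
# Cluster the classic breast-cancer dataset bundled with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put all features on a common scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])                     # cluster assignment for each sample
```

Not long ago, the same task would have meant implementing the algorithm, the scaling, and the bookkeeping from scratch.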
Moreover, these enhanced software and hardware capabilities are constantly becoming more widely available. Some of the tools can be tried out by anyone for free; others are available for free or at modest cost, depending on the novelty and importance of the application. Many companies have announced ambitious plans to expand the availability of their tools. Expensive as this may seem, it is actually an astute business decision, for it gives companies the opportunity to establish their software and hardware as de facto standards for large classes of applications. This will remove what was once a massive barrier to entry for new innovations in the field, while simultaneously creating barriers to entry for corporate competitors.
The result of these factors is that almost no area of medicine will be untouched by the extraordinary changes described here. This will create heretofore unimagined opportunities to prevent, cure or mitigate diseases. Familiarity with the technology described herein will help tremendously to bring about this desirable outcome.
A brief comment on terminology is appropriate here. The topics discussed in this book are highly multidisciplinary, and each discipline uses its own terminology; different disciplines sometimes use the same word with meanings that differ, subtly or significantly. For most of this book we will discuss terminology as it occurs. However, we will differentiate here between variables and features. In this book, we consider a variable to be anything that can vary. Therefore, the value of an observation, a vector, or a component of a vector can all be considered variables. A feature is something that has been processed or decided on in some way. It can be a single variable, a set of variables, or even a function of some variables. The main point is that a human or an algorithm must have selected it for it to be called a feature. Certain fields use these terms differently, but this is how we will use them in this book.
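A hypothetical illustration of the distinction (the quantities below are our own, not the book's):

```python
import numpy as np

# Variables: anything that can vary -- here, raw height and weight readings.
height_m  = np.array([1.60, 1.75, 1.82])
weight_kg = np.array([55.0, 72.0, 90.0])

# Feature: something deliberately selected or constructed by a human or an
# algorithm -- here, body-mass index, a function of the two variables.
bmi = weight_kg / height_m ** 2
print(bmi)
```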
For most of this book's chapters, a large and growing body of literature exists. Therefore, none of the chapters is intended to be comprehensive; rather, they are illustrative, and together they give the reader a perspective on the range of available tools. Much of the best work in this field is collaborative, and an appreciation of these methods will allow the reader to become a more effective collaborator with domain experts or computational intelligence practitioners.
The remaining chapters in this book are organized as follows.
Chapter 2 presents a general framework for data curation that fits a broad variety of datasets. It covers the different phases of data preprocessing and preparation, and it provides a detailed overview of the most popular algorithms and techniques for data curation, imputation, feature extraction, and correlation analysis, along with their practical application. We also provide techniques developed from our own experience in data processing. At the end of Chapter 2, we present a practical example showing the effect of different imputation methods on the performance and efficiency of Support Vector Machines; a sketch of that kind of comparison appears below. The chapter describes a methodology for converting raw, messy data into a well-organized dataset that is ready for high-level machine learning algorithms or other advanced methods of data analysis.
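The following sketch is not the book's experiment; it only illustrates, under our own assumptions (a built-in scikit-learn dataset with 10% of values artificially removed), how one might compare imputation strategies by the cross-validated accuracy of a downstream SVM:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan   # knock out 10% of values at random

for strategy in ("mean", "median", "most_frequent"):
    pipe = make_pipeline(SimpleImputer(strategy=strategy),
                         StandardScaler(), SVC())
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{strategy:>13s} imputation: {acc:.3f}")
```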
Chapter 3 is an overview of clustering algorithms. See Xu and Wunsch (2005) for a survey, Xu and Wunsch (2010) for a survey oriented toward biomedical engineering, or Xu and Wunsch (2009), ...
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Preface and Acknowledgements
- 1. Introduction
- 2. Data preprocessing
- 3. Clustering algorithms
- 4. Selected approaches to supervised learning
- 5. Statistical analysis tools
- 6. Genomic data analysis
- 7. Evaluation of cluster validation metrics
- 8. Data visualization
- 9. Data analysis and machine learning tools in MATLAB and Python
- Index