eBook - ePub

Informatics and Machine Learning

Name: Informatics and Machine Learning
ISBN: 9781119716761

From Martingales to Metaheuristics

Stephen Winters-Hilt

English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Informatics and Machine Learning

Discover a thorough exploration of how to use computational, algorithmic, statistical, and informatics methods to analyze digital data

Informatics and Machine Learning: From Martingales to Metaheuristics delivers an interdisciplinary presentation on how to analyze any data captured in digital form. The book describes how readers can conduct analyses of text, general sequential data, experimental observations over time, stock market and econometric histories, or symbolic data, like genomes. It contains large amounts of sample code to demonstrate the concepts contained within and assist with various levels of project work.

The book offers a complete presentation of the mathematical underpinnings of a wide variety of forms of data analysis and provides extensive examples of programming implementations. It is based on two decades' worth of the distinguished author's teaching and industry experience.

A thorough introduction to probabilistic reasoning and bioinformatics, including Python shell scripting to obtain data counts, frequencies, probabilities, and anomalous statistics, or use with Bayes' rule
An exploration of information entropy and statistical measures, including Shannon entropy, relative entropy, maximum entropy (maxent), and mutual information
A practical discussion of ad hoc, ab initio, and bootstrap signal acquisition methods, with examples from genome analytics and signal analytics

Perfect for undergraduate and graduate students in machine learning and data analytics programs, Informatics and Machine Learning: From Martingales to Metaheuristics will also earn a place in the libraries of mathematicians, engineers, computer scientists, and life scientists with an interest in those subjects.

Publisher: Wiley
Year: 2021
Print ISBN: 9781119716747
eBook ISBN: 9781119716761
Topic: Mathematics
Subtopic: Probability & Statistics
Index: Mathematics

1 Introduction

Informatics provides new avenues of understanding and inquiry in any medium that can be captured in digital form. Areas as diverse as text analysis, signal analysis, and genome analysis, to name a few, can be studied with informatics tools. Computationally powered informatics tools are having a phenomenal impact in many fields, including engineering, nanotechnology, and the biological sciences (Figure 1.1).

In this text I provide a background on various methods from Informatics and Machine Learning (ML) that together comprise a “complete toolset” for doing data analytics work at all levels – from a first‐year undergraduate introductory level to advanced topics in subsections suitable for graduate students seeking a deeper understanding (or a more detailed example). Numerous prior book, journal, and patent publications by the author are drawn upon extensively throughout the text [1–68]. Part of the objective of this book is to bring these examples together and demonstrate their combined use in typical signal processing situations. Numerous other journal and patent publications by the author [69–100] provide related material, but are not directly drawn upon in this text. The application domain is practically everything in the digital domain, as mentioned above, but in this text the focus will be on core methodologies with specific application in informatics, bioinformatics, and cheminformatics (nanopore detection, in particular). Other disciplines can also be analyzed with informatics tools. Basic questions about human origins (anthrogenomics) and behavior (econometrics) can also be explored with informatics‐based pattern recognition methods, with a huge impact on new research directions in anthropology, sociology, political science, economics, and psychology. The complete toolset of statistical learning tools can be used in any of these domains.

In the chapter that follows, an overview is given of the various information processing stages to be discussed in the text, with some highlights to help explain the order and connectivity of topics and to motivate their more detailed presentation later.

Figure 1.1 A Penrose tiling. A non‐repeating tiling with two shapes of tiles, with 5‐point local symmetry and both local and global (emergent) golden ratio.

1.1 Data Science: Statistics, Probability, Calculus … Python (or Perl) and Linux

Knowledge construction using statistical and computational methods is at the heart of data science and informatics. Counts on data features (or events) are typically gathered as a starting point in many analyses [101, 102]. Computer hardware is very well suited to such counting tasks. Basic operating system commands and a popular scripting language (Python) will be taught to enable doing these tasks easily. Computer software methods will also be shown that allow easy implementation and understanding of basic statistical methods, whereby the counts, for example, can be used to determine event frequencies, from which statistical anomalies can be subsequently identified. The computational implementation of basic statistics methods then provides the framework to perform more sophisticated knowledge construction and discovery by use of information theory and basic ML methods. ML can be thought of as a specialized branch of statistics where there is minimal assumption of a statistical “model” based on prior human learning. This book shows how to use computational, statistical, and informatics/algorithmic methods to analyze any data that is captured in digital form, whether it be text, sequential data in general (such as experimental observations over time, or stock market/econometric histories), symbolic data (genomes), or image data. Along the way there will be a brief introduction to probability and statistics concepts (Chapter 2) and basic Python/Linux system programming methods (Chapter 2 and Appendix A).
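The count, frequency, and anomaly pipeline described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the book: the function names, the use of overlapping k-mers as the counted "events," and the ratio threshold of 2.0 are all assumptions made for the example.

```python
from collections import Counter
import math


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def frequencies(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def anomalies(sequence, k=2, threshold=2.0):
    """Flag observed k-mers whose frequency differs from the
    independent-symbol expectation by at least a factor `threshold`.

    Assumption for this sketch: the null model is the product of the
    single-symbol frequencies. k-mers that never occur are not reported.
    """
    base = frequencies(kmer_counts(sequence, 1))
    observed = frequencies(kmer_counts(sequence, k))
    flagged = {}
    for kmer, freq in observed.items():
        expected = math.prod(base[symbol] for symbol in kmer)
        ratio = freq / expected
        if ratio >= threshold or ratio <= 1.0 / threshold:
            flagged[kmer] = ratio
    return flagged


if __name__ == "__main__":
    seq = "AAAAACCCCC"
    print(kmer_counts(seq, 2))
    print(anomalies(seq))
```

In the toy sequence, the pair "AC" occurs far less often than the single-symbol frequencies would predict, so it is the only pair flagged. A real analysis would need a statistical test that accounts for sample size rather than a bare ratio cutoff.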

1.2 Informatics and Data Analytics

It is common to need to acquire a signal where the signal properties are not known, or the signal is only suspected and not discovered yet, or the signal properties are known but they may be too much trouble to fully enumerate. There is no common solution, however, to the acquisition task. For this reason the initial phases of acquisition methods unavoidably tend to be ad hoc. As with data dependency in non‐evolutionary search metaheuristics (where there is no optimal search method that is guaranteed to always work well), here there is no optimal signal acquisition method known in advance. In what follows, methods are described for bootstrap optimization in signal acquisition to enable the most general‐use, almost “common,” solution possible. The bootstrap algorithmic method involves repeated passes over the data sequence, with improved priors and trained filters, among other things, so that signal acquisition improves on subsequent passes. The signal acquisition is guided by statistical measures to recognize anomalies. Informatics methods and information theory measures are central to the design of a good finite state automaton (FSA) acquisition method, and will be reviewed in the signal acquisition context in Chapters 2–4. Code examples are given in Python and C (with introductory Python described in Chapter 2 and Appendix A). Bootstrap acquisition methods may not automatically provide a common solution, but appear to offer a process whereby a solution can be improved to some desirable level of general‐data applicability.
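The idea of repeated passes with improved priors can be illustrated with a deliberately simple stand-in. The sketch below is not the book's bootstrap method: it assumes the "prior" is just a baseline mean and spread, and that a sample counts as signal when it lies more than `n_sigma` spreads from the baseline. The function name `bootstrap_acquire` and its parameters are hypothetical. Each pass re-estimates the baseline from the samples not yet flagged, so a large event that inflates the first estimate no longer hides a smaller one on later passes.

```python
import statistics


def bootstrap_acquire(samples, n_sigma=3.0, max_passes=10):
    """Iteratively refine a baseline estimate and the set of flagged samples.

    Assumption for this sketch: the baseline is summarized by its mean and
    population standard deviation, and anomalies are simple n-sigma outliers.
    Returns (sorted flagged indices, baseline mean, baseline spread).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    flagged = set()
    for _ in range(max_passes):
        # Prior for this pass: statistics of everything not flagged so far.
        background = [x for i, x in enumerate(samples) if i not in flagged]
        if len(background) < 2:
            break
        mean = statistics.fmean(background)
        spread = statistics.pstdev(background)
        new_flagged = {
            i for i, x in enumerate(samples) if abs(x - mean) > n_sigma * spread
        }
        if new_flagged == flagged:
            break  # converged: another pass would change nothing
        flagged = new_flagged
    return sorted(flagged), mean, spread


if __name__ == "__main__":
    data = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [30.0, 14.0]
    print(bootstrap_acquire(data, max_passes=1)[0])  # single pass
    print(bootstrap_acquire(data)[0])                # repeated passes
```

With the toy data, a single pass flags only the large excursion at index 20, because that excursion inflates the estimated spread. Repeated passes also recover the smaller excursion at index 21 once the baseline estimate has tightened.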

The signal analysis and pattern recognition methods described in this book are mainly applied to problems involving stochastic sequential data: power signals and genomic sequences in particular. The information modeling, feature selection/extraction, and feature‐vector discrimination, however, were each developed separately in a general‐use context. Details on the theoretical underpinnings are given in Chapter 3, including a collection of ab initio information theory tools to help “find your way around in the dark.” One of the main ab initio approaches is to search for statistical anomalies using information measures, so various information measures will be described in detail [103–115].
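As a concrete reference for the information measures mentioned above, here is a minimal sketch of Shannon entropy, relative entropy, and mutual information for discrete distributions, using their standard textbook definitions. The function names and the choice of base-2 logarithms (bits) are assumptions for the example, not taken from the book.

```python
import math


def shannon_entropy(p):
    """H(p) = -sum_i p_i log2 p_i, in bits. Zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def relative_entropy(p, q):
    """D(p || q) = sum_i p_i log2(p_i / q_i), in bits.

    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite
    and this sketch raises ZeroDivisionError.
    """
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def mutual_information(joint):
    """I(X; Y) for a joint distribution given as a 2-D list joint[i][j] = p(x_i, y_j).

    Computed as the relative entropy between the joint distribution and the
    product of its marginals.
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))
    print(relative_entropy([0.9, 0.1], [0.5, 0.5]))
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```

A relative entropy well above zero between an observed frequency distribution and a background distribution is one simple way to express "statistical anomaly" in these terms.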

The background on information theory and variational/statistical modeling has significant roots in variational calculus. Chapter 3 describes information theory ideas and the information “calculus” description (and related anomaly detection methods). The involvement of variational calculus methods and the possible parallels with the nascent development of a new (modern) “calculus of information” motivates the detailed overview of the highly successful physics development/applications of the calculus of variations (Appendix B). Using variational calculus, for example, it is possible to establish a link between a choice of information measure and statistical formalism (maximum entropy, Section 3.1). Taking the maximum entropy on a distribution with moment constraints leads to the classic distributions seen in mathematics and nature (the Gaussian for fixed mean and variance, etc.). Not surprisingly, variational methods also help to establish and refine some of the main ML methods, including Neural Nets (NNs) (Chapters 9, 13) and Support Vector Machines (SVM) (Chapter 10). SVMs are the main tool presented for both classification (supervised learning) and clustering (unsupervised learning), and everything in between (such as bag learning).
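The claim that maximum entropy with fixed mean and variance yields the Gaussian can be checked with a short variational calculation. The following is the standard textbook derivation, sketched here for orientation; the Lagrange multiplier names \lambda_0, \lambda_1, \lambda_2 are notation chosen for this example and may differ from the treatment in Section 3.1.

```latex
\begin{align}
J[p] &= -\int p(x)\ln p(x)\,dx
      + \lambda_0\Big(\int p(x)\,dx - 1\Big)
      + \lambda_1\Big(\int x\,p(x)\,dx - \mu\Big)
      + \lambda_2\Big(\int (x-\mu)^2 p(x)\,dx - \sigma^2\Big) \\
\frac{\delta J}{\delta p(x)} &= -\ln p(x) - 1 + \lambda_0 + \lambda_1 x
      + \lambda_2 (x-\mu)^2 = 0 \\
p(x) &= \exp\!\big(\lambda_0 - 1 + \lambda_1 x + \lambda_2 (x-\mu)^2\big) \\
\lambda_1 &= 0, \qquad \lambda_2 = -\frac{1}{2\sigma^2}, \qquad
      e^{\lambda_0 - 1} = \frac{1}{\sqrt{2\pi\sigma^2}} \\
p(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)
\end{align}
```

The multiplier values follow from imposing the three constraints on the exponential form: normalizability requires \lambda_2 < 0, the mean constraint then forces \lambda_1 = 0, and the variance and normalization constraints fix the remaining two.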

1.3 FSA‐Based Signal Acquisition and Bioinformatics

Many signal features of interest are time limited and not band limited in the observational context of interest, such as noise “clicks,” “spikes,” or impulses. To acquire these signal features a time‐domain finite state automaton (tFSA) is often most appropriate [116–124]. Human hearing, for example, is a nonlinear system that thereby circumvents the restrictions of the Gabor limit (to allow for musical geniuses, for example, who have “perfect pitch”), where time‐frequency acuity surpasses what would be possible by linear signal processing alone [116], such as with Nyquist‐sampled linear‐response recording devices that are bound by the limits imposed by the Fourier uncertainty principle (or Benedicks’s theorem) [117]. Thus, even when the powerful Fourier Transform or Hidden Markov Model (HMM) feature extraction methods are utilized to full advantage, there is often a sector of the signal analysis that is only conveniently accessible to ana...
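To make the notion of a time‐domain finite state automaton concrete, the sketch below shows a generic two‐state automaton that acquires time‐limited events such as spikes. It is an illustration only, not the tFSA design from the book: the state names, the separate start and end thresholds, and the `min_duration` parameter are all assumptions made for the example.

```python
def tfsa_events(samples, start_threshold, end_threshold, min_duration=1):
    """Scan a sampled signal with a two-state automaton and return events.

    States (hypothetical, for this sketch):
      BASELINE -> EVENT     when a sample reaches start_threshold
      EVENT    -> BASELINE  when a sample drops below end_threshold
    Each event is a (start, end) index pair with `end` exclusive. Events
    shorter than min_duration samples are discarded.
    """
    events = []
    state = "BASELINE"
    start = None
    for t, x in enumerate(samples):
        if state == "BASELINE" and x >= start_threshold:
            state, start = "EVENT", t
        elif state == "EVENT" and x < end_threshold:
            if t - start >= min_duration:
                events.append((start, t))
            state = "BASELINE"
    # Close an event that is still open when the data runs out.
    if state == "EVENT" and len(samples) - start >= min_duration:
        events.append((start, len(samples)))
    return events


if __name__ == "__main__":
    signal = [0, 0, 5, 6, 5, 0, 0, 7, 0]
    print(tfsa_events(signal, start_threshold=4, end_threshold=2))
    print(tfsa_events(signal, start_threshold=4, end_threshold=2, min_duration=2))
```

Using a lower threshold to end an event than to start one is a common way to avoid chattering on noisy edges. The automaton works directly on sample times, so it places no band‐limit assumption on the features it acquires.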

Cover
Table of Contents
Title Page
Copyright Page
Dedication Page
Preface
1 Introduction
2 Probabilistic Reasoning and Bioinformatics
3 Information Entropy and Statistical Measures
4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods
5 Text Analytics
6 Analysis of Sequential Data Using HMMs
7 Generalized HMMs (GHMMs)
8 Neuromanifolds and the Uniqueness of Relative Entropy
9 Neural Net Learning and Loss Bounds Analysis
10 Classification and Clustering
11 Search Metaheuristics
12 Stochastic Sequential Analysis (SSA)
13 Deep Learning Tools – TensorFlow
14 Nanopore Detection – A Case Study
Appendix A: Python and Perl System Programming in Linux
Appendix B: Physics
Appendix C: Math
References
Index
End User License Agreement
