1.1 Introduction
Human genome informatics is the application of information theory, including computer science and statistics, to the field of human genomics. Informatics enlists computation to augment our capacity to form models of reality with diverse sources of information. When forming a model of reality, one engages in a process of abstraction. The word "abstraction" comes from the Latin abstrahere, meaning to "draw away": a metaphor, rooted in human vision, that as we back away from something, the details fall away and we form mental constructs about what we can discern from the more distant vantage point. That more distant vantage point encompasses a greater portion of reality, yet holds in mind a smaller amount of detail about that larger space.
Given the human mind's limit on the number of variables it can manage, as we form our mental models of reality, we pay attention to certain facets of reality and ignore others, perhaps leaving them to subconscious or unconscious processing mechanisms. When we form models of reality, we have a field of perception that encompasses a subset of reality at a particular scale and a particular time horizon and that includes a subset of the variables at that spatio-temporal scale. Those variables are recursively composed using abstractive processes, for instance, by scale: an atom, a base pair, a gene, a chromosome, a strand of DNA, the nucleus, a cell, a tissue, an organ, an organ system, the human body, a family, a racial group defined by geography and heredity, or all of humanity. Note this abstraction sequence was only spatial and ignored time. Because our perceivable universe is seen through the lens of three spatial and one apparently nonreversible temporal dimension, the mental models we compose describe the transformations of matter-energy forwards through space-time. Let us relate this to information theory and computer science, then bring it back to genomics.
In the 1930s, Alan Turing introduced an abstract model of computation, called the Turing machine (Turing, 1937). The machine consists of an infinite linear blank tape with a tape head that can read, write, or erase only the current symbol and can move one space to the left or right or remain stationary. This tape head is directed by a controller that contains a finite set of states and the rules for operating the tape head, based only on the current state and the current symbol on the tape (the algorithm, or program). Despite the simplicity of this model, it turns out that it can represent the full power of every algorithm that a computer can perform and is thus a universal model of computation.
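To show how little machinery the model requires, the controller can be viewed as a lookup table mapping (state, symbol) pairs to actions. The simulator and the bit-flipping rule set below are hypothetical illustrations, not drawn from Turing's paper:

```python
# Minimal Turing-machine sketch: the controller is a rule table mapping
# (state, symbol) -> (symbol to write, head move, next state).
def run_turing_machine(rules, tape, state="start", max_steps=10_000):
    cells = dict(enumerate(tape))  # sparse tape over an infinite blank background
    pos = 0
    for _ in range(max_steps):
        symbol = cells.get(pos, "_")            # "_" stands for a blank cell
        if (state, symbol) not in rules:
            break                               # halt when no rule applies
        write, move, state = rules[(state, symbol)]
        cells[pos] = write
        pos += {"L": -1, "R": 1, "S": 0}[move]  # left, right, or stationary
    return "".join(cells[i] for i in sorted(cells))

# Example rule set: flip every bit, moving right, until the first blank.
flip_bits = {("start", "0"): ("1", "R", "start"),
             ("start", "1"): ("0", "R", "start")}
```

Running `run_turing_machine(flip_bits, "0110")` yields `"1001"`; any richer behavior comes only from a larger rule table, which is precisely the point of the model's universality.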
Suppose we wanted an algorithm to write down the first billion digits of the irrational number π. We could create a Turing machine that had the billion digits embedded in the finite controller (the program), and we could run that program to write the digits to the tape one at a time. In this case, the length of the program would be proportional to the billion digits of output. This might be coded in a language like C++ as: printf("3.1415926[…]7,504,551"), with "[…]" filled in with the remaining digits. If a billion-digit number were truly random and had no regularity, this would approach being the shortest program that we could write (the information-theoretic definition of randomness). However, π is not a random number, but can be computed to an arbitrary number of digits via a truncated infinite series. An algorithm to perform a series approximation of π could thus be represented as a much shorter set of instructions.
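To make the contrast concrete, here is a short Python sketch (an illustrative implementation, not from the text) that computes π to any requested number of digits using Machin's arctangent formula, a truncated infinite series of the kind described above. The program stays a few hundred bytes long no matter how many digits it emits, unlike the billion-digit printf:

```python
def arctan_inv(x, prec):
    # Gregory series for arctan(1/x), scaled to an integer by 10**prec:
    # 1/x - 1/(3*x**3) + 1/(5*x**5) - ...
    one = 10 ** prec
    power = one // x
    total = power
    x_sq, n, sign = x * x, 3, -1
    while power > 0:
        power //= x_sq
        total += sign * (power // n)
        n += 2
        sign = -sign
    return total

def pi_digits(digits):
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239),
    # computed with 10 guard digits to absorb truncation error.
    prec = digits + 10
    pi_scaled = 4 * (4 * arctan_inv(5, prec) - arctan_inv(239, prec))
    return str(pi_scaled)[:digits]
```

For example, `pi_digits(15)` returns the string "314159265358979"; the same short program, run longer, yields as many digits as desired.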
In algorithmic information theory, the Kolmogorov complexity or descriptive complexity of a string is the length of the shortest Turing machine instruction set (i.e., shortest computer program) that can produce that string (Kolmogorov, 1963). We can think of the problem of modeling a subset of reality as generating a parsimonious algorithm that prints out a representation of the trajectory of a set of variables representing an abstraction of that subset of reality to some level of approximation. That is, we say, "under such and such conditions, thus and such will happen over a prescribed time period". The idea of Kolmogorov complexity motivates the use of Occam's razor, where, given two alternate explanations of reality that explain it comparably well, we will choose the simpler one.
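Kolmogorov complexity itself is uncomputable, but compressed length is a crude, computable stand-in that makes the idea tangible. In the hypothetical Python sketch below, a highly regular string compresses to a tiny fraction of its size, while (with overwhelming probability) random bytes barely compress at all:

```python
import os
import zlib

structured = b"AB" * 5_000       # 10,000 bytes with obvious regularity
random_ish = os.urandom(10_000)  # 10,000 bytes with (almost surely) none

# Compressed length approximates descriptive complexity from above.
len_structured = len(zlib.compress(structured, 9))
len_random = len(zlib.compress(random_ish, 9))

# The regular string admits a far shorter description than the random one.
assert len_structured < len_random
```

The regular string compresses to a few dozen bytes (a short "program" for reproducing it), while the random bytes require essentially their full length, echoing the information-theoretic definition of randomness above.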
In our modeling of reality, we are not generally trying to express the state space transitions of the universe down to the level of every individual atom or quark in time intervals measured by Planck time units, but rather at some level of abstraction that is useful with respect to the outcomes we value in a particular context. Also, because reality has constraints (i.e., laws), and thus regularity, we can, from observing a small spatio-temporal subset of reality, form models that not only describe that observed behavior, but also generalize to predict the behavior of a broader subset of reality. That is, we don't just model specific concrete observables in the here and now, but we model abstract notions of observables that can be applied beyond the here and now.
The most powerful models are the most universal, such as laws of physics, which are hypothesized to hold over all of reality and can thus be falsified if any part of reality fails to behave according to those laws, and yet, cannot be proven because all reality would have to be observed over all time. This then forms the basis of the scientific method where we form and falsify hypotheses but can never prove them. Unlike with hydrogen atoms or billiard balls where the units of observation may be considered in most contexts as near-identical, when we operate on abstractions such as cells, or people, we create units of observation that may have enormous differences.
1.2 From Informatics to Bioinformatics and Genome Informatics
In biology, we often blithely assume that the notion of ceteris paribus (all things being equal) holds, but it can lead us astray (Lambert and Black, 2012; Meehl, 1990). For instance, while genetics exists at a scale where ceteris paribus generally holds, we are nevertheless trying to draw relations with genetic variations at the molecular scale, with fuzzy phenotypes at the level of populations of nonidentical people.
So unlike our previous example of writing a program to generate the first billion digits of π, which has a very precise answer, our use of abstraction to model biology involves leaving out variables of small effect, which nevertheless, when left unaccounted for, may result in error when we extrapolate our projections of the future with abstract models. We would do well to mind the words of George Box, "all models are wrong, but some are useful":
Since all models are wrong, the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary, following William of Occam, he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist, overelaboration and overparameterization is often the mark of mediocrity (Box, 1976).
How then do we choose what variables to study at what level of abstraction over what time scale? To begin to answer this question, it is useful to talk about control in the context of goal-directedness and to turn to a field that preceded and contributed to the development of computer science, namely Cybernetics. In 1958, Ross Ashby introduced the Law of Requisite Variety (Ashby, 1958). Variety is measured as the logarithm of the number of states available to a system. Control, when stripped of its negative connotations of coercion, can be defined as restricting the variety of a system to a subset of states that are valued and preventing the other states from being visited. For instance, an organism will seek to restrict its state space to healthy and alive ones. For every disturbance that can move a system from its current state to an undesirable one, the system must have a means of acting upon or regulating that disturbance. Ashby's example of a fencer staving off attack is helpful:
Again, if a fencer faces an opponent who has various modes of attack available, the fencer must be provided with at least an equal number of modes of defense if the outcome is to have the single value: attack parried.
(Ashby, 1958)
The law of requisite variety says that "variety absorbs variety," and thus that the number of states of the regulator or control mechanism whose job is to keep a system in desirable states (i.e., absorb or reduce the variety of outcomes) must be at least as large as the number of disturbances that could put the system in an undesirable state. All organisms engage in goal-directed activity, the primary one being sustaining existence or survival. The fact that humanity has dominated as a species reflects our capacity to control our environment: to both absorb and enlist the variety of our environment in the service of sustaining health and life.
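In counting terms, the law says that a regulator with R available responses facing D equally likely disturbances cannot force fewer than ceil(D / R) distinct outcomes. The small sketch below (a hypothetical illustration, not Ashby's own notation) applies this to the fencer example:

```python
import math

def min_outcome_states(n_disturbances, n_regulator_states):
    # Ashby's law in count form: outcome variety is at least the
    # ceiling of disturbances divided by available regulator responses.
    return math.ceil(n_disturbances / n_regulator_states)

# A fencer with 8 defenses against 8 modes of attack can force the single
# outcome "attack parried"; with only 4 defenses, at least 2 outcomes
# (some of them unparried attacks) remain possible.
assert min_outcome_states(8, 8) == 1
assert min_outcome_states(8, 4) == 2
```

Taking logarithms of these counts recovers the statement in terms of variety: the variety of outcomes is at least the variety of disturbances minus the variety of the regulator.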
In computing, a universal Turing machine is a Turing machine that can simulate any Turing machine on arbitrary input. If DNA is the computer program for the "Turing machine of life," the field of human genome informatics is metaphorically moving towards the goal of a universal Turing machine that can answer "what-if" questions about modifying the governing variables of life. Note, the computer science concept of self-modifying code also enriches this metaphor. In particular, cancer genomics addresses the situation where the DNA program goes haywire, creating cancer cells with distorted copies where portions of the genome are deleted, copied extra times, and/or rearranged. Self-modifying code in computer science is enormously difficult to debug and is usually discouraged. Similarly, in cancer, we acknowledge that it is too difficult to repair rapidly replicating agents of chaos, and thus, most treatments involve killing or removing the offending cancer cells. Also, with the advent of emerging technologies such as CRISPR genome editing, humanity is now poised on the threshold of directly modifying our genome (Cong et al., 2013). Such technologies, guided by understanding of the genome, have the potential to recode portions of the program of life in order to cure genetic diseases.
With the human genome having a state space of three billion base pairs times two sets of chromosomes, compounded by ...