Bioinformatics
eBook - ePub

Bioinformatics

Genes, Proteins and Computers

Christine Orengo, David Jones, Janet Thornton, Christine Orengo, David Jones, Janet Thornton

Buch teilen
  1. 320 Seiten
  2. English
  3. ePUB (handyfreundlich)
  4. Über iOS und Android verfĂŒgbar
eBook - ePub

Bioinformatics

Genes, Proteins and Computers

Christine Orengo, David Jones, Janet Thornton, Christine Orengo, David Jones, Janet Thornton

Angaben zum Buch
Buchvorschau
Inhaltsverzeichnis
Quellenangaben

Über dieses Buch

Bioinformatics, the use of computers to address biological questions, has become an essential tool in biological research. It is one of the critical keys needed to unlock the information encoded in the flood of data generated by genome, protein structure, transcriptome and proteome research.
Bioinformatics: Genes, Proteins & Computers covers both the more traditional approaches to bioinformatics, including gene and protein sequence analysis and structure prediction, and more recent technologies such as datamining of transcriptomic and proteomic data to provide insights on cellular mechanisms and the causes of disease.

HĂ€ufig gestellte Fragen

Wie kann ich mein Abo kĂŒndigen?
Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kĂŒndigen“ – ganz einfach. Nachdem du gekĂŒndigt hast, bleibt deine Mitgliedschaft fĂŒr den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.
(Wie) Kann ich BĂŒcher herunterladen?
Derzeit stehen all unsere auf MobilgerĂ€te reagierenden ePub-BĂŒcher zum Download ĂŒber die App zur VerfĂŒgung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die ĂŒbrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.
Welcher Unterschied besteht bei den Preisen zwischen den AboplÀnen?
Mit beiden AboplÀnen erhÀltst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.
Was ist Perlego?
Wir sind ein Online-Abodienst fĂŒr LehrbĂŒcher, bei dem du fĂŒr weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhĂ€ltst. Mit ĂŒber 1 Million BĂŒchern zu ĂŒber 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.
UnterstĂŒtzt Perlego Text-zu-Sprache?
Achte auf das Symbol zum Vorlesen in deinem nÀchsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.
Ist Bioinformatics als Online-PDF/ePub verfĂŒgbar?
Ja, du hast Zugang zu Bioinformatics von Christine Orengo, David Jones, Janet Thornton, Christine Orengo, David Jones, Janet Thornton im PDF- und/oder ePub-Format sowie zu anderen beliebten BĂŒchern aus Sciences biologiques & Biotechnologie. Aus unserem Katalog stehen dir ĂŒber 1 Million BĂŒcher zur VerfĂŒgung.

Information

Jahr
2003
ISBN
9781135325237
1
Molecular evolution
Sylvia B. Nagl
Concepts
‱ Information is a measure of order that can be applied to any structure or system. It quantifies the instructions needed to produce a certain organization and can be expressed in bits. Large biomolecules have very high information content.
‱ The concept of the gene has undergone many changes. New concepts are emerging that define genes as functional units whose action is dependent on biological context.
‱ Multigene and multidomain families have arisen by gene duplication in genomes. Complete and partial gene duplication can occur by unequal crossing over, unequal sister chromatid exchange and transposition.
‱ Sequences or protein structures are homologous if they are related by evolutionary divergence from a common ancestor. Homology cannot be directly observed, but must be inferred from sequence or structural similarity.
1.1
Molecular evolution is a fundamental part of bioinformatics
Genomes are dynamic molecular entities that evolve over time due to the cumulative effects of mutation, recombination, and selection. Before we address bioinformatics techniques for analysis of evolutionary relationships between biological sequences, and between protein structures, in later chapters, we will first survey these mechanisms that form the basis of genome evolution.
1.1.1
A brief history of the gene
The systematic study of the laws of heredity began with the work of Gregor Mendel (1822–1884). In 1865, Mendel who was an Augustinian monk living in Brno, then part of the Austro-Hungarian Empire, published a paper describing the results of plant-breeding experiments that he had begun in the gardens of his monastery almost a decade earlier. Mendel’s work received little attention during his lifetime, but when his paper was rediscovered by biologists in 1900, a scientific revolution ensued. The key concept of the Mendelian revolution is that heredity is mediated by discrete units that can combine and dissociate in mathematically predictable ways.
Mendel had studied physics and plant physiology at the University of Vienna, where he was also introduced to the new science of statistics. His experimental methods show the mental habits of a physicist, and in looking for mathematical patterns of inheritance, he became the first mathematical biologist. He set out to arrange in a statistically accurate way the results of deliberate crosses between varieties of sweet pea plants with characteristics that could be easily identified. His elegant experiments led to a clear distinction between genotype (the hereditary make-up of an organism) and phenotype (the organism’s physical and behavioral characteristics), and his results on the pattern of inheritance have become known as Mendel’s law.
Mendel was the first to refer to hypothetical ‘factors’ that act as discrete units of heredity and are responsible for particular phenotypic traits. The rediscovery of his work at the beginning of the 20th century prompted a search for the cellular and molecular basis of heredity. The existence of ‘germ plasm’ was postulated, a material substance in eggs and sperm that in some way carried heritable traits from parent to offspring. While the molecular basis of hereditary factors—protein versus nucleic acid—remained in dispute until the mid-20th century, their cellular basis in chromosomes was soon discovered. In 1909, W.Johannsen coined the word gene to denote hypothetical particles that are carried on chromosomes and mediate inheritance.
In sexually reproducing diploid organisms, such as Mendel’s pea plants and the fruit fly Drosophila melanogaster used in the breeding experiments of early Mendelians, the pattern of inheritance of some phenotypic traits could be explained by postulating a pair of genes underlying each trait—a pair of alleles occupying a locus on a chromosome. It was recognized early on that a single trait might be caused by several genes (polygenic traits) and that a single gene may have several effects (pleiotropy).
It is important to realize that ‘gene’ was an abstract concept to Mendel, and the Mendelian biologists of the first half of the 20th century. The founders of genetics, not having any knowledge of the biochemical basis of heredity, had to infer the characteristics of genes by observing the phenotypic outcomes of their breeding experiments. They developed a theoretical framework based on sound mathematics, now called classical genetics, that worked extremely well and is still useful today for the interpretation of the data obtained by the new molecular genetics.
Molecular genetics seeks to elucidate the chemical nature of the hereditary material and its cellular environment. At the beginning of the 1950s, it finally became clear that DNA was the critical ingredient of the genes. Rosalind Franklin (1920–1958), a crystallographer working at King’s College in London, conducted a careful analysis of DNA using X-ray diffraction which indicated that the macromolecule possessed a double-stranded helical geometry. In 1953, James Watson and Francis Crick produced a successful model of the molecular structure of DNA based on her data by employing the molecular-model building method pioneered by Linus Pauling (1901–1994).
It was clear as soon as the structure of DNA was elucidated that its central role depends on the fact that it can be both replicated and read. This has given rise to the concept of genetic information being encoded in DNA (see section 1.1.2). This idea is often expressed by metaphors that conceive of a gene as a ‘word’ and a genome, the total genetic material of a species, as a linguistic ‘text’ written in DNA code.
Over the ensuing five decades, rapidly accumulating knowledge on the fine structure of DNA and its functional organization has led to many critical adjustments and refinements in our understanding of the complex roles of genetic material in the cell. These discoveries all highlight the crucial contributions of the cellular environment in regulating the effects of DNA sequences on an organism’s phenotype. The causal chain between DNA and phenotype is indirect, different cellular environments link identical DNA sequences to quite different phenotypic outcomes. One of the outcomes of the Human Genome Project is a significantly expanded view of the role of the genome within the integrated functioning of cellular systems.
The concept of the gene has undergone a profound transformation in recent years (Figure 1.1). For molecular biology, the traditional definition of gene action, originating from George Beadle’s one gene-one enzyme hypothesis (1941) led to the concept of the gene as a stretch of DNA that codes for a single polypeptide chain. But even ignoring the fact that open reading frames might overlap, the relationship between DNA sequences and protein chains is many-to-many, not one-to-one. Now, the essential contributions of alternative splicing, RNA editing and post-translational modifications to the synthesis of the actual gene product have become recognized, and with them the limitations of the classic gene concept.
The gene as a unit of function can no longer be taken to be identical with the gene as a unit of intergenerational transmission of molecular information. New concepts are emerging that place emphasis on a functional perspective and define genes as ‘segments of DNA that function as functional units’, ‘loci of cotranscribed exons’ or, along similar lines, ‘distinct transcription units or parts of transcription units that can be translated to generate one or a set of related amino acid sequences’. Another less widely adopted, yet, from a functional perspective, valid definition has recently been proposed by Eva Neumann-Held which states that a gene ‘is a process that regularly results, at some stage in development, in the production of a particular protein.’ This process centrally involves a linear sequence of DNA, some parts of which correspond to the protein via the genetic code. However, the concept of gene-as-process could also include such entities as coding regions for transcription factors that bind to its regulatory sequences, coding regions for RNA editing and splicing factors, the regulatory dynamics of the cell as a whole and the signals determining the specific nature of the final transcript, and beyond that, the final protein product. In conclusion, diverse interpretations of the concept of the gene exist and the meaning in which the term is applied can only be made clear by careful definition. At the very least, inclusion or exclusion of introns, regulatory regions and promoters need to be made explicit when we speak of a gene from the perspective of molecular biology.
1.1.2
What is information?
Biological, or genetic, information is a fundamental concept of bioinformatics. Yet what exactly is information? In physics, it is understood as a measure of order that can be applied to any structure or system. The term ‘information’ is derived from the Latin informare, which means to ‘form’, to ‘shape’, to ‘organize’. The word ‘order’ has its roots in textile weaving; it stems from the Latin ordiri, to ‘lay the warp’. Information theory, pioneered by Claude Shannon, is concerned with information as a universal measure that can be applied equally to the order contained in a hand of playing cards, a musical score, a DNA or protein sequence, or a galaxy.
Information quantifies the instructions needed to produce a certain organization. Several ways to achieve this can be envisaged, but a particularly parsimonious one is in terms of binary choices. Following this approach, we compute information inherent in any given arrangement of matter from the number of ‘yes’ and ‘no’ choices that must be made to arrive at a particular arrangement among all equally possible ones (Figure 1.2). In his book The Touchstone of Life (1999), Werner Loewenstein illustrates this with the following thought experiment. Suppose you are playing bridge and are dealt a hand of 13 cards. There are about 635×109 different hands of 13 cards that can occur in this case, so the probability that you would be dealt this hand of cards is about 1 in 635×109—a large number of choices would need to be made to produce this exact hand. In other words, a particular hand of 13 cards contains a large amount of information.
Order refers to the structural arrangement of a system, something that is easy to understand in the case of a warp for weaving cloth but is much harder to grasp in the case of macromolecular structures. We can often immediately see whether an everyday structure is orderly or disorderly—but this intuitive notion does not go beyond simple architectural or periodic features. The intuition breaks down when we deal with macromolecules, like DNA, RNA and proteins. The probability for spontaneous assembly of such molecules is extremely low, and their structural specifications require enormous amounts of information since the number of ways they can be assembled as linear array of their constituent building blocks, nucleotides and amino acids, is astronomical. Like being dealt a particular hand during a bridge game, the synthesis of a particular biological sequence is very unlikely—its information content is therefore very high.
Image
Figure 1.1
From gene to folded protein
The large molecules in living organisms offer the most striking example of information density in the universe. In human DNA, roughly 3 × 109 nucleotides are strung together on the scaffold of the phosphate backbone in an aperiodic, yet perfectly determined, sequence. Disregarding spontaneous somatic mutations, all the DNA molecules of an individual display the same sequence. We are not yet able to precisely calculate the information inherent in the human genome, but we can get an idea of its huge information storing capacity with a simple calculation. The positions along the linear DNA sequence, that can be occupied by one of the four types of DNA bases, represent the elements of stored information (Figure 1.2). So, with 3×109 positions and four possible choices for each position, there are 43,000,000,000 possible states. The number of possibilities is greater than the estimated number of particles in the universe.
Image
Figure 1.2
What is information? Picking a particular playing card from a pack of eight cards requires 3 yes/no choices (binary choices of 0 or 1). The information required can be quantified as ‘3 bits’. Likewise, picking one nucleotide among all four equally likely ones (A, G, T, C) requires 2 choices (2 bits).
Charles Darwin pointed out in Origin of Species how natural selection could gradually accumulate information about biological structures through the processes of random genotypic variation, natural selection and differential reproduction. What Darwin did not know was exactly how this information is stored, passed on to offspring and modified during evolution—this is the subject of the study of molecular evolution.
1.1.3
Molecular evolution
This section provides an overview of the processes giving rise to the dynamics of evolutionary change at the molecular level. It primarily focuses on biological mechanisms relevant to the evolution of genomic sequences encoding polypeptide chains.
1.1.3.1
The algorithmic nature of molecular evolution
In his theory of evolution, Darwin identified three major features of the process that occur in an endlessly repeating cycle: generation of heritable variation by random mutation at the level of the genotype, natural selection acting on the phenotype, and differential reproductive success. Darwin discovered the power of cumulative algorithmic selection, although he lacked the te...

Inhaltsverzeichnis