Historical background
Linus Pauling first used the term “molecular disease” in 1949, after the discovery that the structure of sickle cell hemoglobin differed from that of normal hemoglobin. Indeed, it was this seminal observation that led to the concept of molecular medicine, the description of disease mechanisms at the level of cells and molecules. However, until the development of recombinant DNA technology in the mid-1970s, knowledge of events inside the cell nucleus, notably how genes function, could only be inferred from the structure and function of their protein products. Once it became possible to isolate human genes and study their properties, the picture changed dramatically.
Progress over the last 30 years has been driven by technological advances in molecular biology. At first it was possible to obtain only indirect information about the structure and function of genes by DNA/DNA and DNA/RNA hybridization, that is, by probing the quantity or structure of RNA or DNA through annealing reactions with molecular probes. The next major advance was the ability to cut DNA into fragments of predictable size with bacterial restriction enzymes. This led to a technique that played a central role in the early development of human molecular genetics: Southern blotting, named after its developer, Edwin Southern. This method allowed the structure and organization of genes to be studied directly for the first time and led to the definition of a number of different forms of molecular pathology.
Once it was possible to fractionate DNA, it soon became feasible to insert the fragments into vectors capable of replicating within bacteria. Steady improvements in cloning vectors made it possible to generate libraries of human DNA propagated in bacterial cultures. Ingenious approaches were developed to screen these libraries for genes of interest; once a gene was pinpointed, the appropriate bacterial colonies could be grown to generate larger quantities of DNA carrying it. Later it became possible to sequence these genes, to express their products in microorganisms, cultured cells, or even other species, and hence to define their key regulatory regions.
The early work in human molecular genetics focused on diseases in which something was known of the genetic defect at the protein or biochemical level. However, once linkage maps of the human genome became available, following the identification of highly polymorphic regions of DNA, it became possible to search for the gene responsible for any disease, even one whose cause was completely unknown. This approach, first called reverse genetics and later rechristened positional cloning, led to the discovery of genes for many important diseases.
As methods for sequencing were improved and automated, attention turned to the next major goal in this field: determining the complete sequence of the bases that constitute our genes and all that lies between them. This was the Human Genome Project. This remarkable endeavor was declared essentially complete in 2003, and the sequence of the last chromosome was published in 2006. Further understanding of the functions and regulation of our genes will require multidisciplinary research encompassing many different fields. The next stage of the Human Genome Project, called genome annotation, entails analyzing the raw DNA sequence to determine its biological significance. One of the main ventures of the era of functional genomics is proteomics, the large-scale analysis of the protein products of genes. Its ultimate goal is to define the protein complement, or proteome, of cells and to establish how the many different proteins interact with one another. To this end, large-scale facilities are being established for isolating and purifying the protein products of genes expressed in bacteria. The structures of these proteins can then be studied by a variety of techniques, notably X-ray crystallography and nuclear magnetic resonance spectroscopy. Crystallographic analysis is being greatly facilitated by the use of X-ray beams from synchrotron radiation sources.
In the last few years both the utility and the extreme complexity of the fruits of the genome project have become apparent. The existence of millions of single-nucleotide polymorphisms (SNPs) has made it possible to search across the genome for genes of biological or medical significance. The discovery of families of regulatory RNAs and proteins is starting to shed light on how the functions of the genome are controlled, and studies of epigenetics, acquired modifications of the genome that do not alter the DNA sequence itself, promise to provide similar information. Recent developments in next-generation sequencing of DNA and RNA are also providing invaluable information about many aspects of gene regulation.
During this remarkable period of technical advance, considerable progress has been made toward an understanding of the pathology of disease at the molecular level. This has had a particular impact on hematology, leading to advances in the understanding of gene function and disease mechanisms in almost every aspect of the field.
The inherited disorders of hemoglobin (the thalassemias and structural hemoglobin variants, the commonest human monogenic diseases) were the first to be studied systematically at the molecular level, and a great deal is known about their genotype–phenotype relationships. This field led the way to molecular hematology and, indeed, to the development of molecular medicine. Although the genetics of hemoglobin is complicated by the fact that different varieties are produced at particular stages of human development, the molecular pathology of the hemoglobinopathies provides an excellent model system for understanding any monogenic disease and the complex interactions between genotype and environment that underlie many multigenic disorders.
In this chapter I consider the structure, synthesis, and genetic control of the human hemoglobins, describe the molecular pathology of the thalassemias, and discuss briefly how the complex interactions of their different genotypes produce a remarkably diverse family of clinical phenotypes; the structural hemoglobin variants are discussed in more detail in Chapter 14. Readers who wish to learn more about the methods of molecular genetics, particularly as applied to the study of hemoglobin disorders, are referred to the reviews cited at the end of this chapter.