PATRICK K. CHAFFEY,a XIAOYANG GUAN,a LAI-XI WANG,b AND ZHONGPING TANa
1.1 Introduction
The Human Genome Project has revealed that the entire human genome comprises only about 20 000 genes, while the total number of proteins in the human proteome is estimated to be between 250 000 and one million.1 This severe disparity has made it evident that the human proteome is far more complex than the human genome. Such vast proteomic diversity results from a variety of mechanisms that act at each stage of the protein production process. At the level of transcriptional regulation, there are genomic recombination events, differential transcription initiation and termination, and alternative transcript splicing,2 which together can result in a single DNA-encoded gene being transcribed into more than four unique mRNA transcripts.3 While impressive, such transcription-level events pale in comparison to post-translational processing where most of the proteomic diversification occurs through introduction of many different post-translational modifications (PTMs).4 The combination of these mechanisms significantly increases the number of human proteins from what would be expected based solely on the number of genes.5
PTMs occur after mRNA has been translated into a protein and are covalent modifications of the protein’s side chains or backbone atoms. These modifications can be reversible or irreversible,6 and to date more than 200 different types of PTMs have been identified.7 Although some modifications, such as oxidation8 and glycation,9 occur through loosely controlled, non-enzymatic reactions, most PTMs are introduced onto proteins through tightly controlled enzymatic processes.4 In eukaryotes, the major types of enzyme-catalyzed PTMs are phosphorylation, glycosylation, acetylation, methylation, ubiquitylation, and SUMOylation.10 While phosphorylation is the most common PTM regulating internal functions and activities of cellular proteins,11 glycosylation is the most abundant modification found on extracellular membrane-associated and secreted proteins.12 Glycosylation involves the covalent attachment of one or more glycans (also called “saccharides” or “carbohydrates”) to a protein. Over 50 years of studying protein phosphorylation has greatly clarified its roles in regulating cellular processes.13 However, comparable knowledge of the biological roles of protein glycosylation lags far behind, and the importance of this modification is still not well appreciated by the general scientific community.14 This lack of knowledge and appreciation is a by-product of the slow progress thus far in analyzing and characterizing glycosylation, which in turn is due to the complex nature of the modification.15,16
1.1.1 Complexity of Protein Glycosylation
Similar to phosphorylation, protein glycosylation is regulated by many different factors, such as amino acid sequence, local conformational properties at potential glycosylation sites, and the availability of activated sugar substrates and enzymes involved in glycosylation. As a result, protein glycosylation is sensitive to both genetic and environmental changes. When multiple sites of glycosylation are present in a protein, the site occupancy can vary under different conditions, leading to glycosylation macroheterogeneity.17 This level of heterogeneity is similarly observed in protein phosphorylation.18 Glycans occupying a glycosylation site, however, are far more complex and diverse than a small phosphate group and can vary significantly in terms of size, charge state, glycosidic linkage, and branching pattern. This leads to a second level of heterogeneity, microheterogeneity, which is not present in protein phosphorylation.19 Because of these two levels of heterogeneity, glycoproteins isolated from biological sources exist as complex mixtures. Even for a single protein sequence, these mixtures can contain more than one hundred differently glycosylated isoforms (glycoforms), and their compositions can vary substantially between mixtures generated by different hosts and under different conditions.20 Because it is difficult to exhaustively quantify the individual components of the mixtures or to separate the individual glycoforms from one another, most studies of glycoproteins to date have been carried out using samples of unknown composition. For example, one commonly used method of analyzing protein glycosylation is to directly compare the properties of an unglycosylated protein with those of its corresponding glycoform mixture.21 This method can provide only a rough estimate of the average effects of all glycans on a given protein. Because the composition of the glycoform mixture can vary considerably, the observed effects are in many cases sample-specific, inconsistent, and sometimes even contradictory. The resulting inconsistency and disagreement across studies have led to significant knowledge gaps in the understanding of protein glycosylation at the levels of both fundamental science22 and biomedical development.23,24
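The way macroheterogeneity (which sites are occupied) and microheterogeneity (which glycan occupies a site) multiply into a large number of glycoforms can be illustrated with a short counting sketch. The example below is a minimal illustration under stated assumptions, not a model of any real glycoprotein: the site names, the glycan labels, and the helper functions `count_glycoforms` and `enumerate_glycoforms` are all hypothetical, and every site is assumed to vary independently of the others.

```python
from itertools import product
from math import prod

# Hypothetical input: each glycosylation site maps to the glycans that may
# occupy it. Site names and glycan labels are illustrative assumptions only.
SITE_GLYCANS = {
    "site_1": ["Man5", "G0F", "G1F", "G2F"],
    "site_2": ["G0F", "G2F", "G2FS1"],
    "site_3": ["Man5", "Man9"],
}

UNOCCUPIED = None  # marker for a site that carries no glycan


def count_glycoforms(site_glycans):
    """Count glycoforms when each site is either empty or carries one glycan.

    Each site contributes (number of glycans + 1) states; the extra state is
    the unoccupied site. Sites are assumed to be independent.
    """
    return prod(len(glycans) + 1 for glycans in site_glycans.values())


def enumerate_glycoforms(site_glycans):
    """Yield every glycoform as a dict mapping site -> glycan or UNOCCUPIED."""
    sites = list(site_glycans)
    options = [[UNOCCUPIED, *site_glycans[site]] for site in sites]
    for combination in product(*options):
        yield dict(zip(sites, combination))


# Macroheterogeneity alone: each site is simply occupied or not.
occupancy_patterns = 2 ** len(SITE_GLYCANS)

# Macro- and microheterogeneity combined.
total = count_glycoforms(SITE_GLYCANS)
glycoforms = list(enumerate_glycoforms(SITE_GLYCANS))

print(f"occupancy patterns: {occupancy_patterns}")
print(f"distinct glycoforms: {total}")
```

With these assumed inputs, three sites give 8 occupancy patterns but (4 + 1) x (3 + 1) x (2 + 1) = 60 distinct glycoforms, including the fully unglycosylated protein. Adding one more site or a few more glycans per site quickly pushes the count past one hundred, consistent with the mixture sizes described above.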
1.1.2 Strategies and Methods to Study Protein Glycosylation
To obtain a more complete understanding of protein glycosylation, it is essential to have a more detailed knowledge of the relationships between glycan structure and function, and the consequences of forming heterogeneous mixtures of glycoforms. As with studies of nucleic acids25 and proteins,26 such knowledge can be acquired through the use of homogeneous glycoprotein isoforms and glycopeptides with defined and well-characterized glycosylation patterns.27,28
The availability of homogeneous glycoforms enables a clearer and more confident view of the effects of different glycans on the same or different proteins, which then makes establishing structure–function relationships possible.15,16,29 Through decades of research, glycosylation has been demonstrated to affect both physical and biological properties of proteins, including folding,30 aggregation,31 conformation and structure,32 solubility,33 trafficking,34 stability,35 binding affinity and specificity,36 and biological activity.37,38 These studies cover the effects of many types of glycans, differing widely in their charge, size, linkage, and branching architectures, on the properties of numerous glycoproteins.22,35 Although it is commonly accepted that protein glycosylation is generally important, this conclusion is too vague to be practically useful for predicting the effects of protein glycosylation24,30 or guiding the glycoengineering of proteins.39 To reach a clearer and more definite conclusion, many questions, such as “which effects are the general impacts of protein glycosylation and which are specific to glycosylation type, glycosylation site, and glycan structure”, “what ...