RNA Sequencing Technology for Biomedical Sciences
Sandeep Ameta*, Roberta Menafra
Laboratoire de Biochimie, École supérieure de physique et de chimie industrielles de la ville de Paris (ESPCI Paris), France
CNRS UMR 8231 Chimie Biologie Innovation, PSL Research University, Paris, France
Abstract
In the last two decades, the development of massively parallel sequencing methods has made it possible to sequence RNA at unprecedented resolution, unleashing an enormous wealth of information about the cellular state. Sequencing has accelerated biomedical research by identifying novel mutations, aberrant splicing patterns, splicing isoforms, new gene regulators, and cell-to-cell heterogeneity. To characterize the complexity of the complete transcriptome efficiently, RNA sequencing (RNA-seq) protocols are steadily being improved at every step, from library preparation to data analysis. Furthermore, with advances in sequencing strategies, single-cell RNA sequencing (scRNA-seq) methods have been developed that make it possible to address the heterogeneity of cell types and mRNA expression at remarkable resolution. The majority of these methods involve the conversion of RNA to cDNA and are thus prone to errors, PCR and ligation biases, and enzyme inefficiencies. To address these challenges, strategies have been developed to sequence RNA directly at the single-molecule level, which overcomes these biases. This chapter provides a brief overview of the different sequencing technologies available for RNA-seq, scRNA-seq and single-molecule RNA sequencing, along with the different areas of the biomedical field to which RNA sequencing has contributed.
Keywords: Direct RNA sequencing, Different sequencing strategies, Next generation sequencing, RNA-seq, RNA-related diseases, ScRNA-seq.
* Corresponding author Sandeep Ameta: Laboratoire de Biochimie, École supérieure de physique et de chimie industrielles de la ville de Paris (ESPCI Paris), France; Tel: +33 140794587
INTRODUCTION
RNA plays a multitude of diverse roles that are central to cellular function. Owing to technological advances in recent decades, our perspective on RNA has changed from that of a passive messenger carrying information for translation to that of a critical biomolecule involved in regulation [1-8], catalysis [4, 9-12], metabolism [13, 14], development [15-17], disease [16, 18], and much more. With deeper insights into cellular processes, it is now well established that only a small percentage (1-5%) of transcribed RNA is messenger RNA (mRNA) that is translated into proteins, a finding that has led to the discovery of new roles for RNAs [19, 20]. Within the cell, RNA passes through several processing steps, and defects in any of them can lead to the onset of disease. One of the key processes is pre-mRNA splicing, in which the non-coding parts of the pre-mRNA (introns) are excised by a complex RNA-protein machinery, the spliceosome [21, 22]. The sequences in these non-coding regions contain the information that defines the splice junctions and the sites of interaction with different splicing proteins; mutations in these regions can therefore cause various diseases. For example, spinal muscular atrophy, a common disease and a leading genetic cause of infant mortality, has been shown to result from mutations affecting splicing [23, 24]. Disruption of the splicing of the microtubule-associated protein tau (MAPT) gene can also lead to neurodegenerative disorders [25] such as Alzheimer's disease and frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17) [26]. Furthermore, exon skipping in the medium-chain acyl-CoA dehydrogenase (MCAD) gene can cause severe enzyme deficiency, leading to metabolic disturbances such as hypoglycemia [27, 28].
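To make the splicing step concrete, the following minimal Python sketch removes the intron from a toy pre-mRNA, given hypothetical exon coordinates, and checks whether the intron carries the canonical GU...AG boundary dinucleotides followed by the large majority of eukaryotic introns. The sequence, the coordinates and the function name `splice` are illustrative assumptions, not data or methods from the cited studies; it only shows how a single point mutation can destroy a splice-site signal.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> Tuple[str, List[bool]]:
    """Join exons and flag whether each intron follows the GU...AG rule.

    Exon coordinates are 0-based, half-open [start, end) positions in pre_mrna.
    """
    exons = sorted(exons)  # toy model: exons assumed non-overlapping
    # The mature transcript is the concatenation of the retained exons.
    mature = "".join(pre_mrna[start:end] for start, end in exons)
    canonical = []
    # Each intron lies between the end of one exon and the start of the next.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = pre_mrna[prev_end:next_start]
        # 5' splice site begins with GU; 3' splice site ends with AG.
        canonical.append(intron.startswith("GU") and intron.endswith("AG"))
    return mature, canonical


# Hypothetical toy pre-mRNA: exon 1 + intron + exon 2.
pre = "AUGGCC" + "GUAAGUCUUUCAG" + "GAUUGA"
exon_coords = [(0, 6), (19, 25)]
mature, ok = splice(pre, exon_coords)
print(mature, ok)  # AUGGCCGAUUGA [True]

# A point mutation G>A at the first intronic base destroys the 5' GU.
mutant = pre[:6] + "A" + pre[7:]
print(splice(mutant, exon_coords)[1])  # [False]
```

The toy only flags the broken site; in a cell, loss of such a site can instead cause intron retention or exon skipping, which this sketch does not model.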
A number of functionally relevant non-coding RNAs have been discovered in recent decades, including piRNAs (PIWI-interacting RNAs), miRNAs (microRNAs), siRNAs (small interfering RNAs), snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs) and long non-coding RNAs (lncRNAs) [29, 30]. These RNAs are involved in a multitude of regulatory processes and diseases. For example, piRNAs repress transposable elements, which contribute to genomic instability, and are also associated with the regulation of different cancers [31, 32]. Similarly, human snoRNAs have been shown to be involved in a neurodevelopmental genetic disorder (due to deficient expression of C/D box snoRNAs) and in cancer development [33]. Recently, it has been found that lncRNAs also play a role in gene regulation by acting as competing endogenous RNAs (ceRNAs) that sequester shared miRNAs, with severe pathological implications [34, 35]. miRNAs are another class of abundant small non-coding RNAs that have been implicated in various diseases. They are involved in glucose homeostasis [36], cancer development and progression [37], and neurodegenerative diseases [38, 39]. Modifications of RNA molecules are also crucial for the regulation of biological processes and have been implicated in diseases [40]. Like DNA and histone modifications, RNA post-transcriptional modifications represent a layer of epigenetic regulation. Methylation of adenosine at the N6 position (m6A) is an abundant mark in eukaryotic mRNA [41]. The m6A modification is involved in a variety of biological processes and has been linked to several human diseases [42].
Large-scale sequencing is one of the most critical tools for unraveling the roles of RNA in health and disease. Sequencing has revolutionized biomedical research by making it possible to analyze clinically relevant samples at unprecedented resolution, helping to identify new targets, regulators and biomarkers; it is now even possible to interrogate entire genomes. For example, high-throughput sequencing has shown that the number of splice sites in the human transcriptome is far larger than previously recognized [43]. Similarly, sequencing samples from breast cancer patients has identified several piRNAs that are differentially regulated in tumors compared to normal tissues [44]. Sequencing coupled with antibody-mediated capture of methylated RNA has also made it possible to map and quantify the m6A epigenetic modification accurately (m6A-seq) [45]. Analysis of the transcriptome-wide distribution of m6A suggested that this mark could be involved in mediating splicing, since transcripts of genes with multiple isoforms were found to be enriched in m6A compared to those of single-isoform genes.
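Both comparisons above, piRNA levels in tumor versus normal tissue and m6A-immunoprecipitated versus input RNA in m6A-seq, ultimately compare read counts between libraries of different sizes. The following is a minimal Python sketch of the simplest such comparison, assuming per-feature read counts are already available: normalize each library to counts per million (CPM) and take a log2 ratio with a pseudocount. The feature names, counts and function names are hypothetical; published analyses rely on statistical models (for example DESeq2 or edgeR for differential expression, and dedicated peak callers for m6A-seq) that account for replicate variability, which this sketch does not.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: rescale each feature by the library's total reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {name: n * 1e6 / total for name, n in counts.items()}


def log2_fold_change(case: Dict[str, int], control: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((CPM_case + pc) / (CPM_control + pc)) for features in both libraries.

    The pseudocount avoids log(0) and division by zero for unobserved features.
    """
    a, b = cpm(case), cpm(control)
    shared = sorted(a.keys() & b.keys())
    return {k: math.log2((a[k] + pseudocount) / (b[k] + pseudocount)) for k in shared}


# Hypothetical piRNA read counts; the tumor library is twice as deep.
tumor = {"piR_A": 800, "piR_B": 200, "piR_C": 1000}
normal = {"piR_A": 100, "piR_B": 100, "piR_C": 800}
lfc = log2_fold_change(tumor, normal)
print({k: round(v, 2) for k, v in lfc.items()})
# {'piR_A': 2.0, 'piR_B': 0.0, 'piR_C': -0.68}
```

Note that `piR_B` has twice as many raw reads in the tumor library yet a log2 fold change of 0, because the difference is entirely explained by sequencing depth. Passing immunoprecipitated counts as `case` and input counts as `control` gives a crude enrichment score of the kind m6A-seq analyses build on.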
As is commonly described, we are in the second generation of sequencing and progressing rapidly towards the third generation [46]. The pioneering sequencing methods developed by Frederick Sanger [47, 48] and by Allan Maxam and Walter Gilbert [49] are often regarded as the first-generation methods. While the chain termination strategy (the Sanger sequencing method) was used to sequence the first human genome [50] and is still considered the "gold standard" for sequencing, the development of novel and efficient ways of creating clonal DNA populations, less labor-intensive protocols, and technological advances paved the way for second generation seque...