Bioinformatics
eBook - ePub

Bioinformatics

A Practical Guide to NCBI Databases and Sequence Alignments

  1. 456 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Bioinformatics: A Practical Guide to NCBI Databases and Sequence Alignments provides the basics of bioinformatics and in-depth coverage of NCBI databases, sequence alignment, and the NCBI Basic Local Alignment Search Tool (BLAST). As bioinformatics has become essential for the life sciences, the book has been written specifically to address the needs of a large audience, including undergraduates, graduates, researchers, healthcare professionals, and bioinformatics professors, who need to use the NCBI databases, retrieve data from them, and use BLAST to find evolutionarily related sequences, annotate sequences, construct phylogenetic trees, and identify the conserved domains of a protein, to name just a few applications. Technical details of alignment algorithms are explained with minimal use of mathematical formulas and with graphical illustrations.

Key Features

  • Provides readers with the most widely used knowledge of bioinformatics databases and alignments, covering both theory and application through illustrations and worked examples.
  • Discusses the use of Windows Command Prompt, Linux shell, R, and Python for both Entrez databases and BLAST.
  • The companion website (http://www.hamiddi.com/instructors/) contains tutorials, R and Python code, and instructor materials including slides, exercises, and problems for students.

This is the ideal textbook for bioinformatics courses taken by students of life sciences and for researchers wishing to develop their knowledge of bioinformatics to facilitate their own research.

Bioinformatics by Hamid D. Ismail is available in PDF and/or ePUB format, alongside other titles in Biological Sciences & Computer Science General.

CHAPTER 1 The Origin of Genomic Information

DOI: 10.1201/9781003226611-1

Introduction

The diversity of life, from simple organisms like bacteria to the largest animals, and the diversity of individuals within a species, are guided by biomolecules inside living cells called deoxyribonucleic acid (DNA). The DNA molecule is built from only four basic monomeric units known as DNA nucleotides, each composed of a phosphate group, a sugar, and one of four types of nucleobases, or simply bases (adenine, cytosine, guanine, and thymine). In bioinformatics, those four units are represented by the letters A, C, G, and T, respectively. The DNA molecules in a living cell are represented as sequences of those four nucleotides, which together form the genome. Viruses usually have small genomes; Bacteriophage spp., for example, have a median total genome length of 8,689 bases (8.689 kb). The smallest non-viral genome is that of a bacterium known as Carsonella ruddii, which has a genome of 164,376 bases (164.376 kb). The total length of the human genome is 3,272,090,000 bases (3,272.09 Mb). Segments of DNA known as genes control the different aspects of the life of an organism by instructing the cells to synthesize proteins, which do most of the work in cells and are required for the structure, function, and regulation of the body's tissues and organs. The instructions in a gene are transcribed into ribonucleic acid (RNA), which is then translated into a specific protein. The two-step process (transcription and translation) by which the information in a gene flows into proteins is known as the central dogma of molecular biology. The information in the DNA is also transmitted from one generation to the next: a new generation of a living organism inherits its characteristics through the DNA passed on by its parents. DNA also changes slowly over time, and such changes, or mutations, allow organisms to adapt to changes in nature and contribute to the diversity of life.
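The two-step flow of information described above can be illustrated with a minimal Python sketch. It is not taken from the book: the function names `transcribe` and `translate`, the example sequence, and the deliberately partial codon table are all hypothetical and chosen only for illustration. The sketch also assumes the input is the coding strand of the gene, so that transcription reduces to replacing T with U; in the cell, RNA is synthesized from the complementary template strand.

```python
# Partial codon table (assumption: only the codons used in the example
# below are listed; the full standard genetic code has 64 codons).
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    A codon missing from the partial table raises KeyError.
    """
    protein = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore an incomplete last codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCAAATGGTAA"       # hypothetical 15-base coding sequence
mrna = transcribe(dna)        # transcription: DNA -> RNA
protein = translate(mrna)     # translation: RNA -> protein
print(mrna)     # AUGGCCAAAUGGUAA
print(protein)  # MAKW
```

For real sequences, an established library with a complete genetic code table would be a better choice than a hand-written dictionary; the sketch is only meant to make the direction of information flow concrete.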
Advances in molecular biology and biotechnology have made it possible to capture the information carried by DNA, RNA, and proteins. Sequences and other biological information from diverse species, and from individuals within those species, are now increasingly deposited by researchers and institutions in bioinformatics databases, where they are available for retrieval and analysis for research purposes. Genomic information has revolutionized biology and made modern biologists dependent on bioinformatics, which uses computer science to store, organize, search, manipulate, and retrieve that information. Institutions like the National Institutes of Health (NIH), the European Molecular Biology Laboratory (EMBL), and the National Institute of Genetics (NIG) in Japan have contributed greatly to the progress made in bioinformatics. Together, those three institutions formed the International Nucleotide Sequence Database Collaboration (INSDC) [1], which is a joint effort to collect and disseminate databases containing DNA, RNA, and protein sequences. The INSDC includes GenBank (USA), the European Nucleotide Archive (UK), and the DNA Data Bank of Japan (Japan). Those three partners capture, preserve, share, and exchange a comprehensive collection of nucleotide sequences and associated information on a daily basis. The INSDC policy allows public access to the global archives of nucleotide data generated in publicly funded experiments. Submission of genomic data is driven by the fact that it is a prerequisite for publication in scholarly journals. The database records are publicly available for scientists from all over the world to access, analyze, draw conclusions from, and use in publishing their findings.
Before digging deeper, it is important to discuss some basics of genomics that will help readers understand bioinformatics. The foundation of bioinformatics is built on data that represent the flow of genomic information from DNA to RNA to proteins. Therefore, understanding the composition of these three kinds of biomolecules, gene structure, gene transcription and expression, mutation, and the techniques used to obtain such genomic data is fundamental for understanding the biological databases and other bioinformatics applications.

Genetic Information and Its Transmission

In the traditional Linnaean system of classification, living organisms are classified on the basis of cellular organization and methods of nutrition into five kingdoms: Monera (bacteria), Protista (protozoans and algae), Fungi (funguses), Plantae (plants), and Animalia (animals). A modern taxonomic classification has extended the Linnaean system to take genomic characteristics into account. Nowadays, biologists recognize only two vastly different cell types, prokaryotic and eukaryotic, based on the absence or presence of a membrane-bound nucleus containing the genetic material of the cell. Therefore, a living organism is either prokaryotic or eukaryotic [2, 3]. Prokaryotes are unicellular organisms that do not have a true nucleus or membrane-bound organelles (Figure 1.1a). Prokaryotes include bacteria, which are the most abundant organisms, and archaea, which are inhabitants of the most ext...

Table of contents

  1. Cover
  2. Half-Title Page
  3. Series Page
  4. Title Page
  5. Copyright Page
  6. Table of Contents
  7. Acknowledgments
  8. Preface
  9. Author bio
  10. Chapter 1 The Origin of Genomic Information
  11. Chapter 2 The Sources of Genomic Data
  12. Chapter 3 The NCBI Entrez Databases
  13. Chapter 4 NCBI Entrez E-Utilities and Applications
  14. Chapter 5 The Entrez Direct
  15. Chapter 6 R and Python Packages for the NCBI E-Utilities
  16. Chapter 7 Pairwise Sequence Alignment
  17. Chapter 8 Basic Local Alignment Search Tool
  18. Index