Python for Bioinformatics

  1. 424 pages
  2. English

About this book

In today's data-driven biology, programming knowledge is essential for turning ideas into testable hypotheses. Based on the author's extensive experience, Python for Bioinformatics, Second Edition helps biologists get to grips with the basics of software development. Requiring no prior knowledge of programming-related concepts, the book focuses on the easy-to-use, yet powerful, Python computer language.

This new edition is updated throughout to Python 3 and is designed not just to help scientists master the basics, but to do more in less time and in a reproducible way. New developments added in this edition include NoSQL databases, the Anaconda Python distribution, graphical libraries like Bokeh, and the use of GitHub for collaborative development.

Author: Sebastian Bassi

I

Programming

CHAPTER 1

Introduction

CONTENTS

1.1 Who Should Read this Book
1.1.1 What the Reader Should Already Know
1.2 Using this Book
1.2.1 Typographical Conventions
1.2.2 Python Versions
1.2.3 Code Style
1.2.4 Get the Most from This Book without Reading It All
1.2.5 Online Resources Related to This Book
1.3 Why Learn to Program?
1.4 Basic Programming Concepts
1.4.1 What Is a Program?
1.5 Why Python?
1.5.1 Main Features of Python
1.5.2 Comparing Python with Other Languages
Readability
Speed
1.5.3 How Is It Used?
1.5.4 Who Uses Python?
1.5.5 Flavors of Python
1.5.6 Special Python Distributions
1.6 Additional Resources
The most effective way to do it, is to do it.
Amelia Earhart

1.1 Who Should Read this Book

This book is for the life science researcher who wants to learn how to program. Readers may have previous exposure to computer programming, but this is not necessary to understand this book (although it surely helps).
This book is designed to be useful to several separate but related audiences: students, graduates, postdocs, and staff scientists, since all of them can benefit from knowing how to program.
Exposing students to programming at early stages in their careers helps to boost their creativity and logical thinking, and both skills can be applied in research. To ease the learning process for students, all subjects are introduced with minimal prerequisites. There are also questions at the end of each chapter, which can be used to self-assess how much you've learned. The answers are available to teachers in a separate guide.
Graduates and staff scientists with actual programming needs should find the book's many real-world examples and abundant reference material extremely valuable.

1.1.1 What the Reader Should Already Know

Since this book is called Python for Bioinformatics, it has been written with the following assumptions in mind:
  • No programming knowledge is assumed, but the reader needs enough computer proficiency to use a text editor and handle basic tasks in their operating system (OS). Since Python is multi-platform, most instructions in this book apply to the most common operating systems (Windows, macOS and Linux); when a command or procedure applies only to a specific OS, it will be clearly noted.
  • The reader should be working (or at least planning to work) with bioinformatics tools. Even small-scale manual jobs, such as using NCBI BLAST to identify a sequence, aligning proteins, searching for primers, or estimating a phylogenetic tree, will be useful for following the examples. The more familiar the reader is with bioinformatics, the better they will be able to apply the concepts learned in this book.

1.2 Using this Book

1.2.1 Typographical Conventions

There are some typographical conventions I have tried to use in a uniform way throughout the book. They should aid readability and were chosen to tell apart user-made names (or variables) from language keywords. This comes in handy when learning a new computer language.
  • Bold: Objects provided by Python and by third-party modules. With this notation it should be clear that round is part of the language and not a user-defined name. Bold is also used to highlight parts of the text. There is no way to confuse one bold usage with the other.
  • Mono-spaced font: User-declared variables, code, and filenames. For example: sequence = 'MRVLLVALALLALAASATS'.
  • Italics: In commands, italics denote a variable that can take different values. For example, in len(iterable), "iterable" can take different values. In running text, italics mark a new word or concept. For example, "One such fundamental data structure is a dictionary."
  • $ (dollar sign): The content of lines starting with $ is meant to be typed in your operating system console (also called the command prompt in Windows or the terminal in macOS).
  • ↲: Line break. Some lines are longer than the available space on a printed page, so this symbol is inserted to mean that what is on the next line of the page belongs to the same line on the computer screen. Inside code, the symbol used is <=.
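The distinction between names provided by Python and user-declared names can also be checked from code. The following is only an illustrative sketch, not a listing from the book; it reuses the example names round, len and sequence from the conventions above and relies on the standard builtins module.

```python
import builtins

# User-declared name (shown in mono-spaced font in the book).
sequence = 'MRVLLVALALLALAASATS'

# round and len are provided by Python (shown in bold in the book),
# so they can be found in the builtins module.
print(hasattr(builtins, 'round'))     # True
print(hasattr(builtins, 'len'))       # True

# sequence exists only because it was declared above.
print(hasattr(builtins, 'sequence'))  # False

# len(iterable): here the iterable is the sequence string.
print(len(sequence))                  # 19
```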

1.2.2 Python Versions

The current version of Python at the time of writing is 3.6.1. Version 2.7.12 is still maintained¹ because a sizable number of production applications use the 2.7 branch. Versions 3.x and 2.x differ only slightly, but enough to be incompatible. Python 3 is more efficient than Python 2 in many respects: large websites such as Instagram migrated from Python 2.7 to Python 3.6 and cut CPU and memory consumption by up to 30%. This book uses Python 3.6.
The only scenario where you may need Python 2.7, apart from maintaining old code, is when a specific library is not available for Python 3. In that case, before starting a project in Python 2.7, search for a replacement library. For example, suppose you want to connect to a MySQL database and you are told to use MySQLdb. Since this package is not Python 3 compatible, do not fall back to Python 2.7; use mysqlclient or mysql-connector-python instead, as both work with Python 3.
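As a rough sketch of how such a check might look in practice, the snippet below reports the running Python version and looks for an installed MySQL driver without importing it. The helper name find_mysql_driver is hypothetical and not from the book. It assumes that mysqlclient installs a module named MySQLdb and that mysql-connector-python installs mysql.connector.

```python
import importlib.util
import sys


def find_mysql_driver(candidates=("MySQLdb", "mysql.connector")):
    """Return the first importable module name in candidates, or None.

    Hypothetical helper: the default names assume that mysqlclient
    provides MySQLdb and mysql-connector-python provides mysql.connector.
    """
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec raises ImportError for a dotted name whose parent
            # package is missing; skip it and try the next candidate.
            continue
    return None


print("Running Python %d.%d" % sys.version_info[:2])
driver = find_mysql_driver()
print(driver if driver else "No MySQL driver found for this interpreter")
```

If no driver is found, installing one of the two packages named above for the current Python 3 interpreter is the alternative to reverting to Python 2.7.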

1.2.3 Code Style

Python source code that appears in this book is presented as listings. Each line of these listings is numbered. These numbers are not intended to be typed; they are used to reference each line in the text. You don’t need to copy the code from the book, since it can be downloaded from the GitHub repository at https://github.com/Serulab/Py4Bio.
Code can be formatted in several ways and still be valid to the Python interpreter. The following code is syntactically correct:
    def GetAverage(X):
        avG=sum(X)/len(X)
        "Calculate the average"
        return avG
Also this one:
    def get_average(items):
        """Calculate the average"""
        average = sum(items) / len(items)
        return average
The latter code sample follows the most widely accepted coding style for Python.² Throughout the book you will mostly find code formatted like the second sample. Some code in the book will not follow accepted coding styles, for the following reasons:
  • There are some instances where the most didactic way to show a particular piece of code conflicts with the style guide. On those few occasions, I choose to deviate from the style guide in favor of clarity.
  • Due to size limitations in a printed book, some names were shortened and other minor drifts from the coding styles have been introduced.
  • To show th...
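Returning to the two listings in this section, the following sketch (not part of the book's listings) runs both functions on the same made-up data to show that they compute the same value. It also shows one practical difference that the excerpt does not mention: a string placed after the first statement is not stored as the function's docstring, while a string placed first is.

```python
def GetAverage(X):
    avG=sum(X)/len(X)
    "Calculate the average"
    return avG


def get_average(items):
    """Calculate the average"""
    average = sum(items) / len(items)
    return average


data = [2, 4, 9]  # made-up values, for illustration only

# Both versions are valid Python and return the same result.
print(GetAverage(data) == get_average(data))

# Only the second version exposes its description as a docstring.
print(GetAverage.__doc__)
print(get_average.__doc__)
```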

Table of contents

  1. Cover
  2. Halftitle Page
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. List of Figures
  7. List of Tables
  8. Preface to the First Edition
  9. Preface to the Second Edition
  10. Acknowledgments
  11. SECTION I Programming
  12. SECTION II Advanced Topics
  13. SECTION III Python Recipes with Commented Source Code
  14. SECTION IV Appendices
  15. Index