eBook - ePub

Open Source Software in Life Science Research

Name: Open Source Software in Life Science Research
ISBN: 9781908818249

Practical Solutions to Common Challenges in the Pharmaceutical Industry and Beyond

Lee Harland,

Mark Forster

582 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

The free/open source approach has grown from a minor activity to become a significant producer of robust, task-orientated software for a wide variety of situations and applications. To life science informatics groups, these systems present an appealing proposition - high quality software at a very attractive price. Open Source Software in Life Science Research considers how industry and applied research groups have embraced these resources, discussing practical implementations that address real-world business problems.

The book is divided into five parts. Part one looks at laboratory data management and chemical informatics, covering software such as Bioclipse, OpenTox, ImageJ and KNIME. In part two, the focus turns to genomics and bioinformatics tools, with chapters examining GenomicTools and EBI Atlas software, as well as the practicalities of setting up an 'omics' platform and managing large volumes of data. Chapters in part three examine information and knowledge management, covering a range of topics including software for web-based collaboration, open source search and visualisation technologies for scientific business applications, and specific software such as Design Tracker and Utopia Documents. Part four looks at semantic technologies such as Semantic MediaWiki, TripleMap and Chem2Bio2RDF, before part five examines clinical analytics, and validation and regulatory compliance of free/open source software. Finally, the book concludes by looking at future perspectives and the economics of free/open source software in industry.

- Discusses a broad range of applications from a variety of sectors
- Provides a unique perspective on work normally performed behind closed doors
- Highlights the criteria used to compare and assess different approaches to solving problems

Building research data handling systems with open source tools

Claus Stie Kallesøe

Abstract:

Pharmaceutical discovery and development requires handling of complex and varied data across the process pipeline. This covers chemical structures information, biological assay and structure versus activity data, as well as logistics for compounds, plates and animals. An enterprise research data handling system must meet the needs of industrial scientists and the demands of a regulatory environment, and be available to external partners. Within Lundbeck, we have adopted a strategy focused on agile and rapid internal development using existing open source software toolkits. Our small development team developed and integrated these tools to achieve these objectives, producing a data management environment called the Life Science Project (LSP). In this chapter, I describe the challenges, rationale and methods used to develop LSP. A glimpse into the future is given as we prepare to release an updated version of LSP, LSP4All, to the research community as an open source project.

Key words

research data management

open source software

software development

pharmaceutical research

Lundbeck

LSP

LSP4All

1.1 Introduction

All pharmaceutical company R&D groups have some kind of ‘corporate database’. This may not originally be an in-house designed knowledge base, but is still distinct from the specific area tools/databases that companies acquire from different software vendors. The corporate database is the storage area for all the ‘final’ pre-clinical results that companies want to retain indefinitely. The corporate database holds data from chemistry, biology, pharmacology and other relevant drug discovery disciplines and is also often a classic data warehouse [1] in the sense that no transactions are performed there – data is fed from the other databases, stored and retrieved. The system in this chapter is partly an example of such an infrastructure, but with a somewhat unique perspective.

The system is not only a data warehouse, however. Final/analysed data from other (specialist) tools are uploaded and stored there. Additionally, it is the main access point for data retrieval and decision support, but the system does a lot more. It forms the control centre and heart of our data transactions and workflow support through the drug discovery process at Lundbeck [2]. Lab equipment is connected, enabling controlled file transfer to equipment, progress monitoring and loading of output data directly back into the database. All Lundbeck Research logistics are also handled there, covering reagents, compounds, plates and animals. The system is updated when assets enter the various sites and when they get registered, and it stores location information and handles required re-ordering by scientists.

The system also supports our discovery project managers with ‘project grids’ containing compounds and assay results. These project grids or Structure Activity Relationship (SAR) tables are linked to the research projects and are where the project groups set up their screening cascade, that is, the tests in which they are interested. Subsequently, the project groups can register both compounds and assay results to generate a combined project results overview. The grids also enable simple data mining and ordering of new tests when the teams have decided which compounds should be moved forward. To read more about corporate pharmaceutical research systems see references [3, 4].
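As a rough illustration of the project grid idea, the following Python sketch pivots a flat list of assay results into one row per compound and one column per test in a screening cascade. It is not taken from LSP: the compound identifiers, assay names, values and the `build_project_grid` function are all hypothetical and chosen only to show the shape of such a table.

```python
from collections import defaultdict

# Hypothetical assay results as (compound_id, assay_name, value) rows.
results = [
    ("CMP-1", "binding_ic50_nM", 12.0),
    ("CMP-1", "solubility_uM", 85.0),
    ("CMP-2", "binding_ic50_nM", 340.0),
]


def build_project_grid(results, cascade):
    """Pivot result rows into one row per compound, one column per cascade assay."""
    grid = defaultdict(dict)
    for compound_id, assay, value in results:
        # Only assays that belong to the project's screening cascade are shown.
        if assay in cascade:
            grid[compound_id][assay] = value
    # Missing results stay as None so gaps in the cascade remain visible.
    return {
        compound: [row.get(assay) for assay in cascade]
        for compound, row in sorted(grid.items())
    }


# Hypothetical screening cascade: the ordered tests the project group cares about.
cascade = ["binding_ic50_nM", "solubility_uM"]
project_grid = build_project_grid(results, cascade)
```

Keeping empty cells as `None` rather than dropping them mirrors the use described above, where a team looks at the grid to decide which tests still need to be ordered for a compound.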

How is the system unique? Is this any different from those of other companies? We believe it is. It is one coherent system built on top of one database. It covers a very broad area, with data concerning genes, animals and compounds at one end of the process all the way to the late-stage non-GLP/GMP [5] exploratory toxicology studies at the other. With a few exceptions, which are defined later, it is built entirely with open source software. It is therefore relevant to talk about, and fits well with the theme of this book, as a case in which a pharmaceutical company has built its main corporate database and transaction system on open source tools.

Corporate sales colleagues would likely call our system something like enterprise research data management or ‘SAP [6] for Research’. We simply call it LSP – which is short for the Life Science Project.

1.2 Legacy

It is difficult to make a clear distinction between before and after LSP, as the core part of the database was initiated more than 10 years ago. Internally, LSP is defined as the old corporate database combined with the new user interface (UI) (actually the full stack above the database) as well as new features, data types and processes/workflows. The following section describes what our environment looked like prior to LSP and what initiated the decision to build LSP.

Lundbeck has had a corporate database combining compounds and assay results since 1980. It has always been Lundbeck’s strategy to keep the final research data together in our own in-house designed database to facilitate fast changes to the system if needed, independently of vendors.

Previously, research used several closed source ‘speciality’ software packages with which the scientists interacted. In chemistry these were mainly centred around the ‘ISIS suite’ of applications from what used to be called MDL [7]. They have since been merged into Symyx [8], which recently became Accelrys [9]. As an aside, this shows the instability of the chemistry software arena, which makes the decision to keep at least a core piece of the environment in-house developed, or otherwise independent of the vendors, all the more relevant. If a component is not controlled or developed in-house, then using an open source package will at least enable a smoother switch of vendor if the initial vendor decides to change direction.

The main third-party software package in the (in vitro) pharmacology area was ActivityBase [10], a very popular system to support plate-based assays in pharma in the early 2000s. Whereas the ISIS applications were connected to the internal corporate database, ActivityBase came with its own Oracle database. Therefore, when the chemists registered compounds into our database the information about the compounds had to be copied (and hence duplicated) into the ActivityBase database to enable the correct link between compounds and results. After analysis in ActivityBase, the (main) results were copied back into our corporate database. Hardly efficient and lean data management!

Of course, the vendors wanted to change this – by selling more of their software and delivering the ‘full enterprise coverage’. Sadly, their tools were not originally designed to cover all areas and therefore did not come across as a fully integrated system – rather they were a patchwork of individual tools knitted together. A decision to move to a full vendor system would have been against Lundbeck’s strategy, and, as our group implemented more and more functionality in the internal systems, the opposite strategy of using only in-house tools became the natural direction.

Workflow support is evidently a need in drug discovery. Scientists need to be able to see the upstream data in order to do their work. Therefore, ‘integration projects’ between different tools almost always follow the acquisition of an ‘off the shelf’ software package. The occasions where one takes software off the shelf, installs it and runs it are truly rare. Even between applications from the same vendor – where one would expect smooth interfaces – integration projects were needed.

As commercial tools are generally closed source, the amount of integration work Lundbeck is able to do, either in-house or through hired local programmers with relevant technology knowledge, is very limited. This means that on top of paying fairly expensive software licences, the organisation has to hire the vendor’s consultants to do all the integration work, and they can cost £1000/day. If one part of the workflow is later upgraded, all integrations have to be upgraded/re-done, resulting in even more expensive integration projects. Supporting such a system becomes a never-ending story of upgrading and integrating, leaving less time for...

Cover image
Title page
Table of Contents
Copyright
Dedication
List of figures and tables
Foreword
About the editors
About the contributors
Introduction
Chapter 1: Building research data handling systems with open source tools
Chapter 2: Interactive predictive toxicology with Bioclipse and OpenTox
Chapter 3: Utilizing open source software to facilitate communication of chemistry at RSC
Chapter 4: Open source software for mass spectrometry and metabolomics
Chapter 5: Open source software for image processing and analysis: picture this with ImageJ
Chapter 6: Integrated data analysis with KNIME
Chapter 7: Investigation-Study-Assay, a toolkit for standardizing data capture and sharing
Chapter 8: GenomicTools: an open source platform for developing high-throughput analytics in genomics
Chapter 9: Creating an in-house ’omics data portal using EBI Atlas software
Chapter 10: Setting up an ’omics platform in a small biotech
Chapter 11: Squeezing big data into a small organisation
Chapter 12: Design Tracker: an easy to use and flexible hypothesis tracking system to aid project team working
Chapter 13: Free and open source software for web-based collaboration
Chapter 14: Developing scientific business applications using open source search and visualisation technologies
Chapter 15: Utopia Documents: transforming how industrial scientists interact with the scientific literature
Chapter 16: Semantic MediaWiki in applied life science and industry: building an Enterprise Encyclopaedia
Chapter 17: Building disease and target knowledge with Semantic MediaWiki
Chapter 18: Chem2Bio2RDF: a semantic resource for systems chemical biology and drug discovery
Chapter 19: TripleMap: a web-based semantic knowledge discovery and collaboration application for biomedical research
Chapter 20: Extreme scale clinical analytics with open source software
Chapter 21: Validation and regulatory compliance of free/open source software
Chapter 22: The economics of free/open source software in industry
Index