Open Source Software in Life Science Research
eBook - ePub

Practical Solutions to Common Challenges in the Pharmaceutical Industry and Beyond

  1. 582 pages
  2. English
  3. ePUB (mobile friendly)
About this book

The free/open source approach has grown from a minor activity to become a significant producer of robust, task-orientated software for a wide variety of situations and applications. To life science informatics groups, these systems present an appealing proposition: high quality software at a very attractive price. Open Source Software in Life Science Research considers how industry and applied research groups have embraced these resources, discussing practical implementations that address real-world business problems.

The book is divided into five parts. Part one looks at laboratory data management and chemical informatics, covering software such as Bioclipse, OpenTox, ImageJ and KNIME. In part two, the focus turns to genomics and bioinformatics tools, with chapters examining GenomicTools and EBI Atlas software, as well as the practicalities of setting up an 'omics' platform and managing large volumes of data. Chapters in part three examine information and knowledge management, covering a range of topics including software for web-based collaboration, open source search and visualisation technologies for scientific business applications, and specific software such as Design Tracker and Utopia Documents. Part four looks at semantic technologies such as Semantic MediaWiki, TripleMap and Chem2Bio2RDF, before part five examines clinical analytics, and validation and regulatory compliance of free/open source software. Finally, the book concludes by looking at future perspectives and the economics of free/open source software in industry.

- Discusses a broad range of applications from a variety of sectors
- Provides a unique perspective on work normally performed behind closed doors
- Highlights the criteria used to compare and assess different approaches to solving problems

Open Source Software in Life Science Research by Lee Harland and Mark Forster is available in PDF and/or ePUB format and is listed under Computer Science & Digital Media.
1

Building research data handling systems with open source tools

Claus Stie Kallesøe

Abstract:

Pharmaceutical discovery and development requires handling of complex and varied data across the process pipeline. This covers chemical structures information, biological assay and structure versus activity data, as well as logistics for compounds, plates and animals. An enterprise research data handling system must meet the needs of industrial scientists and the demands of a regulatory environment, and be available to external partners. Within Lundbeck, we have adopted a strategy focused on agile and rapid internal development using existing open source software toolkits. Our small development team developed and integrated these tools to achieve these objectives, producing a data management environment called the Life Science Project (LSP). In this chapter, I describe the challenges, rationale and methods used to develop LSP. A glimpse into the future is given as we prepare to release an updated version of LSP, LSP4All, to the research community as an open source project.
Key words
research data management
open source software
software development
pharmaceutical research
Lundbeck
LSP
LSP4All

1.1 Introduction

All pharmaceutical company R&D groups have some kind of ‘corporate database’. This may not originally be an in-house designed knowledge base, but is still distinct from the specific area tools/databases that companies acquire from different software vendors. The corporate database is the storage area for all the ‘final’ pre-clinical results that companies want to retain indefinitely. The corporate database holds data from chemistry, biology, pharmacology and other relevant drug discovery disciplines and is also often a classic data warehouse [1] in the sense that no transactions are performed there – data is fed from the other databases, stored and retrieved. The system in this chapter is partly an example of such an infrastructure, but with a somewhat unique perspective.
The system is not only a data warehouse, however. Final/analysed data from other (specialist) tools are uploaded and stored there. Additionally, it is the main access point for data retrieval and decision support, but the system does a lot more. It forms the control centre and heart of our data transactions and workflow support through the drug discovery process at Lundbeck [2]. Lab equipment is connected, enabling controlled file transfer to equipment, progress monitoring and loading of output data directly back into the database. All Lundbeck Research logistics are also handled there, covering reagents, compounds, plates and animals. The system is updated when assets enter the various sites and when they get registered, and it stores location information and handles required re-ordering by scientists.
The system also supports our discovery project managers with ‘project grids’ containing compounds and assay results. These project grids or Structure Activity Relationship (SAR) tables are linked to the research projects and are where the project groups set up their screening cascade, that is, the tests in which they are interested. Subsequently, the project groups can register both compounds and assay results to generate a combined project results overview. The grids also enable simple data mining and ordering of new tests when the teams have decided what compounds should be moved forward. To read more about corporate pharmaceutical research systems see references [3, 4].
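To make the idea of a project grid concrete, the following is a minimal sketch in Python, not the LSP implementation. It assumes results arrive as (compound, assay, value) records; the function names, compound identifiers and assay names are all hypothetical. It shows how compounds and assay results can be combined into one row per compound and one column per test in the screening cascade, and how gaps in the grid point to tests that could still be ordered.

```python
def build_project_grid(compounds, results, cascade):
    """Return one row per compound with one column per assay in the cascade."""
    # Index results by (compound, assay); a later record overwrites an earlier one.
    latest = {}
    for compound_id, assay, value in results:
        latest[(compound_id, assay)] = value

    grid = []
    for compound_id in compounds:
        row = {"compound": compound_id}
        for assay in cascade:
            # None marks a test that has not been run for this compound.
            row[assay] = latest.get((compound_id, assay))
        grid.append(row)
    return grid


def untested(grid, assay):
    """List compounds with no result for the assay (candidates for a new test)."""
    return [row["compound"] for row in grid if row[assay] is None]


# Hypothetical example data: three compounds and a two-step screening cascade.
compounds = ["CMP-1", "CMP-2", "CMP-3"]
cascade = ["binding_ki_nM", "functional_ic50_nM"]
results = [
    ("CMP-1", "binding_ki_nM", 12.0),
    ("CMP-2", "binding_ki_nM", 250.0),
    ("CMP-1", "functional_ic50_nM", 40.0),
]

grid = build_project_grid(compounds, results, cascade)
for row in grid:
    print(row)
print("Not yet tested in functional assay:", untested(grid, "functional_ic50_nM"))
```

Keeping the cascade as an explicit list of assay names means the grid layout follows the project's own choice of tests rather than whatever results happen to exist.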
How is the system unique? Is this any different from those of other companies? We believe it is. It is one coherent system built on top of one database. It covers a very broad area with data concerning genes, animals and compounds in one end of the process all the way to the late-stage non-GLP/GMP [5] exploratory toxicology studies. With a few exceptions, which are defined later, it is built entirely with open source software. It is therefore relevant to talk about, and fits well with the theme of this book, as a case in which a pharmaceutical company has built its main corporate database and transaction system on open source tools.
Corporate sales colleagues would likely call our system something like enterprise research data management or ‘SAP [6] for Research’. We simply call it LSP – which is short for the Life Science Project.

1.2 Legacy

It is difficult to make a clear distinction between before and after LSP, as the core part of the database was initiated more than 10 years ago. Internally, LSP is defined as the old corporate database combined with the new user interface (UI) (actually the full stack above the database) as well as new features, data types and processes/workflows. The following section describes what our environment looked like prior to LSP and what initiated the decision to build LSP.
Lundbeck has had a corporate database combining compounds and assay results since 1980. It has always been Lundbeck’s strategy to keep the final research data together in our own in-house designed database to facilitate fast changes to the system if needed, independently of vendors.
Previously, research used several closed source ‘speciality’ software packages with which the scientists interacted. In chemistry these were mainly centred around the ‘ISIS suite’ of applications from what used to be called MDL [7]. They have since been merged into Symyx [8], which recently became Accelrys [9]. As an aside, this shows the instability of the chemistry software arena, making the decision to keep (at least a core piece of) the environment in-house developed and/or in another way independent of the vendors more relevant. If not in-house controlled/developed, then at least using an open source package will enable a smoother switch of vendor if the initial vendor decides to change direction.
The main third-party software package in the (in vitro) pharmacology area was ActivityBase [10], a very popular system to support plate-based assays in pharma in the early 2000s. Whereas the ISIS applications were connected to the internal corporate database, ActivityBase came with its own Oracle database. Therefore, when the chemists registered compounds into our database the information about the compounds had to be copied (and hence duplicated) into the ActivityBase database to enable the correct link between compounds and results. After analysis in ActivityBase, the (main) results were copied back into our corporate database. Hardly efficient and lean data management!
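The duplication described above can be contrasted with a layout in which compounds and results live in one database and results simply reference the registered compound. The sketch below uses Python's built-in sqlite3 module purely for illustration; the table names, column names and example values are hypothetical and do not reflect the actual LSP or ActivityBase schemas.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the compound/result link
conn.executescript("""
    CREATE TABLE compound (
        compound_id TEXT PRIMARY KEY,
        smiles      TEXT NOT NULL
    );
    CREATE TABLE assay_result (
        result_id   INTEGER PRIMARY KEY,
        compound_id TEXT NOT NULL REFERENCES compound(compound_id),
        assay       TEXT NOT NULL,
        value       REAL NOT NULL
    );
""")

INSERT_RESULT = (
    "INSERT INTO assay_result (compound_id, assay, value) VALUES (?, ?, ?)"
)

# The chemist registers the compound once (placeholder structure).
conn.execute("INSERT INTO compound VALUES (?, ?)", ("CMP-1", "CCO"))
# The pharmacologist stores a result against the same record: no copy is made.
conn.execute(INSERT_RESULT, ("CMP-1", "binding_ki_nM", 12.0))


def results_for(conn, compound_id):
    """Join a compound with its results directly, with no synchronisation step."""
    cursor = conn.execute(
        "SELECT c.compound_id, c.smiles, r.assay, r.value "
        "FROM compound c JOIN assay_result r ON r.compound_id = c.compound_id "
        "WHERE c.compound_id = ?",
        (compound_id,),
    )
    return cursor.fetchall()


def result_is_rejected(conn, compound_id):
    """Return True if a result for an unregistered compound is refused."""
    try:
        conn.execute(INSERT_RESULT, (compound_id, "binding_ki_nM", 1.0))
    except sqlite3.IntegrityError:
        return True
    return False


print(results_for(conn, "CMP-1"))
print(result_is_rejected(conn, "CMP-999"))
```

With a single store, the link between compound and result is a foreign key rather than a copy, so there is nothing to keep in sync between two databases.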
Of course, the vendors wanted to change this – by selling more of their software and delivering the ‘full enterprise coverage’. Sadly, their tools were not originally designed to cover all areas and therefore did not come across as a fully integrated system – rather they were a patchwork of individual tools knitted together. A decision to move to a full vendor system would have been against Lundbeck’s strategy, and, as our group implemented more and more functionality in the internal systems, the opposite strategy of using only in-house tools became the natural direction.
Workflow support is evidently a need in drug discovery. Scientists need to be able to see the upstream data in order to do their work. Therefore, ‘integration projects’ between different tools almost always follow after acquisition of an ‘off the shelf’ software package. The times when one takes software off the shelf, installs it and runs it are truly rare. Even between applications from the same vendor – where one would expect smooth interfaces – integration projects were needed.
As commercial tools are generally closed source, the amount of integration work Lundbeck is able to do, either in-house or through hired local programmers with relevant technology knowledge, is very limited. This means that on top of paying fairly expensive software licences, the organisation has to hire the vendor’s consultants to do all the integration work and they can cost £1000/day. If one part of the workflow is later upgraded, all integrations have to be upgraded/re-done, resulting in even more expensive integration projects. Supporting such a system becomes a never-ending story of upgrading and integrating, leaving less time for...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. List of figures and tables
  7. Foreword
  8. About the editors
  9. About the contributors
  10. Introduction
  11. Chapter 1: Building research data handling systems with open source tools
  12. Chapter 2: Interactive predictive toxicology with Bioclipse and OpenTox
  13. Chapter 3: Utilizing open source software to facilitate communication of chemistry at RSC
  14. Chapter 4: Open source software for mass spectrometry and metabolomics
  15. Chapter 5: Open source software for image processing and analysis: picture this with ImageJ
  16. Chapter 6: Integrated data analysis with KNIME
  17. Chapter 7: Investigation-Study-Assay, a toolkit for standardizing data capture and sharing
  18. Chapter 8: GenomicTools: an open source platform for developing high-throughput analytics in genomics
  19. Chapter 9: Creating an in-house ’omics data portal using EBI Atlas software
  20. Chapter 10: Setting up an ’omics platform in a small biotech
  21. Chapter 11: Squeezing big data into a small organisation
  22. Chapter 12: Design Tracker: an easy to use and flexible hypothesis tracking system to aid project team working
  23. Chapter 13: Free and open source software for web-based collaboration
  24. Chapter 14: Developing scientific business applications using open source search and visualisation technologies
  25. Chapter 15: Utopia Documents: transforming how industrial scientists interact with the scientific literature
  26. Chapter 16: Semantic MediaWiki in applied life science and industry: building an Enterprise Encyclopaedia
  27. Chapter 17: Building disease and target knowledge with Semantic MediaWiki
  28. Chapter 18: Chem2Bio2RDF: a semantic resource for systems chemical biology and drug discovery
  29. Chapter 19: TripleMap: a web-based semantic knowledge discovery and collaboration application for biomedical research
  30. Chapter 20: Extreme scale clinical analytics with open source software
  31. Chapter 21: Validation and regulatory compliance of free/open source software
  32. Chapter 22: The economics of free/open source software in industry
  33. Index