Big Data in Predictive Toxicology
Daniel Neagu, Andrea-Nicole Richarz
- 394 pages
- English
About This Book
Toxicological data are being generated at an ever-increasing rate, and the volume of data is growing dramatically. This is due in part to advances in software solutions and cheminformatics approaches, which increase the availability of open data from chemical, biological, toxicological and high throughput screening resources. However, the amplified pace and capacity of data generation achieved by these novel techniques present challenges for organising and analysing the data output.
Big Data in Predictive Toxicology discusses these challenges as well as the opportunities of new techniques encountered in data science. It addresses the nature of toxicological big data, their storage, analysis and interpretation. It also details how these data can be applied in toxicity prediction, modelling and risk assessment.
This title is of particular relevance to researchers and postgraduates working and studying in the fields of computational methods, applied and physical chemistry, cheminformatics, biological sciences, predictive toxicology and safety and hazard assessment.
Predictive toxicology and model development rely heavily on data to draw upon and have historically suffered from a paucity of available, good quality datasets. The situation has now changed dramatically, from a lack of data hampering model development to "data overload". With high throughput and high content screening methodologies being used systematically to probe the mechanistic basis of adverse effects, and with the increasing use of omics technologies and the consideration of (bio)monitoring data, the volume of data is continuously growing. Big data in predictive toxicology may not yet have reached the dimensions seen in other areas, such as real-time data generation in the health sector, but they exhibit similar characteristics and pose related challenges. Pertinent questions are whether this new plethora of data is adequate for use in predictive toxicology and whether it addresses the field's most urgent problems. This overview chapter examines the definition and characteristics of big data in the context of predictive toxicology, as well as the challenges and opportunities big data present in this field.
1.1 Introduction
1.2 Big Data in the Area of Predictive Toxicology
1.3 The Big Vs of Predictive Toxicology Data
| Big V | Applicability in predictive toxicology | Opportunities | Challenges |
| --- | --- | --- | --- |
| Volume | High number of data generated, e.g., high content screening (HCS), omics test read-outs, epidemiological and (bio)monitoring data | Broader data basis for modelling and elucidation of modes of action and pathways | Storing and processing large amounts of data; finding the relevant data in the flood of information; limited capacity for data curation |
| Velocity | Increased speed of data generation, e.g., high throughput screening (HTS) | More rapid generation of data to fill gaps; ability to generate time-dependent data | Speed of data generation overtaking the capacity for storage, analysis and processing |
| Variety | Many different types of data, e.g., chemical structures, results from a variety of assays, omics, time-dependent kinetics | Different types of information that can be combined to obtain the full picture | Integration of different types of data; representation and informatics processing of chemical structures is a challenge, especially if 3D transformations of structures must be computed for many chemicals; comparability of data from varied sources is not always guaranteed |
| Veracity | Data quality, accuracy and reliability; uncertainties requiring data curation and evaluation of data quality | Large amounts of data might statistically compensate for inaccuracy in individual data points when integrated | Data curation and evaluation of data quality for large amounts of data |
| Variability | Intrinsic variability of biological data, e.g., inter- and intra-individual, genetic and population variations | Availability of large amounts of data might enable models to take variations of parameters into account, enabling better prediction of population variations | Processing large amounts of variable data |
| Validity | Validity of the data for a specific application, e.g., for prediction of toxicity (for a specific endpoint) | Many (types of) data can be generated in a targeted way, tailored to the specific prediction and toxicity assessment goal | Finding and choosing the relevant data for the specific application |
| Visibility | Data sharing and access to data sources, leading to centralised databases and repositories | Large data sets are more visible than small disparate data sets, favouring storage in centralised repositories or linkage via hubs/portals | Making the data available and visible in an appropriate way |
| Visualisation | Representation of the data content | Supports and facilitates making sense of varied and complex data sets, also supporting the organisation of the data | Visualisation of complex data is difficult; methods are needed to represent the data content clearly |
| Volatility | Data access might cease, or repositories might disappear, affecting the sustainability of data resources | N/A | Appropriate storage and sustainability concepts are necessary |
| Value | Adequacy and usefulness for predictive toxicology and hazard/risk assessment, depending on the specific risk assessment goal | Availability of many (types of) data as a broad basis for understanding mechanisms and pathways, in order to build informed predictive models | Extracting and distilling knowledge from the large amount of data |