Tutorials in Chemoinformatics

About this book

30 tutorials and more than 100 exercises in chemoinformatics, supported by online software and data sets

Chemoinformatics is widely used in both academic and industrial chemical and biochemical research worldwide. Yet, until this unique guide, there were no books offering practical exercises in chemoinformatics methods. Tutorials in Chemoinformatics contains more than 100 exercises in 30 tutorials exploring key topics and methods in the field. It takes an applied approach to the subject with a strong emphasis on problem-solving and computational methodologies.

Each tutorial is self-contained and contains exercises for students to work through using a variety of software packages. The majority of the tutorials are divided into three sections devoted to theoretical background, algorithm description and software applications, respectively, with the latter section providing step-by-step software instructions. Throughout, three types of software tools are used: in-house programs developed by the authors, open-source programs and commercial programs which are available for free or at a modest cost to academics. The in-house software and data sets are available on a dedicated companion website.

Key topics and methods covered in Tutorials in Chemoinformatics include:

  • Data curation and standardization
  • Development and use of chemical databases
  • Structure encoding by molecular descriptors, text strings and binary fingerprints
  • The design of diverse and focused libraries
  • Chemical data analysis and visualization
  • Structure-property/activity modeling (QSAR/QSPR)
  • Ensemble modeling approaches, including bagging, boosting, stacking and random subspaces
  • 3D pharmacophores modeling and pharmacological profiling using shape analysis
  • Protein-ligand docking
  • Implementation of algorithms in a high-level programming language

Tutorials in Chemoinformatics is an ideal supplementary text for advanced undergraduate and graduate courses in chemoinformatics, bioinformatics, computational chemistry, computational biology, medicinal chemistry and biochemistry. It is also a valuable working resource for medicinal chemists, academic researchers and industrial chemists looking to enhance their chemoinformatics skills.

Information

Publisher
Wiley
Year
2017
Print ISBN
9781119137962
eBook ISBN
9781119137986

Part 1
Chemical Databases

1
Data Curation

Gilles Marcou and Alexandre Varnek
Goal: Identify and curate problematic chemical information from a data collection. The raw dataset is processed so that it will be ready to feed a relational database dedicated to the organoleptic properties of small organic molecules. Information is interpreted and re‐encoded as categories or bit vectors when relevant.
Software: KNIME 3.0, ChemAxon
Data: The following files are provided in the tutorial:
  • thegoodscent_dup.csv – The raw data, formatted as a semicolon‐separated file, extracted from the web site of The Good Scents Company. The data is prepared and the most visible errors and discrepancies are already corrected.
  • thegoodscent_dup.raw – The raw data without any processing related to the tutorial.
  • MissingOdorTypes.csv – Manually curated Odor Types provided for some difficult cases.
  • StructureCuration.csv – File containing the curation rules for some deficient SMILES of the input.
  • TutoDataCuration.zip – The final KNIME workflow. Unzip the archive in the KNIME workspace and it will appear in your LOCAL workflows.
  • Slurp.pl – A Perl script exploring the website of The Good Scents Company in search of some chemical information.
The Good Scents Company is an online shop providing cosmetic, flavor, and fragrance ingredients. It has provided information for the flavor, food, and fragrance industry since 1994, and has sold ingredients since 1980.
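The re‐encoding step named in the goal can be illustrated outside KNIME with a minimal Python sketch. It is not the tutorial's workflow: the column names (name, odor_type), the comma‐separated label convention, and the three sample rows are hypothetical and stand in for the real content of thegoodscent_dup.csv.

```python
import csv
import io

# Hypothetical sample in semicolon-separated form. The column names and
# values are illustrative only and are NOT taken from thegoodscent_dup.csv.
SAMPLE = """name;odor_type
compound A;fruity, green
compound B;floral
compound C;Green, woody
"""


def read_rows(text):
    """Parse semicolon-separated text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def split_labels(cell):
    """Split a free-text cell into normalized (lowercase, trimmed) labels."""
    return [part.strip().lower() for part in cell.split(",") if part.strip()]


def build_vocabulary(rows, column):
    """Collect the sorted set of distinct labels found in one column."""
    labels = set()
    for row in rows:
        labels.update(split_labels(row[column]))
    return sorted(labels)


def to_bit_vector(cell, vocabulary):
    """Encode the labels of one cell as a 0/1 vector over the vocabulary."""
    present = set(split_labels(cell))
    return [1 if label in present else 0 for label in vocabulary]


rows = read_rows(SAMPLE)
vocabulary = build_vocabulary(rows, "odor_type")
bit_vectors = {
    row["name"]: to_bit_vector(row["odor_type"], vocabulary) for row in rows
}
```

Normalizing case and whitespace before building the vocabulary keeps "Green" and "green" from becoming two separate categories, which is the kind of discrepancy a curation step is meant to remove.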

Theoretical Background

Chemical datasets can be collected from literature, compendiums, web sites, lab‐books, databases, and so on. Aggregation and automatic treatment of data represent additional sources of errors. Therefore, verification of quality and accuracy of chemical information is a crucial step of data valorization.[1]
The problem of the quality of publicly available chemical data can be illustrated by searching the Web for the chemical structure of the antibacterial compound Vancomycine, for which stereochemistry information is essential. One can suggest two possible queries using InChIKey notations:[2,3]
  • Query 1: “MYPYJXKWCTUITO” “Vancomycine”
  • Query 2: “MYPYJXKWCTUITO‐LYRMYLQWSA‐N” “Vancomycine”
Query 1 corresponds to the first block of the InChIKey of Vancomycine; it encodes only the elemental constitution and atom connectivity, whereas Query 2 also includes detailed stereochemistry information.
A Google search (29/01/2016) retrieves 82 and 71 entries for Queries 1 and 2, respectively. Entries found with Query 2 correspond to the correct chemical structure of Vancomycine, whereas all 11 additional entries retrieved with Query 1 refer to stereoisomers of the compound; see the example in Scheme 1.1.
Scheme 1.1 Chemical structures of Vancomycine from PubChem. (a) PubChem CID 441141, InChIKey: MYPYJXKWCTUITO‐UTHKAUQRSA‐N. (b) PubChem CID 14969, InChIKey: MYPYJXKWCTUITO‐LYRMYLQWSA‐N. Notice that Vancomycine corresponds to structure (b), whereas structure (a) is, in fact, its enantiomer.
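The difference between the two queries can be checked programmatically. The following Python sketch is an illustration rather than part of the tutorial: it splits the two InChIKeys of Scheme 1.1 into their blocks and compares them. It assumes the standard InChIKey layout (a 14‐character block hashing the connectivity, a 10‐character block covering the remaining layers such as stereochemistry, and a final single character); the function names are hypothetical.

```python
def normalize_hyphens(key):
    """Replace typographic hyphens (U+2010, U+2011) with ASCII '-'."""
    return key.strip().replace("\u2010", "-").replace("\u2011", "-")


def split_inchikey(key):
    """Return the three blocks of an InChIKey, checking their lengths."""
    parts = normalize_hyphens(key).split("-")
    if [len(part) for part in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return tuple(parts)


def same_connectivity(key_a, key_b):
    """True if both keys share the first (connectivity) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


def same_structure(key_a, key_b):
    """True only if all blocks, including the stereo block, match."""
    return split_inchikey(key_a) == split_inchikey(key_b)


# InChIKeys quoted in Scheme 1.1
KEY_A = "MYPYJXKWCTUITO-UTHKAUQRSA-N"  # PubChem CID 441141
KEY_B = "MYPYJXKWCTUITO-LYRMYLQWSA-N"  # PubChem CID 14969

connectivity_match = same_connectivity(KEY_A, KEY_B)
full_match = same_structure(KEY_A, KEY_B)
```

Here connectivity_match is True while full_match is False: a search restricted to the first block, as in Query 1, cannot distinguish the two structures.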
From this example, one can estimate that about 13% of the retrieved entries (11 of 82) associate Vancomycine with the wrong chemical structure. Analysis of some 6800 publications in drug discovery[4] shows that the average error rate of reported chemical structures is about 8% and, it seems, nothing has changed so far. Numerous examples and alerts about data curation problems, espec...

Table of contents

  1. Cover
  2. Title Page
  3. Table of Contents
  4. List of Contributors
  5. Preface
  6. About the Companion Website
  7. Part 1: Chemical Databases
  8. Part 2: Library Design
  9. Part 3: Data Analysis and Visualization
  10. Part 4: Obtaining and Validation QSAR/QSPR Models
  11. Part 5: Ensemble Modeling
  12. Part 6: 3D Pharmacophore Modeling
  13. Part 7: The Protein 3D-Structures in Virtual Screening
  14. Part 8: Protein-Ligand Docking
  15. Part 9: Pharmacophorical Profiling Using Shape Analysis
  16. Part 10: Algorithmic Chemoinformatics
  17. Index
  18. End User License Agreement