eBook - ePub

High Performance Computing for Big Data

Name: High Performance Computing for Big Data
ISBN: 9781351651578

Methodologies and Applications

Chao Wang

268 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

High-Performance Computing for Big Data: Methodologies and Applications explores emerging high-performance architectures for data-intensive applications, novel efficient analytical strategies to boost data processing, and cutting-edge applications in diverse fields, such as machine learning, life science, neural networks, and neuromorphic engineering.

The book is organized into two main sections. The first section covers Big Data architectures, including cloud computing systems and heterogeneous accelerators, as well as emerging 3D IC design principles for memory architectures and devices. The second section illustrates emerging and practical applications of Big Data across several domains, including bioinformatics, deep learning, and neuromorphic engineering.

Features

Covers a wide range of Big Data architectures, including distributed systems like Hadoop/Spark

Includes accelerator-based approaches for big data applications such as GPU-based acceleration techniques, and hardware acceleration such as FPGA/CGRA/ASICs

Presents emerging memory architectures and devices such as NVM, STT-RAM, and 3D IC design principles

Describes advanced algorithms for different big data application domains

Illustrates novel analytics techniques for Big Data applications, scheduling, mapping, and partitioning methodologies

Featuring contributions from leading experts, this book presents state-of-the-art research on the methodologies and applications of high-performance computing for big data applications.

About the Editor

Dr. Chao Wang is an Associate Professor in the School of Computer Science at the University of Science and Technology of China. He is an Associate Editor of ACM Transactions on Design Automation of Electronic Systems (TODAES), Applied Soft Computing, Microprocessors and Microsystems, IET Computers & Digital Techniques, and International Journal of Electronics. Dr. Wang was a recipient of the Youth Innovation Promotion Association, CAS, an ACM China Rising Star Honorable Mention (2016), and a best IP nomination at DATE 2015. He serves on the CCF Technical Committee on Computer Architecture and the CCF Task Force on Formal Methods. He is a Senior Member of IEEE, CCF, and ACM.

Information

Topics: Computer Science; Statistics for Business & Economics

Emerging Big Data Applications

CHAPTER 5

Matrix Factorization for Drug–Target Interaction Prediction

Yong Liu, Min Wu, and Xiao-Li Li

Institute of Infocomm Research (I2R)

A*STAR, Singapore

Peilin Zhao

Artificial Intelligence Department

Ant Financial Services Group, China

CONTENTS

5.1 Introduction

5.2 Related Work

5.2.1 Classification-Based Methods

5.2.2 Matrix Factorization-Based Methods

5.3 Neighborhood Regularized Logistic Matrix Factorization

5.3.1 Problem Formalization

5.3.2 Logistic Matrix Factorization

5.3.3 Neighborhood Regularization

5.3.4 Combined Model

5.3.5 Neighborhood Smoothing

5.4 Experimental Results

5.4.1 Experimental Settings

5.4.2 Performance Comparison

5.4.3 Neighborhood Benefits

5.4.4 Parameter Sensitivity Analysis

5.4.5 Predicting Novel Interactions

5.5 Conclusions

References

5.1 INTRODUCTION

Drug discovery is one of the primary objectives of the pharmaceutical sciences, an interdisciplinary research field that draws on biology, chemistry, physics, statistics, and other fundamental sciences. In the drug discovery process, the prediction of drug–target interactions (DTIs) is an important step that aims to identify potential new drugs or new targets for existing drugs. It can therefore help guide experimental validation and reduce costs. In recent years, DTI prediction has attracted considerable research attention, and numerous algorithms have been proposed [1, 2]. Existing methods predict DTIs based on a small number of experimentally validated interactions in existing databases, for example, ChEMBL [3], DrugBank [4], KEGG DRUG [5], and SuperTarget [6]. Previous studies have shown that a fraction of new interactions between drugs and targets can be predicted from the experimentally validated DTIs, and that computational methods for identifying DTIs can significantly improve the efficiency of drug discovery.

In general, traditional methods developed for DTI prediction can be categorized into two main groups: docking simulation approaches and ligand-based approaches [7–9]. Docking simulation approaches predict potential DTIs using the structural information of target proteins. However, docking simulation is extremely time-consuming, and the structural information may not be available for some protein families, for example, the G-protein coupled receptors (GPCRs). In ligand-based approaches, potential DTIs are predicted by comparing a candidate ligand with the known ligands of the target proteins. Such approaches may not perform well for targets with only a small number of known ligands.

Recently, the rapid development of machine learning techniques has provided effective and efficient ways to predict DTIs. An intuitive idea is to formulate DTI prediction as a binary classification problem, where the drug–target pairs are treated as instances, and the chemical structures of drugs and the amino acid subsequences of targets are treated as features. Classical classification methods [e.g., support vector machines (SVM) and regularized least squares (RLS)] can then be used for DTI prediction [10–16]. Essentially, the DTI prediction problem is a recommendation task that aims to suggest a list of potential DTIs. Therefore, another line of research for DTI prediction is the application of recommendation technologies, especially matrix factorization-based approaches [17–20]. Matrix factorization methods map both drugs and targets into a shared low-dimensional latent space and model the DTIs using combinations of the latent representations of drugs and targets.
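As a minimal sketch of the binary classification formulation described above, the following NumPy snippet turns an interaction matrix into one instance per drug–target pair. The function name `build_pair_instances`, the toy feature values, and the choice to concatenate drug and target features are illustrative assumptions, not the construction used by any of the cited methods.

```python
import numpy as np

def build_pair_instances(drug_feats, target_feats, Y):
    """Turn an interaction matrix into (features, labels) for pairwise classification.

    drug_feats:   (m, p) array, one feature row per drug (hypothetical features)
    target_feats: (n, q) array, one feature row per target (hypothetical features)
    Y:            (m, n) binary matrix, 1 = known interaction, 0 = unknown
    Returns X of shape (m * n, p + q) and y of shape (m * n,).
    """
    m, n = Y.shape
    # Repeat each drug row n times and tile the target rows m times, so that
    # row i * n + j of X describes the pair (drug i, target j).
    X = np.hstack([np.repeat(drug_feats, n, axis=0),
                   np.tile(target_feats, (m, 1))])
    y = Y.reshape(-1)
    return X, y

# Toy data, made up for illustration: 2 drugs, 3 targets.
drug_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
target_feats = np.array([[0.5], [0.1], [0.9]])
Y = np.array([[1, 0, 0], [0, 0, 1]])
X, y = build_pair_instances(drug_feats, target_feats, Y)
```

Any standard binary classifier could then be trained on `X` and `y`. Note that the number of instances grows as the product of the number of drugs and the number of targets, which is one practical reason to consider factorization-based alternatives.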

In this chapter, we introduce a DTI prediction approach, named neighborhood regularized logistic matrix factorization (NRLMF), which focuses on predicting the probability that a drug would interact with a target [21]. Specifically, the properties of each drug and each target are represented by vectors in a shared low-dimensional latent space. For each drug–target pair, the interaction probability is modeled by a logistic function of the drug-specific and target-specific latent vectors. This is different from the kernelized Bayesian matrix factorization (KBMF) method [17], which predicts the interaction probability using a standard normal cumulative distribution function of the drug-specific and target-specific latent vectors [22]. In NRLMF, an observed interacting drug–target pair (i.e., a positive observation) is treated as c (c ≥ 1) positive examples, while an unknown pair (i.e., a negative observation) is treated as a single negative example. As such, NRLMF assigns higher importance to positive observations than to negative ones. The reason is that positive observations are biologically validated and thus usually more trustworthy, whereas negative observations could contain potential DTIs and are thus unreliable. This differs from previous matrix factorization-based DTI prediction methods [17–19], which treat interacting and unknown pairs equally.
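The weighting scheme described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the chapter's implementation: the names `interaction_prob` and `weighted_log_likelihood`, the use of a plain inner product inside the logistic function, and the default value `c=5.0` are all hypothetical. Counting each known interaction c times and each unknown pair once gives, for a pair with score s and label y, the term c·y·log p + (1 − y)·log(1 − p) with p = 1/(1 + exp(−s)), which simplifies to c·y·s − (1 + c·y − y)·log(1 + exp(s)).

```python
import numpy as np

def interaction_prob(U, V):
    """P[i, j] = logistic(u_i . v_j) for drug factors U (m, r) and target factors V (n, r)."""
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))

def weighted_log_likelihood(U, V, Y, c=5.0):
    """Log-likelihood in which each known interaction counts as c positive examples.

    Y is an (m, n) binary matrix; c >= 1 is the importance weight (assumed value).
    """
    S = U @ V.T
    # log(1 + exp(S)), computed in a numerically stable way.
    softplus = np.logaddexp(0.0, S)
    # Equivalent to sum of c*y*log(p) + (1-y)*log(1-p) with p = logistic(S).
    return float(np.sum(c * Y * S - (1.0 + c * Y - Y) * softplus))
```

With c = 1 the expression reduces to the ordinary Bernoulli log-likelihood, so the weight c is the only thing that distinguishes known interactions from unknown pairs in this sketch.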

Furthermore, NRLMF also studies the local structure of the interaction data to improve DTI prediction accuracy by exploiting the neighborhood influences of the most similar drugs and the most similar targets. In particular, NRLMF imposes individual regularization constraints on the latent representations of a drug and its nearest neighbors, that is, the drugs most similar to the given drug. Similar neighborhood regularization constraints are also imposed on the latent representations of targets. Note that this neighborhood regularization method is different from previous approaches that exploit the drug similarities and target similarities using kernels [12, 13, 15, 23] or by factorizing the similarity matrices [19]. Moreover, the proposed approach only considers nearest neighbors instead of all similar neighbo...
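To make the nearest-neighbor idea concrete, here is a hedged sketch of how such a regularizer might look. The excerpt does not give the exact form of the constraint, so the similarity-weighted squared distance used below, the function names `nearest_neighbors` and `neighborhood_penalty`, and the neighborhood size `k` are assumptions made for illustration only.

```python
import numpy as np

def nearest_neighbors(S, k):
    """For each item i, return the indices of the k most similar other items.

    S is a square similarity matrix (higher means more similar).
    """
    S = np.array(S, dtype=float)      # work on a copy
    np.fill_diagonal(S, -np.inf)      # an item is not its own neighbor
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-S, axis=1, kind="stable")[:, :k]

def neighborhood_penalty(U, S, k):
    """Sum of s_ij * ||u_i - u_j||^2 over each item i and its k nearest neighbors j.

    Assumed penalty form: it pulls the latent vector of each item toward
    those of its nearest neighbors, weighted by similarity.
    """
    S = np.asarray(S, dtype=float)
    total = 0.0
    for i, neighbors in enumerate(nearest_neighbors(S, k)):
        for j in neighbors:
            total += S[i, j] * np.sum((U[i] - U[j]) ** 2)
    return float(total)
```

The same two functions would apply unchanged to targets by passing a target similarity matrix and the target latent vectors. Restricting the sum to k neighbors keeps the cost linear in the number of items for fixed k, instead of quadratic when every pair is considered.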
