Graph-Powered Machine Learning

Alessandro Negro

  1. 496 pages
  2. English

About This Book

Upgrade your machine learning models with graph-based algorithms, the perfect structure for complex and interlinked data.

Summary
In Graph-Powered Machine Learning, you will learn:
  • The lifecycle of a machine learning project
  • Graphs in big data platforms
  • Data source modeling using graphs
  • Graph-based natural language processing, recommendations, and fraud detection techniques
  • Graph algorithms
  • Working with Neo4J

Graph-Powered Machine Learning teaches you to use graph-based algorithms and data organization strategies to develop superior machine learning applications. You'll dive into the role of graphs in machine learning and big data platforms, and take an in-depth look at data source modeling, algorithm design, recommendations, and fraud detection. Explore end-to-end projects that illustrate architectures and help you optimize with best design practices. Author Alessandro Negro's extensive experience shines through in every chapter, as you learn from examples and concrete scenarios based on his work with real clients!

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Identifying relationships is the foundation of machine learning. By recognizing and analyzing the connections in your data, graph-centric algorithms like K-nearest neighbor or PageRank radically improve the effectiveness of ML applications. Graph-based machine learning techniques offer a powerful new perspective for machine learning in social networking, fraud detection, natural language processing, and recommendation systems.

About the book
Graph-Powered Machine Learning teaches you how to exploit the natural relationships in structured and unstructured datasets using graph-oriented machine learning algorithms and tools. In this authoritative book, you'll master the architectures and design practices of graphs, and avoid common pitfalls. Author Alessandro Negro explores examples from real-world applications that connect GraphML concepts to real-world tasks.

What's inside
  • Graphs in big data platforms
  • Recommendations, natural language processing, fraud detection
  • Graph algorithms
  • Working with the Neo4J graph database

About the reader
For readers comfortable with machine learning basics.

About the author
Alessandro Negro is Chief Scientist at GraphAware. He has been a speaker at many conferences, and holds a PhD in Computer Science.

Table of Contents
PART 1 INTRODUCTION
1 Machine learning and graphs: An introduction
2 Graph data engineering
3 Graphs in machine learning applications
PART 2 RECOMMENDATIONS
4 Content-based recommendations
5 Collaborative filtering
6 Session-based recommendations
7 Context-aware and hybrid recommendations
PART 3 FIGHTING FRAUD
8 Basic approaches to graph-powered fraud detection
9 Proximity-based algorithms
10 Social network analysis against fraud
PART 4 TAMING TEXT WITH GRAPHS
11 Graph-based natural language processing
12 Knowledge graphs

Information

Publisher: Manning
Year: 2021
ISBN: 9781638353935

Part 1 Introduction

We are surrounded by graphs. Facebook, LinkedIn, and Twitter are the most famous examples of social networks—that is, graphs of people. Other types of graphs exist even though we don’t think of them as such: electrical or power networks, the tube, and so on.
Graphs are powerful structures, useful not only for representing connected information but also for supporting multiple types of analysis. Their simple data model, built on two basic concepts (nodes and relationships), is flexible enough to store complex information. If you also store properties on nodes and relationships, it is possible to represent practically anything, at any scale.
Furthermore, in a graph every single node and every single relationship is an access point for analysis, and from any access point it is possible to navigate to the rest of the graph without limit, which provides multiple access patterns and opportunities for analysis.
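The data model described above can be sketched in a few lines of Python. This is only an illustrative toy, not how a graph database such as Neo4j stores data; the class `PropertyGraph`, its methods, and the sample people and company are hypothetical names introduced here. It shows nodes and relationships carrying properties, and a traversal that starts from an arbitrary node to reach the rest of the graph.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Toy in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties dict
        self.adjacent = defaultdict(list)  # node id -> [(rel_type, rel_props, other id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_relationship(self, start, rel_type, end, **props):
        # Stored in both directions so that any node can act as an access point.
        self.adjacent[start].append((rel_type, props, end))
        self.adjacent[end].append((rel_type, props, start))

    def reachable(self, start):
        """Breadth-first traversal: nodes reachable from `start`, in visit order."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            order.append(current)
            for _, _, neighbor in self.adjacent[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return order


graph = PropertyGraph()
graph.add_node("alice", label="Person", name="Alice")
graph.add_node("bob", label="Person", name="Bob")
graph.add_node("acme", label="Company", name="Acme")
graph.add_relationship("alice", "FRIEND_OF", "bob", since=2015)
graph.add_relationship("bob", "WORKS_AT", "acme", role="engineer")

# Starting from the company node, the traversal still reaches both people.
print(graph.reachable("acme"))
```

Storing each relationship in both adjacency lists is a simplification chosen for this sketch; it makes every node a valid starting point at the cost of duplicating the relationship record.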
Machine learning, on the other hand, provides tools and techniques for building representations of reality and making predictions. Recommendation is a good example: the algorithm takes what users have interacted with and predicts what they will be interested in. Fraud detection is another: it takes previous transactions (legitimate or not) and creates a model that can recognize, with good approximation, whether a new transaction is fraudulent.
The performance of machine learning algorithms, in terms of both accuracy and speed, is directly affected by the way in which we represent our training data and store our prediction model. The quality of an algorithm’s predictions is only as good as the quality of the training dataset. Data cleansing and feature selection, among other tasks, are mandatory if we would like to achieve a reasonable level of trust in the prediction. The speed at which the system provides predictions affects the usability of the entire product. Suppose that a recommendation algorithm for an online retailer took 3 minutes to produce recommendations. By that time, the user would be on another page or, worse, on a competitor’s website.
Graphs can support machine learning by doing what they do best: representing data in a way that is easily understandable and easily accessible. Graphs make all the necessary processes faster, more accurate, and much more effective. Moreover, graph algorithms are powerful tools for machine learning practitioners. Graph community detection algorithms can help identify groups of people, PageRank can reveal the most relevant keywords in a text, and so on.
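To make the keyword example concrete, here is a minimal sketch of PageRank applied to a word co-occurrence graph, in the spirit of TextRank-style keyword extraction. It is an illustration under simplifying assumptions, not the approach the book develops later: the functions `cooccurrence_graph` and `pagerank`, the window size, the damping factor of 0.85, and the sample token list are all choices made for this example.

```python
def cooccurrence_graph(words, window=2):
    """Link each word to the words that follow it within `window` positions."""
    neighbors = {}
    for i, word in enumerate(words):
        neighbors.setdefault(word, set())
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors.setdefault(other, set()).add(word)
    return neighbors


def pagerank(neighbors, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}."""
    nodes = list(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(rank[other] / len(neighbors[other]) for other in neighbors[node])
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


tokens = ("graph algorithms support machine learning graph models "
          "represent data graph structures store data").split()
ranks = pagerank(cooccurrence_graph(tokens))
top_keyword = max(ranks, key=ranks.get)
print(top_keyword)  # prints "graph", the best-connected word in this sample
```

With a window of 2, only adjacent words are linked. A real pipeline would normally remove stop words and normalize tokens before building the graph.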
If you didn’t fully understand some of the terms and concepts presented in this introduction, the first part of the book will provide you with all the knowledge you need to move further. It introduces the basic concepts related to graphs and machine learning, both as independent subjects and as a powerful pairing. Let me wish you happy reading!

1 Machine learning and graphs: An introduction

This chapter covers
  • An introduction to machine learning
  • An introduction to graphs
  • The role of graphs in machine learning applications
Machine learning is a core branch of artificial intelligence: it is the field of study in computer science that allows computer programs to learn from data. The term was coined in 1959, when Arthur Samuel, an IBM computer scientist, wrote the first computer program to play checkers [Samuel, 1959]. He had a clear idea in mind:
Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.
Samuel wrote his initial program by assigning a score to each board position based on a fixed formula. This program worked quite well, but in a second approach, he had the program execute thousands of games against itself and used the results to refine the board scoring. Eventually, the program reached the proficiency of a human player, and machine learning took its first steps.
An entity (such as a person, an animal, an algorithm, or a generic computer agent) is learning if, after making observations about the world, it is able to improve its performance on future tasks. In other words, learning is the process of converting experience to expertise or knowledge [Shalev-Shwartz and Ben-David, 2014]. Learning algorithms use training data that represents experience as input and create expertise as output. That output can be a computer program, a complex predictive model, or tuning of internal variables. The definition of performance depends on the specific algorithm or goal to be achieved; in general, we consider it to be the extent to which the prediction matches specific needs.
Let’s describe the learning process with an example. Consider the implementation of a spam filter for emails. A pure programming solution would be to write a program to memorize all the emails labeled as spam by a human user. When a new email arrives, the pseudoagent will search for a similar match in the previous spam emails, and if it finds any matches, the new email will be rerouted to the trash folder. Otherwise, the email will pass through the filter untouched.
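The memorization approach can be sketched as follows. This is a hypothetical illustration: the class `MemorizingFilter` and its method names are invented here, and "similar match" is reduced to an exact match after lowercasing and collapsing whitespace, which is an assumption made to keep the example short.

```python
class MemorizingFilter:
    """Remembers every email a user has labeled as spam and matches new mail against them."""

    def __init__(self):
        self.known_spam = set()

    @staticmethod
    def _normalize(email):
        # Lowercase and collapse runs of whitespace so trivial variations still match.
        return " ".join(email.lower().split())

    def mark_as_spam(self, email):
        self.known_spam.add(self._normalize(email))

    def is_spam(self, email):
        return self._normalize(email) in self.known_spam


spam_filter = MemorizingFilter()
spam_filter.mark_as_spam("Win a FREE prize now")

print(spam_filter.is_spam("win a free   prize now"))       # True: already memorized
print(spam_filter.is_spam("Claim your free prize today"))  # False: never seen before
```

The second call shows the limitation: a message with the same intent but different wording passes through, because nothing beyond the stored examples has been learned.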
This approach could work and, in some scenarios, be useful. Yet it is not a learning process because it lacks an important aspect of learning: the ability to generalize, to transform the individual examples into a broader model. In this specific use case, it means the ability to label unseen emails even though they are dissimilar to previously labeled emails. This process is also referred to as inductive reasoning or inductive inference. To generalize, the algorithm should scan the training data and extract a set of words whose appearance in an email message is indicative of spam. Then, for a new email, the agent would check whether one or more of the suspicious words appear and predict its label accordingly.
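A minimal sketch of this generalization step is shown below. The selection rule is an assumption chosen for simplicity (a word counts as suspicious if it occurs in at least two spam emails and in no legitimate email); the function names, the threshold `min_spam_count`, and the tiny training set are hypothetical. Real spam filters use statistical models rather than a hard rule like this one.

```python
from collections import Counter


def tokenize(text):
    """Lowercased set of whitespace-separated words in the text."""
    return set(text.lower().split())


def learn_spam_words(labeled_emails, min_spam_count=2):
    """Extract words seen in at least `min_spam_count` spam emails and in no legitimate one."""
    spam_counts, legit_words = Counter(), set()
    for text, is_spam in labeled_emails:
        if is_spam:
            spam_counts.update(tokenize(text))
        else:
            legit_words |= tokenize(text)
    return {word for word, count in spam_counts.items()
            if count >= min_spam_count and word not in legit_words}


def predict_spam(text, spam_words):
    """Label an email as spam if it contains any of the learned suspicious words."""
    return bool(tokenize(text) & spam_words)


training = [
    ("win a free prize now", True),
    ("free prize waiting for you", True),
    ("claim your free lottery prize", True),
    ("meeting notes for you", False),
    ("are you free for lunch tomorrow", False),
]
spam_words = learn_spam_words(training)
print(spam_words)  # {'prize'}

# An email that appears nowhere in the training data is still labeled correctly.
print(predict_spam("your prize is ready", spam_words))         # True
print(predict_spam("are you free for a meeting", spam_words))  # False
```

Note that "free" appears in all three spam examples yet is not selected, because it also appears in a legitimate email. The model is derived from the data rather than from the developer's guesses.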
If you are an experienced developer, you might be wondering, “Why should I write a program that learns how to program itself, when I can instruct the computer to carry out the task at hand?” Taking the example of the spam filter, it is possible to write a program that checks for the occurrence of some words and classifies an email as spam if those words are present. But this approach has three primary disadvantages:
  • A developer cannot anticipate all possible situations. In the spam-filter use case, all the words that might be used in a spam email cannot be predicted up front.
  • A developer cannot anticipate all changes over time. In spam emails, new words can be used, or techniques can be adopted to avoid easy recognition, such as adding hyphens or spaces between characters.
  • Sometimes, a developer cannot write a program to accomplish the task. Recognizing the face of a friend is a simple task for a human, for example, yet it is practically impossible to write explicit rules that accomplish this task without the use of machine learning.
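The first two disadvantages can be illustrated with a short sketch of a hand-written filter. Everything here is hypothetical: the word list `BLOCKED_WORDS`, the function names, and the sample messages are invented for the example. It shows a fixed rule being evaded by hyphens, a manual patch for that case, and the patch being evaded in turn by spaces.

```python
import re

# Hypothetical list picked by hand; the developer must guess these words up front.
BLOCKED_WORDS = {"lottery", "prize"}


def rule_based_is_spam(text):
    """Flag an email if any whitespace-separated word is on the fixed list."""
    return any(word in BLOCKED_WORDS for word in text.lower().split())


def patched_is_spam(text):
    """Same rule, manually patched to ignore hyphens, underscores, and dots."""
    cleaned = re.sub(r"[-_.]", "", text.lower())
    return any(word in BLOCKED_WORDS for word in cleaned.split())


print(rule_based_is_spam("You won the lottery"))        # True
print(rule_based_is_spam("You won the l-o-t-t-e-r-y"))  # False: hyphens evade the rule
print(patched_is_spam("You won the l-o-t-t-e-r-y"))     # True: the patch handles hyphens
print(patched_is_spam("You won the l o t t e r y"))     # False: spaces evade the patch
```

Each new evasion technique requires another manual patch, which is the maintenance burden the list above describes.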
Therefore, when you face new problems or tasks that you would like to solve with a computer program, the following questions can help you decide whether to use machine learning:
  • Is the specific task too complex to be programmed?
  • Does the task require any sort of adaptivity throughout its life?
A crucial aspect of any machine learning task is the training data on which the knowledge is built. Starting from the wrong data leads to the wrong results, regardless of the potential performance or the quality of the learning algorithm used.
The aim of this book is to help data scientists and data engineers approach the machine learning process from two sides: the learning algorithm and the data. In both perspectives, we will use the graph (let me introduce it now as a set of nodes and relationships connec...
