eBook - ePub

Deep Learning for Search

Name: Deep Learning for Search
Author: Tommaso Teofili

328 pages
English

Summary

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on!

Foreword by Chris Mattmann.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how.

About the Book

Deep Learning for Search teaches you to improve your search results with neural networks. You'll review how DL relates to search basics like indexing and ranking. Then, you'll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you'll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn!

What's inside

Accurate and relevant rankings
Searching across languages
Content-based image search
Search with recommendations

About the Reader

For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed.

About the Author

Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from topics like information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA).

He currently works at Adobe, developing search and indexing infrastructure components, and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including Berlin Buzzwords, International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili.

Table of Contents

PART 1 - SEARCH MEETS DEEP LEARNING

Neural search
Generating synonyms

PART 2 - THROWING NEURAL NETS AT A SEARCH ENGINE

From plain retrieval to text generation
More-sensitive query suggestions
Ranking search results with word embeddings
Document embeddings for rankings and recommendations

PART 3 - ONE STEP BEYOND

Searching across languages
Content-based image search
A peek at performance

Information

Publisher: Manning
Year: 2019
ISBN: 9781638356271
Topic: Computer Science
Category: Neural Networks

Part 1. Search meets deep learning

Setting up search engines to effectively react to users’ needs isn’t an easy task. Traditionally, many manual tweaks and adjustments had to be made to a search engine’s internals to get it to work decently on a real collection of data. On the other hand, deep neural networks are very good at learning useful information about vast amounts of data. In this first part of the book, we’ll start looking into how a search engine can be used in conjunction with a neural network to get around some common limitations and provide users with a better search experience.

Chapter 1. Neural search

This chapter covers

A gentle introduction to search fundamentals
Important problems in search
Why neural networks can help search engines be more effective

Suppose you want to learn something about the latest research breakthroughs in artificial intelligence. What will you do to find information? How much time and work does it take to get the facts you’re looking for? If you’re in a (huge) library, you can ask the librarian what books are available on the topic, and they will probably point you to a few they know about. Ideally, the librarian will suggest particular chapters to browse in each book.

That sounds easy enough. But the librarian generally comes from a different context than you do, meaning you and the librarian may have different opinions about what’s significant. The library could have books in various languages, or the librarian might speak a different language. Their information about the topic could be outdated, given that “latest” is a fairly relative point in time, and you don’t know when the librarian last read anything about artificial intelligence, or if the library regularly receives publications in the field. Additionally, the librarian may not understand your inquiry properly. The librarian may think you’re talking about intelligence from the psychology perspective,[1] requiring a few iterations back and forth before you understand one another and get to the pieces of information you need.

[1] This happened to me in real life.

Then, after all this, you might discover the library doesn’t have the book you need; or the information may be spread among several books, and you have to read them all. Exhausting!

Unless you’re a librarian yourself, this is what often happens nowadays when you search for something on the internet. Although we can think of the internet as a single huge library, there are many different librarians out there to help you find the information you need: search engines. Some search engines are experts in certain topics; others know only a subset of a library, or only a single book.

Now imagine that someone, let’s call him Robbie, who already knows about both the library and its visitors, can help you communicate with the librarian in order to better find what you’re looking for. That will help you get your answers more quickly. Robbie can help the librarian understand a visitor’s inquiry by providing additional context, for example. Robbie knows what the visitor usually reads about, so he skips all the books about psychology. Also, having read a lot of the books in the library, Robbie has better insight into what’s important in the field of artificial intelligence. It would be extremely helpful to have advisors like Robbie to help search engines work better and faster, and help users get more useful information.

This book is about using techniques from a machine learning field called deep learning (DL) to build models and algorithms that can influence the behavior of search engines, to make them more effective. Deep learning algorithms will play the role of Robbie, helping the search engine to provide a better search experience and to deliver more precise answers to end users.

One important thing to note is that DL isn’t the same as artificial intelligence (AI). As you can see in figure 1.1, AI is a huge research field; machine learning is only part of it, and DL, in turn, is a sub-area of machine learning. Basically, DL studies how to make machines “learn” using the deep neural network computing model.

Figure 1.1. Artificial intelligence, machine learning, and deep learning

1.1. Neural networks and deep learning

The goal of this book is to enable you to use deep learning in the context of search engines, to improve the search experience and results. Even if you’re not going to build the next Google search, you should be able to learn enough to use DL techniques within small or medium-sized search engines to provide a better experience to users. Neural search should help you automate work that you’d otherwise have to perform manually. For example, you’ll learn how to automate extraction of synonyms from search engine data, avoiding manual editing of synonym files (chapter 2). This saves time while improving search effectiveness, regardless of the specific use case or domain. The same is true for having good related-content suggestions (chapter 6). In many cases, users are satisfied with a combination of plain search together with the ability to navigate related content. We’ll also cover some more-specific use cases, such as searching content in multiple languages (chapter 7) and searching for images (chapter 8).

The only requirement for the techniques we’ll discuss is that you have enough data to feed into the neural networks. It’s difficult to define the boundaries of “enough data” in a generic way, so table 1.1 instead summarizes the minimum number of documents (text, images, and so on) generally needed for each of the problems addressed in the book.

Table 1.1. Per-task requirements for neural search techniques

Task	Minimum number of docs (range)	Chapter
Learning word representations	1,000–10,000	2, 5
Text generation	10,000–100,000	3, 4
Learning document representations	1,000–10,000	6
Machine translation	10,000–100,000	7
Learning image representations	10,000–100,000	8

Note that table 1.1 isn’t meant to be strictly adhered to; these are numbers drawn from experience. For example, even if a search engine counts fewer than 10,000 documents, you can still try to implement the neural machine translation techniques from chapter 7; but you should take into account that it may be harder to get high-quality results (for example, perfect translations).
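As a rough illustration of how table 1.1 might be used when sizing up a collection, the following sketch encodes the ranges from the table and labels each task according to where a given document count falls. The names `TABLE_1_1` and `assess_corpus`, and the three labels, are hypothetical and chosen only for this example; the ranges are rules of thumb, so the result is a hint rather than a verdict.

```python
# Ranges of the "minimum number of docs" column in table 1.1.
# The dictionary name and the labels below are illustrative assumptions.
TABLE_1_1 = {
    "Learning word representations": (1_000, 10_000),
    "Text generation": (10_000, 100_000),
    "Learning document representations": (1_000, 10_000),
    "Machine translation": (10_000, 100_000),
    "Learning image representations": (10_000, 100_000),
}


def assess_corpus(num_docs):
    """Label each task by where num_docs falls relative to its range."""
    labels = {}
    for task, (low, high) in TABLE_1_1.items():
        if num_docs < low:
            labels[task] = "below range"   # still possible, but quality may suffer
        elif num_docs <= high:
            labels[task] = "within range"
        else:
            labels[task] = "above range"
    return labels


if __name__ == "__main__":
    for task, label in assess_corpus(5_000).items():
        print(f"{task}: {label}")
```

A task labeled "below range" isn't ruled out; it only signals that high-quality results may be harder to reach with that amount of data.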

As you read the book, you’ll learn a lot about DL as well as all the search fundamentals required to implement these DL principles in a search engine. So if you’re a search engineer or a programmer who wants to learn neural search, this book is for you.

You aren’t expected to know what DL is or how it works at this point. You’ll get to know more as we look at some specific algorithms one by one, when they become useful for solving particular types of search problems. For now, I’ll start you off with some basic definitions.

Deep learning is a field of machine learning where computers are capable of learning to represent and recognize things incrementally, by using deep neural networks. Deep artificial neural networks are a computational paradigm originally inspired by the way the brain is organized into graphs of neurons (although the brain is much more complex than an artificial neural network).

Usually, information flows into neurons in an input layer, then through a network of hidden neurons (forming one or more hidden layers), and then out through neurons in an output layer. Neural networks can also be thought of as black boxes: smart functions that can transform inputs into outputs, based on what each network has been trained for. A common neural network has at least one input layer, one hidden layer, and one output layer. When a network has more than one hidden layer, we call the network deep. In figure 1.2, you can see a deep neural network with two hidden layers.

Figure 1.2. A deep neural network with two hidden layers
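The flow of information described above (input layer, hidden layers, output layer) can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions: the layer sizes (3, 4, 4, 2), the sigmoid activation, and the random, untrained weights are all chosen for illustration and aren't taken from figure 1.2, so the output values carry no meaning. The point is just to show inputs passing through two hidden layers and coming out of an output layer.

```python
import math
import random


def sigmoid(x):
    """Squash a number into the (0, 1) interval (activation chosen as an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_neurons, rng):
    """Create one fully connected layer: one weight row and one bias per neuron."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def build_network(layer_sizes, seed=42):
    """Build the layers that connect consecutive sizes, e.g. [3, 4, 4, 2]."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def forward(network, inputs):
    """Propagate the inputs through every layer and return the output activations."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


if __name__ == "__main__":
    # 3 input neurons, two hidden layers of 4 neurons each, 2 output neurons.
    network = build_network([3, 4, 4, 2])
    print(forward(network, [0.5, -0.2, 0.1]))
```

With only one hidden layer (for example, sizes `[3, 4, 2]`), the same code would describe a network that isn't deep by the definition given above.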

Before going into more detail about neural networks, let’s take a step back. I said deep learning is a subfield of machine learning, which is part of the broader area of artificial intelligence. But what is machine learning?

1.2. What is machine learning?

An overview of basic machine learning concepts is useful here before diving into DL and search specifics. Many of the concepts that apply to learning with artificial neural networks, such as supervised and unsupervised learning, training, and predicting, come from machine le...