Information retrieval (IR) is the process of retrieving information that satisfies users' needs from massive unstructured datasets (such as natural language texts). IR is an important tool that helps users rapidly and effectively derive useful information from massive data. With the drastic increase in data size and the growing demands users place on search services, IR has evolved from a tool designed only for libraries into a network service indispensable in life, work, and study. In addition to search systems, represented by the popular Google search engine, other common forms of IR systems include classification systems, recommendation systems, and Q&A systems.
With the rapid popularization and continual development of social networking services (SNS), IR has gained new resources and opportunities but also faces new problems and challenges. Acquiring information from emerging resources such as social networks has gradually drawn attention from both industry and academia. Compared with traditional webpages, social network texts have distinct characteristics, such as limits on text length, special forms of expression (such as hashtags in microblogs), and social relations between authors. These differences make it inappropriate to apply traditional IR technologies directly to an SNS environment. Social network-oriented IR technology still faces many problems and difficulties, and research in this field is of great academic significance and application value.
This chapter introduces IR for social networks. It presents the challenges that IR technology faces when applied to the new resources of social networks and introduces some possible solutions to these problems. Concretely, the three most representative IR applications (search, classification, and recommendation) are discussed. The chapter is arranged as follows: Section 1.1 is the Introduction, which presents the concepts used throughout the chapter along with the challenges facing social network-oriented IR technology; Sections 1.2, 1.3, and 1.4 introduce the basic methods for content search, content classification, and recommendation in social networks, respectively, together with the current state of research in these areas; Section 1.5 summarizes the chapter and discusses future prospects. In Sections 1.2 and 1.3, relevant research is introduced based on data from microblogs, one of the most representative SNSs, while Section 1.4 focuses on social networks developed from traditional e-commerce websites that carry social networking information, where commodities are recommended by integrating such online social networking information.
1.1 Introduction
IR is the process of retrieving information (generally documents) that satisfies users' information needs from massive unstructured datasets (generally texts, mostly stored in computers) [1].
Unstructured data refers to data without obvious structural markers, in contrast to the structured data held in traditional databases. Natural language texts are the most common form of unstructured data. A user's information need refers to the information theme that the user wants to find, while a query is what the user submits to the retrieval system to represent that information need. A query is usually a piece of text, such as one or several keywords in a text search engine, but it can also be an object in another form, such as sample images in an image search engine.
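The relationship between a query and the unstructured texts being searched can be illustrated with a minimal sketch. The example below is an assumption for illustration only, not a method described in this chapter: it ranks short texts by how often the query keywords occur in them. The names `tokenize` and `search` and the sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9#]+", text.lower())


def search(query, documents):
    """Return (document index, score) pairs for documents matching the query.

    The score is the total number of occurrences of the query keywords in
    the document. This is a deliberately naive stand-in for a real ranking
    function.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in enumerate(documents):
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical short texts standing in for unstructured data.
documents = [
    "New phone released today, great camera",
    "Traffic jam downtown again",
    "Camera review: the new phone camera is excellent",
]

# The query is a short piece of text expressing the information need.
results = search("phone camera", documents)
print(results)
```

Here the keywords stand in for the user's information need. A practical retrieval system would replace the raw keyword count with a more careful relevance model.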
The dataset being searched is referred to as a corpus or a collection. Each record in a collection...