Opinion Analysis for Online Reviews
eBook - ePub

Yuming Lin, Xiaoling Wang, Aoying Zhou

  1. 128 pages
  2. English

About This Book

This book provides a comprehensive introduction to opinion analysis for online reviews. It covers the newest research on opinion mining, including theories, algorithms and datasets. A new feature presentation method for sentiment classification is highlighted. Then, a three-phase framework for sentiment classification is proposed, in which a set of sentiment classifiers is selected automatically to make predictions, and these predictions are integrated via ensemble learning. Finally, to address the combinatorial explosion encountered, a greedy algorithm is devised to select the base classifiers.

Chapter 1

Introduction

With the development and popularization of Web 2.0, the ways in which network users communicate have changed dramatically. More and more users browse, post and repost messages on social network platforms to share opinions and experiences with others. At the same time, the development of social network techniques and the transformation of user communication have had a great influence on e-commerce. An open network environment provides broad scope and a convenient channel for many practical applications, such as e-commerce, e-government and ad serving.
On the other hand, the rich opinions in online reviews serve as a valuable reference for government departments, manufacturers, merchants and customers. By reading reviews, consumers can learn about the features of products, and providers can learn what consumers care about in their products. According to the survey¹ released by Cone in 2011, before deciding whether to purchase recommended products or services, 81% of consumers go online to verify those recommendations, specifically by researching product/service information (61%), reading user reviews (55%) or searching rating websites (43%). About four out of five consumers have changed their purchase intention based solely on negative information, and positive information has a similar influence on decision making. Online reviews are therefore very important to a product or service on the Web, which creates an urgent demand for automatically identifying the opinions that users express in reviews.

1.1 The research framework on opinion analysis

Opinion analysis, also called opinion mining or sentiment analysis [Pang and Lee (2008)], involves the analysis, processing, induction and deduction of text. In this book, we use these terms more or less interchangeably. Opinion analysis has received increasing attention from industry and academia, and it involves many challenging tasks such as review quality evaluation, opinion information extraction, opinion identification, opinion retrieval and opinion summarization. Figure 1.1 shows the research framework on opinion analysis.
Fig. 1.1 The research framework on opinion analysis
As one of the main vehicles of sentiment expression, text from different sources varies considerably in quality. For example, reviews from authoritative sites tend to be of high quality, whereas those from public review sites may suffer from quality problems and may even include malicious reviews that deviate from the facts. Such malicious reviews are called review spam [Jindal and Liu (2008)], and they should be filtered out before the opinions expressed in reviews are analyzed. Techniques for review spam detection aim to identify review spam [Jindal and Liu (2008); Li et al. (2011); Ott et al. (2011); Xie et al. (2012); Jindal et al. (2010)], review spammers [Lee et al. (2010); Lim et al. (2010)], and suspicious reviewer groups [Mukherjee et al. (2011, 2012)]. To reduce its negative impact, review spam should be detected as early as possible. Review quality can be evaluated by a review’s readability, product feature coverage, product relevance, the reviewer’s expertise and so on [Kim and Movy (2006); Lu et al. (2010); Liu et al. (2008); P O’Mahony and Smyth (2009)]. It is worth noting that low-quality reviews are not the same as review spam. A low-quality review may result from the reviewer’s experience, education, etc. Such opinion information should be handled according to the application’s requirements rather than filtered out directly.
Not all text content contains opinions. Opinion information extraction focuses on identifying and extracting the subjective content of text. A basic task here is to determine whether a document contains opinions, or to identify which parts of a document contain opinions. After analyzing several research projects [Andreevskaia and Bergler (2006); Esuli and Sebastiani (2006a); Takamura et al. (2006)], Mihalcea and colleagues reported that distinguishing subjective content from objective content in documents is more difficult than identifying opinion types [Mihalcea et al. (2007)]. Besides documents, some previous work on subjectivity detection has focused on the finer granularities of sentences and entities [Hatzivassiloglou and Wiebe (2000, 2005); Pang and Lee (2004); Hatzivassiloglou and Wiebe (2004)]. Opinion entities can be opinion holders, opinion targets, and opinion words. The opinion holder indicates who expresses the opinion, and extracting it is also an important task. In general, an opinion holder is a named entity such as a person or an organization, so opinion holders can be extracted with Named Entity Recognition (NER) techniques. The opinion target is the object on which an opinion is expressed. For example, product attributes are typically the opinion targets in product reviews. Opinion words are the words and phrases that users use to express their opinions, and they are often adjectives and verbs.
Opinion identification determines which type of opinion is expressed in a piece of opinion text, and it can be treated as a classification problem. It can be carried out at different granularities. At the document level, a document is classified into one of the predefined categories according to its overall sentiment orientation. Most work on opinion classification focuses on two opposite categories (positive, negative) [Turney (2002); Pang et al. (2002); Lin et al. (2012c); Tan et al. (2011); Pan et al. (2010)] or three categories (positive, neutral, negative) [Feng et al. (2011); Barbosa and Feng (2010, 2011); Wilson et al. (2005)]. Of course, user opinions can be divided into more types by sentiment intensity, such as positive, weak positive, neutral, weak negative, and negative. However, as the number of types increases, it becomes difficult for users to determine which category a document really belongs to. In some cases, the opinion targets need to be refined. For example, in a review of a cellphone, a user can comment on its appearance, screen, and power consumption.
Traditional text retrieval techniques focus on string similarity. As more opinion-rich user-generated content appears on the Web, retrieval techniques that take opinions into account have become a basic requirement. Opinion retrieval searches for documents that satisfy both topic relevance and opinion relevance. Such systems first find the documents related to the query topic. Then the opinions expressed on that topic are identified. Finally, the retrieved documents are sorted according to their opinion scores and relevance scores. Opinion retrieval thus combines traditional text retrieval techniques with opinion analysis techniques, and it is one of the basic functions of new search engines.
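The three retrieval steps described above (topic filtering, opinion scoring, combined ranking) can be sketched as follows. This is only an illustration under stated assumptions: the `Doc` class, the `min_relevance` threshold, and the linear weighting of the two scores are hypothetical choices, not a method given in the text, and both scores are assumed to be precomputed values in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevance: float  # topic relevance in [0, 1]; assumed to come from any text retrieval model
    opinion: float    # opinion score in [0, 1]; assumed to come from any opinion classifier


def rank_by_opinion_and_relevance(docs, min_relevance=0.2, weight=0.5):
    """Keep topically relevant documents, then sort by a combined score.

    The linear interpolation below is an illustrative assumption; other
    combination schemes are equally possible.
    """
    # Step 1: keep only the documents related to the query topic.
    candidates = [d for d in docs if d.relevance >= min_relevance]

    # Steps 2 and 3: combine the opinion score with the relevance score and sort.
    def combined(d):
        return weight * d.relevance + (1 - weight) * d.opinion

    return sorted(candidates, key=combined, reverse=True)


docs = [Doc("a", 0.9, 0.1), Doc("b", 0.6, 0.8), Doc("c", 0.1, 0.9)]
ranked_ids = [d.doc_id for d in rank_by_opinion_and_relevance(docs)]
print(ranked_ids)
```

In this toy input, document "c" is strongly opinionated but is dropped in the first step because it is not related to the topic, which reflects the requirement that both kinds of relevance be met.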
The results of opinion analysis need to be shown to users. If there is too much opinion information, users need a great deal of time to process it. For instance, a user often cannot read all the reviews of a popular product, since many users comment on it. Opinion summarization automatically condenses the opinion information contained in reviews and presents the information in the original review set in a concise form, so that users can conveniently grasp the dominant information. Opinion classification can be treated as coarse-grained opinion summarization, since it determines the overall opinion of a document. However, potential consumers would prefer to know about the main features of a product. Opinion summarization over product attributes would therefore meet users’ requirements better.
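A minimal sketch of summarization over product attributes is given below. It assumes that (attribute, polarity) pairs have already been extracted from the reviews by some earlier step; the function name and the sample data are hypothetical and serve only to illustrate the idea of aggregating opinions per attribute.

```python
from collections import Counter, defaultdict


def summarize_by_attribute(opinions):
    """Count how often each polarity is expressed on each product attribute.

    `opinions` is an iterable of (attribute, polarity) pairs, assumed to be
    the output of an upstream opinion extraction step.
    """
    summary = defaultdict(Counter)
    for attribute, polarity in opinions:
        summary[attribute][polarity] += 1
    # Convert to plain dictionaries for easy display.
    return {attribute: dict(counts) for attribute, counts in summary.items()}


# Hypothetical pairs extracted from cellphone reviews.
opinions = [
    ("screen", "positive"),
    ("screen", "positive"),
    ("power consumption", "negative"),
    ("screen", "negative"),
]
summary = summarize_by_attribute(opinions)
for attribute, counts in summary.items():
    print(attribute, counts)
```

A per-attribute count like this lets a reader see at a glance which features are praised or criticized, without reading the whole review set.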
Opinion analysis covers the whole process from collecting data to presenting the analysis results. It draws on techniques from information retrieval, machine learning, data mining, and natural language processing. Opinion analysis is therefore an interdisciplinary field of study.

1.2 The application prospects and challenges

Online reviews are highly varied and massive in scale, and thus provide rich data for opinion analysis. Opinion analysis can therefore play an important role in many applications, and it has great potential and long-term prospects.
(1) Business decision and review analysis. Business intelligence aims to help enterprises make decisions by integrating, presenting, and analyzing relevant data. Conventional survey methods suffer from heavy workloads and difficulty in collecting information. However, there are many reviews of products, and they express a wealth of useful information such as the experiences and opinions of users. Manufacturers can use this information to make decisions and improve product quality, and so raise customer satisfaction. Opinion analysis has already been applied in practice. For example, the OpinionFinder system [Wilson et al. (2005)] can automatically identify subjective sentences and extract the sentiment information they contain. Yao et al. have developed an opinion analysis system to mine and summarize Chinese automobile reviews of various auto brands [Yao et al. (2006)].
(2) Retrieval service based on users’ opinions. Many of the retrieval results that users want are related to opinions, such as “Which film is the most interesting one this year?” and “Which one is the best place in Shanghai?” For such queries, search engines need not only to retrieve the content-relevant pages, but also to identify the opinions expressed by the opinion holders. Such queries also appear in Question Answering systems, where opinion analysis is essential [Lita et al. (2005); Somasundaran et al. (2007); Stoyanov et al. (2005)]. For questions about definitions, answers that include others’ opinions on the definitions are easier for users to understand [Lita et al. (2005)].
(3) Government intelligence. Social networking sites, such as online forums and microblogs, are important ways for users to obtain news and express their opinions, and are also an important platform for government agencies to obtain public opinions. Detecting public opinions on particular events and policies quickly helps a government to respond faster, to devise effective solutions, to limit the scale of negative effects, to improve its reputation, and to maintain social stability. In addition, some political events can be predicted by analyzing the opinions found in Web data. For example, Kim and Hovy successfully predicted the result of the US presidential election by analyzing Web news reviews on the election [Kim and Hovy (2007)].
(4) Web advertising. The huge number of network users provides a new opportunity for advertising. Traditional advertising shows the same advertisement to all users, which makes it ineffective. However, Web users’ profiles and requirements can be predicted from their browsing history, posted content, and so on. This makes targeted delivery possible, that is, providing personalized advertisements to users. The basis of such applications is identifying users’ preferences.
(5) Precise recommendation service. The content that a user browses or posts reflects what the user cares about and prefers, so a user profile can be built from this information. Recommending appropriate products, news and services to users on the basis of this profile can be highly effective.
(6) E-audiobooks. Many books and novels are rich in sentiment. If the sentiment expressed in each sentence can be identified correctly, so that the tone and speed of the reading can be adjusted accordingly, the text becomes more expressive and engaging and resonates more with the audience.
Some Web sites provide opinion analysis functions, as shown in Table 1.1. The second column indicates the languages supported by the system. In the third column, each capital letter indicates a granularity. The fourth column indicates which categories the system supports. For example, the “3, [−1,1]” in the second row means that AlchemyAPI provides two forms of opinion category. The first form is three categories (positive, neutral, negative). The second form is a real number between −1 and 1.
Table 1.1 Some Web sites providing opinion analysis functions
a en = English, fr = French, de = German, es = Spanish, ar = Arabic, pt = Portuguese, it = Italian, zh = Chinese, ru = Russian
b D = Document, S = Sentence, E = Entity
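The two output forms mentioned above, a real number in [−1, 1] and three discrete categories, can be related by a simple thresholding step. The sketch below is an assumption for illustration only: the function name and the `neutral_band` width are hypothetical, and it does not describe how AlchemyAPI or any other listed service actually derives its categories.

```python
def to_three_categories(score, neutral_band=0.1):
    """Map a sentiment score in [-1, 1] to positive, neutral, or negative.

    Scores whose absolute value is at most `neutral_band` are treated as
    neutral. The band width is an illustrative assumption.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


labels = [to_three_categories(s) for s in (0.8, 0.05, -0.4)]
print(labels)
```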
Although there is a large body of work on opinion analysis, many areas remain to be explored. At the same time, the development of Web applications brings new opportunities and challenges to this field. Overall, the main challenges can be summarized as follows.
(1) The complexity of sentiment expression. Sentiment is often expressed in a subtle manner, which makes it difficult to identify from the terms of a sentence or document when they are considered in isolation [Pang and Lee (2008)]. Identifying such opinions is very difficult. For example, a movie review like “I should stay at home.” could express a negative sentiment, even though it does not contain any negative words.
(2) The dependency of opinion expression. How users express opinions is influenced by their education, cultural background, and experiences. Different users can read different opinions into the same sentence. Even for the same sentence written by one user, the opinion can depend on the context or on the user’s feeling at the moment. A word can also express different sentiment in different domains. For instance, “predictable” expresses a positive opinion for electronics, while it is not desirable for a novel.
(3) Ground truth is difficult to acquire. Labeling samples is a difficult task in opinion analysis, since it is subjective. Different readers can understand the same review inconsistently. It is therefore hard to obtain the ground truth needed to verify the effectiveness of proposed identification techniques.
(4) High noise and inconsistent text formats. Social network platforms are now the main source of data on user opinions. The openness of such platforms makes text formats inconsistent. At the same time, such data is very noisy, since it is generated by users. How to integrate the data from different sources and reduce the noise, is an i...
