Social Media Data Mining and Analytics

Gabor Szabo, Gungor Polatkan, P. Oscar Boykin, Antonios Chalkiopoulos

About This Book

Harness the power of social media to predict customer behavior and improve sales

Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.

Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, it provides hands-on examples of applying state-of-the-art tools and technologies to mine social media, drawing on Twitter, Wikipedia, Stack Exchange, LiveJournal, movie reviews, and other rich data sources. In it, you will learn:

  • The four key characteristics of online services: users, social networks, actions, and content
  • The full data discovery lifecycle: data extraction, storage, analysis, and visualization
  • How to work with code and extract data to create solutions
  • How to use Big Data to make accurate customer predictions
  • How to personalize the social media experience using machine learning

Applying the techniques the authors detail gives organizations the competitive advantage they need to harness the rich data available from social media platforms.

Users: The Who of Social Media

Social media revolves around users and their activities and interactions. Users create the content, communicate with each other, and ultimately keep the service alive and growing. This chapter looks at typical user behavior on social media services and the universal similarities you can see across different services.
First, we focus on the most basic questions about the overall activity of the people using a service: Are there regularities in their aggregate statistics? If regularities exist in one service, can they be generalized to other systems? We show how a few very basic conditions affecting usage give rise to the activity distributions we measure, and we use the observed regularities to quantify how users differ in overall activity. Because these distributions have a specific analytical form, we also discuss why averages are hard to compute and interpret in real social media systems.
Throughout, we support our conclusions with data collected from Wikipedia and Twitter.
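To give a feel for why averages can mislead, here is a minimal Python sketch. It assumes, purely for illustration, that activity follows a heavy-tailed Pareto distribution; the shape parameter `alpha=1.5` and the helper names are hypothetical choices, not taken from the book. With `alpha=1.5` the theoretical mean is finite (3.0) but the variance is infinite, and the sketch compares its running mean with that of exponential samples that have the same mean.

```python
import random


def running_means(samples):
    """Return the running (cumulative) mean after each sample."""
    means = []
    total = 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means


def draw_pareto(alpha, n, seed=0):
    """Draw n Pareto values with minimum 1 and shape alpha.

    Illustrative assumption: activity has a heavy tail like this one.
    Theoretical mean is alpha / (alpha - 1) for alpha > 1 (infinite for
    alpha <= 1); the variance is infinite for alpha <= 2.
    """
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


def draw_exponential(mean, n, seed=0):
    """Light-tailed comparison: exponential values with the given mean."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


if __name__ == "__main__":
    n = 100_000
    heavy = running_means(draw_pareto(alpha=1.5, n=n))  # true mean 3.0
    light = running_means(draw_exponential(mean=3.0, n=n))
    for k in (100, 1_000, 10_000, 100_000):
        print(f"n={k:>6}  pareto={heavy[k - 1]:.3f}  exponential={light[k - 1]:.3f}")
```

In runs like this, the exponential running mean typically settles near 3.0 quickly, while the Pareto running mean tends to drift and jump whenever a rare, very large value arrives; the exact numbers depend on the seed.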

Measuring Variations in User Behavior in Wikipedia

One of the most important questions about user activity is: How much do users contribute to, or use, the service? You can approach this question from many points of view, but one of the most straightforward ways to characterize users is by how frequently they return to and are present on the service. You can certainly expect some users to be more “active” than others, but how exactly do you quantify user activity in relation to the service?
User activity can be characterized in the most obvious manner by how many times a user performed a certain action such as leaving a comment, sharing a picture, creating or removing social network connections, and so on—in other words, using any facility that the service provides to its users. To determine this, the first thing to do is to define the time period for collecting the data needed to make the measurements.
Figure 1.1 shows two possible scenarios for choosing the periods from which we collect user activity data. In scenario (a), we chose more or less random, non‐consecutive periods for the data collection. Although this choice may be valid under specific requirements, we generally prefer consecutive, closed time ranges for data collection, like the one in case (b). General user behavior may change over time (for instance, new users might have different characteristics than older ones), so we prefer to sample user activity within as short a time range as possible. For this reason, case (b) is the natural choice: we select a continuous time interval and count the number of times each user has been active within it. This count is the user's frequency of usage in the given time window.
Figure 1.1: Possible choices for sampling time windows to measure aggregate user activity. In scenario (a), we pick non‐consecutive time windows randomly. In (b), we choose a continuous time window between two given points in time.
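The scenario (b) measurement can be sketched in a few lines of Python. This assumes, for illustration, that the raw log has already been reduced to `(user, timestamp)` pairs, one per action; the function name and sample data are hypothetical. The window is half-open, `[start, end)`, so adjacent windows never count the same action twice.

```python
from collections import Counter
from datetime import datetime


def activity_in_window(events, start, end):
    """Count actions per user whose timestamp falls in [start, end).

    events: iterable of (user_id, datetime) pairs, one per action
    (comment, share, edit, ...). Users with no actions inside the
    window do not appear in the result.
    """
    counts = Counter()
    for user, ts in events:
        if start <= ts < end:
            counts[user] += 1
    return counts


# Hypothetical sample log.
events = [
    ("alice", datetime(2013, 1, 3, 9, 0)),
    ("bob", datetime(2013, 1, 5, 14, 30)),
    ("alice", datetime(2013, 1, 20, 8, 15)),
    ("carol", datetime(2013, 2, 2, 11, 0)),  # outside the January window
]
window = activity_in_window(events, datetime(2013, 1, 1), datetime(2013, 2, 1))
print(dict(window))  # {'alice': 2, 'bob': 1}
```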

The Diversity of User Activities

We can reasonably assume that users will differ in how likely they are to use a service: some will be very active, whereas others will use the service only once in a while. How large are these differences, and how can we characterize them? These are the questions you look at in this section.
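Once per-user counts for a window are available, one way to see how large the differences are is to tabulate how many users made exactly k actions, and to check what share of all activity comes from the most active users. The Python sketch below is illustrative only: the helper names and toy counts are hypothetical, not data from the book. Users who were inactive in the window are absent from the counts, so nobody is represented at k = 0.

```python
from collections import Counter


def activity_distribution(user_counts):
    """Return {k: number of users with exactly k actions}, sorted by k."""
    return dict(sorted(Counter(user_counts.values()).items()))


def top_share(user_counts, fraction):
    """Share of all actions made by the most active `fraction` of users."""
    values = sorted(user_counts.values(), reverse=True)
    top_n = max(1, int(len(values) * fraction))  # at least one user
    return sum(values[:top_n]) / sum(values)


# Hypothetical per-user counts for one time window.
counts = {"u1": 1, "u2": 1, "u3": 1, "u4": 2, "u5": 15}
print(activity_distribution(counts))  # {1: 3, 2: 1, 15: 1}
print(top_share(counts, 0.2))         # 0.75
```

Even in this toy example, one user in five accounts for three quarters of all actions, which is the kind of disparity the distribution makes visible.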
This section uses the Wikipedia edit history logs. First, we look at how often Wikipedia editors contribute to articles: the question is how many times a given user changes any Wikipedia article in each time period. A Wikipedia “editor” is anyone with a registered user name and, in the broader sense, anyone who makes any change, large or small, to a Wikipedia article. Luckily, the Wikimedia Foundation makes the edit history of all articles a...
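Counting edits per user in each time period amounts to assigning every edit to its period and tallying per editor. The Python sketch below uses calendar months as the period and assumes the edit history has already been parsed into `(editor, timestamp)` pairs; it makes no assumption about the format of the Wikimedia dump files, and the function name and sample edits are hypothetical. Whether unregistered editors are included is left to whoever builds the input list.

```python
from collections import Counter, defaultdict
from datetime import datetime


def edits_per_user_per_month(edits):
    """Group edits by calendar month and count them per editor.

    edits: iterable of (editor, datetime) pairs, assumed to be already
    parsed from the edit history. Returns {(year, month): Counter}.
    """
    by_month = defaultdict(Counter)
    for editor, ts in edits:
        by_month[(ts.year, ts.month)][editor] += 1
    return dict(by_month)


# Hypothetical sample edits.
edits = [
    ("Alice", datetime(2012, 3, 1, 10, 0)),
    ("Alice", datetime(2012, 3, 9, 18, 45)),
    ("Bob", datetime(2012, 3, 30, 23, 59)),
    ("Alice", datetime(2012, 4, 2, 7, 5)),
]
for period, per_editor in sorted(edits_per_user_per_month(edits).items()):
    print(period, dict(per_editor))
# (2012, 3) {'Alice': 2, 'Bob': 1}
# (2012, 4) {'Alice': 1}
```

Using consecutive, equally sized periods keeps each count comparable with the others, in the spirit of the continuous time windows preferred above.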
