eBook - ePub

Social Data Analytics

Name: Social Data Analytics
ISBN: 9781000644630

238 pages
English

About this book

This book is an introduction to social data analytics, along with its challenges and opportunities in the age of Big Data and Artificial Intelligence. It focuses primarily on concepts, techniques, and methods for organizing, curating, processing, analyzing, and visualizing big social data: from text to image and video analytics. It provides novel techniques in storytelling with social data to facilitate knowledge and fact discovery. The book covers a large body of knowledge that will help practitioners and researchers understand the underlying concepts, problems, methods, tools, and techniques involved in modern social data analytics. It also presents real-world applications of social data analytics, including Sales and Marketing, Influence Maximization, Situational Awareness, Customer Success and Segmentation, and performance analysis of the industry. It provides deep knowledge of social data analytics by comprehensively classifying the current state of research, describing in-depth techniques and methods, and highlighting future research directions. Lecturers will find a wealth of material to choose from for a variety of courses, ranging from undergraduate courses in data science to graduate courses in data analytics.


Publisher

CRC Press

Year

2022

Print ISBN

9781032196275

eBook ISBN

9781000644630

Topic

Computer Science

Subtopic

Business and Economics Statistics

1 Social Data Analytics: Challenges and Opportunities

1.1 Understanding Social Data

An Online Social Network (OSN) platform is a networking service that enables people to build social relationships with others who may share the same career, education, background, interests, goals, and more. To achieve this goal, Online Social Networks enable users to communicate by posting information, comments, messages, images, and more. Examples of social networks include Twitter¹, Facebook², LinkedIn³, Instagram⁴, TikTok⁵, and Clubhouse⁶. The two main elements of a social network are an Information-Item (e.g., a tweet on Twitter or a post on Facebook) and a Social Actor (e.g., a person/organization that has an account on Twitter/Facebook). A social actor is a conscious, thinking individual who has an account on a social network such as Facebook and can shape their world in a variety of ways by reflecting on their situation and the choices available to them on social networks. An information item in a social network may contain structured information such as an id (the unique identity of the information item), a source (the utility used to post the information item), a user (who posted the information item), and coordinates (the geographic location of the item). However, an information item may also include unstructured data such as text (the actual UTF-8 text of the status update), images, audio, and video.
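The two elements described above can be modeled as simple record types. The sketch below is a hypothetical Python representation, assuming field names that mirror the attributes listed in the paragraph (id, source, user, coordinates, text); it is not the schema of any particular platform's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SocialActor:
    """A person or organization with an account on a social network (hypothetical model)."""
    actor_id: str
    name: str
    kind: str = "person"  # "person" or "organization"


@dataclass
class InformationItem:
    """A post, e.g. a tweet, with structured and unstructured parts (hypothetical model)."""
    # Structured attributes
    id: str
    source: str                                         # utility used to post the item
    user: SocialActor                                   # who posted the item
    coordinates: Optional[Tuple[float, float]] = None   # (longitude, latitude), assumed order
    # Unstructured content
    text: str = ""                                      # UTF-8 text of the status update
    media_urls: List[str] = field(default_factory=list) # links to images, audio, or video

    def structured_part(self) -> dict:
        """Return only the structured attributes, e.g. for storage in a relational table."""
        return {
            "id": self.id,
            "source": self.source,
            "user": self.user.actor_id,
            "coordinates": self.coordinates,
        }


actor = SocialActor(actor_id="u42", name="Example User")
post = InformationItem(id="p1", source="web", user=actor,
                       coordinates=(151.21, -33.87), text="Hello, world!")
print(post.structured_part())
# {'id': 'p1', 'source': 'web', 'user': 'u42', 'coordinates': (151.21, -33.87)}
```

Separating `structured_part()` from the free-text and media fields mirrors the split between structured and unstructured data that later sections discuss.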

In this context, the discovery, interpretation, and communication of meaningful patterns in social data (i.e., Social Data Analytics) could be a challenging task and may include:

¹ https://Twitter.com/. ² https://www.facebook.com/. ³ https://www.linkedin.com/. ⁴ https://www.instagram.com/. ⁵ https://www.tiktok.com/. ⁶ https://www.clubhouse.com/.

Data Science and Analytics: the goal here is to examine the information items that a social user has posted. This may include Text Analytics (e.g., to examine the text that a social user has posted), Natural Language Processing (e.g., to understand, analyze, manipulate, and process that text), Image Processing and Analysis (e.g., to extract meaningful information from images posted by a social user), and more.
Social Science and Analytics: the goal here is to study communities on online social media and the relationships among individuals/groups within those communities. This helps us understand how social users behave and how they influence the world around us.
Cognitive Science and Analytics: the goal here is to study the intelligence, personality, behavior, and attitudes of individuals/groups on online social media [58]. This could contribute significantly to personalizing recommendations or analyzing behavioral disorders in Online Social Networks (e.g., to help with suicide prevention, school bullying detection, and extremist/criminal activity prediction).
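Social analytics of the kind described in the second item often starts by treating users as nodes and their relationships as edges. As a minimal sketch, assuming a hypothetical list of "who follows whom" pairs, the in-degree of each user (their follower count) gives a first, crude signal of who may be influential in a community, and mutual follows hint at tighter ties; real analyses would use richer measures.

```python
from collections import Counter, defaultdict

# Hypothetical "follower -> followed" relationships among social actors.
follows = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("dave", "bob"),
]

# Adjacency list: for each user, the set of users they follow.
following = defaultdict(set)
for follower, followed in follows:
    following[follower].add(followed)

# In-degree = number of followers; a simple proxy for potential influence.
in_degree = Counter(followed for _, followed in follows)

# Rank users by follower count (ties broken alphabetically for determinism).
ranking = sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [('carol', 3), ('alice', 1), ('bob', 1)]

# Mutual relationships (both users follow each other), stored as sorted pairs.
mutual = sorted({tuple(sorted((a, b))) for a, b in follows if a in following[b]})
print(mutual)   # [('alice', 'carol')]
```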

The combination of data, social, and cognitive analytics will enable understanding, analyzing, measuring, and interpreting the data, topics, and ideas posted on online social media, as well as the relationships among social users on such networks. To achieve this, it is essential to characterize variables that capture and encode information, thereby enabling us to derive meaningful inferences from social data [37, 69]. Social data is of little use unless it is processed in analytical tasks from which humans or downstream applications can derive insights. In this context, the main research problem is to understand social data. To properly understand social data, we need to implement the following steps:

Organizing Social Data: this step deals with a variety of data ranging from structured to semi-structured and unstructured data. It involves organizing data using technologies from relational to NoSQL database management systems and Data Lakes [61].
Processing Social Data: this step deals with the organization and manipulation of large amounts of social data, and may involve operations including validation, curation, sorting, classification, calculation, interpretation, and transformation of data. The main challenge in processing social data is the large volume of data generated from various sources. As an example, approximately 12 TB of data is generated every day on Twitter⁷. Accordingly, processing a simple query such as “Count the number of tweets (per day) for a list of different countries” on a single computer may take days, weeks, or even months. In this context, Big Data platforms such as Apache Hadoop⁸ are required to support the large-scale, distributed processing of social data.
Curating Social Data: this step not only involves cleaning social data, but also includes efforts to understand the content and context of the social data [62, 67]. In particular, data curation is the process of transforming raw data into contextualized data. It includes all the tasks needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to the raw data. Social data curation may involve: Identifying relevant data sources, Ingesting data and knowledge, Cleaning, Integration, Transformation (Normalization and aggregation), Adding Value (e.g., Extraction, Enrichment, Linking, Summarization) [63].
Summarizing Social Data: this step helps cope efficiently with large amounts of social data by generating data summaries that are meaningful to users, and it is vital to social data analytics. The amount of information available on any given topic on online social media is far beyond humans’ capacity to process, e.g., due to information overload and the retrieval of irrelevant information. Data summarization gathers related information and condenses it into a shorter format that enables answering complicated questions, gaining new insight, and discovering conceptual boundaries. Social data summarization aims to identify and highlight the critical aspects of one or more input documents within a defined size limit.
Visualizing Social Data: this step enables a better understanding of the trends, outliers, and patterns in social data. Techniques ranging from simple visualization (e.g., using visual elements such as charts, graphs, and maps) to advanced approaches (e.g., storytelling with data [67] and interactive visualization [590]) can be leveraged to facilitate understanding of social data and analytics results. These techniques can help us make sense of trillions of records and information items in social data, with more generated every second.
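The example query from the processing step can be expressed in the map/shuffle/reduce style that Hadoop-like platforms use to spread work across a cluster. The sketch below runs the three phases in a single Python process purely to show the data flow; the record fields (`created_at`, `country`) and the sample tweets are hypothetical, and a real deployment would distribute each phase across many machines.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical tweet records; real tweets carry many more attributes.
tweets = [
    {"created_at": "2022-03-01T10:15:00", "country": "AU"},
    {"created_at": "2022-03-01T23:59:00", "country": "AU"},
    {"created_at": "2022-03-01T08:00:00", "country": "US"},
    {"created_at": "2022-03-02T09:30:00", "country": "AU"},
]


def map_phase(records: Iterable[dict]) -> Iterable[Tuple[Tuple[str, str], int]]:
    """Emit ((country, day), 1) for every tweet."""
    for r in records:
        day = datetime.fromisoformat(r["created_at"]).date().isoformat()
        yield (r["country"], day), 1


def shuffle_phase(pairs) -> Dict[Tuple[str, str], List[int]]:
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups) -> Dict[Tuple[str, str], int]:
    """Sum the counts for each (country, day) key."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle_phase(map_phase(tweets)))
for (country, day), n in sorted(counts.items()):
    print(country, day, n)
# AU 2022-03-01 2
# AU 2022-03-02 1
# US 2022-03-01 1
```

Because each map call looks at one record and each reduce call looks at one key, both phases can run in parallel on different machines; only the shuffle needs to move data between them.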

⁷ https://Twitter.com/. ⁸ Hadoop [641] is an open-source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
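A few of the curation tasks listed above (cleaning, normalization, and enrichment through extraction) can be illustrated on raw tweet text. This is a minimal sketch using only regular expressions; the `curate` function and its cleaning rules are hypothetical choices made for illustration, not a standard curation pipeline.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def curate(raw_text: str) -> dict:
    """Clean, normalize, and enrich a single raw post (illustrative rules only)."""
    # Normalization: Unicode NFKC so visually identical characters compare equal.
    text = unicodedata.normalize("NFKC", raw_text)

    # Enrichment by extraction: pull out entities before they are stripped.
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    mentions = MENTION_RE.findall(text)
    urls = URL_RE.findall(text)

    # Cleaning: remove URLs and mentions, then collapse repeated whitespace.
    cleaned = URL_RE.sub("", text)
    cleaned = MENTION_RE.sub("", cleaned)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    return {"text": cleaned, "hashtags": hashtags, "mentions": mentions, "urls": urls}


print(curate("@alice Great talk on #BigData!   https://example.com/slides"))
# {'text': 'Great talk on #BigData!', 'hashtags': ['bigdata'], 'mentions': ['alice'],
#  'urls': ['https://example.com/slides']}
```

Extracting entities before cleaning keeps the added value (hashtags, mentions, links) that would otherwise be lost when the text is stripped.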

1.2 Organizing Social Data

The continuous improvement in connectivity, storage, and data processing capabilities gives access to a deluge of big data generated across social data islands. Social data analytics for insight discovery is a strategic priority for modern businesses, and it depends heavily on properly organizing the large volumes of data generated on online social media platforms every second. Organizing social data may involve processes to persist and categorize social data to make it more usable. Persistence is the continuance of an effect after its cause is removed [44]. In computer science, data persistence means that data survives after the process that created it has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage. Ingesting and persisting the vast amounts of social data being generated continuously (varying from the semi-structured data generated on Twitter to unstructured data such as images posted on Instagram, videos posted on YouTube, and audio shared on Clubhouse) is challenging and may require a different approach to distributed data storage that is designed for large-scale clusters. Typical properties of social data include wide physical distribution, diversity of formats, independent management, and heterogeneous semantics.
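To make persistence concrete: data is persistent once it has been written to non-volatile storage and can be read back after the writing process ends. A minimal sketch, assuming a hypothetical local JSON Lines file as the store (a production system would use a distributed store), appends each ingested post and asks the operating system to flush it to disk with `os.fsync`.

```python
import json
import os
import tempfile


def persist(posts, path):
    """Append posts to a JSON Lines file and flush them to non-volatile storage."""
    with open(path, "a", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post, ensure_ascii=False) + "\n")
        f.flush()             # move Python's internal buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write its buffers to the disk


def load(path):
    """Read every persisted post back, e.g. after a restart."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "posts.jsonl")
persist([{"id": "1", "text": "first post"}, {"id": "2", "text": "second post"}], path)
print(load(path))
# [{'id': '1', 'text': 'first post'}, {'id': '2', 'text': 'second post'}]
```

The append-only, one-record-per-line layout keeps writes cheap and lets new attributes appear in later records without rewriting earlier ones.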

1.2.1 Social Data Volume

Social data volume refers to the vast amounts of data generated every second on Online Social Networks. Social data is large scale, never ending, and ever changing, and it arrives in diverse forms from diverse sources at irregular time intervals. Therefore, the main challenge is storing the vast amounts of social data generated every second. To deal with the high volume of social data and support scaling applications, two main approaches can be used: (i) Scale Up: keep the same number of systems/servers, but migrate each system to a larger one. For example, changing from a server with 16 CPU cores and a 1 TB storage system to a server with 64 CPU cores and a 100 TB storage system; and (ii) Scale Out: when the workload exceeds the capacity of a server, the workload is spread across several servers. This technique, also referred to as Clustering, supports scaling applications that have a loosely coupled architecture. Scaling out is often more cost-effective: for example, it is typically cheaper to buy ten 100 TB storage systems than a single 1 PB storage system.
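Scaling out requires deciding which server stores which record. One common approach, sketched here as an assumption rather than as the book's method, is hash partitioning: a stable hash of the record key, taken modulo the number of servers, selects the server. The server names and post keys below are hypothetical.

```python
import hashlib
from collections import defaultdict

SERVERS = ["server-0", "server-1", "server-2", "server-3"]  # hypothetical cluster


def server_for(key: str, servers=SERVERS) -> str:
    """Pick a server using a stable hash of the key (same key -> same server)."""
    # Python's built-in hash() is randomized per process, so use MD5 for stability.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Spread 10,000 hypothetical post IDs across the cluster.
placement = defaultdict(int)
for i in range(10_000):
    placement[server_for(f"post-{i}")] += 1

print(dict(sorted(placement.items())))  # roughly 2,500 posts per server
```

Plain modulo placement reassigns most keys whenever a server is added or removed; consistent hashing is a common refinement when the cluster size changes often.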

1.2.2 Social Data Variety

Social data variety refers to the different types of data (from structured to semi-structured and unstructured) generated on Online Social Media.

Structured Data is highly organized and easily decipherable by an algorithm. Structured data can be easily defined by a schema, i.e., a structure described in a formal language supported by the database management system to facilitate organizing and interpreting information. Relational database management systems (RDBMSs) [144] typically support organizing structured data and require creating a schema for the data before writing it into the database, an approach called Schema-on-write. An example of structured data generated on Online Social Networks is the record of a user on a social network.
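Schema-on-write can be illustrated with Python's built-in `sqlite3` module: the table's schema is declared before any data is written, and the database rejects records that violate it. The `users` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the structure is fixed before any record is stored.
conn.execute(
    """
    CREATE TABLE users (
        user_id     TEXT PRIMARY KEY,
        screen_name TEXT NOT NULL,
        followers   INTEGER NOT NULL CHECK (followers >= 0)
    )
    """
)

conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("u1", "alice", 120))

# A record that does not fit the schema (missing screen_name) is rejected at write time.
try:
    conn.execute("INSERT INTO users (user_id, followers) VALUES (?, ?)", ("u2", 5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(conn.execute("SELECT * FROM users").fetchall(), rejected)
# [('u1', 'alice', 120)] True
```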

Unstructured Data is information that either does not have a predefined data model or is not organized in a predefined manner. Examples include text, audio, video, images, and analog data. NoSQL database management systems [144] normally support organizing unstructured data and do not require creating a schema for the data before writing it into the database; the structure is instead applied when the data is read, an approach called Schema-on-read. Examples of unstructured data generated on Online Social Networks include the text of a tweet posted on Twitter or an image posted on Instagram.
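Schema-on-read, by contrast, stores records as they arrive and applies a structure only when a query reads them. A minimal sketch, assuming raw posts are kept as JSON strings in a hypothetical in-memory list (standing in for a NoSQL collection or data lake): each query projects only the fields it needs and tolerates missing ones.

```python
import json

# Raw records are stored as-is; no schema is enforced at write time.
raw_store = [
    '{"id": "p1", "text": "hello", "lang": "en"}',
    '{"id": "p2", "image_url": "https://example.com/cat.jpg"}',
    '{"id": "p3", "text": "bonjour", "lang": "fr", "likes": 7}',
]


def read_with_schema(store, fields):
    """Apply a schema at read time: keep the requested fields, fill gaps with None."""
    for line in store:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# Two different "views" over the same raw data.
text_view = list(read_with_schema(raw_store, ["id", "text", "lang"]))
media_view = [r for r in read_with_schema(raw_store, ["id", "image_url"]) if r["image_url"]]

print(text_view[1])  # {'id': 'p2', 'text': None, 'lang': None}
print(media_view)    # [{'id': 'p2', 'image_url': 'https://example.com/cat.jpg'}]
```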

Semi-structured Data is a form of structured data that requires a high-level schema to understand the data structure. However, this schema may describe some unstructured data embedded in the high-level structure. As an example, consider a tweet on Twitter, which has a schema describing attributes such as an id (the unique identity of the tweet), source (the utility used to post the tweet), user (the user who posted the tweet), coordinates (the geographic location of the tweet), and text (the actual UTF-8 text of the status update), as well as entities that have been parsed out of the text of the tweet, such as hashtags, URLs, and media.
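The tweet example can be made concrete with a small JSON document whose top-level attributes follow a known structure while the nested `entities` object carries the hashtags and URLs parsed out of the text. The payload below is a simplified, hypothetical sample loosely modeled on the attributes named above, not a verbatim response from the Twitter API.

```python
import json

raw_tweet = """
{
  "id": "1500000000000000000",
  "source": "Twitter Web App",
  "user": {"id": "42", "screen_name": "example_user"},
  "coordinates": {"type": "Point", "coordinates": [151.21, -33.87]},
  "text": "Reading about #SocialData and #BigData https://example.com",
  "entities": {
    "hashtags": [{"text": "SocialData"}, {"text": "BigData"}],
    "urls": [{"url": "https://example.com"}]
  }
}
"""

tweet = json.loads(raw_tweet)

# Structured, high-level attributes.
print(tweet["id"], tweet["user"]["screen_name"])

# Semi-structured parts: nested lists whose length varies from tweet to tweet.
hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: [longitude, latitude]
print(hashtags, lat, lon)
```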

Traditional data management systems required highly predictable, structured data formats and supported relational data with a fixed schema. Today, social data is generated in diverse forms from diverse sources. To handle the volume and variety of such data, formats such as JSON⁹, Avro¹⁰, and XML¹¹ have become standard for storing and exchanging semi-structured data. In particular, semi-structured data does not require a fixed schema to be defined in advance: the schema can evolve over time, and new attributes can be added at any time. Moreover, semi-structured data may contain hierarchies of nested information.
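Because semi-structured records may nest information and gain new attributes over time, analytics code often flattens them into dotted column names before loading them into tabular tools. A minimal sketch, assuming a hypothetical `flatten` helper and two versions of a post record, shows that a newly added attribute simply appears as an extra column instead of breaking the reader.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with dotted keys (lists are kept as-is)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# An older record and a newer one that gained extra (and nested) attributes.
v1 = {"id": "p1", "user": {"id": "42"}, "text": "hi"}
v2 = {"id": "p2", "user": {"id": "43", "verified": True}, "text": "hey", "lang": "en"}

rows = [flatten(v1), flatten(v2)]
columns = sorted(set().union(*rows))  # the column set grows as the schema evolves
print(columns)  # ['id', 'lang', 'text', 'user.id', 'user.verified']
for row in rows:
    print([row.get(c) for c in columns])
# ['p1', None, 'hi', '42', None]
# ['p2', 'en', 'hey', '43', True]
```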

⁹ https://www.json.org/. ¹⁰ https://avro.apache.org/. ¹¹ https://www.w3.org/standards/xml/core.

1.2.3 Social Data Velocity

Social data velocity refers to the rate at which new social data enters the system, as well as the rate at which it must be processed. In the past, social media applications captured only data about the main entities, such as posts and user information. More recently, social media applications have started to capture user activities, such as every click made while searching, browsing, and comparing. This, in turn, greatly increases the velocity of social data. The velocity of social data processing can be broken down into two aspects: Streaming and Feedback Loop.

Streaming: Social networks are quickly becoming the primary medium for sharing news and discussing what is happening in the world. For example, Twitter is now considered one of the fastest news sources in the world, as it produces rich data streams for immediate insights into ongoing matters and the conversations around them. Stream processing, i.e., a big data technology that focuses on the real-time processing of continuous streams of data in motion, is now supported by many big data platforms such as Apache Kafka¹², Amazon Kinesis¹³, Microsoft Azure Stream Analytics¹⁴, Apache Flink¹⁵, and IBM Streaming Analytics¹⁶.
Feedback Loop, i.e., a process in which the outputs of a system are circled back and used as inputs, is an important step in analyzing data to produce actionable results. For example, browsers have started to capture users’ activities on the client side and send that information to recommendation engines, with the goal of personalizing services for each user. If you visit a website to book a flight to Australia, you may later see advertisements for cheap flights to Australia when you log in to your social media account, e.g., Instagram or Facebook. In this way, the process uses customer activity and feedback to create better recommendations.
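Stream processing engines typically compute aggregates over windows of time rather than over a finished dataset. The generator below is a minimal, single-process sketch of a tumbling (fixed-size, non-overlapping) window that counts hashtags per window. It assumes events arrive in timestamp order and uses hypothetical event data; engines such as Apache Flink additionally handle late and out-of-order events, state, and distribution.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, counts) for each non-overlapping window.

    Assumes events are (unix_timestamp, hashtag) pairs arriving in time order.
    """
    current_start = None
    counts = Counter()
    for ts, tag in events:
        start = ts - ts % window_seconds       # align to the window boundary
        if current_start is not None and start != current_start:
            yield current_start, counts        # window closed: emit its result
            counts = Counter()
        current_start = start
        counts[tag] += 1
    if current_start is not None:
        yield current_start, counts            # flush the last open window


# Hypothetical stream of (timestamp, hashtag) events.
stream = [(0, "#flood"), (12, "#flood"), (45, "#rescue"), (61, "#flood"), (118, "#rescue")]
for start, counts in tumbling_window_counts(stream, window_seconds=60):
    print(start, dict(counts))
# 0 {'#flood': 2, '#rescue': 1}
# 60 {'#flood': 1, '#rescue': 1}
```

Each window's result is emitted as soon as the first event of the next window arrives, so memory use depends on the window size rather than on the length of the stream.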

¹² https://kafka.apache.org/. ¹³ https://aws.amazon.com/kinesis/. ¹⁴ https://docs.microsoft.com/en-us/azure/stream-analytics/. ¹⁵ https://flink.apache.org/. ¹⁶ https://www.ibm.com/au-en/cloud/streaming-analytics.

1.2.4 Social Data an...

Cover Page
Title Page
Copyright Page
Dedication
Foreword
Preface
Table of Contents
1. Social Data Analytics: Challenges and Opportunities
2. Organizing Social Data
3. Curating Social Data
4. Social Media Text Analytics
5. Social Media Image and Video Analytics
6. Summarizing Social Data
7. Storytelling with Social Data
8. Social Data and Recommender Systems: The Future of Personalization
9. Social Data Analytics Applications
References
Index

Social Data Analytics by Amin Beheshti, Samira Ghodratnama, Mehdi Elahi, and Helia Farhood.