Social Data Analytics

Amin Beheshti, Samira Ghodratnama, Mehdi Elahi, Helia Farhood

  1. 238 pages
  2. English
About This Book

This book is an introduction to social data analytics along with its challenges and opportunities in the age of Big Data and Artificial Intelligence. It focuses primarily on concepts, techniques and methods for organizing, curating, processing, analyzing, and visualizing big social data: from text to image and video analytics. It provides novel techniques in storytelling with social data to facilitate the knowledge and fact discovery. The book covers a large body of knowledge that will help practitioners and researchers in understanding the underlying concepts, problems, methods, tools and techniques involved in modern social data analytics. It also provides real-world applications of social data analytics, including: Sales and Marketing, Influence Maximization, Situational Awareness, customer success and Segmentation, and performance analysis of the industry. It provides a deep knowledge in social data analytics by comprehensively classifying the current state of research, by describing in-depth techniques and methods, and by highlighting future research directions. Lecturers will find a wealth of material to choose from for a variety of courses, ranging from undergraduate courses in data science to graduate courses in data analytics.

Information

Publisher
CRC Press
Year
2022
ISBN
9781000644630

1 Social Data Analytics: Challenges and Opportunities

1.1 Understanding Social Data

An Online Social Network (OSN) platform is a networking service that enables people to build social relationships with others who may share the same career, education, background, interests, goals, and more. To this end, Online Social Networks let users communicate by posting information, comments, messages, images, and more. Examples of social networks include Twitter1, Facebook2, LinkedIn3, Instagram4, TikTok5, and Clubhouse6. The two main elements of a social network are the Information-Item (e.g., a tweet on Twitter or a post on Facebook) and the Social Actor (e.g., a person or organization who has an account on Twitter or Facebook). A social actor is a conscious, thinking individual who has an account on a social network such as Facebook and can shape their world in a variety of ways by reflecting on their situation and the choices available to them on social networks. An information item in a social network may contain structured information such as an id (the unique identity of the information item), a source (the utility used to post the information item), a user (who posted the information item), and coordinates (the geographic location of the item). However, an information item may also include unstructured data such as text (the actual UTF-8 text of the status update), image, audio, and video.
1 https://Twitter.com/. 2 https://www.facebook.com/. 3 https://www.linkedin.com/. 4 https://www.instagram.com/. 5 https://www.tiktok.com/. 6 https://www.clubhouse.com/.
In this context, the discovery, interpretation, and communication of meaningful patterns in social data (i.e., Social Data Analytics) could be a challenging task and may include:
  • Data Science and Analytics: the goal here is to examine the information item that a social user posted. This may include Text Analytics (e.g., to examine the text that a social user has posted), Natural Language Processing (e.g., to understand, analyze, manipulate, and process the text that a social user has posted), Image Processing and Analysis (e.g., to extract meaningful information from images posted by a social user), and more.
  • Social Science and Analytics: the goal here is to study the communities on online social media and the relationships among individuals/groups within those communities. This helps us understand how social users behave and influence the world around us.
  • Cognitive Science and Analytics: the goal here is to study the intelligence, personality, behavior, and attitude of individuals/groups on online social media [58]. This could significantly contribute to personalizing recommendations or analyzing behavioral disorders in Online Social Networks (to help in suicide prevention, school bullying detection, and extremist/criminal activity prediction).
The combination of data, social, and cognitive analytics will enable understanding, analyzing, measuring, and interpreting the data, topics, and ideas posted on online social media, as well as the relationships among social users on such networks. To achieve this, it is essential to characterize variables that grasp and encode information, thereby making it possible to derive meaningful inferences from social data [37, 69]. Social data is useless unless it is processed in analytical tasks from which humans or downstream applications can derive insights. In this context, the main research problem is to understand the social data. To properly understand the social data, we need to implement the following steps:
  • Organizing Social Data: this step deals with a variety of data ranging from structured to semi-structured and unstructured data. It involves organizing data using technologies from relational to NoSQL database management systems and Data Lakes [61].
  • Processing Social Data: this step deals with the organization and manipulation of large amounts of social data, and may involve operations including validation, curation, sorting, classification, calculation, interpretation, and transformation of data. The main challenge in processing social data is the large volume of data generated from various sources. As an example, consider Twitter7, where approximately 12 TB of data is generated every day. Accordingly, processing even a simple query such as “Calculate the number of tweets (per day) for a list of different countries” on a single computer may take days, weeks, or even months. In this context, Big Data platforms such as Apache Hadoop8 are needed to support the distributed, large-scale processing of social data.
  • Curating Social Data: this step not only involves cleaning social data, but also includes efforts to understand the content and context of the social data [62, 67]. In particular, data curation is the process of transforming raw data into contextualized data. It includes all the tasks needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to the raw data. Social data curation may involve: Identifying relevant data sources, Ingesting data and knowledge, Cleaning, Integration, Transformation (Normalization and aggregation), Adding Value (e.g., Extraction, Enrichment, Linking, Summarization) [63].
  • Summarizing Social Data: this step helps to cope efficiently with large amounts of social data by generating data summaries that are meaningful to users. This step is vital to social data analytics. The amount of information available on any given topic on online social media is far beyond what humans can process, owing to information overabundance and the amount of irrelevant content retrieved. Data summarization gathers related information and condenses it into a shorter format that enables answering complicated questions, gaining new insight, and discovering conceptual boundaries. Social data summarization aims to identify and highlight the critical aspects of one or multiple input document(s) within a defined size limit.
  • Visualizing Social Data: this step enables a better understanding of the trends, outliers, and patterns in social data. Several techniques from simple visualization (e.g., using visual elements such as charts, graphs, and maps) to advanced approaches (e.g., storytelling with data [67] and interactive visualization [590]) could be leveraged to facilitate understanding social data and analytics results. These techniques can help us make sense of trillions of records and information items in social data, generated every second.
7 https://Twitter.com/. 8 Hadoop [641] is an open-source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
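The example query from the processing step above ("the number of tweets per day for a list of countries") can be sketched in a map/reduce style, the programming model that Hadoop popularized. This is only a single-machine illustration under stated assumptions: the record layout (`country`, `created_at`), the sample data, and the function names are hypothetical and are not taken from the book or from any Twitter API.

```python
from collections import Counter
from datetime import datetime
from functools import reduce

# Hypothetical, simplified tweet records; real tweets carry many more fields.
TWEETS = [
    {"id": 1, "country": "AU", "created_at": "2022-01-01T09:15:00+00:00"},
    {"id": 2, "country": "AU", "created_at": "2022-01-01T17:40:00+00:00"},
    {"id": 3, "country": "IR", "created_at": "2022-01-01T11:05:00+00:00"},
    {"id": 4, "country": "AU", "created_at": "2022-01-02T08:00:00+00:00"},
    {"id": 5, "country": "FR", "created_at": "2022-01-02T08:30:00+00:00"},
]


def map_partition(tweets, countries):
    """Map step: count (country, day) pairs within one partition of the data."""
    counts = Counter()
    for tweet in tweets:
        if tweet["country"] in countries:
            day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
            counts[(tweet["country"], day)] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the partial counts produced by each partition."""
    return reduce(lambda left, right: left + right, partials, Counter())


def tweets_per_country_per_day(tweets, countries, partitions=2):
    # Split the data into chunks, much as a cluster spreads it over workers.
    chunks = [tweets[i::partitions] for i in range(partitions)]
    return reduce_counts(map_partition(chunk, countries) for chunk in chunks)


result = tweets_per_country_per_day(TWEETS, {"AU", "IR"})
```

On a real cluster each chunk would live on a different machine and the map step would run in parallel; only the small partial counts would travel over the network to be merged.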

1.2 Organizing Social Data

The continuous improvement in connectivity, storage, and data processing capabilities gives access to a deluge of big data generated on social data islands. Social data analytics for insight discovery is a strategic priority for modern businesses. This process depends heavily on properly organizing the large volumes of data generated on online social media platforms every second. Organizing social data may involve processes to persist and categorize social data to make it more usable. Persistence is the continuance of an effect after its cause is removed [44]. In computer science, data persistence means that the data survives after the process that created it has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage. Ingesting and persisting the vast amounts of social data being generated continuously (varying from the semi-structured data generated on Twitter to unstructured data such as images generated on Instagram, videos generated on YouTube, and audio posted on Clubhouse) is challenging and may require a different approach to distributed data storage, one that is designed for large-scale clusters. Typical properties of social data include wide physical distribution, diversity of formats, independent management, and heterogeneous semantics.
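As a minimal sketch of what persistence means in practice, the following writes a few information items to a file on disk and reads them back, so that the data outlives the in-memory objects that produced it. The JSON Lines file format, the file name, and the helper names are assumptions chosen for illustration; a production system would use a database or a distributed file system instead.

```python
import json
import os
import tempfile


def persist_items(items, path):
    """Append information items to a JSON Lines file (one JSON object per line)."""
    with open(path, "a", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
        handle.flush()
        # Ask the operating system to push buffered bytes to the storage device.
        os.fsync(handle.fileno())


def load_items(path):
    """Read every persisted item back from disk."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical information items and a throwaway location for the demo.
original = [
    {"id": 1, "user": "alice", "text": "Hello"},
    {"id": 2, "user": "bob", "text": "Good morning"},
]
path = os.path.join(tempfile.mkdtemp(), "items.jsonl")

persist_items(original, path)
restored = load_items(path)  # works even after `original` is gone from memory
```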

1.2.1 Social Data Volume

Social data volume refers to the vast amounts of data generated every second on Online Social Networks. Social data is large scale, never ending, and ever changing, and arrives in diverse forms from diverse sources at irregular time intervals. Therefore, the main challenge is to store the vast amounts of social data generated every second. To deal with the high volume of social data and to support scaling applications, there are two main approaches: (i) Scale Up: keep the same number of systems/servers, but migrate each one to a larger system, for example, moving from a server with 16 CPU cores and 1 TB of storage to a server with 64 CPU cores and 100 TB of storage; and (ii) Scale Out: when the workload exceeds the capacity of a server, spread the workload across several servers. This technique, also referred to as Clustering, supports scaling applications that have a loosely coupled architecture. Scaling out is also often the more economical option: for example, ten 100 TB storage systems typically cost less than a single 1 PB storage system.
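One common way to spread a workload across servers when scaling out is to assign each information item to a server by hashing its identifier. The sketch below illustrates that idea only; the function names and the choice of ten servers are hypothetical, and the text above does not prescribe any particular partitioning scheme.

```python
import hashlib
from collections import defaultdict


def shard_for(item_id, num_servers):
    """Map an item id to a server index using a stable hash of the id."""
    digest = hashlib.sha256(str(item_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def distribute(item_ids, num_servers):
    """Group item ids by the server that would store them."""
    placement = defaultdict(list)
    for item_id in item_ids:
        placement[shard_for(item_id, num_servers)].append(item_id)
    return placement


# Spread 10,000 hypothetical item ids over a cluster of ten servers.
NUM_SERVERS = 10
placement = distribute(range(10_000), NUM_SERVERS)
sizes = [len(placement[server]) for server in range(NUM_SERVERS)]
```

Because the hash is deterministic, any node can compute where an item lives without a central lookup. A known limitation of plain modulo hashing is that changing the number of servers moves most items, which is why techniques such as consistent hashing are often used in practice.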

1.2.2 Social Data Variety

Social data variety refers to the different types of data (from structured to semi-structured and unstructured) generated on Online Social Media.
Structured Data is highly organized and easily decipherable by an algorithm. Structured data can be easily defined by a schema, i.e., a structure described in a formal language supported by the database management system to facilitate organizing and interpreting information. Relational database management systems [144] (RDBMSs) typically support organizing structured data and require a schema to be created before data is written into the database, an approach called Schema-on-write. An example of structured data generated on Online Social Networks is the record of a user on a social network.
Unstructured Data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Examples include text, audio, video, images, and analog data. NoSQL database management systems [144] normally support organizing unstructured data and do not require a schema to be created before data is written into the database; the structure is instead applied when the data is read, an approach called Schema-on-read. Examples of unstructured data generated on Online Social Networks include the text of a tweet posted on Twitter and an image posted on Instagram.
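The difference between Schema-on-write and Schema-on-read can be illustrated with a small sketch. It uses SQLite (through Python's standard `sqlite3` module) as a stand-in for an RDBMS and a plain list of raw JSON strings as a stand-in for a schemaless store; the table, column, and field names are hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the table structure is declared before any row is stored,
# and the database rejects rows that violate it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, "
    "screen_name TEXT NOT NULL, "
    "followers INTEGER NOT NULL)"
)
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "alice", 120))

try:
    # A row without a screen name breaks the declared schema.
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (2, None, 5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Schema-on-read: raw documents are stored as-is, whatever their shape;
# structure is imposed only by the code that reads them.
raw_store = [
    '{"id": 10, "text": "Hello #world"}',
    '{"id": 11, "image_url": "https://example.org/cat.png", "caption": "my cat"}',
]


def read_text(raw_document):
    """Interpret a raw document at read time, tolerating missing fields."""
    document = json.loads(raw_document)
    return document.get("text") or document.get("caption") or ""


texts = [read_text(raw) for raw in raw_store]
```

The trade-off is where errors surface: schema-on-write catches malformed data at ingestion, while schema-on-read accepts everything and leaves each reader to handle missing or unexpected fields.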
Semi-structured Data is a form of structured data in which a high-level schema describes the overall structure, while some of the data embedded in that structure may itself be unstructured. As an example, consider a tweet on Twitter, whose schema describes attributes such as an id (the unique identity of the tweet), source (the utility used to post the tweet), user (the user who posted the tweet), coordinates (the geographic location of the tweet), and text (the actual UTF-8 text of the status update), as well as entities that have been parsed out of the text of the tweet, such as hashtags, URLs, and media.
Traditional data management systems required very predictable structured formats for the data, and supported relational data with a fixed schema. Today, social data is generated in diverse forms from diverse sources. To handle the volume and variety of such data, formats such as JSON9, Avro10, and XML11 have become the standard for storing and exchanging semi-structured data. In particular, semi-structured data does not require a prior definition of a fixed schema: the schema can evolve over time, and new attributes can be added at any time. Moreover, semi-structured data may contain hierarchies of nested information.
9 https://www.json.org/. 10 https://avro.apache.org/. 11 https://www.w3.org/standards/xml/core.
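To make the idea of semi-structured data concrete, here is a small sketch of a tweet-like record expressed as JSON, using the attributes named above (id, source, user, coordinates, text, entities). The exact nesting and the sample values are assumptions for illustration and do not reproduce the actual Twitter API payload; the `lang` attribute is a hypothetical later addition used to show schema evolution.

```python
import json

# A tweet-like record: a high-level structure with nested entities and
# a free-text field embedded inside it.
tweet_json = """
{
  "id": 1,
  "source": "web",
  "user": {"id": 42, "screen_name": "alice"},
  "coordinates": null,
  "text": "Reading about #SocialData at https://example.org",
  "entities": {
    "hashtags": [{"text": "SocialData"}],
    "urls": [{"url": "https://example.org"}],
    "media": []
  }
}
"""

tweet = json.loads(tweet_json)

# Nested hierarchies are navigated by key, level by level.
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]

# Schema evolution: a newer record carries an extra attribute ("lang")
# that older records lack. No schema migration is required.
newer_tweet = dict(tweet, id=2, lang="en")


def language_of(record):
    """Read an attribute that may not exist in older records."""
    return record.get("lang", "unknown")
```

Readers that were written before `lang` existed keep working, and readers that know about it fall back to a default for older records.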

1.2.3 Social Data Velocity

Social data velocity refers to the rate at which new social data enters the system as well as the rate at which the social data must be processed. In the past, social media applications captured only the data about the main entities, such as posts and user information. More recently, they have started to capture user activities as well, such as every click made while searching, browsing, and comparing. This, in turn, greatly increases the velocity of social data. The velocity of social data processing can be broken down into Streaming and Feedback Loop.
  • Streaming: Social networks are quickly becoming the primary medium for sharing news and discussing what is happening in the world. For example, Twitter is now considered one of the fastest news sources in the world, as it produces rich data streams for immediate insights into ongoing matters and the conversations around them. Stream processing, i.e., a big data technology that focuses on the real-time processing of continuous streams of data in motion, is now supported by many big data platforms such as Apache Kafka12, Amazon Kinesis13, Microsoft Azure Stream Analytics14, Apache Flink15, and IBM Streaming Analytics16.
  • Feedback Loop: a feedback loop, i.e., a process in which the outputs of a system are circled back and used as inputs, is an important step in analyzing the data to produce actionable results. As an example, browsers have started to capture users’ activities on the client side and send that information to recommendation engines, with the goal of personalizing the services for each user. For instance, if you visit a website to book a flight to Australia, then later, when you log in to a social media account such as Instagram or Facebook, you may see advertisements for cheap flights to Australia. In particular, this process may use customer activity and feedback to create better recommendations.
12 https://kafka.apache.org/. 13 https://aws.amazon.com/kinesis/. 14 https://docs.microsoft.com/en-us/azure/stream-analytics/. 15 https://flink.apache.org/. 16 https://www.ibm.com/au-en/cloud/streaming-analytics.
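The two ideas above can be sketched together: events arrive one at a time (streaming), and each observed click changes what is recommended next (feedback loop). This is a toy, single-process illustration; the event source, the class name, and the "most frequent topic in a recent window" rule are all assumptions, and real deployments would read from a platform such as Apache Kafka rather than a Python generator.

```python
from collections import Counter, deque


def click_stream():
    """Hypothetical stream of topics a user clicks on, yielded one at a time."""
    yield from ["flights", "flights", "hotels", "hotels", "hotels"]


class Recommender:
    """Recommends the topic seen most often in a window of recent clicks."""

    def __init__(self, window=3):
        # Only the most recent `window` events are kept: data in motion,
        # not a full history at rest.
        self.recent = deque(maxlen=window)

    def observe(self, topic):
        """Feed one user action back into the model."""
        self.recent.append(topic)

    def recommend(self, default="news"):
        """Return the current recommendation based on recent activity."""
        if not self.recent:
            return default
        return Counter(self.recent).most_common(1)[0][0]


recommender = Recommender(window=3)
shown = []
for clicked_topic in click_stream():
    # What the user sees now depends on what they did before...
    shown.append(recommender.recommend())
    # ...and what they do now is fed back to shape what they see next.
    recommender.observe(clicked_topic)

final_recommendation = recommender.recommend()
```

The recommendation starts at the default, follows the early interest in flights, and shifts to hotels once those clicks dominate the recent window.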

1.2.4 Social Data an...
