Social Data Analytics
Amin Beheshti, Samira Ghodratnama, Mehdi Elahi, Helia Farhood
- 238 pages
- English
About This Book
This book is an introduction to social data analytics, along with its challenges and opportunities in the age of Big Data and Artificial Intelligence. It focuses primarily on concepts, techniques, and methods for organizing, curating, processing, analyzing, and visualizing big social data, from text to image and video analytics. It provides novel techniques in storytelling with social data to facilitate knowledge and fact discovery. The book covers a large body of knowledge that will help practitioners and researchers understand the underlying concepts, problems, methods, tools, and techniques involved in modern social data analytics. It also presents real-world applications of social data analytics, including sales and marketing, influence maximization, situational awareness, customer success and segmentation, and performance analysis of the industry. It provides deep knowledge of social data analytics by comprehensively classifying the current state of research, describing techniques and methods in depth, and highlighting future research directions. Lecturers will find a wealth of material to choose from for a variety of courses, ranging from undergraduate courses in data science to graduate courses in data analytics.
1 Social Data Analytics: Challenges and Opportunities
1.1 Understanding Social Data
- Data Science and Analytics: the goal here is to examine the information item that a social user posted. This may include Text Analytics (e.g., to examine the text that a social user has posted), Natural Language Processing (e.g., to understand, analyze, manipulate, and process the text that a social user has posted), Image Processing and Analysis (e.g., to extract meaningful information from images posted by a social user), and more.
- Social Science and Analytics: the goal here is to study the communities on online social media and the relationships among individuals/groups within those communities. This helps us understand how social users behave and influence the world around us.
- Cognitive Science and Analytics: the goal here is to study the intelligence, personality, behavior, and attitude of individuals/groups on online social media [58]. This could significantly contribute to personalizing recommendations or analyzing behavioral disorders in Online Social Networks (to help in suicide prevention, school bullying detection, and extremist/criminal activity prediction).
- Organizing Social Data: this step deals with a variety of data ranging from structured to semi-structured and unstructured data. It involves organizing data using technologies from relational to NoSQL database management systems and Data Lakes [61].
- Processing Social Data: this step deals with the organization and manipulation of large amounts of social data, and may involve operations including validation, curation, sorting, classification, calculation, interpretation, and transformation of data. The main challenge in processing social data is the large volume of data generated from various sources. As an example, consider Twitter, where approximately 12TB of data is generated every day. Accordingly, processing a simple query such as "Calculate the count of the number of tweets (per day) for a list of different countries" on a single computer may take days, weeks, or even months. In this context, Big Data platforms such as Apache Hadoop are required to distribute the work across many machines and process social data in a reasonable time.
- Curating Social Data: this step not only involves cleaning social data, but also includes efforts to understand the content and context of the social data [62, 67]. In particular, data curation is the process of transforming raw data into contextualized data. It includes all the tasks needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to the raw data. Social data curation may involve identifying relevant data sources, ingesting data and knowledge, cleaning, integration, transformation (normalization and aggregation), and adding value (e.g., extraction, enrichment, linking, summarization) [63].
- Summarizing Social Data: this step helps with efficiently coping with large amounts of social data by generating data summaries that are meaningful to users. This step is vital to social data analytics. The amount of information available on any given topic on online social media is far beyond humans' capacity to process, e.g., due to information overabundance and irrelevant retrieved information. Data summarization facilitates gathering related information and condensing it into a shorter format that enables answering complicated questions, gaining new insight, and discovering conceptual boundaries. Social data summarization aims to identify and highlight the critical aspects of one or multiple input document(s) within a defined size limit.
- Visualizing Social Data: this step enables a better understanding of the trends, outliers, and patterns in social data. Several techniques from simple visualization (e.g., using visual elements such as charts, graphs, and maps) to advanced approaches (e.g., storytelling with data [67] and interactive visualization [590]) could be leveraged to facilitate understanding social data and analytics results. These techniques can help us make sense of trillions of records and information items in social data, generated every second.
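The example query from the Processing step above ("count of the number of tweets per day for a list of countries") can be sketched in a map/reduce style, which is the general idea platforms such as Apache Hadoop apply across many machines. This is only a minimal single-process illustration, not the book's implementation: the record layout `(timestamp, country)`, the sample data, and the function names are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from functools import reduce

# Hypothetical records: (ISO timestamp, country code).
# Real tweets carry many more fields; this layout is an assumption.
TWEETS = [
    ("2021-03-01T09:15:00", "AU"),
    ("2021-03-01T17:40:00", "AU"),
    ("2021-03-01T23:59:00", "US"),
    ("2021-03-02T00:05:00", "AU"),
    ("2021-03-02T08:30:00", "NO"),
]


def map_partition(records, countries):
    """Map step: count (day, country) pairs within one partition."""
    counts = Counter()
    for timestamp, country in records:
        if country in countries:
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[(day, country)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def tweets_per_day(records, countries, n_partitions=2):
    """Split the records, count each partition, then merge the partial counts."""
    # On a cluster each partition would live on a different machine;
    # here the partitions are just slices of an in-memory list.
    partitions = [records[i::n_partitions] for i in range(n_partitions)]
    partials = [map_partition(part, countries) for part in partitions]
    return reduce(reduce_counts, partials, Counter())


result = tweets_per_day(TWEETS, {"AU", "US"})
for (day, country), count in sorted(result.items()):
    print(day, country, count)
```

Because each partition is counted independently and the partial counts are merged with a simple addition, the same logic can be spread over as many workers as the data volume requires.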
1.2 Organizing Social Data
1.2.1 Social Data Volume
1.2.2 Social Data Variety
1.2.3 Social Data Velocity
- Streaming: Social networks are quickly becoming the primary medium for sharing news and discussing what is happening in the world. For example, Twitter is now considered one of the fastest news sources in the world, as it produces rich data streams for immediate insights into ongoing matters and the conversations around them. Stream processing, i.e., a big data technology that focuses on the real-time processing of continuous streams of data in motion, is now supported by many big data platforms such as Apache Kafka, Amazon Kinesis, Microsoft Azure Stream Analytics, Apache Flink, and IBM Streaming Analytics.
- Feedback Loop: a feedback loop, i.e., a process in which the outputs of a system are circled back and used as inputs, is an important step in analyzing the data to produce actionable results. For example, browsers capture users' activities on the client side and send that information to recommendation engines, with the goal of personalizing the services for each user. For instance, if you visit a website to book a flight to Australia, then later, when you log in to your social media account (e.g., Instagram or Facebook), you may see advertisements for cheap flights to Australia. In particular, this process may use customer activity and feedback to create better recommendations.
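The stream processing idea in the Streaming item above can be illustrated with a tumbling-window count: events are consumed one at a time, and a result is emitted as soon as each fixed-length time window closes, rather than after the whole dataset has been collected. The sketch below is plain Python and does not use the API of any of the platforms named above. The event layout `(timestamp_in_seconds, hashtag)`, the sample events, and the assumption that events arrive in timestamp order are all choices made for the example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter of hashtags) each time a window closes.

    Assumes events arrive ordered by timestamp, as (timestamp, hashtag) pairs.
    Windows that contain no events are skipped.
    """
    current_start = None
    counts = Counter()
    for timestamp, hashtag in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The event belongs to a new window: emit the finished one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[hashtag] += 1
    if current_start is not None:
        # Emit the last, still-open window when the stream ends.
        yield current_start, counts


# Hypothetical events: (seconds since start, hashtag).
EVENTS = [(0, "#news"), (12, "#news"), (45, "#sport"), (61, "#news"), (130, "#sport")]

windows = list(tumbling_window_counts(iter(EVENTS), window_seconds=60))
for start, counts in windows:
    print(start, dict(counts))
```

Because the function is a generator, it never holds more than one window in memory, which is the property that lets the same pattern run over an unbounded stream.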