Computer Science

Big Data Variety

Big Data Variety refers to the diverse types of data that can be collected and analyzed, including structured, unstructured, and semi-structured data. This encompasses a wide range of data sources such as text, images, videos, sensor data, and more. Managing and analyzing this variety of data is a key challenge in the field of big data analytics.
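The three categories named above can be made concrete with a short sketch. This is an illustrative example (all records are hypothetical): a structured record with a fixed schema, a semi-structured JSON document that carries its own field names, and free unstructured text.

```python
import json

# Illustrative records, one per category of "variety" (hypothetical data):
structured = {"user_id": 42, "age": 29, "country": "DE"}          # fixed schema
semi_structured = '{"user": "ada", "tags": ["sensor", "iot"]}'    # JSON: self-describing
unstructured = "Loved the firmware update, battery lasts twice as long!"

# Semi-structured data can be parsed and its fields discovered at read time:
record = json.loads(semi_structured)
print(record["tags"])                 # fields come from the record itself
print(len(unstructured.split()))      # unstructured text needs further processing
```

The point of the sketch is that only the structured record has columns known in advance; the other two must be interpreted at analysis time, which is what makes variety a management challenge.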

Written by Perlego with AI-assistance

11 Key excerpts on "Big Data Variety"

  • Book cover image for: Networking for Big Data
    • Shui Yu, Xiaodong Lin, Jelena Misic, Xuemin (Sherman) Shen(Authors)
    • 2015(Publication Date)
    Chapter 4, Big Data Distributed Systems Management, by Rashid A. Saeed and Elmustafa Sayed Ali. Big data deals with large scales of data characterized by three concepts, volume, variety, and velocity, known as the 3Vs of Big Data. Volume is a term related to Big Data: data is commonly organized in sizes of gigabytes or terabytes of storage, but Big Data means there is a lot of data, amounting to more than terabytes, such as petabytes or exabytes, and one of the challenges of Big Data is that it requires scalable storage. Data volume will continue to grow every day, regardless of the organized sizes, because of the natural tendency of companies to store all types of data, such as financial data, medical data, environmental data, and so on. Many of these companies’ datasets are within the terabytes range today, but soon they could reach petabytes, exabytes, and more. Variety of Big Data is an aggregation of many types of data and may be structured or unstructured, including social media, multimedia, web server logs, and many other types of information forms.
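The unit scaling the excerpt describes (gigabytes to terabytes to petabytes to exabytes) is easy to pin down numerically. A minimal sketch, assuming decimal (SI, base-1000) units, which is the common convention when sizing storage:

```python
# Decimal byte units commonly used when sizing big data stores (SI, base 1000):
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

def to_bytes(size: float, unit: str) -> int:
    """Convert a size expressed in one of the UNITS above into raw bytes."""
    return int(size * UNITS[unit])

# A dataset in the "terabytes range today" from the excerpt (hypothetical size):
today = to_bytes(5, "TB")
print(today)                                      # 5000000000000
print(to_bytes(1, "PB") // to_bytes(1, "TB"))     # 1000: each step up is a factor of 1000
```

Each named unit is a factor of 1000 larger than the previous one, which is why the jump from terabytes to petabytes or exabytes forces the scalable storage the excerpt mentions.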
  • Book cover image for: Big Data and Social Science
    • Sudha Menon, University of Kerala, India(Authors)
    • 2019(Publication Date)
    Generally, as the requirements of the work exceed the capabilities of a single computer, pooling, assigning, and coordinating resources from different groups of computers becomes a great challenge. Algorithms and cluster management that can break the task into smaller pieces become increasingly important. 1.3.2. Velocity Another key way in which big data differs considerably from other data systems is the speed at which information flows through the system. Data flows into the system from different sources and is generally expected to be processed in real time, so as to acquire insights and then update or modify the current understanding of the system. This focus on near-instant feedback has driven several practitioners of big data away from the batch-oriented approach and much closer to real-time streaming systems. Data is constantly being added, manipulated, processed, and analyzed to keep up with the inflow of new information and to surface the most valuable information while it is still relevant and accurate. Such goals require robust systems with highly available components to protect against failures along the data pipeline. 1.3.3. Variety Big data problems are often unique because of the broad variety of sources being processed, along with their relative quality. Data can be easily ingested from internal systems such as server and application logs, from social media feeds, from physical device sensors, and from various other providers. Big data aims to handle data regardless of its source by consolidating all kinds of information into a single system. The types and formats can differ significantly: rich media such as video files, audio recordings, and images are ingested together with structured logs, text files, etc. Though more conventional systems
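The streaming-versus-batch contrast in the velocity passage can be sketched with a generator: each record is handled as it arrives, rather than waiting for the full batch. This is an illustrative sketch with a hypothetical sensor feed and threshold, not any particular streaming framework:

```python
from typing import Iterable, Iterator

def stream_alerts(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Streaming style: inspect each reading as it arrives and emit
    values that cross the threshold, without buffering the whole batch."""
    for value in readings:
        if value > threshold:
            yield value

# Simulated sensor feed (hypothetical values):
feed = iter([12.0, 48.5, 7.3, 99.9])
print(list(stream_alerts(feed, threshold=40.0)))  # [48.5, 99.9]
```

A batch system would collect all readings first and scan them once at the end; the generator form gives the near-instant feedback the excerpt describes, at the cost of needing the always-available pipeline it also mentions.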
  • Book cover image for: Social Big Data Mining
    • Hiroshi Ishikawa(Author)
    • 2015(Publication Date)
    • CRC Press
      (Publisher)
    • The kinds (Variety) of data have expanded into unstructured texts, semi-structured data such as XML, and graphs (i.e., networks). • As is often the case with Twitter and sensor data streams, the speed (Velocity) at which data are generated is very high. Figure 2.1 Data deluge. Therefore, big data are often characterized as V³ by taking the initial letters of these three terms: Volume, Variety, and Velocity. Big data are expected to create not only knowledge in science but also value in various businesses. By variety, the author of this book means that big data appear in a wide variety of applications. Big data inherently contain “vagueness” such as inconsistency and deficiency. Such vagueness must be resolved in order to obtain quality analysis results. Moreover, a recent survey done in Japan has made it clear that many users have “vague” concerns about the security and mechanisms of big data applications. The resolution of such concerns is one of the keys to the successful diffusion of big data applications. In this sense, V⁴ should be used for the characteristics of big data, instead of V³. Social media data are a kind of big data that satisfy these V⁴ characteristics as follows: First, the sizes of social media are very large, as described in chapter one. Second, tweets consist mainly of texts, Wiki media consist of XML (semi-structured data), and Facebook articles contain photos and movies in addition to texts. Third, the relationships between users of social media, such as Twitter and Facebook, constitute large-scale graphs (networks). Furthermore, the speed of production of tweets is very fast. Social data can also be used in combination with various kinds of big data, though they inherently contain contradictions and deficits. As social data include information about individuals, sufficient privacy protection and security management are mandatory.
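Two of the data kinds named in this excerpt, semi-structured XML and graphs, can be shown side by side in a few lines. The XML document and user names below are invented for illustration; the parsing uses Python's standard-library `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Semi-structured data: an XML fragment (hypothetical social-media post):
doc = "<post><author>ada</author><mentions><user>bob</user><user>eve</user></mentions></post>"
root = ET.fromstring(doc)
author = root.findtext("author")
mentioned = [u.text for u in root.iter("user")]

# Graph data: the mention relationships form a network (adjacency-list sketch):
graph = {}
graph.setdefault(author, []).extend(mentioned)
print(graph)  # {'ada': ['bob', 'eve']}
```

The same post thus contributes to two of the excerpt's data kinds at once: its body is semi-structured, while the user relationships it records accumulate into the large-scale graphs the author describes.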
  • Book cover image for: Handbook of Big Data
    • Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan(Authors)
    • 2016(Publication Date)
    All of these factors suggest a kind of ubiquity of data, but also contain a functionally vague understanding, which is situationally determined, and because of that it can be deployed in many contexts, has many advocates, and can be claimed by many as well. Partly because of this context-sensitive definition of the concept of big data, it is by no means a phenomenon of our time or a novelty, but has a long genealogy that goes back to the earliest civilizations. Some aspects of this phenomenon will be discussed in the following sections. In addition, we will show in this chapter how big data embody a conception of data science at least at two levels. First of all, data science is the technical-scientific discipline specialized in managing the multitude of data: collect, store, access, analyze, visualize, interpret, and protect. It is rooted in computer science and statistics; computer science is traditionally oriented toward data structures, algorithms, and scalability, and statistics is focused on analyzing and interpreting the data. In particular, we may identify here the triptych of database technology/information retrieval, computational intelligence/machine learning, and finally inferential statistics. The first pillar concerns database/information retrieval technology. Both have been core disciplines of computer science for many decades. Emerging from this tradition in recent years, notably researchers at Google and Yahoo have been working on techniques to cluster many computers in a data center, making data accessible and allowing for data-intensive calculations: think, for example, of BigTable, the Google File System, a programming paradigm such as MapReduce, and the open source variant Hadoop. The paper of Halevy precedes this development as well. The second pillar relates to intelligent algorithms from the field of computational intelligence (machine learning and data mining).
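The MapReduce paradigm named in this excerpt can be illustrated with the classic word-count example. This is a single-process sketch of the programming model only, not of Hadoop's distributed runtime; the map phase emits key-value pairs and the reduce phase aggregates them per key:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc: str) -> list:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs) -> dict:
    """Reduce: sum the counts for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big value", "data science"]   # hypothetical corpus
result = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(result["big"], result["data"])  # 2 2
```

In a real cluster the map calls run on many machines in parallel and a shuffle step routes each key to one reducer, but the two-phase structure is exactly this.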
  • Book cover image for: Practical Data Science for Information Professionals
    Despite the rapid growth in the popularity of big data, pinning down what is meant by the term is more difficult. Although it was initially used to refer to situations where the volume of data had grown so large that it no longer fitted into a computer’s processing memory (Mayer-Schönberger and Cukier, 2013), as computers have become more powerful (Moore’s law) and for a long time memory got much cheaper (Kryder’s law) (Rosenthal, 2017), such usage is crude as these situations are moveable targets: what is considered big data one day may not be big data the next. Rather than considering the absolute size of big data, the challenge is to analyse it for meaningful insights. Laney (2001) identified three growing data management challenges, and these have since been widely seen as traits of big data: its volume, velocity and variety. There are now enormous volumes of data available for analysis about ever more specific areas of our lives. This data is increasingly up to date. Whereas a census may have taken place every ten years, and market research data have taken weeks to gather and analyse, now research can be carried out and data gathered in near real time. The internet brings together a wide range of data in the same place, from vast quantities of unstructured documents to highly structured data adhering to agreed international standards. The three Vs of big data have since been extended to include various other traits, including exhaustivity, resolution and indexicality, relationality, and extensionality and scalability (Kitchin and McArdle, 2016), although in practice big data sets rarely have all, or even most, of the traits.
  • Book cover image for: Big Data Architect's Handbook

    Big Data Architect's Handbook

    A guide to building proficiency in tools and systems used by leading big data experts

    • Syed Muhammad Fahad Akhtar(Author)
    • 2018(Publication Date)
    • Packt Publishing
      (Publisher)
    The preceding chart shows the amount of time users are spending on the popular social networking websites. Imagine the frequency of data being generated based on these user activities. This is just a glimpse of what's happening out there.
    Another dimension of velocity is the period of time during which data will make sense and be valuable. Will it age and lose value over time, or will it be permanently valuable? This analysis is also important, because if the data ages and loses value, then over time it may even mislead you.
    So far, we have discussed two characteristics of big data. The third is variety. Let's explore it now.

    Variety

    In this section, we study the classification of data. It can be structured or unstructured. Structured data refers to information that has a predefined schema, or a data model with predefined columns, data types, and so on, whereas unstructured data has none of these characteristics. Unstructured data includes a long list of types, such as documents, emails, social media text messages, videos, still images, audio, graphs, the output from all types of machine-generated data from sensors, devices, RFID tags, machine logs, and cell phone GPS signals, and more. We will learn more details about structured and unstructured data in separate chapters in this book:
    Variety of data
    Let's take an example: 30 billion pieces of content are shared on Facebook each month, 400 million tweets are sent per day, and 4 billion hours of video are watched on YouTube every month. These are all examples of unstructured data being generated that needs to be processed, either for a better user experience or to generate revenue for the companies themselves.
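The structured/unstructured distinction in this section can be sketched as a crude classifier. This is an illustrative heuristic only (the thresholds and rules are invented, not an established method): JSON that parses is treated as semi-structured, a comma-delimited line as a structured row, and everything else as unstructured.

```python
import json

def classify(record: str) -> str:
    """Crude, illustrative heuristic for the categories discussed above."""
    try:
        json.loads(record)
        return "semi-structured"       # self-describing, schema in the record
    except ValueError:
        pass
    if record.count(",") >= 2 and "\n" not in record:
        return "structured"            # e.g. a CSV row with a known column order
    return "unstructured"              # free text, no predefined model

print(classify('{"id": 1, "tags": ["a"]}'))   # semi-structured
print(classify("42,Alice,2018-05-01"))        # structured
print(classify("Watched 4 hours of videos"))  # unstructured
```

A production system would consult actual schemas and content types rather than guess from syntax, but the sketch shows why the distinction matters: only the first two forms can be loaded into predefined columns without extra extraction work.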
    The fourth characteristic of big data is veracity. It's time to find out all about it.

    Veracity

    This vector deals with the uncertainty of data, which may be due to poor data quality or to noise in the data. It is human behavior not to trust the information we are provided with. This is one of the reasons why one in three business leaders don't trust the information they use for making decisions.
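A minimal veracity check along the lines described above might count missing entries and flag implausible readings as noise. The value range and sample data below are hypothetical, chosen only to illustrate the idea:

```python
def quality_report(values: list) -> dict:
    """Illustrative veracity sketch: count missing entries and flag
    readings outside an assumed plausible range (0..100) as noise."""
    missing = sum(1 for v in values if v is None)
    present = [v for v in values if v is not None]
    noisy = [v for v in present if not (0 <= v <= 100)]
    return {"missing": missing, "noisy": noisy, "usable": len(present) - len(noisy)}

readings = [21.5, None, 22.0, 999.0, 20.8]   # hypothetical sensor data
print(quality_report(readings))
# {'missing': 1, 'noisy': [999.0], 'usable': 3}
```

Quantifying how much of a dataset is actually usable, as the report above does, is one practical way to turn the vague notion of "trust in the data" into something a decision-maker can act on.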
  • Book cover image for: Big Data Analytics - Methods and Applications
    • Jovan Pehcevski(Author)
    • 2019(Publication Date)
    • Arcler Press
      (Publisher)
    Business is generating enormous quantities of data that are too big to be processed and analyzed by traditional RDBMS and DW technologies, which are struggling to meet the performance and scalability requirements. Therefore, in recent years, a new approach that aims to mitigate these limitations has emerged. Companies like Facebook, Google, Yahoo and Amazon are the pioneers in creating solutions to deal with these “Big Data” scenarios, namely by recurring to technologies like Hadoop [3] [4] and MapReduce [5]. Big Data is a generic term used to refer to massive and complex datasets, which are made up of a variety of data structures (structured, semi-structured and unstructured data) from a multitude of sources [6]. Big Data can be characterized by three Vs: volume (amount of data), velocity (speed of data in and out) and variety (kinds of data types and sources) [7]. Still, other Vs have been added for variability, veracity and value [8]. Adopting Big Data-based technologies not only mitigates the problems presented above, but also opens new perspectives that allow extracting value from Big Data. Big Data-based technologies are being applied with success in multiple scenarios [1] [9] [10], such as: (1) e-commerce and marketing, where counting the clicks that crowds make on the web allows identifying trends that improve campaigns, and evaluating the personal profile of a user so that the content shown is the one he will most likely enjoy; (2) government and public health, allowing the detection and tracking of disease outbreaks via social media, or detecting fraud; (3) transportation, industry and surveillance, with real-time improved estimated times of arrival and smart use of resources. This paper provides a broad view of the current state of this area based on two dimensions or perspectives: Data Modeling and Data Analytics.
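Consolidating "a variety of data structures from a multitude of sources", as this excerpt puts it, usually means normalizing each source's format into one common record shape. A sketch under invented assumptions (the source names, formats, and payloads are hypothetical):

```python
import json

def normalize(source: str, payload: str) -> dict:
    """Illustrative consolidation: map records from different
    hypothetical source formats into one common shape."""
    if source == "json":                       # e.g. a social-media API record
        data = json.loads(payload)
        return {"user": data["user"], "text": data["text"]}
    if source == "log":                        # e.g. a "user action..." log line
        user, _, text = payload.partition(" ")
        return {"user": user, "text": text}
    raise ValueError(f"unknown source: {source}")

rows = [
    normalize("json", '{"user": "ada", "text": "hello"}'),
    normalize("log", "bob clicked checkout"),
]
print(rows[1])  # {'user': 'bob', 'text': 'clicked checkout'}
```

Once every source lands in the same shape, a single downstream pipeline can analyze all of them, which is the consolidation benefit the excerpt attributes to Big Data systems.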
  • Book cover image for: Big Data For Dummies
    • Judith S. Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman(Authors)
    • 2013(Publication Date)
    • For Dummies
      (Publisher)
    Companies have always had to deal with lots of data in lots of forms. The change that big data brings is what you can do with that information. If you have the right technology in place, you can use big data to anticipate and solve business problems and react to opportunities. With big data, you can analyze data patterns to change everything, from the way you manage cities, prevent failures, conduct experiments, manage traffic, improve customer satisfaction, or enhance product quality, just to name a few examples. The emerging technologies and tools that are the heart of this book can help you understand and unleash the tremendous power of big data, changing the world as we know it.
    Chapter 2 Examining Big Data Types In This Chapter
    Identifying structured and unstructured data
    Recognizing real-time and non-real-time requirements for data types
    Integrating data types into a big data environment
    Variety is the spice of life, and variety is one of the principles of big data. In Chapter 1, we discuss the importance of being able to manage the variety of data types. Clearly, big data encompasses everything from dollar transactions to tweets to images to audio. Therefore, taking advantage of big data requires that all this information be integrated for analysis and data management. Doing this type of activity is harder than it sounds. In this chapter, we examine the two main types of data that make up big data — structured and unstructured — and provide you with definitions and examples of each.
    Although data management has been around for a long time, two factors are new in the big data world:
    Some sources of big data are actually new, such as the data generated from sensors, smartphones, and tablets.
    Previously produced data hadn’t been captured or stored and analyzed in a usable way. The main reason for this is that the technology wasn’t there to do so. In other words, we didn’t have a cost-effective way to deal with all that data.
    You have many different ways to put big data to use to solve problems. For example, in some situations, you want to deal with data in real time, such as when you’re monitoring traffic data. In other situations, real-time data management won’t be necessary, such as when you’re collecting massive amounts of data that you want to analyze in batch mode to determine an unsuspected pattern. Likewise, you sometimes need to integrate multiple sources of data as part of a big data solution, so we look at why you might want to integrate data sources. The bottom line is that what you want to do with your structured and unstructured data informs the technology purchases that you make.
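The real-time-versus-batch choice described above can be made concrete with a running mean: batch mode aggregates once after all the data arrives, while a streaming update revises the result with every new value. An illustrative sketch with hypothetical numbers:

```python
def batch_mean(values: list) -> float:
    """Batch mode: wait for the whole dataset, then aggregate once."""
    return sum(values) / len(values)

class StreamingMean:
    """Real-time mode: update the running mean as each value arrives."""
    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.n += 1
        self.mean += (x - self.mean) / self.n   # incremental mean update
        return self.mean

data = [10.0, 20.0, 30.0]   # hypothetical measurements
s = StreamingMean()
for x in data:
    s.update(x)
print(batch_mean(data), s.mean)  # 20.0 20.0
```

Both paths reach the same answer; the difference is when the answer is available, which is exactly the traffic-monitoring versus batch-pattern-mining trade-off the paragraph describes.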
  • Book cover image for: Data Science for Business and Decision Making: An Introductory Text for Students and Practitioners
    Figure 1.4. Data mining and big data. Source: MDPI. There are many sources of data and their output is rarely identical (Kirchweger et al., 2015). For example, a web server log is going to give you a different set of information categories than a social media post; yet both of them may be important for decision-making (Gilks, 2016). Big data involves an element of velocity, where the speed of processing and disseminating data is increasing exponentially (Lewis, 1996). Veracity is a more recent concern about big data because of the democratization of the internet (Mieczakowski et al., 2011). Anyone can start disseminating data, but it is another matter to consider whether that data is actually true (Miller, 2014). In the age of “fake news,” the consequences of not checking veracity can be catastrophic (Mieczakowski et al., 2011). Finally, it is anticipated that big data must add value to the decision-making process (Abu-Saifan, 2012). It is important to remember that big data is not always a burden to business (Ellison, 2004). Indeed, with the right analysis, this data can be used to improve decision-making and the quality of the implementation process (Gilks, 2016). The most effective businesses will use big data in their strategic moves in order to maximize their competitive advantages and minimize their vulnerabilities (Lyytinen et al., 2016). Because of the popularity of the term, there has been some confusion about the level at which data becomes big data (Berker et al., 2006). Existing literature often emphasizes the sheer volume of the data, more so than its complexity or diversity (Helmreich, 2000). For example, anything that is larger than 1 TB is known as big data (Little, 2002). Moreover, these calculations are sometimes based on informed estimates of the predicted per capita data (McFarlane, 2010). The latest estimates are that by 2020, per capita data will be 5,200 GB (Bansal, 2013).
  • Book cover image for: Engaging Customers Using Big Data
    eBook - PDF

    Engaging Customers Using Big Data

    How Marketing Analytics Are Transforming Business

    Each component leverages Hadoop’s MapReduce for parallelism; however, this elevates the skill level required for building applications. To make the environment more user-friendly, big data vendors are introducing a series of tools, such as BigSheets from IBM, that help visualize the unstructured data. HIGH-VARIETY DATA ANALYSIS Blackberry faced a serious outage when its email servers were down for more than a day. I tried powering my Blackberry off and on because I was not sure whether it was my device or the CSP. It never occurred to me that the outage could be at the Blackberry server itself. When I called the CSP, they were not aware of the problem. So I turned to one obvious source: Twitter. Sure enough, I found information about the Blackberry outage on Twitter. One of my clients told me that his vice president of customer service is always glued to Twitter looking for customer service problems. Often, someone discovers the problem on Twitter before the internal monitoring organization does. We found that a large number of junior staffers employed by marketing, customer service, and public relations search through social media for relevant information. Traditional analytics has been focused primarily on structured data. Big data, however, is primarily unstructured, so we now have two combinations available. We can perform quantitative analysis on structured data as before. We can extract structure out of unstructured data and perform quantitative analysis on the extracted quantifications. Last, but not least, there is a fair amount of nonquantitative analysis now available for unstructured data. I would like to explore a couple of techniques rapidly becoming popular with the vast amount of unstructured data and look at how these techniques are becoming mainstream with their powerful capabilities for organizing, categorizing, and analyzing big data.
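"Extracting structure out of unstructured data", as this excerpt puts it, can be as simple as pulling hashtag frequencies out of raw posts so they can be counted like any structured column. The posts below are invented to echo the outage anecdote; the extraction uses a standard-library regular expression:

```python
import re
from collections import Counter

def hashtag_counts(posts: list) -> Counter:
    """Extract structure (hashtag frequencies) from unstructured text,
    so ordinary quantitative analysis can be applied to it."""
    tags = re.findall(r"#(\w+)", " ".join(posts))
    return Counter(tag.lower() for tag in tags)

posts = [                                   # hypothetical tweets
    "No email all morning #outage",
    "Is anyone else down? #Outage #help",
]
print(hashtag_counts(posts).most_common(1))  # [('outage', 2)]
```

A spike in a tag like `#outage` is precisely the kind of signal the excerpt's customer-service staff were finding on Twitter before internal monitoring did.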
  • Book cover image for: Big Data Computing
    Every organization begins this voyage by examining the data generated from its automation systems: enterprise resource planning, customer relationship management, time and attendance, e-commerce, warranty management, and the like. Data warehouses, data mining, and database technologies have existed in various forms for years. Big Data as a term might be new, but many IT professionals have worked with large amounts of data in various industries for years. However, now Big Data is not just about large amounts of data. Digging into and analyzing semistructured and unstructured data is new. A decade ago, we did not analyze email messages, PDF files, or videos. The Internet was just a new trend; distributed computing was not created yesterday, but being able to distribute and scale out a system in a flash is new. Similarly, wanting to predict the future is not a new concept, but being able to access and store all the data that are created is new. Many enterprises have multiple databases and multiple database vendors, with terabytes or even petabytes of data. Some of these systems accumulated data over several years. Many enterprises build entire data warehouse and analytic platforms off this old data. For data to be useful to users, they must integrate customer data with finance and sales data, with product data, with marketing data, with social media, with demographic data, with competitors’ data, and more. After decades of channelizing data collection, structures, storage, access, and retrieval, a value chain has emerged. The value chain connects Big Data and Analytics through the convergence of complexity and diversity shown in Figure 12.1. Descriptive Analytics Several businesses start with descriptive analytics to analyze business performance.
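The integration step this excerpt describes (joining customer data with sales, finance, and other sources) reduces, at its core, to joining records on a shared key. A minimal in-memory sketch with hypothetical records and field names:

```python
# Two hypothetical sources sharing a customer key:
customers = {101: {"name": "Ada", "segment": "retail"}}
sales = [
    {"customer_id": 101, "amount": 250.0},
    {"customer_id": 101, "amount": 99.5},
]

# A simple hash join: enrich each sale with the customer's attributes,
# then aggregate — a tiny descriptive-analytics query over the joined data.
joined = [dict(s, **customers[s["customer_id"]]) for s in sales]
total_by_segment = {}
for row in joined:
    total_by_segment[row["segment"]] = total_by_segment.get(row["segment"], 0.0) + row["amount"]
print(total_by_segment)  # {'retail': 349.5}
```

Real integrations add schema mapping, deduplication, and missing-key handling, but the join-then-aggregate pattern is the backbone of the descriptive analytics the excerpt says most businesses start with.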