Computer Science

Big Data Volume

Big Data Volume refers to the vast amount of data generated and collected by organizations, often exceeding the capacity of traditional data processing systems. This includes structured and unstructured data from various sources such as social media, sensors, and business transactions. Managing and analyzing big data volumes requires specialized tools and technologies to derive valuable insights and make informed decisions.

Written by Perlego with AI-assistance

11 Key excerpts on "Big Data Volume"

  • Book cover image for: Guide to Cloud Computing for Business and Technology Managers
    eBook - PDF

    Guide to Cloud Computing for Business and Technology Managers

    From Distributed Computing to Cloudware Applications

    The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of big data computing applications. This chapter explores the challenges of big data computing. 21.1.1 What Is Big Data? Big data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity, which cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as big data include weather, geospatial, and GIS data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance, and human-resources departments; and device-generated data from sensor networks, nuclear plants, x-ray and scanning devices, and airplane engines. 21.1.1.1 Data Volume The most interesting data for any organization to tap into today are social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites provide not only consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage social media pages to personalize marketing of products and services to each customer. Every enterprise has massive amounts of e-mails that are generated by its employees, customers, and executives on a daily basis. These e-mails are all considered an asset of the corporation and need to be managed as such. After Enron and the collapse of many audits in enterprises, the US government mandated that all enterprises should have a clear life-cycle management of e-mails and that e-mails should be available and auditable on a case-by-case basis.
  • Book cover image for: Big Data Computing
    eBook - PDF

    Big Data Computing

    A Guide for Business and Technology Managers

    The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of big data computing applications. This chapter explores the challenges of big data computing. 9.1.1 What Is Big Data? Big data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity, that cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as big data includes weather, geospatial, and geographic information system (GIS) data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance, and human-resources departments; and device-generated data from sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines (Figures 9.1 and 9.2). 9.1.1.1 Data Volume The most interesting data for any organization to tap into today is social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites not only provide consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage the social media pages to personalize marketing of products and services to each customer. [Figure 9.1: 4V characteristics of big data — data volume (MB to PB), data velocity (batch, trickle, stream, torrent), data variety (structured data, HTML, free text, photo, audio, video), and data veracity (certain, confirmed, complete, consistent, clear, correct).] Many additional applications are being developed and are slowly becoming a reality.
  • Book cover image for: Networking for Big Data
    • Shui Yu, Xiaodong Lin, Jelena Misic, Xuemin (Sherman) Shen, Shui Yu, Xiaodong Lin, Jelena Misic, Xuemin (Sherman) Shen(Authors)
    • 2015(Publication Date)
    Chapter 4, Big Data Distributed Systems Management (Rashid A. Saeed and Elmustafa Sayed Ali). Contents: Big Data Challenges; Big Data Management Systems; Distributed File System; Nonstructural and Semistructured Data Storage; Big Data Analytics; Data Mining; Image and Speech Data Recognition; Social Network Data Analysis; Data Fusion and Integration; Management of Big Data Distributed Systems; Hadoop Technologies; Hadoop Distributed File System (HDFS); Hadoop MapReduce; NoSQL Database Management System (NoSQL DBMS); Software as a Service (SaaS)–Based Business Analytics; Master Data Management; Conclusion; References. Big data deals with large scales of data characterized by three concepts: volume, variety, and velocity, known as the 3Vs of Big Data. Volume is a term related to Big Data: data is commonly organized in sizes of gigabytes or terabytes of storage, but Big Data means there are data amounting to more than terabytes, such as petabytes or exabytes, and one of the challenges of Big Data is that it requires scalable storage. Indeed, data volume will continue to grow every day, regardless of the organized sizes, because of the natural tendency of companies to store all types of data such as financial data, medical data, environmental data, and so on. Many of these companies' datasets are within the terabytes range today, but soon they could reach petabytes or even exabytes and more. Variety of Big Data is an aggregation of many types of data, which may be structured or unstructured, including social media, multimedia, web server logs, and many other types of information forms.
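    The gigabyte-to-exabyte progression described in the excerpt can be made concrete with a short sketch. This is an illustration only; the helper name and unit ladder are my own, using the conventional power-of-1024 steps:

```python
# Illustrative sketch: expressing raw byte counts at big-data scales.
# The unit ladder follows the GB -> TB -> PB -> EB progression from the
# excerpt; the function name is hypothetical, not from the book.

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes: float) -> str:
    """Convert a byte count to the largest convenient unit (powers of 1024)."""
    for unit in UNITS:
        if num_bytes < 1024 or unit == UNITS[-1]:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024

print(human_readable(3 * 1024**4))   # a multi-terabyte dataset -> 3.0 TB
print(human_readable(2 * 1024**5))   # petabyte scale -> 2.0 PB
```

    At exabyte scale, no single machine holds the data, which is why the chapter turns to distributed file systems such as HDFS.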
  • Book cover image for: Practical Data Science for Information Professionals
    Despite the rapid growth in the popularity of big data, pinning down what is meant by the term is more difficult. Although it was initially used to refer to situations where the volume of data had grown so large that it no longer fitted into a computer’s processing memory (Mayer-Schönberger and Cukier, 2013), as computers have become more powerful (Moore’s law) and for a long time memory got much cheaper (Kryder’s law) (Rosenthal, 2017), such usage is crude as these situations are moveable targets: what is considered big data one day may not be big data the next. Rather than considering the absolute size of big data, the challenge is to analyse it for meaningful insights. Laney (2001) identified three growing data management challenges, and these have since been widely seen as traits of big data: its volume, velocity and variety. There are now enormous volumes of data available for analysis about ever more specific areas of our lives. This data is increasingly up to date. Whereas a census may have taken place every ten years, and market research data have taken weeks to gather and analyse, now research can be carried out and data gathered in near real time. The internet brings together a wide range of data in the same place, from vast quantities of unstructured documents to highly structured data adhering to agreed international standards. The three Vs of big data have since been extended to include various other traits, including exhaustivity, resolution and indexicality, relationality, and extensionality and scalability (Kitchin and McArdle, 2016), although in practice big data sets rarely have all, or even most, of the traits.
  • Book cover image for: Big Data, Big Analytics
    eBook - PDF

    Big Data, Big Analytics

    Emerging Business Intelligence and Analytic Trends for Today's Businesses

    • Michael Minelli, Michele Chambers, Ambiga Dhiraj(Authors)
    • 2012(Publication Date)
    • Wiley
      (Publisher)
    If you’re a midmarket consumer packaged goods (CPG) company, you might consider 10 terabytes as Big Data. But if you’re a multinational pharmaceutical corporation, then you would probably consider 500 terabytes as Big Data. If you’re a three-letter government agency, anything less than a petabyte is considered small. The industry has an evolving definition around Big Data that is currently defined by three dimensions: 1. Volume 2. Variety 3. Velocity These are reasonable dimensions to quantify Big Data and take into account the typical measures around volume and variety plus introduce the velocity dimension, which is a key compounding factor. Let’s explore each of these dimensions further. Data volume can be measured by the sheer quantity of transactions, events, or amount of history that creates the data volume, but the volume is often further exacerbated by the attributes, dimensions, or predictive variables. Typically, analytics have used smaller data sets called samples to create predictive models. Oftentimes, the business use case or predictive insight has been severely blunted since the data volume has purposely been limited due to storage or computational processing constraints. It’s similar to seeing the iceberg that sits above the waterline but not seeing the huge iceberg that lies beneath the surface. By removing the data volume constraint and using larger data sets, enterprises can discover subtle patterns that can lead to targeted actionable micro-decisions, or they can factor in more observations or variables into predictions that increase the accuracy of the predictive models. Additionally, by releasing the bonds on data, enterprises can look at data over a longer period of time to create more accurate forecasts that mirror real-world complexities of interrelated bits of information. Data variety is the assortment of data.
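    The excerpt's point about samples blunting predictive insight can be illustrated with a minimal sketch (my own, not from the book): a rare segment that is clearly visible in the full dataset can become invisible, or wildly mis-estimated, in a traditionally sized sample.

```python
# Minimal sketch (illustrative, not the book's method): a rare pattern
# visible in full data can vanish from a small sample.
import random

random.seed(0)
# 1,000,000 "customers"; roughly 0.1% belong to a rare high-value segment.
population = [1 if random.random() < 0.001 else 0 for _ in range(1_000_000)]

full_rate = sum(population) / len(population)
sample = random.sample(population, 500)          # a traditional small sample
sample_rate = sum(sample) / len(sample)

print(f"full-data segment rate:  {full_rate:.4%}")
print(f"500-row sample estimate: {sample_rate:.4%}")
# The 500-row estimate is noisy and may even be exactly 0%, which is why
# removing the volume constraint lets enterprises see subtle patterns.
```
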
  • Book cover image for: Handbook of Big Data
    • Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan, Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan(Authors)
    • 2016(Publication Date)
    All of these factors suggest a kind of ubiquity of data, but the concept also carries a functionally vague, situationally determined meaning; because of that, it can be deployed in many contexts, has many advocates, and can be claimed by many as well. Partly because of this context-sensitive definition of the concept of big data, it is by no means a recent phenomenon or novelty, but has a long genealogy that goes back to the earliest civilizations. Some aspects of this phenomenon will be discussed in the following sections. In addition, we will show in this chapter how big data embody a conception of data science at least at two levels. First of all, data science is the technical-scientific discipline specialized in managing the multitude of data: collect, store, access, analyze, visualize, interpret, and protect. It is rooted in computer science and statistics; computer science is traditionally oriented toward data structures, algorithms, and scalability, and statistics is focused on analyzing and interpreting the data. In particular, we may identify here the triptych of database technology/information retrieval, computational intelligence/machine learning, and finally inferential statistics. The first pillar concerns database/information retrieval technology. Both have been core disciplines of computer science for many decades. Emerging from this tradition in recent years, notably researchers at Google and Yahoo have been working on techniques to cluster many computers in a data center, making data accessible and allowing for data-intensive calculations: think, for example, of BigTable, the Google File System, a programming paradigm such as MapReduce, and the open source variant Hadoop. The paper of Halevy precedes this development as well. The second pillar relates to intelligent algorithms from the field of computational intelligence (machine learning and data mining).
  • Book cover image for: Big Data
    eBook - PDF

    Big Data

    Storage, Sharing, and Security

    Storage and database management is a vast field with many decades of results from very talented scientists and researchers. There are numerous books, courses, and articles dedicated to the study. This chapter attempts to highlight some of these developments as they relate to the equally vast field of big data. However, it would be unfair to say that this chapter provides a comprehensive analysis of the field—such a study would require many volumes. It is our hope that this chapter can be used as a launching pad for researchers interested in the study. Where possible, we highlight important studies that can be pursued for further reading. In Section 2.2, we discuss the big data challenge as it relates to storage and database engines. The chapter goes on to discuss database utility compared to large parallel storage arrays. Then, the chapter discusses the history of database management systems with special emphasis on current and upcoming database technology trends. In order to provide readers with a deeper understanding of these technologies, the chapter provides a deep dive into two canonical open source database technologies: Apache Accumulo [1], which is based on the popular Google BigTable design, and a NewSQL array database called SciDB [59]. Finally, we will provide insight into technology selection and walk readers through a case study which highlights the use of various database technologies to solve a medical big data problem. 2.2 Big Data Challenge Working with big data is prone to a variety of challenges. Very often, these challenges are referred to as the three Vs of big data: Volume, Velocity and Variety [45]. Most recently, there has been a new emergent challenge (perhaps a fourth V): Veracity. These combined challenges constitute a large reason why big data is so difficult to work with. Big Data Volume stresses the storage, memory, and computational capacity of a computing system and often requires access to a computing cloud.
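    The BigTable design that Accumulo follows can be sketched in miniature. This is my own drastic simplification of the data model, not Accumulo's actual API: a sorted map from (row, column) keys to values, which supports the row scans such stores are built around.

```python
# Minimal sketch of the BigTable-style data model underlying Apache Accumulo
# (my own simplification; real Accumulo adds column families, timestamps,
# cell-level visibility labels, and distributed tablet servers).

class MiniTable:
    def __init__(self):
        self._cells = {}                       # (row, column) -> value

    def put(self, row: str, column: str, value: str) -> None:
        self._cells[(row, column)] = value

    def scan_row(self, row: str):
        """Yield (column, value) pairs for one row in sorted column order."""
        for (r, c) in sorted(self._cells):
            if r == row:
                yield c, self._cells[(r, c)]

t = MiniTable()
t.put("patient:001", "vitals:bp", "120/80")
t.put("patient:001", "vitals:hr", "72")
t.put("patient:002", "vitals:hr", "88")
print(list(t.scan_row("patient:001")))
# -> [('vitals:bp', '120/80'), ('vitals:hr', '72')]
```

    The hypothetical `patient:*` row keys echo the chapter's medical case study: in a real deployment the sorted key space would be split into tablets and distributed across servers.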
  • Book cover image for: Big Data and Social Science
    • Sudha Menon, University of Kerala, India(Authors)
    • 2019(Publication Date)
    Big data systems can potentially handle data irrespective of its source by consolidating all kinds of information into a single system. The types and formats of social media data can differ significantly: rich media such as video files, audio recordings, and images are ingested together with structured logs, text files, etc. Though more conventional data processing systems might expect data entering the pipeline to already be categorized, formatted, and organized, big data systems usually accept and store data closer to its raw state. If possible, any changes or transformations to the raw data will take place in memory at the time of processing (Figure 1.3). Figure 1.3: 3Vs model of big data. Source: https://www.theviable.co/how-big-data-impact-to-corporate/3v-model-of-big-data/. 1.4. WHAT KIND OF DATASETS ARE CONSIDERED BIG DATA? The applications of big data are very diverse, as are their sizes. Significant examples include social media networks analyzing the data of their members to learn more about them and connect them with advertising and content related to their interests, or search engines considering the relationships between queries and results to provide the best answers to users' questions. Two of the key sources of data in large quantities are transactional data and sensor data. Transactional data includes everything from stock prices to bank data to the purchase histories of individual merchants. Sensor data, on the other hand, comes from the Internet of Things (IoT) and might be anything from robots on an automaker's manufacturing line, to location data on a cellphone network, to real-time electrical usage data in businesses and homes, to passenger boarding information taken on a transit system.
  • Book cover image for: Signal Processing and Networking for Big Data Applications
    Part I Overview of Big Data Applications 1 Introduction 1.1 Background Today, scientists, engineers, educators, citizens, and decision-makers have unprecedented amounts and types of data available to them. Data come from many disparate sources, including scientific instruments, medical devices, telescopes, microscopes, satellites; digital media including text, video, audio, e-mail, weblogs, twitter feeds, image collections, click streams, and financial transactions; dynamic sensor, social, and other types of networks; scientific simulations, models, and surveys; or computational analysis of observational data. Data can be temporal, spatial, or dynamic; structured or unstructured. Information and knowledge derived from data can differ in representation, complexity, granularity, context, provenance, reliability, trustworthiness, and scope. Data can also differ in the rate at which they are generated and accessed. The phrase “big data” refers to the kinds of data that challenge existing analytical methods due to size, complexity, or rate of availability. The challenges in managing and analyzing “big data” can require fundamentally new techniques and technologies in order to handle the size, complexity, or rate of availability of these data. At the same time, the advent of big data offers unprecedented opportunities for data-driven discovery and decision-making in virtually every area of human endeavor. A key example of this is the scientific discovery process, which is a cycle involving data analysis, hypothesis generation, the design and execution of new experiments, hypothesis testing, and theory refinement. Realizing the transformative potential of big data requires addressing many challenges in the management of data and knowledge, computational methods for data analysis, and automating many aspects of data-enabled discovery processes.
  • Book cover image for: Big Data and Business Analytics
    All of this amounts to a lot of data: very big data. And Google™ is betting on its powerful cloud computing, the same infrastructure that powers its successful web search engine, to perform semantic and natural language data analytics to improve healthcare. A user-friendly dashboard will be ubiquitously accessed and displayed via Google Glass. So what are the features that would be common to language understanding? Our viewpoint is that they possess the following characteristics: 1. Seamless User Interfaces— The application of advanced speech recognition and natural language processing for converting unstructured human communications into machine-understandable information. 2. A Diversity of Technologies— The use of multiple forms of state-of-the-art information organization and indexing, computing languages and models for AI, as well as various kinds of retrieval and processing methods. 3. New Data Storage Technologies— Software such as Not Only SQL (NoSQL) enables efficient and also interoperable forms of knowledge representations to be stored so that they can be utilized with various kinds of reasoning methods. 4. Reasoning and Learning Artificial Intelligence— The integration of artificial intelligence techniques so that the machine can learn from its own mistakes and build that learned knowledge into its knowledge stores for future applications of its own reasoning processes. 5. Model Driven Architectures (MDA)— The use of advanced frameworks depends on a diversified and large base of models, which themselves depend on the production of interoperable ontologies. These make it possible to engineer a complex system of heterogeneous components for open-domain, real-time, real-world interaction with humans, in a way that is comfortable and fits within colloquial human language use. The common theme in all of this: the key to big data is small data.
  • Book cover image for: Social Big Data Mining
    • Hiroshi Ishikawa(Author)
    • 2015(Publication Date)
    • CRC Press
      (Publisher)
    • The kinds (Variety) of data have expanded into unstructured texts, semi-structured data such as XML, and graphs (i.e., networks). • As is often the case with Twitter and sensor data streams, the speed (Velocity) at which data are generated is very high. (Figure 2.1: Data deluge.) Therefore, big data are often characterized as V3 by taking the initial letters of these three terms: Volume, Variety, and Velocity. Big data are expected to create not only knowledge in science but also value in various businesses. By variety, the author of this book means that big data appear in a wide variety of applications. Big data inherently contain “vagueness” such as inconsistency and deficiency. Such vagueness must be resolved in order to obtain quality analysis results. Moreover, a recent survey done in Japan has made it clear that many users have “vague” concerns about the security and mechanisms of big data applications. The resolution of such concerns is one of the keys to successful diffusion of big data applications. In this sense, V4 should be used for the characteristics of big data, instead of V3. Social media data are a kind of big data that satisfy these V4 characteristics as follows: First, the sizes of social media are very large, as described in chapter one. Second, tweets consist mainly of texts, wiki media consist of XML (semi-structured data), and Facebook articles contain photos and movies in addition to texts. Third, the relationships between users of social media, such as Twitter and Facebook, constitute large-scale graphs (networks). Furthermore, the speed of production of tweets is very fast. Social data can also be used in combination with various kinds of big data, though they inherently contain contradictions and deficits. As social data include information about individuals, sufficient privacy protection and security management are mandatory.
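    The excerpt's third point, that relationships between social media users form large-scale graphs, can be sketched with a toy adjacency-list example. The users and edges below are invented for illustration; real social graphs have billions of edges and require distributed processing.

```python
# Illustrative sketch (my own, not from the book): user relationships on
# social media represented as an adjacency-list graph.
from collections import defaultdict

follows = defaultdict(set)                      # follower -> set of followees
edges = [("alice", "bob"), ("alice", "carol"),
         ("bob", "carol"), ("dave", "alice")]
for follower, followee in edges:
    follows[follower].add(followee)

# In-degree (number of followers) as a crude influence measure.
in_degree = defaultdict(int)
for follower, followees in follows.items():
    for followee in followees:
        in_degree[followee] += 1

print(max(in_degree, key=in_degree.get))        # most-followed user -> carol
```
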
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.