Computer Science

Big Data

Big Data refers to extremely large and complex datasets that traditional data processing applications are unable to handle. It encompasses the collection, storage, and analysis of vast amounts of information to extract valuable insights and make data-driven decisions. Big Data technologies and techniques are essential for managing the volume, velocity, and variety of data in today's digital world.

Written by Perlego with AI-assistance

7 Key excerpts on "Big Data"

Index pages curate the most relevant extracts from our library of academic textbooks. They have been created using an in-house natural language model (NLM), and each adds context and meaning to a key research topic.
  • Creating Smart Enterprises

    Leveraging Cloud, Big Data, Web, Social Media, Mobile and IoT Technologies

    ...The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of Big Data computing applications. This chapter explores the challenges of Big Data computing.

    7.1.1 What Is Big Data?

    Big Data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity, that cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as Big Data includes weather, geospatial, and geographic information system (GIS) data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance and human-resources departments; and device-generated data from sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines (Figures 7.1 and 7.2).

    [Figure 7.1: 4V characteristics of Big Data]
    [Figure 7.2: Use cases for Big Data computing]

    7.1.1.1 Data Volume

    The most interesting data for any organization to tap into today is social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites provide not only consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage social media pages to personalize the marketing of products and services to each customer. Many additional applications are being developed and are slowly becoming a reality...

  • Big Data Analytics

    Turning Big Data into Big Money

    • Frank J. Ohlhorst (Author)
    • 2012 (Publication Date)
    • Wiley (Publisher)

    ...Chapter 1: What Is Big Data?

    What exactly is Big Data? At first glance, the term seems rather vague, referring to something that is large and full of information. That description does indeed fit the bill, yet it provides no information on what Big Data really is. Big Data is often described as extremely large data sets that have grown beyond the ability to manage and analyze them with traditional data processing tools. Searching the Web for clues reveals an almost universal definition, shared by the majority of those promoting the ideology of Big Data, that can be condensed into something like this: Big Data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and growth of the data set. In other words, the data set has grown so large that it is difficult to manage and even harder to garner value out of it. The primary difficulties are the acquisition, storage, searching, sharing, analytics, and visualization of data. There is much more to be said about what Big Data actually is. The concept has evolved to include not only the size of the data set but also the processes involved in leveraging the data. Big Data has even become synonymous with other business concepts, such as business intelligence, analytics, and data mining. Paradoxically, Big Data is not that new. Although massive data sets have been created in just the last two years, Big Data has its roots in the scientific and medical communities, where the complex analysis of massive amounts of data has been done for drug development, physics modeling, and other forms of research, all of which involve large data sets...

  • Digital Transformation

    Survive and Thrive in an Era of Mass Extinction

    • Thomas M. Siebel (Author)
    • 2019 (Publication Date)
    • Rodin Books (Publisher)

    ...To expand capacity, you add more CPUs, memory, and connectivity, thereby ensuring performance does not dip as you scale. The result is a vastly more flexible and less costly approach than scale-up architectures, one ideally suited to handling Big Data. Software technologies designed to leverage scale-out architectures and process Big Data emerged and evolved, including MapReduce and Hadoop. Big Data as a term first appeared in an October 1997 paper by NASA researchers Michael Cox and David Ellsworth, published in the Proceedings of the IEEE 8th Conference on Visualization. The authors wrote: “Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of Big Data.”[13] By 2013, the term had achieved such widespread circulation that the Oxford English Dictionary confirmed its cultural adoption, including it in that year’s edition of the OED. In 2001, Doug Laney, then an analyst at META Group, described three main traits that characterize Big Data: volume (the size of the data set, as measured in bytes, gigabytes, exabytes, or more); velocity (the speed of data arrival or change, as measured in bytes per second, messages per second, or new data fields created per day); and variety (including its shape, form, storage means, and interpretation mechanisms).[14]

    Size, Speed, and Shape

    Big Data continues to evolve and grow along all three of these dimensions: size, speed, and shape. It’s important for senior executives, not just the technologists and data scientists in the organization, to understand how each of these dimensions adds value as a business asset. Size. The amount of data generated worldwide has increased exponentially over the last 25 years, from about 2.5 terabytes (2.5 × 10¹² bytes) a day in 1997 to 2.5 exabytes (2.5 × 10¹⁸ bytes) a day in 2018, and will continue to do so into the foreseeable future...
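
The two daily-volume figures quoted in the Size passage imply a growth rate that is easy to check. The sketch below is a minimal illustration, assuming decimal SI units (1 TB = 10¹² bytes, 1 EB = 10¹⁸ bytes) and a smooth constant growth rate between the two endpoints; the helper name `implied_annual_growth` is hypothetical.

```python
import math

TB = 10**12  # decimal terabyte, in bytes (assumed SI units)
EB = 10**18  # decimal exabyte, in bytes

def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years)

daily_1997 = 2.5 * TB
daily_2018 = 2.5 * EB
years = 2018 - 1997  # span between the two figures quoted in the excerpt

overall = daily_2018 / daily_1997  # total growth factor across the period
rate = implied_annual_growth(daily_1997, daily_2018, years)
doubling_time = math.log(2) / math.log(rate)  # years per doubling at that rate

print(f"overall factor: {overall:,.0f}")
print(f"annual multiplier: {rate:.2f}")
print(f"doubling time: {doubling_time:.2f} years")
```

The endpoints differ by a factor of one million, which at a constant rate works out to daily data volume roughly doubling every year. The real curve need not have been this smooth; the calculation only shows what the two quoted numbers imply.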

  • Innovating Analytics

    How the Next Generation of Net Promoter Can Increase Sales and Drive Business Results

    • Larry Freed (Author)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    ...Remember microfiche? Remember stacks of old, yellowing newspapers and magazines in libraries? No more. I bet my sons have never even used microfiche. With the amount of digital data doubling every three years, as of 2013 less than 2 percent of all stored information is nondigital. An extraordinary change. So what is a workable definition of Big Data? For me, it is the explosion of structured and unstructured data about people caused by the digitization and networking of everything: computers, smartphones, GPS devices, embedded microprocessors, and sensors, all connected by the mobile Internet that is generating data about people at an exponential rate. Big Data is driven by the three Vs: an increasing Volume of data, a wide range of Variety, and data gathered and processed at a higher Velocity.

    Big Data Volume

    The increase in volume provides us a bigger set of data to manipulate. This provides higher accuracy, a lower margin of error, and the ability to analyze the data in many more discrete segments. As entrepreneur and former director of the MIT Media Lab Frank Moss explains in an interview on MSN:[1] “Every time we perform a search, tweet, send an e-mail, post a blog, comment on one, use a cell phone, shop online, update our profile on a social networking site, use a credit card, or even go to the gym, we leave behind a mountain of data, a digital footprint, that provides a treasure trove of information about our lifestyles, financial activities, health habits, social interactions, and much more.” He adds that this trend has been “accelerated by the spectacular success of social networks like Facebook, Twitter, Foursquare, and LinkedIn and video- or picture-sharing services like YouTube and Flickr. When acting together, these services generate exponential rates of growth of data about people in astonishingly short periods of time.” More statistics show the scope of Big Data...
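
The claim that more volume yields "a lower margin of error" follows from basic sampling theory. For a proportion p estimated from n independent observations, the approximate 95% margin of error is 1.96·√(p(1−p)/n), so it shrinks in proportion to 1/√n. The sketch below illustrates this, assuming simple random sampling and the normal approximation; the helper `margin_of_error` is hypothetical. Large data sets are often not random samples, so this accounts for random error only, not bias.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n samples.

    Uses the normal approximation z * sqrt(p * (1 - p) / n) and assumes the
    n observations are an independent random sample.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 gives the widest (worst-case) margin for any given n.
for n in (1_000, 100_000, 10_000_000):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>10,}: ±{moe * 100:.3f} percentage points")
```

Each 100-fold increase in n cuts the margin by a factor of 10. This is also why larger volumes allow analysis of finer segments: each segment can still contain enough observations for a usable estimate.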

  • Big Data Mining and Complexity

    ...For example, in the Forbes article we mentioned earlier about how Big Data is changing the airline industry, its author explained that ‘today, through thousands of sensors and sophisticated digitised systems, the newest generation of jets collects exponentially more, with each flight generating more than 30 times the amount of data the previous generation of wide-bodied jets produced. . . . By 2026, annual data generation should reach 98 billion gigabytes, or 98 million terabytes, according to a 2016 estimate by Oliver Wyman.’[3]

    Variety: Big Data today is also generated through a wide array of types and formats: structured and unstructured, relational, transactional, longitudinal, discrete, dynamic, visual, textual, numeric, audio, geospatial, physical, ecological, biological, psychological, social, economic, cultural, political and so on and so forth.

    Velocity: In our Big Data world, the issue is not just the speed at which massive amounts of data are being generated but also the speed at which they often need to be acquired and processed. Also, a significant amount of Big Data remains important only for very short periods of time: for example, delayed flight schedules, ticket price fluctuations or sudden interruptions in travel that an airport has to respond to quickly. And then there are the complex ways this increased data velocity, in turn, speeds up the decision-making process, forcing decisions, oftentimes, into a matter of nanoseconds rather than days, weeks or months; all of these present major challenges to the hardware and software of companies and users, not to mention the ‘knockoff’ effects on social life that come from this increased speed in decision-making.

    Variability: While the velocity and volume of Big Data appear constant, in actuality they are rather variable, with inconsistencies in their flow, as in the case of a sudden Twitter trend or online searches in response to a disease outbreak...
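
The Velocity and Variability points describe data whose value decays quickly and whose arrival rate can spike without warning. One common way to handle both is a sliding time window: keep only recent events and measure their rate. The sketch below is minimal and assumes timestamps in seconds that never go backwards. The class `SlidingWindowRate` and its methods are hypothetical and do not come from any library.

```python
from collections import deque

class SlidingWindowRate:
    """Counts events seen in the last `window` seconds and reports their rate.

    Older events are evicted as time advances, mirroring data that stays
    useful only briefly. Assumes timestamps are non-decreasing.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events = deque()  # event timestamps, oldest first

    def add(self, timestamp: float) -> None:
        self.events.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Keep only events inside the interval (now - window, now].
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

    def rate(self, now: float) -> float:
        """Events per second over the current window."""
        self._evict(now)
        return len(self.events) / self.window

meter = SlidingWindowRate(window=10.0)
# Steady trickle: one message per second for 20 seconds.
for t in range(20):
    meter.add(float(t))
baseline = meter.rate(19.0)
# Sudden burst, e.g. a trending topic: 50 messages within one second.
for i in range(50):
    meter.add(20.0 + i / 50)
burst = meter.rate(20.99)
print(f"baseline {baseline:.1f} msg/s, burst {burst:.1f} msg/s")
```

Here the measured rate jumps from 1.0 to 5.9 messages per second within about a second, which a downstream system could treat as a burst signal. The window length is an illustrative choice: shorter windows react faster but are noisier.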

  • It's All Analytics!

    The Foundations of AI, Big Data and Data Science Landscape for Professionals in Healthcare, Business, and Government

    ...Examples might be color, breed of dog, state of residence and phone brand. Quantitative data can be measured on numeric scales such as the number of readmissions per year, per member per month (PMPM) insurance rates, Gross Domestic Product (GDP) and revenue per year.

    What Is Big Data?

    There are various reports of who officially coined the term “Big Data” and of where it actually started. Part of the confusion revolves around the question, “Is Big Data a descriptive term or a technology?” We cover both in this section. We like the following as a descriptive term: Big Data – a massive volume of data that is so large it is difficult to process using traditional technology (as of about 2005). In most enterprise scenarios the volume of data is too big, it moves too fast, or it exceeds current processing capacity. From a technology basis, the following are some (there are others) of the technologies created to support Big Data:

      • Data Lakes
      • High-Performance Relational Database Technologies (Massively Parallel Processing (MPP))
      • Hadoop, HDFS, and MapReduce (see following gray box, “Quick Note on Apache, Hadoop and Spark”)
      • Data Hubs
      • Cloud Data Warehouses
      • Data Virtualization (DV)

    These technologies are sometimes defined as Big Data, but they support Big Data rather than describing what Big Data is. Additionally, the description of Big Data may include the 3 V’s or 5 V’s. Initially, there were three:

      1) Data Volume – the sheer amount of data
      2) Data Variety – disparate types, different structures, and formats of data
      3) Data Velocity – how fast data is being added to systems and refreshed

    Then two more qualities were added to make it the 5 V’s of Big Data:

      4) Value – What is the return on investment for sourcing this data?
      5) Veracity – What is the quality, reliability, and trustworthiness of the data?

    Quick Note on Apache, Hadoop and Spark

    The Apache Software Foundation (www.apache.org) was incorporated in 1999 as an American nonprofit corporation...
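
The excerpt lists Hadoop, HDFS, and MapReduce among the technologies that support Big Data. The MapReduce model splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The sketch below is a single-process illustration of that model using the classic word-count example. It does not use Hadoop's actual API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; here, sum the counts."""
    return key, sum(values)

def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster, each split would be mapped on a different node in parallel,
    # and each key's reduce could also run on a separate node.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

splits = ["big data needs scale", "scale out not scale up", "big data"]
print(word_count(splits))
```

Because map calls are independent of each other, and each reduce call depends only on one key's group, both phases can be spread across machines. This is what makes the model a natural fit for the scale-out architectures the excerpts describe.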

  • Big Data, Big Analytics

    Emerging Business Intelligence and Analytic Trends for Today's Businesses

    • Michael Minelli, Michele Chambers, Ambiga Dhiraj (Authors)
    • 2012 (Publication Date)
    • Wiley (Publisher)

    ...It is proficient at parsing data. Part of the Apache Hadoop project.

    Batch: A job or process that runs in the background without human interaction.
    Big Data: The de facto standard definition of Big Data is data that goes beyond the traditional limits of data along three dimensions: volume, variety, velocity. The combination of these three dimensions makes the data more complex to ingest, process, and visualize.
    Big Insights: IBM’s commercial distribution of Hadoop with enterprise-class value-added components.
    Cassandra: An open-source columnar database managed by the Apache Software Foundation.
    Clojure: Pronounced “closure.” A dynamic programming language based on LISP (which was the de facto artificial intelligence programming language from the late 1950s). Typically used for parallel data processing.
    Cloud: General term used to refer to any computing resources (software, hardware, or service) delivered as a service over a network.
    Cloudera: The first commercial distributor of Hadoop. Cloudera provides enterprise-class value-added components with the Hadoop distribution.
    Columnar Database: The storing and optimizing of data by columns...
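
The glossary defines a columnar database as one that stores and optimizes data by columns. The sketch below contrasts the two layouts on a tiny in-memory table, assuming made-up field names and values. It illustrates the idea only and is not how any particular database, Cassandra included, lays out data on disk. An aggregate over one field needs only that field's values in the columnar layout, whereas the row layout must visit every record.

```python
from typing import Any

# Row-oriented layout: each record's fields are kept together (illustrative data).
rows: list[dict[str, Any]] = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 95.5},
]

def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of records into one list per column (columnar layout)."""
    if not records:
        return {}
    columns: dict[str, list[Any]] = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns

columns = to_columns(rows)

# Row layout: the aggregate visits every record to pull out one field.
row_total = sum(record["revenue"] for record in rows)
# Column layout: the same aggregate reads only the "revenue" list.
col_total = sum(columns["revenue"])

print(columns["region"], row_total, col_total)
```

Both layouts give the same total. The columnar layout pays off when tables are wide and queries touch few columns, because values of one type stored together are also easier to compress. Row layouts remain better for reading or writing whole records at once.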