Computer Science

Big Data Technologies

Big Data Technologies refer to the tools, frameworks, and platforms used to process, analyze, and extract insights from large and complex data sets. These technologies often include distributed storage systems like Hadoop, data processing frameworks like Spark, and NoSQL databases. They enable organizations to handle massive volumes of data and derive valuable information for decision-making and business intelligence.

Written by Perlego with AI-assistance

12 Key excerpts on "Big Data Technologies"

  • Book cover image for: Big Data Computing
    eBook - PDF

    Big Data Computing

    A Guide for Business and Technology Managers

    The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of big data computing applications. This chapter explores the challenges of big data computing. 9.1.1 What Is Big Data? Big data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity that cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as big data includes weather, geospatial, and geographic information system (GIS) data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance and human-resources departments; and device-generated data from sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines (Figures 9.1 and 9.2). 9.1.1.1 Data Volume The most interesting data for any organization to tap into today is social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites not only provide consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage the social media pages to personalize marketing of products and services to each customer. [Figure 9.1: 4V characteristics of big data. Data volume: MB, GB, TB, PB. Data velocity: batch, trickle, stream, torrent. Data variety: structured data, free text, HTML, photo, audio, video. Data veracity: certain, confirmed, complete, consistent, clear, correct.] Many additional applications are being developed and are slowly becoming a reality.
  • Book cover image for: Software Architecture for Big Data and the Cloud
    • Ivan Mistrik, Rami Bahsoon, Nour Ali, Maritta Heisel, Bruce Maxim(Authors)
    • 2017(Publication Date)
    • Morgan Kaufmann
      (Publisher)
    Chapter 14

    Exploring the Evolution of Big Data Technologies

    Stephen Bonner; Ibad Kureshi; John Brennan; Georgios Theodoropoulos    Institute of Advanced Research Computing, Durham University, United Kingdom

    Abstract

    This chapter explores the rise of “big data” and the computational strategies, both hardware and software, that have evolved to deal with this paradigm. Starting with the concept of data-intensive computing, the different facets of data processing like Map/Reduce, Machine Learning, and Streaming data are explored. The evolution of different frameworks such as Hadoop and Spark are outlined and an assessment of the modular offerings within the frameworks is compared with a detailed analysis of the different functionalities and features. The hardware considerations required to move from compute-intensive to data-intensive are outlined along with the impact of cloud computing on big data. The chapter concludes with the upcoming developments in the near future for big data and how this computing paradigm fits into the road to exascale.

    Keywords

    Big Data Technologies; Hadoop ecosystem; Spark; Data locality; Data intensive frameworks; Stream computing; Machine learning; Resilient distributed dataset

    14.1 Introduction

    Since the adoption of cloud computing, big data has been increasing exponentially in popularity, both in computer science and in the wider world. This seemingly new paradigm of processing emerges on the heels of e-Commerce and the explosion of Internet-enabled digital devices that allow companies multiple channels and touch points to engage potential customers. The accepted definition of big data is the digital analysis of datasets to extract insights, correlations and causations, and value from data. Different groups have come up with different “Vs” to attempt to formalize the definition of the big aspect of this phenomenon. The 3 Vs definition of big data, by Doug Laney for Gartner, states that if it has Volume, Variety and Velocity then the data can be considered big [39]. Bernard Marr, in his book Big Data, adds Veracity (or validity) and Value to the original list to create the 5 V's of big data [50]. With Volatility and Variability, Visibility, and Visualization added in some combination to the list by different authors, there is now a 7 Vs definition of what constitutes big data [3, 46, 54, 60].
  • Book cover image for: Big Data at Work
    eBook - PDF

    Big Data at Work

    Dispelling the Myths, Uncovering the Opportunities

    5 Technology for Big Data. Written with Jill Dyché. A major component of what makes the management and analysis of big data possible is new technology.* In effect, big data is not just a large volume of unstructured data, but also the technologies that make processing and analyzing it possible. Specific Big Data Technologies analyze textual, video, and audio content. When big data is fast moving, technologies like machine learning allow for the rapid creation of statistical models that fit, optimize, and predict the data. This chapter is devoted to all of these Big Data Technologies and the difference they make. The technologies addressed in the chapter are outlined in table 5-1. *I am indebted in this section to Jill Dyché, vice president of SAS Best Practices, who collaborated with me on this work and developed many of the frameworks in this section. Much of the content is taken from our report, Big Data in Big Companies (International Institute for Analytics, April 2013). If you are looking for hardcore detail about how big data technology works, you’ve come to the wrong place. My focus here is not on how Hadoop functions in detail, or whether Pig or Hive is the better scripting language (alas, such expertise is beyond my technological pay grade anyway). Instead, my focus will be on the overall technology architecture for big data and how it coexists with that for traditional data warehouses and analytics. No single business trend in the last decade has as much potential impact on incumbent IT investments as big data. Indeed, big data promises (or threatens, depending on how you view it) to upend legacy technologies within many companies. The way that data is stored and processed for analysis, and the hardware and software for doing so, are being transformed by the technology solutions that are tied to big data. Some of that technology is truly new with big data, and some has been around for a while but is being applied in different ways.
  • Book cover image for: Guide to Cloud Computing for Business and Technology Managers
    eBook - PDF

    Guide to Cloud Computing for Business and Technology Managers

    From Distributed Computing to Cloudware Applications

    The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of big data computing applications. This chapter explores the challenges of big data computing. 21.1.1 What Is Big Data? Big data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity, which cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as big data include weather; geospatial and GIS data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance, and human-resources departments; and device-generated data from sensor networks, nuclear plants, x-ray and scanning devices, and airplane engines. 21.1.1.1 Data Volume The most interesting data for any organization to tap into today are social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites provide not only consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage the social media pages to personalize marketing of products and services to each customer. Every enterprise has massive amounts of e-mails that are generated by its employees, customers, and executives on a daily basis. These e-mails are all considered an asset of the corporation and need to be managed as such. After Enron and the collapse of many audits in enterprises, the US government mandated that all enterprises should have a clear life-cycle management of e-mails and that e-mails should be available and auditable on a case-by-case basis.
  • Book cover image for: Big Data and Social Science
    • Sudha Menon, University of Kerala, India(Authors)
    • 2019(Publication Date)
    The importance of big data in business and healthcare is also discussed together with the life cycle of big data. Inconsistencies of big data have been briefly explained. At the end of the chapter, certain technologies are described that are used in big data. 1.1. INTRODUCTION Big data is a comprehensive term for the non-traditional strategies and technologies that are used to collect, organize, and process large datasets and to extract insights from them. The issue of working with data that goes beyond the power and storage of a single computer is not a new one. But the scale, value, and pervasiveness of this type of computing have expanded greatly in the past few years (Figure 1.1). Figure 1.1: Big data are extremely large sets of data that can be computationally analyzed. Source: https://www.moneyobserver.com/our-analysis/six-stocks-and-funds-to-play-big-data-theme. In this chapter, we will mainly discuss the notion of big data on a fundamental level along with some other general concepts related to big data. A precise definition of big data is quite difficult to pin down, as vendors, business professionals and practitioners apply it in very different ways. Taking this into consideration, big data is generally defined as: • Large data sets; • The class of computing technologies and strategies that are mainly used to handle these large datasets. In this outlook, ‘large dataset’ means a dataset too large to process or store on a single computer system using conventional tools. This means that the threshold for what counts as a large dataset is constantly shifting and may differ significantly from one organization to another. In addition to that, the term ‘Big Data’ refers to all kinds of data being generated all around the world at an exceptional rate.
  • Book cover image for: Big Data Computing
    Big Data Management Technologies. Big Data represents data management and analytic solutions that could not previously be supported because of technology performance limitations, the high costs involved, or limited information. Big Data solutions allow organizations to build optimized systems that improve performance, reduce costs, and allow new types of data to be captured for analysis. Big Data involves two important data management technologies: • Analytic relational systems that are optimized for supporting complex analytic processing against both structured and multistructured data. These systems are evolving to support not only relational data, but also other types of data structures. These systems may be offered as software-only solutions or as custom hardware/software appliances. • Non-relational systems that are well suited to the processing of large amounts of multistructured data. There are many different types of nonrelational systems, including distributed file systems, document management systems, and database and analytic systems for handling complex data such as graph data. When combined, these Big Data Technologies can support the management and analysis of the many types of electronic data that exist in organizations, regardless of volume, variety, or volatility. They are used in concurrence with four advances in business analytics: 1. Latest and improved analytic techniques and algorithms that increase the sophistication of existing analytic models and results and allow the creation of new types of analytic applications. 2. Value-added data visualization techniques that make large volumes of data easier to explore and understand. 3. Analytics-powered business processes that enhance the pace of decision-making and enable close to real-time business agility. 4. Stream processing systems that filter and analyze data in action as it flows through IT systems and across IT networks.
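    The idea behind stream processing in item 4 (filtering and analyzing records while they flow, rather than after they are stored) can be illustrated with a minimal Python sketch. The function names `filter_readings` and `rolling_mean`, the threshold, and the window size are all hypothetical choices for illustration; they are not taken from the excerpt or from any particular streaming product.

```python
from collections import deque
from typing import Iterable, Iterator


def filter_readings(stream: Iterable[float], threshold: float) -> Iterator[float]:
    """Pass through only the values at or above the threshold, one at a time."""
    for value in stream:
        if value >= threshold:
            yield value


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values seen so far."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical sensor readings; in practice this would be an unbounded source.
    readings = [1.0, 5.0, 2.0, 7.0, 9.0]
    for mean in rolling_mean(filter_readings(readings, threshold=5.0), window=2):
        print(mean)
```

    Because both stages are generators, each reading is handled as it arrives and only the last `window` values are held in memory, which is the property that separates this style from batch analysis of stored data.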
  • Book cover image for: Data analysis and Information processing
    • Jovan Pehcevski(Author)
    • 2023(Publication Date)
    • Arcler Press
      (Publisher)
    OVERVIEW OF BIG DATA TECHNOLOGY Big data technology is a set of technical means that has emerged with the development of the big data era. It mainly involves big data platforms, big data index systems and other related technologies, and has been applied successfully in many fields. Big data refers to massive data whose information cannot be intuitively observed and used. Acquiring, storing, analyzing and applying such data is difficult, yet the data has strong practical significance, and it has become an important topic attracting growing attention in the current information age. Beyond its sheer volume, big data is typically characterized by diversity, rapidity, complexity and low value density. These characteristics make massive data difficult to apply and place higher requirements on big data technology, which deserves close attention (Ingrams, 2019). In the era of big data, the core task is not to obtain massive data, but to analyze and process that data professionally so that it delivers its due role and value. It is therefore necessary to strengthen research on big data technology, so that all fields can optimize the analysis and processing of massive data with its assistance and meet their application requirements. Among current Big Data Technologies, data mining technology, massively parallel processing databases, distributed databases, extensible storage systems and cloud computing technology are commonly used.
  • Book cover image for: Big Data, Big Analytics
    eBook - PDF

    Big Data, Big Analytics

    Emerging Business Intelligence and Analytic Trends for Today's Businesses

    • Michael Minelli, Michele Chambers, Ambiga Dhiraj(Authors)
    • 2012(Publication Date)
    • Wiley
      (Publisher)
    Today we can run the algorithm, look at the results, extract the results, and feed the business process, automatically and at massive scale, using all of the data available. We continue our conversation with Mehta later in the book. For the moment, let’s boil his observations down to three main points: 1. The technology stack has changed. New proprietary technologies and open-source inventions enable different approaches that make it easier and more affordable to store, manage, and analyze data. 2. Hardware and storage is affordable and continuing to get cheaper to enable massive parallel processing. 3. The variety of data is on the rise and the ability to handle unstructured data is on the rise. Data Discovery: Work the Way People’s Minds Work There is a lot of buzz in the industry about data discovery, the term used to describe the new wave of business intelligence that enables users to explore data, make discoveries, and uncover insights in a dynamic and intuitive way versus predefined queries and preconfigured drill-down dashboards. This approach has resonated with many business users who are looking for the freedom and flexibility to view Big Data. In fact, there are two software companies that stand out in the crowd by growing their businesses at unprecedented rates in this space: Tableau Software and QlikTech International. Both companies’ approach to the market is much different than the traditional BI software vendor. They grew through a sales model that many refer to as “land and expand.” It basically works by getting intuitive software in the hands of some business users to get in the door and grow upward. In the past, BI players typically went for the big IT sale to be the preferred tool for IT to build reports for the business users to then come and use. In order to succeed at the BI game of the “land and expand model,” you need a product that is easy to use with lots of sexy output.
  • Book cover image for: Signal Processing and Networking for Big Data Applications
    Part I: Overview of Big Data Applications. 1 Introduction. 1.1 Background. Today, scientists, engineers, educators, citizens, and decision-makers have unprecedented amounts and types of data available to them. Data come from many disparate sources, including scientific instruments, medical devices, telescopes, microscopes, satellites; digital media including text, video, audio, e-mail, weblogs, Twitter feeds, image collections, click streams, and financial transactions; dynamic sensor, social, and other types of networks; scientific simulations, models, and surveys; or computational analysis of observational data. Data can be temporal, spatial, or dynamic; structured or unstructured. Information and knowledge derived from data can differ in representation, complexity, granularity, context, provenance, reliability, trustworthiness, and scope. Data can also differ in the rate at which they are generated and accessed. The phrase “big data” refers to the kinds of data that challenge existing analytical methods due to size, complexity, or rate of availability. The challenges in managing and analyzing “big data” can require fundamentally new techniques and technologies in order to handle the size, complexity, or rate of availability of these data. At the same time, the advent of big data offers unprecedented opportunities for data-driven discovery and decision-making in virtually every area of human endeavor. A key example of this is the scientific discovery process, which is a cycle involving data analysis, hypothesis generation, the design and execution of new experiments, hypothesis testing, and theory refinement. Realizing the transformative potential of big data requires addressing many challenges in the management of data and knowledge, computational methods for data analysis, and automating many aspects of data-enabled discovery processes.
  • Book cover image for: Big Data
    eBook - ePub

    Big Data

    Concepts, Warehousing, and Analytics

    • Maribel Yasmina Santos, Carlos Costa(Authors)
    • 2022(Publication Date)
    • River Publishers
      (Publisher)
    Begoli and Horey (2012) complement these perspectives, stating that several analytical mechanisms should be included in Big Data solutions, ranging from statistical analysis to data mining and visualization. Moreover, processed data and insights can be made available using open and recognized standards, interfaces, and Web services. Regarding Big Data analytics, there is a vast set of available techniques that can be used to extract value from data. Data mining techniques, such as clustering, association rules, classification, and regression (Han, Pei, & Kamber, 2012), are still present in Big Data environments (Manyika et al., 2011), now with the challenge of distributing them to perform at scale (C. L. P. Chen & Zhang, 2014; Fan & Bifet, 2013). Achieving scalability in these techniques is what makes Big Data analytics different from traditional data analytics. The range of analytical mechanisms and the ambiguous terms to define them may lead to a completely new buzzword: data science. Techniques such as sentiment analysis, time series analysis/forecasting, spatial analysis, optimization, visualization, or unstructured analytics (e.g., text, audio, and video) (Gandomi & Haider, 2015), can all be part of a data scientist’s knowledge base (C. Costa & Santos, 2017b).
    2.4.1.2.  Architectural and Infrastructural Requirements
    The different steps required to process Big Data, presented above, must be performed in Big Data environments, following the requirements identified by Krishnan (2013):
    •   Absence of fixed data models to adequately accommodate the complexity and size of data, regardless of its characteristics;
    •   Scalable and high-performance systems to collect and process data either in real-time or in batches;
    •   The architecture should support data partitioning due to the volume of data;
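    The data partitioning requirement above can be made concrete with a short Python sketch of hash partitioning, one common way to spread records across nodes. This is an illustrative assumption only: the excerpt does not prescribe a partitioning scheme, and the names `partition_for` and `partition_records`, the `user_id` key, and the partition count are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records, key_field, num_partitions):
    """Group records by the partition that their key hashes to."""
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        partitions[index].append(record)
    return partitions


if __name__ == "__main__":
    # Hypothetical records; each partition could live on a different machine.
    events = [{"user_id": f"user-{i % 5}", "value": i} for i in range(20)]
    for index, rows in sorted(partition_records(events, "user_id", 4).items()):
        print(index, len(rows))
```

    Records that share a key always land in the same partition, so per-key work can proceed on one node without coordination. The trade-off in this simple modulo scheme is that changing the number of partitions remaps most keys.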
  • Book cover image for: Big Data
    eBook - PDF

    Big Data

    Storage, Sharing, and Security

    Chapter outline (partial): 2.5.8 Deep dive into NewSQL technology; 2.5.8.1 Data model; 2.5.8.2 Design; 2.5.8.3 Performance; 2.6 How to Choose the Right Technology; 2.7 Case Study of DBMSs with Medical Big Data; 2.8 Conclusions; Acknowledgments; References. 2.1 Introduction The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity, and variety. While there has been great progress in the world of database technologies in the past few years, there are still many fundamental considerations that must be made by scientists. For example, which of the seemingly infinite technologies are the best to use for my problem? Answers to such questions require a careful understanding of the technology field in addition to the types of problems that are being solved. This chapter aims to address many of the pressing questions faced by individuals interested in using storage or database technologies to solve their big data problems. Storage and database management is a vast field with many decades of results from very talented scientists and researchers. There are numerous books, courses, and articles dedicated to the study. This chapter attempts to highlight some of these developments as they relate to the equally vast field of big data.
  • Book cover image for: Networking for Big Data
    • Shui Yu, Xiaodong Lin, Jelena Misic, Xuemin (Sherman) Shen, Shui Yu, Xiaodong Lin, Jelena Misic, Xuemin (Sherman) Shen(Authors)
    • 2015(Publication Date)
    Chapter 4: Big Data Distributed Systems Management. Rashid A. Saeed and Elmustafa Sayed Ali. Contents: Big Data Challenges; Big Data Management Systems; Distributed File System; Nonstructural and Semistructured Data Storage; Big Data Analytics; Data Mining; Image and Speech Data Recognition; Social Network Data Analysis; Data Fusion and Integration; Management of Big Data Distributed Systems; Hadoop Technologies; Hadoop Distributed File System (HDFS); Hadoop MapReduce; NoSQL Database Management System (NoSQL DBMS); Software as a Service (SaaS)–Based Business Analytics; Master Data Management; Conclusion; References. Big data deals with large scales of data characterized by three concepts: volume, variety, and velocity, known as the 3Vs of Big Data. Volume refers to size: conventional data is measured in gigabytes or terabytes of storage, whereas Big Data amounts to more than terabytes, such as petabytes or exabytes, and one of its challenges is that it requires scalable storage. Data volume will continue to grow every day, regardless of how the data is organized, because of the natural tendency of companies to store all types of data, such as financial data, medical data, environmental data, and so on. Many of these companies’ datasets are within the terabytes range today, but soon they could reach petabytes or even exabytes and more. Variety of Big Data is an aggregation of many types of data, which may be structured or unstructured, including social media, multimedia, web server logs, and many other forms of information.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.