Computer Science

Big Data Technologies

Big Data Technologies refer to the tools, frameworks, and platforms used to process, analyze, and extract insights from large and complex data sets. These technologies often include distributed storage systems like Hadoop, data processing frameworks like Spark, and NoSQL databases. They enable organizations to handle massive volumes of data and derive valuable information for decision-making and business intelligence.

Written by Perlego with AI assistance

7 Key excerpts on "Big Data Technologies"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), with each extract adding context and meaning to key research topics.
  • Big Data Analytics
    eBook - ePub

    Big Data Analytics

    A Social Network Approach

    • Mrutyunjaya Panda, Ajith Abraham, Aboul Ella Hassanien(Authors)
    • 2018(Publication Date)
    • CRC Press
      (Publisher)

    ...CHAPTER 12 Big Data Analysis Technology and Applications Suprem Abhijit, School of Computer Science, Georgia Institute of Technology, USA. Introduction: Big data is a nebulous term that incorporates a variety of areas—from large-scale data collection, to storage methodologies, to analytics, to visualization. In each of these cases, the challenge is efficient data operation on huge amounts of data. As computing power, hardware and storage continue to increase, there is no clear indication of what exactly the ‘big’ in big data means. Commonly, however, big data refers to giga-, tera-, or peta-scale data, such as text corpora from millions of books, the billions of images in Facebook’s database, or the trillions of file signatures in security companies’ malware identification databases. Large-scale data collection is common in both industry and the public sector, and the presence of various collection agencies such as financial entities, social media corporations, and public- and industry-sector monitoring organizations has significantly increased the volume of data collected and made publicly available. Modern research and development in big data goes beyond the collection and management paradigm and enters the domain of visual and data analytics. The former is concerned with effective interaction and visualization tools for developers and end-users to better analyze and tune tools, respectively. The latter deals with myriad domain-specific frameworks, platforms and algorithms for a variety of analytics applications including data mining, prediction, ranking, language processing, financial modeling, human-computer interaction, and automated summarization. This survey covers the two broad research areas—visual analytics and data analytics, and details current trends and research in information visualization and large-scale analytics. This paper provides a survey of big data tools and systems...

  • Creating Smart Enterprises
    eBook - ePub

    Creating Smart Enterprises

    Leveraging Cloud, Big Data, Web, Social Media, Mobile and IoT Technologies

    ...The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of Big Data computing applications. This chapter explores the challenges of Big Data computing. 7.1.1 What Is Big Data? Big Data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity that cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as Big Data includes weather, geospatial, and geographic information system (GIS) data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance and human-resources departments; and device-generated data from sensor networks, nuclear plants, X-ray and scanning devices, and airplane engines (Figures 7.1 and 7.2). Figure 7.1 4V characteristics of Big Data. Figure 7.2 Use cases for Big Data computing. 7.1.1.1 Data Volume The most interesting data for any organization to tap into today is social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites not only provide consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage the social media pages to personalize marketing of products and services to each customer. Many additional applications are being developed and are slowly becoming a reality...

  • Big Data Analytics
    eBook - ePub

    Big Data Analytics

    Turning Big Data into Big Money

    • Frank J. Ohlhorst(Author)
    • 2012(Publication Date)
    • Wiley
      (Publisher)

    ...Yet this situation does not mean that those who seek value from large data sets should wait. Big Data is far too important to business processes to take a wait-and-see approach. The real trick with Big Data is to find the best way to deal with the varied data sources and still meet the objectives of the analytical process. This takes a savvy approach that integrates hardware, software, and procedures into a manageable process that delivers results within an acceptable time frame—and it all starts with the data. Storage is the critical element for Big Data. The data have to be stored somewhere, readily accessible and protected. This has proved to be an expensive challenge for many organizations, since network-based storage, such as SANs and NAS, can be very expensive to purchase and manage. Storage has evolved to become one of the more pedestrian elements in the typical data center—after all, storage technologies have matured and have started to approach commodity status. Nevertheless, today’s enterprises are faced with evolving needs that can put a strain on storage technologies. A case in point is the push for Big Data analytics, a concept that brings BI capabilities to large data sets. The Big Data analytics process demands capabilities that are usually beyond the typical storage paradigms. Traditional storage technologies, such as SANs, NAS, and others, cannot natively deal with the terabytes and petabytes of unstructured information presented by Big Data. Success with Big Data analytics demands something more: a new way to deal with large volumes of data, a new storage platform ideology. AN OPEN SOURCE BRINGS FORTH TOOLS Enter Hadoop, an open source project that offers a platform to work with Big Data. Although Hadoop has been around for some time, more and more businesses are just now starting to leverage its capabilities...
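
    The excerpt above introduces Hadoop as the open-source platform that made large-scale storage and processing practical. As a rough illustration only (not drawn from the book), the sketch below shows the kind of word-count mapper and reducer that Hadoop Streaming can run over data stored in HDFS; the local pipeline at the end merely simulates the shuffle-and-sort step the framework performs between the two stages.

        import sys
        from itertools import groupby

        def mapper(lines):
            # Emit (word, 1) for every word in the input split.
            for line in lines:
                for word in line.strip().split():
                    yield word.lower(), 1

        def reducer(pairs):
            # Sum counts per word; input must already be sorted by key,
            # which is what Hadoop's shuffle phase guarantees.
            for word, group in groupby(pairs, key=lambda kv: kv[0]):
                yield word, sum(count for _, count in group)

        if __name__ == "__main__":
            # Local stand-in for the distributed shuffle/sort between stages.
            mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
            for word, total in reducer(mapped):
                print(f"{word}\t{total}")

    In a real cluster the mapper and reducer run as separate processes on the nodes holding the HDFS blocks, so the bulk data never has to leave the storage layer.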

  • Digital Transformation
    eBook - ePub

    Digital Transformation

    Survive and Thrive in an Era of Mass Extinction

    • Thomas M. Siebel(Author)
    • 2019(Publication Date)
    • Rodin Books
      (Publisher)

    ...To expand capacity, you add more CPUs, memory, and connectivity, thereby ensuring performance does not dip as you scale. The result is a vastly more flexible and less costly approach than scale-up architectures and is ideally suited to handle big data. Software technologies designed to leverage scale-out architectures and process big data emerged and evolved, including MapReduce and Hadoop. Big data as a term first appeared in an October 1997 paper by NASA researchers Michael Cox and David Ellsworth, published in the Proceedings of the IEEE 8th Conference on Visualization. The authors wrote: “Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data.” 13 By 2013, the term had achieved such widespread circulation that the Oxford English Dictionary confirmed its cultural adoption, including it in that year’s edition of the OED. In 2001, Doug Laney—then an analyst at META Group—described three main traits that characterize big data: volume (the size of the data set, as measured in bytes, gigabytes, exabytes, or more); velocity (the speed of data arrival or change, as measured in bytes per second or messages per second or new data fields created per day); and variety (including its shape, form, storage means, and interpretation mechanisms). 14 Size, Speed, and Shape Big data continues to evolve and grow along all three of these dimensions—size, speed, and shape. It’s important for senior executives—not just the technologists and data scientists in the organization—to understand how each of these dimensions adds value as a business asset. Size. The amount of data generated worldwide has increased exponentially over the last 25 years, from about 2.5 terabytes (2.5 × 10¹² bytes) a day in 1997 to 2.5 exabytes (2.5 × 10¹⁸ bytes) in 2018—and will continue to do so into the foreseeable future...
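
    As a quick back-of-the-envelope check of the size figures quoted above (roughly 2.5 terabytes per day in 1997 versus roughly 2.5 exabytes per day in 2018), the short calculation below works out the implied compound annual growth rate; the rate itself is an illustration, not a number taken from the excerpt.

        start_bytes = 2.5e12   # ~2.5 terabytes per day, 1997
        end_bytes = 2.5e18     # ~2.5 exabytes per day, 2018
        years = 2018 - 1997

        growth_factor = end_bytes / start_bytes          # a factor of 1,000,000
        annual_rate = growth_factor ** (1 / years) - 1   # roughly 0.93

        print(f"total growth factor: {growth_factor:,.0f}x")
        print(f"implied compound annual growth: {annual_rate:.0%}")

    A millionfold increase over 21 years works out to roughly 93 percent growth per year, which is why scale-out architectures, rather than ever-larger single machines, became the default answer.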

  • It's All Analytics!
    eBook - ePub

    It's All Analytics!

    The Foundations of AI, Big Data and Data Science Landscape for Professionals in Healthcare, Business, and Government

    ...Users needed to access the data, and analyze and model the data more expediently. From an analytics and data science perspective, there was a major technology and practice shift when adopting Big Data technology. Traditionally, most analysis and machine learning involved moving data from large data repositories (and most of these distributed) into a single “sandbox” for analytics. This involved increasingly more time as the volume of the data increased; it was taking more and more time to move the data across the wire. It also was a security risk since you were moving this data across a network, thus making it easier for someone to tap into sensitive data. You were also creating a duplicate copy of all this data on another server or servers, which increased costs and added additional security risks. What if you could instead move the algorithms (machine learning and other) to where the data lives in the first place? Move the algorithms instead of the data? This was the brilliance of big data technology. We will call this in-cluster, in-database or in-memory machine learning for short. A cluster refers to a group of servers that are grouped together to work on the same computational set of problems and can be viewed as one computer resource. Our examples will focus on Hadoop and Spark, two open source Apache technologies (see gray box on “Apache, Hadoop, and Spark”) that provide scalable in-database and in-cluster processing. For in-cluster, in-database computing, we are referring to moving the algorithms to the cluster where the data lives (disk, Hadoop HDFS, and MapReduce). In-memory computing refers to moving the algorithms to faster, volatile memory (RAM), as Spark does. We address near-memory computing in the “Other Important Data Focuses of Today and Tomorrow” section of this chapter. The Hype of Big Data For many years “big data” was the rage; it was a major hype cycle (see gray box, “Big Data and the Gartner Hype Cycle”)...
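
    To make the “move the algorithms to the data” idea concrete, here is a minimal PySpark sketch of in-cluster aggregation. It assumes a working Spark installation and a Parquet dataset at a hypothetical HDFS path with made-up patient_id and cost columns; the aggregation runs on the executors next to the data, and only the small summary result travels back to the driver.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("in-cluster-aggregation").getOrCreate()

        # Read the data where it already lives (hypothetical HDFS path).
        claims = spark.read.parquet("hdfs:///data/claims.parquet")

        # The heavy lifting is distributed across the cluster's executors;
        # only the aggregated rows are returned over the network.
        summary = (
            claims.groupBy("patient_id")
                  .agg(F.avg("cost").alias("avg_cost"),
                       F.count("*").alias("n_claims"))
        )

        summary.show(10)
        spark.stop()

    Contrast this with the traditional sandbox approach the excerpt describes, where the entire raw dataset would first be copied across the wire before any analysis could start.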

  • Big Data, Big Analytics
    eBook - ePub

    Big Data, Big Analytics

    Emerging Business Intelligence and Analytic Trends for Today's Businesses

    • Michael Minelli, Michele Chambers, Ambiga Dhiraj(Authors)
    • 2012(Publication Date)
    • Wiley
      (Publisher)

    ...It is proficient at parsing data. Part of the Apache Hadoop project. Batch A job or process that runs in the background without human interaction. Big Data The de facto standard definition of big data is data that goes beyond the traditional limits of data along three dimensions: volume, variety, velocity. The combination of these three dimensions makes the data more complex to ingest, process, and visualize. Big Insights IBM’s commercial distribution of Hadoop with enterprise-class value-added components. Cassandra An open-source columnar database managed by the Apache Software Foundation. Clojure Pronounced “closure.” A dynamic programming language based on LISP (which was the de facto artificial intelligence programming language from the late 1950s). Typically used for parallel data processing. Cloud General term used to refer to any computing resources—software, hardware or service—that are delivered as a service over a network. Cloudera The first commercial distributor of Hadoop. Cloudera provides enterprise-class value-added components with the Hadoop distribution. Columnar Database The storing and optimizing of data by columns...
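
    The glossary entry for columnar databases is easiest to see with a toy example. The records and field names below are invented; the point is simply that a column-oriented layout lets an analytic query scan one field contiguously, whereas a row-oriented layout must touch every whole record to read that same field.

        # Row-oriented layout: each record is stored together.
        rows = [
            {"user": "a", "country": "US", "purchases": 3},
            {"user": "b", "country": "DE", "purchases": 7},
            {"user": "c", "country": "US", "purchases": 1},
        ]

        # Column-oriented layout: each field is stored together.
        columns = {
            "user": ["a", "b", "c"],
            "country": ["US", "DE", "US"],
            "purchases": [3, 7, 1],
        }

        # Aggregating one field from the row layout walks every record...
        total_from_rows = sum(record["purchases"] for record in rows)

        # ...while the column layout reads only the single array it needs.
        total_from_columns = sum(columns["purchases"])

        assert total_from_rows == total_from_columns == 11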

  • Big Data Analytics Methods
    eBook - ePub

    Big Data Analytics Methods

    Analytics Techniques in Data Mining, Deep Learning and Natural Language Processing

    • Peter Ghavami(Author)
    • 2019(Publication Date)
    • De Gruyter
      (Publisher)

    ...HITRUST offers a Common Security Framework (CSF) that aligns HIPAA security controls with other security standards. Data scientists may apply other data cleansing programs in this layer. They might write tools to de-duplicate (remove duplicate records) and resolve any data inconsistencies. Once the data has been ingested, it’s ready to be analyzed by engines in the next layer. Since big data requires fast retrieval, several organizations, in particular the various open source foundations, have developed alternate database architectures that allow parallel execution of queries, reads, writes, and data management. There are three architectural taxonomies or strategies for storing big data that impact data governance, management and analytics: Analyze Data in-Place: Traditionally, data analysts have used the native application and SQL to query the application’s data without moving it. Many data analysts’ systems build analytics solutions on top of an application’s database without using data warehouses. They perform analytics in place, from the existing application’s data tables without aggregating data into a central repository. The analytics that are offered by EMR (electronic medical records) companies as integrated solutions to their EMR system fit this category. Build Data Repository: Another strategy is to build data warehouses to store all the enterprise data in a central repository. These central repositories are often known as enterprise data warehouses (EDW). Data from business systems, customer relationship management (CRM) systems, enterprise resource planning (ERP) systems, data warehouses, financial, transactional and operational systems are normalized and stored in these data warehouses. A second approach, called a data lake, has emerged. Data lakes are often implemented using the Hadoop distributed file system or through cloud storage solutions. The data is either collected through ETL extraction (batch files) or via interface programs and APIs...
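
    As a small, hypothetical illustration of the “analyze data in place” strategy described above, the sketch below runs an aggregate SQL query directly against an application database. Python’s built-in sqlite3 module stands in for the real system, and the encounters table with department and charge columns is invented; nothing is copied into a separate warehouse or sandbox first.

        import sqlite3

        # Hypothetical application database file; in practice this would be
        # the live system's database, queried through its native SQL interface.
        conn = sqlite3.connect("application.db")

        query = """
            SELECT department,
                   COUNT(*)    AS visits,
                   AVG(charge) AS avg_charge
            FROM encounters
            GROUP BY department
            ORDER BY visits DESC
        """

        # The aggregation executes inside the database engine, so only the
        # summarized rows ever leave the source system.
        for department, visits, avg_charge in conn.execute(query):
            print(f"{department}: {visits} visits, average charge {avg_charge:.2f}")

        conn.close()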