Big Data Challenges
Big data challenges refer to the difficulties associated with managing, analyzing, and extracting valuable insights from large and complex datasets. These challenges include issues related to data storage, processing, analysis, and privacy. Addressing big data challenges often requires advanced technologies and techniques such as distributed computing, machine learning, and data visualization.
Written by Perlego with AI assistance
10 Key excerpts on "Big Data Challenges"
- eBook - ePub
Big Data
Concepts, Warehousing, and Analytics
- Maribel Yasmina Santos, Carlos Costa (Authors)
- 2022 (Publication Date)
- River Publishers (Publisher)
Defining Big Data by the inadequacy of traditional technologies is relatively dangerous, as advancements are constantly being made (e.g., quantum computing); furthermore, such a definition implies that Big Data has always existed and will continue to exist (Ward & Barker, 2013). The current definitions of Big Data are relatively dependent on the techniques and technologies for collecting, storing, processing, and analyzing it. These techniques and technologies will evolve over time, and we need to learn to adapt to those changes. Analyzing new technological trends that may benefit business and reconsidering data strategies will always be important. Currently, a new paradigm shift is happening, and it does not need to impact all organizations. Nevertheless, scientific progress in this field will continue to provide organizations with evaluated and efficient techniques and technologies for addressing data-driven environments. The state of the art regarding Big Data techniques and technologies will be presented later in this book. The next section presents the challenges regarding Big Data.
2.3. Big Data Challenges
This section presents several challenges associated with handling Big Data, including general dilemmas, challenges in the Big Data life cycle, issues concerning security, privacy, and monitoring, as well as the changes organizations may need to undergo. These challenges also serve to identify relevant research topics across various fields.
2.3.1. Big Data General Dilemmas
General dilemmas may include challenges such as the lack of consensus and rigor in Big Data’s definition, models, and architectures. For example, M. Chen et al. (2014) claim that the concept of Big Data often has more to do with commercial speculation than with scientific research. The authors also mention the lack of standardization in Big Data, such as data quality evaluation and benchmarking. In fact, the lack of standard benchmarks to compare different technologies is seriously aggravated by the constant technological evolution of Big Data environments (Baru, Bhandarkar, Nambiar, Poess, & Rabl, 2013). How to take full advantage of Big Data in areas such as scientific research, engineering, medicine, finance, education, government, retail, transportation, or telecommunications remains an open question (M. Chen et al., 2014). Discussions about how to select the most appropriate data from several sources, or how to estimate their value, are major issues (Chandarana & Vijayalakshmi, 2014). Another issue regularly discussed is whether Big Data represents the population better than a small dataset does (Fisher, DeLine, Czerwinski, & Drucker, 2012). The answer obviously depends on the context, but the authors make the important point that one should not assume that more data is always better.
- eBook - ePub
- Amit Kumar Tyagi, Ajith Abraham (Authors)
- 2022 (Publication Date)
- Academic Press (Publisher)
To boost data processing and information discovery in wide-scale automation systems, a variety of technologies such as computational intelligence and big data may be combined. The volume of data obtained by different applications around the world, across a vast range of fields, is projected to double every two years. These data are useless until they are analyzed to obtain valuable facts, which necessitates methods that make large-scale data processing easier. The advancement of efficient machines has made it easier to bring these methods into use, resulting in automated processes. For high-performance, large-scale data analysis, such as leveraging parallelism in existing and future computing architectures for data mining, transforming data into information is by no means a simple job. Furthermore, these data may include various levels of ambiguity. Many different models have been found useful in representing data, including fuzzy sets, rough sets, soft sets, neural networks, their generalizations, and hybrid models created by integrating two or more of these models; such models are often very useful for research. Big data is often reduced to just the critical characteristics required for a specific analysis or application environment, and strategies for such reduction have been established. Missing values are common in the data obtained: before analysis, these values must be imputed or the tuples containing them must be removed. The latter solution results in data loss, so it is not recommended. More generally, these new challenges can jeopardize, if not worsen, the output, reliability, and scalability of dedicated data-intensive computing systems. This raises a number of research concerns in the business and research sectors, such as accurately collecting and accessing data. Another challenge is rapid processing while maintaining high efficiency and throughput, as well as saving data effectively for potential use. Programming for large data processing is also a significant challenge: the need for expressing device data access specifications and constructing programming-language abstractions to leverage parallelism is urgent. Furthermore, machine learning principles and techniques are gaining traction among researchers as a means of achieving practical results from these ideas. Data analysis, algorithm implementation, and optimization have become the subject of machine learning research for large data. Many of the machine learning tools for big data that have recently been introduced need significant changes to be adopted. We argue that, although each method has its own set of benefits and drawbacks, more effective tools for coping with big data issues can be created. The effective tools being built must be able to deal with noisy and imbalanced data, complexity and variance, and missing values.
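The trade-off the excerpt describes (removing tuples loses data, while imputation preserves them) can be made concrete with a short sketch. This is a minimal illustration in Python with pandas over a hypothetical toy table, not code from the chapter itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with gaps, standing in for collected big data.
df = pd.DataFrame({
    "sensor_id": [1, 2, 3, 4, 5],
    "temperature": [21.5, np.nan, 23.1, np.nan, 22.0],
    "humidity": [40.0, 42.5, np.nan, 41.0, 39.5],
})

# Option 1: remove tuples with missing values. Simple, but it discards data,
# which the excerpt warns against.
dropped = df.dropna()

# Option 2: impute the missing values (here with column means), keeping rows.
imputed = df.fillna(df.mean(numeric_only=True))

print(f"rows kept after dropping: {len(dropped)} of {len(df)}")
print(imputed)
```

At big-data scale the same choice appears inside distributed frameworks, but the underlying trade-off, lost rows versus estimated values, is unchanged.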
5. Various sectors in data analytics
Data is a valuable resource that comes in a variety of forms. Big data does not have a single meaning, although it is discussed in a variety of ways. Big data is a concept used to characterize the rapid growth in data flows in different industries that is too large to handle with current database technologies and techniques. Big data is often seen as frightening, notwithstanding the fact that it represents an explosion in the field of intelligence. It enables multiple analytics to be performed, which can affect economic development, create jobs, and improve productivity in comparison to other organizations. This massive amount of information is often characterized as three-dimensional, namely volume, velocity, and variety, with some also including veracity. Data volume is influenced by a number of variables: it may be transactional data that has been collected over time or data flowing via social media. The cumulative quantity of data inside an entity is referred to as the data volume. The amount of data produced in an enterprise grows at an erratic pace, which may range from petabytes to zettabytes depending on the organization's output activities and type. Velocity refers to the rate at which data is exchanged within an organization or in motion. The rate at which an enterprise generates, processes, and analyzes data is typically increasing; it affects the production and distribution of data from one stage to another and is frequently time sensitive.
- eBook - PDF
Guide to Cloud Computing for Business and Technology Managers
From Distributed Computing to Cloudware Applications
- Vivek Kale (Author)
- 2014 (Publication Date)
- Chapman and Hall/CRC (Publisher)
The answer to these challenges is a scalable, integrated computer systems hardware and software architecture designed for parallel processing of big data computing applications. This chapter explores the challenges of big data computing.
21.1.1 What Is Big Data?
Big data can be defined as volumes of data available in varying degrees of complexity, generated at different velocities and varying degrees of ambiguity, which cannot be processed using traditional technologies, processing methods, algorithms, or any commercial off-the-shelf solutions. Data defined as big data include weather, geospatial, and GIS data; consumer-driven data from social media; enterprise-generated data from legal, sales, marketing, procurement, finance, and human-resources departments; and device-generated data from sensor networks, nuclear plants, x-ray and scanning devices, and airplane engines.
21.1.1.1 Data Volume
The most interesting data for any organization to tap into today are social media data. The amount of data generated by consumers every minute provides extremely important insights into choices, opinions, influences, connections, brand loyalty, brand management, and much more. Social media sites provide not only consumer perspectives but also competitive positioning, trends, and access to communities formed by common interest. Organizations today leverage social media pages to personalize marketing of products and services to each customer. Every enterprise has massive amounts of e-mails that are generated by its employees, customers, and executives on a daily basis. These e-mails are all considered an asset of the corporation and need to be managed as such. After Enron and the collapse of many audits in enterprises, the US government mandated that all enterprises should have clear life-cycle management of e-mails and that e-mails should be available and auditable on a case-by-case basis.
- eBook - PDF
- Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan (Authors)
- 2016 (Publication Date)
- Chapman and Hall/CRC (Publisher)
All of these factors suggest a kind of ubiquity of data, but also a functionally vague understanding, which is situationally determined; because of that, it can be deployed in many contexts, has many advocates, and can be claimed by many as well. Partly because of this context-sensitive definition of the concept, big data is by no means a phenomenon of our time or a novelty, but has a long genealogy that goes back to the earliest civilizations. Some aspects of this phenomenon will be discussed in the following sections. In addition, we will show in this chapter how big data embodies a conception of data science at least at two levels. First of all, data science is the technical-scientific discipline specialized in managing the multitude of data: collect, store, access, analyze, visualize, interpret, and protect. It is rooted in computer science and statistics; computer science is traditionally oriented toward data structures, algorithms, and scalability, and statistics is focused on analyzing and interpreting the data. In particular, we may identify here the triptych of database technology/information retrieval, computational intelligence/machine learning, and finally inferential statistics. The first pillar concerns database/information retrieval technology. Both have been core disciplines of computer science for many decades. Emerging from this tradition in recent years, notably researchers at Google and Yahoo have been working on techniques to cluster many computers in a data center, making data accessible and allowing for data-intensive calculations: think, for example, of BigTable, the Google File System, a programming paradigm such as MapReduce, and the open-source variant Hadoop. The paper of Halevy precedes this development as well. The second pillar relates to intelligent algorithms from the field of computational intelligence (machine learning and data mining).
- eBook - PDF
- Rajendra Akerkar (Author)
- 2013 (Publication Date)
- Chapman and Hall/CRC (Publisher)
Although the management of huge and growing volumes of data has been a challenge for many years, no long-term solutions have been found so far. The term “Big Data” initially referred to volumes of data beyond the capabilities of current database technologies; consequently, “Big Data” problems were those presenting a combination of large volumes of data to be treated in a short time. When data have to be collected and stored at an impressive rate, it is clear that the biggest challenge is not only storage and management, but also analysis, the extraction of meaningful value, and the deductions and actions taken in reality. Big Data problems were mostly related to the presence of unstructured data, that is, information that either does not have a default schema/template or does not adapt well to relational tables; it is therefore necessary to turn to analysis techniques for unstructured data to address these problems. Recently, Big Data problems have been characterized by a combination of the so-called 3Vs: volume, velocity, and variety; a fourth V, variability, has since been added.
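The schema problem mentioned above can be made tangible: records without a fixed template do not fit straight into relational tables, but they can be flattened into one sparse, table-like form. The following is a minimal sketch in Python over hypothetical semi-structured records; the field names are invented for illustration and are not from the chapter.

```python
import json

# Hypothetical semi-structured records: no fixed schema, fields vary per record.
raw = [
    '{"user": "ann", "text": "loving the new phone", "tags": ["tech"]}',
    '{"user": "bob", "rating": 4, "location": {"city": "Oslo"}}',
    '{"user": "eve", "text": "service was slow"}',
]

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys so heterogeneous records
    can share one sparse, table-like representation."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

rows = [flatten(json.loads(line)) for line in raw]
columns = sorted({col for row in rows for col in row})
print(columns)                                     # union of all observed fields
print([row.get("location.city") for row in rows])  # sparse column access
```

This is the basic move behind schema-on-read stores: keep the native format, derive structure only at analysis time.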
- eBook - PDF
Computational Social Science in the Age of Big Data
Concepts, Methodologies, Tools, and Applications
- Martin Welker, Cathleen M. Stützer, Marc Egger (Authors)
- 2018 (Publication Date)
- Herbert von Halem Verlag (Publisher)
This has even led to research that tried to define Big Data by using techniques like natural language processing to find common keyword usage among Big Data publications (de Mauro et al. 2016). The findings show Big Data to be associated with information extraction, process-generated data, the size of datasets, but also specific technical implementations and assumptions of societal relevance. Following an extensive literature overview, de Mauro et al. (2016: 129) arrive at their proposal for a more inclusive definition: »Big Data is the Information asset characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value.« Definitions like the preceding one are meant to be as inclusive as possible with regard to already existing research, which is also the reason why they end up being very vague. However, this is only a problem if one tries to define an object in an absolute sense. Here the definition used to describe Big Data is not assumed to be a correct description outside of this actual argument. It is used simply to illustrate the difference between two sets of problems, Big and Medium Data, and ways to address them. Within the confines of this chapter the term ›Big Data‹ refers to the problem of making analytical inquiries on a dataset under the constraints of processing time and memory size. Therefore, Big Data methods and technologies are those that functionally and foremost address these problems. This definition is in line with the original intention of the technology, mainly the MapReduce framework pioneered by Google (Dean/Ghemawat 2008). The general idea is to partition the data using a mapping of keys to values, which allows the data to be distributed across several disks while the individual data points are transformed in parallel processes.
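The map/partition/reduce flow just described can be sketched on a single machine. This is a minimal Python illustration of the idea, not the Google or Hadoop implementation; `map_words` and `reduce_counts` are invented stand-ins for user-supplied map and reduce functions.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_words(line):
    # Map phase: emit (key, value) pairs, here (word, 1) for a word count.
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(item):
    # Reduce phase: combine all values that share one key.
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    lines = [
        "big data needs big tools",
        "map reduce partitions data by key",
    ]
    with Pool() as pool:
        # Map in parallel over the input partitions.
        mapped = pool.map(map_words, lines)

        # Shuffle: group values by key (the partitioning step).
        groups = defaultdict(list)
        for pairs in mapped:
            for word, count in pairs:
                groups[word].append(count)

        # Reduce in parallel over the grouped keys.
        print(dict(pool.map(reduce_counts, list(groups.items()))))
```

Real frameworks add what this sketch omits: partitioning across machines and disks, a distributed shuffle, and fault tolerance.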
- eBook - ePub
- Kathy A. Mills (Author)
- 2019 (Publication Date)
- Taylor & Francis (Publisher)
3 Challenges of big data for qualitative researchers
It has been recently claimed that “the world’s most valuable resource is no longer oil, but data” (The Economist, 2017). The growing big data literature and research points to both the challenges and possibilities of using this so-called “digital oil” (Yi, Liu, Liu, & Jin, 2014) for research and other academic or corporate purposes (Sivarajah, Kamal, Irani, & Weerakkody, 2017), while specific directions and applications for qualitative research are still emergent (Mills, 2017). In recent times, new benefits of big data analytics (BDA) have been advanced and demonstrated in relation to text mining in the humanities (Rockwell & Berendt, 2016), sentiment analysis of tweets (Yu & Wang, 2015), and visual analytics in undergraduate health education (Vaitsis, Nilsson, & Zary, 2014). Online software has been developed to generate a range of machine-enabled data on students’ written compositions for educational practice and research applications (Smith, Cope, & Kalantzis, 2017). Methods and software have been proposed in social semiotics to integrate qualitative multimodal analysis with data mining and visualization (O’Halloran, Tan, Pham, Bateman, & Vande Moere, 2018). In educational research, a wide range of writing and learning environments and online tools are used to capture and collate learning process data (Knight, Shum, & Littleton, 2014). When purposefully realizing these potentials, researchers are also navigating the logistical challenges, costs, and responsibilities. Some of the commonly observed difficulties include the complexities of integrating multiple data sources (Gandomi & Haider, 2015), a lack of knowledge, and an insufficient number of skilled personnel or data scientists (Kim, Trimi, & Chung, 2014). Others have pointed to difficulties keeping pace with new data infrastructure requirements (Barbierato, Gribaudo, & Iacono, 2014), such as scalable and flexible technologies to manage substantial amounts of data, whether textual or multimedia (Sivarajah et al., 2017). In terms of data management, there are new issues for ownership, authenticity, privacy, security, data governance, and data and information sharing (Barnaghi, Sheth, & Henson, 2013; Sivarajah et al., 2017). This chapter extends recent debates about the challenges and potentials of BDA for qualitative researchers who work with digital data.
- eBook - PDF
Big Data
Storage, Sharing, and Security
- Fei Hu (Author)
- 2016 (Publication Date)
- Auerbach Publications (Publisher)
Such a disclosure could lead to competitors taking strategic moves. This emphasizes the fact that preserving only anonymity may not be sufficient in some cases.
12.4 Privacy Challenges
Providing users with quality recommendations using big data techniques is a seemingly conflicting objective with the equally important goal of privacy preservation. Even a small amount of personal information may lead to identifying a user with high probability in the presence of side-channel external data [24]. Currently, systems that provide personalization function as a black box from the user’s perspective. Users do not know what is really collected about them, what is inferred about them by the system, with which other data sources their private data may be combined, or what their benefits of disclosure are. Furthermore, faced with the multitude and growing number of external data sources, even limited disclosure of information to a given system may reveal enough about users for the same system to be able to infer knowledge they would have otherwise preferred to remain private. We list below (Sections 12.4.1 through 12.4.11) some of the main categories of challenges that users face concerning their privacy in existing systems using big data techniques for personalization.
12.4.1 Transparency
Users are often unable to monitor and follow precisely what information about them the system has collected. For example, it is common knowledge that different services, such as Google, Facebook, Amazon, and so on, use big data analytics to provide personalization in many of their services. However, it is not always transparent to users what information has been collected and inferred, and how it is used and by whom. Even if these services wish to provide more transparency, it is often technically challenging to provide tools to visualize complex processing and manipulation (and in particular aggregation) of user information.
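The excerpt’s earlier point, that anonymity alone may not be sufficient, is easy to demonstrate: if the remaining quasi-identifiers single out a record, external side-channel data can re-identify it. Below is a minimal sketch in Python of a k-anonymity check; the records, field names, and the `k_anonymity` helper are invented for illustration, not taken from the chapter.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but quasi-identifiers
# (zip code, age bracket, gender) remain and can be joined with external data.
records = [
    {"zip": "35294", "age": "30-40", "gender": "F"},
    {"zip": "35294", "age": "30-40", "gender": "F"},
    {"zip": "35294", "age": "40-50", "gender": "M"},
    {"zip": "10001", "age": "20-30", "gender": "M"},
]

def k_anonymity(rows, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns.

    k == 1 means at least one record is unique and thus re-identifiable
    by anyone holding matching external (side-channel) data.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(classes.values())

print(k_anonymity(records, ["zip", "age", "gender"]))  # -> 1: not anonymous
```

A dataset is usually only considered anonymized in this sense when k is comfortably above 1 for every plausible set of quasi-identifiers, which is exactly what grows harder as external data sources multiply.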
- Richa Tiwari (Author)
- 2023 (Publication Date)
- Society Publishing (Publisher)
3.6.3. Getting Meaningful Insights Through the Use of Big Data Analytics
It is imperative for business organizations to gain important insights from Big Data analytics, and it is important that only the appropriate department has access to this information. One big challenge faced by companies in Big Data analytics is mending this wide gap in an effective manner.
3.6.4. Getting Voluminous Data into the Big Data Platform
It is not surprising that data grows with every passing day, which means that business organizations need to handle a vast amount of data on a daily basis. The amount as well as the variety of data available these days can overwhelm any data engineer. This is why it is considered important to make data accessibility easy and convenient, especially for brand owners and managers.
3.6.5. Uncertainty of the Data Management Landscape
With the rise of Big Data, new technologies and companies are being developed every day. Nevertheless, a major challenge faced by companies in Big Data analytics is finding out which technology will best suit them without introducing new problems and potential risks.
3.6.6. Data Storage and Quality
Business organizations are growing at a fast pace, and with the tremendous growth of companies and large business organizations comes an increase in the amount of data being produced. The storage of such a large amount of data is becoming a real challenge. Popular data storage options such as data lakes and warehouses are commonly used to gather and store large quantities of unstructured as well as structured data in its native format.
- eBook - PDF
Large Scale and Big Data
Processing and Management
- Sherif Sakr, Mohamed Gaber (Authors)
- 2014 (Publication Date)
- Auerbach Publications (Publisher)
With the significant benefits in terms of greater flexibility, performance, and scalability, clouds are here to stay. Similarly, advances in Big Data-processing technology will reap numerous benefits. However, as many of our everyday computing services move to the cloud, we do need to ensure that the data and computation will be secure and trustworthy. In this chapter, we have outlined the major research questions and challenges in cloud and Big Data security and privacy. The fundamental nature of clouds introduces new security challenges. Today’s clouds are not secure, accountable, or trustworthy. Many open problems need to be resolved before major users will adopt clouds for sensitive data and computations. For wider adoption of clouds and Big Data technology in critical areas such as business and healthcare, it is vital to solve these problems. Solving the security issues will popularize clouds further, which in turn will lower costs and have a broader impact on our society as a whole.
AUTHOR BIOGRAPHY
Dr. Ragib Hasan is a tenure-track assistant professor at the Department of Computer and Information Sciences at the University of Alabama at Birmingham (UAB). With a key focus on practical computer security problems, Hasan explores research on Big Data, cloud security, mobile malware security, secure provenance, and database security. Hasan is the founder of the SECuRE and Trustworthy Computing Lab (SECRETLab, http://secret.cis.uab.edu) at UAB. He is also a member of the UAB Center for Information Assurance and Joint Forensics Research. Before joining UAB in the Fall of 2011, Hasan was an NSF/CRA Computing Innovation Fellow and assistant research scientist at the Department of Computer Science, Johns Hopkins University. He received his PhD and MS degrees in computer science from the University of Illinois at Urbana-Champaign in October 2009 and December 2005, respectively.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.