Blockchain for Big Data

Shaoliang Peng

eBook - ePub

  1. 224 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About This Book

In recent years, the fast-paced development of social informatisation and networking has led to explosive growth in data. Many varieties of big data have emerged, encouraging researchers to support business decisions by analysing this data. However, many challenges remain, especially concerning data security and privacy.

Big data security and privacy threats permeate every link of the big data industry chain, such as data production, collection, processing, and sharing, and the causes of risk are complex and interwoven. Blockchain technology has been highly praised and recognised for its decentralised infrastructure, anonymity, security, and other characteristics, and it will change the way we access and share information. In this book, the author demonstrates how blockchain technology can overcome some limitations in big data technology and can promote the development of big data while also helping to overcome security and privacy challenges.

The author investigates research into, and the application of, blockchain technology in the field of big data and assesses the attendant advantages and challenges while discussing possible future directions for the convergence of blockchain and big data. After mastering the concepts and technologies introduced in this work, readers will understand the technical evolution of, and the similarities and differences between, blockchain and big data technology, allowing them to apply both further in their own development and research.

Author:
Shaoliang Peng is the Executive Director and Professor of the College of Computer Science and Electronic Engineering, National Supercomputing Centre of Hunan University, Changsha, China. His research interests are high-performance computing, bioinformatics, big data, AI, and blockchain.

Information

Publisher: CRC Press
Year: 2021
ISBN: 9781000432831

CHAPTER 1

The Development of Big Data

1.1 The Concept of Big Data

Gartner defines big data as high-volume, high-growth-rate, and diversified information assets that require new processing modes to enable stronger decision-making, insight discovery, and process optimisation (Ward & Barker, 2013).
McKinsey defines big data as data collections that cannot be captured, stored, managed, and analysed by traditional database software tools within an acceptable time.
Modern society is an information-based and digital society. With the rapid development of the Internet, the Internet of Things, and cloud computing, data is flooding the whole world, making it a new resource that people must use rationally, efficiently, and fully. The volume of data is increasing exponentially, and the structure of data is becoming increasingly complex, which gives 'big data' deeper connotations than ordinary 'data'.
The volume of data in fields such as astronomy, high-energy physics, biology, computer simulation, Internet applications, and e-commerce has shown rapid growth. According to the International Data Corporation (IDC), data on the Internet grows by more than 50% per year, doubling every two years, and more than 90% of the world's data has been generated in recent years. Data does not simply mean the information that people publish on the Internet; the world's industrial equipment, cars, and meters carry countless sensors that measure and transmit information about position, movement, vibration, temperature, humidity, and even changes in air quality at any time, also generating huge amounts of data.
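As a back-of-the-envelope check on such exponential growth figures (the arithmetic here is illustrative, not from the IDC report): doubling every two years corresponds to an annual growth rate of 2^(1/2) − 1 ≈ 41%, while growth of 50% per year implies a doubling time of ln 2 / ln 1.5 ≈ 1.7 years, so the two statements should be read as loose approximations of the same exponential trend rather than as exact equivalents.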
Dr. Jim Gray, the famous database expert, author of Transaction Processing: Concepts and Techniques, and Turing Award winner, observed that in the history of human scientific research there have been three paradigms: the Empirical, the Theoretical, and the Computational. Today, with the growing volume of data and the increasing complexity of data structures, these three paradigms can no longer meet the needs of scientific research in new fields, so Gray proposed a fourth paradigm, a new data-driven research method, namely Data Exploration, to guide and renew scientific research across fields.
The size of data is not the only indicator of big data. The characteristics of big data can be summarised by the 4Vs: volume, velocity, variety, and value.

1.1.1 Large Amount of Data

As we enter the information society, data grows naturally, and its production does not depend on human will. From 1986 to 2010, the amount of global data increased 100-fold, and the growth rate will be even faster in the future. We are living in an era of 'data explosion'. Today, only 25% of the world's devices are connected to the Internet, and about 80% of connected devices are computers and mobile phones. In the near future, more users will come online, and devices such as automobiles, televisions, household appliances, and manufacturing equipment will also be connected to the Internet. With the rapid development of Web 2.0 and the mobile Internet, people can publish all kinds of information, including blogs, microblogs, and WeChat posts, anytime and anywhere. In the future, with the promotion and popularisation of the Internet of Things, sensors and cameras will be everywhere in our work and life, automatically generating large amounts of data every moment.

1.1.2 Variety of Data Types

Big data comes from many sources, with new data being generated continuously by scientific research, enterprise applications, and web applications. Biological, transportation, medical, telecom, electric power, and financial big data are all showing 'blowout' growth, with volumes jumping from the TB level to the PB level.
The data types of big data are rich, including structured and unstructured data. The former accounts for about 10% and mainly refers to data stored in relational databases; the latter accounts for about 90% and comes in many forms, including email, audio, video, WeChat, microblogs, location information, link information, mobile phone call records, and network logs.
Such a wide variety of heterogeneous data brings new challenges and opportunities to data processing and analysis technology.
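To make the structured/unstructured divide concrete, the following minimal Python sketch parses raw web-server log lines, one of the unstructured types listed above, into structured records that could be loaded into a relational table. The log format, pattern, and field names are illustrative assumptions, not taken from this chapter.

import re
from datetime import datetime

# Illustrative pattern for a common web-server log layout (an assumption,
# not a format prescribed by this chapter).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_line(line):
    """Turn one raw (unstructured) log line into a structured record."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # text that does not fit the schema stays unstructured
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record

raw = '203.0.113.7 - - [15/Mar/2021:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5120'
print(parse_line(raw))

Each parsed record now has a fixed schema (ip, time, request, status, size) suitable for a relational table, whereas the raw line could only be stored as opaque text.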

1.1.3 Fast Processing Speed

In the era of big data, data is generated very quickly. In one minute of Web 2.0 activity, Sina can generate 20,000 microblog posts, Twitter 100,000 tweets, Apple 47,000 application downloads, Taobao 60,000 product sales, Renren 300,000 visits, Baidu 900,000 search queries, and Facebook 6 million page views. The famous Large Hadron Collider (LHC) produces about 600 million collisions per second, generating roughly 700 MB of data per second, with thousands of computers analysing these collisions.
Many applications in the era of big data need to deliver real-time analysis of rapidly generated data to guide production and everyday life. As a result, data processing and analysis must typically finish within seconds, which is fundamentally different from traditional data mining, which does not require real-time results.
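A minimal Python sketch of this second-level processing style is shown below: events arrive continuously, and a sliding one-minute window is summarised every second instead of being batched for offline mining. The event stream is simulated, and all names and rates are illustrative.

import random
import time
from collections import deque

WINDOW_SECONDS = 60
events = deque()  # (timestamp, value) pairs inside the current window

def observe(value, now):
    """Record an event and evict everything older than the window."""
    events.append((now, value))
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()

def summarise():
    """Return event count and mean value over the last minute."""
    if not events:
        return 0, 0.0
    total = sum(v for _, v in events)
    return len(events), total / len(events)

for tick in range(5):  # five one-second reporting intervals
    for _ in range(random.randint(100, 200)):  # simulated burst of events
        observe(random.random(), time.time())
    count, mean = summarise()
    print(f"last 60 s: {count} events, mean value {mean:.3f}")
    time.sleep(1)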

1.1.4 Low-Value Density

As attractive as it may look, big data has a much lower value density than the data already curated in traditional relational databases. In the era of big data, valuable information is scattered thinly throughout the mass of data. In the case of an accident or theft, only a small piece of video recording of the event will be valuable; yet to obtain it, we have to invest heavily in surveillance equipment, network equipment, and storage devices, and consume a great deal of power and storage space to keep the cameras' continuous monitoring data.
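The surveillance example can be put into numbers with a short Python sketch. Here the 'motion detector' is simulated with a random flag (in practice it would be a vision model), and all constants are illustrative; the point is how small a fraction of the continuously recorded frames carries any value.

import random

FRAMES_PER_SECOND = 25
SECONDS_PER_DAY = 24 * 60 * 60
CONTEXT = 5 * FRAMES_PER_SECOND  # keep five seconds around each event

def motion_detected(frame_index):
    return random.random() < 1e-5  # rare events, roughly 20 per day

kept = set()
for frame in range(FRAMES_PER_SECOND * SECONDS_PER_DAY):
    if motion_detected(frame):
        kept.update(range(max(0, frame - CONTEXT), frame + CONTEXT))

total = FRAMES_PER_SECOND * SECONDS_PER_DAY
print(f"kept {len(kept)} of {total} frames "
      f"({100 * len(kept) / total:.4f}% of the stored data is 'valuable')")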

1.2 The Past and Present of Big Data

At a workshop of the 11th International Joint Conference on Artificial Intelligence, held in Detroit, Michigan, USA in 1989, the concept of 'Knowledge Discovery in Databases (KDD)' was put forward for the first time. In 1995, the first International Conference on Knowledge Discovery and Data Mining was held, and as participation grew, the KDD conference developed into an annual meeting. The fourth conference, held in New York in 1998, hosted not only academic discussions but also product demonstrations by more than 30 software companies: Intelligent Miner, developed by IBM to provide data mining solutions; Clementine, a decision-tree-based data mining tool from SPSS; the Darwin data mining suite from Oracle; Enterprise Miner from SAS; MineSet from SGI; and so on.
In the academic community, Nature launched a special issue on 'Big Data' as early as 2008, examining big data research from the perspectives of Internet technology, supercomputing, and other fields.
Economic interests have become the main driving force, and multinational giants such as IBM, Oracle, Microsoft, Google, Amazon, Facebook, Teradata, EMC, and HP have become more competitive through the development of big data technology. In 2009 alone, Google contributed $54 billion to the U.S. economy through its big data business. Since 2005, IBM has invested $16 billion in more than 30 big data-related acquisitions, keeping its performance stable and growing rapidly; in 2012, IBM's share price broke the $200 mark, having tripled in three years. eBay used data mining to calculate the return of each advertising keyword precisely; since 2007, its advertising expenses have decreased by 99%, while the share of top sellers in total sales has risen to 32%. In 2011, Facebook unveiled, for the first time, a new data processing and analysis platform, Puma; by differentiating and optimising multiple data processing stages, it reduced the data analysis cycle from 2 days to less than 10 seconds, an efficiency gain of tens of thousands of times, and large-scale commercial applications built on it have flourished since then (Wei-Pang & Ntafos, 1992).
In March 2012, the Obama administration announced the 'Big Data Research and Development Initiative', which aims to improve people's ability to acquire knowledge from massive and complex data and to develop the core technologies needed for collecting, storing, retaining, managing, analysing, and sharing massive data. Big data has thus become the focus of information technology, after the integrated circuit and the Internet.
People have never stopped analysing and mining data, but the concept of big data has risen only in recent decades. Its formation is the result of many factors acting together; had any one of them been insufficiently developed, big data would not have found such widespread and intensive application.

1.3 Technical Support of Big Data

Information technology needs to solve the three core problems of information storage, information transmission, and information processing, and the continuous progress of human society in the field of information technology provides technical support for the advent of the big data era.

1.3.1 Storage Device Capacity Is Increasing

In recent years, computer hardware technology has developed rapidly, but information and data have grown just as fast, and hardware storage devices have constantly improved to meet storage requirements. Nowadays, fingernail-sized storage cards offer several, or even tens of, gigabytes of capacity, which would have been unthinkable in the past. Hardware storage devices include hard disks, optical discs, USB flash drives, and other mobile storage devices. Devices built from solid-state electronic storage chips arranged in a matrix are collectively referred to as solid-state storage devices; they offer many advantages that traditional mechanical devices lack, including their low-power, low-carbon nature, which is why solid-state storage has become the mainstream storage technology.
In 1956, IBM produced the world's first commercial hard disk, with a capacity of only 5 MB; it was not only expensive but also the size of a refrigerator. In 1971, Alan Shugart (who later founded Seagate) led the IBM team that launched the 8-inch floppy disk, a storage device that was no longer so huge; at the earliest stage, however, its capacity was only 81 KB, and a long document might need several floppy disks to copy. Then came 5.25-inch and 3.5-inch floppy disks, with capacities of no more than 2 MB. In 1973, the American inventor James Russell built the first compact disc recording prototype, and optical storage entered the stage of history: CD, MD, VCD, DVD, HD-DVD, and other storage technologies followed, and capacities moved from the MB era to the GB era. In terms of storage cost, an HP tape cartridge can hold up to 15 TB and costs only about 900 yuan ($139.88 USD), or 60 yuan ($9.33 USD) per TB. Cheap, high-performance storage devices not only provide a large amount of space but also greatly reduce the cost of keeping data.
Data volume and storage device capacity complement and promote each other. On the one hand, as data is generated continuously, the amount to be stored keeps increasing, which places higher demands on storage capacity and pushes manufacturers to build larger-capacity products to meet market demand; on the other hand, larger-capacity storage devices further accelerate the growth of data volume. When storage was expensive, data whose value was not clearly evident was often discarded on cost grounds. However, as the price of a unit of storage space keeps falling, people tend to save more data, hoping to mine value from it with more advanced analysis tools at some point in the future.

1.3.2 Increasing Network Bandwidth

In the 1950s, communication researchers recognised the need to allow communication between users of different computers and communication networks. This led to research on decentralised networks, queuing theory, and packet switching. In 1969, ARPANET, created by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense, triggered technological progress and became the centre of Internet development. In 1986, the National Science Foundation of the United States established NSFNET, a backbone network based on TCP/IP technology, to interconnect supercomputer centres and academic institutions. The speed of NSFNET increased from 56 kbit/s to T1 (1.5 Mbit/s) and finally to T3 (45 Mbit/s).
Internet access technology has also developed very rapidly. Bandwidth has grown from the initial 14.4 kbit/s to today's 100 Mbit/s or even 1 Gbit/s; access has diversified from single telephone dial-up to a variety of wired and wireless methods; and access terminals have shifted towards mobile devices.
In the era of big data, information transmission no longer encounters the bottlenecks and constraints of the early stages of network development.
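A rough Python calculation shows how much this bandwidth growth matters for moving big data. The 1 TB payload and the link speeds are illustrative round numbers drawn from the generations mentioned above.

# Rough transfer-time arithmetic for 1 TB of data across several link speeds.
PAYLOAD_BITS = 1e12 * 8  # 1 TB expressed in bits

links = {
    "56 kbit/s dial-up": 56e3,
    "T1 (1.5 Mbit/s)": 1.5e6,
    "T3 (45 Mbit/s)": 45e6,
    "100 Mbit/s broadband": 100e6,
    "1 Gbit/s fibre": 1e9,
}

for name, bits_per_second in links.items():
    days = PAYLOAD_BITS / bits_per_second / 86400
    print(f"{name:>22}: {days:10.2f} days")

At 56 kbit/s the transfer would take over four years; at 1 Gbit/s it takes a couple of hours, which is what makes routine movement of TB-scale datasets feasible.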

1.3.3 CPU Processing Capacity Increased Significantly

In 1971, Intel released its first 4-bit microprocessor, the 4004, with a maximum frequency of only 108 kHz. Soon after, Intel introduced the 8008, which developed into the 8080 in 1974, taking the CPU into the second generation of microprocessors, all of which adopted NMOS technology. Only four years later, the 8086 was born: the world's first 16-bit microprocessor and the starting point of the third generation. In 2000, the Pentium 4 arrived, and CPU frequency reached the GHz level; in 2004, Intel built a 3.4 GHz processor. But the process node was only 90 nm, and the consequence of ultra-high frequency was enormous heat generation and power consumption, the ...
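The wall being described here can be summarised by the standard dynamic-power approximation from CMOS design (a textbook relation, not a formula from this chapter): P ≈ C · V² · f, where C is the switched capacitance, V the supply voltage, and f the clock frequency. Since raising f in practice also requires raising V, power and heat grow superlinearly with frequency, which is why the industry eventually favoured multi-core designs over ever-higher clock rates.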
