The Data Industry
eBook - ePub

The Data Industry

The Business and Economics of Information and Big Data

  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android

About this book

Provides an introduction of the data industry to the field of economics

This book bridges the gap between economics and data science to help data scientists understand the economics of big data, and enable economists to analyze the data industry. It begins by explaining data resources and introduces the data asset. This book defines a data industry chain, enumerates data enterprises' business models versus operating models, and proposes a mode of industrial development for the data industry. The author describes five types of enterprise agglomerations, and multiple industrial cluster effects. A discussion on the establishment and development of data industry related laws and regulations is provided. In addition, this book discusses several scenarios on how to convert data driving forces into productivity that can then serve society. This book is designed to serve as a reference and training guide for data scientists, data-oriented managers and executives, entrepreneurs, scholars, and government employees.

  • Defines and develops the concept of a "Data Industry," and explains the economics of data to data scientists and statisticians
  • Includes numerous case studies and examples from a variety of industries and disciplines
  • Serves as a useful guide for practitioners and entrepreneurs in the business of data technology

The Data Industry: The Business and Economics of Information and Big Data is a resource for practitioners in the data science industry, government, and students in economics, business, and statistics.

CHUNLEI TANG, Ph.D., is a research fellow at Harvard University. She is the co-founder of Fudan's Institute for Data Industry and proposed the concept of the "data industry". She received a Ph.D. in Computer and Software Theory in 2012 and a Master of Software Engineering in 2006 from Fudan University, Shanghai, China.

Information

Publisher
Wiley
Year
2016
Print ISBN
9781119138402
eBook ISBN
9781119138426

CHAPTER 1
WHAT IS DATA INDUSTRY?

The next generation of information technology (IT) is an emerging and promising industry. But, what's truly the “next generation of IT”? Is it the next generation mobile networks (NGMN), Internet of Things (IoT), high-performance computing (HPC), or is it something else entirely? Opinions vary widely.
From the academic perspective, the debates, or arguments, over specific and sophisticated technical concepts are merely hype. How so? Let's take a quick look at the essence of information technology reform (IT reform) – digitization. Technically, it is a process that stores “information” that is generated in the real world from the human mind in digital form as “data” into cyberspace. No matter what types of new technologies emerge, the data will stay the same. As the British scholar Viktor Mayer-Schonberger once said [1], it's time to focus on the “I” in the IT reform. “I,” as information, can only be obtained by analyzing data. The challenge we expect to face is the burst of a “data tsunami,” or “data explosion,” so data reform is already underway. The world of “being digital,” as advocated some time ago by Nicholas Negroponte [2], has been gradually transformed to “being in cyberspace.”1
With the “big data wave” touching nearly all human activities, not only are academic circles resolved to change the way of exploring the world as the “fourth paradigm”2 but the industrial community is also looking forward to enjoying profits from “inexhaustible” data innovations. Admittedly, given that the emerging data industry will form a strategic industry in the near future, this is not difficult to predict. So the initiative is ours to seize, and to encourage the enterprising individual who wants to seek means of creative destruction in a business startup or wants to revamp a traditional industry to secure its survival. We ask the reader to follow us, if only for a cursory glimpse into the emerging big data industry, which handily demonstrates the properties of the four categories in Fisher–Clark's classification, which is to say: the resource property of primary industry, the manufacturing property of secondary industry, the service property of tertiary industry, and the “increasing profits of other industries” property of quaternary industry.
At present, industrial transformation and the emerging business of data industry are big challenges for most IT giants. Both the business magnate Warren Buffett and financial wizard George Soros are bullish that such transformations will happen. For example,3 after IBM switched its business model to “big data,” Buffett and Soros increased their holdings in IBM (2012) by 5.5 and 11%, respectively.

1.1 DATA

Scientists who are attempting to disclose the mysteries of humankind are usually interested in intelligence. For instance, Sir Francis Galton,4 the founder of differential psychology, tried to evaluate human intelligence by measuring a subject's physical performance and sense perception. In 1971, another psychologist, Raymond Cattell, was acclaimed for establishing Crystallized Intelligence and Fluid Intelligence theories that differentiate general intelligence [3]. Crystallized Intelligence describes “the ability to use skills, knowledge, and experience”5 acquired by education and previous experiences, and this improves as a person ages. Fluid Intelligence is the biological capacity “to think logically and solve problems in novel situations, independently of acquired knowledge.”5
The primary objective of twentieth-century IT reform was to endow the computing machine with “intelligence,” “brainpower,” and, in effect, “wisdom.” This all started back in 1946 when John von Neumann, in supervising the manufacturing of the ENIAC (electronic numerical integrator and computer), observed several important differences between the functioning of the computer and the human mind (such as processing speed and parallelism) [4]. Like the human mind, the machine used a “storing device” to save data and a “binary system” to organize data. By this analogy, the complexities of the machine's “memory” and “comprehension” could be worked out.
What, then, is data? Data is often regarded as the potential source of factual information or scientific knowledge, and data is physically stored in bytes (a unit of measurement). Data is a “discrete and objective” factual description related to an event, and can consist of atomic data, data items, data objects, and a data set, which is collected data [5]. Metadata, simply put, is data that describes data. Data that processes data, such as a program or software, is known as a data tool. A data set refers to a collection of data objects, a data object is defined as an assembly of data items, a data item can be seen as a quantity of atomic data, and atomic data represents the lowest level of detail in all computer systems. A data item is used to describe the characteristics of data objects (naming and defining the data type) without an independent meaning. A data object can have other names [6] (record, point, vector, pattern, case, sample, observation, entity, etc.) and is described by a number of attributes (e.g., variable, feature, field, or dimension) that capture phenomena in nature.
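The containment hierarchy described above (data set, data object, data item, atomic data) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names, the `metadata` helper, and the sample weather readings are hypothetical and do not come from the book.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Atomic data: the lowest level of detail, modelled here as a scalar value.
Atomic = Union[int, float, str, bool]


@dataclass
class DataItem:
    """A named characteristic of a data object, holding one atomic value."""
    name: str
    value: Atomic

    @property
    def data_type(self) -> str:
        # The item "names and defines the data type" of the value it holds.
        return type(self.value).__name__


@dataclass
class DataObject:
    """An assembly of data items (a record, sample, observation, ...)."""
    items: List[DataItem] = field(default_factory=list)

    def as_dict(self) -> Dict[str, Atomic]:
        return {item.name: item.value for item in self.items}


@dataclass
class DataSet:
    """A collection of data objects."""
    objects: List[DataObject] = field(default_factory=list)

    def metadata(self) -> Dict[str, str]:
        """Data that describes data: attribute names mapped to type names."""
        described: Dict[str, str] = {}
        for obj in self.objects:
            for item in obj.items:
                described.setdefault(item.name, item.data_type)
        return described


# Hypothetical sample data, for illustration only.
readings = DataSet([
    DataObject([DataItem("station", "A1"), DataItem("temp_c", 21.5)]),
    DataObject([DataItem("station", "B2"), DataItem("temp_c", 19.0)]),
])
print(readings.metadata())  # {'station': 'str', 'temp_c': 'float'}
```

In this sketch the `metadata` method plays the role of "data that describes data," while any function that consumes a `DataSet` would count as a data tool in the book's terminology.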

1.1.1 Data Resources

Reaping the benefits of Moore's law, mass storage is generally credited with driving the cost per megabyte down from US$6,000 in 1955 to less than 1 cent in 2010, and the vast increase in storage capacity makes big data storage feasible.
Moreover, today, data is being generated at a sharply growing speed. Even data that was handwritten several decades ago is collected and stored by new tools. To easily measure data size, the academic community has added terms that describe these new measurement units for storage: kilobyte (KB), megabyte (MB), gigabyte (GB), terabyte (TB), petabyte (PB), exabyte (EB), zettabyte (ZB), yottabyte (YB), nonabyte (NB), doggabyte (DB), and coydonbyte (CB).
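As a rough illustration of how these units relate, the following sketch converts between bytes and the named units. It assumes the decimal convention (each unit is 1,000 times the previous one) and covers only the units up to the yottabyte; the function names `to_bytes` and `humanize` are hypothetical helpers, not part of any library.

```python
# Decimal storage units in ascending order; each step is a factor of 1000.
# Assumption: decimal (not binary, 1024-based) multiples are intended.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        if value < 1000:
            break
        if unit != UNITS[-1]:
            value /= 1000
    return f"{value:g} {unit}"


print(to_bytes(1, "ZB"))        # 1000000000000000000000
print(humanize(15 * 10**12))    # 15 TB
```

Under the binary convention the factor would be 1,024 instead of 1,000, which changes the larger units noticeably (a binary "zettabyte" is about 18% larger than a decimal one).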
To put this in perspective, a special report in The Economist (February 2010), “All too much: monstrous amounts of data,”6 offers ingenious descriptions of the magnitude of these storage units. For instance, “a kilobyte can hold about half of a page of text, while a megabyte holds about 500 pages of text.”7 On a larger scale, the data in the American Library of Congress amounts to 15 TB. And if 1 ZB of 5 MB songs stored in MP3 format were played nonstop at the rate of 1 MB per minute, it would take 1.9 billion years to finish the playlist.
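The playlist figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 ZB = 10^21 bytes, 1 MB = 10^6 bytes) and a 365-day year; the variable names are illustrative only.

```python
# Assumption: decimal units, so 1 ZB = 10**21 bytes and 1 MB = 10**6 bytes.
ZB_IN_MB = 10**21 // 10**6          # 1 ZB expressed in MB = 10**15
PLAY_RATE_MB_PER_MIN = 1            # playback rate given in the text
MINUTES_PER_YEAR = 60 * 24 * 365    # 525,600 minutes

# Number of 5 MB songs that fit in 1 ZB.
songs = 10**21 // (5 * 10**6)       # 2 * 10**14 songs

# Total playing time, first in minutes, then in years.
minutes = ZB_IN_MB / PLAY_RATE_MB_PER_MIN
years = minutes / MINUTES_PER_YEAR

print(f"{years / 1e9:.1f} billion years")  # 1.9 billion years
```

The result matches the quoted 1.9 billion years, which suggests the original estimate used decimal rather than binary units.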
A study by Martin Hilbert of the University of Southern California and Priscila López of the Open University of Catalonia at Santiago provides another interesting observation: “the total amount of global data is 295 EB” [7]. A follow-up to this finding was done by the data storage giant EMC, which sponsored an “Explore the Digital Universe” market survey by the well-known organization IDC (International Data Corporation). Some subsequent surveys, from 2007 to 2011, were themed “The Diverse and Exploding Digital Universe,” “The Expanding Digital Universe: A Forecast of Worldwide Information,” “As the Economy Contracts, The Digital Universe Expands,” “A Digital Universe – Are You Ready?” and “Extracting Value from Chaos.”
The 2009 report estimated the scale of data for the year and pointed out that despite the Great Recession, total data increased by 62% compared to 2008, approaching 0.8 ZB. This report forecasted total data in 2010 to grow to 1.2 ZB. The 2010 report forecasted that total data in 2020 would be 44 times that of 2009, amounting to 35 ZB. Additionally, the number of data objects would grow faster than the total amount of data. The 2011 report brought us further to the unsettling point that we have reached a stage where we need to look for a new data tool to handle the big data that is sure to change our lifestyles completely.
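The figures quoted from these reports are mutually consistent, as a quick calculation shows. The sketch below uses only the numbers given in the text; the implied 2008 total and the implied annual growth rate are derived values, not figures stated in the reports.

```python
# Figures quoted in the text.
total_2009_zb = 0.8             # total data in 2009, approximately
growth_2008_to_2009 = 0.62      # 62% increase over 2008
multiple_2009_to_2020 = 44      # 2020 forecast as a multiple of 2009

# Derived (not stated in the reports): the 2008 total implied by 62% growth.
implied_2008_zb = total_2009_zb / (1 + growth_2008_to_2009)

# Check the 2020 forecast: 0.8 ZB x 44 is roughly the quoted 35 ZB.
forecast_2020_zb = total_2009_zb * multiple_2009_to_2020

# Derived: compound annual growth rate implied by a 44x rise over 11 years.
span_years = 2020 - 2009
cagr = multiple_2009_to_2020 ** (1 / span_years) - 1

print(f"implied 2008 total: {implied_2008_zb:.2f} ZB")   # 0.49 ZB
print(f"2020 forecast:      {forecast_2020_zb:.1f} ZB")  # 35.2 ZB
print(f"implied growth:     {cagr:.0%} per year")        # 41% per year
```

A sustained growth rate of roughly 41% per year means the total volume doubles about every two years.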
As data organizations connected by logic and data areas assembled from huge volumes of data reach a “certain scale,” those massive, different data sets become “data resources” [5]. A data resource can be one of the vital modern strategic resources for humans – even possibly exceeding, in the twenty-first century, the combined resources of oil, coal, and mineral products – because currently all human activities, without exception, including the exploration, exploitation, transportation, processing, and sale of petroleum, coal, and mineral products, generate and rely on data.
Today, data resources are generated and stored for many different scientific disciplines, such as astronomy, geography, geochemistry, geology, oceanography, aerograph, biology, and medical science. Moreover, various large-scale transnational collaborative experiments continuously provide big data that can be captured, stored, communicated, aggregated, and analyzed, such as CERN's LHC (Large Hadron Collider),8 American Pan-STARRS (Panoramic Sur...

Table of contents

  1. COVER
  2. TITLE PAGE
  3. COPYRIGHT
  4. BIBLIOGRAPHY
  5. DEDICATION
  6. ENDORSEMENTS
  7. TABLE OF CONTENTS
  8. PREFACE
  9. CHAPTER 1: WHAT IS DATA INDUSTRY?
  10. CHAPTER 2: DATA RESOURCES
  11. CHAPTER 3: DATA INDUSTRY CHAIN
  12. CHAPTER 4: EXISTING DATA INNOVATIONS
  13. CHAPTER 5: DATA SERVICES IN MULTIPLE DOMAINS
  14. CHAPTER 6: DATA SERVICES IN DISTINCT SECTORS
  15. CHAPTER 7: BUSINESS MODELS IN THE DATA INDUSTRY
  16. CHAPTER 8: OPERATING MODELS IN THE DATA INDUSTRY
  17. CHAPTER 9: ENTERPRISE AGGLOMERATION OF THE DATA INDUSTRY
  18. CHAPTER 10: CLUSTER EFFECTS OF THE DATA INDUSTRY
  19. CHAPTER 11: A MODE OF INDUSTRIAL DEVELOPMENT FOR THE DATA INDUSTRY
  20. CHAPTER 12: A GUIDE TO THE EMERGING DATA LAW
  21. REFERENCES
  22. INDEX
  23. END USER LICENSE AGREEMENT