Big Data is everywhere. It shapes our lives in more ways than we know and understand. This comprehensive introduction unravels the complex terabytes that will continue to shape our lives in ways imagined and unimagined.
Drawing on case studies like Amazon, Facebook, the FIFA World Cup and the Aadhaar scheme, this book looks at how Big Data is changing the way we behave, consume and respond to situations in the digital age. It looks at how Big Data has the potential to transform disaster management and healthcare, as well as prove to be authoritarian and exploitative in the wrong hands.
The latest offering from the authors of Artificial Intelligence: Evolution, Ethics and Public Policy, this accessibly written volume is essential for the researcher in science and technology studies, media and culture studies, public policy and digital humanities, as well as being a beacon for the general reader to make sense of the digital age.
Big Data by Saswat Sarangi and Pankaj Sharma is available in PDF and ePUB format, catalogued under Social Sciences & Databases.
Information is the oil of the 21st century, and analytics is the combustion engine.
– Peter Sondergaard1
Amazon: In the business of selling or in the business of data?2
Amazon is one of the prime examples of an analytics success story. It was an early adopter and remains a leader in collecting, storing, processing and analyzing personal information from customers to determine how they spend their money. The company uses predictive analytics and proprietary algorithms for targeted marketing, to increase customer satisfaction and build loyalty. Amazon's recommendations are based on in-depth customer profiles and are a great example of what can be achieved with data analytics. The methods have evolved and are getting more sophisticated, but the underlying concept is the same: the traditional objectives of marketing, selling more and selling better, can be served by data analytics. Amazon analyzes which items customers purchased previously, what is in their online shopping cart, what they search for and, on that basis, what they may buy in future. This information is used to recommend products; Amazon thus uses the power of suggestion to encourage you to buy more, and this increases the company's revenue significantly. There are other methods as well. On the basis of the words a reader highlights in Kindle, Amazon may send further book recommendations.
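The idea of recommending items from purchase histories can be made concrete with a minimal sketch. This is not Amazon's actual algorithm (which is proprietary and far more sophisticated); it is a toy "frequently bought together" counter over invented baskets, shown only to illustrate the principle the paragraph describes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order histories: each inner list is one customer's basket.
orders = [
    ["kindle", "case", "charger"],
    ["kindle", "case"],
    ["novel", "bookmark"],
    ["kindle", "charger"],
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in orders:
    for a, b in combinations(sorted(set(basket)), 2):
        pair_counts[(a, b)] += 1

def recommend(item, top_n=3):
    """Suggest the items most frequently co-purchased with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

print(recommend("kindle"))  # the items bought alongside 'kindle': case, charger
```

Real systems replace raw co-purchase counts with similarity scores, customer profiles and machine-learned models, but the input, past behavior, and the output, a ranked suggestion, are the same.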
Amazon also has an anticipatory shipping model which uses Big Data for predicting which products customers are likely to purchase and when. Amazon uses analytics to increase its product sales and profit margins while decreasing its delivery time and expenses. Because Amazon wants to fulfill orders quickly, the company links with manufacturers and tracks their inventory. Amazon uses Big Data systems for choosing the warehouse, best delivery schedule, route and product groupings to reduce shipping costs.
The next one is a little more controversial. Big Data is also used to manage Amazon's prices and increase profits. Prices are set according to your activity on the website, competitors' pricing, product availability, item preferences, order history and other factors. Product prices typically change very quickly as Big Data is updated and analyzed, and the pricing is individualized. Amazon also sells these capabilities through Amazon Web Services, so other companies can benefit from Big Data by analyzing customer demographics and spending habits.
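A dynamic-pricing rule of the kind described above can be sketched in a few lines. The signals (competitor price, stock level, predicted demand) come from the paragraph; the weights and thresholds are entirely invented for illustration and bear no relation to any real retailer's pricing logic.

```python
def dynamic_price(base_price, competitor_price, stock_level, demand_score):
    """Toy pricing rule combining a few of the signals mentioned above.
    All weights and thresholds are invented for illustration."""
    price = base_price
    # Undercut the competitor slightly when we are more expensive.
    if competitor_price < price:
        price = competitor_price * 0.99
    # Scarce stock supports a higher price; excess stock, a discount.
    if stock_level < 10:
        price *= 1.05
    elif stock_level > 100:
        price *= 0.95
    # High predicted demand (demand_score in [0, 1]) nudges the price up.
    price *= 1 + 0.10 * demand_score
    return round(price, 2)

print(dynamic_price(20.00, 18.50, 5, 0.8))  # -> 20.77
```

A production system would re-evaluate such a rule continuously as fresh data arrives, which is why prices on large marketplaces can change many times a day.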
It is not an exaggeration to say that Amazon is not in the business of selling goods to you; it is in the business of data analytics, so that it can sell better . . . not just today, but tomorrow, the day after and forever.
What is Big Data?
The Oxford Dictionary defines Big Data as3 "Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions."
In its June 2011 report titled Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute defined Big Data as:4
"Big Data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered Big Data – i.e., we don't define Big Data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as Big Data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, Big Data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
Essentially, Big Data is a term used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big, or it moves too fast, or it exceeds current processing capacity.5 Interestingly, it is difficult to agree upon a standard definition because different people use the term in different contexts, and the context determines what they mean when they talk about "Big Data."
When data-storage capacity is being discussed, Big Data means the size or volume of the data. When computing capability is under discussion, Big Data means processing power. Vendors specializing in this technology are referring to the technology and tools used to analyze the data, while organizations talk more about data generation and accumulation. However, certain characteristics of Big Data are present most of the time: size, unstructured nature and the need for processing to make sense of it.
The history
The term Big Data has been in use since the 1990s, with some giving credit to John Mashey for coining it, or at least making it popular. Big Data encompasses unstructured, semi-structured and structured data; however, the main focus is on unstructured data. Big Data "size" is a constantly moving target as storage capacity increases and processing gets faster.6
In a 2001 research report7 and related lectures, META Group (now Gartner) defined data-growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out) and variety (range of data types and sources). Most of the industry continues to use this volume, velocity and variety (or 3Vs) model for describing Big Data. In 2012, Gartner updated its definition as follows: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."8
For the last few years, we have also started to see and use the concept of the 4Vs (adding Veracity, implying that Big Data can carry ambiguities and uncertainties because of the nature of the data) or the 5Vs (Veracity and Value, implying that Big Data needs to be associated with meaningful benefits to have value). However, Gartner's 3Vs definition is still widely used and agrees with a consensual definition which states that9 "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value." The 3Vs have been expanded to other complementary characteristics of Big Data:10
Volume: Big Data doesn't sample; it just observes and tracks what happens.
Velocity: Big Data is often available in real time.
Variety: Big Data draws from text, images, audio and video, and completes missing pieces through data fusion.
Machine learning: Big Data often doesn't ask why; it simply detects patterns.
Digital footprint: Big Data is often a cost-free byproduct of digital interaction.
The data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information, so that both visible and hidden issues in its various components can be brought into the analysis.11
FIGURE 1.1 The Three Vs of Big Data
FIGURE 1.2 Big Data – The Characteristics and Processing
The terminology12
In the context of Big Data, the first term is data storage. The different units of data measurement describe disk space (data-storage space) and system memory, and these are evolving very fast: from bulky, difficult-to-handle 1.2 MB to 1.44 MB floppy disks (8-inch, 5¼-inch and 3½-inch formats), in use until the early years of the 21st century, to pen drives today that easily carry 64 GB.
According to the IBM Dictionary of Computing, when used to describe disk storage capacity, a megabyte is 1,000,000 bytes in decimal notation. But when the term megabyte is used for real and virtua...
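The decimal-versus-binary ambiguity being introduced here can be made concrete with a short sketch. It contrasts the SI (decimal) prefixes used in disk marketing with the IEC binary prefixes used for memory, and shows why a "64 GB" pen drive reports fewer gibibytes to the operating system:

```python
# Decimal (SI) prefixes, as used on disk and pen-drive packaging.
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
# Binary (IEC) prefixes, as used for memory and by many operating systems.
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# The two meanings of "megabyte" differ by about 4.9%.
print(IEC["MiB"] - SI["MB"])  # 1,048,576 - 1,000,000 = 48,576 bytes

# A "64 GB" pen drive, marketed in decimal units, holds fewer binary gibibytes.
marketed_bytes = 64 * SI["GB"]
print(marketed_bytes / IEC["GiB"])  # ~59.6 GiB
```

The gap widens with each prefix step, which is why the distinction matters more for terabyte- and petabyte-scale Big Data storage than it did for floppy disks.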