eBook - ePub

Big Data

Name: Big Data
ISBN: 9781780172637

Opportunities and challenges

60 pages
English

About this book

Despite the hype around big data, there is no denying that its potential to benefit organisations, businesses and customers is enormous. The articles in this ebook aim to give practical guidance for all those who want to understand big data better and learn how to make the most of it. Topics range from big data analysis, mobile big data and managing unstructured data to technologies, governance and intellectual property and security issues surrounding big data.

Publisher

BCS, The Chartered Institute for IT

Year

2014

Topic

Computer Science

eBook ISBN

9781780172637

Subtopic

Business Strategy

1 WHERE ARE WE WITH BIG DATA?

Brian Runciman, Head of Editorial and Website Services at BCS, The Chartered Institute for IT, looks at what big data is all about.

INTRODUCTION

There have been many descriptions of big data of late – mostly metaphors or similes for ‘big’ (deluge, flood, explosion) – and not only is there a lot of talk about big data, there is also a lot of data. But what can we do with structured and unstructured data? Can we extract insights from it? Or is ‘big data’ just a marketing puff term?

There is absolutely no question that there is an awful lot more data around now than there was only a few years ago. IBM say that ‘every day we create 2.5 quintillion bytes of data – so much that 90 per cent of the data in the world today has been created in the last two years alone’.

SOURCES

Social media platforms produce huge quantities of data, both from individual network profiles and the content that influencers and the less influential alike produce. Short form blogging, link-sharing, expert blog comments, user forums, ‘likes’ and more all contain potentially useful information.

There is also data produced through sheer activity, for example machine-generated content in the form of device log files, which could be characterised as the ‘internet of things’. This would include output from such things as geo-tagging.

Yet more data can be mined from software-as-a-service and cloud applications – data that’s already in the cloud but mostly divorced from internal enterprise data. Another large, but at this stage largely untapped, area is the data languishing in legacy systems, which include things like medical records and customer correspondence.

CAVEATS

A post from BCS’s future blogger called into question some of the behind-the-scenes story: ‘For the big data commercial advocates, there must be algorithms that can trawl the data and create outcomes better, that is to say more cost effectively, than traditional advertising. Where is the evidence that such algorithms exist? How will these algorithms be created and evaluated and improved upon if they do exist? One problem is that in a huge data set, there may be many spurious correlations, and the difference between causation and correlation is hard to prove.’
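The point about spurious correlations can be illustrated with a small experiment. The sketch below is not taken from the blog post; it simply assumes columns of pure random noise and measures the strongest correlation found between any two of them. The function names, column counts and seed are all hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_columns, n_rows, seed=0):
    """Largest |r| between any two columns of unrelated random noise."""
    rng = random.Random(seed)
    # Every column is independent noise, so any correlation is accidental.
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_columns)]
    return max(
        abs(pearson(columns[i], columns[j]))
        for i in range(n_columns)
        for j in range(i + 1, n_columns)
    )


few = strongest_chance_correlation(n_columns=5, n_rows=20)
many = strongest_chance_correlation(n_columns=200, n_rows=20)
print(f"5 columns:   strongest |r| = {few:.2f}")
print(f"200 columns: strongest |r| = {many:.2f}")
```

With 5 columns there are only 10 pairs to compare; with 200 columns there are 19,900, so the chance of finding a strong-looking but meaningless relationship rises sharply as the data set widens.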

As we would perhaps expect, the likes of IBM say that big data goes beyond hype: ‘While there is a lot of buzz about big data in the market, it isn’t hype. Plenty of customers are seeing tangible ROI using IBM solutions to address their big data challenges.’

Big Blue go on to quote a 20 per cent decrease in patient mortality by analysing streaming patient data in the health care arena; a telco that enjoyed a 92 per cent decrease in processing time by analysing networking and call data; and a whopping 99 per cent improved accuracy in placing power generation resources by analysing 2.8 petabytes of untapped data for a utilities organisation.

TOOLS

In times gone by, enterprises handled large data sets with relational databases and warehouses from proprietary suppliers. However, these just can’t handle the volumes of data now being produced. This has prompted a trend towards open source alternatives such as Hadoop, which Wikipedia defines as ‘an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware.’

Wired recently reported on Cloudera – one of several companies that help build and use Hadoop applications – which is offering a Google-style search engine for Hadoop called, uninspiringly, Cloudera Search. Interestingly, Wired pointed to a recent Microsoft paper on whether customers really need to put all their data in Hadoop. It argued that ‘most companies don’t (have) data problems that justify the use of big clusters of servers. Even Yahoo and Facebook, two of the companies most associated with big data, are using clusters to solve problems that could actually be done on a single server.’

Despite that, interest is on the up and big organisations are taking advantage. A recent piece from The Sun Daily mentions that ‘analyst firm International Data Corp projects the global big data technology and services market will grow at a compound annual growth rate of 31.7 per cent – about seven times the rate of the overall information and communications technology market’.
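To put the quoted growth rate in perspective, the arithmetic of compounding can be sketched as follows. The five-year horizon is an assumption made purely for illustration (the article quoted does not give one here), and the overall market rate is only the figure implied by ‘about seven times’, not a number taken from IDC.

```python
def growth_multiple(annual_rate, years):
    """Factor by which a market grows at a compound annual growth rate."""
    return (1 + annual_rate) ** years


BIG_DATA_CAGR = 0.317                # 31.7 per cent, as quoted from IDC
IMPLIED_ICT_CAGR = BIG_DATA_CAGR / 7  # assumption: 'about seven times' taken literally
YEARS = 5                             # assumption: illustrative horizon only

big_data_multiple = growth_multiple(BIG_DATA_CAGR, YEARS)
ict_multiple = growth_multiple(IMPLIED_ICT_CAGR, YEARS)
print(f"Big data market after {YEARS} years: x{big_data_multiple:.2f}")
print(f"Overall ICT market after {YEARS} years: x{ict_multiple:.2f}")
```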

The same article reports further investment in the perceived future of big data with announcements by Dell, Intel Corporation and Revolution Analytics of the Big Data Innovation Centre in Singapore. The new centre brings together expertise from all three organisations to provide training programmes, proof-of-concept capabilities and solution development support on big data and predictive analytic innovations catering to the Asian market.

HOW AND WHEN

The ‘when’ of embracing any new technology is massively variable depending on your organisation’s aims, business sector and so on. Some of the things that could affect your timing are neatly summed up by Redmond magazine in a recent article, simply by listing some of the possible motivators. They mention that you could utilise ‘CRM [customer relationship management] systems and data feeds to tweets mentioning their organisations that can alert them to a sudden problem with a product’. If this kind of real-time feedback is of benefit, then dipping a toe into the deluge of the big data waters is best done sooner rather than later.

Another area mentioned is ‘potential market opportunities spawned by an event’ – not as business-critical as product feedback, but important in a time of global austerity. Redmond magazine also mentions things such as online and big-box retailers using big data to automate their supply chains on the fly and law enforcement agencies analysing huge amounts of data to thwart potential crime and terror attacks. The scope and motivations vary widely, but potential benefits are both long and short-term.

As to how to go about it, some of the tools are mentioned above, often oriented around Hadoop. Microsoft recently launched Windows Azure HDInsight and Redmond magazine also cited VMware’s key application infrastructure and big data and analytics portfolio called Pivotal.

There’s plenty to read about, as the following list shows.

Further reading

Microsoft’s special report on using clusters for analytics: http://research.microsoft.com/apps/pubs/default.aspx?id=179615

Viktor Mayer-Schönberger and Kenneth Cukier, ‘Big Data’ review: http://www.bostonglobe.com/arts/books/2013/03/05/book-review-big-data-viktor-mayer-schonberger-and-kenneth-cukier/T6YC7rNqXHgWowaE1oD8vO/story.html

IBM on big data: www-01.ibm.com/software/data/bigdata

Wired on Cloudera: www.wired.com/wiredenterprise/2013/06/cloudera-search

The hardware perspective: www.techrepublic.com/blog/big-data-analytics/are-we-headed-for-a-platform-change-for-big-data/445?tag=content;blog-list-river

Big data sources: www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926

Hadoop: http://en.wikipedia.org/wiki/Hadoop

Things you should know about implementing big data: http://redmondmag.com/articles/2013/05/01/buried-in-big-data.aspx

2 BIG DATA TECHNOLOGIES

Keith Gordon MBCS CITP, former Secretary of BCS Data Management Specialist Group and author of Principles of Data Management, looks at definitions of big data and the database models that have grown up around it.

Whether you live in an ‘IT bubble’ or not, it is very difficult nowadays to miss hearing of something called ‘big data’. Many of the emails hitting my inbox go further and talk about ‘big data technologies’. These fall into two camps: the technologies to store the data and the technologies required to analyse and make sense of the data.

So, what is big data? In an attempt to find out I attended a seminar put on by The Institution of Engineering and Technology (IET) in 2012. After listening to five speakers I was even more confused than I had been at the beginning of the day. Amongst the interpretations of the term ‘big data’ I heard on that day were:

Making the vast quantities of data that are held by the government publicly available – the ‘Open Data’ initiative. I am really not sure what ‘big’ means in this scenario!
For a future project, storing in a ‘hostile’ environment with no readily available power supply, and then analysing in slow time large quantities of very structured data of limited complexity. Here ‘big’ means ‘a lot of’.
For a telecoms company, analysing data available about a person’s previous web searches and tying that together with that person’s current location so that, for instance, they can be pinged with an advert for a nearby Chinese restaurant if their searches have indicated they like Chinese food before they have walked past the restaurant. Here ‘big’ principally means ‘very fast’.
Trying to gain business intelligence from the mass of unstructured or semi-structured data an organisation has in its documents, emails and so on. Here ‘big’ equates to ‘complex’.

So, although there is no commonly accepted definition of big data, we can say that it is data that can be defined by some combination of the following five characteristics:

Volume – Where the amount of data to be stored and analysed is large enough to require special considerations.
Variety – Where the data consists of multiple types of data, potentially from multiple sources; here we need to consider structured data held in tables or objects for which the metadata is well defined, semi-structured data held as documents or similar where the metadata is contained internally (for example XML documents) or unstructured data, which can be photographs, video or any other form of binary data.
Velocity – Where the data is produced at high rates and operating on ‘stale’ data is not valuable.
Value – Where the data has perceived or quantifiable benefit to the enterprise or organisation using it.
Veracity – Where the correctness of the data can be assessed.
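The distinction drawn under ‘Variety’ can be made concrete with a short sketch. The customer record, field names and values below are hypothetical; the aim is only to show where the metadata lives in each of the three cases.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Structured: the metadata (field names and types) is defined outside the
# data, here by a class standing in for a table definition.
@dataclass
class Customer:
    customer_id: int
    name: str
    city: str


structured = Customer(customer_id=42, name="A. Example", city="Swindon")

# Semi-structured: the XML tags carry the metadata inside the document itself.
semi_structured = (
    "<customer id='42'><name>A. Example</name><city>Swindon</city></customer>"
)
root = ET.fromstring(semi_structured)
fields = {child.tag: child.text for child in root}

# Unstructured: opaque binary content such as a photograph. Nothing in the
# bytes names a field; these four are the start of a PNG file signature.
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])

print(structured)
print(fields)
print(unstructured)
```

The same customer can be read directly from the structured record and, after parsing, from the XML document, whereas the binary content yields no fields at all without further analysis.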

Interestingly, I saw an article from The New York Times about a group that works for the council in New York. It was faced with the problem of finding the culprits who were polluting the sewers with old cooking fats. One department had details of where the sewers ran and where they were getting blocked, another department had maps of the city with details of all the restaurants and a third department had details of which restaurants had contracts with disposal companies for the removal of old cooking fats.

Putting this information together produced details of the restaurants that did not have disposal contracts, were close to the blockages and were, therefore, possible culprits. That was described as an application of big data, but there was no mention of any specific big data technologies. Was it just an application of common sense and good detective work?
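The detective work described here amounts to combining three data sets and filtering the result. The following is a minimal sketch under stated assumptions: the restaurant names, grid coordinates, contract list and search radius are all invented for illustration and do not come from the New York Times article.

```python
import math

# Department 1 (hypothetical data): locations of sewer blockages.
blockages = [(2.0, 3.0), (8.0, 1.0)]

# Department 2 (hypothetical data): restaurants and their map locations.
restaurants = {
    "Golden Wok": (2.1, 3.2),
    "Pasta Place": (2.3, 2.9),
    "Harbour Grill": (8.1, 1.1),
    "Corner Cafe": (5.0, 5.0),
    "Far Diner": (9.0, 9.0),
}

# Department 3 (hypothetical data): restaurants with a fat-disposal contract.
disposal_contracts = {"Pasta Place", "Corner Cafe"}


def possible_culprits(blockages, restaurants, disposal_contracts, radius=0.5):
    """Restaurants with no disposal contract that lie near a blockage."""
    suspects = []
    for name, location in restaurants.items():
        if name in disposal_contracts:
            continue  # has a contract, so not a suspect
        if any(math.dist(location, b) <= radius for b in blockages):
            suspects.append(name)
    return sorted(suspects)


print(possible_culprits(blockages, restaurants, disposal_contracts))
```

Nothing here needs more than a dictionary, a set and a distance calculation, which fits the author's observation that no specific big data technology was mentioned.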

THE TECHNOLOGIES

More recently, following the revelations from Edward Snowden, the American whistle-blower, The Washington Post had an article explaining how the National Security Agency is able to store and analyse the massive quantities of data it is collecting about the telephone, text and online conversations that are going on around the world. This was put down to the arrival, within the last few years, of big data technologies.

However, it is not just government agencies that are interested in big data. Large data-intensive companies, such as Amazon and Google, are taking the lead in some of the developments of the technologies to handle big data.

Our beloved SQL databases, based on the relational model of data, do not scale easily to handle the growing quantities of structured data and have only limited facilities for handling semi-structured and unstructured data. There is, therefore, a need for alternative storage models for data.

Collectively, databases built around these alternative storage models have become known as NoSQL databases, where this can mean ‘NotOnlySQL’ or ‘No,NeverSQL’ depending on the alternative storage model being considered (or, indeed, your perception of SQL as a database language).

There are over 150 different NoSQL databases available on the market. They all achieve performance gains by do...

Front Cover
Title Page
Contents
PREFACE – John Morton
1. WHERE ARE WE WITH BIG DATA? – Brian Runciman
2. BIG DATA TECHNOLOGIES – Keith Gordon
3. BIG DATA = BIG GOVERNANCE? – Adam Davison
4. MAXIMISING ON BIG DATA – Jon Milward
5. MOBILITY AND BIG DATA: AN INTERESTING FUSION – Paul Sweeney
6. BIG DATA ANALYSIS – Allen Bonde
7. REMOVING THE OBSTACLES TO BIG DATA ANALYTICS – David Nys
8. MANAGING UNSTRUCTURED DATA – Vijay Magon
9. BIG DATA: RISKY BUSINESS – Jamal Elmellas
10. SECURING BIG DATA – Mike Small
11. DATA, GROWTH AND INNOVATION – Bernard Geoghegan
12. THE NEW ARCHITECTURE – Jason Hobbs
13. INTELLECTUAL PROPERTY IN THE ERA OF BIG AND OPEN DATA – Jude Umeh
14. BIG DATA, BIG HATS – Johny Morris
15. THE COMMERCIAL VALUE OF BIG DATA – Chris Yapp
16. BIG DATA, BIG OPPORTUNITIES – Dalim Basu and Jon G Hall
Copyright Page
