Machine Learning
An Algorithmic Perspective, Second Edition
Stephen Marsland
CHAPTER 1
Introduction
Suppose that you have a website selling software that you've written. You want to make the website more personalised to the user, so you start to collect data about visitors, such as their computer type/operating system, web browser, the country that they live in, and the time of day they visited the website. You can get this data for any visitor, and for people who actually buy something, you know what they bought, and how they paid for it (say PayPal or a credit card). So, for each person who buys something from your website, you have a list of data that looks like
(computer type, web browser, country, time, software bought, how paid).
For instance, the first three pieces of data you collect could be:
- Macintosh OS X, Safari, UK, morning, SuperGame1, credit card
- Windows XP, Internet Explorer, USA, afternoon, SuperGame1, PayPal
- Windows Vista, Firefox, NZ, evening, SuperGame2, PayPal
Based on this data, you would like to be able to populate a "Things You Might Be Interested In" box within the webpage, so that it shows software that might be relevant to each visitor, based on the data that you can access while the webpage loads, i.e., computer and OS, country, and the time of day. Your hope is that as more people visit your website and you store more data, you will be able to identify trends, such as that Macintosh users from New Zealand (NZ) love your first game, while Firefox users, who are often more knowledgeable about computers, want your automatic download application and virus/internet worm detector, etc.
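As a concrete (and purely illustrative) sketch of the idea, the visitor records can be stored as tuples, and a recommendation made by finding the most similar past buyer. The records and the simple feature-overlap similarity below are assumptions for illustration, not the book's method:

```python
# Illustrative sketch: recommend software by matching a new visitor
# against past buyers on feature overlap. The records echo the three
# examples in the text; the similarity measure is a deliberate
# simplification.

past_purchases = [
    # (computer type, web browser, country, time, software bought)
    ("Macintosh OS X", "Safari", "UK", "morning", "SuperGame1"),
    ("Windows XP", "Internet Explorer", "USA", "afternoon", "SuperGame1"),
    ("Windows Vista", "Firefox", "NZ", "evening", "SuperGame2"),
]

def predict_purchase(visitor, records):
    """Return what the most similar past buyer bought.

    Similarity here is simply the number of matching features
    (computer, browser, country, time): people who look similar
    often act similarly.
    """
    def overlap(record):
        return sum(a == b for a, b in zip(visitor, record[:4]))
    best = max(records, key=overlap)
    return best[4]

# A Mac/Safari visitor from NZ in the morning matches the first
# record on three of four features, so gets its purchase suggested.
new_visitor = ("Macintosh OS X", "Safari", "NZ", "morning")
print(predict_purchase(new_visitor, past_purchases))  # SuperGame1
```

With only three records this is trivial, but the same look-for-the-most-similar-example idea underlies the nearest-neighbour methods discussed later in the book.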
Once you have collected a large set of such data, you start to examine it and work out what you can do with it. The problem you have is one of prediction: given the data you have, predict what the next person will buy, and the reason that you think that it might work is that people who seem to be similar often act similarly. So how can you actually go about solving the problem? This is one of the fundamental problems that this book tries to solve. It is an example of what is called supervised learning, because we know what the right answers are for some examples (the software that was actually bought) so we can give the learner some examples where we know the right answer. We will talk about supervised learning more in Section 1.3.

1.1 If Data Had Mass, The Earth Would Be a Black Hole
Around the world, computers capture and store terabytes of data every day. Even leaving aside your collection of MP3s and holiday photographs, there are computers belonging to shops, banks, hospitals, scientific laboratories, and many more that are storing data incessantly. For example, banks are building up pictures of how people spend their money, hospitals are recording what treatments patients are on for which ailments (and how they respond to them), and engine monitoring systems in cars are recording information about the engine in order to detect when it might fail. The challenge is to do something useful with this data: if the bank's computers can learn about spending patterns, can they detect credit card fraud quickly? If hospitals share data, then can treatments that don't work as well as expected be identified quickly? Can an intelligent car give you early warning of problems so that you don't end up stranded in the worst part of town? These are some of the questions that machine learning methods can be used to answer.
Science has also taken advantage of the ability of computers to store massive amounts of data. Biology has led the way, with the ability to measure gene expression in DNA microarrays producing immense datasets, along with protein transcription data and phylogenetic trees relating species to each other. However, other sciences have not been slow to follow. Astronomy now uses digital telescopes, so that each night the world's observatories are storing incredibly high-resolution images of the night sky; around a terabyte per night. Equally, medical science stores the outcomes of medical tests from measurements as diverse as magnetic resonance imaging (MRI) scans and simple blood tests. The Large Hadron Collider at CERN apparently produces about 25 petabytes of data per year. The explosion in stored data is well known; the challenge is to do something useful with that data.
The size and complexity of these datasets mean that humans are unable to extract useful information from them. Even the way that the data is stored works against us. Given a file full of numbers, our minds generally turn away from looking at them for long. Take some of the same data and plot it in a graph and we can do something. Compare the table and graph shown in Figure 1.1: the graph is rather easier to look at and deal with. Unfortunately, our three-dimensional world doesn't let us do much with data in higher dimensions, and even the simple webpage data that we collected above has four different features, so if we plotted it with one dimension for each feature we'd need four dimensions! There are two things that we can do with this: reduce the number of dimensions (until our simple brains can deal with the problem) or use computers, which don't know that high-dimensional problems are difficult, and don't get bored with looking at massive data files of numbers. The two pictures in Figure 1.2 demonstrate one problem with reducing the number of dimensions (more technically, projecting it into fewer dimensions), which is that it can hide useful information and make things look rather strange. This is one reason why machine learning is becoming so popular: the problems of our human limitations go away if we can make computers do the dirty work for us. There is one other thing that can help if the number of dimensions is not too much larger than three, which is to use glyphs that use other representations, such as size or colour of the datapoints, to represent information about some other dimension, but this does not help if the dataset has 100 dimensions in it.

In fact, you have probably interacted with machine learning algorithms at some time. They are used in many of the software programs that we use, such as Microsoft's infamous paperclip in Office (maybe not the most positive example), spam filters, voice recognition software, and lots of computer games. They are also part of automatic number-plate recognition systems for petrol station security cameras and toll roads, are used in some anti-skid braking and vehicle stability systems, and they are even part of the set of algorithms that decide whether a bank will give you a loan.
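The way a projection can hide structure is easy to demonstrate numerically. In this small sketch (an assumed example, not one of the book's figures), two clusters of 3D points are well separated along the third dimension, but become indistinguishable once that dimension is dropped:

```python
# Illustration of information lost by projecting into fewer dimensions:
# two clusters separated only along z merge when z is discarded.
import numpy as np

rng = np.random.default_rng(0)
# Two tight clusters, identical in x and y, separated by 5 units in z.
cluster_a = rng.normal(loc=(0.0, 0.0, 0.0), scale=0.1, size=(50, 3))
cluster_b = rng.normal(loc=(0.0, 0.0, 5.0), scale=0.1, size=(50, 3))

# Distance between the cluster centres in the full 3D space...
full_gap = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))
# ...and after projecting onto the first two dimensions only.
proj_gap = np.linalg.norm(cluster_a[:, :2].mean(axis=0)
                          - cluster_b[:, :2].mean(axis=0))

print(f"3D separation: {full_gap:.2f}")   # about 5
print(f"2D separation: {proj_gap:.2f}")   # near 0: the clusters merge
```

A projection chosen badly (here, simply dropping the coordinate that mattered) can make two clearly distinct groups look like one, which is exactly the problem Figure 1.2 illustrates.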
The attention-grabbing title to this section would only be true if data was very heavy. It is very hard to work out how much data there actually is in all of the world's computers, but it was estimated in 2012 that there were about 2.8 zettabytes (2.8 × 10^21 bytes), up from about 160 exabytes (160 × 10^18 bytes) of data that were created and stored in 2006, and projected to reach 40 zettabytes by 2020. However, to make a black hole the size of the earth would take a mass of about 40 × 10^35 grams. So data would have to be so heavy that you couldn't possibly lift a data pen, let alone a computer, before the section title was true! However, and more interestingly for machine learning, the same report that estimated the figure of 2.8 zettabytes ("Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East" by John Gantz and David Reinsel, sponsored by EMC Corporation) also reported that while a quarter of this data could produce useful information, only around 3% of it was tagged, and less than 0.5% of it was actually used for analysis!

1.2 Learning
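The black-hole figure can be sanity-checked with the Schwarzschild radius formula r = 2GM/c², rearranged to give the mass needed for a black hole of Earth's radius (a back-of-the-envelope check, not a calculation from the book):

```python
# Mass required for a black hole the size of the Earth, from the
# Schwarzschild radius r = 2GM/c^2 rearranged to M = r c^2 / (2G).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8          # speed of light, m/s
r_earth = 6.371e6    # Earth's mean radius, m

mass_kg = r_earth * c**2 / (2 * G)
mass_g = mass_kg * 1000
print(f"{mass_g:.1e} grams")  # roughly 4 x 10^36, i.e. ~40 x 10^35 grams
```

That is around 700 times the actual mass of the Earth, so however fast data accumulates, it is safely weightless by comparison.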
Before we delve too much further into the topic, let's step back and think about what learning actually is. The key concept that we will need to think about for our machines is learning from data, since data is what we have; terabytes of it, in some cases. However, it isn't too large a step to put that into human behavioural terms, and talk about learning from experience.
Hopefully, we all agree that humans and other animals can display behaviours that we label as intelligent by learning from experience. Learning is what gives us flexibility in our life; the fact that we can adjust and adapt to new circumstances, and learn new tricks, no matter how old a dog we are! The important parts of animal learning for this book are remembering, adapting, and generalising: recognising that last time we were in this situation (saw this data) we tried out some particular action (gave this output) and it worked (was correct), so we'll try it again, or it didn't work, so we'll try something different. The last word, generalising, is about recognising similarity between different situations, so that things that applied in one place can be used in another. This is what makes learning useful, because we can use our knowledge in lots of different places.

Of course, there are plenty of other bits to intelligence, such as reasoning and logical deduction, but we won't worry too much about those. We are interested in the most fundamental parts of intelligence (learning and adapting) and how we can model them in a computer. There has also been a lot of interest in making computers reason and deduce facts. This was the basis of most early Artificial Intelligence, and is sometimes known as symbolic processing because the computer manipulates symbols that reflect the environment. In contrast, machine learning methods are sometimes called subsymbolic because no symbols or symbolic manipulation are involved.

1.2.1 Machine Learning
Machine learning, then, is about making computers modify or adapt their actions (whether these actions are making predictions, or controlling a robot) so that these actions get more accurate, where accuracy is measured by how well the chosen actions reflect the correct ones. Imagin...