
About this book
Who is designing AI? A select, narrow group. How is their world view shaping our future? Artificial intelligence can be all too human: quick to judge, capable of error, vulnerable to bias. It's made by humans, after all. Humans make decisions about the laws and standards, the tools, the ethics in this new world. Who benefits. Who gets hurt. Made by Humans explores our role and responsibilities in automation. Roaming from Australia to the UK and the US, elite data expert Ellen Broad talks to world leaders in AI about what we need to do next. It is a personal, thought-provoking examination of humans as data and humans as the designers of systems that are meant to help us.
PART I
Humans as Data
There's no such thing as "raw" data. Raw data is both an oxymoron and a bad idea.
Professor Geoffrey C Bowker, Department of Informatics, University of California, Irvine
There is a famous six-word story: "For sale: baby shoes, never worn." It's often attributed to Ernest Hemingway, supposedly the result of a bet he made with friends that he could write an entire story in only six words. It's probably the most famous six-word story in the English language.
It's also an example of how we build stories from data. From three data points (something is for sale; it is baby shoes; they have never been worn) we can imagine the context in which they're being recorded, and what they're telling us. If you're like me, you might have immediately assumed that these data points add up to a baby who has died, and who died very young, perhaps a miscarriage or stillbirth. The data hints at hope and excitement, and crushing tragedy.
Of course, with only those three data points I don't know that this item for sale does in fact relate to a baby who has died. It could just be a baby who was born with enormous feet. It could be a baby born without feet. What do the shoes look like? Perhaps the baby was born in Marble Bar in Western Australia, one of the hottest places in the world, and some well-meaning relatives have mindlessly bought the baby Ugg boots. Maybe they're just an ugly, unwanted gift. My grandmother and great aunt didn't speak to each other for more than ten years, supposedly over a returned baby present!
If this were a data set and we were really trying to glean insight from it, we'd probably look for more information to confirm the assumption we've made. We might look for birth and death notices around the same time. We might seek out location and contact details associated with the sales advertisement, a photo of the shoes, information about past sales with the same contact details. We might look up information about the person placing the advertisement: their age, their marital status, whether they have kids. But we still won't know the real context behind it. We're just gathering information to help us make a more accurate prediction.
If we wanted to be absolutely certain our assumption was correct, we could just track down the person who placed the advertisement and ask them directly. That would validate or invalidate our hypothesis, wouldn't it? But they don't know us, and have no real reason to trust us. If the advertisement does reflect tragedy, it could be deeply insensitive to intrude. Who's to say they'd answer our question at all, let alone tell us the truth?
And so we might go really crazy and start pulling phone records and medical records, pawing through trash to figure out what that six-word story, those three data points, is really about. But sometimes the more information we have, the easier it is to end up down rabbit holes. We could start clinging to information that, while more removed from the data points we started with (say, the number of empty wine bottles in the trash), seems relevant. If there are lots of wine bottles in the trash around the time the advertisement was placed, then we might correlate this with sadness and strain. Before we know it, we're just subconsciously selecting and relying on the data points that fit our pre-existing hypothesis.
In looking at wine bottles, we miss the empty baby formula tin in the same trash. In linking heavy wine consumption with tragedy, we dismiss the possibility of celebratory parties. We fixate on trash data over medical records, which show no sign of a person being pregnant in the first place. We create a very convincing, detailed, data-rich argument to support the gut assumption we made the moment we saw those three pieces of information together, an assumption that is in fact completely and utterly false. Imagine it turns out the advertisement for the baby shoes was placed by a mother who bought shoes for her daughter, and they didn't fit. All that extra data we uncovered was irrelevant.
This is what it's like working with data to make predictions about people. There are so many little decisions to make: what data is important, how you're going to collect it, what you think the data you have can actually tell you. How closely it's related to the concept you're trying to measure. Whether the context in which you're gathering the data affects its truthfulness or reliability. The data you base your predictions on can result in great accuracy or ignominy.
Statisticians and social researchers know a lot about the complexities of working with data about humans. They not only have a set of complicated, fickle, ambiguous humans (each with their own circumstances and background and feelings) that they're trying to get information from and make predictions about. They're also wrestling with their own humanness: that they have biases and beliefs that could affect what's being collected in the first place, who it's being collected from and what they think is important, in ways that distort their results.
Historically, because data was expensive and time intensive to collect, a lot of effort went into figuring out, upfront, data collection methods that would reduce the risks associated with bias and error and unreliability. While I was in the United Kingdom, I worked on a project with government food statisticians trying to anonymise nearly thirty years of detailed food diaries for public release as open data. These were the individual food diaries (itemised lists of the food and drink a household consumed within a set period, among tens of thousands of households surveyed every year) that underpinned the annual National Food Survey statistical release, charting overarching trends in British dietary habits and consumption.
The National Food Survey was carefully designed to produce reliable insights about the national population's eating habits. It had to reflect households from across Great Britain (not just in busy metropolitan centres). It had to be carefully controlled, to ensure everyone was answering questions about their food at the same time of year and using the same questions. As a long-running survey (it started during the Second World War to help the government monitor diets and rations, and population health), it offers a lovely glimpse not only of changing food trends but also of changing societal attitudes influencing collection methods among statisticians. The food diaries couldn't capture "everything": they had to ask questions in a structured, analysable way. Certain questions would be added and removed, food codes updated, as broader societal trends became visible enough to warrant changes to survey questions.
Questions about alcohol consumption weren't included in the food survey until the early 1990s, for example. The statisticians charged with managing the survey today have mused that it could be because it was long assumed housewives would be filling out the food diary on their household's behalf, and wouldn't know what alcohol the husband consumed. In other words, statisticians assumed wives did not drink alcohol, and that drinking alcohol at home was uncommon. Often the food diaries would play catch-up. Questions about microwaves were added in the late 1980s, after they emerged as a popular household appliance. Questions about eating out weren't added till the mid 1990s. These quirks in survey design limit what you can do with the food diaries, say for figuring out when people started eating foods that went on to become popular. But it's still a rich and useful data set, particularly for understanding broad dietary trends.
These days, rather than issue time-intensive surveys to thousands of households at great cost, we might try to use supermarket loyalty card data to generate the same kinds of population-level insights. But there are trade-offs. For the convenience of loyalty card data, we might sacrifice getting a diverse, representative sample. We can end up a step removed from what it is we're actually trying to measure, which in this case is trends in the eating habits of a large population. Supermarket data is not as reliable a data set for measuring what people eat. It records what they purchase in supermarkets. There's still work to be done to figure out whether they're purchasing it for themselves, for their family, for parties, for neighbours and friends, for work, and whether it gets eaten.
These days data is cheap and abundant and endlessly increasing. People shed it with every digital interaction they have, whether browsing the web with a search engine, carrying a mobile phone in their pocket, participating on social media, watching movies on Netflix, paying with credit cards, registering for medical appointments on digital databases, getting digital scans and X-rays, driving cars with GPS or using smart devices. The ease with which data can be gathered seems to make old, time-intensive data collection methods redundant. We don't have to ask people directly for information about themselves anymore, and deal with all the problems associated with that method of gathering information: that we might ask the wrong questions, or they might lie or misremember or misunderstand. We can simply gather up all the data they trail about themselves and use that to answer the questions we have instead. We've moved from a "one use" data landscape (one question, one purpose for collection) to a reuse landscape, using existing data and making it fit lots of different purposes. The abundance of data that already exists is what has made so much exciting experimentation in machine learning possible.
But there's so much data, of such a dizzying variety, it can be easy to assume that anyone with enough data knows all there is to know about a person. "We can literally know everything if we want to," said former Google CEO Eric Schmidt in 2010. "What people are doing, what people care about, information that's monitored, we can literally know it if we want to, and if people want us to know it."1 The idea that data (our social media interactions, our search habits, our emails, our phone usage, our online reviews and ratings) could reveal everything there is to know is fuelling its use in lots of automated decision-making systems: to predict our trustworthiness, our workplace capability, our health, our ability to pay a loan. These new, effortlessly created data sets are seen as authentic, reliable, unfiltered, raw.
But they're not. All the data you could possibly glean from a person using their digital interactions is not going to add up to knowing everything about that person. In some instances, "unfiltered" is the last word you'd use to describe our interactions. These data sets are shadows: the bits of an interaction online that can be recorded as data. We still spend a lot of time offline. Even sitting in front of a computer screen for eight hours a day, or checking social media interactions and text messages every few minutes, those online interactions take place in a much broader, more complex offline context. What's happening away from a screen, away from the GPS in your mobile phone, the smart assistant in your home, the credit card in your wallet, creates confounding context gaps in large data sets about our online lives.
Getting from data to knowledge (to why and what it means) still requires lots of human assumptions and choices about what the data says, and what other data points might be useful. A person's search habits, for example, only reveal what they search for. Not what they like, or what they already know. We extrapolate from what a person searches for to understand what a person might like. With other bits of information like credit card purchases and social media interactions, we try to figure out why a person is searching for something, and what they might be interested in. We build a story. The story could be close to accurate or wildly off base. We could end up just making connections between data points that confirm our own pre-existing biases: counting wine bottles and missing the baby formula. Sometimes more data doesn't mean more accuracy. Google knows a lot about this.
In 2008, Google researchers claimed that they could "nowcast" the flu based on people's search habits. When people were sick, they would search for flu-related information on Google. Therefore, the researchers reasoned, Google search data could provide almost instant information about flu outbreaks.
Early results indicated that insights from search data could potentially produce accurate estimates of flu prevalence up to two weeks earlier than traditional flu tracking methods.2 It was international news. Search data could be potentially life saving. For several years Google Flu Trends was hailed as an example of what could be done with "big" data … until Google Flu Trends failed. At the peak of the 2013 US flu season, predictions from Google Flu Trends were out by 140 per cent. Eventually, Google quietly closed it down. Searches for the flu via Google were not as reliable a data set for predicting flu outbreaks as originally thought.
The reality is, today, data quality issues (issues of bias and accuracy and reliability) are increasing, not decreasing. Statistical practices that tried to reduce quality issues haven't been applied as consistently to born-digital data. Even though we talk about how valuable these data sources are, most online businesses aren't built to "sell" data. Google and Facebook, for example, sell targeted placement of advertisements on their platforms, based on the data they have about their users. For most online services, getting really good-quality data about people (demographically representative, unbiased, reliable, carefully controlled information) isn't the focus; data is the by-product of providing a good-quality service. And so these kinds of data have biases and gaps, even issues with basic authenticity, that historically statisticians and social researchers have tried their best to avoid.
Whether training an AI to predict a person's team spirit from their social media likes, or to recognise faces, the data that AI is trained on will influence, to a great extent, how "accurate" it can be. Really examining data (the context in which it has been created, the purpose for which it was originally used, how it was collected, the human biases that might be reflected in it) tells us whether an AI making predictions is a helpful aid to humans or closer to that hotline horoscope.
You don't need to be a technical expert to appreciate the possibilities and the pitfalls of using data. You just have to know what it's like to be human.
1
Provenance and purpose
When Luke Oakden-Rayner checked his blog stats on 20 December 2017 he was shocked: his latest post had racked up fifteen thousand views in under three days. Oakden-Rayner had written about his concerns with the quality of one of the world's largest publicly available chest X-ray data sets, called ChestXray14, which had been released by the US National Institutes of Health (NIH) Clinical Center a few months earlier. When the Center released it, they expressed a hope that it could be used by institutions around the country (and overseas) to teach computers to detect and diagnose disease.1 Oakden-Rayner was worried.
"I don't want to bury the lede here, so I will say this upfront," he wrote. "I believe the ChestXray14 data set, as it exists now, is not fit for training medical AI systems to do diagnostic work."2
Oakden-Rayner, who lives in Adelaide, is both a machine learning researcher and a trained radiologist. He teaches medical students how to interpret medical images at the University of Adelaide, alongside machine learning research towards his PhD. When Oakden-Rayner opened up the ChestXray14 data set he noticed a problem. Every X-ray had one or more labels describing the anomaly present in the scan: issues like fibrosis (thickened tissue) or effusions (fluid around the lung). As a radiologist, Oakden-Rayner noticed that some of the images were incorrectly labelled. A computer scientist without a radiology background would never notice this.
There was a range of issues with ChestXray14's image labelling. A random sample of eighteen images tagged as featuring atelectasis (a partial or complete collapse of a lung or lung lobe), for example, included seven that were not labelled correctly. Oakden-Rayner did a visual analysis of X-rays in every disease class, using a random sample of 130 images per class, and realised that, even though the NIH had reported the labels as being fairly accurate, those associated with some chest conditions seemed to be off by a significant margin.
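To make the shape of that kind of audit concrete, here is a minimal sketch of estimating per-class label accuracy from a small, expert-reviewed random sample. It is not Oakden-Rayner's code; the file names (labels.csv, expert_review.csv), column names and sample-size handling are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical inputs, for illustration only:
#   labels.csv         one row per image: image_id,label   (the released labels)
#   expert_review.csv  one row per reviewed image: image_id,label_correct (1 or 0)
labels = pd.read_csv("labels.csv")
review = pd.read_csv("expert_review.csv")

SAMPLE_PER_CLASS = 130  # the per-class sample size used in the audit described above

# Draw a fixed-size random sample from each label class.
frames = []
for label, group in labels.groupby("label"):
    frames.append(group.sample(n=min(SAMPLE_PER_CLASS, len(group)), random_state=0))
sampled = pd.concat(frames)

# Join the expert verdicts and estimate accuracy per class,
# with a rough 95 per cent margin of error for a sample proportion.
audit = sampled.merge(review, on="image_id")
for label, group in audit.groupby("label"):
    n = len(group)
    p = group["label_correct"].mean()
    moe = 1.96 * (p * (1 - p) / n) ** 0.5
    print(f"{label:<15} estimated label accuracy {p:.0%} ± {moe:.0%} (n={n})")
```

The margin of error is the point of sampling like this: a 130-image sample gives an estimate rather than a census, so a class that looks 60 per cent accurate in the sample could plausibly be somewhat better or worse across the full data set.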
In his blog post, Oakden-Rayner noted that while image labelling is a key part of teaching an AI system to recognise signs of disease from scans and report those issues, the image labels didn't really reflect actual clinical practice for radiologists. Sometimes radiologists label an image with an anomaly they already know is there but that can't really be diagnosed from that image. While fibrosis is visible in some X-rays, for example, it is more commonly diagnosed on CT scans. Pneumonia, which a number of X-rays in the data set were labelled as showing, isn't diagnosed from X-rays much at all. It's often almost entirely visually indistinguishable from other conditions like atelectasis, and so diagnosing it correctly requires more information than an X-ray can provide. Any AI system trained on the ChestXray14 data set would essentially be looking for differences that, in fact, might only be found with extra information from a CT scan.
There were other problems. Images labelled as indicating pneumothorax (air in the space around the lung, which can lead to lung collapse) also had chest drains in them. That means they were images of patients who were already being treated for pneumothorax. Oakden-Rayner feared that the AI was actually learning to identify chest drains, not air around the lungs. That could be disastrous. If the AI could learn only to see pneumothorax where there was a chest drain in an X-ray, then it would never report pneumothorax in new patients.
In writing up his concerns with the data set, Oakden-Rayner noted that radiologists incorrectly label X-rays too. Some images are hard to read and different radiologists might interpret them differently. A little bit of image noise (some uncertainty) is okay in deep learning. Deep learning can help to work through image noise. But when an AI is learning from data with bigger flaws, like images with chest drains in them, it starts to create meaningless predictions.
Even Oakden-Rayner's efforts building his own model to relabel the X-ray images (the data set included over a hundred thousand images, and eyeballing every image wasn't feasible) didn't work. At fir...
Table of contents
- Cover
- Title page
- Copyright page
- Contents
- How we got here
- A note on language
- Part I: Humans as Data
- Part II: Humans as Designers
- Part III: Making Humans Accountable
- Notes
- Acknowledgements
- Index