Artificial Intelligence and Software Testing

A practical guide to quality

Rex Black, James Davenport, Joanna Olszewska, Jeremias Rößler, Adam Leon Smith, Jonathon Wright

About This Book

WINNER: Independent Press Awards 2023 - Category: Technology

AI presents a new paradigm in software development, representing the biggest change in decades to how we think about quality and testing. Many of the well-known issues around AI, such as bias, manifest themselves as quality management problems. This book, aimed at testing and quality management practitioners who want to understand more, covers the trustworthiness of AI and the complexities of testing machine learning systems, before pivoting to how AI itself can be used in software test automation.

1 INTRODUCTION

Rex Black
I’ve always loved the ocean. When I’m close to an ocean – provided it’s not too cold or the weather too inclement – I like to swim or SCUBA dive in it. One of the things about oceans, though, is that oceans have waves and currents. Some are gentle, some a little more sizeable and some are massive. The massive ones can be dangerous if you don’t know what you’re doing, but serious surfers search these massive waves out and have the time of their lives in them. I’ve always envied those surfers, flying down the face of an enormous wave, though I’ve never learned to do it myself.
The software industry is like an ocean: there are always waves of change coming, of various sizes. The big ones can be exciting if you have the skills to catch them, but they can also swamp your career, as lots of software testing professionals who pooh-poohed the Agile wave have learned in a painful fashion.
Another big wave – which oddly enough has been decades in coming – is artificial intelligence (AI). I took a class in AI as a senior at UCLA, and worked on a proof-of-concept project for a professor to use AI in stock trading. This was in 1988. Slow wave, but now it’s finally here, bringing real change to the real world.
So, in this book, you’re going to read about the skills you need to ride this wave as a test professional. As a test professional already, you probably know that one key part of your job is to ask questions and understand risks. So, what are some of the questions, risks and skills that this book will raise and enhance?
One key question is whether we can trust AI, especially given some of the crucial roles it will play (e.g. self-driving cars). Any time you have objects moving in the physical world – beyond just electrons whirring around in silicon and circuitry – you have the possibility of damage, injury or death. Yes, software has long been involved in making potentially dangerous objects move in the real world – think avionics software or implantable medical devices – but AI promises to make encounters with software-driven moving objects a daily, if not hourly, experience for all but those who choose to live as hermits. As software testing professionals, how can we help ensure that society can trust these systems to be more beneficial than risky?
Of course, we would do that by testing the AI systems; but how can we test AI systems, especially since the most common form, machine learning, will change its behaviours in response to our tests? This is a marked difference from traditional software that, under most circumstances, will give the same output for the same inputs over and over again provided the software is not changed. This book gives some ideas on how to attack this challenge.
The point of running a test is, of course, to learn something, to get a result. We always prefer that result to be definitive, to be able to say the test passed or failed, not to say, ‘Well, maybe that worked.’ But what does ‘passing a test’ even mean for an AI? There may not be a clear specification of correct behaviour, or correct behaviour may change over time, or we may not even know what correct behaviour is beyond what the software tells us. At one time, the solution to this kind of problem was to create a parallel system, an approach that was once favoured for certain high-criticality systems, but is no longer in wide use. In this book, you’ll read about some ways to approach this challenge as well.
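To make that concrete, one family of approaches replaces the exact expected value with a relation that must hold between the outputs of related inputs, so that no complete oracle is needed; metamorphic testing is the usual name for this in the ML-testing literature, though the chapter has not introduced it by name. The sketch below is purely illustrative: predict_price is a hypothetical stand-in for a trained model's inference call.

```python
# A minimal sketch of a relation-based (metamorphic) check. Instead of
# asserting an exact expected price, we assert a relation between two
# outputs, so no full oracle is required. predict_price is hypothetical.

def predict_price(sq_metres: float, bedrooms: int) -> float:
    """Placeholder for a real ML model's inference call."""
    return 1500.0 * sq_metres + 8000.0 * bedrooms

def test_price_monotonic_in_floor_area():
    # Relation: enlarging the property, all else equal, should never
    # reduce the predicted price. We never state the "correct" price.
    smaller = predict_price(sq_metres=80, bedrooms=3)
    larger = predict_price(sq_metres=120, bedrooms=3)
    assert larger >= smaller

test_price_monotonic_in_floor_area()
```

The appeal of such a check is that it can survive retraining: the relation should hold no matter which exact prices the current model produces.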
This change in what we get in terms of test results means that test metrics will be different, too. For example, functional testing of traditional systems often involves looking at the percentage of tests that pass versus those that fail. For AI systems, the correct questions for functional tests are likely to be, ‘How often does each test give a result that appears correct?’ or ‘How far from the expected result values are the results of each test?’
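As a minimal illustration of those two metric styles, assuming each test has been executed repeatedly against the system and its numeric results recorded (all names and figures below are invented):

```python
from statistics import mean

def pass_frequency(results, expected, tol):
    """Fraction of runs whose result appears correct (within tolerance)."""
    return mean(1.0 if abs(r - expected) <= tol else 0.0 for r in results)

def mean_abs_error(results, expected):
    """Average distance of the results from the expected value."""
    return mean(abs(r - expected) for r in results)

runs = [9.8, 10.3, 10.1, 12.0, 9.9]                  # five runs of one test
print(pass_frequency(runs, expected=10.0, tol=0.5))  # 0.8
print(mean_abs_error(runs, expected=10.0))           # about 0.54
```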
THE CHALLENGES OF TESTING AI
To some extent, testing AI systems will be harder because the problem space is harder than many of us are used to dealing with as test professionals.
AI systems are being used to deal with complex, chaotic, messy realities, where the number of possibilities is huge, even compared to existing software, such as software that plays chess (which has somewhere on the order of 10^123 moves) or Go (which has more than 10^360 moves; see Koch, 2016). As both of these numbers are greater than the number of atoms in the universe, it's not like we are used to solving only trivial problems. However, in thinking about self-driving cars, consider the number of possible driving routes from any location in the United Kingdom (UK) or Germany or the United States to any other location within the same country. Obviously, that's a much less constrained set of possibilities than moves on a chess board or a Go board, so the number of possible outcomes is much larger.
Not only are the problems harder, but AI systems are different. They change in response to stimuli, unlike other software, which changes only when it is deliberately updated; the change in behaviour is driven by the stimuli themselves rather than being predetermined, as other software updates are. So, the testing itself will influence how the system behaves next.
Historically, we have used computers to automate activities that humans are bad at (doing the same thing the exact same way over and over again) or that take too long to do manually (complex maths or accounting). Now, we are trying to use computers to automate activities that have complexities that don’t easily lend themselves to mathematical formulas but which humans learn to do as children.
People and data have biases, and these can become embedded in AI systems. For example, women are under-represented in IT relative to men, which can lead an AI trained on such historical data to be biased in favour of men when deciding who is more likely to succeed in an IT role. Broad-based use of AI systems resulting in the calcification and reinforcement of such biases is a significant societal risk that must be addressed, and test professionals must be aware of that risk and play their part in managing it.
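One simple probe a tester might run, sketched below with invented data, is to compare a model's favourable-outcome rate across groups (a demographic parity check). A large gap is a signal worth investigating, not proof of bias by itself; the screening model and its outputs here are hypothetical.

```python
from statistics import mean

def positive_rate(predictions):
    """Share of favourable (1) outcomes among 0/1 predictions."""
    return mean(predictions)

def parity_gap(group_a, group_b):
    """Demographic parity difference between two groups' predictions."""
    return abs(positive_rate(group_a) - positive_rate(group_b))

# Invented outcomes of a hypothetical CV-screening model on matched CVs.
men   = [1, 1, 0, 1, 1, 0, 1, 1]
women = [1, 0, 0, 1, 0, 0, 1, 0]
print(f"parity gap: {parity_gap(men, women):.3f}")  # 0.375: investigate
```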
This risk is compounded by the fact that people trust computers too much. That may seem odd to you and me, because, as test professionals, we have learned to be very sceptical of software. We tend to expect it to fail. However, many people don’t have that outlook, but rather assume, ‘Well, if the computer says so, that must be right.’
Another challenge to the test professional arises because the world constantly changes, which means AI systems will change too. Consider the COVID-19 pandemic. When I first heard of a strange respiratory disease in China, I thought, ‘Yeah, I bet this will be like SARS and MERS, something that will be contained and maybe a little freaky but not a huge deal.’ Well, I was completely wrong about that. When testing AI systems, we’ll need to think about not just small, incremental changes, but big, fast, disruptive changes like pandemics, otherwise we risk missing important tests.
Stepping back a bit to consider the objectives of testing, one typical objective is to reduce risks to the quality of the software to an acceptable level. However, traditional quality models don’t adequately capture quality for AI systems. For example, traditional software either gives the correct answer for a given set of inputs or it doesn’t. AI systems may give correct answers sometimes and not others for the same inputs, or may give an answer that is different from the expected but still correct, or may give different results for inputs that are in the same equivalence partitions and thus would (traditionally) be expected to be handled the same way. This means that we will need to re-think our testing techniques.
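A sketch of how a tester might probe both behaviours, assuming a hypothetical classify() inference call: repeat a single input to see whether the output is stable, and sample several inputs from one equivalence partition to see how uniformly they are handled. Neither check requires knowing the correct answer.

```python
import random
from collections import Counter

def classify(x):
    """Hypothetical stand-in: mostly consistent per input, occasionally not."""
    flip = random.random() < 0.05                     # rare nondeterminism
    base = "spam" if hash(x) % 2 == 0 else "ham"
    return ("ham" if base == "spam" else "spam") if flip else base

def output_distribution(x, runs=100):
    """Repeat one input many times and tally the distinct outputs seen."""
    return Counter(classify(x) for _ in range(runs))

def partition_agreement(same_partition_inputs):
    """Fraction of same-partition inputs receiving the modal output."""
    outputs = [classify(x) for x in same_partition_inputs]
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)

print(output_distribution("free money!!!"))   # e.g. Counter({'spam': 95, 'ham': 5})
print(partition_agreement(["win cash", "free $$$", "cheap pills"]))  # e.g. 1.0
```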
We will also need to re-think how we measure test coverage. Just based on what has been said so far, you’ve probably guessed that requirements, design and other specification coverage measurements clearly won’t work, and that will be reinforced in a moment when I get to the probabilistic rather than deterministic behaviour of AI systems. Other common dimensions of coverage, such as risk coverage and supported configurations, may be relevant, but do they take the place of specification coverage? Further, since we aren’t using traditional programming, code coverage – always of limited use for measuring completeness of testing in any case – is even less useful, if not utterly useless. This book will help you to understand how to approach this critical problem for testing AI systems.1
Because AI systems often change in response to inputs, it's important that such changes be desirable. For example, deliberate malice or simply exposure to the wrong data could result in an AI becoming racist, as happened on one occasion with a Twitter bot (Vincent, 2016). Of course, the very idea of an artificial intelligence becoming racist is surreal, almost Kafkaesque, on multiple levels. 'Not intelligent enough to pass a Turing test but capable of directing hateful comments at others who could pass a Turing test' is a description that perhaps fits not only racist bots but also some people one might have the misfortune to meet, though that is a question outside the scope of this book.
How do we test for ethical and unethical behaviour, and how should a system handle situations where grey areas or known ethical conundrums exist? For example, a runaway streetcar could plough into a crowd of people, killing over a dozen, unless a person (or in this case an AI) shunts it onto another track where it will hit only one person but will certainly kill them. If that seems simple, consider a self-driving car where a dozen careless bicyclists recklessly swerve in front of it.2 Should it avoid hitting them by turning onto a sidewalk where a single law-abiding pedestrian will be struck? What constitutes correct behaviour here? What constitutes ethical behaviour? How do we program such behaviour? How do we test for it?
Over recent years, regulators in the UK, EU and the United States (US) have struggled with issues of data privacy. However, when behaviour can vary from one set of inputs to another, how do we test for compliance with regulations such as data privacy rules? For example, in the US, access to patient information is regulated under the Health Insurance Portability and Accountability Act (HIPAA). Testers must be able to test for compliance or non-compliance with such laws, but can we be confident that the results of our tests will not change as the AI evolves?
So, what is our role as quality professionals in social issues? Of course, as individuals, we may choose to donate to one cause or another, or participate in demonstrations for or against something or someone, but those are personal choices. With AI systems, we may find our work thrust into the middle of some very thorny matters. For example, the market for home ownership in the United States has some extremely fraught social history revolving around race.3 If you have ever worked as a software tester, software engineer, business analyst or other software professional in banking, you may be aware of the regulations associated with ensuring the banks no longer perpetuate the damage that was done to racial minorities who were systematically disadvantaged in home loans in the United States. Outside that domain, your professional involvement in this area may have been limited. Now, with AI systems, as this book will explain, to the extent that the systems you are working on can influence social outcomes (for good or evil), you may find yourself professionally engaged in evaluating whether those systems are having malign effects, which may be both inadvertent and quite subtle.
As this book will further explain, to the extent that your work testing AI systems has an intersection with social issues, it will be complicated by various biases. ‘I’m not biased’, you might protest, and you might well be right, but are the data that were used to train your AI biased? Is your AI biased through some other means? In what way? How can you test to ascertain whether such biases exist?
In testing these AI systems, the hard-won test design techniques that we have accumulated over the years, especially in the work of pioneers like Glenford Myers and Boris Beizer, such as equivalence partitioning, boundary value analysis, decision tables, state diagrams and combinatorial testing, may lose some of their power, because of what Beizer referred to as the 'bug assumption' behind each technique.4 The bug assumption is the type of bug each technique is particularly powerful at finding, and those are the types of bugs that occur in traditional procedural and object-oriented programming. In AI systems, other types of bugs exist, and some of the bugs that we find in traditional programs are less likely. In this book, you'll gain insights into new test design techniques, to augment the traditional techniques, for testing AI systems.
We are used to software working the same way (at least functionally) every time it is used to solve the problem with the same set of inputs. However, for non-functional behaviours like reliability and performance, we often see probabilistic behaviour, where reliability can be expressed in terms of percentage likelihood of the system failing under a given level of load or the percentage of responses that are received within a given time target under a given level of load. For AI systems, functional behaviours can also be probabilistic, in addition to evolving over time. This is another factor that makes it difficult to find reliable test oracles for functional testing of AI systems.
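One way to cope, sketched below, is to treat each functional test's verdict like a reliability figure: run it n times and report an estimated pass probability with an uncertainty band (a normal-approximation interval here, chosen purely for brevity).

```python
from math import sqrt

def pass_probability(passes, runs, z=1.96):
    """Estimated pass rate with a normal-approximation ~95% interval."""
    p = passes / runs
    margin = z * sqrt(p * (1 - p) / runs)
    return p, max(0.0, p - margin), min(1.0, p + margin)

p, low, high = pass_probability(passes=87, runs=100)
print(f"pass rate {p:.2f} (95% CI {low:.2f}-{high:.2f})")  # 0.87 (0.80-0.94)
```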
Of course, one of the bright shiny objects in testing has been, for decades, test automation. Tool vendors have made large amounts of money, often by deploying trendy buzzwords and promising easy success and quick return on investment (ROI), but my clients and I have found that test automation, in the long run, is less likely to succeed than open-heart surgery. Over 80 per cent of otherwise-healthy people 70 years or older who have open-heart surgery are still alive five years later (Khan et al., 2000), but less than half of major test automation efforts I’ve seen with my clients are still achieving a positive ROI, using the same strategy and technologies, after five years. In this book, you will learn how AI will affect test automation. Just as importantly, you’ll learn how it won’t affect test automation, and the obstacles that stand in the way of certain AI benefits for test automation in the short term, so that you are less likely to get snowed by a buzzy sales pitch from a tool vendor.
Automated tests can work at multiple levels and through various interfaces. The level of testing and the interface of automation change the challenges associated with applying AI systems to the test automation problem. However, the fundamental challenges associated with test automation at each level and through a particular interface often do not change just because an AI is being applied, though it is – of course – very much in the interests of the boosters of test automation tools to assert the contrary. This book will help you to define the right criteria for test automation tool evaluation, which is critical in any test automation project. As always, a strong business case and demonstrable ROI are essential for any major endeavour, and test automation – whether done with AI or not – will almost always be a major endeavour. Remember, too, that return on investment must be measured against clearly defined objectives, and those must be the right objectives.
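For the arithmetic behind such a business case, a toy calculation is sketched below; every figure is invented purely to show the shape of the computation.

```python
def roi(benefit, cost):
    """Classic ROI: net benefit as a fraction of cost."""
    return (benefit - cost) / cost

tooling_and_build  = 120_000   # licences, framework and script development
yearly_maintenance =  30_000   # keeping scripts alive as the product changes
yearly_saving      =  90_000   # manual regression effort displaced

five_year_cost    = tooling_and_build + 5 * yearly_maintenance  # 270,000
five_year_benefit = 5 * yearly_saving                           # 450,000
print(f"five-year ROI: {roi(five_year_benefit, five_year_cost):.0%}")  # 67%
```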
As should be clear by now, all these challenges and differences associated with testing AI systems have implications for skills. For example, suppose you are testing an AI system that helps make high-frequency stock trades. In addition to needing serious domain expertise in terms of financial systems, financial markets and financial regulations – all skills n...
