CHAPTER 1 Practical and Ethical Concerns in Usability Testing with Children
Gavin Sim, Janet C. Read and Matthew Horton
CONTENTS
Executive Summary
Organization/Institution Background
Case Study Description
Method
The Game Prototype
Study Design
Ethics
Participants
Procedure
Analysis
Results
Use of the Fun Toolkit to Capture User Experience
Direct Observations
Retrospective Think Aloud
Direct Observation Compared to Retrospective Think Aloud
Challenges
Methodological Challenges
Challenges of Conducting Evaluation Studies with Children
Ethical Challenges
Informed Consent
Solutions and Recommendations
Set Up and Planning
Introduction
During the Study
Finishing Up
Conclusions
References
List of Additional Sources
Biographies
Key Terms and Definitions
EXECUTIVE SUMMARY
It is common practice to evaluate interactive technology with users. In industry, usability companies typically carry out these evaluations, and the participants are usually adults. In research studies, the evaluation is typically performed by researchers who do not do this sort of work on a daily basis. Complexity can be increased if the researcher is also the developer of the software and if the users are children. This case study explores that space: the evaluation of software, by researchers who are also its developers, with children. The chapter describes the evaluation of an educational game that was designed to teach Spanish to children. It outlines the planning for, and the execution of, a usability study of the game with 25 children aged 7–8 years in a school in the United Kingdom. The study used two methods to try to discover usability problems, direct observation and retrospective think aloud, and also gathered user experience data using the Fun Toolkit. The focus of this chapter is less on the results of the evaluation (although these are presented) and more on the practical and ethical concerns of conducting usability evaluations of games with children within a school setting. Those reading the chapter will gather hints and tips from the narrative and will better understand the use of the three methods included in the study. In addition, the researcher/developer role is discussed and it is shown that the methods used here enabled children to make judgments without the ownership of the product being an issue. To make the main points more concrete, the chapter closes with a set of “key points” to consider when doing usability testing with children in schools.
ORGANIZATION/INSTITUTION BACKGROUND
The study described in this chapter took place in the United Kingdom and involved children from a primary school in a semi-rural area of Northern England. The work was carried out by members of the ChiCI (Child Computer Interaction) research group at the University of Central Lancashire (UCLan), a modern university with over 30,000 students. The ChiCI group was formed in 2002 when a group of four researchers at UCLan came together around a shared interest in designing for, and evaluating with, children. The group has since grown and at the time of writing was made up of eight academics, five PhD students, and four students on specialist masters courses. ChiCI receives funding from the European Union (EU), the UK research councils, and industry.
The ChiCI group has a long tradition of working with school children from around the region. The group has a dedicated PlayLab within the university and uses this to hold MESS days (Horton et al. 2012), which are structured events that bring a whole class of children (25–30) at a time to the university to rotate through a variety of activities aimed at evaluating and designing technologies for the future. The overarching aim of the ChiCI group is to “develop and test methods that facilitate the design and delivery of highly suitable technologies for children.” These technologies may be for fun, for learning, for the benefit of children in communicating with others, or for recording their thoughts or ideas. Innovations to date have included a handwriting recognition system designed for children aged between 5 and 10 years, a tabletop game for kindergarten children, a specialized pod for use by teenagers to identify with domestic energy use, and a mobile game for use with children aged between 5 and 11 years with communication difficulties.
CASE STUDY DESCRIPTION
The case study described in this chapter concerns the processes and outcomes around the evaluation, by children, of an educational game. The evaluation took place in a UK primary school and took the form of a usability test that was carried out to identify usability problems and also to capture satisfaction metrics. The aim was to improve the design of the game, but in the process the research team also sought to investigate several elements of school-centered evaluation. The authors developed the game that was used in the study; it took the form of a medium- to high-fidelity prototype that included all the required functionality and had suitable graphical elements. The game met the appropriate educational objectives for the children who would be evaluating it. The educational merit of the game was not examined in this case study. It is noted, however, that usability can be examined from a pedagogical perspective focusing on the user interface, the design of the learning activities, and the determination of whether learning objectives have been met (Laurillard 2002).
The case study provides the reader with a clear narrative that explains how different tools can be used to capture both usability problems and user experience data from children within a school setting. The use of two different evaluators, one with a personal tie to the game (the developer) and the other looking at the game from an impartial view (the researcher), is also explored to see whether the investment of the evaluator may affect how the children respond to the user study and what they report.
Usability testing with children has been the subject of many academic papers, with researchers focusing on the development and refinement of tools and techniques that can help children engage with researchers to evaluate products and systems. Various adult methods have been explored, including think aloud, interviews, and the use of questionnaires (Markopoulos and Bekker 2003, LeMay et al. 2014). Using these, and other methods, it has been shown that children can identify and report usability problems. For example, direct observation has been shown to identify signs of engagement or frustration along with usability problems (Sim et al. 2005, Markopoulos et al. 2008). Think aloud has been used effectively by children to identify usability problems (Donker and Reitsma 2004, Khanum and Trivedi 2013). Höysniemi et al. (2003) found that children were able to detect usability problems that would aid the design of a physically and vocally interactive computer game for children aged 4–9 years. However, when conducting usability research with children, there are still a number of challenges that need to be considered, one example being the impact of children’s less mature communication skills. Several studies have identified that younger children, especially when using the think aloud technique, are less able to verbalize usability problems than older children. Despite the apparent success of the think aloud method, it still comes under some criticism. There is concern that the method is quite challenging for children due to its cognitive demands (Donker and Reitsma 2004), especially for younger children (Hanna et al. 1997), as they can forget to think aloud unless prompted (Barendregt et al. 2008). One study by Donker and Reitsma (2004) found that out of 70 children only 28 made verbal remarks during a user test; this is a low number, hardly representative of that group.
Where think aloud has been shown to be taxing for children, the use of retrospective methods, in which the child describes what happened after the usability test has ended, has shown some promise. Kesteren et al. (2003) found that with retrospective techniques children were able to verbalize their experiences. It has been suggested that children may be less communicative not because of a lack of skill in communicating but rather as a result of personality traits: Barendregt et al. (2007) showed that personality characteristics influenced the number of problems identified by children in one usability test. Research is still needed to understand usability methods and to identify and catalogue their limitations in order to ascertain which can be reliably used with children. The literature provides guidance on how to perform usability studies, notably in Hanna et al. (1997) and Barendregt and Bekker (2005), but these sources are somewhat dated, are restricted to studies performed in usability labs, and do not take account of recent research in the field.
For user experience, as with usability, many methods for children have emerged over the years. These include survey tools (Read 2008, Zaman et al. 2013) and specialized verbalization methods (Barendregt et al. 2008) that have typically focused on measuring fun within the context of game play or of children using interactive technology. The survey tools that are widely used with children, including the Fun Toolkit (Read and MacFarlane 2006) and the This or That method (Zaman 2009), capture quantifiable data relating to user experience. Research comparing the Fun Toolkit and the This or That method has shown that they yield similar results, which can be taken as evidence that, when used appropriately, they collect useful data (Sim and Horton 2012, Zaman et al. 2013). The literature on the use of survey methods with children highlights that gathering opinion data is not without difficulties: the younger the children are, the less mature they are at understanding the question–answer process. Children are generally unused to giving opinions in this sort of context, and this gives rise to problems including suggestibility and satisficing (Read and Fine 2005). These two problems are related but come from two sides. Suggestibility arises where a question is phrased in a certain way so that the child is “more minded” to answer a particular way; an example might be a question like “Do you like this game more than that hopeless one you just played?” Satisficing, where a child seeks to give the answer that he or she thinks the adult wants to hear, is more about answers than questions and is more difficult to deal with in survey design as it is really a process problem. It is born out of the imbalance in the relationship between the child and the evaluator and is inherent in all evaluation studies.
One of the aims of the case study presented here is to explore satisficing as a known issue within usability and user experience studies with children. Another aim is to consider the effectiveness of the three different evaluation methods. The study presents data relating to identified usability problems and reported satisfaction, and this is critiqued to understand the limitations of the process and methods, and to offer suggestions for further research. The main lessons from the case study are used to generate a set of guidelines for carrying out usability and user experience testing in school settings. These guidelines will follow the same structure as those presented by Hanna et al. (1997).
Method
As described earlier, for this study, children were asked to evaluate the usability and user experience of a medium- to high-fidelity prototype educational game. Each child would play the game and data would be collected using three different methods: the Fun Toolkit, direct observation, and retrospective think aloud. The researchers carrying out the study had experience of conducting observations and capturing usability problems (Sim et al. 2005); had this not been the case, video recording might have been considered as an option to ensure events were not missed whilst notes were being taken. It is quite feasible that some events were missed as a result of not recording the screen, but it was anticipated that any severe or obvious problems would be experienced by several children and so would still be captured.
Satisficing was examined at the level of “who made the game.” Two adult evaluators acted as “developer” and “researcher” in order to explore how the children reported on, and talked about, the software that they saw. This presentation was controlled for in a between-subjects design so that the “developer” was a different person for half the children than for the other half. The usability study was also controlled with half the children being told extensively about the ethics of their inclusion and the other half getting only a brief explanation before being told afterwards. The case study will focus mainly on the qualitative data that was gathered and will give examp...