IET Software Review Article
Usability and user experience evaluation of natural user interfaces: a systematic mapping study
ISSN 1751-8806. Received on 10th February 2020; revised 20th May 2020; accepted on 16th July 2020; E-First on 2nd September 2020. doi: 10.1049/iet-sen.2020.0051. www.ietdl.org

Guilherme Corredato Guerino (1), Natasha Malveira Costa Valentim (1)
(1) Computer Science Department, Federal University of Paraná, 383-391 Evaristo F. Ferreira da Costa St, Curitiba, Brazil
E-mail: guilherme.guerinosi@gmail.com

Abstract: Natural user interface (NUI) is a recent topic in human–computer interaction (HCI) that provides innovative forms of interaction, performed through natural movements of the human body such as gestures, voice, and gaze. In the software development process, usability and user eXperience (UX) evaluations are a relevant step, since they assess several aspects of a system, such as efficiency, effectiveness, user satisfaction, and immersion. Thus, the goal of the authors' systematic mapping study (SMS) is to identify the usability and UX evaluation technologies used by researchers and developers in software with NUIs. The SMS selected 56 papers containing evaluation technologies for NUI, from which the authors identified 30 different usability and UX evaluation technologies. The analysis of these technologies reveals that most are used to evaluate software in general, without considering the specificities of NUI. Moreover, most technologies evaluate only one aspect, usability or UX; they do not consider usability and UX together. For future work, the authors intend to develop an evaluation technology for NUIs that fills the gaps identified in their SMS, combining usability and UX.

1 Introduction

Natural user interface (NUI) came up to improve users' interaction with systems by using natural body movements to perform actions [1].
According to Wigdor and Wixon [1], the 'natural' property refers not to the interface itself, but to the way users interact with it and what they feel while using it. Norman [2], a prominent human–computer interaction (HCI) researcher, cited Steve Ballmer (Microsoft) on the definition of NUI: 'I believe we will look back on 2010 as the year we expanded beyond the mouse and keyboard and started incorporating more natural forms of interaction such as touch, speech, gestures, handwriting, and vision – what computer scientists call the "NUI" or natural user interface' [3]. Fernández et al. [4] describe NUI as the most modern type of user interface, which uses speech, hand gestures, visual markers, and the body as modes of interaction. NUI interactions can be classified in several ways. The classification used by our systematic mapping study (SMS) is based on the definition of Fernández et al. [4], with adaptations based on Ballmer [3]: multitouch (hand gestures on a touchscreen); voice (speech); gaze (visual interaction); and gesture (body movements). To illustrate how an application using NUI should behave, Wigdor and Wixon [1] state that an application with a good natural design should create the perception that the object is an extension of the user's body. Hence, through natural movements, the user can perceive being able to control all application features. NUI is a recent topic in HCI that emerged in 2011 [1]. Systems that use these forms of interaction need to be evaluated to provide better usability and user eXperience (UX), and to consolidate these interactions across industry and society. Usability and UX evaluation have become a significant step in the software development process: they verify the product against quality aspects [5].
According to ISO 25010 [6], usability is 'the ability of the software product to be understood, learned, operated, user-friendly and standards-compliant when used under specific conditions'. Thus, when a usability evaluation is performed, the goal is to verify whether such aspects, e.g. efficiency and effectiveness, are met by the product being tested. Usability evaluation is therefore important because it evaluates pragmatic aspects of a product, linked to the behavioural goals that the software must achieve [7].

IET Softw., 2020, Vol. 14 Iss. 5, pp. 451-467. © The Institution of Engineering and Technology 2020

Despite the wide usage of the term 'usability' in HCI, in recent years a new expression emerged: UX. According to ISO 9241 [8], UX comprises the perceptions and responses of a person that result from the use and/or anticipated use of a product, system, or service. UX focuses on the emotions and judgments a user has when using an application [7], such as immersion, emotion, and motivation. Still according to Hassenzahl and Tractinsky [7], UX can be considered a combination of the user's internal state, the system characteristics, and the context in which the interaction occurs. UX evaluation is important to verify the hedonic aspects of the product, linked to the user's feelings and to how the software behaves towards them [7]. Therefore, it is important to evaluate usability and UX jointly in the software development process, because both pragmatic and hedonic aspects are then considered [9]. Consequently, usability and UX are intertwined: while usability focuses on task performance, such as the number of errors, UX focuses on experiences, analysing people's emotions while they interact with the software [10]. Hence, the current challenge is to understand which evaluation technologies are used and how they are applied in the NUI context.
Thus, the impacts of these technologies on the software are identified, and improvements can be made to raise the quality demanded by industry, which extends to society in general. The goal of our paper is therefore to present an SMS conducted to investigate which technologies (tools, methodologies, techniques, etc. [11]) are used to evaluate usability and UX in software that implements NUIs. An SMS characterises the state-of-the-art of a research topic. Our SMS identified evaluation technologies for NUIs and their characteristics, such as technology type, evaluation focus, and aspects evaluated, among others. Our SMS was based on the structure described by Kitchenham and Charters [12], composed of research questions, goals, definition of data sources, a search string, inclusion and exclusion criteria, and a strategy to extract and synthesise data. Besides, we verified publication years and venues. After filtering the publications, we selected and extracted 56 papers. Our results identified 30 different usability and/or UX evaluation technologies for NUI. We verified that most of the technologies target a generic software context, i.e. they do not consider the specificities of NUI. Besides, most technologies evaluate only one aspect, usability or UX, without considering these aspects together. Moreover, most technologies extract only quantitative data. We believe combined quantitative and qualitative analyses are the better choice for researchers because they provide different types of data for examination.

Table 1 SMS goal
Analyse: scientific publications
For the purpose of: characterising
With respect to: technologies that evaluate the usability and/or UX of natural user interfaces
From the point of view of: HCI researchers
In the context of: publications available from Scopus, ACM, IEEExplore, Engineering Village and Science Direct
The remainder of our paper is organised as follows: Section 2 presents related work; Section 3 describes the SMS structure in detail; Section 4 shows the SMS findings; Section 5 discusses the results; Section 6 describes the threats to validity; and Section 7 presents the conclusions and future work.

2 Related work

With the demand for new features and forms of interaction in software, usability and UX evaluation have become relevant allies in the development process, testing product quality and interaction [5]. Some secondary studies have been conducted on usability and UX evaluation; moreover, we found studies that investigated topics related to NUI. Paz and Pow-Sang [13] conducted an SMS to investigate usability evaluations across the overall software development process. This work identified the survey/questionnaire method as the most used in the literature to perform usability evaluations of software in general. However, the authors only identified the evaluation method, without showing which questionnaires and surveys were used. Moreover, the paper investigates any software context, and thus does not consider interaction specificities. In the systematic literature review (SLR) performed by Insfran and Fernandez [14], the authors identified evaluations conducted in web development, specifying the application context. The results revealed that most papers present evaluation methods explicitly designed for the web and perform a user test. However, the authors only researched usability, and did not consider different forms of interaction. The SLR presented by Zapata et al. [15] investigated the mHealth (mobile health) application evaluation process and focused its results on helping developers build more useful applications. The main result was that the adoption of automated mechanisms could improve usability methods for the mHealth context.
Moreover, the study identified that evaluation processes must be revised to combine more than one method. However, the authors did not consider the types of interaction used by mHealth applications, and they evaluated usability without considering UX aspects. In the NUI context, we did not find secondary studies investigating usability or UX evaluation; however, we found studies investigating topics related to NUI. Torres-Carrión et al. [16] conducted an SLR on the state-of-the-art of gesture-based child–computer interaction and how it may help in inclusive education. The results show that design guidelines for natural interfaces are applied in studies and highlight human cognitive and sensory factors, without considering emotional factors. However, the study did not focus on usability and/or UX evaluation. In the SLR proposed by Groenewald et al. [17], the authors' goal was to provide a more in-depth classification of mid-air hand gestures to help developers offer better experiences to users of interactive digital displays. The results show that most gesture evaluations were made using Kinect and Leap Motion devices. Although gesture is considered a type of NUI, the authors did not verify aspects of UX or usability evaluation of software using the classified gestures. Furthermore, Mewes et al. [18] examined research related to the use of touchless forms of interaction in operating rooms and radiology. The results revealed that most of the identified approaches test their software in real surgery contexts. Although the authors cover different types of interaction, such as voice, gesture, and gaze, they restrict the search to radiology and surgery software. Secondary studies are important to deepen research and to understand the state-of-the-art. As mentioned, we found secondary studies about usability and UX evaluation, as well as about NUI topics.
However, we did not find secondary studies combining these two concepts: usability/UX evaluation and NUI. Therefore, our SMS aimed to fill this gap by mapping the state-of-the-art of technologies used to evaluate usability and UX in the general context of NUIs.

3 Systematic mapping study

The SMS is a type of literature review and part of evidence-based research. In an SLR, the goal is to identify, evaluate, and compare all relevant research for a given topic [12]; in an SMS, the goal is to structure and categorise a research topic [12]. We chose to conduct an SMS because it identifies results that can be explored in the future by an SLR. Our SMS followed the guidelines proposed by Kitchenham and Charters [12], which divide an SMS into three phases: planning, conduction, and reporting. In planning, we structured the research protocol with research questions, data sources, the search string, and inclusion and exclusion criteria. In conduction, we searched the data sources, selected publications through two filters, extracted data from these studies, and analysed the data. In reporting, we shared the results.

3.1 Phase 1: planning the mapping

3.1.1 Goal: The SMS's goal was based on the goal-question-metric (GQM) paradigm [19] and is described in Table 1.

3.1.2 Research questions: The main research question is 'What technologies are used to evaluate the usability and/or UX of software which implements natural user interfaces?'. According to Petersen et al. [11], 'technology' is understood as tools, methodologies, techniques, and other proposals in the software engineering and HCI fields. Besides, we defined sub-questions (SQs) to identify characteristics of the technologies. The SQs are presented in Tables 2 and 3.
3.1.3 Data sources: The data sources were chosen: (i) for providing an efficient search engine, (ii) for allowing the use of similar terms in strings, and (iii) for the large number of papers obtained due to the breadth of the databases. Besides these criteria, the relevance of these repositories to our research area was crucial for the choice. The data sources are Scopus (https://www.scopus.com/search/form.uri?display=basic), IEEExplore (https://ieeexplore.ieee.org/Xplore/home.jsp), ACM Digital Library (https://dl.acm.org/), Engineering Village (https://www.engineeringvillage.com/home.url) and Science Direct (https://www.sciencedirect.com/).

3.1.4 Search string: For the definition of the keywords used in the search string, the PICOC criterion (Population, Intervention, Comparison, Outcome, and Context) [12] was applied. For an SMS, comparison and context are not applicable because this research type is not intended to compare technologies, but rather to characterise them. Therefore, we defined PICOC as follows:

Table 2 Research SQs

SQ1. Which NUI does the technology evaluate? The technology can evaluate the following interactions, according to the definition based on Norman, Ballmer, and Fernández et al. [2–4]. Observation: the classification used here is one of several existing for NUI. This classification starts from the premise that the computer needs to understand human interactions, and not that humans need to be interpreted by computer peripherals (e.g. mouse, keyboard, touchpad) [20]. Thus, human interaction has a direct path between the body and the software. Moreover, the term 'gesture' is used in this classification for body movements; hand gestures on touchscreens (e.g. zooms or scrolls) are part of the multitouch group.
(a) Multitouch: it uses two or more fingers on a touchscreen to perform some action, for example zooming.
(b) Gesture: it uses body movements to perform some action, for example the movement of 'tapping' to click on an option.
(c) Voice: it uses speech to perform some action, for example a command to save a new reminder.
(d) Gaze: it uses eye-tracking to perform some action, such as fixing the eyes on a letter on a virtual keyboard.

SQ2. What is the quality criterion of the evaluation technology? The technology can meet the following quality criteria:
(a) Usability: the technology aims to evaluate the usability of the software.
(b) User experience: the technology aims to evaluate the user experience of those who use the software.
(c) Both: the technology aims to evaluate both the usability and the UX of the software.

SQ2.1. What types of usability evaluation technology were used? Usability evaluation technologies were classified according to Ivory and Hearst [21]:
(a) Testing: an evaluator observes users interacting with an interface (i.e. completing tasks) to determine usability problems.
(b) Inspection: an evaluator uses a set of criteria or heuristics to identify potential usability problems in an interface.
(c) Inquiry: users provide feedback on an interface via interviews, surveys, etc.
(d) Analytical modelling: an evaluator employs user and interface models to generate usability predictions.
(e) Simulation: an evaluator employs user and interface models to mimic a user interacting with an interface and reports the results of this interaction (e.g. simulated activities, errors, and other quantitative measures).

SQ2.2. What types of UX evaluation technology were used? The technologies were classified according to Roto et al. [22]:
(a) Laboratory study: the system is made available in a simulated context to understand users' experiences.
(b) Case study: the system is made available in a real context to understand users' experiences.
(c) Survey: information is collected online from users to understand their experiences.
(d) Expert: the system is made available to a specialist to detect possible UX issues.

SQ2.3. What usability and/or UX aspects does the technology evaluate? Answers obtained by this SQ are subjective and vary from paper to paper. However, the goal was to verify which aspects were evaluated by the evaluation technology, such as effectiveness, user satisfaction, fatigue, etc.

SQ3. Is the technology specific to software which implements NUI, or is it for software in general? The technology can be classified into one of the following groups:
(a) Specific: the usability and/or UX evaluation technology is specific to software that implements NUI.
(b) Generic: the usability and/or UX evaluation technology is not specific to software that implements NUI.

SQ4. Is the evaluation technology based on any existing technology? The technology can be classified into one of the following answers:
(a) Yes, the evaluation is performed using an existing evaluation technology.
(b) No, the evaluation is not performed using an existing evaluation technology.

SQ5. Was the evaluation technology applied? The answer can be one of the following:
(a) Yes, the evaluation technology was applied in an empirical evaluation.
(b) No, the evaluation technology was not applied in an empirical evaluation.

SQ5.1. How and by what technology was the empirical evaluation performed? The goal of this SQ is to detail how the experiment was carried out, its steps and methods, and which technology was used to evaluate usability and/or UX.

SQ5.1.1. What is the classification of the evaluation technology? The goal of this SQ is to identify, according to Petersen et al. [11], whether the technology is a method, a technique, a methodology, etc.

• Population: NUI;
• Intervention: technologies which evaluate usability and/or UX used in the development of software which implements NUI;
• Comparison: not applicable;
• Outcome: usability/UX evaluation of software which uses NUI;
• Context: not applicable.
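The PICOC parts above map directly onto the boolean search string later listed in Table 4: synonyms within a part are OR-ed, and the parts are AND-ed together. A minimal sketch of that assembly (the term lists here are abbreviated for illustration, not the full Table 4 sets):

```python
# Sketch: assembling a boolean search string from PICOC parts.
# Term lists are abbreviated; see Table 4 for the full sets used in the SMS.
population = ["natural user interface*", "natural interface*"]
intervention = ["tool", "framework", "technique", "method"]
outcome = ["usability evaluation", "user experience evaluation"]

def or_group(terms):
    """Quote each synonym and join the group with OR, inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# AND the three applicable PICOC parts (comparison and context are not applicable).
search_string = " AND ".join(or_group(part) for part in (population, intervention, outcome))
print(search_string)
```

In practice each repository's advanced-search syntax differs slightly, so such a string is adapted to each search engine before being submitted.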
Table 4 shows the terms and search string used in our SMS. The terms are divided into three parts: the first part presents the population, i.e. terms related to NUI; the second part represents the intervention, i.e. what we planned to discover; and the third part presents the outcome, i.e. what we want to evaluate or improve.

3.1.5 Inclusion criteria:
• IC1: Publications which present usability and/or UX evaluation technologies in the development process of software that implements NUI.
• IC2: Publications that describe experimental studies of usability and/or UX evaluation technologies in the development process of software that implements NUI.
• IC3: Publications that discuss any aspect related to usability and/or UX evaluation in the development process of software that implements NUI.

Table 3 Research SQs (continuation of Table 2)
SQ5.2. Was the evaluation done at the academy, in industry, or in a laboratory?
SQ5.3. Was the evaluation quantitative or qualitative?

3.1.6 Exclusion criteria:
• EC1: Publications that did not meet the inclusion criteria.
• EC2: Publications whose content is not available for reading and data analysis (especially papers that are paywalled or not made available by the search repository).
• EC3: Publications in a language other than English or Portuguese.
• EC4: Publications or files that were not peer-reviewed, such as technical reports, books, and work in progress.
• EC5: Publications already retrieved from another search engine defined in our SMS (duplicates).

3.2 Phase 2: conducting the mapping

3.2.1 Primary studies selection: Our SMS started in September 2018, and the last string search was performed in March 2019. In the preliminary selection process (first filter), two researchers evaluated the titles/abstracts of the returned papers.
The first researcher read all titles/abstracts and classified the papers based on the inclusion and exclusion criteria. Afterwards, the second researcher also read all titles/abstracts and classified the papers. If the researchers could not reach a conclusion about a paper's inclusion based solely on title and abstract, the paper was automatically included to be evaluated more closely in the next step. If a paper was excluded, a plausible justification had to be provided. In the final selection process (second filter), the first researcher performed a complete reading of the papers and classified them as included or excluded for extraction. Afterwards, the second researcher checked all justifications for excluded papers and the extractions of included papers. If the researchers disagreed on a paper's classification, a discussion was held to reach an agreement. As shown in Table 5, 246 papers were returned after applying the search string to the selected search repositories. A total of 126 papers were selected by the first filter, based on the inclusion and exclusion criteria, and 56 papers were selected after the application of the second filter. Some papers appeared more than once in different repositories; in these cases, they were considered only in the first repository according to the search order performed: Scopus, IEEExplore, Science Direct, Engineering Village, and ACM. As shown in Table 5, data were extracted from the 56 papers approved in the second filter, which guided the SMS results.

3.2.2 Data extraction: The data extraction strategy adopted in our mapping was based on providing answers to each SQ defined above (see Tables 2 and 3). Besides, we extracted publication venues (journal, conference, or congress), publication years, and the devices used to capture natural interaction. The extraction strategy ensures the application of the same data extraction criteria to all selected papers, facilitating the classification.
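The duplicate-handling rule of Section 3.2.1 (a paper found in several repositories is counted only in the first one, following the search order Scopus, IEEExplore, Science Direct, Engineering Village, ACM) can be sketched as follows; the result sets and paper titles below are hypothetical:

```python
# Sketch: keep each paper only in the first repository that returned it,
# following the search order used in the SMS.
SEARCH_ORDER = ["Scopus", "IEEExplore", "Science Direct", "Engineering Village", "ACM"]

# Hypothetical search results: repository -> list of paper titles.
results = {
    "Scopus": ["Paper A", "Paper B"],
    "IEEExplore": ["Paper B", "Paper C"],
    "ACM": ["Paper A", "Paper D"],
}

def deduplicate(results, order):
    """Assign each title to the earliest repository in `order` that returned it."""
    seen = set()
    kept = {}
    for repo in order:
        kept[repo] = []
        for title in results.get(repo, []):
            if title not in seen:
                seen.add(title)
                kept[repo].append(title)
    return kept

kept = deduplicate(results, SEARCH_ORDER)
# "Paper A" and "Paper B" stay with Scopus, so IEEExplore keeps only
# "Paper C" and ACM keeps only "Paper D".
```

This is why the per-repository totals of Table 5 are order-dependent: a later repository's count excludes papers already found by an earlier one.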
The 56 selected publications are [23–78].

3.2.3 Data analysis: The researchers extracted all papers based on the SQs and criteria mentioned above. After the extraction, the analysis was performed in Microsoft Excel, which helped to create the charts and results shown in the next section.

Table 3 (continued) Possible answers
SQ5.2. The evaluation environment can be classified into one of the following answers:
(a) Industrial: the system evaluation was performed in an industrial context with professionals.
(b) Academic: the system evaluation was performed in an academic context with students.
(c) Laboratory: the system evaluation was performed in a laboratory context.
(d) Mixed: the system evaluation was performed in industrial and academic, industrial and laboratory, or academic and laboratory contexts.
SQ5.3. The evaluation can be classified into one of the following answers:
(a) Qualitative: the analysis of the evaluation was made in qualitative form.
(b) Quantitative: the analysis of the evaluation was made in quantitative form.
(c) Mixed: the analysis of the evaluation was made in both qualitative and quantitative form.

Table 4 Terms and search string used in the SMS
Population: ('natural user interface*' OR 'natural interface*' OR 'natural user interaction*' OR 'natural user communication*' OR 'natural communication') AND
Intervention: ('tool' OR 'framework' OR 'technique' OR 'method' OR 'model' OR 'process' OR 'guideline' OR 'pattern' OR 'metric' OR 'approach' OR 'inspection' OR 'principle' OR 'aspect' OR 'requirement' OR 'heuristic' OR 'methodology' OR 'mechanism') AND
Outcome: ('Usability evaluation' OR 'Usability assessment' OR 'Usability improvement' OR 'ux evaluation' OR 'ux assessment' OR 'ux improvement' OR 'user experience evaluation' OR 'user experience assessment' OR 'user experience improvement')

Table 5 Papers returned and selected in the first and second filters
Source | Total | After first filter | After second filter
Scopus | 138 | 76 | 38
IEEExplore | 67 | 39 | 15
Science Direct | 31 | 11 | 3
Engineering Village | 7 | 0 | 0
ACM | 3 | 0 | 0
Total | 246 | 126 | 56

3.3 Phase 3: reporting the mapping

3.3.1 Publications years: The selected papers were published between 2011 and 2019. The returned years are recent because the term natural user interface began to be used from 2011 [1]. As illustrated in Fig. 1, the number of publications increased in 2013, whereas it decreased in 2018. The year 2019 could not be thoroughly analysed because our SMS considered papers published only until March 2019, which may explain the low number of publications for that year. The year 2017 has the most significant number of papers (13), followed by 2016 and 2014 (10 papers each).

Fig. 1 Graph of publication years extracted from the SMS

3.3.2 Publications venues: Only peer-reviewed publication venues (including journals, conferences, and congresses) were considered. Fig. 2 provides an overview of the distribution of papers by conference. The order of Fig. 2 is based on the classification of the Computing Research and Education Association of Australasia (CORE 2018); conferences without a rank are marked as not ranked (NR). The main conferences returned are the Conference on Human Factors in Computing Systems (CHI) and the International Conference on Intelligent User Interfaces (IUI). The conferences with the most returned papers are IUI, the International Conference on Systems, Man, and Cybernetics (SMC), and the International Conference on Human–Computer Interaction – Interacción (AIPO), with two papers each.

Fig. 2 Conference papers distribution

Fig. 3 presents an overview of the identified journal papers. The order of Fig. 3 is based on the Scimago Journal Ranking (SJR 2018). The main returned journals are the International Journal of Robotics Research (JRR) and Computers in Human Behavior (CHB). The journals with the most returned papers are the International Journal of Human–Computer Studies (JHCS), Universal Access in the Information Society (UAIS), and Interacting with Computers (IWC), with two papers each. Two publications in congresses were also identified: the Digital Heritage International Congress (DHIC) and the World Congress on Health and Biomedical Informatics (MEDINFO), with one paper each.

Fig. 3 Journal papers distribution

The results for each SQ of our SMS are shown in the next section.

4 Findings

Table 6 summarises the results found in our SMS. Overall, 110 technologies used in the studies were found; we used these technologies to answer SQ1–SQ4. SQ1 shows 119 entries because the same technology was sometimes used in the same paper to evaluate different types of NUI, increasing the number. In questions SQ5 to SQ5.3, the number of empirical evaluations was considered. SQ5 shows a Boolean analysis of the 56 returned papers; however, the number of evaluations was 64 because some papers reported several evaluations, increasing the number of analyses.

Table 6 Summary of technologies and evaluations found in the SMS
SQ1. The technology evaluates which type of NUI? Multitouch: 14 (11.76%); gesture: 80 (67.23%); voice: 17 (14.29%); gaze: 8 (6.72%). Total: 119.
SQ2. What is the quality criterion of the evaluation technology? Usability: 94 (85.45%); user experience: 14 (12.73%); both: 2 (1.82%). Total: 110.
SQ2.1. What types of usability evaluation technology were used? Testing: 40 (41.67%); inspection: 2 (2.08%); inquiry: 54 (56.25%); analytical modelling: 0; simulation: 0.
SQ2.2. What types of UX evaluation technology were used? Laboratory studies: 16 (100%); case studies: 0; surveys: 0; experts: 0.
SQ3. Is the technology specific to software which implements NUI, or is it for software in general? Specific: 30 (27.27%); generic: 80 (72.73%).
SQ4. Is the evaluation technology based on any existing technology? If so, which one? Yes: 82 (74.55%); no: 28 (25.45%). Total technologies: 110.
SQ5. Was the evaluation technology applied? Yes: 55 (98.21%); no: 1 (1.79%).
SQ5.2. Was the evaluation done at the academy, industry, or laboratory? Industry: 1 (1.56%); laboratory: 61 (95.31%); academy: 0; mixed: 2 (3.13%).
SQ5.3. Was the evaluation quantitative or qualitative? Quantitative: 36 (56.25%); qualitative: 9 (14.06%); both: 19 (29.69%). Total evaluations: 64.

Therefore, our SMS identified 110 technologies used to evaluate usability and/or UX in software that implemented a NUI; after removing duplicates, 30 different evaluation technologies were found. Sub-questions SQ2.3 (What usability and/or UX aspects does the technology evaluate?), SQ5.1 (How and by what technology was the empirical evaluation performed?), and SQ5.1.1 (What is the classification of the evaluation technology?) are not presented in Table 6: first, because of the large number of varied responses these SQs identified; second, because these SQs are subjective and vary from paper to paper. However, all SQs' findings (including SQ2.3, SQ5.1, and SQ5.1.1) are detailed in the subsections below.
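The percentages reported in Table 6 follow directly from the raw counts; for instance, the SQ1 row is computed over 119 technology uses. A minimal check of the SQ1 row, using the counts from Table 6:

```python
# Sketch: recomputing the SQ1 percentages of Table 6 from the raw counts.
sq1_counts = {"multitouch": 14, "gesture": 80, "voice": 17, "gaze": 8}
total = sum(sq1_counts.values())  # 119 technology uses

percentages = {nui: round(100 * n / total, 2) for nui, n in sq1_counts.items()}
# e.g. gesture: 80/119 -> 67.23%, matching the reported value
```

The same calculation applies to the other rows, each with its own denominator (110 technologies for SQ2–SQ4, 64 evaluations for SQ5.2 and SQ5.3).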
Classification of the publications and technologies by SQ is shown in Table 7.

4.1 SQ1: type of NUI

The results of SQ1 reveal that 67.23% of the technologies evaluated the usability and/or UX of software using gesture-based interfaces (such as a website that allows scrolling the page with a hand gesture). Falcao et al. [24] presented a usability evaluation containing tasks to be done in Photoshop using hand gestures. The evaluation technology used was a questionnaire combining Nielsen's [79] and Jordan's [80] heuristics. These heuristics focus on the design of a user-friendly interface: Nielsen [79] covers issues such as user control, error prevention, and documentation, whereas Jordan [80] focuses on topics such as consistency and visual clarity. About 14.29% of the returned technologies evaluated the usability and/or UX of software using voice-based interfaces (such as a search engine that performs a search according to the word spoken by the user). Rocha et al. [34] presented an experiment whose main task was to perform a voice search; one of the evaluation technologies used was direct observation. About 11.76% of the technologies evaluated the usability and/or UX of software using multitouch-based interfaces (such as a map application that zooms with two fingers). Uebbing-Rumke et al. [37] evaluated the UX of flight controller software handled through multitouch; the evaluation technology used was the user experience questionnaire (UEQ) [81]. About 6.72% of the technologies evaluated the usability and/or UX of software implementing a gaze-based interface (such as a system whose cursor is controlled by gaze). Zhu et al. [41] evaluated the usability of a virtual soccer game in which users used their eyes to control players and kick the ball; one of the technologies used was the analysis of the experiment's video recording.

4.2 Devices to capture natural interactions

Fig. 4 provides an overview of the devices used to capture users' natural interactions.
The results reveal that 31 primary studies used Kinect to capture users' gestures. Kinect is a motion sensor developed by Microsoft [82] and used in its video games. The main features of Kinect are its high accuracy in capturing the user's body gestures and the ease of developing applications that use it, thanks to the Kinect for Windows Software Development Kit (SDK) provided by Microsoft [83]. This device is used in Kazuma et al. [65], in which the authors proposed and evaluated a new approach for oral presentation based on gestures captured by Kinect. Leap Motion, a gesture capture device, is the second most used device in the studies; however, this sensor only captures finger gestures [84]. It is used in Vosinakis et al. [68], where the authors developed an application in which users, interacting through Leap Motion, assumed the role of sculptors and were able to create a virtual statue by selecting and applying appropriate tools.

4.3 SQ2: quality criterion of evaluation technology

The results indicate that 85.45% of the evaluation technologies focused on usability. Deng et al. [78] evaluated user satisfaction when using gestures and gaze to move cubes to target positions in software; the evaluation technology used was the system usability scale (SUS) [85]. About 12.73% of the technologies focused on UX. d'Ornellas et al. [35] presented an evaluation of aspects such as immersion, tension, and user competence when using a gesture-based serious game for stroke rehabilitation; the UX evaluation technology was the main module of the game experience questionnaire (GEQ) [86]. About 1.82% of the evaluation technologies focused on usability and UX together. Economou et al. [44] evaluated (using the same technology) the user satisfaction (usability) and user engagement (UX) of a virtual hangout with avatars captured by Kinect; the experiment was recorded, and subsequent analysis of the recorded videos was used to obtain the data.
4.3.1 SQ2.1: types of usability evaluation technology: Inquiry technologies are the most used, around 56.25%. This type of technology collects data from experiment participants, such as the satisfaction questionnaire presented in Rybarczyk et al. [38], which users answered after using a telerehabilitation system.

IET Softw., 2020, Vol. 14 Iss. 5, pp. 451-467 © The Institution of Engineering and Technology 2020

Table 7 Classification of publications [23–78] and their technologies (T01–T56) by SQ (per-technology matrix omitted). Legend: SQ1: (A) Multitouch; (B) Gesture; (C) Voice; (D) Gaze. SQ2: (A) Usability; (B) User Experience; (C) Both. SQ2.1: (A) Testing; (B) Inspection; (C) Inquiry; (D) Analytical Modelling; (E) Simulation. SQ2.2: (A) Laboratory Study; (B) Case Study; (C) Survey; (D) Expert. SQ3: (A) Specific; (B) Generic. SQ4: (A) Yes; (B) No. '–': not applicable.

About 41.67% of technologies are usability tests, in which a researcher observes users' interaction during the experiment. Tang et al. [45] used an observation form to record any peculiar movement during the accomplishment of the tasks. About 2.08% of evaluation technologies are usability inspections, which require an expert to find usability problems. Guimarães et al. [64] developed an inspection form with several aspects to evaluate. Finally, there were no analytical modelling or simulation technologies.

4.3.2 SQ2.2: types of UX evaluation technology: The results of SQ2.2 show that all UX evaluation technologies were laboratory studies. There were no case studies, surveys, or UX expert evaluations.

4.3.3 SQ2.3: usability and/or UX aspects evaluated: The results of SQ2.3 reveal 61 different aspects evaluated by the studies. Tables 8 and 9 show the list of aspects by publication. To reduce research bias, the two researchers did not impose their own opinion about usability or UX classifications. Therefore, aspects considered part of usability in some papers were considered part of UX in others, according to their authors, such as performance, efficiency, ease of use, user preference, usefulness, naturalness, effort, engagement, control, pleasantness, frustration, competence, and confidence.

Fig. 4 Devices used to capture natural user interactions in primary studies

The most evaluated aspect was user satisfaction, with 34 evaluations. Muender et al.
[60] evaluated user satisfaction in a game where users could manipulate 3D protein structures through a multitouch interface. The second most evaluated aspect was effectiveness, with 21 evaluations. Fiorentino et al. [49] evaluated effectiveness through the error rate of users during the experiment. Performance received 18 evaluations and was the third most evaluated aspect. Kondori et al. [63] calculated performance based on the score obtained by experiment participants.

4.4 SQ3: specificity of the technology which evaluates NUI

About 72.73% of evaluation technologies are for software in general, i.e. they are not specific to evaluating software that implements NUI. Caggianese et al. [70] applied a usability questionnaire used in different contexts, the SUS [85], to evaluate software for manipulating virtual sculptures through hand gestures. About 27.27% of technologies were created specifically to evaluate systems with NUI. Su et al. [76] developed a questionnaire to evaluate the usability of a gesture-based system for home rehabilitation.

4.5 SQ4: basis of evaluation technology

Results of SQ4 show that 74.55% of technologies are based on existing ones. Uebbing-Rumke et al. [37] used the UEQ [81] to perform the UX evaluation. About 25.45% of evaluation technologies are not based on existing ones, i.e. the authors created a new evaluation technology for the study. Macaranas et al. [26] developed an interview to evaluate software usability.

Fig. 5 illustrates the combination of results from SQ3 and SQ4. These results indicate there were 30 evaluation technologies specific to NUI and 80 generic evaluation technologies. The combined analysis shows that the use of generic technologies in the NUI context is much higher than that of specific technologies. Of the 30 specific technologies, only 6 were based on existing technologies, i.e. the other 24 were developed solely to evaluate the study in the paper, without providing a common standard of evaluation and replicability.
4.6 SQ5: empirical evaluation

Results of SQ5 reveal that only one paper did not apply the evaluation technology. Sun and Cheng [43] presented the developed software and the evaluation technology; however, they did not apply the technology in an empirical evaluation. Within SQ5, there are SQ5.1 (Which evaluation technology was used in the empirical study?), SQ5.2 (What is the evaluation environment?), and SQ5.3 (What type of analysis was performed?), detailed below. Table 10 shows the list of publications and evaluations.

4.6.1 SQ5.1: evaluation technology used in the study: The 30 different technologies returned in our SMS are illustrated in Fig. 6. Results of SQ5.1 show that the SUS [85] and the analysis of study results were the most used evaluation technologies. We found these technologies in Li et al. [74] and Eckert et al. [53], respectively. In the first study, the researchers used SUS to evaluate the usability of distance interaction on display screens. With SUS, it is possible to evaluate user satisfaction through ten questions answered on a Likert scale. In the second study, the authors analysed the study results concerning usability: the efficiency and effectiveness of participants were obtained from the analysis of performance and score results.

Within SQ5.1, there is SQ5.1.1, which extracted the classifications of the evaluation technologies. Classifications were based on terms used in the intervention part of the search string, according to Petersen et al. [11] (e.g. method, technique, tool, etc.). Technologies were classified based on the definitions of the authors who created them. Table 11 shows the classifications and the papers that used each term.

4.6.2 SQ5.2: evaluation environment: Results of SQ5.2 show that 95.31% of empirical evaluations were performed in a laboratory environment, as in Su et al.
[76], where the authors simulated a context for testing a home rehabilitation system using Kinect. About 3.13% were performed in a mixed environment, as in Shishido et al. [30], where a laboratory environment was mixed with an academic environment. Only Zhu et al. [41] conducted the study exclusively in academia, where the authors ran the experiment in a real context with students. No studies were performed in industry.

4.6.3 SQ5.3: type of analysis: Results of SQ5.3 reveal that 56.25% of analyses were performed quantitatively, emphasising the collection of numbers, as in Lee et al. [51], where the authors (through graphs) confirmed the results obtained from the questionnaire answers. Approximately 29.69% of analyses were performed with a mixed method (quantitative and qualitative). Chatzidaki and Xenos [73] used tasks to evaluate efficiency (quantitative) and an interview to evaluate participants' opinions (qualitative). Only 14.06% of analyses were performed qualitatively, valuing the subjectivity of the answers. Economou et al. [44] analysed the experiment recording, from which they extracted subjective information about aspects defined by them, such as the user's focus.

Table 8 Aspects evaluated by each publication [23–78] (per-paper matrix omitted). Legend SQ2.3: (a) User satisfaction; (b) Effectiveness; (c) Performance; (d) Efficiency; (e) Ease of use; (f) User preference; (g) Usefulness/Utility; (h) Immersion; (i) Naturalness; (j) Effort; (k) Ease of learning; (l) Gamification; (m) Limitation/Difficulty; (n) Attractiveness; (o) Engagement; (p) Workload; (q) Control; (r) Novelty; (s) Intuitiveness; (t) Pleasantness/Enjoyable to use; (u) Frustration; (v) Overall UX/UX issues; (w) Estimated/Execution time; (x) Impression/Expectation; (y) Competence; (z) Space/Time pressure; (aa) Dependability; (ab) Perspicacity; (ac) Acceptance; (ad) Attention; (ae) Fatigue; (af) Virtualisation/Virtual reality; (ag) Participants' behaviour; (ah) Challenge; (ai) Flow; (aj) Positive/negative effects; (ak) Tension; (al) Precision; (am) Reaction time; (an) Endurance; (ao) Hand coordination; (ap) Error tolerance; (aq) Suitability; (ar) Individualisation; (as) Self-descriptiveness; (at) Relaxed working posture; (au) Stimulation; (av) Interaction; (aw) Intervention; (ax) Consciousness; (ay) Distraction; (az) Time to get used to; (ba) Information quality; (bb) Interface quality; (bc) Navigation; (bd) Confidence; (be) Clarity; (bf) Agility; (bg) Aesthetics; (bh) Pleasure; (bi) Nielsen's heuristics.

Table 9 (continuation of Table 8) Aspects (al)–(bi) for the same publications, with the same legend (per-paper matrix omitted).

5 Discussion

Our SMS showed evidence about the technologies used to evaluate usability and/or UX in software that implements NUIs. From a total of 246 papers, we selected 56 manuscripts in our mapping after applying the first and second filters. The results of our SMS identified 30 different technologies [6, 31, 32, 37, 40, 52, 79, 81, 85–100] (see Fig. 6) and their characteristics, answering the main research question: 'What technologies are used to evaluate the Usability and/or UX of software which implements Natural User Interfaces?'. Besides, we created SQs to extract the approved papers fully.

Results of SQ1 indicate a need for further studies using voice, gaze, and multitouch as forms of interaction. Besides, we showed that the vast majority (67%) of studies involving NUI use gestures to perform the interaction between user and system. The results about devices showed that several devices are used to capture natural user interactions. The significant use of Kinect (55%) and Leap Motion (32%) is justified because gestures are the most commonly used form of NUI. The popularity and quality of these sensors also justify the choice of developers when programming their applications. We observed that, when voice, gaze, or multitouch is used, no single device stands out in amount of use.

The main result of SQ2 is the lack of technologies that evaluate usability and UX jointly, specifically for software with some NUI. Thus, pragmatic aspects of usability are evaluated on one occasion and hedonic aspects of UX on another, lacking technologies that unite these two criteria.
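The percentages reported throughout this discussion are simple frequency distributions over the extracted study records. A minimal sketch of that aggregation, using hypothetical records with illustrative field names (not the authors' actual extraction spreadsheet):

```python
from collections import Counter

# Hypothetical extraction records: one entry per evaluation-technology
# occurrence, tagged with the NUI type it evaluated.
records = [
    {"tech": "SUS", "nui": "gesture"},
    {"tech": "UEQ", "nui": "multitouch"},
    {"tech": "direct observation", "nui": "voice"},
    {"tech": "video analysis", "nui": "gaze"},
    {"tech": "GEQ", "nui": "gesture"},
    {"tech": "observation form", "nui": "gesture"},
]

def distribution(records, field):
    """Share of each category as a percentage of all records."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

# Gesture accounts for half of these toy records, echoing (on a tiny
# scale) how gesture dominated SQ1 in the real data set.
print(distribution(records, "nui"))
```

The same helper applied to a field such as "tech" would reproduce counts like the SUS usage share discussed under SQ5.1.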
The joint evaluation of usability and UX is recommended because it brings these complementary criteria together in the same technology, allowing pragmatic aspects linked to behavioural goals and hedonic aspects related to users' feelings to be evaluated simultaneously.

Fig. 5 Combined results of SQ3 and SQ4

SQ2.1 shows that the most used usability evaluation technologies are inquiry technologies (56%), which collect data from the users in the experiment. Our SMS found no analytical modelling or simulation technologies. This happened because analytical modelling and simulation depend on other technologies, such as models and tools. Regarding the UX technologies discussed in SQ2.2, all evaluation technologies were laboratory studies, revealing that this type of study is the simplest to apply in the NUI context. A case study requires a real application environment, a survey relies on online evaluation, and a study with experts needs a field specialist, making these technologies more challenging to apply. Besides, SQ2.3 shows a greater concern with usability, as the aspects most evaluated by technologies are user satisfaction (60%), effectiveness (37%), and performance (32%). We observed there is no standardisation of which aspects belong to usability and which to UX, since several aspects are used to evaluate both.

The results of SQ3 indicate a lack of specific technologies to evaluate the usability and/or UX of software with a NUI. This happens because most authors prefer to use a generic technology already consolidated in the literature rather than create a new one for a specific context. Still, we discovered that some authors use technologies created only for the study, without going through a validation process. Besides, SQ4 shows that most technologies used are based on an existing one (74%).
Regarding SQ4, we observed a shortage of technologies that evaluate usability and/or UX specifically in software with a NUI and that are replicable in other works.

From SQ5, we noted that the majority of publications show the evaluation technology applied in empirical evaluations. Technologies classified as methods and techniques are the most used, since these classifications are common in the software engineering and HCI fields. Regarding the evaluations, 33% of them were performed with SUS, and 95% of them were realised in a laboratory environment. We observed that creating a laboratory environment for a study is more feasible because, in a controlled environment, it is possible to develop and conduct experiments more efficiently. Besides, evaluating a product first in a controlled context is preferable to a real context, because the authors can find improvements and updates without actually placing the product in a working context. Furthermore, there is a scarcity of studies conducted in industry, as it is necessary to establish partnerships with development companies to apply the technologies. Regarding the type of analysis (SQ5.3), 56% of analyses were performed quantitatively. With these results, we observed a preference for quantitative analysis, since it is part of most evaluation technologies. We believe a mixed (both quantitative and qualitative) evaluation is ideal, since it provides different types of data for analysis.

Table 10 Publications [23–78] and their evaluations (E01–E56; per-evaluation matrix omitted). Legend: SQ5.2: (A) Academy; (B) Industry; (C) Laboratory; (D) Mixed. SQ5.3: (A) Quantitative; (B) Qualitative; (C) Both. '–': not applicable.

Fig. 6 Evaluation technologies returned in the SMS

Table 11 Classification of evaluation technologies and the papers that used them:
method: [23–30, 34, 35, 37–43, 45, 47–52, 55–65, 67–70, 72–76, 78]
technique: [23–26, 28, 34, 36, 40, 41, 44, 46, 54, 55, 58–61, 63, 64, 67, 68, 71–74, 76–78]
model: [32, 33]
tool: [66, 70]
framework: [31]
approach: [60]

6 Threats to validity

As with all SMSs, even if minimal, the risk of bias remains. In our paper, we tried to mitigate bias by performing peer review, where two researchers reviewed all papers. First, one researcher performed the entire first filter (reading the title and abstract of all papers). Then, the other researcher performed the first filter separately. If there were divergences in inclusion or exclusion decisions, the researchers sought to resolve them. If the researchers did not reach a joint conclusion, the paper was automatically approved for the second filter. A plausible justification was necessary if a paper was excluded. In the second filter, the first researcher read all papers and extracted data.
Then, the second researcher checked the excluded papers and their justifications, as well as the included papers and their extractions. Again, if there was any disagreement in decisions, the researchers tried to reach a conclusion.

One threat to validity may lie in the search string. We attempted to make the search string cover all synonyms close to NUI; some papers using NUI may not have been returned because they did not use our keywords. However, despite this threat, we believe N = 56 is a relevant return for the search question presented. Another threat is that some authors do not focus their work on usability and/or UX evaluations. In some cases, the authors explain and detail the entire evaluation process; in others, they focus on the development and design process, leaving much information about the evaluation implicit, which generated doubts and multiple interpretations. To mitigate this threat, we discussed all justifications and paper exclusions.

7 Conclusion and future work

This paper detailed the results obtained in our SMS. We showed the evidence found in digital libraries about technologies used to evaluate usability and/or UX in software that implements NUI. From a total of 246 papers, we selected 56 and extracted them in our mapping after applying the first and second filters. The SMS results identified 30 technologies and their characteristics for verifying whether systems are usable and provide a good experience to users. Most of these studies focus on gesture-based interaction, which explains the high usage of the Kinect [82] and Leap Motion [84] devices. However, from the analysis performed, we found some research gaps. Most studies focus on only one criterion, usability or UX; for example, SUS [85] (one of the most returned technologies in our SMS) evaluates only usability.
Moreover, regardless of whether usability is evaluated in conjunction with UX or not, most identified evaluation technologies are not specific to the NUI context. In other words, they can be used to evaluate any system or service. When they are specific, the technologies are designed just for the work in question, without providing a standard for replicability. Besides, most of the studies found focus their results on quantitative analysis. However, we believe that a mixed evaluation (quantitative and qualitative) provides more robust data for analysis, which researchers can use according to their ways of examination. Summarising, we identified the following gaps:

• There is a dearth of evaluations for voice, gaze, and multitouch, as most of them are made for gestures.
• Few technologies were found that evaluated usability in conjunction with UX. When found, they were not specific to the NUI context.
• Technologies used to evaluate usability and UX, even separately, are generally not NUI-specific. When specific, they are designed just for the work in question, with no standardised aspects which provide replicability.
• Most evaluations are quantitative.

From these gaps, we observe as future work the possibility of developing a technology that evaluates usability and UX jointly and that is specific to software which implements a NUI. The technology will aim to help researchers and developers who want to improve their NUIs. Besides, the gaps identified in our SMS can serve as a basis for the realisation of other SMSs or SLRs.

8 Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

9 References

[1] Wigdor, D., Wixon, D.: 'Designing natural user interfaces for touch and gesture' (Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, 2011, 1st edn.)
Norman, D.A.: ‘Natural user interfaces are not natural’, Interactions, 2010, 17, (3), pp. 6–10 Ballmer, S.: ‘CES 2010: a transforming trend – the natural user interface’. (Huffpost). Available at https://www.huffpost.com/entry/ces-2010-atransforming-t_b_416598 Fernández, R.A.S., Sanchez-Lopez, J.L., Sampedro, C., et al.: ‘Natural user interfaces for human-drone multi-modal interaction’. 2016 Int. Conf. on Unmanned Aircraft Systems (ICUAS), Arlington, VA, USA, 2016, pp. 1013– 1022 Madan, A., Dubey, S.K.: ‘Usability evaluation methods: a literature review’, Int. J. Eng. Sci. Technol. (IJEST), 2012, 4, (2), pp. 590–599 ISO/IEC 25010: ‘Software product quality requirements and evaluation system and software quality models’. International Organization for Standardization: Systems and software engineering – SQuaRE, 2011 Hassenzahl, M., Tractinsky, N.: ‘User experience – a research agenda’, Behav. Inf. Technol., 2006, 25, (2), pp. 91–97 ISO DIS 9241-210: ‘Part 210: human-centered design for interactive systems (formerly known as 13407)’. International Organization for Standardization. Ergonomics of Human System Interaction, 2010 Hassenzahl, M.: ‘User experience (UX): towards an experiential perspective on product quality’. Proc. of the 20th Int. Conf. of the Association Francophone d'Interaction Homme-Machine, Metz, France, 2008, pp. 11–15 IET Softw., 2020, Vol. 14 Iss. 5, pp. 451-467 © The Institution of Engineering and Technology 2020 [30] [31] [32] [33] [34] [35] [36] [37] Vermeeren, A.P.O.S., Law, E.L.C., Roto, V., et al.: ‘User experience evaluation methods: current state and development needs’. Proc. of the 6th Nordic Conf. on Human-Computer Interaction: Extending Boundaries, Nova Iorque, EUA, 2010, pp. 521–530 Petersen, K., Vakkalanka, S., Kuzniarz, L.: ‘Guidelines for conducting systematic mapping studies in software engineering: an update’, Inf. Softw. Technol., 2015, 64, pp. 
1–18 Kitchenham, B., Charters, S.: ‘Guidelines for performing systematic literature reviews in software engineering’ (University of Durham, Durham, 2007) Paz, F., Pow-Sang, J.A.: ‘Usability evaluation methods for software development: A systematic mapping review’. Proc. of 8th Int. Conf. on Advanced Software Engineering & its Applications (ASEA), Jeju Island, South Korea, 2015, pp. 1–4 Insfran, E., Fernandez, A.: ‘A systematic review of usability evaluation in web development’. Web Information Systems Engineering – WISE 2008 Workshops, Auckland, New Zealand, 2008 (LNCS, 5176) Zapata, B.C., Fernández-Alemán, J.L., Idri, A., et al.: ‘Empirical studies on usability of mhealth apps: a systematic literature review’, J. Med. Syst., 2006, 39, (1), pp. 1–19 Torres-Carrión, P., González-González, C., Bernal-Bravo, C., et al.: ‘Gesturebased children computer interaction for inclusive education: a systematic literature review’. Technology Trends. CITT 2018, Communications in Computer and Information Science, Babahoyo, Ecuador, 2018, vol. 895 Groenewald, C., Anslow, C., Islam, J., et al.: ‘Understanding 3d mid-air hand gestures with interactive surfaces and displays: a systematic literature review’. Proc. of the 30th Int. BCS Human Computer Interaction Conf.: Fusion!, Swindon, UK, 2016, pp. 43:1–43:13 Mewes, A., Hensen, B., Wacker, F., et al.: ‘Touchless interaction with software in interventional radiology and surgery: a systematic literature review’, Int. J. Comput. Assist. Radiol. Surg., 2006, 12, (2), pp. 291–305 Basili, V.R., Rombach, H.D.: ‘Towards a comprehensive framework for reuse: a reuse-enabling software evolution environment’ (University of Maryland, Maryland, USA, 1988) Delimarschi, D., Swartzendruber, G., Kagdi, H.: ‘Enabling integrated development environments with natural user interface interactions’. Proc. of the 22nd Int. Conf. on Program Comprehension, ICPC 2014, Hyderabad, India, 2014, pp. 
126–129 Ivory, M.Y., Hearst, M.A.: ‘The state of the art in automating usability evaluation of user interfaces’, ACM Comput. Surv. (CSUR), 2001, 33, pp. 470–516 Roto, V., Obrist, M., Matilla, K.V.V.: ‘User experience evaluation methods in academic and industrial contexts’. Proc. of the Workshop on User Experience Evaluation Methods (UXEM'09), Uppsala, Sweden, 2009 Vallejo, V., Tarnanas, I., Yamaguchi, T., et al.: ‘Usability assessment of natural user interfaces during serious games: adjustments for dementia intervention’. 10th Int. Conf. Disability, Virtual Reality & Associated Technologies, Serpa, Portugal, 2014, pp. 10–26 Falcao, C., Lemos, A.C., Soares, M.: ‘Evaluation of natural user interface: a usability study based on the leap motion device’, Procedia Manuf., 2015, 3, pp. 5490–5495 Ismail, N.A., Pang, Y.Y.: ‘A multimodal interaction for map navigation and evaluation study of its usability’, ARPN J. Eng. Appl. Sci., 2015, 10, pp. 17962–17970 Macaranas, A., Antle, A.N., Riecke, B.E.: ‘What is intuitive interaction? Balancing users’ performance and satisfaction with natural user interfaces’, Interact. Comput., 2015, 27, (3), pp. 357–370 Carvalho, D., Bessa, M., Magalhȧes, L., et al.: ‘Age group differences in performance using diverse input modalities: insertion task evaluation’. Proc. of the XVII Int. Conf. on Human Computer Interaction, Salamanca Spain, 2016, pp. 12:1–12:8 Schröder, S., Loftfield, N., Langmann, B., et al.: ‘Contactless operating table control based on 3d image processing’. 36th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, Chicago, Illinois, USA, 2014, pp. 388–392 Kawamoto, A.L.S., Martins, V.F., da Silva, F.S.C.: ‘Converging natural user interfaces guidelines and the design of applications for older adults’. 2014 IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC), San Diego, California, USA, 2014, pp. 
2328–2334 Shishido, Y., Tsukagoshi, T., Yasuda, R., et al.: ‘Adaptive prompt system using a ghost shadowing approach: a preliminary development’. 2015 Int. Conf. on Virtual Rehabilitation (ICVR), Valencia, Spain, 2015, pp. 168–169 Canbulut, C.: ‘Usability of user interfaces based on hand gestures implemented using Kinect-ii and leap motion devices’. Int. Conf. on Information Technology, Singapore, Singapore, 2017, pp. 65–68 Erazo, O., Pino, J.A.: ‘Predicting task execution time on natural user interfaces based on touchless hand gestures’. Proc. of the 20th Int. Conf. on Intelligent User Interfaces, Atlanta, Georgia, USA, 2015, pp. 97–109 Erazo, O., Pino, J.A.: ‘Predicting user performance time for hand gesture interfaces’, Int. J. Ind. Ergon., 2018, 65, pp. 122–138 Rocha, T., Carvalho, D., Bessa, M., et al.: ‘Usability evaluation of navigation tasks by people with intellectual disabilities: a Google and Sapo comparative study regarding different interaction modalities’, Univers. Access Inf. Soc., 2017, 16, (3), pp. 581–592 d'Ornellas, M.C., Cargnin, D.J., Prado, A.L.C.: ‘Evaluating the impact of player experience in the design of a serious game for upper extremity stroke rehabilitation’, Studies Health Technol. Inf., 2015, 216, pp. 363–367 Kurschl, W., Augstein, M., Burger, T., et al.: ‘User modeling for people with special needs’, Int. J. Pervasive Comput. Commun., 2014, 10, (3), pp. 313– 336 Uebbing-Rumke, M., Gürlük, H., Jauer, M.L., et al.: ‘Usability evaluation of multi-touch-displays for TMA controller working positions’. Fourth SESAR Innovation Days, Madrid, Spain, 2014, pp. 1–10 465 [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] 466 Rybarczyk, Y., Cointe, C., Gonçalves, T., et al.: ‘On the use of natural user interfaces in physical rehabilitation: a web-based application for patients with hip prosthesis’, J. Sci. Technol. Arts, 2018, 10, (2), pp. 
581–592
Milani, F., Rovadosky, D.N., de Ávila Mendes, T., et al.: 'Usability evaluation of menus in a gesture-based game'. Proc. of the 15th Brazilian Symp. on Human Factors in Computing Systems, São Paulo, Brazil, 2016, pp. 37:1–37:4
Kirst, D., Bulling, A.: 'On the verge: voluntary convergences for accurate and precise timing of gaze input'. CHI Extended Abstracts, San Jose, California, USA, 2016, pp. 1–7
Zhu, D., Gedeon, T., Taylor, K.: 'Moving to the centre: a gaze-driven remote camera control for teleoperation', Interact. Comput., 2011, 23, (1), pp. 85–95
Cohen, L., Haliyo, S., Chetouani, M., et al.: 'Intention prediction approach to interact naturally with the microworld'. 2014 IEEE/ASME Int. Conf. on Advanced Intelligent Mechatronics, Besançon, France, 2014, pp. 396–401
Sun, H.M., Cheng, H.H.: 'The analogical transfer effect of user's experience on usability for gesture control interface'. 18th Pacific Asia Conf. on Information Systems, Chengdu, China, 2014, pp. 1–9
Economou, D., Doumanis, I., Argyriou, L., et al.: 'User experience evaluation of human representation in collaborative virtual environments', Pers. Ubiquitous Comput., 2017, 21, (6), pp. 989–1001
Tang, T.Y., Falzarano, M., Morreale, P.A.: 'Assessment of the utility of gesture-based applications for the engagement of Chinese children with autism', Univers. Access Inf. Soc., 2018, 17, (2), pp. 275–290
Hsiao, S.W., Lee, C.H., Yang, M.H., et al.: 'User interface based on natural interaction design for seniors', Comput. Hum. Behav., 2017, 75, pp. 147–159
McCaffery, J.P., Miller, A.H.D., Kennedy, S.E., et al.: 'Exploring heritage through time and space: supporting community reflection on the highland clearances'. Digital Heritage Int. Congress, Marseille, France, 2013, pp. 371–378
Ashok, V., Puzis, Y., Borodin, Y., et al.: 'Web screen reading automation assistance using semantic abstraction'. Proc. of the 22nd Int. Conf. on Intelligent User Interfaces, Limassol, Cyprus, 2017, pp.
407–418
Fiorentino, M., Radkowski, R., Boccaccio, A., et al.: 'Magic mirror interface for augmented reality maintenance: an automotive case study'. Proc. of the Int. Working Conf. on Advanced Visual Interfaces, Capri Island, Italy, 2016, pp. 160–167
Tang, G., Webb, P.: 'The design and evaluation of an ergonomic contactless gesture control system for industrial robots', J. Robot., 2018, 2018, pp. 1–10
Lee, J., Lee, C., Kim, G.J.: 'Vouch: multimodal touch-and-voice input for smartwatches under difficult operating conditions', J. Multimodal User Interfaces, 2017, 11, (3), pp. 289–299
Di-Nuovo, A., Broz, F., Wang, N., et al.: 'The multi-modal interface of robot-era multi-robot services tailored for the elderly', Intell. Service Robot., 2018, 11, (1), pp. 109–126
Eckert, M., Gómez-Martinho, I., Meneses, J., et al.: 'New approaches to exciting exergame-experiences for people with motor function impairments', Sensors, 2017, 17, (2), pp. 1–22
Postolache, O., Lourenço, F., Dias Pereira, J.M., et al.: 'Serious game for physical rehabilitation: measuring the effectiveness of virtual and real training environments'. 2017 IEEE Int. Instrumentation and Measurement Technology Conf. (I2MTC), Torino, Italy, 2017, pp. 1–6
Gustavsson, P., Syberfeldt, A., Brewster, R., et al.: 'Human-robot collaboration demonstrator combining speech recognition and haptic control', Procedia CIRP, 2017, 63, pp. 396–401
Profanter, S., Perzylo, A., Somani, N., et al.: 'Analysis and semantic modeling of modality preferences in industrial human-robot interaction'. 2015 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 1812–1818
Mustafa, Z., Flores, J., Cotos, J.M.: 'Multimodal user interaction for gis applications (mui-gis)'. XIX Int. Conf. on Human Computer Interaction, Palma, Spain, 2018
Rodrigues, M.A.F., Serpa, Y.R., Macedo, D.V., et al.: 'A serious game to practice stretches and exercises for a correct and healthy posture', Entertain. Comput., 2018, 28, pp.
78–88
Pérez Medina, J.L., González, M., Pilco, H.M., et al.: 'Usability study of a web-based platform for home motor rehabilitation', IEEE Access, 2019, 7, pp. 7932–7947
Muender, T., Gulani, S.A., Westendorf, L., et al.: 'Comparison of mouse and multi-touch for protein structure manipulation in a citizen science game interface', J. Sci. Commun., 2019, 18, (1), pp. 1–16
Nestorov, N., Hughes, P., Healy, N., et al.: 'Application of natural user interface devices for touch-free control of radiological images during surgery'. 2016 IEEE 29th Int. Symp. on Computer-Based Medical Systems (CBMS), Belfast and Dublin, Ireland, 2016, pp. 229–234
Hsu, F., Lin, W.: 'Human-oriented interaction with a tof sensor'. 2012 Southeast Asian Network of Ergonomics Societies Conf. (SEANES), Langkawi, Kedah, Malaysia, 2012, pp. 1–5
Kondori, F.A., Yousefi, S., Ostovar, A., et al.: 'A direct method for 3d hand pose recovery'. 2014 22nd Int. Conf. on Pattern Recognition, Stockholm, Sweden, 2014, pp. 345–350
Guimarães, M.D.P., Martins, V.F., Brega, J.R.F.: 'A software development process model for gesture-based interface'. 2012 IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC), Seoul, Korea, 2012, pp. 2985–2990
Kazuma, T., Yoshida, E., Yu, Y., et al.: 'Pseudohandwriting: new approach for oral presentation to have both advantages of slide and handwriting'. 2016 30th Int. Conf. on Advanced Information Networking and Applications Workshops (WAINA), Crans-Montana, Switzerland, 2016, pp. 461–465
Madni, T.M., Nayan, Y.B., Sulaiman, S., et al.: 'Usability evaluation of orientation techniques for medical image analysis using tabletop system'. 2016 3rd Int. Conf. on Computer and Information Sciences (ICCOINS), Kuala Lumpur, Malaysia, 2016, pp.
477–482
Carvalho, D., Bessa, M., Magalhães, L., et al.: 'Age group differences in performance using distinct input modalities: a target acquisition performance evaluation'. 2017 24° Encontro Português de Computação Gráfica e Interação (EPCGI), Guimarães, Portugal, 2017, pp. 1–8
Vosinakis, S., Koutsabasis, P., Makris, D., et al.: 'A kinesthetic approach to digital heritage using leap motion: the cycladic sculpture application'. 2016 8th Int. Conf. on Games and Virtual Worlds for Serious Applications (VSGAMES), Barcelona, Spain, 2016, pp. 1–8
Bačíková, M., Maričák, M., Vančík, M.: 'Usability of a domain-specific language for a gesture-driven ide'. 2015 Federated Conf. on Computer Science and Information Systems (FedCSIS), Lodz, Poland, 2015, pp. 909–914
Caggianese, G., Gallo, L., Pietro, G.D.: 'Design and preliminary evaluation of a touchless interface for manipulating virtual heritage artefacts'. 2014 Tenth Int. Conf. on Signal-Image Technology and Internet-Based Systems, Marrakech, Morocco, 2014, pp. 493–500
Zhao, L., Lu, X., Tao, X., et al.: 'A kinect-based virtual rehabilitation system through gesture recognition'. 2016 Int. Conf. on Virtual Reality and Visualization (ICVRV), Hangzhou, China, 2016, pp. 380–384
Fabroyir, H., Teng, W., Wang, S., et al.: 'Mapxplorer handy: an immersive map exploration system using handheld device'. 2013 Int. Conf. on Cyberworlds, Yokohama, Japan, 2013, pp. 101–107
Chatzidaki, E., Xenos, M.: 'A case study on learning through natural ways of interaction'. 2017 IEEE Global Engineering Education Conf. (EDUCON), Athens, Greece, 2017, pp. 746–753
Li, A.X., Lou, X., Hansen, P., et al.: 'On the influence of distance in the interaction with large displays', J. Disp. Technol., 2016, 12, (8), pp.
840–850
Mäkelä, V., James, J., Keskinen, T., et al.: ''It's natural to grab and pull': retrieving content from large displays using mid-air gestures', IEEE Pervasive Comput., 2017, 16, (3), pp. 70–77
Su, C.J., Chiang, C.Y., Huang, J.Y.: 'Kinect-enabled home-based rehabilitation system using dynamic time warping and fuzzy logic', Appl. Soft Comput., 2014, 22, pp. 652–666
Derboven, J., Roeck, D.D., Verstraete, M.: 'Semiotic analysis of multi-touch interface design: the mutable case study', Int. J. Hum.-Comput. Stud., 2012, 70, pp. 714–728
Deng, S., Jiang, N., Chang, J., et al.: 'Understanding the impact of multimodal interaction using gaze informed mid-air gesture control in 3d virtual objects manipulation', Int. J. Hum.-Comput. Stud., 2017, 105, pp. 68–80
Nielsen, J.: 'Usability engineering' (Academic Press, Boston, 1993)
Jordan, P.W.: 'An introduction to usability' (Taylor & Francis, London, 1998)
Laugwitz, B., Held, T., Schrepp, M.: 'Construction and evaluation of a user experience questionnaire', in Holzinger, A. (Ed.): 'HCI and usability for education and work' (Springer, Berlin, Heidelberg, 2008), pp. 63–76
Microsoft: 'Kinect for windows'. (Microsoft), accessed May 2019. Available at https://developer.microsoft.com/pt-br/windows/kinect
Microsoft: 'Kinect for windows sdk 2.0'. (Microsoft), accessed May 2019. Available at https://www.microsoft.com/en-us/download/details.aspx?id=44561
'Leap motion'. (Leap Motion), accessed May 2019. Available at https://www.leapmotion.com
Brooke, J.: 'Sus: A 'quick and dirty' usability scale', in Jordan, P.W., et al. (Eds.): 'Usability evaluation in industry' (Taylor & Francis, London, 1996), pp. 1–7
Ijsselsteijn, W.A., de Kort, Y.A.W., Poels, K.: 'The game experience questionnaire' (Technische Universiteit Eindhoven, Eindhoven, the Netherlands, 2013)
Turunen, M., Hakulinen, J., Melto, A., et al.: 'Suxes – user experience evaluation method for spoken and multimodal interaction'. Proc.
of INTERSPEECH 2009, Brighton, UK, 2009, pp. 2567–2570
Lewis, J.R.: 'IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use', Int. J. Hum.-Comput. Interact., 1995, 7, (1), pp. 57–78
Vorderer, P., Wirth, W., Gouveia, F.R., et al.: 'Mec spatial presence questionnaire (mec-spq)'. Report to the European Community, 2004
Tsai, T.W., Tsai, I.C.: 'Aesthetic experience of proactive interaction with cultural art', Int. J. Arts Technol., 2009, 2, pp. 94–111
Lewis, J.R.: 'Psychometric evaluation of an after-scenario questionnaire for computer usability studies: the asq', SIGCHI Bull., 1991, 23, (1), pp. 78–81
Kirakowski, J., Corbett, M.: 'Sumi: the software usability measurement inventory', Br. J. Educ. Technol., 1993, 24, pp. 210–212
Bhuiyan, M., Picking, R.: 'A gesture controlled user interface for inclusive design and evaluative study of its usability', J. Softw. Eng. Appl., 2011, 4, (9), p. 513
Borg, G., Hassmen, P., Lagerstrom, M.: 'Perceived exertion related to heart rate and blood lactate during arm and leg exercise', Eur. J. Appl. Physiol. Occup. Physiol., 1987, 56, (6), pp. 679–685
Kalckert, A., Ehrsson, H.H.: 'The moving rubber hand illusion revisited: comparing movements and visuotactile stimulation to induce illusory ownership', Conscious Cogn., 2014, 26, pp. 117–132
WAMMI: 'Website analysis and measurement inventory (wammi)'. (WAMMI), accessed December 2019. Available at http://www.wammi.com/index.html
Deci, E.L., Ryan, R.M.: 'Intrinsic motivation and self-determination in human behavior' (Plenum, New York, 1985)
Lund, A.M.: 'Measuring usability with the use questionnaire', Usability Interface, 2001, 8, pp. 3–6
Hart, S.G., Staveland, L.E.: 'Development of nasa-tlx (task load index): results of empirical and theoretical research', in Hancock, P.A., Meshkati, N. (Eds.): 'Human mental workload'. vol. 52 of Advances in Psychology. (Elsevier, North-Holland, 1988), pp. 139–183
IET Softw., 2020, Vol. 14 Iss. 5, pp.
451-467 © The Institution of Engineering and Technology 2020
Nielsen Norman Group: 'Thinking aloud: the #1 usability tool'. (Jakob Nielsen), accessed December 2019. Available at http://www.nngroup.com/articles/thinking-aloud-the-1-usability-tool