COMPUTER TECHNOLOGY INSTITUTE 1999
TECHNICAL REPORT No. TR 99/03/04

"Measuring User's Perception and Opinion of Software Quality"

Dimitris Stavrinoudis, Michalis Xenos, Pavlos Peppas, Dimitris Christodoulakis

March, 1999

Abstract

This paper presents a method for modeling users' perception of software quality. The method aims to improve the quality of data derived from user opinion surveys and to facilitate the analysis of such data. Additionally, using aspects of Belief Revision theory, the proposed model offers a way to measure users' opinion in the early stages of product release and a way of predicting, from these initial measurements, the opinion users subsequently form after revising it. The paper presents graphical examples of modeling users' perception of software quality and of handling the users' belief revision. Finally, conclusions from our case studies and data analysis are presented.

1. Introduction

The value of surveys in measuring product quality characteristics, as customers perceive them, is well recognized. Moreover, international standards such as ISO9001 (ISO9001, 1991), IEEE (IEEE, 1989), Baldrige (Brown, 1991) and CMM (Bate, 1995), (Curtis, 1995) encourage software production companies to measure the users' perceived quality of their products. Not only are surveys indicators in themselves, but they also allow the more sophisticated analysis techniques required of organizations with higher levels of quality maturity. Furthermore, surveys allow one to focus on just the issues of interest; their results are quantifiable and thus provide a convenient way to bring the power of statistics to bear. However, surveys present difficulties related to the quality of the data, the high cost of conducting them and the repeated revisions of the customers' opinion of the product before it settles. In addition, measuring users' perception of quality is time-consuming, especially when the results of a survey need to be reaffirmed by subsequent surveys. As a result, the following goals must be achieved: a) improve the accuracy of the input from the surveys, b) improve the analysis and the interpretation of the results derived from the surveys and c) find a way of modeling the belief revision of the customers, so as to capture not only their opinion in the early stages of product release, but also to extrapolate their final opinion. The aim of this paper is to offer a model that meets the aforementioned goals, to provide guidelines for ensuring and improving the quality of survey data and, finally, to analyse the data so that users' opinion revisions can be predicted. In order to achieve the first goal, the opinion of users is evaluated in relation to their overall computer-using ability and their ability to use the particular product. As regards the second and the third goal, rules from Belief Revision theory and Grove's Systems of Spheres (Grove, 1988) have been adapted within the proposed method. Our goal is to represent, in a comprehensive manner, the way that the opinion of users may be revised.
Moreover, various categories of users are presented and analysed, along with the way the rules of Grove's Systems of Spheres apply to each of them. Our model deals with all the software quality factors with which the end-user is concerned, in order to show how a revision of the users' opinion of one factor may also change their opinion of others. In the next chapter the formulas used for measuring users' opinion are presented, together with the analysis and the findings of these measurements. In chapter 3 the rules of Grove's System of Spheres and the proposed model are presented.

2. Modeling users' perception measurements

The conclusion reached in our previous research (Xenos, 1996) is that, although developer-oriented and user-oriented software quality measurements are highly correlated, satisfaction of internal quality standards does not guarantee a priori success in fulfilling the customers' demand for quality. Consequently, measuring and evaluating the users' opinion and perception of a software product is essential. What must also be taken into consideration is the way the users' opinion of quality changes over time. The conclusions reached by comparing these changes between experienced and inexperienced users are analysed in the following sections.

2.1 Measuring users' opinion

In order to measure users' opinion of software quality we focused on the user-oriented quality characteristics derived from the Factor-Criteria-Metrics model (McCall, 1977). To collect the measurements of the users' opinion, the questionnaires used a multiple-choice format that guided the user to select predefined responses ordered on interval scales (with choice bars, percentage estimations, etc.). This method can also be applied by focusing on the user-oriented quality characteristics derived from the ISO9126 (ISO9126, 1991) standard (functionality, reliability, efficiency, usability). Examples of the questions used in these questionnaires are the following: "What is your opinion of the product's accuracy and consistency?", "Is invalid data entry properly recognized?", "Are all functions that relate to the window available when needed?", "Is help available for each item and is it context sensitive?".

Formulas (1) and (2) were used in the surveys conducted. These formulas weigh users' opinions according to their qualifications. The QWCO (Qualifications Weighed Customer Opinion) method (Xenos, 1995) uses formula (1), where O_i is the normalised measured opinion of user i and E_i measures the qualifications of user i. Finally, n is the number of users who participated in the survey. Therefore, each user contributes to the average according to his/her qualifications.

QWCO = \frac{\sum_{i=1}^{n} O_i E_i}{\sum_{i=1}^{n} E_i}    (1)

In QWCODS (Qualifications Weighed Customer Opinion with Double Safeguards) (Xenos, 1997), a number of safeguards were embedded into the questionnaires. Safeguards are questions placed inside the questionnaire to measure the correctness of responses; they are not aimed at measuring user-perceived quality, but are control questions aiming at detecting errors. In formula (2), S_i is the number of safeguards that user i has answered correctly and S_T is the total number of safeguards. Since the QWCODS technique implies the use of at least one safeguard in the questionnaire, division by S_T is always valid. In this method, safeguards were used not only to detect errors when measuring the customers' opinion, but also to detect errors when measuring the customers' qualifications. In formula (2), P_i can take the value 0 or 1: it is zero when at least one error has been detected while measuring the qualifications of customer i, and it is set to 1 only if no error has been detected. This method therefore results in the rejection of a customer's responses if errors were detected while measuring his/her qualifications.

QWCODS = \frac{\sum_{i=1}^{n} O_i E_i P_i (S_i / S_T)}{\sum_{i=1}^{n} E_i P_i (S_i / S_T)}    (2)
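To make the two estimators concrete, the following minimal Python sketch computes both of them. The record layout used here (opinion, qualification weight, safeguard counts) is an illustrative assumption and not a data format prescribed by the method.

```python
# Minimal sketch of the QWCO and QWCODS estimators of formulas (1) and (2).
# The tuple layouts below are assumptions made for illustration only.

def qwco(responses):
    """Qualifications Weighed Customer Opinion, formula (1).

    responses: list of (O_i, E_i) pairs, where O_i in [0, 1] is the normalised
    opinion of user i and E_i weighs his/her qualifications.
    """
    numerator = sum(o * e for o, e in responses)
    denominator = sum(e for _, e in responses)
    return numerator / denominator


def qwcods(responses, total_safeguards):
    """Qualifications Weighed Customer Opinion with Double Safeguards, formula (2).

    responses: list of (O_i, E_i, S_i, P_i) tuples, where S_i is the number of
    safeguards user i answered correctly and P_i is 1 if no error was detected
    while measuring his/her qualifications, 0 otherwise (response rejected).
    total_safeguards: S_T, the total number of safeguards (at least one).
    """
    numerator = sum(o * e * p * (s / total_safeguards) for o, e, s, p in responses)
    denominator = sum(e * p * (s / total_safeguards) for _, e, s, p in responses)
    return numerator / denominator


# Example: three users with different qualifications and safeguard scores;
# the third user's responses are rejected because an error was detected
# while measuring his/her qualifications (P_i = 0).
print(qwco([(0.8, 3), (0.6, 1), (0.4, 2)]))
print(qwcods([(0.8, 3, 4, 1), (0.6, 1, 2, 1), (0.4, 2, 3, 0)], total_safeguards=4))
```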
These methods were applied to various software products, measuring the opinion of users whose levels of experience differed. The results of the measurements and the analysis method are presented in the following section.

2.2 Analysing measurements over time

In order to measure users' opinion of a software product effectively, surveys must be conducted at fixed time intervals. Although such a practice cannot be applied in a professional setting due to its high cost, for the requirements of this research monthly surveys were conducted using the same sample of users for the same software products. For the analysis of the measurements, the users were divided into two main categories, the experienced and the inexperienced users. Figure 1 shows the results derived from these surveys. In this figure the boundaries within which the users' opinion varies over time are illustrated. The horizontal axis represents time in monthly intervals and the vertical axis represents the users' opinion, measured using the formulas mentioned above. The users' opinion in each survey takes values from 0 to 1. The line AvOp represents the average users' opinion of the quality of the software product, formed after the final opinion of the users has been measured. The opinion of experienced users over time varies between the curves e1 and e2, whereas the opinion of inexperienced users over time varies between the curves ne1 and ne2.

Figure 1: Boundaries of users' opinion

Experienced users, in contrast to inexperienced ones, form an opinion of the quality of the product from the early stages of its release which is very close to their final opinion. Inexperienced users, on the contrary, form an opinion close to their final one only after using the software product for a long period of time. The length of this period depends on the complexity of the product, the number and the variety of the functions it supports, the amount of usage and the conditions under which usage occurs, as well as on the usage of similar software products. This period usually varies from six to twelve months, by the end of which the user has become experienced in the use of this specific product. After a period of time, the line AvOp usually starts to decline, as user requirements usually increase over time. This phenomenon depends on factors such as similar software products that may be released and advances in hardware.
It was also observed that when an experienced user initially gives the software a higher or lower score than his/her final one, this score does not fluctuate but slowly closes the gap towards the final score. Amongst inexperienced users, however, such predictable variability was not observed; their opinion fluctuated widely between ne1 and ne2. Over time, the degree of fluctuation receded and the opinion converged towards the users' final opinion of the product quality. For example, the changes in inexperienced users' opinion over time are illustrated by the diagram of figure 2, where UO represents an example of the changes in a user's opinion. This fluctuation results from the inexperienced user either finding a new feature of the product that had remained undiscovered, or uncovering some aspect of the product that he/she had sought and not found until then, and, as a result, rating the product highly. Similarly, if the user uncovers a flaw in the product (whether real or perceived), he/she will rate it lowly, even if the flaw could not have been avoided at the production stage.

Figure 2: Fluctuation of inexperienced users' opinion

Software quality factors are not clearly perceived by inexperienced users. If they discover a characteristic indicating that the product fails in one particular factor, then they consider that the product fails in all the other areas as well. On the contrary, experienced users do clearly perceive the independent nature of these factors. After a justifiable time period, inexperienced users become accustomed to the new features or flaws they discover in the product and, as a result, their opinion begins to lean towards the final opinion, as is the case with experienced users.

2.3 Using findings to improve how surveys are conducted

From the measurements of the surveys, it is obvious that over time: a) the experienced users' opinion of the quality of the software product approaches their final opinion and b) the deviation of the inexperienced users' opinion from their final opinion declines continuously. Thus, the more a customer uses a product, the more weight must be given to his/her opinion. In other words, the time factor must also be taken into account for effective measurements of software quality. The analysis of section 2.2 revealed that, after a long period of time, inexperienced users will form an opinion that is close enough to their final opinion of the quality of the product. The length of this period may surpass six months. As a result, in order to assess the quality of a software product sufficiently in the early stages of its release, the sample of users being asked must be restricted to experienced users. Additionally, in the early stages, the opinion of inexperienced users fluctuates greatly; it can be considered only if the sample of users is large enough to be considered representative, thus ensuring sound results. Moreover, the opinion of experienced users should be given greater weight than that of the inexperienced users, regardless of there being fewer of the former. Furthermore, from the findings for the individual user groups participating in the surveys, it was also observed that the larger the degree of fluctuation in their opinion, the more difficult it was for them to learn the features that are most relevant to their specific context.
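One simple way to act on the weighting guideline above is sketched below: the qualification weight E_i of formula (1) is scaled by the length of time each user has been using the product. The linear scaling and the twelve-month cap are assumptions made purely for illustration; the paper does not prescribe a particular time-weighting scheme.

```python
# Illustrative only: scale each user's qualification weight E_i by how long
# he/she has used the product, so that longer usage contributes more to the
# weighted opinion. The linear scaling and the 12-month cap are assumptions.

def experience_weight(qualification, months_of_use, cap=12):
    """Qualification weight scaled by (capped) months of product usage."""
    return qualification * (min(months_of_use, cap) / cap)

# Two users with equal qualifications; the 12-month user dominates the average.
responses = [(0.8, experience_weight(3.0, 3)), (0.5, experience_weight(3.0, 12))]
weighted_opinion = sum(o * w for o, w in responses) / sum(w for _, w in responses)
print(weighted_opinion)   # closer to 0.5 than to 0.8
```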
3. Early predictions of users' opinion

From the findings it is clear that the weight of the customers' opinion of the quality of a software product increases over time. As a result, every new characteristic of the product detected by the customer, which will undoubtedly lead to a revision of his/her belief about the product, must be considered. Moreover, with each newly discovered feature, users become more assured of the validity of their opinion. However, because of the high cost of surveys, a simpler, faster and more automated way must be found to estimate the revision of the opinion of user groups without needing to conduct a new survey each time.

3.1 Using Systems of Spheres

Rules from Belief Revision theory, namely Grove's Systems of Spheres, have been adapted to meet the needs of this approach (Gardenfors, 1988), (Peppas, 1996). In this system, any belief set K can be represented by the subset [K] of M consisting of all the possible worlds in which all the sentences of K hold, where M is the set of all possible worlds that can be described in a propositional language L. Thus, a system of spheres centered on [K] is a collection S of nested subsets of M, represented figuratively in figure 3. In this system, the closer a sphere lies to the center [K], the more plausible the worlds it contains are considered to be. When a new sentence A appears to be true, with A ∉ K, and A is accepted as reliable, the belief state must be revised in order to encompass A. So the smallest sphere S_A intersecting [A] (that is, with S_A ∩ [A] ≠ ∅) must be taken, in order to make minimal changes to our initial belief state. Our new set of possible worlds is now C(A) = [K*A] = S_A ∩ [A].

Figure 3: Grove's System of Spheres
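The revision operation described above can be sketched in a few lines of Python on a toy model. The two propositions and the sphere ordering below are purely illustrative assumptions and not part of the method itself.

```python
# Minimal sketch of revision with a system of spheres (Grove, 1988).
# Worlds are frozensets of the propositions true in them; a system of spheres
# is a list of nested sets of worlds, from the innermost sphere [K] outwards.
# The two propositions and the plausibility ordering are toy assumptions.

def revise(spheres, satisfies_A):
    """Return C(A) = S_A intersected with [A], where S_A is the smallest
    sphere containing at least one world in which sentence A holds."""
    for sphere in spheres:                     # spheres ordered innermost first
        hit = {w for w in sphere if satisfies_A(w)}
        if hit:                                # first sphere that meets [A]
            return hit
    return set()                               # A holds in no possible world

# Two events: 'accurate' and 'context_help'.
M = [frozenset(s) for s in ([], ['accurate'], ['context_help'],
                            ['accurate', 'context_help'])]
spheres = [
    {M[3]},                 # [K]: the user believes both events hold
    {M[3], M[1]},           # slightly less plausible worlds
    set(M),                 # outermost sphere: all possible worlds
]

# The user now learns that context-sensitive help is in fact missing.
new_worlds = revise(spheres, lambda w: 'context_help' not in w)
print(new_worlds)   # {frozenset({'accurate'})}
```

In this example, learning that context-sensitive help is missing removes only the incompatible worlds; the belief in the product's accuracy survives, which is the minimal-change behaviour the system of spheres is meant to capture.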
3.2 Designing the model

By adapting the rules of Grove's System of Spheres, an alternative model for representing users' opinion of software quality is presented. The points (i.e. the possible worlds) in this model represent user perceptions of the software product according to the software quality factors. For every belief set K the user holds about the quality of the product, there is a system of spheres S in M centered on [K] that shows where this belief set lies in M. Furthermore, this belief set K must contain the user's beliefs about the product in all the software quality factors.

The first step in designing this model is to find, for each software quality factor, various characteristics and events which indicate the users' opinion of this factor. These characteristics and events are the sentences that determine the possible worlds of the model. The second step is to represent all these possible worlds in the model. For every single event E, the model must be separated with a line or curve into two parts: one where the event occurs (the E-part) and one where it does not (the not-E-part). If in a possible world Sx the users are certain that the event E is true, then Sx must be located in the E-part, and vice versa. But if the users are undecided about the event E (i.e. they do not know whether it occurs or not), then possible worlds from both parts must be considered. After separating the model with lines for every single event and knowing the users' opinion of these events, their possible world can be represented in the model.

The third step is to determine how the possible world of a user group changes in the model when the users revise their opinion of an event. The new possible worlds must be delimited by the parts of the model in which the event does or does not occur. Moreover, it must be determined whether and in what way the users' belief in the other events may change. If it does change, this has direct and occasionally radical ramifications for their new possible world. These revisions must involve minimal changes and, after all of them have been carried out, the new possible world will represent the new belief set of the user group about the quality of the software product.
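The three steps can be sketched as follows. The two events are placeholders for characteristics of quality factors, and the revision policy shown (only the revised event's boundary is crossed) corresponds to the minimal-change behaviour of experienced users discussed in the next section; both are illustrative assumptions rather than choices fixed by the model.

```python
# Minimal sketch of the three design steps on two illustrative events.
# Step 1: the events below stand in for characteristics of quality factors.
# Step 2: every truth assignment of the events is a possible world of the model.
# Step 3: a revision changes only the revised event (minimal change).

from itertools import product

EVENTS = ['invalid_input_detected', 'help_is_context_sensitive']
WORLDS = [dict(zip(EVENTS, values))
          for values in product([True, False], repeat=len(EVENTS))]

def locate(beliefs):
    """Worlds compatible with the group's current opinions; an event the users
    are undecided about (absent from `beliefs`) leaves both its parts open."""
    return [w for w in WORLDS if all(w[e] == v for e, v in beliefs.items())]

def revise_event(beliefs, event, value):
    """Cross only the revised event's boundary, keeping all other opinions."""
    updated = dict(beliefs)
    updated[event] = value
    return updated, locate(updated)

beliefs = {'help_is_context_sensitive': True}   # undecided about the other event
print(locate(beliefs))                          # two candidate worlds remain
print(revise_event(beliefs, 'help_is_context_sensitive', False))
```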
3.3 Modeling user categories

The analysis of section 2.2 revealed that users must be separated into categories according to their experience, since their opinion of a software product changes in different ways. As a result, the proposed model differs for each user category, because the revision of the opinion of one quality criterion produces different results in the belief set of each category. In other words, when inexperienced users discover whether an event occurs or not, their opinion of the quality of the product changes to a higher degree than the opinion of experienced users does. Software quality factors are not clearly perceived by inexperienced users and, when they are either satisfied or dissatisfied with one, the other factors follow suit. As a result, a revision of their opinion of one quality factor leads to a revision of their opinion of others. On the contrary, according to the experienced users' perception of quality, the different factors of quality are not seen as being interdependent. In the experienced users' case, a revision of the opinion of one quality factor will affect the opinion of the other factors only if this revision is of a radical nature.

The different level of interdependence among software quality factors, according to the users' opinion, leads to a model designed differently for each category of users. In the case of experienced users, the boundaries that declare whether an event occurs or not are placed in such a way that a revision of the opinion of one event results in minimal changes to their belief set. In other words, these boundaries are independent and, therefore, no areas dense in event boundaries are observed. Otherwise, if the belief set of a user were represented by a sphere drawn inside such an area, a revision of the opinion of one event would lead to a radical revision, which is not observed in the case of experienced users. The model in this case is illustrated in figure 4.

Figure 4: Model of experienced users' category

On the contrary, in the case of inexperienced users, every newly detected characteristic of the software product changes their opinion of all the software quality factors. As a result, the model in this case is designed in such a way that a revision of the users' opinion of one factor leads to a revision of a radical nature. The possible world of inexperienced users is not as stable as in the experienced users' case. Since inexperienced users usually hold the opinion that the events of the model are interrelated, the boundaries of these events must be in close proximity. The model therefore has areas dense in event boundaries and is illustrated in figure 5.

Figure 5: Model of inexperienced users' category

Figures 4 and 5 also illustrate the difference between experienced and inexperienced users after a belief revision. For example, in the new world C(A), derived from the revision of the event A, inexperienced users form an opinion of all the events completely different from their initial one (their opinion in world [K]). On the contrary, experienced users would revise their opinion of only one additional event.

In conclusion, software production companies using this method must adapt the proposed model to their particular needs. The boundaries of the events must be designed according to the weight given to each software quality factor. Further research is currently planned in order to refine this model for each factor of the Factor-Criteria-Metrics model (or for each quality characteristic of the ISO-9126 standard). In that case, the events that determine the possible worlds of the model will be related to the specific criteria of one single factor. Furthermore, software production companies will be able to predict users' opinion separately for each software quality characteristic.

4. References

Bate Roger, et al: A Systems Engineering Capability Maturity Model, Version 1.1. Software Engineering Institute, CMU/SEI-95-MM-003, November 1995.

Brown M. G.: Baldrige Award Winning Quality: How to Interpret the Malcolm Baldrige Award Criteria. Milwaukee, WI: ASQC Quality Press, 1991.

Curtis Bill, et al: People Capability Maturity Model. Software Engineering Institute, CMU/SEI-95-MM-02, September 1995.

Gardenfors Peter: Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT Press, Cambridge, MA, ISBN 0-262-07109-6, 1988.

Grove Adam: Two modelings for theory change. Journal of Philosophical Logic, 17, pp. 157-170, 1988.

IEEE: Standard for a Software Quality Metrics Methodology. P-1061/D20, IEEE Press, New York, 1989.

ISO9001: Quality Management and Quality Assurance Standards. International Standard, ISO/IEC 9001: 1991.

ISO9126: Software Product Evaluation - Quality Characteristics and Guidelines for their Use. ISO/IEC Standard ISO-9126, 1991.

McCall J. A., Richards P. K. and Walters G. F.: Factors in Software Quality, Vols I, II, III. US Rome Air Development Center Reports NTIS AD/A-049 014, 015, 055, 1977.

Peppas Pavlos: Well Behaved and Multiple Belief Revision. European Conference on Artificial Intelligence, 1996.

Xenos M. and Christodoulakis D.: Software Quality: The User's Point of View. In Software Quality and Productivity, Chapman & Hall, pp. 266-272, ISBN 0-412-62960-7, 1995.

Xenos M., Stavrinoudis D. and Christodoulakis D.: The Correlation Between Developer-oriented and User-oriented Software Quality Measurements (A Case Study). 5th European Conference on Software Quality, EOQ-SC, Dublin, pp. 267-275, 1996.
Xenos M. and Christodoulakis D.: Measuring Perceived Software Quality. Information and Software Technology Journal, Butterworth Publications UK, Vol. 39, pp. 417-424, June 1997.