International Research Journal of Computer Science and Information Systems (IRJCSIS) Vol. 2(5) pp. 73-85, July, 2013 Available online http://www.interesjournals.org/IRJCSIS Copyright©2013 International Research Journals Full Length Research Paper Health Informatics System requirements dependency analysis as audit facilitator Marcelo Antonio de Carvalho Junior*1 Paulo Roberto de Lima Lopes2 1 2 Frederico Molina Cohrs Ivan Torres Pisa 1 Health Informatics Management – Master/Doctor degree program, EPM/UNIFESP 2 Departament of Health Informatics, EPM/UNIFESP *Corresponding Author Email: carvalho.junior@unifesp.br Accepted June 25, 2013 INTRODUCTION / BACKGROUND: Audit requirements provide for software quality assurance and can be performed in different moments of its lifecycle. Functional and Non-Functional requirements relate each other at least once but in many cases at a higher rate, causing dependency and correlation effects that may increase the complexity of audit work. OBJECTIVES: Discuss methods to aid auditors for a better understanding of system requirements dependency and correlation. Provide for an audit test script proposition based on given requirement text set by automatic means. METHOD: Propose and test a semi-automated sequential process for requirements text analysis. Evaluate the requirements connection depth established using nonsemantical frequency-based data-mining approach. CONCLUSIONS: Requirements data-mining can help auditors to foresee cascading effects on audit checking, possibly reducing the need for repeated evidence gathering and the effort while planning and conducting complex assurance and compliance analysis on health informatics systems. Keywords: Ambulatory Care Information Systems (L01.700.508.300.680.030), Computerized Medical Records Systems (E05.318.308.940.968.625), Computer Systems Evaluation (L01.224.230), Systems Analysis (L01.906), System Compliance. INTRODUCTION Software requirements are a set of desired conditions or behaviors that can be used to express system’s specific characteristics or in a wide approach establishing generic standards for sector or industry scope reference. Deviations within systems and their requirements are error prone, possibly causing software to act unexpectedly or unpredictably. Determining the adherence and conformity to applicable requirements is desirable from system’s design to delivery stages and even after its deployment for use. Software quality assurance aims to increase confidence in the quality of system throughout independent audit and review based on a reference requirements set. Test techniques include, for instance, the process of executing a program or application so that system processes and flows are shown. It can be performed either by simply observing system’s responses or by interpreting data-flow using additional tools. The aim is check whether planned and desired system behavior is achieved by checking its performance and characteristics during and after software development. The system development model used affects the interactions among developers and stakeholders as well as the development approach and the tests to be 74 Int. Res. J. Comput. Sci. Inform. Syst. conducted. Although generically described as audit at this paper, depending on the moment performed this verification can be simply called test. System audits can also provide for risk identification. Software development frequently includes scope changing or adaption, integration and compatibility issues. The ability to preview and prevent malfunctions, bugs and errors is important for system developers and buyers. For this purpose, system auditing can provide confidence. Audits that attest one or more requirements set bound to a system are usually related to a specific certification seal, which visually identify the product with that requirements set name or quality program, and bonds trust to its potential users or buyers. As system requirement tends to evolve, granted seals are not static. They need to be renewed frequently and hence repeatedly audited. New or updated requirements means new tests needed during audit. System requirements set however tend to interleave and relate each other in many ways as they focus system’s specific conditions. Meeting requirements expectations can hence demand for tight project control and senior professional team. Depending on industry or system applicability field, the set of requirements involved can grow considerably. Health informatics is now a “hot topic” in terms of requirements and specifications worldwide. International standard development organizations such as International Organization for Standardization (ISO), American Society of Technology and Materials (ASTM) among others has dedicated working groups and committees to address health informatics specifics. Several standards and guidelines are available including ISO 21090:2011 - Health informatics - Harmonized data types for information interchange, ISO 27799:2008 Health informatics - Information security management in health, ASTM E1384 - 07 Standard Practice for Content and Structure of the Electronic Health Record (EHR) or ASTM E1869 - 04(2010) Standard Guide for Confidentiality, Privacy, Access, and Data Security Principles for Health Information Including Electronic Health Records for example. Generally speaking, a Health Informatics System’s (HIS) utility is determined by both its functional and nonfunctional requirements (FR and NRF) profile. Although not limited to, common NFR characteristics usually relates to: security, performance, cultural/political and operational specifications (McGregor et al., 2008). At least one functional to non-functional relationship is seen although many-to-many are possible. Access to Protected Health Information (PHI) along with many other particularly important and regulated systems characteristics turn HIS’s subjected to several FR/NFR requirements. Considering the medical decision support provided by HIS and the risk of audit error associated to the mentioned cascading requirement correlation issue, the understanding of textual bound between them is of paramount importance. This article discusses the impact of those diverse requirements contents relationships at the moment of system audit procedure. These relationships are meant to validate software observed behavior in comparison to one or more requirement document used. Each system requirement evaluation is then placed in a dependency context and the multiple FR and NFR audit approach is discussed by using requirement summarization to construct an optimized script test using a semiautomated framework. The use of data-mining is suggested to analyze textual requirements contents using automated tools and unsupervised techniques so auditors can previously detect existent relationships or dependencies allowing better audit tasks preparation and execution. The use of unsupervised approaches gets around the issue of costly training data and presents an optimal solution for auditors with equivalent accuracy (Nomoto and Matsumot, 2001). More specifically, the use of correlation tools is useful for auditing tasks, as the human perception capacity for those requirements attributes relationships analysis is considerably limited. As the system requirements tend to be complex and considering that the auditor may reference more than one requirement standard or guideline set at the time, the use of mathematical calculations from automated tools are more than welcome to ease this process. Furthermore, visual behavior is the most important cognitive mode of human beings (Wen-Jie and Yan, 2010). Therefore, despite the fact that system audit demands for fine and granular review of reference requirements from auditor, the opportunity to visualize requirements correlations can consistently reduce time consuming audit checks due repetition of requirements tests using traditional audit script test construction. The following sections of this paper describe auditor main activities as a background for readers, study objective and suggested method phases description and ends by discussing and commenting the method in a use-case scenario for a particular chosen requirement set. Audit attributions The ability to understand software requirements implications in terms of system/software expected behavior is perhaps the auditor´s most important desired skill. The audit approval for a given system under evaluation is subjected to demonstrated evidence of sufficient adherence to the reference requirement used for analysis. Carvalho et al. 75 This is done by manual and automated approaches and they are highly dependent upon auditor´s capabilities to visualize system’s characteristics and the environment that it’s deployed. Audit´s trust and recognition is also an auditor dependent variable, meaning that his title is usually bound to a third party issuing certification entity attesting his presumably minimal skill set. Tsung-Hui Lu et al. cite Charles M. Ray and Randy McCoy (Tsung-Hui et al., 2011), stating that the certification will bring benefits such as higher morale and commitment to the task. There are different international auditor titles such as CISA (CISA- http://www.isaca.org/Certification/CISACertified-Information-Systems-Auditor/How-to-BecomeCertified/Pages/default.aspx (Last viewed 6-20-2012), IRCA, IRCA http://www.irca.org/engb/certification/schemes/ (Last viewed 6-20-2012) and others available from information systems associations worldwide that can attest for the necessary body of knowledge for a professional information systems auditor candidate but none of them are HIS specific at this moment. Certified professionals must observe a code of ethics from those associations that must guide all their auditor activities during certification period. There are few initiatives worldwide that seek to bring these auditor skills available in training course format. In Brazil, a pioneer project called proTICS developed by Brazilian Society of Health Informatics (SBIS) (Protics, 2012) is gathering the essentials HIS related knowledge content into a course proposal for technical training and subsequent certification (cpTICS) of professionals who wish to deal with system construction and support. By the same token, COMPTIA has developed a certification program (Health IT Deployment) (Comptia HITCertificationhttp://certification.comptia.org/getCertifie d/ certifications/hittech.aspx (Last viewed 6-20-2012) based on Health Insurance Portability and Accountability Act (HIPAA) that is being internationally offered to technicians. Again, unfortunately none of these programs are audit focused. The auditor’s knowhow and profile are hence not trivial. Not only because they are required to technically recognize system’s parts, but integrations and expected inputs and outputs from its various components, auditor’s relational skills are important as they need to navigate within corporate hierarchical structure. This is particular useful as the audit is performed including interviewing auditee’s system developers, key-users and even steering committees and stakeholders as part of the necessary previous system usage understanding stage as also for the main validations and checking steps. Although ethics and highly specialized profile are prone electing a person as qualified auditor for a specific scope, also the capability to selectively gather meaningful information during audit preparation, execution and report phases is certainly hard to find and necessary to the several audit attributions. The audit tasks performed include in depth study and understanding of in focus evaluation requirements in order to conduct the audit properly (see Figure 1 for simplified audit plan representation). This is a particularly extensive and time-consuming task as the auditor depends on the requirement text for an appropriate test scenario or script needed to be planned and developed. According to ISACA´s (ISACA holder´s mandatory process) audits guideline (ISACA, 2010), the plan stage main tasks are listed below. The last two are directly related to the requirement textual study proposed on this paper: • The IS auditor should plan the information systems audit coverage to address the audit objectives and comply with applicable laws and professional auditing standards. • The IS auditor should develop and document a riskbased audit approach. • The IS auditor should develop and document an audit plan that lists the audit detailing the nature and objectives, timing and extent, objectives and resources required. • The IS auditor should develop an audit program and/or plan and detailing the nature, timing and extent of the audit procedures required to complete the audit. Study objective The present study aims to apply text-mining techniques to summarize system requirements and correlate them using clustering. The correlation finding might aid system auditors to better understand link-patterns within different requirements and hence allowing dependency-aware test script construction for system evaluation purpose. METHODS In opposition to single/individual requirement test traditionally used by system auditors, we propose a method for better foreseeing possible dependencies. This is done using data-mining techniques and auditor expertise to build a more efficient test script that potentially covers more than one requirement and, more importantly, identifies correlations among them. Auditor is keen on understanding this relationship as it affects his audit plan and execution. The process is hence divided in two main parts, being the second highly dependent from auditor´s perspective and experience. 76 Int. Res. J. Comput. Sci. Inform. Syst. Figure 1. Audit plan phase example We use text analysis to indicate terms that are likely to be considered by auditor due its overall relevance while promoting a script test proposition for a specific function or behavior evaluation (see Figure 2 proposed vs. traditional audit method representation) during audit. To do so, we weight words frequency in the relevant requirement documents set, reducing scope. The rationale is to identity the relevant terms collection conserving the main requirements meaning. Document summarization by frequency analysis is a well-known and proven approach method to process these documents (Reeve et al., 2007). There are a variety of scripts and automated tools available for this purpose. We choose a freely available option that is later cited while study case and method application is described. We then group these output found using clustering. This allows separate views of the potential terms to be used for auditor´s manual selection and script construction in a second stage. A higher dependency degree (shorter angle between terms) indicates a more consistent requirement terms similarity. Different terms similarity between terms A and B can be calculated, and the measure ranges from 0 to 1 (90º to 0º cosine) can describe less to more similar respectively. That association results in a list of terms that can lead to better requirements correlation understanding by the auditor. Thus, data reduction increases scale by allowing auditors to find relevant “audit-important” terms in fulltext system requirements from several sources more quickly through automated means. Also, it allows assimilating only essential information from many texts with reduced effort (Bing et al., 2005). The suggested approach though does not only focus in the timesaving strategy of testing more requirements in a single system behavior assessment. It meant to prevent that different requirements be inappropriately approved whilst independently examined, provided that this dependencies are found. We suggest a generic Requirements Dependency Discovery Method (RD2M) for that purpose using the mentioned tools as a system audit facilitator. This structured method includes five different phases partially automated: Phase 1- Document selection and import. At this phase, the auditor defines the requirement document to be used during audit and then import them into the analysis tool. There is no difference here from traditional approach for the selection process portion as in both cases the initial audit plan includes gathering the references to be used. This auditor’s task is usually dictated by industry applicable regulations in use or by system’s builder discretion. After document selection, all the necessary requirements to be used are imported into the tool database or reference repository. Relational databases are likely the choice for this process but depending on the data-mining tool used, simply placing the text documents in a directory can suffice to allow reading. In our method, we use only the contents of title and body for text analysis. Phase 2- Document pre-processing. This phase main objective is to filter unnecessary content from the imported documents and perform term indexing. The requirements are here tokenized by the tool, meaning the words are split and are placed at internal database for subsequent processes. By choosing web-site pages as requirement sources for previous step, an additional Carvalho et al. 77 Figure 2. Audit plan using proposed method vs. traditional approach module (process) needs to be called now so all the html markers and codes are striped off. Terms (tokens) are then parsed to low case and stopwords are removed using applicable word-lists. Phase 3- Frequency analysis. In this phase, summarization techniques are applied. The requirements terms are selected by computing its term frequency (tf) considering their appearance within the document texts under analysis and term frequency–inverse document frequency (tf.idf) (Teng et al., 2008; Salton and Buckley, 1988) for Vector Space Model (VSM) construction indicating a certain dependency (similarity) degree among them. This statistical model is a classic approach for textual semantics acquisition method. The model represents the document set as a vector of terms and its coefficients. We then assign the importance to the terms extracted from requirement set body and title. Thus, the importance of terms is calculated by combination of the two methods (Ko et al., 2002). To finish this phase, we apply the resulted data-set in a k-means clustering algorithm (cosine similarity) for grouping. The idea is allow for visual interpretation of vectors that are pointing roughly in the same direction and hence can be grouped. Phase 4- Semantics and system environment analysis. The audit plan analysis now considers an additional manual stage here, where the dependency found by similarity will also be reviewed from semantics stand point. Requirements words alone can refer very distinct system desired behavior. This phase requires auditors to check analysis results in order to select terms that can benefits audit script test construction and also assesses system environment to decide test applicability. Phase 5- Audit test script. At this final phase, the auditor manually selects the terms that are related to the system requirements under evaluation and proposes a scenario assessment or script test to be executed. The test scripts resulted from this phase can describe actions that once performed will cover all or part of termscollections found. Another approach for test script construction can be, for example, identifying the presence of selected terms within System’s function/module under analysis. A well designed script can aid auditor to cover not only feature directly related requirements but its correlated ones. Considering the traditional approach, the proposed method directly affects audit plan in terms of information available to auditor. He can now perform similar audit preparation tasks but considerably better assisted. A conceptual process to illustrate the automated and manual RD2M phases mentioned can be seen at Figure 3 below: 78 Int. Res. J. Comput. Sci. Inform. Syst. Figure 3. Requirements dependency finding via partially automated process RESULTS: CASE STUDY Requirements analysis process discussion CCHIT certification process is an independently developed method that includes a rigorous inspection of an electronic health record (ehr) integrated functionality, interoperability and security in systems. It was developed through a voluntary contribution and consensus-based process engaging diverse industry/society stakeholders and has a very similar certification process to SBIS/CFM (SBIS Certification – www.sbis.org.br/site/site.dll/view?pagina=25 (Last viewed 6-20-2012 – Portuguese version only) used in Brazil. The certification is part of a u.s. government project but is used as reference in different countries either directly or as a source to build nationwide requirements standards. Recently, Eun Young Heo et al. (2012) discussed the CCHIT applicability in Korea considering existent local EHRs characteristics in terms of functions, processes and government policies and then mapping the differences for analysis. The CCHIT’s requirements set was chosen for case study due similarities to the Brazilian process and its worldwide use. For sake of text mining illustration, only ambulatory requirements were considered. This particular requirement set and all others can be found at http://www.cchit.org/. Although described as a suggested conceptual process, we have implemented the generic automated phases using RapidMiner tool from http://rapid-i.com using the above described requirements for demonstration and discussion. We discuss a few aspects of the proposed method that can affect its proposed objective. The comments hereby described cover whole method analysis and phases in particular when applicable. We focus on the observations made whilst applying it into the case study scenario and comments that were found useful to guide its implementation. Although the theories, algorithms and techniques here described are well known and supported by numbers of other papers, we stress the method assembly at this text portion. The following section also presents findings using the methods to leverage discussion. For the sake of discussion, we choose the “access-control” term to promote direct comments on processes. The potential reduced efficiency or limitation of RD2M applicability may vary from different aspects. For example, the use of traditional method based on keyword importance only cannot inform the semantics of requirements itself, even after being clustered. The semantic association is a very subjective concept as it cannot be extracted from text itself without considering the domain knowledge. The lack of the domain knowledge leads to incoherence of textual semantics and the bad understanding of text. This burdens auditor with semantics filtering and association responsibilities. Instead of the statistical method used at RD2M’s phase 3 (VSM), other possibilities for requirement semantics retrieval could be used for the same objective: a) Probability topic models, such as Author Topic Mode Carvalho et al. 79 (ATM), Author-recipient-topic model ARn (McCallum et al., 2004) or Correlated Topic Models (CTM) (Blei and Lafrerty, 2006); b) Ontology based models, such as ontology inference layer (OIL) (Fikes and McGuinness, 2001); c) Cognition based model such as Element Fuzzy Cognitive Map (EFCM) (Luo and Yao, 2006) or concept algebra based model and Associated linked Network model (ALN) (Luo et al., 2011). Due the fact that the starting point phase demands requirements reading by text-analysis tool used, we found important to programmatically automate this process. The majority of requirement sets available for system requirements can be obtained in three different formats: Spreadsheets, Text documents and PDF’s images. As for the latest no easy retrieval method was found, we then defined the other two as input formats. Thus, the last option demands manual transformation in one of the two accepted formats. An import script routine using Active Server Pages (ASP) was created to ease this process and populate a MySQL (www.mysql.com) database containing author, requirement set name, year, version, requirement title, requirement code or number and requirement text body. After selected from standard or guidelines to be used by auditor following the RD2M process, a total of 283 text requirements entries were loaded into the automated tool for the particular case study scenario. It included requirements title and body which, if considered separately, would double this figure. We found the title inclusion convenient, as in most cases it allows semantic distortion reduction as the requirement scope tends to be more meaningfully described (Ko et al., 2002). At the term indexing process, we initially counted 558 distinct terms from a total of 1,439,888. We then continue pre-processing this data by filtered stop-words, identifying 467 different terms for subsequent analysis. An important influence to be considered at this task is the stop-word dictionary used for pre-processing in order to increase dependency search relevance. For instance, the term “and”, which is not focus of summarization process, appeared 229,500 times in 173 different requirements. As most of the standards and guidelines available and used in Brazil for HIS audit are written in Brazilian Portuguese and English, we choose to add Snowball’s dictionaries for both languages from http://snowball.tartarus.org/. For this particularly study case demonstration, the embedded English dictionary was used. Words were grouped using one and two-gram analysis. After n-gram processing we have extracted the same 467 terms by selecting 1-gram, and 942 when we used 2gram with a text analysis tool. We found that the use of phrase (2-gram) dependency instead of single word eases this process while keeping the summarization focus as well. For instance, considering the use case scenario, the term “access” alone appeared 27 times at TF analysis, while the “access control” 2-grams phrase resulted only 4 occurrences. For text association phase though, when the auditor wanted to see requirement dependencies, the phrase result made greater sense and binds test scripts tighter. The previously mentioned single and two-gram similarity distortion can be seen here (Figure 4) as part of auditor’s first look for patterns using tf.idf. At the representation (Figure 4), x-axis shows CCHIT requirements reference number where the example terms “access” and “access control” (first and second image respectively) while the y-axis represents correlation term coefficient. We could see about 20,1% of requirements items somehow correlated to the example terms and approximately 16.9% (3.1% reduction) while using 2-grams representation only. We handled the resulting 942 terms from earlier phases with cluster processing. The word/phrase vector was then created using different weighting attributes for words/phrases located at requirement title and body (arbitrarily set as 1.0 and 0.5 respectively). Central word/phrase requirement meaning was then selected by clustering requirement texts and by choosing the centroid element found (Sathya et al., 2011). This rationale was used for segmentation decision on case study scenario. Several other approaches can take place here depending on auditor’s desired test construction perspective. Some requirements sets include even inner segmentation by categories or domain division that can lead the auditor to this decision. As one of the method’s objective is to look for correlations outside the requirement itself, the number of requirements is not a suitable segmentation value. This number is auditor-dependent as the K definition is a prior decision for clustering analysis considering method´s algorithm and is a required setting for the used tool. Several methods for defining optimal K values can be used, as cited by Keke Chen and Liu (2009) either statistical or visual-based (distance functions and density concepts), including Best-K Plot (BKPlot) method proposition. Suitable values should balance aggregation and meaningfulness objectives. The clustering result for CCHIT’s dataset was grouped into 20(see Table 1) divisions by used text-analysis tool considering this balance by results observation only. As the clustering process is relatively fast, the auditor can perform various attempts to find a suitable K value that denotes centroids meanings accordingly. Although RD2M proposes unsupervised approach every time possible (partially automated phases 1, 2, and 3), this is a step where we hardly see a fixed number indefinitely. 80 Int. Res. J. Comput. Sci. Inform. Syst. Figure 4. CCHIT requirements TF-IDF weighting representation for “access” and “access-control” terms Table 1. CCHIT requirements grouping after data-set clustering formation Carvalho et al. 81 This is perhaps the most valuable visual information so far as the auditor can start constructing further manual analysis from the associated items checking its semantics relations (choosing the 10 most relevant requirements relationships for instance) in order to try to construct the script test that consistently validates adherence. Comparing different standards in different languages can have huge influence on the auditor´s initial plan material. Depending on the language used, a specific term can have multiple meanings. Also considering the previous example terms for sake of this requirements semantics association analysis using clustering grouping, the terms and phrases (2-grams) that could be used for initial script test could include Cluster 1,3 and 18 based on relevance (see Figure 5). The centroid terms for those groups found were “users”, “authorized” and “privileges”. Cluster 1 content example is shown below. A possible manual filtering resulting from auditor review for this matter is represented in bold. The next expected associated task hereafter would be another manual task resulting on the proposed script construction. This stage is highly auditor dependent as the method semantics limitation must be considered. Thus, the results should aid and guide the audit preparation tasks, but the auditor’s expertise and requirements understanding are crucial. For example, by choosing the access and access control terms for test script formulation, the auditor needs to be aware that other terms not resulted from RD2M may need to be considered. The term authentication for instance has highly semantic relationship degree and yet may not be listed within cluster, simply because the lookup terms were not part of requirement description (body) or title. It appears 12 times at title portion of study case requirement set but 3 only containing the lookup terms at body portion. Extensive test scripts connecting all related requirements terms are not feasible or desirable. Instead, auditor should focus on script proposition that takes account of those relations for better testing the system. Audit evidence gathering impact Audit evidence may vary from print-screens, video record, document copies, digital documents among others. Auditors must collect evidences as proof they checked and successfully or not approve the evaluated systems against one of more requirement. By designing a multiuse script test using the discussed method, auditors may assess the audit evidence gathered to make sure they represent dependency requirements sufficiently. As the evidence must demonstrate system’s characteristics presence and may be used for report purposes at final audit stages a special care must be taken for this task. Disperse requirements considerations and analysis efficiency Although we are processing single requirement set from data-base, no representative impact was seen while using the automated tools to assess different sources. For the sake of terms similarity studies though, different requirement sources may contain different vocabulary even while describing same topic. The correlation purposes hence being affected. We consider also that in some cases TF count can be significantly reduced if the documents under analysis are referencing each other or external documents. In these cases the text is implicit within references and, hence, will not be considered as part of analysis. The auditor in this case can adapt discussed process by importing external referenced texts topic/session into the database. Time reducing and potential ROI Although not intended to provide for a concrete fixed number as a benefit of RD2M method use on HIS audits, we did perform three comparison tests to position reader with an possible ROI and process improvement idea. The main rationale to justify suggested approach lays on the efficiency, but also reducing audit hours and hence related project costs. The tests were performed by two different auditors, not familiar to the proposed method and only required to execute the optimized script generated. Thus, the differences that could emerge from individual interpretation and script construction while applying suggested method were minimized. The particular traditional scenario compared already has a test-script proposed by CCHIT. As result, the time-frame comparison initially points a disadvantage of using RD2M since the traditional script is ready-to-use by default. Not all the system requirements comes with a built-in testscript though. In fact this is quite rare among existent HIS’ requirements sets. Even though, the number of testing procedures and evidence gathering was significantly reduced compensating the initial disadvantage found in the comparison. Also, the overall timeframe of audit execution was diminished in roughly 11% as seen in table 3 below and later at Figure 6. Considering the most likely scenario where the traditional audit would also require test script production and hence related time investment on that phase, the positive results of RD2M usage is presumably more aggressive. 82 Int. Res. J. Comput. Sci. Inform. Syst. 18 3 1 Figure 5. Access-control clustering distribution found system system_enforce enforce enforce_restrictive restrictive restrictive_set set set_rights rights rights_privileges privileges privileges_accesses accesses accesses_users users users_groups groups groups_system system system_administration administration administration_clerical clerical clerical_nurse nurse nurse_doctor doctor doctor_processes processes processes_acting acting acting_behalf behalf behalf_users users users_performance performance performance_specified specified specified_tasks tasks tasks_access access access_control control system system_able able able_associate associate associate_permissions permissions permissions_user user user_using using using_access access access_controls controls controls_user user user_based based based_access access access_rights rights rights_assigned assigned assigned_user user user_role role role_based based based_users users users_grouped grouped grouped_access access access_rights rights rights_assigned assigned assigned_groups groups groups_context context context_based based based_role role role_based based based_additional additional additional_access access access_rights rights rights_assigned assigned assigned_restricted restricted restricted_based based based_context context context_transaction transaction transaction_time time time_day day day_workstation workstation workstation_location location Carvalho et al. 83 Table 3. RD2M (R) vs traditional (T) audit execution comparison Figure 6 Overall timeframe comparison represantation (traditional vs RD2M) comparison represantation (traditional vs RD2M) The feedbacks provided by auditors after conducting the proposed test was that for some scripts, only one evidence gathering wouldn´t suffice script FR/NFR validation. The probable explanation for it is that the auditors would select the terms for script construction in a different way we did. The computational consumption is obviously affected by the amount of requirements processed. Even though, the manual activities burden to auditor represents the majority of time expended. Related research Resource usage The process described does not demand special computer hardware to be performed as is not heavily resource consuming. For this project, a common server hardware setup including Intel Xenon 1.60GHz processor, 6GB RAM on Windows Server 64 bits platform was used. The main processes time involved while performing the tasks described are listed at the table 4 below. The summarization, meaning reduction and text mining on health informatics standards and guidelines requirements approach discussed here are to be used as intermediate steps in the project “Translational Security, Operational and Functional HIS evaluation using a Brazilian comparative evolutional reference model” researched by the author at Sao Paulo’s Federal University (UNIFESP). The project aims building a stage divided evaluation scale model to be used at HIS maturity evaluation. 84 Int. Res. J. Comput. Sci. Inform. Syst. Table 4 Computer processing consumption for main used processes It uses several requirements documents sources applicable as reference to build or improve systems. They are positioned progressively into the model metrics initially fragmenting these sources by using requirement text-mining, TF and summarization techniques and then expert team applying classification using weight punctuation to the found key-requirement words (so called attractors). The last stage performed using Delphi method to survey IT, Administrators and Physicians to collect their opinions (the three most likely health informatics systems responsible in Brazil) as importance weighting input. Not only responding if a given system can suffice minimal characteristics to be used as HIS, the model means to position it into the maturity scale ruler created. CONCLUSION The HIS audit is not a trivial task. Especially considering the auditor profile and his skill set needed to allow successful analysis of the diverse and complex reference requirements available, it’s important to try to reduce the time consuming text review phase burden. As requirements dependency directly affects auditor’s ability to attest for overall requirements adherence, he should assess possible conflicts or correlations for each and every specification occurrence. Current text-mining techniques can considerably reduce auditor’s work related to requirements dependency understanding, allowing a concise and efficient audit script test construction/adaptation. Although limited by semantic relation not properly covered by the studied method in this paper, the text correlation can aid auditor’s work both during requirement understanding and system testing. The use of test scripts considering requirements correlations helps the auditor with a broader perception of system behavior. A single test script considering multiple requirement correlation can hence provide for evaluation of diverse aspects from related requirements on an integrated approach. ACKNOWLEDGMENT This research received no specific grant from any funding agency in the public, commercial, or not-forprofit sectors. REFERENCES Bing Q, Ting L, Sheng L (2005). MDS based on sub-topic NCIRCS-2005, Beijing, 2005. Blei D, Lafrerty D (2006). Correlated topic models. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, 2006. Chen K, Liu L (2009).“Best K”: critical clustering structures in categorical datasets, 2009. CISA-http://www.isaca.org/Certification/CISA-Certified-InformationSystems-Auditor/How-to-Become-Certified/Pages/default.aspx (Last viewed 6-20-2012). Comptia HIT Certification -http://certification.comptia.org/getCertified/ certifications/hittech.aspx (Last viewed 6-20-2012). Fikes R, McGuinness D (2001). An Axiomatic Semantic Semantics for RDF, RDF Schema and DAML+OIL, 2001. Heo EY, Hwang H, Kim EH, Cho EY, Lee KH, Kim TH, Kim KD, Baek RM, Yoo S, et al, (2012). Comparing the Certification Criteria for CCHITCertified Ambulatory EHR with the SNUBH's EHR Functionalities, 2012. IRCA - http://www.irca.org/en-gb/certification/schemes/ (Last viewed 6-202012). ISACA (2010). IT Standards, Guidelines, and Tools and Techniques for Audit and Assurance and Control Professionals, 2010. Ko Y, Park J, Seo J (2002). Improving text categorization using the importance of sentences, 2002. Luo X, Xu Z, Yu J, Chen X (2011). Building Association Link Network for Semantic Link on Web Resources. IEEE Robotics and Automation Society, 2011. Luo X, Yao E (2006). The Reasoning Mechanism of Fuzzy Cognitive Maps. Proceeding of the First International Conference on Semantics, Knowledge, and Grid, 2006. McCallum A, Corrada-Ernmanuel A, Wang X (2004). The author-recipienttopic model for topic and role discovery in social networks. IDIAP, 2004. Carvalho et al. 85 McGregor C, Percival J, Curry J, Foster D, Anstey E, Churchill D (2008). A Structured Approach to Requirements Gathering Creation Using PaJMa Models, 2008. Nomoto T, Matsumot Y (2001). An Experimental Comparison of Supervised and Unsupervised Approaches to Text Summarization, 2001. Protics (2012). Competências Essenciais do Profissional de Informática em Saúdehttp://www.sbis.org.br/protics/Competencias_Informatica_Saude _SBIS_proTICS_v_1_0_consulta_publica_2012.pdf (Last viewed 620-2012 – Portuguese version only). Reeve LH, Han H, Ari D Brooks (2007). The use of domain-specific concepts in biomedical text summarization. Information Processing and Management, 2007. Salton G, Buckley C (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 1988. Sathya M, Jayanthi J, Basker N (2011). Link Based K-Means Clustering Algorithm for Information Retrieval, 2011. SBIS Certification - www.sbis.org.br/site/site.dll/view?pagina=25 (Last viewed 6-20-2012 – Portuguese version only). Teng Z, Liu Y, Ren F, Tsuchiya S, Ren F (2008). Single Document Summarization Based on Local Topic Identification andWord Frequency, 2008. Tsung-Hui L, Li-Yun C, Zhe-Jung L (2011). Integrating Security Certification with IT Education, 2011. Wen-Jie W, Yan X (2010). Correlation analysis of visual verbs subcategorization based on pearson’s correlation coefficient, 2010.