International Research Journal of Computer Science and Information Systems (IRJCSIS) Vol. 2(5) pp. 73-85,
July, 2013
Available online http://www.interesjournals.org/IRJCSIS
Copyright©2013 International Research Journals
Full Length Research Paper
Health Informatics System requirements dependency
analysis as audit facilitator
Marcelo Antonio de Carvalho Junior*1, Paulo Roberto de Lima Lopes2, Frederico Molina Cohrs1, Ivan Torres Pisa2
1 Health Informatics Management – Master/Doctor degree program, EPM/UNIFESP
2 Department of Health Informatics, EPM/UNIFESP
*Corresponding Author Email: carvalho.junior@unifesp.br
Accepted June 25, 2013
INTRODUCTION / BACKGROUND: Requirements audits support software quality assurance and
can be performed at different moments of the software lifecycle. Functional and non-functional
requirements relate to each other at least once, and in many cases far more often, causing
dependency and correlation effects that may increase the complexity of audit work. OBJECTIVES:
Discuss methods that help auditors better understand system requirements dependency
and correlation. Propose an audit test script derived by automatic means from a given set of
requirement texts. METHOD: Propose and test a semi-automated sequential process for
requirements text analysis. Evaluate the depth of the connections established among requirements
using a non-semantic, frequency-based data-mining approach. CONCLUSIONS: Requirements
data-mining can help auditors foresee cascading effects on audit checking, possibly reducing the
need for repeated evidence gathering and the effort spent planning and conducting complex
assurance and compliance analysis of health informatics systems.
Keywords: Ambulatory Care Information Systems (L01.700.508.300.680.030), Computerized Medical
Records Systems (E05.318.308.940.968.625), Computer Systems Evaluation (L01.224.230), Systems
Analysis (L01.906), System Compliance.
INTRODUCTION
Software requirements are a set of desired conditions or
behaviors. They can express the specific characteristics of
one system or, more broadly, establish generic reference
standards for a sector or industry.
Deviations between a system and its requirements are a
source of errors and may cause software to act
unexpectedly or unpredictably.
Determining adherence and conformity to
applicable requirements is desirable from a system's
design to its delivery, and even after its deployment
for use. Software quality assurance aims to increase
confidence in the quality of a system through
independent audit and review based on a reference
requirements set. Test techniques include, for instance,
executing a program or application so that
system processes and flows are shown. This can be
done either by simply observing the system's responses
or by interpreting data flow using additional tools. The
aim is to check whether the planned and desired system
behavior is achieved by checking its performance and
characteristics during and after software development.
The system development model used affects the
interactions among developers and stakeholders as
well as the development approach and the tests to be
conducted. Although generically described as an audit in
this paper, this verification may simply be called a test,
depending on when it is performed.
System audits can also support risk identification.
Software development frequently involves scope
changes or adaptation, and integration and compatibility
issues. The ability to foresee and prevent malfunctions,
bugs and errors is important for system developers and
buyers, and system auditing can provide that
confidence. Audits that attest one or more requirements
sets bound to a system are usually tied to a specific
certification seal, which visually identifies the product with
the name of that requirements set or quality program and
builds trust among its potential users or buyers. As system
requirements tend to evolve, granted seals are not static.
They need to be renewed frequently and the system must
hence be audited repeatedly. New or updated requirements
mean new tests during audit.
The requirements in a set, however, tend to interleave
and relate to each other in many ways as they address
specific conditions of the system. Meeting requirements
expectations can hence demand tight project control
and a senior professional team. Depending on the industry or
the field in which the system is applied, the set of
requirements involved can grow considerably.
Health informatics is now a “hot topic” in terms of
requirements and specifications worldwide. International
standards development organizations such as the
International Organization for Standardization (ISO) and the
American Society for Testing and Materials (ASTM),
among others, have dedicated working groups and
committees that address health informatics specifics.
Several standards and guidelines are available, including,
for example, ISO 21090:2011 - Health informatics - Harmonized data
types for information interchange; ISO 27799:2008 - Health
informatics - Information security management in
health; ASTM E1384 - 07 Standard Practice for Content
and Structure of the Electronic Health Record (EHR); and
ASTM E1869 - 04(2010) Standard Guide for
Confidentiality, Privacy, Access, and Data Security
Principles for Health Information Including Electronic
Health Records.
Generally speaking, the utility of a Health Informatics System
(HIS) is determined by its profile of both functional and
non-functional requirements (FR and NFR). Common NFR
characteristics usually relate to, but are not limited to, security,
performance, cultural/political and
operational specifications (McGregor et al., 2008). At
least one functional-to-non-functional relationship is seen,
although many-to-many relationships are possible. Access to
Protected Health Information (PHI), along with many
other particularly important and regulated system
characteristics, subjects an HIS to several FR/NFR
requirements.
Considering the medical decision support provided
by HIS and the risk of audit error associated with the
cascading requirement correlation mentioned above,
understanding the textual bond between requirements is of
paramount importance.
This article discusses the impact of the relationships among
the contents of those diverse requirements at the moment of
the system audit procedure. These relationships are used
when validating observed software behavior against
one or more requirement documents. Each system
requirement evaluation is placed in a dependency
context, and the audit of multiple FR and NFR is
discussed by using requirement summarization to
construct an optimized test script with a semi-automated framework.
Data-mining is suggested to analyze
textual requirements contents using automated tools and
unsupervised techniques, so that auditors can detect
existing relationships or dependencies beforehand, allowing
better preparation and execution of audit tasks.
Unsupervised approaches get around the issue of costly
training data and offer auditors a solution
with equivalent accuracy (Nomoto and
Matsumoto, 2001). More specifically,
correlation tools are useful for auditing tasks because
human capacity to perceive relationships among those
requirements attributes is considerably limited.
As system requirements tend to be complex, and
considering that the auditor may reference more than
one requirement standard or guideline set at a time,
mathematical calculations from automated
tools are more than welcome to ease this process.
Furthermore, visual behavior is the most important
cognitive mode of human beings (Wen-Jie and Yan,
2010). Therefore, although a system audit
demands a fine and granular review of the reference
requirements by the auditor, the opportunity to visualize
requirements correlations can reduce the time-consuming
audit checks caused by repeated requirement
tests when test scripts are constructed in the traditional way.
The following sections of this paper describe the auditor's
main activities as background for readers, state the study
objective, describe the phases of the suggested method, and
end by discussing the method in a
use-case scenario for a particular chosen requirement
set.
Audit attributions
The ability to understand the implications of software
requirements in terms of expected system/software
behavior is perhaps the auditor's most important
skill.
Audit approval of a system under
evaluation is subject to demonstrated evidence of
sufficient adherence to the reference requirement used
for analysis.
This is done by manual and automated approaches, which
are highly dependent upon the auditor's ability to
visualize the system's characteristics and the environment
in which it is deployed.
Trust in and recognition of an audit also depend on the
auditor, whose title is usually
bound to a third-party certification entity that attests
a presumed minimal skill set.
Tsung-Hui Lu et al. cite Charles M. Ray and Randy
McCoy (Tsung-Hui et al., 2011), stating that
certification brings benefits such as higher morale and
commitment to the task.
There are different international auditor titles, such as
CISA (http://www.isaca.org/Certification/CISACertified-Information-Systems-Auditor/How-to-BecomeCertified/Pages/default.aspx, last viewed 6-20-2012),
IRCA (http://www.irca.org/engb/certification/schemes/, last viewed 6-20-2012) and
others available from information systems associations
worldwide, that can attest the necessary body of
knowledge for a professional information systems auditor
candidate, but none of them is HIS specific at this
moment. Certified professionals must observe the code of
ethics of those associations, which guides all their
auditor activities during the certification period.
Few initiatives worldwide seek to make
these auditor skills available in training course format.
In Brazil, a pioneer project called proTICS, developed by the
Brazilian Society of Health Informatics (SBIS) (Protics,
2012), is gathering the essential HIS-related knowledge
into a course proposal for technical training and
subsequent certification (cpTICS) of professionals who
wish to deal with system construction and support.
By the same token, COMPTIA has developed a
certification program (Health IT Deployment) (Comptia
HIT Certification, http://certification.comptia.org/getCertified/certifications/hittech.aspx,
last viewed 6-20-2012)
based on the Health Insurance Portability and Accountability
Act (HIPAA) that is being offered internationally to
technicians.
Unfortunately, none of these programs is audit
focused.
The auditor's know-how and profile are hence not trivial.
Auditors are required to technically recognize not only the
system's parts but also the integrations and the expected
inputs and outputs of its various components. Their
relational skills are also important, as they need to navigate
the corporate hierarchical structure. This is particularly
useful because the audit includes interviewing the
auditee's system developers, key users and even
steering committees and stakeholders, both for the
necessary prior understanding of system usage
and for the main validation and checking steps.
Although ethics and a highly specialized profile
help qualify a person as an auditor for a specific
scope, the capability to selectively gather
meaningful information during the audit preparation,
execution and report phases is also necessary for the
several audit attributions and is certainly hard to find.
The audit tasks performed include in-depth study and
understanding of the requirements under evaluation in
order to conduct the audit properly (see Figure 1 for a
simplified audit plan representation). This is a particularly
extensive and time-consuming task, as the auditor
depends on the requirement text to plan and develop an
appropriate test scenario or script.
According to ISACA's audit guideline (ISACA, 2010), which is a
mandatory process for ISACA certification holders, the main
tasks of the plan stage are listed below. The last two are directly
related to the requirement textual study proposed in this
paper:
• The IS auditor should plan the information systems
audit coverage to address the audit objectives and
comply with applicable laws and professional auditing
standards.
• The IS auditor should develop and document a risk-based audit approach.
• The IS auditor should develop and document an audit
plan detailing the nature and objectives of the audit,
its timing and extent, and the resources
required.
• The IS auditor should develop an audit program and/or
plan detailing the nature, timing and extent of the
audit procedures required to complete the audit.
Study objective
The present study aims to apply text-mining techniques
to summarize system requirements and correlate them
using clustering. The correlations found might help system
auditors better understand link patterns among different
requirements and hence allow dependency-aware test
script construction for system evaluation purposes.
METHODS
In contrast to the single/individual requirement tests
traditionally used by system auditors, we propose a
method for better foreseeing possible dependencies. It
uses data-mining techniques and auditor
expertise to build a more efficient test script that
potentially covers more than one requirement and, more
importantly, identifies correlations among them. The auditor
is keen on understanding these relationships, as they affect
the audit plan and its execution.
The process is hence divided into two main parts, the
second being highly dependent on the auditor's perspective
and experience.
Figure 1. Audit plan phase example
We use text analysis to indicate terms that the auditor is
likely to consider, due to their overall relevance, when
proposing a test script for the evaluation of a specific function
or behavior during audit (see Figure 2 for a representation of
the proposed vs. traditional audit method).
To do so, we weight word frequency in the relevant
set of requirement documents, reducing scope. The
rationale is to identify the collection of relevant terms while
conserving the main meaning of the requirements. Document
summarization by frequency analysis is a well-known and
proven approach to processing these documents
(Reeve et al., 2007). A variety of scripts and
automated tools are available for this purpose. We chose a
freely available option, which is cited later when the case
study and the method application are described.
We then group the output using clustering.
This allows separate views of the potential terms to be
used for the auditor's manual selection and script
construction in a second stage.
A higher dependency degree (smaller angle between
term vectors) indicates a more consistent similarity between
requirement terms. The similarity between terms A and
B can be calculated as the cosine of the angle between their
vectors; the measure ranges from 0 to 1 (angles from 90º to 0º),
describing less to more similar terms
respectively.
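As an illustration only, the cosine measure described above can be sketched in Python as follows. The function names are hypothetical and the vectors are assumed to hold non-negative term weights; this is not the implementation of the tool used in the study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a vector with no weights is treated as unrelated
    return dot / (norm_a * norm_b)

def angle_degrees(a, b):
    """Angle between the vectors: 90 means unrelated, 0 means same direction."""
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard against rounding
    return math.degrees(math.acos(cos))
```

With non-negative weights the result stays within the 0 to 1 range mentioned in the text, so two terms that never co-occur score 0 and two terms with proportional weights score 1.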
That association results in a list of terms that can lead
the auditor to a better understanding of requirements
correlation. Thus, data reduction increases scale by allowing
auditors to find relevant “audit-important” terms in full-text
system requirements from several sources more
quickly through automated means. It also allows
assimilating only the essential information from many texts
with reduced effort (Bing et al., 2005).
The suggested approach, though, does not focus only
on the timesaving strategy of testing more requirements
in a single system behavior assessment. It is also meant to
prevent different requirements from being inappropriately
approved when examined independently, provided that
these dependencies are found.
We suggest a generic Requirements Dependency
Discovery Method (RD2M) for that purpose, using the
mentioned tools as a system audit facilitator. This
structured method includes five phases, some of them
automated:
Phase 1- Document selection and import. In this
phase, the auditor defines the requirement documents to
be used during the audit and then imports them into the
analysis tool. The selection step does not differ from the
traditional approach, as in both
cases the initial audit plan includes gathering the
references to be used. This auditor task is usually
dictated by the applicable industry regulations in use or is
left to the system builder's discretion.
After document selection, all the necessary requirements
are imported into the tool database or
reference repository. Relational databases are the likely
choice for this process but, depending on the data-mining
tool used, simply placing the text documents in a
directory can suffice to allow reading. In our method, we
use only the contents of title and body for text analysis.
Phase 2- Document pre-processing. The main
objective of this phase is to filter unnecessary content from the
imported documents and to perform term indexing. The
requirements are tokenized by the tool, meaning that the
words are split and placed in an internal database for
subsequent processes. If web-site pages were chosen
as requirement sources in the previous step, an additional
module (process) needs to be called now so that all HTML
markers and codes are stripped off.
Terms (tokens) are then converted to lower case and stop-words are removed using applicable word-lists.
Figure 2. Audit plan using proposed method vs. traditional approach
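A minimal sketch of this pre-processing chain is given below for illustration. It assumes a crude regular-expression tag stripper and a tiny hand-written stop-word list; both are placeholders for the dedicated modules and word-lists a real text-analysis tool would provide, and the function names are hypothetical.

```python
import re
from html import unescape

# Tiny illustrative stop-word list (assumption); a real run would load a full dictionary.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "shall", "be", "for", "in", "is"}

def strip_html(text):
    """Replace tags with spaces and decode entities such as &amp;."""
    return unescape(re.sub(r"<[^>]+>", " ", text))

def preprocess(text, stop_words=STOP_WORDS):
    """Strip markup, lower-case, tokenize on alphanumerics, drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", strip_html(text).lower())
    return [token for token in tokens if token not in stop_words]
```

For example, `preprocess("<p>The system shall provide <b>Access Control</b>.</p>")` yields `["system", "provide", "access", "control"]`, which is the kind of token list the later phases operate on.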
Phase 3- Frequency analysis. In this phase,
summarization techniques are applied. The requirements
terms are selected by computing their term frequency (tf),
considering their appearance within the document texts
under analysis, and their term frequency–inverse document
frequency (tf.idf) (Teng et al., 2008; Salton and Buckley,
1988). These weights are used to construct a Vector Space
Model (VSM), which indicates a certain degree of dependency
(similarity) among the terms. This statistical model is a classic
approach to acquiring textual semantics. The model
represents the document set as a vector of terms and their
coefficients.
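The weighting can be illustrated with the short Python sketch below. It assumes one common tf.idf variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency); the exact formula applied by the tool used in the study is not stated here, so the details are an assumption.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant a term that occurs in every requirement receives weight 0, which is why very common words contribute little to the dependency search even before stop-word filtering.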
We then assign importance to the terms extracted
from the body and title of the requirement set. The
importance of a term is thus calculated by combining the
two methods (Ko et al., 2002).
To finish this phase, we feed the resulting data-set into
a k-means clustering algorithm (cosine similarity) for
grouping. The idea is to allow visual interpretation of
vectors that point roughly in the same direction
and hence can be grouped.
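A simplified sketch of k-means grouping by cosine similarity follows. It is not the algorithm of the tool used in the study: it assumes the first k vectors serve as initial centroids so that the result is deterministic, whereas production tools normally use random restarts. All names are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_kmeans(vectors, k, max_iterations=20):
    """Return a cluster index for each vector, grouping by direction."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]  # assumption: first k as seeds
    labels = None
    for _ in range(max_iterations):
        # Assignment: for unit vectors the largest dot product is the smallest angle
        new_labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update: mean direction of each cluster, re-normalized to unit length
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = [sum(column) / len(members) for column in zip(*members)]
                centroids[c] = normalize(mean)
    return labels
```

Because every vector is normalized first, only direction matters: a long requirement and a short one that use the same terms in the same proportions fall into the same group.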
Phase 4- Semantics and system environment
analysis. The audit plan analysis now includes an
additional manual stage, in which the dependency
found by similarity is also reviewed from a semantics
standpoint. Requirement words alone can refer to very
distinct desired system behaviors. This phase requires
auditors to check the analysis results in order to select terms
that can benefit the construction of the audit test script, and
also to assess the system environment to decide on test
applicability.
Phase 5- Audit test script. In this final phase, the
auditor manually selects the terms that are related to the
system requirements under evaluation and proposes a
scenario assessment or test script to be executed. The
test scripts resulting from this phase can describe actions
that, once performed, will cover all or part of the term
collections found. Another approach to test script
construction can be, for example, identifying the
presence of selected terms within the system
function/module under analysis.
A well-designed script can help the auditor cover not
only the requirements directly related to a feature but also
their correlated ones.
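The second approach, checking for the presence of selected terms within a system function or module, could be sketched as below. The module names and descriptions are hypothetical inputs supplied by the auditor, and plain substring matching is assumed; the semantic review of Phase 4 is still required.

```python
def term_coverage(selected_terms, module_texts):
    """Map each module name to the selected terms found in its description."""
    coverage = {}
    for module, text in module_texts.items():
        lowered = text.lower()
        # Substring match only; it says nothing about meaning or context
        coverage[module] = sorted(
            term for term in selected_terms if term.lower() in lowered
        )
    return coverage
```

A module whose list comes back empty is not necessarily out of scope; it may simply describe the same behavior with a different vocabulary.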
Compared with the traditional approach, the proposed
method directly affects the audit plan in terms of the information
available to the auditor, who can now perform similar audit
preparation tasks but considerably better assisted.
A conceptual process illustrating the automated and
manual RD2M phases can be seen in Figure
3 below:
Figure 3. Requirements dependency finding via partially automated process
RESULTS: CASE STUDY
Requirements analysis process discussion
The CCHIT certification process is an independently
developed method that includes a rigorous inspection of
the integrated functionality, interoperability and security of
electronic health record (EHR) systems. It was developed
through a voluntary, consensus-based
process engaging diverse industry/society stakeholders
and is very similar to the SBIS/CFM certification process
(SBIS Certification – www.sbis.org.br/site/site.dll/view?pagina=25,
last viewed 6-20-2012, Portuguese version only) used in Brazil.
The certification is part of a U.S. government project
but is used as a reference in different countries, either
directly or as a source for building nationwide requirements
standards. Recently, Eun Young Heo et al. (2012)
discussed the applicability of CCHIT in Korea, considering
the characteristics of existing local EHRs in terms of functions,
processes and government policies and then mapping
the differences for analysis.
The CCHIT requirements set was chosen for the case
study due to its similarities to the Brazilian process and its
worldwide use. For the sake of text-mining illustration, only
ambulatory requirements were considered. This
particular requirement set and all others can be found at
http://www.cchit.org/.
Although RD2M is described as a suggested conceptual
process, we have implemented the generic automated
phases using the RapidMiner tool from http://rapid-i.com
with the requirements described above for
demonstration and discussion.
We discuss a few aspects of the proposed method that
can affect its objective. The comments
cover the method as a whole and particular phases
when applicable.
We focus on the observations made while applying it
to the case study scenario and on comments
found useful to guide its implementation. Although the
theories, algorithms and techniques described here are
well known and supported by a number of other papers,
this part of the text stresses the assembly of the method. The
following section also presents findings from using the
method to support the discussion. For the sake of
discussion, we chose the “access-control” term to
promote direct comments on processes.
The efficiency or applicability of RD2M
may be limited by different aspects. For
example, a traditional method based on keyword
importance alone cannot convey the semantics of the
requirements themselves, even after clustering.
Semantic association is a very subjective concept, as it
cannot be extracted from the text itself without considering
domain knowledge. The lack of domain
knowledge leads to incoherent textual semantics and
poor understanding of the text. This burdens the auditor with
semantics filtering and association responsibilities.
Instead of the statistical method used in RD2M's phase
3 (VSM), other possibilities for requirement semantics
retrieval could be used for the same objective:
a) Probability topic models, such as the Author Topic Model
(ATM), the Author-Recipient-Topic model (ART) (McCallum et
al., 2004) or Correlated Topic Models (CTM) (Blei and
Lafferty, 2006);
b) Ontology-based models, such as the ontology inference
layer (OIL) (Fikes and McGuinness, 2001);
c) Cognition-based models, such as the Element Fuzzy
Cognitive Map (EFCM) (Luo and Yao, 2006), or concept
algebra based models and the Associated Linked Network
model (ALN) (Luo et al., 2011).
Because the starting phase requires the text-analysis tool
to read the requirements, we
found it important to automate this
process programmatically. Most requirement sets available for
system requirements can be obtained in three different
formats: spreadsheets, text documents and PDF
images. As no easy retrieval method was found for the last
of these, we defined the other two as input formats.
PDF images thus demand manual transformation into
one of the two accepted formats.
An import script routine using Active Server Pages (ASP)
was created to ease this process and populate a MySQL
(www.mysql.com) database containing author,
requirement set name, year, version, requirement title,
requirement code or number and requirement text body.
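For illustration, a repository with the fields listed above could be sketched as follows. This is not the ASP routine or the MySQL database used in the study: it assumes an in-memory SQLite database as a stand-in, and the table and column names are hypothetical.

```python
import sqlite3

def create_repository():
    """In-memory stand-in for the requirements database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE requirement (
               id INTEGER PRIMARY KEY,
               author TEXT, set_name TEXT, year INTEGER, version TEXT,
               title TEXT, code TEXT, body TEXT
           )"""
    )
    return conn

def import_requirements(conn, rows):
    """rows: iterable of (author, set_name, year, version, title, code, body)."""
    conn.executemany(
        "INSERT INTO requirement (author, set_name, year, version, title, code, body)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()

def load_texts(conn):
    """Return (code, title, body) tuples, the fields used for text analysis."""
    return conn.execute(
        "SELECT code, title, body FROM requirement ORDER BY id"
    ).fetchall()
```

Keeping title and body in separate columns makes it straightforward to weight them differently later in the analysis.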
After selection from the standards or guidelines to be used by
the auditor following the RD2M process, a total of 283 text
requirement entries were loaded into the automated tool
for this particular case study scenario. Each entry included
the requirement title and body which, if considered
separately, would double this figure. We found the
inclusion of the title convenient, as in most cases it reduces
semantic distortion because the requirement scope tends to be
described more meaningfully (Ko et al., 2002).
During term indexing, we initially counted 558
distinct terms from a total of 1,439,888. We then
continued pre-processing this data by filtering stop-words,
leaving 467 different terms for subsequent analysis.
An important influence on this task is the
stop-word dictionary used for pre-processing, which should
increase the relevance of the dependency search. For instance, the
term “and”, which is not a focus of the summarization process,
appeared 229,500 times in 173 different requirements.
As most of the standards and guidelines available and
used in Brazil for HIS audit are written in Brazilian
Portuguese and English, we chose to add Snowball's
dictionaries for both languages from
http://snowball.tartarus.org/. For this particular case
study demonstration, the embedded English dictionary
was used.
Words were grouped using one- and two-gram analysis.
After n-gram processing with a text analysis tool, we extracted the same 467
terms when selecting 1-grams, and 942 terms when using 2-grams.
We found that using phrase (2-gram) dependency
instead of single words eases this process while
keeping the summarization focus as well. For instance,
in the use case scenario, the term “access”
alone appeared 27 times in the TF analysis, while the
2-gram phrase “access control” resulted in only 4
occurrences. In the text association phase, though, when
the auditor wanted to see requirement dependencies, the
phrase result made greater sense and bound test scripts
more tightly.
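The difference between single-word and phrase counting can be illustrated with the Python sketch below. The function names are hypothetical and the input is assumed to be already tokenized and filtered; the counts reported in the text come from the tool used in the study, not from this code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-token phrases, joined with a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_counts(docs, max_n=2):
    """Count all 1-grams up to max_n-grams over a list of token lists."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts
```

A phrase such as "access control" can never be counted more often than either of its words, which is consistent with the lower occurrence figure reported for the 2-gram.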
The single- and two-gram similarity distortion mentioned
above can be seen in Figure 4 as part of the
auditor's first look for patterns using tf.idf. In that
representation, the x-axis shows the CCHIT
requirement reference numbers for the example
terms “access” and “access control” (first and second
image respectively), while the y-axis represents the
term correlation coefficient. About 20.1%
of the requirement items were somehow correlated to the
example terms, and approximately 16.9% (3.1%
reduction) when using the 2-gram representation only.
We submitted the resulting 942 terms from the earlier
phases to cluster processing.
The word/phrase vector was then created using different
weighting attributes for words/phrases located in the
requirement title and body (arbitrarily set to 1.0 and 0.5
respectively).
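A minimal sketch of such a weighted vector is shown below. The 1.0 and 0.5 weights are those quoted above; how the tool combines them with the frequency measures is not detailed in the text, so the simple weighted count used here is an assumption and the names are hypothetical.

```python
from collections import Counter

TITLE_WEIGHT = 1.0  # weights quoted in the text
BODY_WEIGHT = 0.5

def weighted_term_vector(title_tokens, body_tokens):
    """Sum per-term counts, scaling title and body occurrences differently."""
    vector = Counter()
    for term, count in Counter(title_tokens).items():
        vector[term] += TITLE_WEIGHT * count
    for term, count in Counter(body_tokens).items():
        vector[term] += BODY_WEIGHT * count
    return dict(vector)
```

With these weights, a term must appear twice in the body to count as much as a single appearance in the title.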
The central word/phrase meaning of the requirements was then
selected by clustering the requirement texts and choosing
the centroid element found (Sathya et al., 2011). This
rationale was used for the segmentation decision in the case
study scenario. Several other approaches can be used
here depending on the auditor's desired test construction
perspective. Some requirements sets even include an inner
segmentation by categories or domain division that can
lead the auditor to this decision. As one of the method's
objectives is to look for correlations outside the
requirement itself, the number of requirements is not a
suitable segmentation value.
This number is auditor-dependent, as K must be defined
before clustering analysis given the
method's algorithm, and it is a required setting of the tool
used. Several methods for defining optimal K values can
be used, either statistical or visual-based (distance functions
and density concepts), as cited by Keke Chen and Liu (2009),
including the Best-K Plot (BKPlot) method they
propose.
Suitable values should balance aggregation and
meaningfulness. The text-analysis tool grouped
CCHIT's dataset into 20 divisions (see Table 1),
a value chosen to achieve this
balance by observation of results only. As the clustering
process is relatively fast, the auditor can make various
attempts to find a suitable K value whose centroids
carry appropriate meaning.
Although RD2M proposes an unsupervised approach
whenever possible (the partially automated phases 1, 2 and
3), this is a step for which we hardly see one fixed number
holding indefinitely.
Figure 4. CCHIT requirements TF-IDF weighting representation for “access” and “access-control” terms
Table 1. CCHIT requirements grouping after data-set clustering formation
This is perhaps the most valuable visual information so
far, as the auditor can start building further manual
analysis from the associated items, checking their
semantic relations (choosing the 10 most relevant
requirement relationships, for instance) in order to
construct a test script that consistently validates
adherence.
Comparing different standards in different languages
can have a huge influence on the auditor's initial plan
material. Depending on the language used, a specific
term can have multiple meanings.
Still considering the previous example terms, for the sake
of this requirements semantics association analysis using
clustering, the terms and phrases (2-grams)
that could be used for an initial test script could include
Clusters 1, 3 and 18, based on relevance (see Figure 5).
The centroid terms found for those groups were “users”,
“authorized” and “privileges”.
An example of Cluster 1 content is shown below. A possible
manual filtering resulting from auditor review
is represented in bold.
The next expected task would be
another manual one, resulting in the construction of the
proposed script.
This stage is highly auditor dependent, as the semantic
limitation of the method must be considered. Thus, the
results should aid and guide the audit preparation tasks,
but the auditor's expertise and understanding of the
requirements are crucial.
For example, when choosing the terms access and access
control for test script formulation, the auditor needs
to be aware that other terms not returned by RD2M
may need to be considered. The term authentication, for
instance, has a high degree of semantic relationship and yet
may not be listed within the cluster, simply because the
lookup terms were not part of the requirement description
(body) or title. It appears 12 times in the title portion of the
case study requirement set, but only 3 of those requirements
contain the lookup terms in the body portion.
Extensive test scripts connecting all related
requirement terms are neither feasible nor desirable. Instead,
the auditor should focus on proposing scripts that take
those relations into account to test the system better.
Audit evidence gathering impact
Audit evidence may include print-screens, video
records, document copies and digital documents, among
others. Auditors must collect evidence as proof that they
checked the evaluated systems against one or more
requirements, whether or not the systems were approved.
When designing a multiuse test script with the discussed
method, auditors should assess the audit evidence
gathered to make sure it represents the dependent
requirements sufficiently. As the evidence must
demonstrate the presence of the system's
characteristics and may be used for reporting
at the final audit stages, special care must be
taken with this task.
Disperse requirements and analysis efficiency considerations
Although we processed a single requirement set from the
database, no significant impact was seen when
using the automated tools to assess different sources.
For term similarity studies, though, different
requirement sources may use different vocabulary
even when describing the same topic, which affects
the correlation.
We also consider that in some cases the TF count can be
significantly reduced if the documents under analysis
reference each other or external documents. In these
cases the text is implicit within the references and hence
will not be considered as part of the analysis. The auditor
can then adapt the discussed process by importing the
referenced external text (topic or section) into the database.
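The effect of importing referenced text on the TF count can be illustrated with a small sketch. The `[ref:ID]` marker syntax, the `library` dictionary and the sample texts are assumptions made for this example only; real requirement documents cite each other in many different ways, and the study does not prescribe a specific import mechanism.

```python
import re
from collections import Counter

# Hypothetical reference marker, e.g. "[ref:SEC-01]" (an assumption of this sketch).
REF_PATTERN = re.compile(r"\[ref:([A-Za-z0-9_-]+)\]")


def expand_references(text, library):
    """Replace each marker with the referenced text when it is available.

    Only one level of references is expanded; unknown IDs are left untouched.
    """
    def substitute(match):
        return library.get(match.group(1), match.group(0))
    return REF_PATTERN.sub(substitute, text)


def term_frequency(text):
    """Count lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Illustrative texts only.
library = {"SEC-01": "Access control shall restrict access by user role."}
requirement = "The system shall apply access rules as defined in [ref:SEC-01]."

before = term_frequency(requirement)["access"]
after = term_frequency(expand_references(requirement, library))["access"]
print(before, after)
```

In this toy case the count for access rises once the referenced text is brought in, which is the reduction the paragraph above warns about when references are left unresolved.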
Time reducing and potential ROI
Although we do not intend to provide a concrete, fixed
number as the benefit of using the RD2M method on HIS
audits, we did perform three comparison tests to give the
reader an idea of the possible ROI and process
improvement. The main rationale for the suggested
approach lies in efficiency, but also in reducing audit
hours and hence related project costs. The tests were
performed by two different auditors who were not familiar
with the proposed method and were only required to execute
the optimized script generated. Thus, the differences that
could emerge from individual interpretation and script
construction while applying the suggested method were
minimized. The particular traditional scenario compared
already has a test script proposed by CCHIT. As a result,
the time-frame comparison initially points to a
disadvantage of using RD2M, since the traditional script
is ready to use by default. Not all system requirements
come with a built-in test script, though; in fact, this is
quite rare among existing HIS requirement sets. Even so,
the number of testing procedures and evidence-gathering
steps was significantly reduced, compensating for the
initial disadvantage found in the comparison. Also, the
overall timeframe of audit execution was reduced by roughly
11%, as seen in Table 3 and later in Figure 6.
Considering the most likely scenario, where the traditional
audit would also require test script production and hence
a related time investment in that phase, the positive
results of using RD2M are presumably more pronounced.
Figure 5. Access-control clustering distribution found (figure not reproduced; values shown in the figure: 18, 3, 1)
system system_enforce enforce enforce_restrictive
restrictive restrictive_set set set_rights rights
rights_privileges privileges privileges_accesses
accesses accesses_users users users_groups groups
groups_system system system_administration
administration administration_clerical clerical
clerical_nurse nurse nurse_doctor doctor
doctor_processes processes processes_acting acting
acting_behalf behalf behalf_users users
users_performance performance performance_specified
specified specified_tasks tasks tasks_access access
access_control control system system_able able
able_associate associate associate_permissions
permissions permissions_user user user_using using
using_access access access_controls controls
controls_user user user_based based based_access
access access_rights rights rights_assigned assigned
assigned_user user user_role role role_based based
based_users users users_grouped grouped
grouped_access access access_rights rights
rights_assigned assigned assigned_groups groups
groups_context context context_based based
based_role role role_based based based_additional
additional additional_access access access_rights
rights rights_assigned assigned assigned_restricted
restricted restricted_based based based_context
context context_transaction transaction
transaction_time time time_day day day_workstation
workstation workstation_location location
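The token listing above appears to interleave each remaining term with the bigram it forms with the next term, restarting at each requirement sentence. The sketch below reproduces that pattern under this reading; it is an inference from the listing, not a description of the tool used in the study, and the function names are hypothetical. Stop-word removal is assumed to have happened beforehand.

```python
def interleaved_ngrams(tokens):
    """Emit each token followed by the bigram it forms with its successor."""
    stream = []
    for index, token in enumerate(tokens):
        stream.append(token)
        if index + 1 < len(tokens):
            stream.append(f"{token}_{tokens[index + 1]}")
    return stream


def token_stream(sentences):
    """Process sentences independently so no bigram crosses a sentence boundary."""
    stream = []
    for tokens in sentences:
        stream.extend(interleaved_ngrams(tokens))
    return stream


# First tokens of the listing, already stripped of stop words.
first = interleaved_ngrams(["system", "enforce", "restrictive", "set"])
print(" ".join(first))

# Two short token lists: note there is no "control_system" bigram.
both = token_stream([["access", "control"], ["system", "able"]])
print(" ".join(both))
```

Keeping unigrams and bigrams in the same stream lets a frequency count treat compound terms such as access_control as terms in their own right.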
Table 3. RD2M (R) vs traditional (T) audit execution comparison
Figure 6. Overall timeframe comparison representation (traditional vs RD2M)
The feedback provided by the auditors after conducting the
proposed test was that, for some scripts, a single
evidence-gathering step would not suffice to validate the
script’s FR/NFR. The probable explanation is that the
auditors would have selected the terms for script
construction differently than we did.
Resource usage
The process described does not demand special
computer hardware, as it is not heavily resource
consuming. For this project, a common server
hardware setup including an Intel Xeon 1.60GHz
processor and 6GB RAM on a 64-bit Windows Server
platform was used. The processing times of the main
processes involved in the tasks described are listed in
Table 4.
The computational consumption is obviously affected by
the number of requirements processed. Even so, the
manual activities performed by the auditor account for the
majority of the time spent.
Related research
The summarization, meaning reduction and text-mining
approach to health informatics standards and guidelines
requirements discussed here is to be used as an
intermediate step in the project “Translational Security,
Operational and Functional HIS evaluation using a
Brazilian comparative evolutional reference model”,
researched by the author at the Federal University of
São Paulo (UNIFESP).
The project aims to build a stage-divided evaluation
scale model to be used for HIS maturity evaluation.
Table 4. Computer processing consumption of the main processes used
It uses several requirements document sources
applicable as references to build or improve systems.
They are positioned progressively into the model metrics,
initially by fragmenting these sources using requirement
text mining, TF and summarization techniques, and then
by an expert team applying a classification with weighted
scores to the key requirement words found (so-called
attractors). The last stage uses the Delphi method to
survey IT staff, administrators and physicians (the three
groups most likely to be responsible for health
informatics systems in Brazil) and collect their opinions
as importance-weighting input.
Beyond answering whether a given system meets the
minimal characteristics to be used as an HIS, the model
is meant to position it on the maturity scale created.
CONCLUSION
The HIS audit is not a trivial task. Considering the
auditor profile and the skill set needed for a
successful analysis of the diverse and complex reference
requirements available, it is important to try to reduce
the burden of the time-consuming text review phase.
As requirements dependency directly affects the auditor’s
ability to attest to overall requirements adherence, the
auditor should assess possible conflicts or correlations
for each and every specification occurrence.
Current text-mining techniques can considerably
reduce the auditor’s work related to understanding
requirements dependency, allowing concise and
efficient construction or adaptation of audit test scripts.
Although limited by semantic relations not properly
covered by the method studied in this paper, text
correlation can aid the auditor’s work both during
requirement understanding and system testing. Test
scripts that consider requirements correlations give
the auditor a broader perception of system behavior.
A single test script considering multiple requirement
correlations can hence provide for the evaluation of
diverse aspects of related requirements in an integrated
approach.
ACKNOWLEDGMENT
This research received no specific grant from any
funding agency in the public, commercial, or
not-for-profit sectors.
REFERENCES
Bing Q, Ting L, Sheng L (2005). MDS based on sub-topic NCIRCS-2005,
Beijing, 2005.
Blei D, Lafrerty D (2006). Correlated topic models. Advances in Neural
Information Processing Systems. MIT Press, Cambridge, MA, 2006.
Chen K, Liu L (2009).“Best K”: critical clustering structures in categorical
datasets, 2009.
CISA-http://www.isaca.org/Certification/CISA-Certified-InformationSystems-Auditor/How-to-Become-Certified/Pages/default.aspx (Last
viewed 6-20-2012).
Comptia HIT Certification -http://certification.comptia.org/getCertified/
certifications/hittech.aspx (Last viewed 6-20-2012).
Fikes R, McGuinness D (2001). An Axiomatic Semantic Semantics for
RDF, RDF Schema and DAML+OIL, 2001.
Heo EY, Hwang H, Kim EH, Cho EY, Lee KH, Kim TH, Kim KD, Baek RM,
Yoo S, et al, (2012). Comparing the Certification Criteria for CCHITCertified Ambulatory EHR with the SNUBH's EHR Functionalities,
2012.
IRCA - http://www.irca.org/en-gb/certification/schemes/ (Last viewed 6-202012).
ISACA (2010). IT Standards, Guidelines, and Tools and Techniques for
Audit and Assurance and Control Professionals, 2010.
Ko Y, Park J, Seo J (2002). Improving text categorization using the
importance of sentences, 2002.
Luo X, Xu Z, Yu J, Chen X (2011). Building Association Link Network for
Semantic Link on Web Resources. IEEE Robotics and Automation
Society, 2011.
Luo X, Yao E (2006). The Reasoning Mechanism of Fuzzy Cognitive
Maps. Proceeding of the First International Conference on Semantics,
Knowledge, and Grid, 2006.
McCallum A, Corrada-Ernmanuel A, Wang X (2004). The author-recipienttopic model for topic and role discovery in social networks. IDIAP,
2004.
Carvalho et al. 85
McGregor C, Percival J, Curry J, Foster D, Anstey E, Churchill D (2008).
A Structured Approach to Requirements Gathering Creation Using
PaJMa Models, 2008.
Nomoto T, Matsumot Y (2001). An Experimental Comparison of
Supervised and Unsupervised Approaches to Text Summarization,
2001.
Protics (2012). Competências Essenciais do Profissional de Informática
em
Saúdehttp://www.sbis.org.br/protics/Competencias_Informatica_Saude
_SBIS_proTICS_v_1_0_consulta_publica_2012.pdf (Last viewed 620-2012 – Portuguese version only).
Reeve LH, Han H, Ari D Brooks (2007). The use of domain-specific
concepts in biomedical text summarization. Information Processing
and Management, 2007.
Salton G, Buckley C (1988). Term-weighting approaches in automatic text
retrieval. Information Processing and Management, 1988.
Sathya M, Jayanthi J, Basker N (2011). Link Based K-Means Clustering
Algorithm for Information Retrieval, 2011.
SBIS Certification - www.sbis.org.br/site/site.dll/view?pagina=25 (Last
viewed 6-20-2012 – Portuguese version only).
Teng Z, Liu Y, Ren F, Tsuchiya S, Ren F (2008). Single Document
Summarization Based on Local Topic Identification andWord
Frequency, 2008.
Tsung-Hui L, Li-Yun C,
Zhe-Jung L (2011). Integrating Security
Certification with IT Education, 2011.
Wen-Jie W, Yan X (2010). Correlation analysis of visual verbs
subcategorization based on pearson’s correlation coefficient, 2010.