Causes and Preventive Actions of Data Integrity

Data Integrity:
The Pillar of Business Success
Authors
Dr. Steve Hallman, Dept. of Information and Computer Science
Park University 8700 NW River Park Drive, Parkville, MO 64152, USA
Phone: 1 (800) 745-7275 x6435 Fax: 1 (816) 741-4911
E-mail: hallmanDBA@Yahoo.com
Dr. Al Stahl, Graduate Associate
Park University, Parkville, MO 64152, USA
Phone: 1 (248) 361-0819
E-mail: albert.stahl@park.edu
Dr. Michel Plaisent, Dept. of Management and Technology
University of Quebec in Montreal, 315 east Sainte-Catherine, Montreal, Canada H3C 4R2
Phone: 1 (514) 987-3000 x4253, Fax: 1 (514) 987-3343
E-mail: michel.plaisent@uqam.ca
Dr. Prosper Bernard, Dept. of Strategic Affairs
University of Quebec in Montreal, 315 east Sainte-Catherine, Montreal, Canada H3C 4R2
Phone: 1 (514) 987-4250, Fax: 1 (514) 987-3343
E-mail: prosper1@compuserve.com
Dr. (Lt. Col) Michael L. Thomas
Geospatial Systems Architect
HQ USAREUR, ODCSENG
AEAEN-GI&S Branch
APO AE 09014
Tel: 011-49-6221-57-6769; DSN 314-370-6769
E-mail: mthomas304@att.net
James Thorpe, Dept. of Information and Computer Science
Park University 8700 NW River Park Drive, Parkville, MO 64152 USA
james.thorpe@park.edu
April 19, 2007
Introduction
All organizations want to maintain their records in an accurate, timely, and easily
retrievable fashion. Whether such records represent client histories, tax payments, bank
transactions, donations, payroll, inventory, or contractor obligations, their validity, accuracy,
ease of retrieval, and security are extremely important, relevant, and critical to organizational
success.
These days, very few organizations maintain a "paper-based" records system because of
the effort, space, and retrieval time needed to store and access such records. The computer has
taken over, as modern organizations increasingly automate their record keeping.
Not only are computer record keeping systems significantly faster and more powerful
than manual systems, but they also provide flexibility (including remote access) that manual
systems could not offer. Through their "database engines," records can be accessed, updated,
and stored in an extremely efficient manner. Yet there are challenges. While computers and
databases have become more effective and reliable, data integrity appears to be increasingly in
question.
The Issue
There is evidence that data, as it is currently stored in databases, has a significant error
rate, which some suggest could be as much as 10%. This error rate is based on an earlier survey
conducted in 1992 (Klein, 1997, p.2) and subsequently reinforced through numerous newspaper
accounts of problems that clients and customers have encountered. The potential for errors
might be even greater. For example, how often does one receive mailings that have some part of
the name or address misspelled?
While unsolicited mailings can be seen as minor, many
potentially greater errors may not be recognized. If the errors continue to go unspotted, then they
will affect business-related outcomes.
Background
Klein’s findings (p. 2) suggest that users of IS (Information Systems) tend to be
ineffective in finding data errors. Yet, from an educational perspective, there are some ways of
handling this type of problem and improving human detection techniques. Two laboratory-based
studies (also referenced by Klein, p. 2) show that explicit error detection goals and incentives can
modify users' error detection performance.
In other words, providing an improved
understanding of conditions under which users may detect data errors may improve database
integrity.
Ultimately, database integrity is about trust. Are users and businesses able to trust the
data stored in their databases? Data integrity provides necessary internal controls for a database.
"Data should be managed where it resides." That is the "storage management mantra" that
many technology professionals espouse when the subject of enterprise backup arises. Over the
years, this sound philosophy has steered many network administrators, engineers, and
consultants through successful storage management projects.
Consider the plight of a network manager whose network consists of 500+ GB of data on
Microsoft Windows NT Exchange/BackOffice servers spread across the corporate WAN with 30
GB on a Unix database server (Conover, 1997, p. 84). Because of the distributed Windows NT
servers and relatively slow WAN links, this manager wisely decided to forgo the idea of
backing up all the network data to a central Unix solution. Instead, "humming the mantra loud
and clear," he achieved success by backing up the NT data with an NT-based solution that
can be managed in a distributed environment, while backing up the Unix data with either a
Unix remote client agent or a native Unix solution (Conover, 1997, p. 84).
Discussion
Apart from recovering files that have been accidentally deleted, one of the main reasons a
company backs up data is to safeguard against disasters. Some disaster-recovery options require
that hard drives be partitioned, formatted, and set up to reload the operating system prior to
recovering data. Other options recover the partition and master boot record "on the
fly." It is also possible to gather the required device drivers on a floppy disk or tape to allow for
easier recovery, but such options do not actually create a bootable image.
Choosing enterprise backup software increasingly hinges on add-ons such as database
agents, application agents, image backup options, client agents, accelerators or interleaving client
agents, RAID options, open file agents, e-mail system agents, and antiviral integration, all of
which help create a superior backup system or product line.
Another particularly thorny problem for enterprise backup systems is that databases need
to be up and running 24 hours a day, seven days a week. That is because many of the files
associated with the database or its applications remain open. A similar problem arises when
backing up e-mail or Groupware systems, most of which are databases in their own right. Most
major database vendors have added software application programming interfaces (APIs) or
hooks that place the database engine into a maintenance or backup mode that facilitates
successful backup of the database or database objects, while maintaining data integrity (Conover,
1997, p. 84).
Data Authenticity
Corporations today are dependent on the authenticity of the data provided to them
through their computers. Whether it is a multinational corporation working on a worldwide
network, or a local company using a vast database to operate within the firm, each depends on
(valid) data to make crucial decisions. Thus, it is important to analyze and evaluate any new
system being incorporated into an organization, both for its usability and for its ability to
process the company's data.
An example would be Financial Departments, whose work
involves accurately maintaining massive amounts of financial data (Lombard, 1999, p. 26).
Totaling numbers, itemizing expenses, and producing detailed financial reports have
traditionally been the tasks of corporate financial departments. But, like many other business
operations, the finance function is undergoing a significant change as organizations make better
use of their internal resources to be more competitive.
Financial Managers need to spend more time managing both financial and non-financial
information that could affect the future growth and competitiveness of their companies. Issues
such as market share analysis and business management are just two examples of areas that
could affect the growth of a company, where the integration of financial data and a financial
perspective could lead to better strategic decision-making.
In many organizations today,
business decisions that have significant financial implications are often made without a
comprehensive understanding of their short-term and long-term financial impacts upon the
organization. Too often, many of these organizations under-utilize their financial staff
and fail to leverage the valuable skills and experience in analysis and disciplined thinking that
they can offer.
More and more companies are now asking what else can be done, as financial
professionals, from Chief Financial Officers to Department Managers, play an increasingly
critical role in strategic decision-making. Many finance professionals are finding this new role difficult,
because they don't always have easy access to the corporate information that they need to make
critical decisions. As a result, many Finance Departments too often are not integrating other
business issues into their financial reporting.
One important way to address this problem and successfully increase the role of the
finance function is to free financial staff from manual data-collecting. Such a move would
provide easy access to financial data and minimize manual adjustments to data-management
maintenance functions that currently take up much of their time. Accounting records and other
related processes can be equally important.
Studies have shown that financial professionals spend 80 percent of their time collecting
and managing data, and only 20 percent studying and analyzing specific trends and opportunities
that could help the business grow (ArkiData Corporation, p. 3). Finance is a high-cost function,
so it does not make much sense to spend large amounts of money employing data clerks and data
managers.
Some Finance Departments are using desktop productivity tools such as Microsoft Excel
and Lotus 1-2-3 spreadsheets to generate the information required by decision-makers. The problem
with this approach is that it involves manual re-keying of data from general ledger and other
corporate databases. Not only is data integrity compromised, but people also spend most of their
time on what are administrative and clerical tasks, rather than providing value-added analysis of
important financial and non-financial information. Desktop generated spreadsheets also are
inflexible and difficult to manage, particularly if there are major changes to the original data
required or if the assumptions used to generate the spreadsheet change.
Most of these
modifications have to be entered manually, which can be very time-consuming and costly.
New enterprise-based technology solutions are overcoming these problems by integrating
financial and other corporate data into a single system or database. These programs enable
companies to maximize their efficiency in transaction processing, and minimize the manual
clerical tasks that historically have taken up so much of a Finance Department's time.
New business support software programs that utilize on-line transaction processing
(OLTP) systems are able to automatically capture and process transactions quickly and
accurately. Companies can manipulate data by time, location, category, and other variables,
depending on their specific corporate requirements.
Data Authenticity
Many software companies have enabled most consumer firms to have a large database
where data is easily retrieved, updated, or evaluated within a short period of time (often within a
few seconds). These activities are crucial when every individual within the organization may be
in need of data. There is no allowance for wrong findings; that is where authenticity of the data
comes into the picture. Imagine that a client's order to a computer company is for 5,400 PCs
but is recorded as 4,500, and that the PCs must be distributed to different
sites on a short timetable; there likely would be significant chaos!
Database companies such as Oracle and its counterparts have devised software that
incorporates systems in which data such as these are verified to avoid mistyping or inaccurate
processing.
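One long-established verification technique of this kind is the check digit. The sketch below (in Python, purely as an illustration; the numbers are hypothetical) uses the Luhn algorithm, which catches any single mistyped digit and most transpositions of adjacent digits at data-entry time:

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    digits = [int(d) for d in payload]
    # Double every second digit, starting from the rightmost one;
    # doubled values above 9 are reduced by 9 (e.g. 16 -> 7).
    for i in range(len(digits) - 1, -1, -2):
        digits[i] *= 2
        if digits[i] > 9:
            digits[i] -= 9
    return str((10 - sum(digits) % 10) % 10)

def luhn_valid(number: str) -> bool:
    """Validate a number whose final digit is its Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]

# A transposition at entry time is flagged immediately:
assert luhn_valid("79927398713")        # correctly keyed number
assert not luhn_valid("97927398713")    # first two digits transposed
```

A scheme like this is why a mistyped credit card or account number is usually rejected before it ever reaches the database.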
Replication
An organization also relies on its databases being updated constantly, preferably
automatically. For example, a company that has many offices will have many users accessing
the databases from the server, both online and offline. It is imperative for the
decision-makers to have access to the most recent data for formulating strategies and making
decisions. Thus, immediate online data replication is crucial. Where data is constantly
retrieved and updated, software developers have largely mastered these features.
However, the
challenge lies in updating online data, especially in the case of multinational organizations,
where multiple time zones also can play an important role in updating data. Updates run
on an automatic time clock; hence it is a challenge for software to update data to the minute
rather than lag by five or six hours. This year, the United States
changed to daylight saving time three weeks ahead of the rest of the world, which posed
some major challenges to "automatic clocks."
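A common way around clock and daylight saving ambiguity is to timestamp every update in UTC and convert only for display. A minimal Python sketch (the record layout here is hypothetical):

```python
from datetime import datetime, timezone

def record_update(record: dict) -> dict:
    """Stamp a record with the current time in UTC, so replicas in
    different time zones (and under different daylight saving rules)
    agree on which version of the data is newest."""
    record["updated_at"] = datetime.now(timezone.utc).isoformat()
    return record

row = record_update({"item": "PC order", "quantity": 5400})
# The stored timestamp carries an explicit UTC offset:
assert row["updated_at"].endswith("+00:00")
```

Local time is then derived at display time, so a daylight saving change affects presentation only, never the stored ordering of updates.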
Reliability
A sudden power failure can shut a computer down unexpectedly, with the result being only
partially updated databases. When this happens, it becomes dangerous for the users, as they have
only half of the updated data. This is the reason data integrity is so important. Databases today
are equipped with verification systems that ask the user to review the changes and then save
them, so that anyone who retrieves the data will have the correct version. Unfortunately, this
procedure also commits errors that are attributable to mistyping.
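The standard defense against half-finished updates is the database transaction: a group of changes either commits in full or rolls back entirely. A sketch using Python's built-in sqlite3 module (the table and figures are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("ops", 100), ("finance", 100)])
conn.commit()

# Transfer funds atomically: either both rows change or neither does.
try:
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 30 "
                     "WHERE name = 'ops'")
        # Simulate the failure described above, mid-update:
        raise RuntimeError("power lost before the second UPDATE")
except RuntimeError:
    pass

# The half-finished transfer was rolled back; no partial data remains.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
assert balances == {"ops": 100, "finance": 100}
```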
Integrity Issues
Fausto Rabitti (1988) has clearly stated that there are at least three types of data integrity that
must be designed into any database.
1) Key Integrity: Every table should have a primary key. "The primary key must be
controlled so that no two records in the table have the same primary key value." "Also, the
primary key for a record must never be allowed to have a null value." "Otherwise, that would
defeat the purpose of the primary key to be a unique identifier." If the database management
system does not enforce these rules, other steps must be taken to reduce their potentially
detrimental impact.
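A sketch of this rule using Python's built-in sqlite3 module (the table and values are invented; note that SQLite, unlike the SQL standard, does not imply NOT NULL for every PRIMARY KEY declaration, so it is stated explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    student_id TEXT PRIMARY KEY NOT NULL,
    name       TEXT
)""")
conn.execute("INSERT INTO student VALUES ('S100', 'Avery')")

# A duplicate primary key value is rejected by the DBMS:
try:
    conn.execute("INSERT INTO student VALUES ('S100', 'Blake')")
except sqlite3.IntegrityError:
    pass  # UNIQUE constraint failed

# A null primary key is rejected as well:
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Casey')")
except sqlite3.IntegrityError:
    pass  # NOT NULL constraint failed

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
assert count == 1  # only the valid record was stored
```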
2) Domain Integrity: "Appropriate controls must be incorporated to ensure that no field
takes on a value that is outside the range of legal values." For example, if grade point average is
defined to be a number between 0 and 4, controls must be implemented to prevent negative
numbers and numbers greater than 4. "For the foreseeable future the responsibility for data
editing will continue to be shared between the application programs and the DBMS." (Data Base
Management System)
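The grade point average example above can be enforced in the DBMS itself with a CHECK constraint, as in this sqlite3 sketch (table name invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes the legal range for grade point average:
conn.execute("""CREATE TABLE transcript (
    student_id TEXT NOT NULL,
    gpa        REAL NOT NULL CHECK (gpa BETWEEN 0 AND 4)
)""")
conn.execute("INSERT INTO transcript VALUES ('S100', 3.4)")  # legal value

for bad_gpa in (-1.0, 4.5):  # out-of-range values are rejected
    try:
        conn.execute("INSERT INTO transcript VALUES ('S100', ?)", (bad_gpa,))
    except sqlite3.IntegrityError:
        pass  # CHECK constraint failed

rows = conn.execute("SELECT gpa FROM transcript").fetchall()
assert rows == [(3.4,)]  # only the legal value made it into the table
```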
3) Referential Integrity: "The architecture of relational databases implements relations
between the records in tables via foreign keys. The use of foreign keys increases the flexibility
and scalability of any database, but it also increases the risk of integrity errors. This type of error
exists when a foreign key value in one table has no matching primary key value in the related
table." For example, an invoice table usually includes a customer number as a foreign key that
"references back to" the matching customer number, the primary key in the
customer table.
Improved Practices
Here is a method that helps prevent these types of errors: when deleting a
customer record, automatically delete all related records that share the same
customer number.
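Relational databases support exactly this behavior through cascading deletes, sketched here with sqlite3 (tables invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_no TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE invoice (
    invoice_no  TEXT PRIMARY KEY,
    customer_no TEXT REFERENCES customer (customer_no) ON DELETE CASCADE
)""")
conn.execute("INSERT INTO customer VALUES ('C001')")
conn.executemany("INSERT INTO invoice VALUES (?, 'C001')",
                 [("I-1",), ("I-2",)])

# Deleting the customer automatically deletes every record that
# carries the same customer number, so no orphan invoices remain:
conn.execute("DELETE FROM customer WHERE customer_no = 'C001'")
remaining = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
assert remaining == 0
```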
There is strong evidence that humans are able to detect errors under certain
circumstances. With behavior properly shaped through goals and incentives, they can
develop the ability to flag common errors.
To make this type of integrated system work
effectively, the following should be considered:
1. Employees need to know that it is part of their job to look for and
flag suspicious data.
2. Employees should be aware of the different types of errors.
3. Employees' attentiveness in finding errors is important.
4. An incentive scheme needs to reward finding errors.
5. Management needs to handle false alarms as quickly as possible.
6. The company needs to have good training and hiring programs in place.
(Klein, 1997, p. 2)
Security
Security has become a large topic in recent times. Viruses, hackers, worms, trojans,
phishing, malware, and many other things threaten the security of databases.
Many
organizations today focus on technologies for building, maintaining, and accessing
databases, and give very little attention to security (Britt, 2007).
The ability of an
individual to gain illegal access to a database and compromise precious data puts organizations at
great risk. An attacker who broke into a database with even light security
measures could obtain sensitive data such as credit card and/or social security
numbers.
Data integrity is also at risk. In an article written by Arif Mohamed of Computer Weekly,
it was stated that attackers can discover flaws in databases like Oracle and could "use higher
privileged code with DBMS_SQL to perform an insert, update, or delete command, and so
change the data within the database directly" (Mohamed, 2006). In Mohamed's example,
"In the case where the data being inserted must not contain single quote marks, and the higher
privileged code checks for their presence, the attacker can 'snarf and replace' data, so that it
does contain a single quote mark, thereby causing an exception" (Mohamed, 2006).
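The usual defense against quote-mark and injection tricks of this kind is to keep user data out of the SQL text entirely by using bound parameters. A sketch with Python's sqlite3 module (the table is invented; this illustrates the general principle rather than the specific Oracle DBMS_SQL flaw described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT)")

tricky_name = "O'Brien'); DROP TABLE client; --"

# Unsafe pattern: splicing input into the SQL string lets a quote
# mark change the statement itself, as in the attack described above.
#   conn.execute("INSERT INTO client VALUES ('%s')" % tricky_name)

# Safe pattern: a bound parameter is treated purely as data, so the
# quote marks are stored literally and cannot alter the command.
conn.execute("INSERT INTO client VALUES (?)", (tricky_name,))
stored = conn.execute("SELECT name FROM client").fetchone()[0]
assert stored == tricky_name
```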
Many organizations are under pressure about their security as hackers find more
sophisticated ways to exploit vulnerabilities and gain valuable data from databases
(Data, 2007). Currently, 83% of organizations in the U.S. "believe they have made their data
safer by installing or upgrading antivirus software, installing or upgrading a firewall,
implementing intrusion detection/prevention technologies, and implementing vulnerability/patch
management systems on their networks," according to Communications News (Data, 2007). The truth is
that as the protection becomes more sophisticated, so do the methods for acquiring
precious data from those databases. In 2005, the number of security breaches reported reached
100 million and is still growing, according to Privacy Rights Clearinghouse, a non-profit
consumer information and advocacy program (Data, 2007).
Currently, many organizations use passwords to prevent unwanted individuals from
gaining access to databases. This is a very common method of security, but not always an
effective one. Unfortunately, many passwords in use are ineffective because they are too
common or too easy for an intruder to discover (Britt, 2007). For a password to be
effective, it is recommended that it be at least 12 characters long and include capital
letters and numbers (Britt, 2007). However, even though a more complex password helps
protect the system, it also can create a new problem for the users, who may forget that password.
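The recommendation above is easy to enforce mechanically. A minimal sketch of such a policy check (the function name and exact rules are illustrative, following the 12-character, capitals-and-numbers guideline cited above):

```python
import re

def meets_policy(password: str) -> bool:
    """Apply the guideline cited above: at least 12 characters,
    including at least one capital letter and one number."""
    return (len(password) >= 12
            and re.search(r"[A-Z]", password) is not None
            and re.search(r"[0-9]", password) is not None)

assert not meets_policy("password")             # too short, too common
assert not meets_policy("longenoughpassword")   # no capital, no number
assert meets_policy("Longenough4Password")      # satisfies the policy
```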
One way of increasing database security is by adding security layers. In a layered
security system, multiple security measures, rather than just one, are taken to protect a system.
These include firewalls, access controls, passwords, encryption, and various monitoring
systems, all active at once (Britt, 2007). This is ideal both for databases with a large amount of
sensitive data to protect and for those with little.
Michael Vizard, who has done research on security threats, believes the need exists to
create a uniform approach to security in the future (Vizard, 2007). He believes security tests
need to be implemented before a customer or another business interacts online with an
organization (Vizard, 2007). Although these measures may seem strict, they may be absolutely
necessary for organizations, in the future.
There are database security tools in use today for testing the security of a
network. Symantec has a security tool that hospitals in the Boston area have been using for
alpha testing since 2006 (Messmer, 2006). Though this tool does not stop unwanted intruders
from entering a network, it monitors all activity on the network and can show an organization
whether suspicious activity is present (Messmer, 2006). The tool also can alert an organization if
someone is attacking the database and trying to alter or steal its information (Messmer, 2006).
Though it does not directly protect the database, this tool is still a huge improvement over
previous database tools that did not monitor network activity, and it can be used to test whether or
not a database or network is safe.
Summary
Enterprise technology tools are becoming increasingly sophisticated.
Many of these
technology tools can provide employees with a much clearer picture of the factors that can affect
growth and profitability, as well as opportunities that can increase organizational success. Data
warehousing systems are an example of such a technology. Today’s data warehousing systems
are enabling authorized staff to "drill down" into organizational databases for information that
normally would not be available through stand-alone desktop systems.
Database integrity is a major, yet potentially volatile, part of the effectiveness of
databases. A database that is well designed and maintained (by users, programmers, and
management) can ensure key, domain, and referential integrity.
Employees who are appropriately trained should be able to recognize, avoid, and reduce
today's threats and errors while focusing on important decision-support activities. With the
implementation of appropriate training programs through which employees can understand the
causes of data integrity problems and their preventive actions, employees have the potential to
recognize, reduce, and eliminate common errors. Any organization that uses a distributed multi-user
database could benefit from the "right training" in many ways, including reducing data errors,
increasing employee job satisfaction, increasing cost savings, and making a positive impact on
the achievement of business outcomes. As noted by Anne Lewis (1997, p. 72), this is an area
where "Quality Counts."
Database security and network tools also are very important considerations in data
management activities. The ability to understand every potential threat, from viruses to malware,
as well as any other issues that could threaten database integrity and security, is critical. The
ability to anticipate threats to data and respond with proactive maneuvers is paramount and
essential for success in today's organizational environment.
References
Britt, P. (2007). Tightening Security in 2007. Information Today. Retrieved February 28,
2007, from the Library, Information Science & Technology Abstracts database.
Conover, Joel. (1997). DBMS Backup Agents: Because the Data Matters. Network Computing,
p. 84. [On-line serial] Available: http://www.networkcomputing.com/802/802r12.html.
Klein, Barbara D., & Goodhue, Dale L. (1997). Can Humans Detect Errors in Data? Impact of
Base Rates, Incentives, and Goals. MIS Quarterly, 21, p. 2.
Lewis, Anne C. (1997). Data Integrity. Education Digest, 62(7), p. 72.
Lombard, III. (1999). ArkiData Corporation Announces Upgrade of Its Data Auditing and
Cleansing Technology. Business Wire, 26.
Messmer, Ellen. (2006, January). CareGroup Checks Out Symantec Database Security Tool.
Network World. Retrieved March 6, 2007.
Mohamed, Arif. (2006). Oracle Users Warned of New Threats to Firms' Data. Computer Weekly.
Retrieved March 6, 2007, from the MasterFILE Premier database.
No Author. (2007, February). Data Security Still at Risk. Communications News. Retrieved
March 6, 2007, from the MasterFILE Premier database.
Rabitti, Fausto. (1988). A Model of Authorization for Object-Oriented Database Systems. In:
Proc. of the Intl. Conf. on Extending Database Technology, pp. 231-250. ISBN 3-540-19074-0.
Treybig, James G. (1995). Bring Data Integrity Back into the Equation. Interactive Age, 2(6),
pp. 3/5.
Vizard, Michael. (2007, February). Time to Get Tough on Security Threats. Baseline. Retrieved
March 6, 2007.