Data Integrity: The Pillar of Business Success

Authors

Dr. Steve Hallman, Dept. of Information and Computer Science, Park University, 8700 NW River Park Drive, Parkville, MO 64152, USA
Phone: 1 (800) 745-7275 x6435; Fax: 1 (816) 741-4911; E-mail: hallmanDBA@Yahoo.com

Dr. Al Stahl, Graduate Associate, Park University, Parkville, MO 64152, USA
Phone: 1 (248) 361-0819; E-mail: albert.stahl@park.edu

Dr. Michel Plaisent, Dept. of Management and Technology, University of Quebec in Montreal, 315 east Sainte-Catherine, Montreal, Canada H3C 4R2
Phone: 1 (514) 987-3000 x4253; Fax: 1 (514) 987-3343; E-mail: michel.plaisent@uqam.ca

Dr. Prosper Bernard, Dept. of Strategic Affairs, University of Quebec in Montreal, 315 east Sainte-Catherine, Montreal, Canada H3C 4R2
Phone: 1 (514) 987-4250; Fax: 1 (514) 987-3343; E-mail: prosper1@compuserve.com

Dr. (Lt. Col) Michael L. Thomas, Geospatial Systems Architect, HQ USAREUR, ODCSENG, AEAEN-GI&S Branch, APO AE 09014
Tel: 011-49-6221-57-6769; DSN 314-370-6769; E-mail: mthomas304@att.net

James Thorpe, Dept. of Information and Computer Science, Park University, 8700 NW River Park Drive, Parkville, MO 64152, USA
E-mail: james.thorpe@park.edu

April 19, 2007

Introduction

All organizations want to maintain their records in an accurate, timely, and easily retrievable fashion. Whether such records represent client histories, tax payments, bank transactions, donations, payroll, inventory, or contractor obligations, their integrity is critical. The validity, accuracy, ease of retrieval, and security of these kinds of records are extremely important and relevant to organizational success.

These days, very few organizations maintain a “paper-based” records system, because of the effort, space, and retrieval time needed to store and access such records. The computer has taken over, as modern organizations become increasingly automated in keeping their records. Not only are computerized record-keeping systems significantly faster and more powerful than manual systems, but they also provide flexibility (including remote access) that was not possible with manual systems. Run by “database engines,” records can be accessed, updated, and stored in an extremely efficient manner. Yet there are challenges: while computers and databases have become more effective and reliable, data integrity appears to be increasingly in question.

The Issue

There is evidence that data, as currently stored in databases, has a significant error rate, which some suggest could be as high as 10%. This estimate is based on a survey conducted in 1992 (Klein, 1997, p. 2) and has since been reinforced by numerous newspaper accounts of problems encountered by clients and customers. The potential for errors might be even greater. For example, how often does one receive a mailing in which some part of the name or address is misspelled? While errors in unsolicited mailings can be seen as minor, many potentially greater errors may go unrecognized. If errors continue to go unspotted, they will affect business-related outcomes.

Background

Klein’s findings (p. 2) suggest that users of information systems (IS) tend to be ineffective at finding data errors. Yet, from an educational perspective, there are ways of handling this type of problem and improving human detection techniques. Two laboratory-based studies (also referenced by Klein, p. 2) show that explicit error-detection goals and incentives can improve users’ error-detection performance. In other words, a better understanding of the conditions under which users detect data errors may improve database integrity.
Ultimately, database integrity is about trust. Can users and businesses trust the data stored in their databases? Data integrity provides the necessary internal controls for a database.

“Data should be managed where it resides.” That is the “storage management mantra” that many technology professionals espouse when the subject of enterprise backup arises. This philosophy has steered many network administrators, engineers, and consultants through successful storage management projects over the years. Consider the plight of a network manager whose network consists of 500+ GB of data on Microsoft Windows NT Exchange/BackOffice servers spread across the corporate WAN, with 30 GB on a Unix database server (Conover, 1997, p. 84). Because of the distributed Windows NT servers and relatively slow WAN links, this manager wisely decided to forgo backing up all the network data to a central Unix solution. Instead, “humming the mantra loud and clear,” he achieved success by backing up the Windows NT data with an NT-based solution that can be managed in a distributed environment, while backing up the Unix data with either a Unix remote client agent or a native Unix solution (Conover, 1997, p. 84).

Discussion

Apart from recovering files that have been accidentally deleted, one of the main reasons a company backs up data is to safeguard against disasters. Some disaster-recovery options require that hard drives be partitioned, formatted, and set up to reload the operating system before data can be recovered. Other options recover the partition and master boot record “on the fly.” It is also possible to gather the required device drivers on a floppy disk or tape to allow for easier recovery, but such options do not actually create a bootable image.

Choosing enterprise backup software increasingly hinges on add-ons such as database agents, application agents, image backup options, client agents, accelerators or interleaving client agents, RAID options, open-file agents, e-mail system agents, and antivirus integration, all of which help create a superior backup system or product line.

Another particularly thorny problem for enterprise backup systems is that databases need to be up and running 24 hours a day, seven days a week, which means that many of the files associated with the database or its applications remain open. A similar problem arises when backing up e-mail or groupware systems, most of which are databases in their own right. Most major database vendors have added application programming interfaces (APIs), or hooks, that place the database engine into a maintenance or backup mode, facilitating successful backup of the database or database objects while maintaining data integrity (Conover, 1997, p. 84).
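The same principle — asking the database engine for a consistent copy rather than copying its files while they are open — can be seen in miniature in modern embedded databases. As a hedged illustration (not any particular vendor’s enterprise API), Python’s built-in sqlite3 module exposes SQLite’s online backup hook:

```python
import sqlite3

# A live database with open connections (here just an in-memory stand-in).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
source.execute("INSERT INTO orders VALUES (1, 'widgets')")
source.commit()

# Connection.backup() (Python 3.7+) drives SQLite's online backup API:
# the database is copied page by page while it remains open and usable,
# so the destination file is always a consistent snapshot.
dest = sqlite3.connect("orders_backup.db")
with dest:
    source.backup(dest)

print(dest.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # prints 1
dest.close()
source.close()
```

The design choice mirrors the vendor “hooks” described above: the engine itself mediates the copy, so readers and writers never see a half-written backup.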
Data Authenticity

Corporations today depend on the authenticity of the data provided to them through their computers. Whether it is a multinational corporation working on a worldwide network or a local company using a vast database to operate within the firm, each depends on valid data to make crucial decisions. Thus, it is important to analyze and evaluate any new system being incorporated into an organization, both for its usability and for its ability to process the company’s data.

An example is the finance department, whose work involves accurately maintaining massive amounts of financial data (Lombard, 1999, p. 26). Totaling numbers, itemizing expenses, and producing detailed financial reports have traditionally been the tasks of corporate finance departments. But, like many other business operations, the finance function is undergoing significant change as organizations make better use of their internal resources to become more competitive. Financial managers need to spend more time managing both financial and non-financial information that could affect the future growth and competitiveness of their companies. Market-share analysis and business management are just two examples of areas affecting a company’s growth where the integration of financial data, and a financial perspective, could lead to better strategic decision-making.

In many organizations today, business decisions with significant financial implications are often made without a comprehensive understanding of their short-term and long-term financial impacts. Too often, these organizations under-utilize their financial staff and fail to leverage the valuable skills and experience in analysis and disciplined thinking that they can offer. More and more companies are now asking what else can be done, as financial professionals, from chief financial officers to department managers, play an increasingly critical role in strategic decision-making. Many finance professionals find this new role difficult because they do not always have easy access to the corporate information they need to make critical decisions. As a result, many finance departments are not integrating other business issues into their financial reporting.

One important way to address this problem and increase the role of the finance function is to free financial staff from manual data collection. Such a move would provide easy access to financial data and minimize the manual adjustments and data-management maintenance functions that currently take up much of their time. Accounting records and other related processes can be equally important. Studies have shown that financial professionals spend 80 percent of their time collecting and managing data, and only 20 percent studying and analyzing the specific trends and opportunities that could help the business grow (ArkiData Corporation, p. 3). Finance is a high-cost function, so it does not make much sense to spend large amounts of money employing data clerks and data managers.

Some finance departments use desktop productivity tools such as Microsoft Excel and Lotus spreadsheets to generate the information required by decision-makers. The problem with this approach is that it involves manually re-keying data from the general ledger and other corporate databases. Not only is data integrity compromised, but people also spend most of their time on administrative and clerical tasks rather than providing value-added analysis of important financial and non-financial information. Desktop-generated spreadsheets are also inflexible and difficult to manage, particularly if major changes to the original data are required or if the assumptions used to generate the spreadsheet change. Most of these modifications have to be entered manually, which can be very time-consuming and costly.
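As a hedged sketch of the alternative, a short script can pull figures directly from the corporate database instead of re-keying them into a spreadsheet, so that reports always agree with the system of record. The general_ledger table and its columns below are hypothetical:

```python
import sqlite3

# Hypothetical ledger; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE general_ledger (account TEXT, fiscal_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO general_ledger VALUES (?, ?, ?)",
    [("Travel", 2007, 1200.00), ("Travel", 2007, 450.50), ("Supplies", 2007, 310.75)],
)

# Aggregate expenses by account directly from the source data,
# rather than re-keying figures into a spreadsheet by hand.
rows = conn.execute(
    """SELECT account, SUM(amount) AS total
       FROM general_ledger
       WHERE fiscal_year = ?
       GROUP BY account
       ORDER BY total DESC""",
    (2007,),
).fetchall()

for account, total in rows:
    print(f"{account:<12} {total:>10,.2f}")
```

Because the report is regenerated from the source tables, a change in the underlying data or assumptions does not require manual re-entry.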
New enterprise-based technology solutions are overcoming these problems by integrating financial and other corporate data into a single system or database. These programs enable companies to maximize efficiency in transaction processing and to minimize the manual clerical tasks that historically have taken up so much of a finance department’s time. Business support software built on online transaction processing (OLTP) systems can automatically capture and process transactions quickly and accurately. Companies can then analyze the data by time, location, category, and other variables, depending on their specific corporate requirements.

Data Authenticity

Software companies have enabled most consumer firms to maintain large databases in which data can be retrieved, updated, or evaluated within a short period of time (often within a few seconds). These capabilities are crucial when every individual within the organization may need the data; there is no allowance for wrong findings, and that is where the authenticity of the data comes into the picture. Imagine that a client orders 5,400 PCs from a computer company, the order is recorded as 4,500, and the PCs must be distributed to different sites on a short timetable; there would likely be significant chaos. Database companies such as Oracle and its counterparts have devised software that verifies such data to guard against mistyping and inaccurate processing.

Replication

An organization also relies on its databases being updated constantly, preferably automatically. For example, a company with many offices will have many users accessing the databases from the server, both online and offline. It is imperative for decision-makers to have access to the most current data when formulating strategies and making decisions. Thus, immediate online data replication is crucial. Where data is constantly retrieved and updated, software vendors have largely mastered these features. The challenge lies in updating online data, especially in multinational organizations, where multiple time zones also play an important role. Updates are typically driven by automatic clocks, so it is a challenge for software to update data to the minute without introducing a time lag of five or six hours. This year, the United States changed to daylight saving time three weeks ahead of the rest of the world, which posed some major challenges for such “automatic clocks.”

Reliability

Computers running on electricity may shut down unexpectedly, leaving databases only partially updated. When this happens, it becomes dangerous for users, who have only half of the updated data. This is one reason data integrity is so important. Databases today are equipped with verification systems that ask the user to review changes before saving them, so that anyone who retrieves the data will have the correct version. Unfortunately, this procedure still commits errors that are attributable to mistyping.
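The standard safeguard behind this protection is the atomic transaction: a group of related changes either all commit or all roll back, so an unexpected shutdown cannot leave only “half of the updated data” visible. A minimal sketch using Python’s built-in sqlite3 module (the accounts table and transfer amounts are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 500.0), (2, 200.0)")
conn.commit()

try:
    # Both updates belong to one transfer: they must succeed or fail together.
    with conn:  # opens a transaction; commits on success, rolls back on any error
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        # If the machine lost power here, the uncommitted transaction would
        # simply be rolled back on recovery -- no half-finished transfer.
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
except sqlite3.Error:
    print("Transfer failed; no partial update was saved.")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [(1, 400.0), (2, 300.0)]
```

Note that transactions guarantee atomicity, not correctness: as the paragraph above observes, a mistyped value that the user confirms will be committed just as durably as a correct one.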
Integrity Issues

Fausto Rabitti has clearly stated that there are at least three types of data integrity that must be designed into any database:

1) Key integrity: Every table should have a primary key. “The primary key must be controlled so that no two records in the table have the same primary key value.” Also, “the primary key for a record must never be allowed to have a null value. Otherwise, that would defeat the purpose of the primary key to be a unique identifier.” If the database management system does not enforce these rules, other steps must be taken to reduce their potentially detrimental impact.

2) Domain integrity: “Appropriate controls must be incorporated to ensure that no field takes on a value that is outside the range of legal values.” For example, if grade point average is defined to be a number between 0 and 4, controls must be implemented to prevent negative numbers and numbers greater than 4. “For the foreseeable future the responsibility for data editing will continue to be shared between the application programs and the DBMS” (database management system).

3) Referential integrity: “The architecture of relational databases implements relations between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of integrity errors. This type of error exists when a foreign key value in one table has no matching primary key value in the related table.” For example, an invoice table usually includes a customer number as a foreign key that references the matching customer number, the primary key of the customer table.

Improved Practices

Here is a method that will assist in preventing these types of errors: when deleting a customer record, automatically delete all related records that carry the same customer number. All three integrity rules, together with this cascading delete, can be declared directly in a table design, as the sketch below shows.
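As a minimal sketch, the three integrity rules and the cascading delete can be expressed in SQL; the example below uses Python’s built-in sqlite3 module, and the student/customer/invoice schema is illustrative rather than taken from any source cited in this paper. Note that SQLite enforces foreign keys only when explicitly enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.executescript("""
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,               -- key integrity: unique and never null
    gpa        REAL CHECK (gpa BETWEEN 0 AND 4)   -- domain integrity: 0 <= gpa <= 4
);

CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

CREATE TABLE invoice (
    invoice_no  INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL
        REFERENCES customer (customer_no)         -- referential integrity
        ON DELETE CASCADE                         -- the cascading delete described above
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd.')")
conn.execute("INSERT INTO invoice VALUES (100, 1)")

# The DBMS rejects violations instead of silently storing them.
try:
    conn.execute("INSERT INTO student VALUES (7, 4.9)")   # outside the legal GPA range
except sqlite3.IntegrityError as err:
    print("rejected:", err)

try:
    conn.execute("INSERT INTO invoice VALUES (101, 99)")  # no matching customer
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Deleting a customer automatically deletes that customer's invoices.
conn.execute("DELETE FROM customer WHERE customer_no = 1")
print(conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0])  # prints 0
```

Declaring the rules in the schema, rather than relying on application code alone, matches the shared-responsibility point quoted above: the DBMS becomes the last line of defense even when an application forgets to check.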
There is strong evidence that humans are able to detect errors under certain circumstances. With proper training and modified behavior (through goals and incentives), employees can develop the ability to flag common errors. To make this type of integrated system work effectively, the following should be considered:

1. Employees need to know that it is part of their job to look for and flag suspicious data.
2. Employees should be aware of the different types of errors.
3. Employees’ attention to finding errors is important.
4. An incentive scheme needs to reward the finding of errors.
5. Management needs to handle false alarms as quickly as possible.
6. The company needs to have good training and hiring programs in place. (Klein, 1997, p. 2)

Security

Security has become a large topic in recent times. Viruses, hackers, worms, trojans, phishing, malware, and many other threats endanger the security of databases. Many organizations today focus on using technologies for building, maintaining, and accessing databases while giving very little attention to security (Britt, 2007). The possibility that an individual could gain illegal access to a database and compromise precious data puts organizations at great risk. An attacker who broke into a database with even light security measures could obtain sensitive data such as credit card and social security numbers.

Data integrity is also at risk. In an article in Computer Weekly, Arif Mohamed reported that attackers can discover flaws in databases such as Oracle and could “use higher privileged code with DBMS_SQL to perform an insert, update, or delete command, and so change the data within the database directly” (Mohamed, 2006). In Mohamed’s example, “In the case where the data being inserted must not contain single quote marks, and the higher privileged code checks for their presence, the attacker can ‘snarf and replace’ data, so that it does contain a single quote mark, thereby causing an exception” (Mohamed, 2006).

Many organizations are under pressure about their security as hackers find more sophisticated ways to uncover vulnerabilities to exploit in order to obtain valuable data from databases (Data, 2007). Currently, 83% of organizations in the U.S. “believe they have made their data safer by installing or upgrading antivirus software, installing or upgrading a firewall, implementing intrusion detection/prevention technologies, and implementing vulnerability/patch management systems on their networks,” according to Communications News (Data, 2007). The truth is that no matter how sophisticated the protection becomes, so do the methods for acquiring precious data from those databases. In 2005, the number of records compromised in reported security breaches reached 100 million, and the count is still growing, according to Privacy Rights Clearinghouse, a non-profit consumer information and advocacy program (Data, 2007).

Currently, many organizations use passwords to prevent unwanted individuals from gaining access to databases. This is a very common security method, but not always an effective one. Many passwords in use are ineffective because they are too common or too easy for an intruder to discover (Britt, 2007). For a password to be effective, it is recommended that it be at least 12 characters long and include capital letters and numbers (Britt, 2007); a sketch of such a check appears at the end of this section. However, even though a more complex password helps protect the system, it also creates a new problem for users, who may forget the password.

One way of increasing database security is by adding security layers. In a layered security system, multiple security measures, rather than one, protect the system. These include firewalls, access controls, passwords, encryption, and various monitoring systems, all active at once (Britt, 2007). This approach is ideal both for databases with a large amount of sensitive data to protect and for those without much sensitive data.

Michael Vizard, who has done research on security threats, believes a uniform approach to security will need to be created in the future (Vizard, 2007). He believes security tests need to be implemented before a customer or another business interacts online with an organization (Vizard, 2007). Although these measures may seem strict, they may be absolutely necessary for organizations in the future.

Database security tools are already being used to test the security of networks. Symantec has a security tool that has been in alpha testing by hospitals in the Boston area since 2006 (Messmer, 2006). Though this tool does not stop unwanted intruders from entering a network, it monitors all activity on the network and can show an organization whether suspicious activity is present (Messmer, 2006). The tool can also alert an organization if someone is attacking the database and trying to alter or steal its information (Messmer, 2006). Although it does not directly protect the database, this tool is still a huge improvement over previous database tools, which did not monitor network activity, and it can be used to test whether a database or network is safe.
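As a minimal illustration of the password recommendation cited above from Britt (2007) — at least 12 characters, with capital letters and numbers — a check might look like the following sketch; the function name is ours, and the precise rules any organization enforces are a policy decision:

```python
def is_strong_password(password: str) -> bool:
    """Check a password against the guideline cited from Britt (2007):
    at least 12 characters, including capital letters and numbers."""
    return (
        len(password) >= 12
        and any(ch.isupper() for ch in password)
        and any(ch.isdigit() for ch in password)
    )

# Short passwords, or long ones without capitals and digits, are rejected.
print(is_strong_password("letmein"))          # False: too short
print(is_strong_password("correcthorsebat"))  # False: no capitals or digits
print(is_strong_password("Blue7Horse9Lamp"))  # True
```

Such a check addresses only password strength; as the layered-security discussion above notes, it is one control among the firewalls, access controls, encryption, and monitoring that together protect a database.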
Summary

Enterprise technology tools are becoming increasingly sophisticated. Many of these tools can provide employees with a much clearer picture of the factors that affect growth and profitability, as well as of opportunities that can increase organizational success. Data warehousing systems are one example: today’s data warehousing systems enable authorized staff to “drill down” into organizational databases for information that normally would not be available through stand-alone desktop systems.

Database integrity is a major, yet potentially volatile, part of the effectiveness of databases. A well-designed and well-maintained database (by users, programmers, and management) can ensure key, domain, and referential integrity. Appropriately trained employees should be able to recognize, avoid, and reduce today’s threats and errors while focusing on important decision-support activities. With appropriate training programs through which employees can understand the causes of data integrity problems and the actions that prevent them, employees have the potential to recognize, reduce, and eliminate common errors. Any organization that uses a distributed multi-user database could benefit from the “right training” in many ways, including reducing data errors, increasing employee job satisfaction, increasing cost savings, and making a positive impact on the achievement of business outcomes. As noted by Anne Lewis (1997, p. 72), this is an area where “Quality Counts.”

Database security and network tools are also very important considerations in data management. The ability to understand every potential threat, from viruses to malware, as well as any other issue that could threaten database integrity and security, is critical. The ability to anticipate threats to data and to take proactive countermeasures against them is paramount and essential for success in today’s organizational environment.

References

Britt, P. (2007). Tightening security in 2007. Information Today. Retrieved February 28, 2007, from the Library, Information Science & Technology Abstracts database.

Conover, J. (1997). DBMS backup agents: Because the data matters. Network Computing, p. 84. Available: http://www.networkcomputing.com/802/802r12.html.

Klein, B. D., & Goodhue, D. L. (1997). Can humans detect errors in data? Impact of base rates, incentives, and goals. MIS Quarterly, 21, p. 2.

Lewis, A. C. (1997). Data integrity. Education Digest, 62(7), p. 72.

Lombard, III. (1999). ArkiData Corporation announces upgrade of its data auditing and cleansing technology. Business Wire, 26.

Messmer, E. (2006, January). CareGroup checks out Symantec database security tool. Network World. Retrieved March 6, 2007.

Mohamed, A. (2006). Oracle users warned of new threats to firms’ data. Computer Weekly. Retrieved March 6, 2007, from the MasterFILE Premier database.

No author. (2007, February). Data security still at risk. Communications News. Retrieved March 6, 2007, from the MasterFILE Premier database.

Rabitti, F. (1988). A model of authorization for object-oriented database systems. In Proc. of the Intl. Conf. on Extending Database Technology, pp. 231-250. ISBN 3-540-19074-0.

Treybig, J. G. (1995). Bring data integrity back into the equation. Interactive Age, 2(6), pp. 3/5.

Vizard, M. (2007, February). Time to get tough on security threats. Baseline. Retrieved March 6, 2007.