White Paper - Excel Risks and Controls

Chris Mishler, CMA, CIA, CISA
Subject Matter Expert, User-Developed Application Risks
Experis Finance
November 2014

Abstract

Excel is a powerful and popular business application that becomes even more effective when the risks that come along for the ride, and their corresponding controls, are taken into account. This white paper enumerates various risks and some methods to control them, so that users can build the solutions they need with less likelihood of causing material adverse impacts on their organizations. With some basic education on design techniques and built-in Excel features, it is possible to produce excellent results without increasing the probability of serious errors. Knowing better practices helps enormously with both the efficiency and the effectiveness of these "user-developed applications" (UDAs): solutions created without the benefit of a formal IT application development process, typically referred to as the Software Development Lifecycle (SDLC), which includes gathering user and functional requirements, testing, review and approval, and usually involves development management tools.

Risk

A common definition of risk is the probability of an adverse event multiplied by the magnitude of its impact. Although enormous numbers flow through many financial reporting spreadsheets, the risk of material error may depend less on those figures than on the sheer number of risky practices engaged in by Excel solution developers and their users. It is not uncommon to find large workbooks with multiple worksheets and even millions of occupied cells. The visibility and importance of key figures in revenues, profits, assets and liabilities increase the likelihood that they will draw the attention of reviewers. Harder to discern are the subtle but pervasive behaviors that can lead to adverse outcomes, and these may play a bigger role in raising the risk of user-developed applications.

A common scenario of uncontrolled risk involves large, complex financial workbooks. Where does one begin to assess the risk embedded in these files? Since prevention is an efficient way to reduce risk, the design of the workbook, in terms of data flow and the grouping of like data types (modular design), is a good overall starting point. A helpful maxim in risk reduction is to make the solution "as complex as necessary and as simple as possible." Another rule of thumb in Excel solution design is to emulate a database as much as possible in the data and calculation areas: each column is a field, each row is a record, and each column serves one purpose and holds one data type. All too often this guideline is unknown or underemployed, and risk levels rise accordingly.

We combination developer/users may have been "home-schooled" or self-taught in Excel functions, but formal or informal education in UDA risk is difficult to find. It is no surprise, then, that many of us engage in "organic programming," in which the developer starts with a data set and an end in mind, or perhaps only partial requirements for a desired report or outcome, and begins stringing together formulas over data that has been arranged somehow to lead to the kind of final reporting needed.

Controls and Good Practices

There are at least as many control techniques as there are risk types. The question is which controls best fit the given situation.
The basic consideration is when to use preventive versus detective controls. Since preventive measures are generally more efficient than detective ones, focusing on them first will yield the best results. The types of preventive controls to consider include planning, design, testing and even review by other experts. The border between preventive and detective controls may be fuzzy until an application (workbook) is launched into production, the point at which it is used to create the intended outcomes in the financial, analytical, regulatory or operational arenas. While user-developed applications are by definition not run through the typical SDLC, some of its practices can be simulated to improve the quality of the production outcome.

Pre-Production

There are many benefits to imposing the discipline of planning a workbook solution before starting to put together the requisite components. It may be as simple as putting pencil to paper to sketch out the elements of a new process or the redesign of an existing one. Gather all the user and functional requirements, along with all documentation on the process to be reflected in the Excel application. Interviews with the ultimate users, if not oneself, will greatly improve understanding of the need. While it is possible to encounter the "paralysis by analysis" bottleneck, anecdotally this is unlikely, given the deadlines typically bearing on accountants, developers and other users.

Modular Design

The other key aspect of planning a spreadsheet solution is its design. In contrast with the "go-with-the-flow" programming philosophy, a preference for modular design pays dividends during implementation and continued use: it reduces risk through improved data flow and integrity, eases review and approval of calculations, sets up reports that do not have to change when inputs change, makes assumptions explicit, and supports better documentation and stronger controls. The concept is easily grasped from a diagram of the data flow from inputs through calculations to outputs.

The idea is partly a control in itself, segregation of duties, achieved by grouping like data types and functions on separate worksheets (pages or tabs). In smaller applications these can share a tab, but then formatting with colors and borders is needed to delineate the separate data types. The saying "a place for everything and everything in its place" takes on special meaning in good spreadsheet design. The typical data types (counting calculations as data) are:

- Data inputs (typically periodically updated values used in calculations)
- Assumptions (infrequently updated universal values used throughout the application)
- Calculations or formulas, linked to the input data
- Documentation, such as standard work instructions, input sources, and an overview of the process
- Controls or validation checks
- Outputs (reports, journal entries, etc.)

There are other options, such as navigational aids like a table of contents or tabs that act as section headers, but these become more necessary or desirable as an application grows in size and complexity.

Color. Another way to enhance spreadsheets is the appropriate use of color to draw the eye to the distinct purpose of each area in a file. Conditional formatting often involves color too and can be quite beneficial in highlighting exceptions; a simple sketch of such a check follows.
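As a minimal sketch of what such a validation check might look like, assume a dedicated Controls tab and some hypothetical sheet names and ranges (Inputs, Outputs, and the cell addresses shown are illustrative only):

   Controls!B2 (detail ties to reported total):  =ROUND(SUM(Inputs!D2:D500),2)=ROUND(Outputs!C15,2)
   Controls!B3 (no blank input cells):           =COUNTBLANK(Inputs!D2:D500)=0
   Controls!B5 (overall status):                 =AND(B2:B3)

Each check returns TRUE or FALSE, and the overall status cell is TRUE only when every individual check passes. A conditional formatting rule that applies a red fill when a check cell is FALSE (for instance, a formula-based rule such as =B2=FALSE) makes a failed check hard to miss.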
Typical of medium to large files is coloring the worksheet tabs rather than the cell contents. Good rules of thumb (consistency is key):

GREEN – Ideal for input values. Green means "go": enter the process inputs here. These ranges are typically unlocked.

YELLOW – Fits values or other inputs that change infrequently but are still key, such as interest rates or periods (dates). Yellow signifies caution: check whether the values need to be updated. These cells or named ranges may be unlocked for convenience, unless they rarely change.

RED – Use for calculation sections (it implies "stop": do not change without review and approval). Lock these cells and turn on worksheet protection after the formulas have been tested, reviewed and approved, to prevent unintended or otherwise unapproved changes.

BLUE – Typical for identifying linked cells.

Other colors are useful for other purposes, such as identifying documentation or error conditions, but the main point is to be consistent and to explain the use of colors in the documentation if it is not obvious. Another hint: moderation in all things. Do not overdo the colors, as they can also distract users.

While the benefits of modular design were touched on above, additional pluses of this approach include:

Efficiency – Data is identified and entered once and used many times. This reduces re-keying, and with it the possibility of entry errors. It is also easier to debug or trace errors when data lives in one place.

Consistent quality of production – Keeping calculations in one place makes it easier to check the formulas and keep them consistent. Inconsistent formulas are a constant source of error.

Control – It is possible, and even common, for users to scatter checksums, input controls and similar tests throughout a large spreadsheet. It is better to move these controls to, or create them on, a single tab, including an overall control status cell that verifies the condition of all the controls as TRUE or FALSE (as sketched above), ideally with conditional formatting to highlight an undesirable control status for the workbook as a whole. As a bonus, displaying that overall status as a linked picture on whichever tab most of the work is done can be a real plus.

Ease of Navigation – The consistent presence of the typical model elements, and the order in which they appear, makes it easier for users to follow the data flow and apply the process steps. No time is wasted poking around to find, for example, the input areas, whose placement may seem arbitrary to non-developers.

Review the File

As obvious as it seems, bad spreadsheets can be caught red-handed, and good spreadsheets can be made better, through a careful review by the developer. Once the major requirements, functions and processes have been compared to specifications (assuming they exist), ask an Excel expert to check the file as well. In larger applications, this kind of review is impractical in any mode other than sampling or spot-checking. In those cases, an investment in a spreadsheet diagnostic tool (see Appendix B) is a wise one; for the sheer heavy lifting and peace of mind these tools provide, the price will seem quite reasonable. There is a science to using these programs, since certain tests tend to produce numerous false positives (the search for "constants in formulas," for example, flags the hard-coded column number in a VLOOKUP function as a potential risk, as illustrated below). Some customizability of tests and built-in filtering of results can overcome these situations.
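For example, a diagnostic tool scanning for constants in formulas would flag the hard-coded column number 4 in the first formula below, even though it may be perfectly intentional (the sheet name, range and header shown are hypothetical):

   =VLOOKUP($A2, Rates!$A$2:$E$100, 4, FALSE)
   =VLOOKUP($A2, Rates!$A$1:$E$100, MATCH("Q4 Rate", Rates!$A$1:$E$1, 0), FALSE)

The second version looks the column number up from the header row instead, which both quiets the false positive and keeps the formula working if columns are later inserted or reordered. Either way, the reviewer's job is to judge whether a flagged constant is a deliberate design choice or a disguised input that belongs on an input or assumptions tab.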
Checking the logical flow of a file is made possible by having documentation to use as a statement of how the file is supposed to work; this kind of review is somewhat pointless without a baseline expectation for the various subprocesses represented by the formula regions. Checking the mathematical integrity of the formulas is an easier chore. The trick is to substitute input values that produce outcomes easily verifiable against expected computations: switch existing numbers for zeroes or ones, and the summary formulas should be easy to check. Unexpected results can signal inconsistent formulas or formulas overridden with hard-coded values.

Excel itself provides a certain level of error checking, so it is a good quick check and a complement to automated diagnostic software. Hints about underlying issues are strewn about many workbooks in the form of little green triangles in the corners of cells, with an exclamation-mark call-out signaling some conflict with Excel's rules. Instead of ignoring them, see whether there is a good reason for the alert. If the answer is, "Yes, I know, that is okay," then take the time to highlight the range in question and select "Ignore Error" from the smart icon (the call-out) so the alerts go away. Confronting the alerts in this way, rather than letting them accumulate, prevents the habit of tuning out warnings that might sometimes be genuinely helpful. The other resource embedded in Excel is the Error Checking command on the Formulas ribbon, in the Formula Auditing group.

Database Mentality

Making an Excel application look and act like a database (such as Microsoft Access) has additional benefits in risk mitigation. Some will object that Excel is not a database, and that thinking of it as one will misalign the program's strengths and the desired outcomes. Perhaps so, but within limits, knowing some database principles and applying them as far as reasonably possible can have a positive impact on risk levels. In general, and especially for data inputs and calculations, think of columns as fields and rows as records. Some of these database preferences and their Excel implications:

1. Avoid blanks. Like nature, databases abhor a vacuum. One reason users build in numerous blank cell references is to format output in a report, and that perceived need in turn stems from overlooking modular design. We like a certain amount of white space around key output values, which is understandable; if we adhere to the modular design concept, we can add all the white space that delights the eye without the risk introduced by mixing data types and functions such as inputs, calculations and outputs.

2. Data type consistency. Anyone who has tried to upload an Excel page into Access will attest that mixing data types, or even formats, leads to rejected records. There is a good reason for this: functions that rely on a given field expect a consistent data type. A number field should not also be formatted as text, and vice versa; dates should be in date format all the way down a column, and so forth. This database principle reinforces modular design, and the reverse is also true.

3. Process-flow enhancement. Databases consist primarily of data tables, queries that select or act on certain data sets, and corresponding reports. The clear separation of database functions reinforces the modular, step-wise design of a process, and of an application containing a collection of processes. For example, data is uploaded, an update query brings a table up to the latest version, and another query or report produces the intended output. Thinking of Excel workbook solutions as processes with unwavering steps to add inputs, run calculations and produce outputs aligns well with this database mentality. Unintended variation is the enemy of both data quality and accurate outcomes.

The first two points, in particular, lend themselves to simple self-checks, as sketched below.
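As a minimal sketch of such checks, assume an input sheet named Data with key values in column A and amounts in column D (the names and ranges are hypothetical):

   No blanks in the key column:        =COUNTBLANK(Data!A2:A500)=0
   Amount column entirely numeric:     =COUNT(Data!D2:D500)=COUNTA(Data!D2:D500)

The second check works because COUNT tallies only numeric cells while COUNTA tallies every non-empty cell, so the two agree only when no amounts have been entered or formatted as text. Checks like these fit naturally on the controls tab described earlier.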
Practices to Avoid

We have touched on better practices in Excel spreadsheet use and development. One may ask which design practices should be avoided in order to improve control and increase the efficiency and effectiveness of spreadsheet applications. A sample:

"Bloatware" – In one sense, the definition of bloatware might be extended beyond unwanted computer-clogging programs to include unnecessary regions, pages, and historical or ad hoc sections of a spreadsheet. Like aging humans, workbooks sometimes spread out in unhealthy ways, accumulating one-off or temporary analyses and supporting data irrelevant to the stated purpose of a critical Excel application. Examples include excessive periodic data, side calculations done for a one-time check, abandoned subprocesses, and "FYI" items. The condition tends to be found in older files, so one of the first risk indicators for a critical file is its age. Given our tendency to be packrats of data and analyses, material that is important enough to save is better off in a separate file, appropriately named and documented.

Opacity – One antonym of the desirable attribute of transparency is the usually unplanned habit of hiding, or simply not showing, the work that goes into a final output. This tendency shows up in hidden ranges (columns, rows and tabs), but also in subtler overcomplication of functions and a lack of documentation. Formulas that are overly complex are almost guaranteed to make all but the most determined critical eyes glaze over and pass on to something else. Skipping any explanation of the input sources and how they are used, including any special functions (by special, consider anything more sophisticated than ordinary arithmetic), increases the odds of user misunderstanding and therefore misuse of the application. A side benefit of properly documenting workbooks is lower audit cost, since auditors can follow what is going on with fewer questions for accounting staff; disruption during employee transitions is also reduced by good documentation.

Illogical flow – Similar to stream-of-consciousness or organic programming, a map of sheet connections might show a drunken spider web of links, with some worksheets serving as both precedents and dependents of each other. This circularity, or seeming arbitrariness, increases the complexity of the application and makes its logic harder to discern. It may also slow calculation in larger files. As an antidote, recall the planning phase and, for more complex processes, use a flow diagramming tool such as Microsoft Visio to understand fully what is supposed to happen in the workbook.

Overuse of volatile functions – Volatile functions such as NOW() and TODAY() can be handy, but overusing them will slow performance, since they force recalculation whenever anything in the workbook changes. Array formulas are also double-edged swords: very useful for auditors in particular, who are looking for an alternate way to recalculate a result, but resource hogs as well. A small illustration of such a cross-check follows this item.
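As a minimal sketch of such an independent recalculation, assume line amounts are computed row by row in column F as =D2*E2 (quantity times price; the columns are hypothetical):

   Reported total:       =SUM(F2:F500)
   Independent check:    =ROUND(SUM(F2:F500),2)=ROUND(SUMPRODUCT(D2:D500,E2:E500),2)

The SUMPRODUCT recomputes the total directly from the quantity and price columns, without relying on the row-by-row formulas, so the check returns FALSE if any line formula has been broken or overwritten with a hard-coded value. Because array-style calculations over large ranges carry a performance cost, a single check cell like this is preferable to scattering them throughout the file.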
Unlabeled and ad hoc calculations – Occasionally one needs a quick double-check or reasonableness test on some data. The temptation to slide over to an open area on the source tab and pop in some formulas is very real, but it is antithetical to good spreadsheet hygiene once the temporary section has served its brief purpose. Sometimes the purpose of the exercise is really an ongoing control over some variable section of data; if the calculation is genuinely useful, or could be made into an ongoing control, consider institutionalizing it on a dedicated control tab. Resist adding insult to injury by simply hiding these stray areas.

Speaking of labeling, do not leave regions of data or formulas unlabeled. The underlying assumption is that "everyone" will know what the data or formulas represent. Refer back to the "database mentality" section above: treating columns like database fields would prevent missing labels, since a field cannot even be set up without a name for its data. A preferred way to create a label containing multiple words is to put them all in one cell at the top of the section and then apply "wrap text" formatting. All too often, users spread longer labels across multiple rows, which prevents the best use of filters and increases the number of header rows.

Drawing a blank – A number of unhealthy spreadsheet habits can be traced back to a lack of modular design. Without an appreciation for this technique, users format data- or calculation-heavy areas to double as outputs or reports by inserting blank rows or columns. Again, the effect is to spread out the active section of the worksheet, but more ominously, the blank rows are often referenced in formulas, significantly increasing the risk of inadvertent entries. Typically this behavior reflects the need to add new records in a subsequent accounting period, but there are better ways to make such entry easy without increasing risk, such as a simple data validation on a single blank placeholder row (the value must equal a whole number of 0). The placeholder marks the row above which new records can be added and still be included in the summary formulas, as sketched below.
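As a minimal sketch of the placeholder technique, suppose the amounts live in B2:B49, row 50 is a reserved placeholder row kept at zero, and the summary formula is written to include it (the rows and ranges are hypothetical):

   Summary total:    =SUM(B2:B50)
   Placeholder B50:  0, with Data > Data Validation set to Allow: Whole number, equal to 0

Because the placeholder row sits inside the SUM range, inserting new rows just above it automatically expands the range to include the new records, while the validation rule blocks anyone from typing an amount directly into the placeholder instead of inserting a proper new row.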
Conclusion

The purpose of this paper has been to share information that helps spreadsheet developers and users attain higher levels of excellence by reducing the risks inherent in user-developed applications. The time is now for boosting our knowledge of risky Excel behaviors and their antidotes. Like international accounting standards, good Excel practices are more principles-based than rules-based, which makes them easier to learn and more adaptable to the myriad use cases encountered in the four main spreadsheet domains: financial, analytical, operational and regulatory applications. The key to safer computing in Excel is knowing, and using, more of the risk-reducing principles. Being at the decision-making table as professionals means taking proper safeguards even while learning and applying the coolest tools available. Make every risk you take a calculated one.

Appendix A - Spreadsheet Standards

FAST - http://www.fast-standard.org/ - "Models should be Flexible, Appropriate, Structured and Transparent"
SSRB - http://www.ssrb.org/ - "Custodians of the Best Practice Spreadsheet Modelling Standards"

Appendix B - Spreadsheet Diagnostic, Discovery & Management Tools

Incisive suite - http://new.incisive.com/
Cimcon suite - http://www.sarbox-solutions.com/main/index.asp
ClusterSeven - http://www.clusterseven.com/spreadsheet-management/