Self-checks and Controls in spreadsheets
Presentation to the tenth annual EuSpRIG conference, Paris 2009
Patrick O'Beirne
Systems Modelling Ltd
Tara Hill, Gorey, Co. Wexford, Ireland
Tel +353-53-942-2294
Email pob@sysmod.com
http://www.sysmod.com

Introduction – Patrick O'Beirne BSc MA FICS
– Systems Modelling Ltd. Ireland (sysmod.com)
– Current focus: spreadsheet quality and auditing.
  • 'Spreadsheet Check and Control' book
  • Software for assessing spreadsheets (ScanXLS, XLTest)
  • Other IT books and articles.
– Presentations to the Irish Computer Society, ISACA, EuSpRIG, the Excel User Conference, and other interest groups.
– Professional affiliations:
  • Irish Computer Society
  • European Spreadsheet Risk Interest Group (EuSpRIG)
  • Software Testing Interest Group in Ireland (SoftTest)

Sloppy Spreadsheets: Readers Speak Out (cfo.com)
http://www.cfo.com/article.cfm/11525407 CFO.com, June 18, 2008
• Should show footers with file location, date, tab name, file name, and "page x of y".
• Should include a summary tab that directs the user's attention, and worksheets should be clearly named and flow from left to right.
• Hidden columns/rows and hidden sheets can contain confidential information that should not be sent to certain parties.
• Columns and rows should be grouped, not hidden.
• In a nutshell, [a big problem is] hard-coding numbers in formulas that should be assumptions.
• Not setting up the sheet to print. I received a simple spreadsheet today that, when printed, came out as a three-page portrait, when it should have been a one-page landscape.

Sloppier Spreadsheets: How Bad Can They Get?
http://www.cfo.com/article.cfm/11950766 CFO.com, August 20, 2008
• Tip: Color-coded Tabs: Delineate which tabs I, as the reader, should focus on.
• Trap: Not Deleting Unneeded Data
• Tip: Totals at the Top: For readability and ease of adding items at the bottom.
• Have the summary tab first, just like in a document where the executive summary is the first page.
• Trap: Not Noting Revisions: a link to a missing spreadsheet; returning the file with no indication of having updated it, and no name change.
• Tip: Test It Yourself: How many people actually test their spreadsheets?
• Trap: When Rows Don't Foot: I have found the most basic, often-forgotten rule is the cross check. I now refuse to review where the footing cell is not visible.
• Tip: Publish Your Standards: with detailed structure and company requirements.
• Trap: Where Did We Leave Off? Save the workbook so it opens to the appropriate worksheet, at the proper point.
• Tip: An Alternative to the Negative Sign: Format negative numbers or percentages in brackets, instead of showing that little bitty negative sign.
• Trap: Dressed Up, but Nowhere to Go: a spreadsheet formatted correctly, formulas tied together nicely, etc., yet the one who produced it cannot produce the data behind the spreadsheet.

Presence of cross-checks
• When F11 is the sum of B11:E11, the cross check is the sum of F2:F10 (or vice versa).
• Percentages should add up to 100%.
• Compare proportions, changes, ratios.
• Have one summary sheet, always printed, that contains the full set of headers and footers, summary lines and totals from every sheet, with quality checks in printable form.

Cross foot
• Sum every column and every row.
• Calculate a difference row from the column totals.
• Display a large red error indicator if the difference is significant, so that even on printouts it cannot be missed.
• Excel rounding differences as small as 1E-13 (10^-13) are still not equal to zero, so test the difference against a tolerance rather than for exactly zero.
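The cross-foot steps above can be illustrated outside Excel. This is a minimal Python sketch under stated assumptions, not the author's implementation: the function names `cross_foot` and `indicator`, the layout (the last column holds the row totals as typed into the sheet, like F2:F10) and the 0.001 tolerance are all hypothetical. Python floats are IEEE 754 doubles, as are Excel's numbers, so the same tiny rounding residues appear.

```python
from typing import Sequence, Tuple


def cross_foot(rows: Sequence[Sequence[float]], tol: float = 0.001) -> Tuple[float, bool]:
    """Cross-foot a block whose last column holds the row totals as entered.

    across: sum of the column totals of the data columns (like F11 = SUM(B11:E11))
    down:   sum of the entered row totals (the cross check, like SUM(F2:F10))
    Returns (difference, ok), where ok is True if |difference| < tol.
    """
    data = [list(r[:-1]) for r in rows]                # detail cells, e.g. B2:E10
    column_totals = [sum(col) for col in zip(*data)]   # e.g. B11:E11
    across = sum(column_totals)
    down = sum(r[-1] for r in rows)                    # entered totals, e.g. F2:F10
    difference = across - down
    # Test against a tolerance rather than exact zero: floating-point rounding
    # can leave a tiny non-zero residue even when the two sides agree.
    return difference, abs(difference) < tol


def indicator(ok: bool, difference: float) -> str:
    """Text for a prominent check cell that also shows up on printouts."""
    return "OK" if ok else f"Out of balance! Difference {difference:+.4f}"


if __name__ == "__main__":
    sheet = [
        [0.1, 0.2, 0.3, 0.6],
        [1.1, 2.2, 3.3, 6.6],
    ]
    diff, ok = cross_foot(sheet)
    print(indicator(ok, diff))

    sheet[1][3] = 6.0                                  # a row total overtyped by hand
    diff, ok = cross_foot(sheet)
    print(indicator(ok, diff))

    print((0.1 + 0.2) - 0.3 == 0)                      # exact test fails on rounding residue
```

Comparing two independently computed totals, rather than trusting either one, is what exposes an overtyped total; the tolerance keeps harmless rounding residue from raising a false alarm.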
=IF(ABS(H10-J10)<0.01, "", "Mismatch "&(H10-J10))

Data Integrity
• Have error checks where they can be seen, whether on screen or printed:
  =IF(ABS(H10-J10)<0.001, "OK", "Out of balance!")
  =IF(C8>1000,"Too Large?","")
  =IF(NOW()>D9,"After review date, check assumptions","")
  =IF(ISNUMBER(D9),"","Enter the ")&"Review date:"
  =IF(COUNT(B5:E5)=COLUMNS(B5:E5),"","Missing")

Clear Navigation
• Use of Summary and Table of Contents
• Meaningful sheet tab names
• Hyperlinks where appropriate
[Example: DCLG Housing PFI Financial Model template.xls, http://www.bpmglobal.com]

Print setup header & footer

Multiple plus ungood

Great-grand-total
• Sum everything above.
• Inner total (includes rows of subtotals only): divide by 2.
• Outer total (includes grand totals at bottom and right): divide by 6.
• Where there are no internal totals, or no grand total row, divide by 4.
• Adjust as appropriate when using SUBTOTAL.

[Screenshot callouts: cross foot added; SUM replaced by SUBTOTAL(9, …); check for blank or text values; input / output balance; what changed?; a sense of proportion; room for expansion (B12=SUM(B4:B11))]

Phantom Phormula Phixer
• Extends if new data typed in
• ...not if pasted or filled

Redundancy
• Different formulas giving the same result:
  • SUM
  • SUMPRODUCT
  • DSUM
  • SUMIF
  • Array SUM
• Data quality checks using COUNT etc.

Array SUM vs SUMPRODUCT
• Braces {…} denote an array formula, entered with Ctrl+Shift+Enter; SUMPRODUCT gives the same result without array entry.
• Count #error values in a range named 'data':
  {=SUM(ISERROR(data)*1)}
  =SUMPRODUCT(ISERROR(data)*1)
• Count text values in a range:
  {=SUM(--ISTEXT(B4:B20))}
  =SUMPRODUCT(--ISTEXT(B4:B20))

Array SUM vs SUMIF
• Sum of data values less than 0:
  {=SUM(Data*(Data<0))}
  =SUMIF(Data,"<0")
• Sum of values greater than 0 and less than or equal to 5:
  {=SUM((Data>0)*(Data<=5)*Data)}
  =SUMIFS(Data,Data,">0",Data,"<=5")   (SUMIFS requires Excel 2007 or later)

Top 10 Questions (1)
1. What is the purpose of the spreadsheet?
   a. Criticality: what if it were lost or corrupted?
2. Where is it kept (network location, set of files)?
   a. How do we know which is the current version?
   b. Complete list of data sources it depends upon
   c. What depends on this spreadsheet?
3. How is it used? (Process documentation, instructions)
4. Is it for one person or is it re-used by others?

Top 10 Questions (2)
5. Is it once-off (project) or has it a periodic operation?
6. Who peer reviews its structure and version changes?
   a. If none, what is the likelihood of key-person risk?
   b. Evidence of test (with results) and acceptance
7. What controls are around it?
   a. Who reviews & signs off its output?
   b. Reconciliation with data sources
8. What checks are included within it?
   a. Cross-foot, balance checks, etc.

Top 10 Questions (3)
9. What evidence is there of conformity to good design practices?
   a. Potential long list, see below
   b. Clear block layout, formats, print output header/footer
   c. Formula integrity, protection, no errors, no external links
   d. Use of timesaving formulas and features
10. What are the pain points?
   a. Quality of input data; duplication, update
   b. Grunt work transforming data
   c. Effort maintaining & updating formulas
   d. Training in more efficient Excel skills
   e. Could it be replaced with a controlled shared system?

Typical Data Quality checks (1)
• Match and Merge: combines columns from sheets where rows are matched by some comparison operator.
• Compare: compares two sheets. Is the copy of the data in the spreadsheet consistent with the original data stored in the source database?
• Extract: separates a sheet into multiple sheets based on values in a column. Sampling.
• Generate: fills cells with random, fixed or incremental values, characters, dates, or numbers.
• Convert: transforms or reformats data formats or data types. Did the spreadsheet correctly process the format and data types of the data at the time?

Typical Data Quality checks (2)
• Group: subtotals, top/bottom items, date aging, stratification by bands, cross-tabulation
• Statistics: descriptive statistics, summary
• Duplicates: duplicated rows (are primary keys still unique?)
• Gaps: missing rows, data items missing (empty cells), or invalid
• Find: suspicious data (all the 9s, 01/01/01, and similar)
• Spell-check: are there any spelling mistakes?
• Benford's analysis: used to detect fraud from the pattern of digits where amounts have been invented.

References
• Philip L. Bewig, "How do you know your spreadsheet is right?", July 28, 2005. http://www.eusprig.org/hdykysir.pdf
• Patrick O'Beirne, "Information and Data Quality in Spreadsheets", Proceedings of EuSpRIG 2008. http://arxiv.org/pdf/0809.3609
• Spreadsheet Safe: training and certification of user competence. http://www.spreadsheetsafe.com
• Patrick O'Beirne, Spreadsheet Check and Control, Systems Publishing, 2005. http://www.sysmod.com/scc.htm
• David Colver, Operis Ltd, "Self-checks in spreadsheets: a survey of current practice", Proceedings of the EuSpRIG conference, 2008

List of checks
• Balance sheet balances
• Financial statements add up
• Financial statements have expected signs
• Sources match uses
• Identities hold true
• Balance sheet clears out
• Cash cascade gives same net as cash flow
• Ratio Inclusion Analysis
• Tax reconciliation
• Yield analysis
• Physical identities (material balance)
• Iterative solution has converged
• Inputs make sense
• Outputs meet participants' requirements
Source: Self-checks in spreadsheets: a survey of current practice, David Colver, EuSpRIG 2008

Spreadsheet Check and Control
By Patrick O'Beirne
Systems Publishing, 2005
ISBN 190540400X
http://www.sysmod.com/scc.htm
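Several of the Typical Data Quality checks listed earlier (Duplicates, Gaps, Find, Benford's analysis) can be sketched for rows exported from a sheet, for example via CSV. This is a hedged Python illustration, not the implementation in any particular auditing tool: the function names, the dictionary-per-row layout and the SUSPICIOUS placeholder list are hypothetical. The Benford comparison uses the standard expected first-digit frequency log10(1 + 1/d), and is only meaningful for large sets of naturally occurring amounts.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical placeholder values that often signal invented or default data.
SUSPICIOUS = {"9999", "99999", "999999", "01/01/01", "01/01/1901"}


def find_duplicates(rows: Sequence[Row], key: str) -> List[str]:
    """Duplicates: key values that occur on more than one row."""
    counts = Counter(str(row.get(key, "")) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def find_gaps(rows: Sequence[Row], columns: Sequence[str]) -> List[Tuple[int, str]]:
    """Gaps: (row_index, column) pairs where the cell is missing or blank."""
    gaps = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                gaps.append((i, col))
    return gaps


def find_suspicious(rows: Sequence[Row], columns: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Find: (row_index, column, value) for placeholder-looking values."""
    hits = []
    for i, row in enumerate(rows):
        for col in columns:
            value = str(row.get(col, "")).strip()
            if value in SUSPICIOUS:
                hits.append((i, col, value))
    return hits


def first_digit(x: float) -> Optional[int]:
    """Leading non-zero digit of |x| (via scientific notation), or None for zero."""
    x = abs(x)
    if x == 0:
        return None
    return int(f"{x:.15e}"[0])


def benford_report(amounts: Iterable[float]) -> List[Tuple[int, float, float]]:
    """Benford's analysis: (digit, observed share, expected share) for digits 1..9."""
    digits = [d for d in (first_digit(a) for a in amounts) if d is not None]
    n = len(digits)
    counts = Counter(digits)
    return [
        (d, counts[d] / n if n else 0.0, math.log10(1 + 1 / d))
        for d in range(1, 10)
    ]


if __name__ == "__main__":
    rows = [
        {"id": "A1", "amount": "120.50", "date": "03/02/09"},
        {"id": "A2", "amount": "", "date": "04/02/09"},
        {"id": "A1", "amount": "9999", "date": "01/01/01"},
    ]
    print(find_duplicates(rows, "id"))
    print(find_gaps(rows, ["amount", "date"]))
    print(find_suspicious(rows, ["amount", "date"]))
    for d, obs, exp in benford_report([123.0, 187.2, 1500.0, 2400.0, 310.0]):
        print(f"{d}: observed {obs:.2f}, expected {exp:.2f}")
```

In Excel the same checks are usually done with COUNTIF, conditional formatting or an add-in; the point of the sketch is only that each check is a simple, repeatable rule.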