Monetary & Financial Statistics: May 2004

Data cleansing for banking and monetary statistics

By Julie Bigwood
Tel: (020) 7601 4544 E-mail: julie.bigwood@bankofengland.co.uk

The Bank of England compiles and publishes a wide range of aggregated statistical series using data collected from banks operating in the United Kingdom. This article describes some of the routine procedures and techniques employed to ensure that these data are fit for purpose.

Information gathering

The financial press is searched daily for details of securitisations, mergers, takeover deals and similar transactions. This information is collated and put to use during the main balance sheet round, when the data received are checked to ensure that these transactions have been reported as expected.

Plausibility checks

All reported data are subjected to plausibility checking, and the individual checks are commonly known as ‘plausies’. Plausibility checks differ from validation rules in that they do not rely on fixed accounting identities. Instead, they assess the ‘plausible’ range for individual data items, or for the relationship between items, based on past behaviour. Plausibility checks may differ from form to form, and can differ for individual data items or groups of data items; for example, they may be country specific.

As forms are received these plausibility tests are applied, and items that breach the agreed ranges are passed to our analysts for further investigation. Using the failed plausies as a starting point, questions are raised on an individual bank basis and sent back to the banks. Questions can range from a simple confirmation that the data are correct to a request for a detailed breakdown of the movements between the two reporting periods. Banks will generally provide the names of the customers whose business is driving the movement, so that the Bank can check that data have been allocated to the correct box on the form. This links back to the information gathering done during the round: for example, if we are aware that a large company is being bought with funds from banks in the UK, we would expect this to be reflected in the answers to our questions. Banks (along with Bank of England staff) are aware that this information is confidential, and it is used purely for analytical purposes.

Internal (within form) validations

If the data received from individual institutions are in the correct format, they are processed by the Bank’s in-house statistical computer system, which automatically checks the raw data for compliance with a set of basic accounting identities. These validation checks can be as simple as requiring that total assets equal total liabilities on a balance sheet return. If any validation is not met, it is flagged and the reporting bank is asked to amend its data. The validation rules are published on the Bank’s website.¹

Cross form validations

These checks are similar in nature to internal validations but are designed to ensure consistency between two or more forms. One example involves the Balance Sheet return, which contains summary information for a month or quarter end. Additional forms, reported at a later date, provide a more detailed breakdown of certain items on the balance sheet, so that there is a mathematical relationship between data items on the two forms. If this relationship is not satisfied, reporting banks are asked to review the data. Problems often occur when changes made to the main balance sheet form are not carried through to the more detailed returns; these must be followed up and resolved. The cross form validation rules are also published on the Bank’s website.¹

Second stage plausibility checks

When all forms of a certain type have been reported for a specific period, our analysts will typically run ‘secondary’ checks. These look at the largest and smallest movements overall for every box, and additional questions are often raised as a result. Unusual data movements are also checked: for example, when all but one of the larger banks have moved in one direction and the remaining bank has moved in the opposite direction. This does not necessarily imply a data error, but movements like this are investigated. Further questions may also be raised in response to answers received, for example to clarify the type of business in which a company trades (especially one that is unfamiliar to the analytical area). Answers received are also cross-referenced with replies given by other banking institutions, to check that all are reporting the business in a consistent manner.

We also provide provisional, embargoed data to other areas in the Bank, such as Monetary Analysis (for Monetary Policy Committee purposes). Although these data are potentially of lower quality at this stage of the checking procedure, their use can result in further questions for the banks.

The output series are analysed for the final time at pre-publication checking, which might reveal an as yet unresolved or undiscovered error. Analysis down to the lowest level of detail shows which banks have been included and which figures are driving the changes. This procedure is sometimes referred to as ‘drilling down’ or ‘exploding’ the series.

¹ www.bankofengland.co.uk/mfsd/defs/index.htm
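The article says only that plausible ranges are based on past behaviour and does not give a rule. As a minimal sketch of the plausibility checks described above, assume a hypothetical rule where the plausible range for the next period-on-period change is the mean of past changes plus or minus `k` standard deviations. Incoming data could then be screened as follows. The function names, the default `k = 3` and the data layout are illustrative assumptions, not the Bank’s method.

```python
from statistics import mean, stdev


def plausible_range(history, k=3.0):
    """Return (low, high) bounds for the next period-on-period change.

    Hypothetical rule: mean of past changes +/- k sample standard deviations.
    `history` lists past levels for one data item, oldest first, and must
    contain at least two values.
    """
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    centre = mean(changes)
    # stdev() needs at least two changes; with only one, allow no spread.
    spread = stdev(changes) if len(changes) > 1 else 0.0
    return centre - k * spread, centre + k * spread


def plausie_breaches(reported, histories, k=3.0):
    """Return {item: change} for items whose latest change is implausible.

    `reported` maps item code -> newly reported level.
    `histories` maps item code -> previously reported levels, oldest first.
    """
    breaches = {}
    for item, level in reported.items():
        history = histories[item]
        low, high = plausible_range(history, k)
        change = level - history[-1]
        if not low <= change <= high:
            breaches[item] = change
    return breaches


# Item "B" jumps far outside its usual period-to-period movements.
histories = {"A": [100, 102, 101, 103, 102], "B": [50, 51, 50, 52, 51]}
print(plausie_breaches({"A": 104, "B": 80}, histories))  # {'B': 29}
```

A breach here only marks an item for an analyst to question. It does not mean the data are wrong, which matches the article’s point that failed plausies are a starting point for questions to the bank.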
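The internal (within form) validations described above can be pictured as a list of identities between boxes on a single return. The following is a hypothetical sketch. The box codes (`TA`, `TL`, `A1`, `A2`), the rule format and the `validate_form` helper are invented for illustration and do not reflect the Bank’s published rules.

```python
def validate_form(form, rules, tolerance=0):
    """Check one return against a list of accounting identities.

    `form` maps box code -> reported value; missing boxes count as zero.
    Each rule is (name, left_boxes, right_boxes) and requires the sum of the
    left boxes to equal the sum of the right boxes, within `tolerance`.
    Returns the names of the rules that fail, in rule order.
    """
    failures = []
    for name, left, right in rules:
        lhs = sum(form.get(box, 0) for box in left)
        rhs = sum(form.get(box, 0) for box in right)
        if abs(lhs - rhs) > tolerance:
            failures.append(name)
    return failures


# Hypothetical rules: assets equal liabilities, and asset components sum to the total.
RULES = [
    ("assets_equal_liabilities", ["TA"], ["TL"]),
    ("asset_components_sum_to_total", ["A1", "A2"], ["TA"]),
]

print(validate_form({"TA": 100, "TL": 90, "A1": 60, "A2": 40}, RULES))
# ['assets_equal_liabilities']
```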
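A cross form validation, as described above, compares a box on the summary Balance Sheet return with the sum of the boxes that break it down on a more detailed form. In this hypothetical sketch, the box codes, the `mapping` layout and the `cross_form_check` helper are assumptions for illustration only.

```python
def cross_form_check(summary, detail, mapping, tolerance=0):
    """Compare summary Balance Sheet boxes with their detailed breakdowns.

    `summary` and `detail` map box code -> value for the same bank and date;
    missing boxes count as zero.
    `mapping` maps each summary box to the detail boxes that should sum to it.
    Returns {summary_box: summary minus detail total} for each failed check.
    """
    differences = {}
    for summary_box, detail_boxes in mapping.items():
        detail_total = sum(detail.get(box, 0) for box in detail_boxes)
        difference = summary.get(summary_box, 0) - detail_total
        if abs(difference) > tolerance:
            differences[summary_box] = difference
    return differences


# Hypothetical mapping: total loans split into UK and non-UK residents.
MAPPING = {"LOANS": ["LOANS_UK", "LOANS_NON_UK"]}

# The balance sheet was revised to 500, but the detailed form still sums to 450.
print(cross_form_check({"LOANS": 500}, {"LOANS_UK": 300, "LOANS_NON_UK": 150}, MAPPING))
# {'LOANS': 50}
```

A nonzero difference is the typical symptom of the problem the article mentions, where a change to the main balance sheet form has not been carried through to the detailed return.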
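For the second stage plausibility checks, the following sketch shows one way to list the largest and smallest movements in a box once all banks have reported. It assumes ‘largest’ and ‘smallest’ mean the biggest rises and the biggest falls between two periods. The `extreme_movements` helper and its data layout are hypothetical.

```python
def extreme_movements(previous, current, n=3):
    """List the largest and smallest movements in one box across banks.

    `previous` and `current` map bank -> value of the box in each period.
    Only banks that reported in both periods are compared.
    Returns (largest, smallest): the n largest movements, largest first, and
    the n smallest (most negative) movements, smallest first.
    """
    movements = {
        bank: current[bank] - previous[bank]
        for bank in current
        if bank in previous
    }
    ranked = sorted(movements.items(), key=lambda pair: pair[1])
    return ranked[::-1][:n], ranked[:n]


previous = {"A": 100, "B": 200, "C": 50, "D": 80}
current = {"A": 130, "B": 150, "C": 55, "D": 80}
print(extreme_movements(previous, current, n=2))
# ([('A', 30), ('C', 5)], [('B', -50), ('D', 0)])
```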
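The unusual-movement check described above, where all but one of the larger banks move one way, can be expressed as a simple sign test. This is a hypothetical sketch. The caller decides which banks count as ‘larger’, and a bank whose figure is unchanged is treated as not having moved in either direction.

```python
def lone_contrarian(movements):
    """Return the one bank moving against all the others, or None.

    `movements` maps bank -> period-on-period change in one box, already
    restricted by the caller to the larger banks. A change of zero counts as
    no movement, so any zero means the pattern is not 'all but one'.
    """
    ups = [bank for bank, change in movements.items() if change > 0]
    downs = [bank for bank, change in movements.items() if change < 0]
    others = len(movements) - 1
    if others >= 2 and len(ups) == 1 and len(downs) == others:
        return ups[0]
    if others >= 2 and len(downs) == 1 and len(ups) == others:
        return downs[0]
    return None


print(lone_contrarian({"A": 10, "B": 5, "C": 7, "D": -4}))  # D
```

As the article stresses, a flagged bank is a prompt for investigation, not evidence of a reporting error.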
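‘Drilling down’ a series at pre-publication checking, as described above, amounts to splitting the change in an aggregate into each bank’s contribution. The sketch below assumes the aggregate is a simple sum across banks. The `drill_down` helper is hypothetical.

```python
def drill_down(previous, current, top=5):
    """Split the change in an aggregate series into bank contributions.

    Assumes the aggregate is the simple sum of the banks' values.
    `previous` and `current` map bank -> value; a bank missing from one
    period counts as zero there, so entries and exits show up in full.
    Returns (total_change, contributions), where contributions are the `top`
    (bank, change) pairs ordered by absolute size, largest first.
    """
    banks = sorted(set(previous) | set(current))  # sorted for a stable order
    contributions = {bank: current.get(bank, 0) - previous.get(bank, 0) for bank in banks}
    total_change = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda pair: abs(pair[1]), reverse=True)
    return total_change, ranked[:top]


previous = {"A": 100, "B": 200, "C": 50}
current = {"A": 90, "B": 260, "D": 20}  # C has left, D has joined
print(drill_down(previous, current, top=3))
# (20, [('B', 60), ('C', -50), ('D', 20)])
```

Listing contributions by absolute size shows both which banks are included (entries and exits appear in full) and which figures are driving the change. This matches the two questions the article says drilling down answers.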