Data Cleansing for Banking and Monetary Statistics

Monetary & Financial Statistics: May 2004
By Julie Bigwood
Tel: (020) 7601 4544
E-mail: julie.bigwood@bankofengland.co.uk
The Bank of England compiles and publishes a wide range of aggregated statistical series using data
collected from banks operating in the United Kingdom. This article describes some of the routine
procedures and techniques employed to ensure that these data are fit for purpose.
data items, for example they may be country specific. As
forms are received these plausibility tests are applied, and
items which breach the agreed ranges are passed to our
analysts for further investigation. Using the failed
plausies as a starting point, questions are then raised on an
individual bank basis and sent back to the banks.
Questions can range from a simple confirmation that the
data are correct, to asking for a detailed breakdown of the
movements between the two periods. Banks will
generally provide the names of the customers whose
business is driving the movement, so that the Bank can
check that data have been allocated to the correct box on
the form. This links back to the information gathering
done during the round: for example, we may be aware that
a large company is being bought with funds coming from
banks in the UK, and so would expect this to be reflected
in the answers to our questions. Banks
(along with Bank of England staff) are aware that this
information is confidential, and it is used purely for
analytical purposes.
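
As a rough illustration, a plausibility test of this kind could be sketched as below. The article says only that ranges are agreed and based on past behaviour; the rule used here (mean of past period-on-period changes plus or minus a multiple of their standard deviation), the function names and the figures are all assumptions made for the example.

```python
from statistics import mean, stdev

def plausible_range(history, width=3.0):
    """Band for the next period-on-period change, derived from past changes.

    Assumed rule (not from the article): mean +/- width * standard deviation.
    """
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    centre = mean(changes)
    spread = stdev(changes)
    return centre - width * spread, centre + width * spread

def breaches_plausie(history, new_value, width=3.0):
    """True if the latest change falls outside the plausible range."""
    low, high = plausible_range(history, width)
    change = new_value - history[-1]
    return not (low <= change <= high)

# Hypothetical reported values for one box on one bank's form.
history = [100, 104, 101, 106, 103, 107]
print(breaches_plausie(history, 109))  # small change, stays inside the band
print(breaches_plausie(history, 150))  # large jump, passed to an analyst
```

In practice the band could be set separately for each form, data item or group of items, rather than by one formula.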
Internal (within form) validations
If the data received from individual institutions are in the
correct format, they are processed by the Bank’s in-house
statistical computer system, which automatically checks
the raw data for compliance against a set of basic
accounting identities. These validation checks can be as
simple as requiring that total assets must equal total
liabilities on a balance sheet return. If any validation is
not met, it will be flagged up and the reporting bank
asked to amend it. The validation rules are published on
the Bank’s website (see footnote 1).
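
A minimal sketch of a within-form validation follows. It is not the Bank's system: the box names, the second rule and the tolerance parameter are hypothetical, and only the identity that total assets must equal total liabilities comes from the text above.

```python
def validate_return(boxes, rules, tolerance=0):
    """Return the names of the validation rules that a form fails.

    Each rule is (name, left-hand boxes, right-hand boxes); the sums of the
    two sides must agree to within the tolerance.
    """
    failures = []
    for name, lhs_boxes, rhs_boxes in rules:
        lhs = sum(boxes[b] for b in lhs_boxes)
        rhs = sum(boxes[b] for b in rhs_boxes)
        if abs(lhs - rhs) > tolerance:
            failures.append(name)
    return failures

# Hypothetical rules; only the first is mentioned in the article.
RULES = [
    ("total assets = total liabilities",
     ["total_assets"], ["total_liabilities"]),
    ("asset components sum to total assets",
     ["loans", "securities", "other_assets"], ["total_assets"]),
]

form = {"loans": 60, "securities": 30, "other_assets": 10,
        "total_assets": 100, "total_liabilities": 95}
print(validate_return(form, RULES))
```

Any rule name returned would be flagged and the reporting bank asked to amend the form.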
Cross form validations
These checks are similar in nature to internal validations
but are designed to ensure consistency between two or
more forms. One example is the Balance Sheet return,
which contains the summary information for a
month/quarter end. Additional forms reported (at a later
date) provide a more detailed breakdown of certain items
on the balance sheet, such that there is a mathematical
relationship between data items on the two forms. If this
relationship is not satisfied, reporting banks are asked to
review the data. Problems often occur when changes
made to the main balance sheet form are not carried
through to the more detailed returns. These must be
followed up and resolved. The cross form validation
rules are also published on the Bank’s website (see footnote 1).
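
The relationship between a summary box and its detailed breakdown could be checked along the following lines. This is a sketch only; the form contents, box names and the links mapping are invented for illustration.

```python
def cross_form_failures(balance_sheet, detail_form, links, tolerance=0):
    """Compare each summary box with the sum of its breakdown boxes.

    links maps a balance sheet box to the detail-form boxes that should
    add up to it. Returns (summary box, difference) for each mismatch.
    """
    failures = []
    for summary_box, detail_boxes in links.items():
        detail_total = sum(detail_form[b] for b in detail_boxes)
        difference = balance_sheet[summary_box] - detail_total
        if abs(difference) > tolerance:
            failures.append((summary_box, difference))
    return failures

# Hypothetical data: the balance sheet was revised but the detail was not.
balance_sheet = {"loans_to_non_residents": 500}
detail_form = {"loans_us": 200, "loans_de": 180, "loans_other": 100}
links = {"loans_to_non_residents": ["loans_us", "loans_de", "loans_other"]}

print(cross_form_failures(balance_sheet, detail_form, links))
```

A non-empty result corresponds to the case described above, where a change to the main form has not been carried through to the detailed return.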
Second stage plausibility checks
When all forms of a certain type have been reported for a
specific period, our analysts will typically run
‘secondary’ checks. These look at the largest and
smallest movements overall for every box and, often,
additional questions will then be raised as a result.
Unusual data movements are also checked, for example
when all but one of the larger banks have moved in one
direction and the remaining bank in the other. This need
not imply a data error, but such movements are
investigated. Further questions may also be raised as a
result of answers received, for example clarification of
the type of business a company trades in (especially one
that is not familiar to the analytical area). Answers
received are also cross referenced to replies given by
other banking institutions, to check all are reporting the
business in a consistent manner. We also provide
provisional embargoed data to other areas in the Bank
such as Monetary Analysis (for Monetary Policy
Committee purposes) which, although of a potentially
lower quality at this stage of the checking procedure, can
result in further questions for the banks. The output
series are analysed for the final time at pre-publication
checking, which might reveal an as yet unresolved or
undiscovered error. Analysis down to the lowest level of
detail will show us which banks have been included and
the figures that are driving the changes. This procedure
is sometimes referred to as ‘drilling down’ or ‘exploding’
the series.
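
Two of the secondary checks described above, ranking movements for a box and spotting a bank that has moved against the rest, could be sketched as follows. The bank names and figures are hypothetical, and the sketch treats every bank alike rather than restricting the comparison to the larger banks.

```python
def rank_movements(previous, current):
    """Movements per bank for one box, sorted from largest fall to largest rise."""
    moves = {bank: current[bank] - previous[bank] for bank in current}
    return sorted(moves.items(), key=lambda item: item[1])

def odd_one_out(moves):
    """Return the single bank that moved against all the others, or None."""
    rises = [bank for bank, move in moves if move > 0]
    falls = [bank for bank, move in moves if move < 0]
    if len(rises) == 1 and len(falls) > 1:
        return rises[0]
    if len(falls) == 1 and len(rises) > 1:
        return falls[0]
    return None

# Hypothetical values of one box for two consecutive periods.
previous = {"Bank A": 100, "Bank B": 200, "Bank C": 150, "Bank D": 300}
current = {"Bank A": 120, "Bank B": 230, "Bank C": 140, "Bank D": 310}

moves = rank_movements(previous, current)
print(moves[0], moves[-1])  # largest fall and largest rise
print(odd_one_out(moves))   # bank moving against the others, if any
```

Listing the per-bank movements behind an aggregate in this way is also a simple form of drilling down into a series.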
Information gathering
The financial press is searched on a daily basis for
details of securitisations, mergers, takeover deals etc.
This information is gathered together and put to use
during the main balance sheet round, where data received
are checked to ensure these details are reported as
expected.
Plausibility checks
All reported data are subjected to plausibility checks,
commonly known as ‘plausies’. Plausibility checks differ
from validation rules in that they do not rely upon fixed
accounting identities. They are based instead upon
assessing the “plausible” range for individual data items,
or the relationship between items, based on past
behaviour. Plausibility checks may be different for each
form, and can be different for individual or groups of
1 www.bankofengland.co.uk/mfsd/defs/index.htm