Data Analysis and Continuous Auditing

Barbara Jacius, State of Connecticut, Office of the Comptroller
Brian DeMilia, State of Connecticut, Office of the Auditors of Public Accounts

NASACT IT Conference
September 25, 2015

State of CT Database/Application

“Core-CT is the name given to Connecticut’s integrated Human Resource Management System (HRMS) and Financials system. The Core-CT system was implemented in 2003 to replace numerous older legacy systems to provide standardization, increased ad hoc reporting capabilities, simplified reconciliation, and an interactive user environment. Core-CT is a comprehensive system that includes the State of Connecticut’s accounting; purchasing; accounts payable; accounts receivable; project costing; inventory and asset management systems; payroll; benefits; human resource and time and labor functions. The Office of the State Comptroller (OSC) and the Department of Administrative Services (DAS) jointly administer and maintain the Core-CT system. The system uses enterprise resource planning (ERP) software to incorporate all business functions, using an integrated suite of software applications, common databases, and a unified technical architecture. In addition to standardized reports, the Core-CT system utilizes an Enterprise Performance Management (EPM) ad-hoc reporting function. The EPM ad-hoc reporting function allows users to query the data warehouse and produce custom reports.
”¹

¹ AUDITORS' REPORT, CORE-CT System, Information Technology Security, November 2014

Data Architecture
[Diagram labels: Live Data; Static Data; Transactional Data; EPM Warehouse; Source Tables; Core Financials; EPM Reporting Using PS Query; Ascential (ETL); Core HRMS; Data Loader (App Engine); PS Query/nVision/Crystal; Reporting Tables]

EPM Data Warehouse
• Human Capital Management (HCM)
• 171 staging tables
• 40 reporting tables
• Modules represented
  o Human Resources
  o Benefits (Active, Retiree and COBRA)
  o Workers Comp (with TPA interface)
  o Time and Labor
  o Payroll (Active, Retiree)
  o PeopleSoft Pension module (target date: March 2017)

EPM Data Warehouse
• Financials
• 260 staging tables
• 70 reporting tables
• Modules represented
  o General Ledger, Commitment Control
  o Accounts Receivable and Billing
  o Accounts Payable
  o Supply Chain & Inventory
  o Asset Management
  o Projects and Contracts

EPM Data Warehouse
Data volume: large EPM reporting tables (as of 8/19/2015)
HCM
• Detailed Payroll: 512,985,571 rows / 54 columns
• Time & Labor Payable Time: 302,531,645 rows / 70 columns
• YTD Payroll Data: 193,291,051 rows / 25 columns
Financials
• HR Accounting Line: 942,246,770 rows / 46 columns
• Voucher Accounting Line: 317,430,908 rows / 69 columns
• Journal Transactions: 103,393,387 rows / 65 columns

EPM Data Warehouse
Legacy data in the EPM warehouse
• EPM is the repository for a number of legacy databases. This model of data retention will be increasingly important as more systems become obsolete and technical support staff retire.
• Legacy payroll and time and labor data from MSA and TAS is stored as static tables in the EPM warehouse.
• Data is retained for pension calculations, workers' compensation determinations and settlement of legal actions.

EPM Data Warehouse
Custom views used to fulfill auditing requirements
Custom views can be developed to accommodate unique reporting needs.
One example: the Financials upgrade from PeopleSoft version 9.0 to 9.1 changed the workflow framework in purchasing and accounts payable, which made workflow data from before the upgrade irretrievable in production. Static views were built in EPM to store the data from before the framework change, so the voucher and purchasing approval process can still be audited seamlessly.

EPM Data Warehouse
Custom solutions
Requirement: provide an audit tool for line agencies to facilitate compliance with Connecticut General Statute 5-208a, identification and evaluation of dually employed state employees.
• Scenario 1: The EPM Query Manager tool automatically inserts a security table, which restricts query results when identifying employees with more than one active state job.
  o Solution: Deploy the query from the transactional database's HRMS PS Query tool, which does not insert the security table.
• Scenario 2: Identifying employees with active state employment and an active personal service contract with the state requires combining data from the Financials and HRMS systems.
  o Solution: Create the query in the EPM data warehouse.

PS Query Overview
Query Functionality
• PeopleSoft Query is an end-user reporting tool with a graphical user interface (GUI). It lets the user construct queries without knowing SQL, because it generates the SQL for you.
• PeopleSoft Query is available in the source environments as well as the data warehouse.
• Access to the query manager tool in the source environments is restricted to limit the impact on transactional performance.
• The process scheduler permits user-initiated distribution of query results, either by user ID or by user role.
• Care must be taken when using the distribution functionality, because the product is simply a spreadsheet. Row security is not enforced on query results distributed this way.

PS Query Overview
Query Functionality
• User-initiated queries are defined as reporting/user queries.
• Includes non-user queries: XML Publisher, Crystal Reports
• Role
• Query results add self-service time & labor and paycheck viewer roles upon creation of the initial hire row
• PSnVision source
• Comprehensive Financial Status Reports (CFSR) distributed to all state agencies

PS Query Overview
• Query security
  o “PeopleSoft Query uses query access group trees to control security of the tables in your PeopleSoft database. Users can retrieve information only from those tables whose record definitions they have access to.”²
  o “PeopleSoft applications implement row-level security by using a query security record (typically a view) that is specified on the record definition that joins the data table with an authorization table. When a user searches for data in the data table, the system performs a related record join between the security record view and the base table (rather than searching the table directly).”²

² Oracle PeopleBook: PeopleSoft Query

Use of PS Query for Field Auditors
AUDITORS' REPORT, CORE-CT System, Information Technology Security, November 2014
Relevant audit recommendations:
• Segregation of duties
• Access to confidential data

Use of PS Query for Field Auditors
EPM/Security Unit Response
• Developed a series of new EPM reporting tables and public queries to give agency staff the tools to perform their own audits
• Security tools tables were brought into EPM, each version sourced from the correct environment
  o Example: three versions of the tools table PSOPRDEFN (Operator Definition): FN, HRMS and EPM
• One table is sourced from the portal
  o User Profile View: verifies user setup of the automated password reset functionality

APA PS Queries
• 124 active CORE IDs are assigned to the Auditors of Public Accounts, the single largest group of users in EPM
• 76 PS Queries in EPM, 45 in Financials, 13 in Human Capital Management
• The IT Audit Group maintains public PS Queries that can be run by all field audit staff.
PS Query Overview
• Quick-start tool for new auditors
• User friendly at an introductory level
• Field audit staff can create and maintain their own queries, which are associated with their own user account (private queries), and can share those queries with other specified users
• Creating and modifying simple queries can be learned in one class session.
• Online resources include
  o High-level table descriptions
  o Job aids defining proper join conditions between tables

Public Access Transparency Sites
• EPM is used to populate two of the transparency sites hosted by CT state government.
• The Office of the State Comptroller hosts the site Open Connecticut.
• The Office of Legislative Management, Office of Fiscal Analysis hosts Transparency Connecticut.

Public Access Transparency Sites
The Office of the State Comptroller hosts the site Open Connecticut.
• Open Checkbook: granular accounts payable data, refreshed nightly
  o Received an A from the United States Public Interest Research Group (US PIRG) in its annual grading of state financial transparency sites, based on user-friendly features and the ability to download the entire data set (2014).
  o US PIRG is a federation of independent, state-based, citizen-funded organizations that advocate for the public interest.
• Open Budget: ledger data provides an overview of high-level budgeting (spend, revenue and trending). This data is refreshed monthly after the General Ledger closes for the accounting period.
URL: http://www.osc.ct.gov/openCThome.html

Public Access Transparency Sites
EPM also populates the searchable website Transparency Connecticut, sponsored by the Office of Legislative Management, Office of Fiscal Analysis. Data refreshed annually.
This site offers:
• State Expenditures
• Grants
• Employee Compensation
• Vendor Payments
• Contract Spending
• Pensions
URL: http://transparency.ct.gov/html/main.asp

Public Access Transparency Sites
Benefits derived from using EPM as the data source
• Data thoroughly reviewed by state subject matter experts
• HIPAA (Health Insurance Portability and Accountability Act) compliant
• Statutory redactions dynamically coded out
  o Federal guidelines imposed by FERPA (Family Educational Rights and Privacy Act) legislation
  o State statutes governing release of names/addresses of protected groups
• Risk of data breaches is reduced when content control is retained at the source

PS Query Overview
• That concludes the first portion of our presentation. I’d like to introduce Brian DeMilia to discuss details of continuous auditing.

APA Analysis and Reporting Tools
• Core-CT
  o PeopleSoft Query (PS Query) is used by all staff
  o Oracle SQL is used by the IT Audit Group and by select members of the APA audit team assigned to the Office of the State Comptroller audit.
  o Select employees have 10 GB of storage space in their own schema in each Core-CT database to which they have access.
• Software (data access and visualization tools):
  o PL/SQL Developer (Allround Automations)
      o Core-CT has an unlimited site license for this product, so all of our staff who use the Core-CT databases can use it.
  o Oracle SQL Developer
  o Data Miner extension (of Oracle SQL Developer)
  o Navicat
• When accessing other state systems…
  o We attempt to obtain direct access to the database(s) of the system being audited
  o When needed, we perform extracts from those systems, for loading into an Oracle database, for analysis outside.
      o Needed when a query would take too long to run against the source system
      o Or when intermediary tables must be created
      o Or when tables need to be indexed or partitioned differently than in the source system

Windows Task Scheduler
• The task scheduler enables .bat files to be run at defined times/intervals
• .bat files can be used to invoke SQL*Plus and run a .sql file
• Allows for the scheduling of ‘tasks’
  o One such task can be the execution of a query
• Many programs can be invoked from the command line, offering the same, and often additional, functionality compared with their GUIs
  o Oracle allows you to run queries from the command line using sqlplus
  o Oracle allows you to load data from the command line using sqlldr
• By scheduling queries to run through sqlplus, you can automate the extraction, the transformation, and potentially the loading of data into other systems, or the transfer of data to FTP locations accessible by others

Getting to the Task Scheduler
• Go to…
  o Control Panel
  o Administrative Tools
  o Task Scheduler

Creating a Task
[Screenshot callouts]
• Creates a new task
• Creates a new folder in which to organize a related group of tasks

Choosing a .bat file to run
• .bat files can be run as scripts via the task scheduler
• The commands in the .bat file can be used to invoke various command line utilities (sqlplus, sqlldr, 7z (7zip), etc.)
(1) Hit Actions
(2) Choose the .bat file

.bat and .sql file – Basic illustration
(1) Example .sql for .csv output
(2) Example .sql for .txt output, pipe delimited
[Screenshot callouts]
Dynamically include date, fiscal year, etc.
in filename
Comma delimited; values enclosed in quotation marks
Concatenate the pipe character
• Concatenating the delimiter between columns avoids the blank spaces that would otherwise result from specifying the delimiter using the colsep approach
• As a result, loading into other applications can work simply by specifying the delimiter ( | )
• CSV output can be tricky because of commas within values (which require the values to be enclosed in quotation marks) and because Excel automatically drops leading zeros in CSVs
(3) Example .bat file (runs a specified .sql file)

    sqlplus /nolog @C:\some_folder\some_query.sql

Choosing the interval at which it runs
(1) Hit Triggers
(2) Choose a schedule… Monthly… Weekly on certain days… Etc.
(3) Repeat throughout the day?
(4) Terminate if still not finished after X time?

Need to load data?
• SQLLDR is a built-in command line utility that allows you to load data into Oracle databases at high speed
• It also allows you to process data before the load, such as trimming blank spaces, keeping only part of a string, converting text to a date, etc.
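The kinds of pre-load clean-up listed above (trimming blank spaces, keeping only part of a string, converting text to a date) can also be prototyped outside the database before a control file is written. The following is a minimal Python sketch and not the presenters' process: the function name `clean_row`, the two-field layout, the sample case number and the assumed text date layout (`Jan  5 2015 10:30AM`) are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Assumed layout of the incoming text dates (hypothetical), e.g. "Jan  5 2015 10:30AM".
ASSUMED_DATE_FORMAT = "%b %d %Y %I:%M%p"


def clean_row(line, delimiter="|"):
    """Split one delimited line and normalize its first two fields."""
    fields = line.rstrip("\n").split(delimiter)
    case_ref = fields[0].strip()            # trim leading/trailing blank spaces
    prefix = case_ref[:6]                   # keep only part of a string
    raw_date = " ".join(fields[1].split())  # collapse repeated spaces
    # Convert text to a date; an empty field becomes None.
    parsed = datetime.strptime(raw_date, ASSUMED_DATE_FORMAT) if raw_date else None
    return case_ref, prefix, parsed


# Hypothetical sample record, not taken from any real file.
sample = clean_row("  FA-12-3456789  |Jan  5 2015 10:30AM\n")
print(sample)
```

Prototyping the rules this way can make it easier to spot malformed values in an extract before the equivalent expressions are placed in the load itself.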
2 components… the control file, and the command to run it

(1) Example control file

    options (
      skip = 0,
      direct = true,
      parallel = true,
      multithreading = true
    )
    unrecoverable
    load data
    infile 'B:\0-JUDICIAL_TRANSFER\KeyPointDates.txt'
    append
    into table keypointdates
    fields terminated by "|"
    trailing nullcols
    (
      CASEREFNUM,
      WRITENTRYDATE "to_date(:WRITENTRYDATE,'Mon DD YYYY HH12:MI:AM')",
      TRIALLISTCLAIMDATE "to_date(:TRIALLISTCLAIMDATE,'Mon DD YYYY HH12:MI:AM')",
      VERDICTTRIALCOMPLETEDATE "to_date(:VERDICTTRIALCOMPLETEDATE,'Mon DD YYYY HH12:MI:AM')",
      INITDISPOSITIONDATE "to_date(:INITDISPOSITIONDATE,'Mon DD YYYY HH12:MI:AM')",
      ………VARIOUS OTHER COLUMNS………
    )

(2) Example command

    sqlldr user/pw@db_name control=example_ctl_file

• Example log file showing success or failure…

List of Examples
• Some of our scheduled* tasks include:
  * Some of these are run manually at a defined interval and published on shared drives; others are scheduled via the task scheduler.
• Judicial Department’s divorce records with employee spouses enrolled in state insurance
• Vendor Matching
  o Employee’s phone to vendor’s phone
  o Employee’s dependents’ phone numbers to vendor’s phone
  o Employee’s emergency contact’s phone to vendor’s phone
  o Employee’s address to vendor’s address
  o Employee’s emergency contact’s address to vendor’s address
  o Employee’s dependents’ addresses with vendor’s address
• Terminated Employee Core-CT Account Deactivation Delays
• Core-CT Revenue Population and Sampling
• Core-CT User Conflicts
• Holiday Time Charged on non-Holidays
• Consecutive Sick Day Data (where Medical Certificate is required)
• Timesheet Approver Counts
• P-Card transaction Data by Agency
• Asset Module Balances by Agency

Divorce Records <-> Core Benefits Module Match
• Runs the first Monday of every month
• Divorce cases disposed of (disposition date) in the preceding month are matched against dependent spouses enrolled in one or more insurance plans by a state employee
• The first Monday of every
April a full match is performed (all divorce cases existing in Judicial’s records, going back 20+ years)
• The Jaro-Winkler algorithm is used to identify similar but not exact matches (Judicial’s records do not contain SSNs or other personally identifiable information, so pairs of names (couples) are matched)
• Double click on the document below for a summary of the process.

Vendor Matching: Task Scheduler Screenshot
• The task: .bat files to run
• According to this schedule
  o Employee’s phone to vendor’s phone
  o Employee’s dependents’ phone numbers to vendor’s phone
  o Employee’s emergency contact’s phone to vendor’s phone
  o Employee’s address to vendor’s address
  o Employee’s emergency contact’s address to vendor’s address
  o Employee’s dependents’ addresses with vendor’s address

Terminated User Account Lockout Delays
• Creates a temp table that finds all terminations and, for each one, the first later date on which the account became locked
• Then, using that temp table, finds the first date, if any, on which the user logged in after their termination date

Revenue Reconciliation and Sampling
• Each fiscal year, revenue-producing modules are reconciled with the balances on the general ledger module
• In other words, does the sum of transactions (deposits, invoices, etc.) total up to the balances on the GL? (see revenue reconciliation.sql)
• In addition, an Access database holding transaction data (summarized at one row per transaction in one report, with detail (accounting string) data in another) is prepared for the field (see revenue sampling.sql)
• Double click on each icon to view the queries used.
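The reconciliation question above (does the sum of module transactions equal the GL balance?) can be illustrated with a small stand-alone sketch. This is not the content of revenue reconciliation.sql, which is not reproduced here; it is a hedged Python illustration in which the function name `reconcile`, the account codes and every amount are made up.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile(transactions, gl_balances):
    """Return {account: module_total - gl_balance} for accounts that differ."""
    totals = defaultdict(Decimal)  # Decimal() starts each account at 0
    for account, amount in transactions:
        totals[account] += Decimal(amount)

    differences = {}
    for account in sorted(set(totals) | set(gl_balances)):
        module_total = totals.get(account, Decimal("0"))
        gl_total = Decimal(gl_balances.get(account, "0"))
        if module_total != gl_total:
            differences[account] = module_total - gl_total
    return differences


# Hypothetical module transactions (account, amount) and GL balances.
transactions = [("40001", "100.00"), ("40001", "50.25"), ("40002", "75.00")]
gl_balances = {"40001": "150.25", "40002": "80.00"}

differences = reconcile(transactions, gl_balances)
print(differences)
```

Amounts are handled as `Decimal` rather than floating point so that cent-level differences are exact, which matters when the goal is to flag any account that does not tie out.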
Some of the source tables reconciled to:
• PS_BI_ACCT_ENTRY (Billing)
• PS_PAY_MISC_DST (ARJ source; AR direct journals)
• PS_ITEM_DST (ARI source; AR open items)
• PS_VCHR_ACCTG_LINE (Accounts Payable)
• PS_CA_ACCTG_LN_PC (Contracts; Project Costing)

Core-CT User Role Conflicts, Query (Snippet)
• Using audit logs, at each fiscal year end we retroactively identify role conflicts that existed at various points in the year just ended.
• Double click on the document below for an example query that is part of the process on the Financials side.

Core-CT User Role Conflicts, End Result
[Screenshot callouts]
• Date for which the data snapshot was taken
• Environment to which the conflict applies (HR, FIN, etc.)
• Brief English-language description of the conflict (e.g., create and pay a person)
• Various attributes related to the employee/user
• Color-coded illustration of roles that are OK as is (the X roles are okay with either the A roles or the B roles, but not a mix of both A and B)
• Long concatenated list of all roles held by the employee that relate to the conflict

Holiday Time Charged on Non-Holidays
• In PeopleSoft, employees are assigned to various time and labor workgroups, time and labor workgroups are assigned to various holiday schedules, and various dates are defined as holidays within each holiday schedule
• This query shows all instances where an employee charged time to HOL on days not defined as a holiday according to their workgroup’s holiday schedule
• To facilitate the field auditors’ review, if the date is defined as a holiday by ANY schedule, that holiday name is shown, as well as the closest holiday occurring before and the closest holiday occurring after

Consecutive Sick Days
• Query to identify sick days taken over a span of time that may require a medical certificate
• Uses analytic functions to look above and below the current row
• Finds employees who took 5+ consecutive sick days, or 6+, depending on applicable regulations, and shows one row for each day in each such stretch
of time
• Double click on the icon below to view the query.

Timesheet Approvers
• Runs every Wednesday at 6am
• One file per fiscal year
  o Overwrites the current file if a file for the same fiscal year already exists
  o If a new fiscal year, creates a new file with that fiscal year as the name
• List of employees by department with the count of timesheet approvals by approver (one row per employee and approver)
• Double click on the document below for 2 rows of example output.

P-Card Transaction Data
• Prepared each fiscal year
• Using the bank’s website, we run a report for all transactions in the fiscal year just ended.
• We import that result set into an Oracle database using sqlldr
• We then use a sqlplus script to generate one file per agency with just that agency’s transactions during the fiscal year
• The field can then use each file to analyze the P-Card transaction population for a given fiscal year, review specific suspicious transactions, or select samples

Asset Module Balances
• This query produces asset module balances by business unit and asset category, for matching with the asset reporting form that each agency submits to the Comptroller’s office for financial statement preparation

Conclusion
• Questions?
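Supplementary illustration for the Consecutive Sick Days slide: the presenters' query uses Oracle analytic functions and is not reproduced in this text. As a stand-in, the minimal Python sketch below shows the underlying idea of grouping one employee's sick dates into unbroken stretches and keeping those that reach a threshold. It assumes "consecutive" means consecutive calendar days, which may differ from the actual rule (for example, scheduled work days only); the function name `sick_day_stretches` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def sick_day_stretches(sick_dates, min_days=5):
    """Return lists of consecutive calendar dates with at least min_days entries."""
    stretches, current = [], []
    for day in sorted(set(sick_dates)):
        if current and day - current[-1] == timedelta(days=1):
            current.append(day)          # extends the current stretch
        else:
            if len(current) >= min_days:
                stretches.append(current)
            current = [day]              # starts a new stretch
    if len(current) >= min_days:         # do not forget the final stretch
        stretches.append(current)
    return stretches


# Hypothetical sick dates for one employee: a 5-day run and a 2-day run.
dates = [date(2015, 3, d) for d in (2, 3, 4, 5, 6, 10, 11)]
stretches = sick_day_stretches(dates, min_days=5)
for stretch in stretches:
    for day in stretch:                  # one row per day in each stretch
        print(day.isoformat())
```

Passing `min_days=6` applies the stricter threshold mentioned on the slide.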