Fraud Detection in Banking using Big Data
By Madhu Malapaka (madhu@wilshiresoft.com)
For ISACA, Hyderabad Chapter
Date: 14th Dec 2014
Revised: 14th Dec 2014
Wilshire Software Technologies

Agenda
• Common Banking Frauds
• Fraud Fighting Activities
• Enterprise Fraud Systems Diagnostic Anatomy
• Big Data
• Hadoop Ecosystem
• Banks Data Source
• Social Network Data Providers
• Big Data Integration – Technology Stack
• Reporting Tools

Fraud
• A deception deliberately practiced in order to secure unfair or unlawful gain, or to cause loss to another party.

Common Banking Frauds
• A bank is typically exposed to different types of fraud.

Fraud Fighting Activities
• Fraud fighting activities can be grouped into three primary categories:
  • Fraud Prevention - Proactive
  • Fraud Detection - Reactive
  • Fraud Investigation - Action

Enterprise Fraud Systems Diagnostic Anatomy
[Figure] Source: www.executiveboard.com

[Diagram labels, in extraction order: ATMs, Policy, Online, Users, Credit, Data Collection, Data Analysis, Compliance, Fraud Detection, External Data Feeds, Data Logs, Legal Action, Business Process Change, Adopt New Technologies, Report Management]

[Same diagram with the labels FraudMAP™ and Reputation Manager 360 added]

Monitoring Account Holder Behavior
• Monitoring is organized around the different phases or aspects of the online banking process.

How Banks Can Leverage the Data Mining Capabilities of Big Data for Fraud Detection

BIG DATA
• Velocity: Moves at very high rates (think sensor-driven systems). Valuable in its temporal, high-velocity state.
• Volume: Fast-moving data creates massive historical archives. Valuable for mining patterns, trends and relationships.
• Variety: Structured (logs, business transactions), semi-structured and unstructured.

BIG DATA BY HADOOP
Hadoop is a combination of:
• HDFS (storage)
• MapReduce (computation)

Hadoop Distributed File System (HDFS)
• Distributed file system for redundant storage.
• Designed to reliably store data on commodity hardware.

MapReduce
• A programming model for distributed data processing.
• Its data processing primitives are functions: mappers and reducers.

Hadoop Ecosystem

Pig
• High-level data flow language.
• Made of two components:
  • A data processing language, Pig Latin (Pig scripts).
  • A compiler that translates Pig Latin to MapReduce.

Hive
• Data warehousing layer on top of Hadoop.
• Allows analysis and queries using a SQL-like language.

Mahout
• Scalable machine learning algorithms on top of Hadoop.

Sqoop
• A tool to automate data transfer between structured datastores and Hadoop.

Flume
• Distributed data/log collection service.
• Collects data and logs from their sources and puts them in a centralized location for storage and processing.
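
The mapper and reducer primitives listed under MapReduce can be sketched in plain Python, with no Hadoop cluster involved. This is only an illustration of the programming model: the sample records, the account IDs, the helper names (`mapper`, `shuffle`, `reducer`, `run_job`, `accounts_over`) and the 5000 threshold are all assumptions made for this example, and none of them come from the slides or from any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, float]  # (account_id, channel, amount)

# Hypothetical sample records, invented for illustration only.
TRANSACTIONS: List[Record] = [
    ("ACC-1", "ATM", 200.0),
    ("ACC-2", "ONLINE", 50.0),
    ("ACC-1", "ONLINE", 9500.0),
    ("ACC-3", "CREDIT", 120.0),
    ("ACC-1", "ATM", 300.0),
]


def mapper(record: Record) -> Iterator[Tuple[str, float]]:
    """Map step: emit one (key, value) pair per record, account -> amount."""
    account_id, _channel, amount = record
    yield account_id, amount


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key; a stand-in for what the framework does between map and reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(account_id: str, amounts: List[float]) -> Tuple[str, Tuple[int, float]]:
    """Reduce step: collapse all values for one key into (count, total)."""
    return account_id, (len(amounts), sum(amounts))


def run_job(records: Iterable[Record]) -> Dict[str, Tuple[int, float]]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for record in records for pair in mapper(record))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


def accounts_over(totals: Dict[str, Tuple[int, float]], threshold: float) -> List[str]:
    """Return the accounts whose total amount exceeds an assumed threshold."""
    return sorted(acc for acc, (_count, total) in totals.items() if total > threshold)


if __name__ == "__main__":
    totals = run_job(TRANSACTIONS)
    print(totals)
    print(accounts_over(totals, threshold=5000.0))
```

On a real cluster the framework performs the grouping step itself and runs many mappers and reducers in parallel over data stored in HDFS; here `shuffle` stands in for that step so the flow can be followed end to end.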

Banks Data Source

Identify data sources
• Consider which data sources you will need to take advantage of.

Existing data sources
• This includes a wide variety of data, such as transactional data, survey data, web logs, etc.

Purchased data sources
• Does your organization use supplemental data, such as demographics?
• If not, consider whether social media and news streams would complement your current data to create additional project value.

Social Network Data Providers
• This data works as input for building the big data set and can be integrated with the bank's customer data.

Banks Internal and Purchased Data
• CRM/customer support
• POS/purchases
• Email/documents/collaboration
• BI & data warehouse
• System & network logs
• Web logs/clickstream
• Google Analytics/Omniture
• Facebook/Twitter/Yelp/Foursquare/Google
• Experian/Epsilon/Acxiom
• Mobile devices
• Sensors
• Product reviews
• Google search results
• + more
Together these add up to many terabytes of data, sometimes many petabytes: BIG DATA.

Big Data Integration – Technology Stack
[Diagram labels: Analytics, Data Logs, RDBMS]

Reporting Tools
[Figure]

81% of global banks say Big Data is a top priority in 2015
Are You Ready?

Thank You!
• Questions?

Wilshire Software Technologies, based in Hyderabad, India, is engaged in consulting and training for Big Data analytics.

Contact Information:
Madhu Malapaka
Managing Director
Wilshire Software Technologies
Hyderabad, India
Cell +91 800 820 4581
madhu@wilshiresoft.com
www.wilshiresoft.com