VISVESVARAYA TECHNOLOGICAL UNIVERSITY
"Jnana Sangama", Belagavi-590 018, Karnataka, India

An Internship Report On
CREDIT CARD FRAUD DETECTION

Submitted in partial fulfillment of the requirement for the award of the degree of
BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND ENGINEERING

Submitted By
CHANDANA A S 1SJ18CS016

Carried out at
Technologics Global Pvt Ltd
Raghavendra Complex, Jayanagar 4th Block, Bangalore

Under the guidance of
Internal Guide: Prof. Ashok K N, Assistant Professor, Dept. of CSE, SJCIT
External Guide: Mr. Sagar Chakraborty, Technical Alliance Manager, Technologics Global Pvt Ltd

S J C INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CHIKKABALLAPUR-562101
2021-2022

DECLARATION

I, CHANDANA A S, student of VIII semester B.E. in Computer Science & Engineering at S J C Institute of Technology, Chickballapur, hereby declare that the internship work entitled “CREDIT CARD FRAUD DETECTION” has been independently carried out by me under the supervision of Prof. Ashok K N, Assistant Professor, and the coordinator Prof. Swetha T, Assistant Professor, and is submitted in partial fulfillment of the course requirement for the award of the degree of Bachelor of Engineering in Computer Science & Engineering of Visvesvaraya Technological University, Belagavi, during the year 2021-2022. I further declare that the report has not been submitted to any other University for the award of any other degree.

PLACE:
Date:
CHANDANA A S
USN: 1SJ18CS016

ABSTRACT

It is vital that credit card companies are able to identify fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Such problems can be tackled with data science, and its importance, along with that of machine learning, cannot be overstated. This project illustrates the modelling of a data set using machine learning for credit card fraud detection.
The credit card fraud detection problem involves modelling past credit card transactions, including those that turned out to be fraudulent. The resulting model is then used to recognize whether a new transaction is fraudulent or not. Our objective is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications. Credit card fraud detection is a typical example of a classification problem. In this work, we have focused on analyzing and pre-processing the data set and on applying multiple anomaly detection algorithms, namely Local Outlier Factor, Support Vector Machine and Isolation Forest, to the PCA-transformed credit card transaction data.

ACKNOWLEDGEMENT

With reverential pranam, I express my sincere gratitude and salutations to the feet of His Holiness Byravaikya Padmabhushana Sri Sri Sri Dr. Balagangadharanatha Maha Swamiji, and His Holiness Jagadguru Sri Sri Sri Dr. Nirmalanandanatha Maha Swamiji of Sri Adichunchanagiri Mutt for their unlimited blessings.

First and foremost, I wish to express my deep and sincere gratitude to our institution, Sri Jagadguru Chandrashekaranatha Swamiji Institute of Technology, for providing me an opportunity to complete my internship work successfully.

I extend a deep sense of sincere gratitude to Dr. G T Raju, Principal, S J C Institute of Technology, Chickballapur, for providing an opportunity to complete the internship work.

I extend special, heartfelt and sincere gratitude to our HOD, Dr. Manjunatha Kumar B H, Professor and Head of the Department, Computer Science and Engineering, S J C Institute of Technology, Chickballapur, for his constant support and valuable guidance during the internship work.

I convey my sincere thanks to my Internship Internal Guide, Prof. Ashok K N, Assistant Professor, Department of Computer Science and Engineering, S J C Institute of Technology, for his constant support, valuable guidance and suggestions during the internship work.
I am thankful to my Internship External Guide, Mr. Sagar Chakraborty, Technical Alliance Manager, Technologics Global Pvt Ltd, Bangalore, for providing valuable guidance and encouragement during the internship work.

I also feel immense pleasure in expressing deep and profound gratitude to our Internship Coordinator, Prof. Swetha T, Assistant Professor, Department of Computer Science and Engineering, S J C Institute of Technology, for her guidance and suggestions during the internship work.

Finally, I would like to thank all faculty members of the Department of Computer Science and Engineering, S J C Institute of Technology, Chickballapur, for their support. I also thank all those who extended their support and co-operation while bringing out this internship report.

CHANDANA A S (1SJ18CS016)

CONTENTS

Declaration
Abstract
Acknowledgement
Contents
List of Figures

Chapter No  Chapter Title
1  COMPANY PROFILE
   1.1 History of the Organization
       1.1.1 Objectives
       1.1.2 Operations of the Organization
   1.2 Major Milestones
   1.3 Structure of the Organization
   1.4 Services Offered
2  ABOUT THE DEPARTMENT
   2.1 Specific Functionalities of the Department
   2.2 Process Adopted
   2.3 Testing
   2.4 Structure of the Department
   2.5 Role and Responsibilities of Individuals
3  TASK PERFORMED
4  REFLECTION NOTES
   4.1 Experience
   4.2 Technical Outcomes
       4.2.1 System Requirement Specification
   4.3 System Analysis and Design
       4.3.1 Existing System
       4.3.2 Disadvantages of the Existing System
       4.3.3 Proposed System
       4.3.4 Advantages of the Proposed System
   4.4 System Architecture
       4.4.1 Data Flow Diagram
       4.4.2 UML Diagram
       4.4.3 Use Case Diagram
   4.5 Implementation
       4.5.1 Modules
   4.6 Screenshots
5  CONCLUSION
BIBLIOGRAPHY

LIST OF FIGURES

Figure No.  Name of the Figure
Figure 1.1.2  Organization operations
Figure 1.3  Organization structure
Figure 2.2  Process adopted
Figure 2.3  Department Structure
Figure 4.4.1  Data flow diagram
Figure 4.4.2  UML diagram
Figure 4.4.3  Use Case diagram
Figure 4.5.1  Libraries
Figure 4.5.2  Datasets
Figure 4.5.3  Transaction Class Distribution
Figure 4.5.4  Graph of Transaction
Figure 4.5.5  Amount of Transaction
Figure 4.5.6  Graph of Amount Transaction
Figure 4.5.7  Correlation
Figure 4.5.8  Graph of Correlation
Figure 4.5.9  Time of Transaction vs Amount by Class
Figure 4.5.10  Outlier Detection Method
Figure 4.5.11  Outlier Detection Method Cont.
Figure 4.6.1  Local Outlier Factor
Figure 4.6.2  Isolation Forest
Figure 4.6.3  Support Vector Machine

CHAPTER - 1
COMPANY PROFILE

1.1 History of the Organization

Technologics, headquartered in Bangalore, India, was established by technology pioneers with decades of experience across India and the Middle East in the controls and automation industry. Technologics commenced business as a designer and installer of multi-brand controls and automation distribution systems, providing a complete range of custom installation services, and has extensive experience in the development and integration of complex turn-key solutions. We propose, design, develop, train, install, integrate, operate and maintain state-of-the-art IBMS/SCADA and PLC automation systems and solutions. The services we offer span the full development and integration lifecycle, from definition of requirements to field testing of implemented solutions.

1.1.1 Objectives

The aim of Technologics is predominantly to provide a comprehensive training service that helps candidates improve their skills and qualifications in order to enhance their employability. This includes high-quality, intensive training that prepares candidates to sustain themselves in industry and to fulfill the dream of working in a core industry.
1.1.2 Operations of the Organization

The organization is operated by Mohammed Haneef, who is the Founder, Director and CEO of the company. The company began operations in India in 2010, but it has been in the market for almost two decades and has executed many projects in the Middle East. The company offers people the opportunity to advance more rapidly than is possible elsewhere. It continues to go to great lengths to identify and recruit the very best person for every position. They are big enough to solve the client’s problems, yet substantial enough to sustain the loyalty, intimacy and culture that they all treasure and that contribute greatly to their success.

Figure 1.1.2 Organization Operations

1.2 Major Milestones

The company is continuously involved in research on futuristic technologies and in finding ways to simplify them for its clients. One of its projects was a world finalist in the international innovation challenge called MASTERPIECE in Dubai. It has been exhibited at the NASSCOM Product Conclave and has received great appreciation from IT giants. The product has been patented under patent number 201741034208.

The company has developed a women’s safety device which sends the location of a woman in distress to the nearby police station. This product won the best ICT category project award at a state-level exhibition and was exhibited at the NASSCOM PRODUCT CONCLAVE 2017.

Its other research work includes the development of a device for the blind which can recognize objects and convert them into speech. This innovation has a lot of potential to help blind people.

Their other products include:
• Automation of production line and remote quality control monitoring system.
• Development of mobile app and website for sales of artistic and antique products.
• Development of an energy conservation system for paper machineries.
1.3 Structure of the Organization

Technologics Global Pvt Ltd is a research and development center and educational institute based in Bangalore, started by Mohammed Haneef. The company focuses on providing quality education on the latest technologies and on developing products that are of great need to society. It also distributes and sells the latest electronic innovation products, developed all over the globe, to its customers. It runs a project consultancy that undertakes projects from a wide range of companies, assists them technically, builds products and provides services to them. Its specialties include the Internet of Things, research and development, skill development, hardware design, and innovation. Technologics Global Pvt Ltd assigns a group of 2-3 trainers to every class to oversee the training period and to help students with their doubts.

Dept. of CSE, SJCIT, 2021-2022
Credit Card Fraud Detection

Figure 1.3 Organization Structure

1.4 Services Offered

We offer a wide range of services related to PLC, SCADA, Industrial Automation, Integrated Building Management System (IBMS), LabVIEW, ETAP, Mechanical Design (CAD / CAM / CAE), Oracle, JAVA JSP and Embedded Systems for the commercial, residential and industrial sectors.

CHAPTER - 2
ABOUT THE DEPARTMENT

2.1 Specific Functionalities of the Department

The department has around 18 members who specialize in a variety of fields, including IoT, skill development, ML, AI, project consultancy and hardware design. I worked in the Machine Learning domain. Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence.

2.2 Process Adopted

The Software Development Life Cycle (SDLC) is a process followed for a software project within a software organization.
It consists of a detailed plan describing how to develop, maintain, replace, and alter or enhance specific software. The life cycle defines a methodology for improving the quality of software and the overall development process. An SDLC process has the following steps:
• Planning
• Defining
• Designing
• Testing
• Deployment

Figure 2.2 Process adopted: SDLC

2.3 Testing

The various testing techniques used by the department can be summarized as follows:

• Functionality Testing of a Website: This process covers several testing parameters such as the user interface, APIs, database testing, security testing, client and server testing, and basic website functionalities. Functional testing is convenient and allows both manual and automated testing. It is performed to test the functionality of each feature on the website.

• Usability Testing: This type of testing covers the site navigation and the contents of the website.

• Interface Testing: The three areas tested here are the application, the web server and the database server.

• Database Testing: The database is a critical component of a web application, and stress must be laid on testing it thoroughly. Testing activities include checking whether any errors are shown while executing queries, verifying that data integrity is maintained while creating, updating or deleting data in the database, checking the response time of queries and fine-tuning them if necessary, and verifying that data retrieved from the database is shown accurately in the web application.

• Compatibility Testing: Compatibility tests ensure that the web application displays correctly across different devices. This includes the browser compatibility test: the same website displays differently in different browsers, so it is necessary to test that the web application is displayed correctly across browsers and that JavaScript, AJAX and authentication work properly.
• Pipeline Testing: After compatibility testing, all the microservices in the pipeline are tested together to check their compatibility and message passing. All the services and functionalities are kept in the pipeline and tested together. Afterwards, the whole pipeline is pushed to the deployment server.

2.4 Structure of the Department

The structure of the department is depicted in the following figure:

Figure 2.3 Department Structure

2.5 Roles and Responsibilities of Individuals

The different roles and responsibilities of individuals are:

• Project Manager: Project Managers play the lead role in planning, executing, monitoring, controlling, and closing projects. They are expected to deliver a project on time, within budget, and to the brief, while keeping everyone informed and happy.

• Tech Lead: The Technical Lead, as the name states, is responsible for leading a development team, which is not an easy task. The Technical Lead is the one who creates a technical vision and turns it into reality with the help of the team.

• HR Manager: The Human Resource Manager leads and directs the routine functions of the Human Resources (HR) department, including hiring and interviewing staff, administering pay, benefits, and leave, and enforcing company policies and practices.

• Senior Developer: Develops software solutions by studying information needs, conferring with users, studying systems flow, data usage, and work processes; investigating problem areas; and following the software development lifecycle. A senior developer may manage a team of developers and is expected to encourage creativity and efficiency throughout complex digital projects. Due to the pressurised nature of the role, a robust and organized approach to the work is needed to produce the best solutions.
• Junior Developer: Junior Software Developers are entry-level software developers who assist the development team with all aspects of software design and coding. Their primary role is to learn the code base, attend design meetings, write basic code, fix bugs, and assist the Development Manager in all design-related tasks.

CHAPTER - 3
TASK PERFORMED

3.1 Introduction

Every machine learning project begins with understanding the data and defining the objectives. While applying machine learning algorithms to the data set, we understand, build and analyze the data so as to get the end result. The following steps are involved in creating a well-defined ML project:
• Understand and define the problem
• Prepare the data
• Explore and analyze the data
• Apply the algorithms
• Reduce the errors
• Predict the result

To understand various machine learning algorithms, let us use the credit card fraud detection dataset, one of the most famous datasets available. I intend to investigate, research and present my findings in this submission. My goal for this project is to provide education within the field of machine learning, so as to provide knowledge to those interested in understanding it in more detail. This project has been approached with the intention of researching the credit card fraud detection dataset, writing and running scripts to test the set in Python, and then providing a written account of my view and knowledge of the data set.

3.2 Problem Statement

'Fraud' in credit card transactions is the unauthorized and unwanted usage of an account by someone other than the owner of that account. Necessary prevention measures can be taken to stop this abuse, and the behaviour of such fraudulent practices can be studied to minimize it and protect against similar occurrences in the future.
In other words, credit card fraud can be defined as a case where a person uses someone else’s credit card for personal reasons while the owner and the card-issuing authorities are unaware that the card is being used. Fraud detection involves monitoring the activities of populations of users in order to estimate, perceive or avoid objectionable behaviour, which consists of fraud, intrusion, and defaulting.

Fraud detection methods are continuously developed to keep up with criminals who adapt their fraudulent strategies. These frauds are classified as:
• Credit Card Frauds: Online and Offline
• Card Theft
• Account Bankruptcy
• Device Intrusion
• Application Fraud
• Counterfeit Card
• Telecommunication Fraud

3.3 Technology Used

• Anaconda
• Python programming language
• Different Python libraries

CHAPTER - 4
REFLECTION NOTES

4.1 Experience

The internship has been a really useful experience for me, and I learned a lot that will definitely be useful for my future studies. I am grateful that my assignments had a lot of variety instead of focusing on one specific area. This allowed me to learn more and also challenged me to overcome the many different kinds of difficulties I encountered during my internship. Having many assignments also required me to manage my work time efficiently and to prioritize urgent tasks. Some tasks required me to do research with little online documentation available; other tasks required me to attempt work that I had never done before, learning only from documentation. Although the tasks were sometimes difficult and overwhelming, I was excited to push my skills to the limit and carry out the tasks assigned to me. Besides technical skills, I also observed and learned a lot of soft skills from my supervisors and co-workers, such as professional communication and teamwork.
I have also learned a lot from my supervisor, who was always willing to help me when I faced difficulties and to share his knowledge and wisdom from his past experience. My internship experience has definitely improved my hard skills in IT and sharpened my soft skills a lot more than I expected. It has shaped a better mindset in me and motivated me to keep exploring and challenging myself in the world of information technology.

4.2 Technical Outcomes

• Learn the basics of AI and ML.
• Understand a wide variety of learning algorithms.
• Understand how to evaluate models generated from data.
• Apply the algorithms to real problems.
• Optimize the models learned and report on the expected accuracy that can be achieved by applying the models.

4.2.1 System Requirement Specification

Hardware Requirements
PROCESSOR : Intel i5
RAM : 4GB
HARD DISK : 16GB

Software Requirements
OPERATING SYSTEM : Linux/Windows
BACK-END : Python 3
OTHER BACKEND LIBRARIES : matplotlib, pandas, numpy, sklearn

4.3 System Analysis and Design

4.3.1 Existing System

• In existing systems, credit card fraud detection has been done using different ML algorithms.
• The ML algorithms which have been used in the existing systems are Local Outlier Factor, Isolation Forest and Support Vector Machine.

4.3.2 Disadvantages of the Existing System

• Users' data cannot be shared, for legal and data protection reasons.
• This idea is difficult to implement in real life because it requires cooperation from banks.

4.3.3 Proposed System

• A system is developed to detect credit card fraud using ML algorithms.
• The detection is done using different unsupervised anomaly detection algorithms, namely Local Outlier Factor and Isolation Forest.
• The ensemble module in the sklearn package includes ensemble-based methods.
• It has functions for classification, regression and outlier detection.

4.3.4 Advantages of the Proposed System

• A larger number of attributes is considered for detection.
• Different algorithms are used to detect credit card fraud independently.
• Accuracy is also increased by using different types of algorithms.

4.4 System Architecture

4.4.1 Data Flow Diagram

Figure 4.4.1 Data Flow Diagram

4.4.2 UML Diagram

Figure 4.4.2 UML Diagram

4.4.3 Use Case Diagram

Figure 4.4.3 Use Case Diagram

4.5 Implementation

We obtained our dataset from Kaggle, a data analysis website which provides datasets. It contains only numerical input variables, which are the result of a PCA transformation. Besides the PCA components, the dataset has the columns Time, Amount and Class. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. The feature 'Class' is the response variable, and it takes the value 1 in case of fraud and 0 otherwise.

Libraries used:
Figure 4.5.1 Libraries

Datasets:
Figure 4.5.2 Datasets

Transaction Class Distribution:
Figure 4.5.3 Transaction Class Distribution
Figure 4.5.4 Graph of Transaction

Amount per Transaction by Class:
Figure 4.5.5 Amount of Transaction
Figure 4.5.6 Graph of Amount Transactions

Correlation:
Figure 4.5.7 Correlation
Figure 4.5.8 Graph of Correlation
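The figures in this section are screenshots of the analysis code and its output. As a rough, self-contained illustration of the outlier detection step, the sketch below fits Isolation Forest and Local Outlier Factor from sklearn and reports accuracy and precision for each. It is not the code shown in the screenshots: the data is a small synthetic stand-in generated with numpy rather than the Kaggle file, the parameter values (100 trees, 20 neighbours, 2000 valid and 20 fraud rows) are assumptions chosen for illustration, and the helper name to_class_labels is hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, precision_score
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the transaction table (an assumption, not the Kaggle data):
# 2000 valid rows clustered near the origin and 20 fraud rows shifted away from it.
rng = np.random.RandomState(42)
n_valid, n_fraud = 2000, 20
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_valid, 5)),
    rng.normal(loc=4.0, scale=1.0, size=(n_fraud, 5)),
])
# Same encoding as the 'Class' column: 0 = valid, 1 = fraud.
y = np.concatenate([np.zeros(n_valid, dtype=int), np.ones(n_fraud, dtype=int)])

# Share of fraud rows, passed to both detectors as the expected outlier fraction.
outlier_fraction = n_fraud / len(y)


def to_class_labels(raw_pred):
    """Map sklearn's convention (1 = inlier, -1 = outlier) to Class (0 = valid, 1 = fraud)."""
    return (raw_pred == -1).astype(int)


# Isolation Forest: isolates points with random splits; outliers need fewer splits.
iso = IsolationForest(n_estimators=100, contamination=outlier_fraction, random_state=42)
y_iso = to_class_labels(iso.fit_predict(X))

# Local Outlier Factor: compares each point's local density with that of its neighbours.
lof = LocalOutlierFactor(n_neighbors=20, contamination=outlier_fraction)
y_lof = to_class_labels(lof.fit_predict(X))

for name, y_pred in [("Isolation Forest", y_iso), ("Local Outlier Factor", y_lof)]:
    acc = accuracy_score(y, y_pred)
    prec = precision_score(y, y_pred, zero_division=0)
    print(f"{name}: accuracy={acc:.4f}, precision={prec:.4f}")
```

On the real data, X would hold the feature columns and y the 'Class' column. Because nearly all rows are valid, accuracy stays high even when few of the flagged rows are actually fraud, which is why precision is printed alongside it.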
Time of Transaction vs Amount by Class:
Figure 4.5.9 Time of Transaction vs Amount by Class

LOCAL OUTLIER FACTOR
Figure 4.5.10 Outlier Detection Method
Figure 4.5.11 Outlier Detection Method cont.

4.6 Screenshots

Figure 4.6.1 Local Outlier Factor
Figure 4.6.2 Isolation Forest
Figure 4.6.3 Support Vector Machine

CHAPTER - 5
CONCLUSION

This report has listed the most common methods of fraud along with their detection methods and reviewed recent findings in this field. It has also explained in detail how machine learning can be applied to get better results in fraud detection, along with the algorithm, pseudocode, an explanation of its implementation and the experimentation results.

While the algorithm does reach over 99.6% accuracy, its precision remains only at 28% when a tenth of the data set is taken into consideration. However, when the entire dataset is fed into the algorithm, the precision rises to 33%. This high accuracy is to be expected due to the huge imbalance between the number of valid and the number of fraudulent transactions.

BIBLIOGRAPHY

[1] John Richard D. Kho and Larry A. Vea, “Credit Card Fraud Detection Based on Transaction Behaviour,” Proc. of the 2017 IEEE Region 10 Conference (TENCON), Malaysia, November 5-8, 2017.
[2] Clifton Phua, Vincent Lee, Kate Smith and Ross Gayler, “A Comprehensive Survey of Data Mining-based Fraud Detection Research,” School of Business Systems, Faculty of Information Technology, Monash University, Wellington Road, Clayton, Victoria 3800, Australia.
[3] Suman, Research Scholar, GJUS&T Hisar HCE, Sonepat, “Survey Paper on Credit Card Fraud Detection,” International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 3, Issue 3, March 2014.
[4] WenFang Yu and Na Wang, “Research on Credit Card Fraud Detection Model Based on Distance Sum,” 2009 International Joint Conference on Artificial Intelligence.
[5] Massimiliano Zanin, Miguel Romance, Regino Criado and Santiago Moral, “Credit Card Fraud Detection through Parenclitic Network Analysis,” Hindawi Complexity, Volume 2018, Article ID 5764370, 9 pages.
[6] “Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 8, August 2018.
[7] Ishu Trivedi, Monika, Mrigya and Mridushi, “Credit Card Fraud Detection,” International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 1, January 2016.
[8] David J. Weston, David J. Hand, M. Adams, Whitrow and Piotr Juszczak, “Plastic Card Fraud Detection using Peer Group Analysis,” Springer, 2008.