Predictive Analysis of Fixed Deposit User Engagement using Machine Learning and Data Science Tools

Table of Contents

1. Certificate
2. Acknowledgements
3. Abstract
4. List of Figures
5. List of Tables
6. Synopsis
7. CHAPTER 1 - INTRODUCTION
   1.1 Description of the Topic
   1.2 Problem Statement
   1.3 Objective
   1.4 Scope of Project
   1.5 Data Collection
8. CHAPTER 2 - LITERATURE REVIEW
9. CHAPTER 3 - SYSTEM DESIGN AND METHODOLOGY
   3.1 System Design
   3.2 Algorithm Used
10. CHAPTER 4 - IMPLEMENTATION AND RESULTS
   4.1 Hardware and Software Requirements
   4.2 Implementation Details
   4.3 Results
11. CHAPTER 5 - CONCLUSION AND FUTURE WORK
   5.1 Conclusion
   5.2 Future Scope
12. References
Certificate
I, Krishna Gupta (08013702021), certify that the Summer Training Report (BCA) entitled "Predictive Analysis of Fixed Deposit User Engagement using Machine Learning and Data Science Tools" is done by me and it is an authentic work
carried out by me at “Internshala Training”. The matter embodied in this project
work has not been submitted earlier for the award of any degree or diploma to the
best of my knowledge and belief.
Signature of the Student
Date:
Certified that the Project Report (BCA-356) entitled "Predictive Analysis of Fixed Deposit User Engagement using Machine Learning and Data Science Tools" done by the above student is completed under my guidance.
Signature of the Guide:
Date:
Name of the Guide: Ms. Anjum Rathi
Designation: Assistant Professor

Signature of the Guide:
Date:
Name of the Guide: Ms. Suman Singh
Designation: Assistant Professor

Countersigned by: HOD, Computer Science
ACKNOWLEDGEMENT
I would like to acknowledge the contributions of the people without whose help and guidance
this report would not have been completed.
I acknowledge with respect and gratitude the counsel and support of my training guide, Ms. Suman Singh, Assistant Professor, CS Department, whose expert guidance, support, encouragement, and enthusiasm have made this report possible. Her feedback vastly improved the quality of this report and provided an enthralling experience. I am indeed fortunate to be supported by her.
I am also thankful to Prof. (Dr.) Ganesh Wadhwani, H.O.D. of the Computer Science Department, Institute of Technology & Management, New Delhi, for his constant encouragement, valuable suggestions, moral support, and blessings. I shall ever remain indebted to the faculty members of the Institute of Technology & Management, New Delhi, for their persistent support and cooperation extended during this work.
This acknowledgement would remain incomplete if I failed to express my deep sense of obligation to my parents and God for their consistent blessings and encouragement.
Krishna Gupta
0807137020201
Abstract
In today's dynamic financial landscape, where banks and financial institutions strive to
maintain and expand their customer base, understanding and enhancing user engagement has
become imperative. This project endeavors to harness the power of machine learning and data
science tools to predict and optimize user engagement in fixed deposit services, a critical
offering in the realm of financial products.
Fixed deposits represent a long-term financial commitment for customers, and predicting their
engagement patterns is a multifaceted challenge. Leveraging advanced analytics techniques,
this project aims to decipher the factors influencing user engagement, such as customer
demographics, transaction history, and previous interactions with fixed deposit accounts.
The project's objectives encompass the entire data science lifecycle. It commences with data
collection and preprocessing, followed by feature engineering to extract meaningful insights.
Machine learning models, including regression, classification, and clustering algorithms, are
deployed to forecast user engagement. Moreover, customer segmentation techniques are
applied to tailor marketing strategies, and a recommendation system is devised to offer
personalized fixed deposit options.
Through rigorous performance evaluation and visualization, the project illuminates the inner
workings of predictive models, making them interpretable for stakeholders. The results offer
critical insights into user behavior and preferences, enabling banks to adapt their strategies,
enhance customer satisfaction, and optimize their product offerings.
This project underscores the transformative potential of data-driven decision-making in the
banking sector. By predicting customer engagement in fixed deposit services, financial
institutions can not only retain existing customers but also attract new ones, fostering
sustainable growth and competitiveness in a volatile financial landscape.
LIST OF FIGURES
Chapter-1:
Fig. 1.1 PERT chart describing the workflow of the project
Chapter-3:
Fig. 3.1 Figure explaining the process of Logistic Regression
Chapter-4:
Figure 4.1 Importing the libraries used in the project
Figure 4.2 Loading the data and converting it into a data frame to perform operations
Figure 4.3 Features in the dataset
Figure 4.4 Overview of the dataset
Figure 4.5 Data types of each variable
Figure 4.6 Describing the dataset
Figure 4.7 Missing values in the dataset
Figure 4.8 Splitting the data into training and validation sets
Figure 4.9 Making predictions on the validation set
Figure 4.10 Checking the accuracy score
Figure 4.11 Heatmap showing the correlation between different features of the dataset
Figure 4.12 Bar plot of the number of users who took an FD
Figure 4.13 Client occupations of those taking an FD
Figure 4.14 Age groups into which most of the clients fall
Figure 4.15 Pie chart showing the proportion of users who said yes to the subscription
Figure 4.16 Pair plot showing correlations in the train dataset
LIST OF TABLES
Chapter-1:
Table 1.1 Responsibility-wise work distribution
Chapter-2:
Table 2.1 Related work in this field
Chapter-4:
Table 4.1 Hardware and software requirements of the system developed
Table 4.2 Correlation table
Table 4.3 Comparison between the different machine learning models used in the project
LIST OF ABBREVIATIONS: None.
Synopsis
1. Title of the project:
“Predictive Analysis of Fixed Deposit User Engagement using Machine Learning and
Data Science Tools”.
2. Statement about the problem:
Fixed deposits (FDs) are a fundamental financial product offered by banks, often
characterized by long-term commitments. To maintain and grow their customer base,
banks need to continuously assess and improve the user experience for FD customers.
Predictive analysis can play a pivotal role in achieving this by identifying patterns and
trends in customer behavior.
3. Significance of the project:
• The project promotes data-driven decision-making, enabling the bank to tailor
its strategies based on user behavior and preferences.
• The project's significance lies in its potential to drive revenue, improve
customer satisfaction, and enable cost-effective marketing through data-driven
insights and predictions, ultimately benefiting both the bank and its customers.
4. Objective:
The objective of the project is to create a data-driven model that can accurately predict which customers will subscribe to a fixed deposit.
5. Scope: The project will carry out predictive analysis of fixed deposit user engagement using machine learning and data science, covering activities such as:
1. Collecting and preparing data on fixed deposit users.
2. Selecting influential features and engineering them.
3. Training machine learning models for user engagement prediction.
4. Validating and evaluating model performance.
5. Using the model to predict potential fixed deposit users and gain insights into
user behavior.
6. H/W and S/W specifications:
• Hardware
  • Windows 7 or higher
  • At least 256 MB of RAM
  • Intel Pentium processor or above
• Software
  • Python
  • Jupyter Notebook
  • Libraries: Matplotlib, Seaborn, Pandas, Scikit-Learn
7. Data collection and Methodology:
• Data collection was simply done from the Kaggle dataset website.
• Methodologies include:
  o Data preprocessing
  o Feature engineering
  o Data modelling
8. Algorithm:
This project uses Logistic Regression due to:
• Its high precision results.
• The low effect of outliers on this model.
9. Limitations and Constraints: Limitations of this project include:
• Outliers were not removed from the dataset, since Logistic Regression is only weakly affected by them during model training.
• External factors such as real-time changes in bank policy and market conditions are not considered, since this data was not present and would require more complex methods to handle.
10. Conclusion and Future Scope:
The "Predictive Analysis of Fixed Deposit User Engagement using Machine
Learning and Data Science Tools" project endeavors to harness the power of
data-driven insights to improve user engagement in fixed deposit services.
Through data analysis, predictive modeling, and personalized
recommendations, the project aims to enhance customer satisfaction and drive
business growth in the financial sector.
11. References and Bibliography:
https://www.kaggle.com/datasets/bankindiauser/8756123
https://www.sciencedirect.com/science/article/abs/pii/S0927538X17303037
https://www.sciencedirect.com/science/article/abs/pii/S0378426619301025
https://www.iasj.net/iasj/article/263875
https://journal.formosapublisher.org/index.php/eajmr/article/view/2524
https://ieeexplore.ieee.org/abstract/document/10080695/authors#authors
CHAPTER-1 INTRODUCTION
1.1 Description of the topic:
A sector that plays a very significant part in the Commercial and Economic backdrop of any
country is the banking sector. Data Mining technique can play a key role in providing different
methods to analyse data and to find useful patterns and to extract knowledge in this sector
(Vajiramedhin and Suebsing; 2014). Data mining helps in the extraction of useful information
from the data (Turban et al.; 2011). According to (Venkatesh and Jacob; 2016), machine
learning has more capability to gather information from the data, which results in more frequent
use of data mining methods in the banking sector. Due to a large amount of data gathered in
banks, data warehouses are required to store this data. Analysing and identifying patterns in such data helps banks identify trends and acquire knowledge. With this knowledge, organizations can more clearly understand
their customers and improve the services they provide. The topic, "Predictive Analysis of Fixed
Deposit User Engagement using Machine Learning and Data Science Tools," focuses on
leveraging data science and machine learning techniques to forecast user engagement in fixed
deposit products. This research aims to develop predictive models that can identify potential
fixed deposit customers and enhance financial institutions' marketing and product offerings.
By analyzing user behavior and employing advanced algorithms, this project seeks to provide
insights into customer preferences, ultimately improving engagement and investment in fixed
deposits.
1.2 Problem Statement:
The client, a retail bank heavily reliant on term deposits, seeks to optimize their marketing
efforts. Term deposits involve cash investments for a fixed period at an agreed-upon interest
rate. The bank employs various outreach methods, including email, ads, telephonic, and digital
marketing, with telephonic campaigns being particularly effective but expensive.
To make telephonic marketing cost-effective, the goal is to identify potential customers likely
to subscribe to term deposits in advance. Client data such as age, job type, marital status, and
call details (e.g., call duration, day, and month) are available. The task is to predict whether a
client will subscribe to a term deposit based on this data. This predictive model will enable
targeted call outreach, enhancing the efficiency and success of marketing campaigns.
1.3 Objectives:
The objective of the topic, "Predictive Analysis of Fixed Deposit User Engagement using
Machine Learning and Data Science Tools," is to develop predictive models that can accurately
forecast user engagement with fixed deposit products. This involves leveraging machine
learning algorithms and data science tools to analyze user behavior, identify potential
customers, and optimize marketing and product strategies for financial institutions. Ultimately,
the goal is to enhance customer engagement and promote investment in fixed deposits through
data-driven insights and predictions.
1.4 Scope of the Project:
1. Data Collection and Preparation:
• Gather relevant data on fixed deposit user behavior, demographics, and
engagement metrics.
• Clean and preprocess the data, addressing missing values and outliers.
2. Feature Engineering:
• Identify influential features affecting user engagement.
• Create new features or transform existing ones to improve predictive accuracy.
3. Model Selection and Training:
• Choose appropriate machine learning algorithms, including logistic regression.
• Train and fine-tune models on historical data to predict user engagement.
4. Model Evaluation and Validation:
• Assess model performance using metrics like accuracy, precision, and recall.
• Implement cross-validation techniques to ensure robustness (a brief cross-validation sketch is given after this list).
5. Predictive Analysis:
• Apply the trained model to new data to predict potential fixed deposit users.
• Generate insights into user preferences and behavior.
6. Interpretation of Results:
• Understand the significance of individual features in predicting user
engagement.
• Conduct feature importance analysis to identify key drivers.
7. Model Deployment:
• Create a user-friendly interface for real-time or batch predictions.
• Integrate the predictive model into the bank's marketing strategies.
8. Monitoring and Maintenance:
• Establish protocols for continuous model monitoring.
• Update the model as needed to adapt to evolving user behavior and market
trends.
9. Ethical and Regulatory Compliance:
• Ensure data usage and model deployment align with privacy and regulatory
standards.
• Address potential bias or discrimination in predictions.
10. Documentation and Recommendations:
• Compile comprehensive documentation covering data sources, preprocessing,
modeling, and results.
• Provide actionable recommendations for optimizing the bank's marketing and
engagement strategies based on predictive insights.
11. This project aims to leverage data science and machine learning tools to enhance user
engagement with fixed deposits, ultimately benefiting the retail banking institution.
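As an illustration of item 4 above, the following is a minimal, hypothetical sketch of cross-validated evaluation with scikit-learn. The synthetic data from make_classification merely stands in for the preprocessed fixed deposit features and target; on the real project data, X and y would come from the prepared dataset described in Chapter 4.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the preprocessed fixed deposit dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on accuracy, precision, and recall.
for metric in ["accuracy", "precision", "recall"]:
    scores = cross_val_score(model, X, y, cv=5, scoring=metric)
    print(f"{metric}: mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```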
1.5 Project planning Activities:
1.5.1 Team-Member wise work distribution table
The team consists of two members, Piyush Goel and Krishna Gupta, and the work is distributed in the following manner.
Table 1.1 Responsibility-wise work distribution

Team Member                    Role/Responsibility        Task/Contributions
Piyush Goel & Krishna Gupta    Project Data Collection    Data Collection and Preprocessing
Krishna Gupta                  ML Model Selection         Model Development and Training
Piyush Goel                    Graphic Designing          Preparation of Presentation on Canva
The dataset was selected jointly, with the agreement of both team members, from the Kaggle website. The preprocessing was also carried out with the help of both team members.
Krishna Gupta:
• Focused on developing the predictive model.
• Utilized Python and scikit-learn for model development and training.
Piyush Goel:
• Created the presentation for the project using the Canva graphic design tool.
1.5.2 PERT Chart
Fig. 1.1 PERT chart describing the workflow of the whole project.
CHAPTER 2 – LITERATURE REVIEW
2.1 SUMMARY OF PAPER STUDIES
The objective of the paper (Nazar, 2023-02-28) is to investigate the influence of investments
in fixed deposits and real estate on the credit rating of Iraq's National Insurance Company. It
addresses the timely and significant issue of credit ratings within Iraq's insurance market,
emphasizing its relevance. This research adopts a comprehensive approach, combining
theoretical insights with practical data derived from extensive records of the National Insurance
Company spanning from 2009 to 2020, complemented by personal interviews through field
visits. Statistical methodologies are applied for rigorous data analysis, ensuring a robust
examination. An essential discovery is the notable absence of credit ratings for both the
National Insurance Company and other insurance firms in Iraq, whether at the local or
international level. To address this gap, the research proposes the establishment of a specialized
credit rating institution within Iraq, fostering collaboration with key entities like the Central
Bank of Iraq, the Ministry of Finance, and the Insurance Bureau. This approach is envisioned
to be cost-effective when compared to relying solely on international credit rating agencies,
offering a tailored solution for Iraq's insurance industry.
The paper (Abhishek Rawat (KIT), 2023-7-12) presents a noteworthy implementation: a secure fixed deposit system powered by smart contracts, meticulously
developed within the Remix IDE. Notably, this system capitalizes on the Ethereum blockchain,
ensuring an exceptionally high level of transparency, data security, and immutability—critical
attributes for financial applications. The core of this innovation is a smart contract coded in
Solidity, a robust programming language, and developed in the Remix Integrated Development
Environment. This contract empowers the system with a broad spectrum of functionalities,
allowing the creation of new fixed deposit accounts, facilitating seamless fund deposits and
withdrawals, automating interest calculations, and ensuring timely user notifications.
Consequently, the research envisions this framework as a catalyst for financial institutions,
greatly enhancing their operational efficiency and user experience through automation and
heightened security.
The paper (Aditya Bodhankar, 2023) underscores the paramount
importance of marketing in business growth and improvement. It highlights the significance of
direct marketing campaigns in achieving specific business goals and the use of various
communication channels, including telephones, social media, and digital marketing, to reach
both local and distant clients. Recognizing the universal need for marketing, the abstract zooms
in on the banking sector, stressing the critical role of marketing analysis, particularly in loan
approval, insurance policies, and fixed deposits. Within this sector, banks employ targeted
strategies based on customer data, including transaction history. The study's core focus is the
analytical approach it adopts, deploying Bayesian Logistic Regression to delve into the
sanctioning of Bank Fixed Term Deposits. It's worth noting that customer eligibility for loans,
fixed deposits, or insurance hinges on a comprehensive analysis, which considers factors such
as transaction history and loan repayment punctuality, underlining the intricate nature of
customer decisions in the financial landscape.
The study by Frankfurt School of Finance and Management Deutsche Bundesbank,
Division Securities and Money Market Statistics (Willer, 2019-12-25) delves into the
implications of recent regulatory proposals, notably the European Deposit Insurance Scheme,
which seek to reshape deposit insurance systems. To evaluate the potential impacts of these
changes, it becomes crucial to discern the factors guiding depositors' decisions to withdraw or
shift their funds. Remarkably, the research introduces a novel insight: Google searches related
to 'deposit insurance' and similar terms can serve as indicative markers of depositors'
apprehensions and anxieties. These online search patterns effectively capture the sentiments
and concerns of depositors.
2.2 Integrated Summary of Literature Studied
The related work in this field is summarized in the following table, which lists the relevant research papers together with the models used in them and the accuracy achieved (or error reduction) with the different models.
Table 2.1 Related work on Predictive Analysis of Fixed Deposit
Ref.: (Nazar, 2023-02-28)
Model: Logistic Regression (81%), Random Forest (88%), SVM (91%)
Dataset: Cryptocurrency dataset
Findings and limitations: The study notes the absence of credit ratings for the National Insurance Company and other insurance firms in Iraq, whether at the local or international level, and proposes establishing a specialized credit rating institution within Iraq in collaboration with the Central Bank of Iraq, the Ministry of Finance, and the Insurance Bureau.

Ref.: (Willer, 2019-12-25)
Model: Logistic Regression (84%), SVM (86%), Deep Learning (88.8%)
Dataset: Share market dataset
Findings and limitations: A heterogeneous insurance of deposits can lead to a sudden, fear-induced reallocation of deposits, endangering the stability of the banking sector even in the absence of redenomination risks.

Ref.: (Abhishek Rawat (KIT), 2023-7-12)
Model: SVM (86%), Deep Learning (88.8%), XGBoost (87.9%)
Dataset: German bank dataset
Findings and limitations: The research demonstrates the successful implementation of a secure fixed deposit system using smart contracts in the Remix IDE. The blockchain-based solution leverages Ethereum, ensuring high levels of transparency, data security, and immutability, and the framework supports critical fixed deposit operations, including account creation, fund deposits, withdrawals, interest calculations, and user notifications.

Ref.: (Aditya Bodhankar, 2023)
Model: Logistic Regression (80%), SVM (82%), Decision Tree (88.8%)
Dataset: South India bank dataset
Findings and limitations: The findings highlight prevailing investor sentiments and preferences in India: Indian investors often perceive all investment approaches as carrying inherent risk, and the middle-income group tends to invest in traditional forms alongside emerging avenues to balance risk. These insights can inform investment strategies and financial planning in the region. However, respondents may not be completely truthful in their answers, and there is no way of checking misinterpretation or unintelligible replies.
CHAPTER-3 SYSTEM DESIGN AND METHODOLOGY
3.1 System Design:
3.1.1 Introduction to System Design
For a bank that relies heavily on term deposits, it is crucial to identify which customers are likely to subscribe, so that marketing effort can be targeted efficiently while maintaining customer satisfaction. The system design is crucial for accurately predicting fixed deposit subscription.
3.1.2 System Architecture:
The high-level architecture of the prediction system includes the following elements:
• Components: a CSV file containing the customer data as the data source, data preprocessing, data visualization, and data modelling.
• Data Flow: the data is read from the source CSV file using the Python pandas library and then used for modelling.
• Technology Stack: the project uses Python libraries such as Pandas, Seaborn, Matplotlib, and scikit-learn for the machine learning work.
3.1.3 Data Sources:
The data source is a CSV file downloaded from the Kaggle website, given in the references (Link: https://www.kaggle.com/datasets/ranitsarkar01/porter-delivery-timeestimation-dataset).
3.1.4 Data Preprocessing:
Data preprocessing steps that were followed to prepare the data for modelling are as follows:
• Data cleaning
• Feature engineering
  • Feature transformation
• Handling missing, null, and duplicate values
3.1.5 Modelling:
The machine learning model we employed for predicting fixed deposit subscription was Logistic Regression. This algorithm was chosen because the target variable is a binary outcome and because outliers have little effect on its results; in any case, the data source contained almost no outliers relative to the large number of records we had.
Since Python was used for the machine learning work, we used the scikit-learn library for model selection.
3.1.6 Challenges and Trade-offs:
The only challenge we encountered concerned outlier removal: the data was already fairly clean, and removing the remaining outliers biased the model, so we made a trade-off and did not remove them.
3.1.7 Conclusion:
Predicting whether a client will subscribe to a fixed deposit is a classification problem, and Logistic Regression is one of the best-suited algorithms for it. It gave higher accuracy than the other model we tried (Linear Regression) and reduced the error figures by a significant margin, providing the best results.
3.2 Algorithm Used
Many algorithms are available in machine learning, but choosing one for a particular problem depends entirely on the type of problem and the target values of the data. Choosing an appropriate model is a crucial part of a data science project, and we did so in ours.
After carefully analyzing all aspects of our dataset, we chose the Logistic Regression algorithm for our project.
First, we tried a Linear Regression model, and the results were reasonably good; however, with different combinations of test size and random state, the highest score we could achieve was 81.39%.
3.2.1 Logistic Regression
This type of statistical model (also known as logit model) is often used for classification and
predictive analytics. Logistic regression estimates the probability of an event occurring, such
as voted or didn’t vote, based on a given dataset of independent variables. Since the outcome
is a probability, the dependent variable is bounded between 0 and 1. In logistic regression, a
logit transformation is applied on the odds—that is, the probability of success divided by the
probability of failure. This is also commonly known as the log odds, or the natural logarithm
of odds, and this logistic function is represented by the following formulas:
Logistic(p) = 1 / (1 + exp(-p))                                  (1)
ln(p / (1 - p)) = β_0 + β_1 * X_1 + … + β_k * X_k                (2)
Fig. 3.1 Figure explaining the process of Logistic Regression
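As a small numeric illustration of equations (1) and (2), the sketch below (with made-up coefficient values) computes the logistic function and the corresponding log odds:

```python
import numpy as np

def logistic(z):
    # Equation (1): maps any real-valued score z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative coefficients beta_0 ... beta_k and one feature vector x
# (the leading 1.0 in x multiplies the intercept beta_0).
beta = np.array([-1.5, 0.8, 0.3])
x = np.array([1.0, 2.0, 0.5])

z = beta @ x                     # beta_0 + beta_1 * X_1 + ... + beta_k * X_k
p = logistic(z)
log_odds = np.log(p / (1 - p))   # equation (2): recovers z

print(f"z = {z:.3f}, p = {p:.3f}, log odds = {log_odds:.3f}")
```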
CHAPTER-4 IMPLEMENTATION AND RESULTS
4.1 Hardware and Software Requirements
Table 4.1 Hardware and software requirements of the system developed

Hardware:
• Intel Core i5 (11th Gen), 8 GB RAM
• Screen resolution of at least 800 x 600, required for proper and complete viewing of screens; a higher resolution is not a problem.

Software:
• Any Windows-based operating system (Windows 11)
• WordPad or Microsoft Word
• Python language, Jupyter Notebook
• scikit-learn
4.2 Implementation
4.2.1 Data Collection
Data collection in data science refers to the process of gathering, acquiring, and recording data
from various sources for the purpose of analysis, interpretation, and decision-making. This
dataset was obtained from Kaggle and is titled "Fixed Deposit Data"; it contains historical records of users who did or did not take an FD. Since the dataset is labelled, a supervised machine learning technique is applied.
Figure 4.1 Importing the libraries used in the project
Figure 4.2 Loading the data and converting it into a data frame to perform operations
Figure 4.3 Features in the dataset
Figure 4.4 Overview of the dataset
Figure 4.5 Data types of each variable
We can see that there are two data-type formats:
1. object: variables of this type are categorical. The categorical variables in our dataset are: job, marital, education, default, housing, loan, contact, month, poutcome, subscribed.
2. int64: this represents integer variables. The integer variables in our dataset are: ID, age, balance, day, duration, campaign, pdays, previous.
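A minimal sketch of the loading and inspection steps shown in Figures 4.1 to 4.5; the file name train.csv is an assumption and may differ from the actual Kaggle file:

```python
import pandas as pd

# Load the Kaggle CSV into a pandas DataFrame.
train = pd.read_csv("train.csv")

# Overview of the dataset: shape, first rows, and the data type of each variable.
print(train.shape)
print(train.head())
print(train.dtypes)

# Categorical (object) versus integer columns, as discussed above.
print(train.select_dtypes(include="object").columns.tolist())
print(train.select_dtypes(include="int64").columns.tolist())
```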
4.2.2 Data Preparation
Data preparation, also known as data preprocessing or data cleaning, is a crucial step in the
data science workflow. It involves transforming raw data from various sources into a format
that is suitable for analysis, modeling, and machine learning. We prepare raw data in this phase
so that meaningful insights can be extracted from it.
Figure 4.6 Describing the dataset
Figure 4.7 Missing values in the dataset
There are no missing values in the train dataset.
Next, we will start to build our predictive model to predict whether a client will subscribe to a
term deposit or not.
Since sklearn models take only numerical input, we will convert the categorical variables into numerical values using dummies. We will remove the ID variable, as its values are all unique, and then apply dummies. We will also remove the target variable and keep it in a separate variable.
Figure 4.8 Splitting the data into training and validation sets
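A hedged sketch of the preparation steps described above: dropping the ID column, separating the target, converting categoricals with dummies, and splitting into training and validation sets. The column names "ID" and "subscribed" and the yes/no labels follow the variable list in Section 4.2.1 and may need adjusting to the actual dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")                         # file name assumed

# Separate the target and drop the ID column, which carries no predictive signal.
target = train["subscribed"].map({"yes": 1, "no": 0})    # assumed yes/no labels
features = train.drop(columns=["ID", "subscribed"])

# Convert categorical variables into numerical dummy variables.
features = pd.get_dummies(features)

# Split the data into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    features, target, test_size=0.2, random_state=12, stratify=target
)
print(X_train.shape, X_val.shape)
```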
4.2.3 Model Building
Model building in data science is the process of creating predictive or descriptive models from
data to make informed decisions, solve problems, or gain insights. In this phase, we choose the
appropriate model as per our data, then we divide them into two parts: values (independent)
and target (dependent). Then we further divide them into training and testing so as to calculate
accuracy or score of our data.
In this project, we used this algorithm because the target variable is binary (whether or not a client subscribes) and the data is structured. Logistic Regression also produces interpretable results and is relatively insensitive to outliers in the data.
Figure 4.9 Making predictions on the validation set
Figure 4.10 Checking the accuracy score
We got an accuracy score of around 90% on the validation dataset. Logistic Regression has a linear decision boundary; if the data contained strong non-linearity, we would need a model that can capture it.
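A minimal sketch of the model building, prediction, and scoring steps (Figures 4.8 to 4.10). It repeats the preparation from the previous sketch so that it runs on its own; all file names, column names, and parameter values are assumptions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Prepare the data as in the previous sketch (file and column names assumed).
train = pd.read_csv("train.csv")
target = train["subscribed"].map({"yes": 1, "no": 0})
features = pd.get_dummies(train.drop(columns=["ID", "subscribed"]))
X_train, X_val, y_train, y_val = train_test_split(
    features, target, test_size=0.2, random_state=12, stratify=target
)

# Fit logistic regression on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions on the validation set and check the accuracy score.
predictions = model.predict(X_val)
print("Validation accuracy:", accuracy_score(y_val, predictions))
```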
4.3 Results
We conclude that the predictions produced by our model are somewhat more accurate than a simple baseline based on the average of the data. Also, no project is perfect, i.e. there is always scope for improvement; nevertheless, we consider this one of the algorithms best suited to this dataset.
DATA VISUALIZATION USING MATPLOTLIB & SEABORN LIBRARY
Heatmap to see the correlation between the attributes:
We can infer that the duration of the call is highly correlated with the target variable. This can be verified intuitively: the longer the call, the higher the chance that the client is showing interest in the term deposit, and hence the higher the chance that the client will subscribe to it.
Figure 4.11 Heatmap showing the correlation between different features of the dataset
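A hedged sketch of the correlation heatmap in Figure 4.11, computed over the numeric columns only, using Seaborn and Matplotlib as listed in the technology stack (file name assumed):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

train = pd.read_csv("train.csv")                      # file name assumed

# Pearson correlation over the numeric columns only.
corr = train.select_dtypes(include="number").corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between numeric features")
plt.tight_layout()
plt.show()
```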
Bar Plot
So, 3,715 users out of a total of 31,647 have subscribed, which is around 12%. We now explore the variables to get a better understanding of the dataset.
Figure 4.12 Bar plot of the number of users who took an FD
We see that most of the clients belong to blue-collar jobs, and students are the fewest in number, as students generally do not take a term deposit.
Figure 4.13 Client occupations of those taking an FD
Displot
We can infer that most of the clients fall in the age group between 20-60.
Figure 4.14 Age groups into which most of the clients fall
Pie charts
Figure 4.15 Pie chart showing the proportion of users who said yes to the subscription
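A minimal sketch of the subscription count plots (the bar plot of Figure 4.12 and the pie chart of Figure 4.15), again assuming a "subscribed" column with yes/no values:

```python
import matplotlib.pyplot as plt
import pandas as pd

train = pd.read_csv("train.csv")                      # file name assumed

counts = train["subscribed"].value_counts()
print(counts)
print((counts / len(train) * 100).round(1))           # roughly 12% said yes

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
counts.plot(kind="bar", ax=axes[0], title="Number of users who took an FD")
counts.plot(kind="pie", ax=axes[1], autopct="%1.1f%%",
            title="Share of users who said yes to the subscription")
axes[1].set_ylabel("")
plt.tight_layout()
plt.show()
```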
Figure 4.16 Pair plot showing correlations in the train dataset
Comparison table of the models that we used in our project:
Table 4.3 Comparison between the different models used in the project

Decision Tree:
Test Size    Accuracy
11162        87.02%
200          92.01%

Logistic Regression:
Test Size    Accuracy
31674        88.44%
200          90.34%
CHAPTER-5 CONCLUSION AND FUTURE WORK
The project's conclusion highlights the key outcomes and insights derived from using logistic
regression to predict the users who would take fixed deposits (FDs) with an impressive
accuracy score of approximately 90%. Here's a concise conclusion for your project: This
project successfully employed logistic regression as a predictive modeling technique to identify
potential users who are likely to opt for fixed deposits (FDs). The model achieved a remarkable
accuracy score of approximately 90%, demonstrating its effectiveness in making accurate
predictions. Although it achieved its objective of accurately predicting the users who would take fixed deposits (FDs), it does not consider real-time changes in policy and many other factors, which limits it to the features present in the dataset. In summary, the project
underscores the power of data science and machine learning in improving decision-making
processes within the financial sector. The logistic regression model's high accuracy signifies its
utility in identifying potential FD users, contributing to more effective financial planning and
customer engagement strategies.
In the near future, there are several promising avenues for further enhancing the project's
predictive capabilities. These include refining the dataset with additional relevant features,
considering alternative machine learning algorithms to optimize accuracy, and updating the
data to maintain relevance in a dynamic market. Additionally, exploring time series analysis to
capture temporal patterns, implementing customer segmentation for personalized strategies,
and assessing the risk profile of fixed deposit users are important areas for development. The
project can also benefit from real-time prediction capabilities, incorporating user feedback to
refine predictions, and addressing ethical considerations in data usage. Creating a user-friendly
deployment interface and establishing ongoing monitoring and maintenance protocols are
crucial for long-term success. Finally, considering market expansion possibilities and
collaborating with industry experts can help further refine and extend the project's impact in
the financial sector.
References
1. Abhishek Rawat (KIT), H. U. (2023-07-12). Implementation of Algorithms for Fixed Deposit Using Smart Contract.
2. Aditya Bodhankar, D. P. (2023). Bank Fixed Term Deposit analysis using Bayesian Logistic
Regression.
3. Nazar, A. (2023-02-28). Analyzing the risks of fixed deposit investments and real estate and
their impact on enhancing the credit rating of insurance companies.
4. Willer, F. (2019-12-25). Fear, deposit insurance schemes, and deposit reallocation in the
German banking system.
5. https://www.kaggle.com/datasets/akaretirastogi897/785268
6. https://www.sciencedirect.com/science/article/abs/pii/S0927538X17303037
7. https://www.sciencedirect.com/science/article/abs/pii/S0378426619301025
8. https://www.iasj.net/iasj/article/263875
9. https://journal.formosapublisher.org/index.php/eajmr/article/view/2524
10. https://ieeexplore.ieee.org/abstract/document/10080695/authors#authors