Uploaded by ARYAN TIWARI

Analytical Case Competition Deck

APPROACH FOR ANALYSIS
OVERALL APPROACH OF THE ANALYSIS
DATA ANALYSIS
ABOUT LOS POLLOS HERMANOS
Source: dataset analysis
1
EXPLORATORY DATA ANALYSIS (EDA)
INSIGHTS
RECOMMENDATIONS
• 13K: over 13K consumers in the company
• $68.7mn: total revenue generated from orders
• 10K: SuperOrders made by consumers
• $7K: amount refunded on orders placed by users
• $519K: revenue generated from tips
Overview: OVERALL APPROACH
Data Visualization: The dashboard created in Power BI was split into 2 sections, SuperOrders and Non-SuperOrders, and the visualizations were built on this split
Modeling
Exploratory Data Analysis was performed on the most important variables in the dataset
• Predicting Probabilities: Built a classification model to predict the probability that a customer orders using the SuperOrder feature
• Conducted Interviews: Interviewed restaurant managers to understand the parameters they use to segment and target users, to support our insights
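The deck does not show its modeling code. As a minimal sketch of the kind of classifier described above, assuming a hypothetical per-customer feature vector (here two already-scaled features, `orders_last_month` and `avg_order_value`) and a 0/1 label for whether the customer used SuperOrder, the snippet fits a logistic regression by batch gradient descent in plain Python and returns a SuperOrder probability. The features and toy data are illustrative only, not taken from the dataset.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wd] by batch gradient descent on log-loss."""
    d = len(X[0])
    n = len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that the customer orders via SuperOrder (label 1)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical scaled features: [orders_last_month, avg_order_value]
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
print(round(predict_proba(w, [0.85, 0.85]), 3))
print(round(predict_proba(w, [0.15, 0.15]), 3))
```

In practice a library implementation (for example scikit-learn's `LogisticRegression` with `predict_proba`) would replace the hand-written loop; the plain version just makes the probability output explicit.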
INSIGHTS
Machine Learning Models
• Logistic Regression, KNN, SVM, Random Forest, XGBoost
Deep Learning Model
• Neural Networks
Model Loss & Accuracy
• Model Accuracy
• AUC-ROC
• Classification Report
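The models are compared on accuracy and AUC-ROC. As a hedged illustration of what those two numbers measure (not the team's evaluation code; in practice a library function such as scikit-learn's `roc_auc_score` would be used), here is a plain-Python version: accuracy at an assumed 0.5 threshold, and AUC computed as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The labels and scores are made up.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of predictions whose thresholded label matches the true label."""
    hits = sum(int((p >= threshold) == bool(t)) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores above random negative).

    Ties count as half a win. O(P*N) pairs, fine for small evaluation sets.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical true SuperOrder labels and model scores
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(accuracy(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

AUC is threshold-free, which is why it is a useful companion to accuracy when the classes are imbalanced, as they are before resampling.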
Consumer Churn
• Consumer Loyalty & Retention
RFM Analysis
• Customer Lifetime Values
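RFM analysis scores each customer on Recency (days since last order), Frequency (number of orders) and Monetary value (total spend). A minimal plain-Python sketch follows, assuming a hypothetical order log of `(customer_id, order_date, order_total)` rows and rank-based scores from 1 (worst) to `bins` (best). The deck does not say how its RFM segments were cut, so the binning here is an assumption.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical order log row: (customer_id, order_date, order_total)
Order = Tuple[str, date, float]


def rfm_table(orders: List[Order], as_of: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen: Dict[str, date] = {}
    freq: Dict[str, int] = {}
    spend: Dict[str, float] = {}
    for cust, day, total in orders:
        if cust not in last_seen or day > last_seen[cust]:
            last_seen[cust] = day
        freq[cust] = freq.get(cust, 0) + 1
        spend[cust] = spend.get(cust, 0.0) + total
    return {c: ((as_of - last_seen[c]).days, freq[c], spend[c]) for c in last_seen}


def rank_score(values: Dict[str, float], bins: int,
               higher_is_better: bool) -> Dict[str, int]:
    """Score customers 1..bins by rank (1 = worst); ties keep input order."""
    ordered = sorted(values, key=lambda c: values[c], reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}


def rfm_scores(orders: List[Order], as_of: date, bins: int = 3) -> Dict[str, str]:
    """Concatenate R, F and M scores into a segment code such as '333'."""
    table = rfm_table(orders, as_of)
    r = rank_score({c: v[0] for c, v in table.items()}, bins, higher_is_better=False)
    f = rank_score({c: v[1] for c, v in table.items()}, bins, higher_is_better=True)
    m = rank_score({c: v[2] for c, v in table.items()}, bins, higher_is_better=True)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}


orders = [
    ("A", date(2023, 1, 28), 40.0),
    ("A", date(2023, 1, 30), 55.0),
    ("A", date(2023, 1, 31), 60.0),
    ("B", date(2023, 1, 5), 20.0),
    ("C", date(2023, 1, 20), 35.0),
    ("C", date(2023, 1, 25), 30.0),
]
print(rfm_scores(orders, as_of=date(2023, 2, 1)))
```

Customers scoring high on all three dimensions are natural loyalty and retention targets, while low recency and frequency scores flag likely churn.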
RECOMMENDATIONS
• Forecast growth
EDA
• Python & Excel
• Power BI & Tableau
DATA PRE-PROCESSING
CONDUCTED DATA PRE-PROCESSING AND SPLIT TESTING TO GENERATE BETTER INSIGHTS
1
MAJOR STEPS INVOLVED
• Dropped statistically insignificant variables to conduct a better analysis
• Performed label encoding to convert ordinal data into numeric (integer) codes
• Ran an OLS logit regression to find statistically significant variables
• Variables with a p-value above 5% were dropped from the dataset
• Data complexity is reduced by decreasing the number of variables
2
BALANCING THE DATASET
• Balanced the dataset (SuperOrders) to perform the analysis on all variables
• Used imblearn's oversampling (RandomOverSampler) & undersampling (NearMiss v2)*
• Overall model performance was better on the undersampled dataset, as shown
• Scaling numerical columns puts features on a common range, which improves training of scale-sensitive models
UNDERSAMPLING & OVERSAMPLING TO BALANCE THE DATASET
Data                         0's      1's      Total
Imputed Data (Unbalanced)    3,605    695      4,300
Oversampled Data             3,605    3,605    7,210
Undersampled Data            695      695      1,390
*Condensed nearest neighbor and NearMiss v1, v2 & v3 were tried; NearMiss v2 performed best
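A minimal plain-Python sketch of the pre-processing steps on this slide follows. It covers ordinal label encoding, z-score scaling, and resampling to the class counts in the table (3,605 zeros and 695 ones). The deck names imblearn's random oversampler and NearMiss v2. This sketch swaps NearMiss v2 for *random* undersampling, which yields the same class counts but picks majority rows at random rather than by their distance to minority rows. The row contents, the category order, and all function names are hypothetical. The p-value filter is left out because it needs a regression library (for example the `pvalues` of a fitted statsmodels `Logit` model).

```python
import random
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def encode_ordinal(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Label-encode an ordinal column using an explicit category order (0, 1, 2, ...)."""
    codes = {cat: i for i, cat in enumerate(order)}
    return [codes[v] for v in values]  # KeyError flags an unexpected category


def standardize(column: Sequence[float]) -> List[float]:
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def _group(X: Sequence, y: Sequence[int]) -> Dict[int, list]:
    """Collect rows by class label, preserving input order."""
    groups: Dict[int, list] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_oversample(X: Sequence, y: Sequence[int], seed: int = 0):
    """Duplicate random minority rows until each class matches the majority count."""
    rng = random.Random(seed)
    groups = _group(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        picked = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out += picked
        y_out += [label] * len(picked)
    return X_out, y_out


def random_undersample(X: Sequence, y: Sequence[int], seed: int = 0):
    """Keep a random subset of each class, sized to the minority class count."""
    rng = random.Random(seed)
    groups = _group(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out += rng.sample(rows, target)
        y_out += [label] * target
    return X_out, y_out


# Hypothetical row ids standing in for the 4,300 imputed orders
X = list(range(4300))
y = [0] * 3605 + [1] * 695
print(Counter(random_oversample(X, y)[1]))
print(Counter(random_undersample(X, y)[1]))
print(encode_ordinal(["low", "high", "medium"], order=["low", "medium", "high"]))
print(standardize([10.0, 20.0, 30.0]))
```

Fit the scaling statistics and the resampler on the training split only, so the test split keeps its original class balance and measures real-world performance.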
EXPLORATORY DATA ANALYSIS
VISUALIZING THE DATA AND HIGHLIGHTING THE STRIKING INSIGHTS
1
SUPERORDER ANALYSIS
Chart displaying SuperOrders by Date Ordered
MOST SUPERORDERS WERE MADE IN THE MIDDLE OF THE MONTH
• An exploratory data analysis revealed that the highest percentage of orders were SuperOrders at the end of the month
• The company's share of SuperOrders is far higher than the industry standard
2
ORDERS FROM STATES
Chord Chart revealing Delivery Region to Order Date
MOST FREQUENT ORDERS MADE BY CONSUMERS IN BENGALURU
• To increase orders and generate more revenue, the company must grow in Mumbai & Delhi
• A high number of alternatives and strong competition are a probable reason for the low reach in Delhi & Mumbai
3
ORDER TOTAL ANALYSIS
Tree Map revealing Order Total of Consumers
HIGHEST ORDERS MADE BY CONSUMERS DURING THE MIDDLE OF THE MONTH
• The tree map above shows Order Total by the date the order was made
• Each branch shows more revenue generated from SuperOrders than from Non-SuperOrders, with SuperOrders making up the majority
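The three charts above were built in Power BI and Tableau. As a hedged sketch of the aggregations behind them, assume a hypothetical order table with order date, delivery city, a SuperOrder flag and order total. Also assume that "start/middle/end of the month" means days 1-10, 11-20 and 21-31, since the deck does not define these buckets. Under those assumptions, plain Python can compute the SuperOrder share per part of the month, order counts per city, and revenue split by order type.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical order row: (order_date, delivery_city, is_super_order, order_total)
Row = Tuple[date, str, bool, float]


def month_phase(d: date) -> str:
    """Bucket a day of month into start (1-10), middle (11-20) or end (21-31)."""
    if d.day <= 10:
        return "start"
    if d.day <= 20:
        return "middle"
    return "end"


def super_share_by_phase(rows: List[Row]) -> Dict[str, float]:
    """Fraction of orders in each month phase that were SuperOrders."""
    totals: Counter = Counter()
    supers: Counter = Counter()
    for d, _city, is_super, _total in rows:
        phase = month_phase(d)
        totals[phase] += 1
        supers[phase] += int(is_super)
    return {p: supers[p] / totals[p] for p in totals}


def revenue_by_type(rows: List[Row]) -> Dict[str, float]:
    """Sum of order totals split into SuperOrders vs Non-SuperOrders."""
    out: Dict[str, float] = defaultdict(float)
    for _d, _city, is_super, total in rows:
        out["SuperOrder" if is_super else "Non-SuperOrder"] += total
    return dict(out)


rows: List[Row] = [
    (date(2023, 3, 4), "Bengaluru", False, 300.0),
    (date(2023, 3, 14), "Bengaluru", True, 900.0),
    (date(2023, 3, 15), "Mumbai", True, 650.0),
    (date(2023, 3, 16), "Bengaluru", False, 250.0),
    (date(2023, 3, 27), "Delhi", True, 500.0),
    (date(2023, 3, 29), "Bengaluru", True, 700.0),
]
print(super_share_by_phase(rows))
print(Counter(city for _d, city, _s, _t in rows).most_common())
print(revenue_by_type(rows))
```

A share per phase (rather than a raw count) is what separates "most SuperOrders" from "the highest percentage of SuperOrders": a phase can have the most SuperOrders by volume without having the highest SuperOrder share.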