Uploaded by lijinze0430

Final Project Example

Will a Product Sell Well with Wish?
What is Wish?
• Global E-Commerce Platform & Most Downloaded Mobile Shopping Application Worldwide
• Direct-to-Consumer Service connecting buyers to (predominantly) low-cost, Asian manufacturers, removing the overhead and markups of traditional retailers
• Business Model: Wish generates revenue by retaining a percentage of merchant sales
• Tradeoff: Lower Prices but Longer Delivery Lead Times than other E-Commerce Retailers
Key Figures
• 100M Monthly Active Users
• 2M+ Products Sold Daily
• 500K+ Merchants
• 150M+ Items for Sale
Merchant Options on Wish
How can sellers boost sales volumes on the e-commerce platform?
Keep Prices Low
Add Sizes & Colors
Add Extra Images
Upload High-Quality Images
Ship Worldwide
Wish Express
• Program offering express shipping to customers
• Available for products that can meet the order delivery deadlines set for each designated region
• Merchants receive added benefits such as more product impressions
ProductBoost
• Using Wish’s proprietary algorithms, ProductBoost increases the exposure of products in front of the
customers who are most likely to buy them
• Boosted products move higher up within product rankings if they are relevant to Wish customers
• Optimizes cost of impressions and promotes merchants' best products
Wish Promotions
• Verified by Wish – Products that receive exceptional customer feedback
• CollectionBoost – Allows merchants to promote products in customized collections
• Promoted Products – Wish products selected due to relevant audience, conversion, and popularity
Background Information
To evaluate which factors affect product sales, data on 2020 summer items was extracted from the Wish platform and stored in a ‘Summer Product Sales.csv’ file¹ for analysis.
Data Info
• 2020 summer sales on the Wish platform
• 1,573 data points
• Over 40 different attributes
Key Attributes
• Product Info
• Sales Records
• Ratings & Badges
• Shipping Details
• Merchant Info
Data Types
• Object
• Float64
• Int64
¹ Summer Product Sales.csv data file provided by Professor Jie Li in K513 Class
Data Cleansing
Upon analysis of the raw data, the team identified several data cleansing activities that needed to take place prior to
processing the data through our predictive models.
Data Issues Identified
• Unnecessary descriptive data (e.g. Title, Currency, Product URLs, etc.)
• Duplicate data rows (e.g. duplicate product IDs)
• Missing data elements (e.g. product attributes)
• Inconsistent values (e.g. differing product sizes)
Data Corrections Applied
• Dropped descriptive data columns that did not help the model
• Filled in missing data elements in certain columns
• Removed rows with missing critical data elements (e.g. Total Rating Count)
• Mapped inconsistent data elements to generalized buckets or dummy variables
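The corrections above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the team's actual code: the sample rows are invented, and the `clean_rows` helper, the lowercase column spellings, and the choice to fill a missing `product_color` with "unknown" are all assumptions made for the example.

```python
# Hypothetical sample rows; column names loosely follow Appendix I, values are invented.
RAW_ROWS = [
    {"product_id": "a1", "title": "Summer dress", "price": 8.0,
     "rating_count": 120, "product_color": "red"},
    {"product_id": "a1", "title": "Summer dress", "price": 8.0,
     "rating_count": 120, "product_color": "red"},     # duplicate product id
    {"product_id": "b2", "title": "Tank top", "price": 5.5,
     "rating_count": None, "product_color": "blue"},   # missing critical value
    {"product_id": "c3", "title": "Shorts", "price": 11.0,
     "rating_count": 36, "product_color": None},       # missing attribute
]

DESCRIPTIVE_COLUMNS = {"title", "currency", "product_url"}  # assumed spellings
CRITICAL_COLUMNS = ("rating_count",)                        # rows without these are removed
FILL_DEFAULTS = {"product_color": "unknown"}                # assumed fill value


def clean_rows(rows):
    """Drop duplicates, drop rows missing critical data, drop descriptive columns, fill gaps."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Keep only the first occurrence of each product id.
        if row["product_id"] in seen_ids:
            continue
        seen_ids.add(row["product_id"])
        # Remove rows that lack a critical data element.
        if any(row.get(col) is None for col in CRITICAL_COLUMNS):
            continue
        # Drop descriptive columns that do not help the model.
        kept = {k: v for k, v in row.items() if k not in DESCRIPTIVE_COLUMNS}
        # Fill in the remaining missing elements with a default.
        for col, default in FILL_DEFAULTS.items():
            if kept.get(col) is None:
                kept[col] = default
        cleaned.append(kept)
    return cleaned


cleaned_rows = clean_rows(RAW_ROWS)
print(cleaned_rows)
```

The mapping of inconsistent values to generalized buckets is a separate step; Appendix II shows the color example.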
Finalized Dataset
• 29 Numerical Variables
• 16 Categorical Variables
Target Variable:
o Unit Sales
o Range of Unit Sales
4 Binned Unit Sales Ranges:
o < 100
o 100 – 1000
o 1000 – 5000
o 5000+
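A short sketch of how a unit sales figure could be assigned to one of the four ranges. The slides do not say which side of each boundary is inclusive (the appendix labels the first bin "100 and less"), so treating each lower edge as inclusive is an assumption here, as are the names `BIN_EDGES`, `BIN_LABELS`, and `unit_sales_range`.

```python
from bisect import bisect_right

BIN_EDGES = [100, 1000, 5000]  # boundaries between the four ranges
BIN_LABELS = ["< 100", "100 – 1000", "1000 – 5000", "5000+"]


def unit_sales_range(units_sold):
    """Return the range label for a units-sold value (lower edges inclusive, by assumption)."""
    return BIN_LABELS[bisect_right(BIN_EDGES, units_sold)]


for units in (50, 100, 999, 1000, 20000):
    print(units, "->", unit_sales_range(units))
```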
Numerical Variable Highlights
• Price Distribution: Most product prices range from 0 to 10, with a mean of 8.46
• Rating Distribution: The majority of the products were rated between 0 and 4, with a mean of 3.81
• Shipping Countries Distribution: Most products were shipped to 32–43 different countries, with a mean of 40.62
Numerical Variable Analysis
A correlation matrix analysis of the numerical variables in the dataset highlights potential drivers of total units sold
• Rating Counts – Total rating counts for a product had a 0.90 correlation to units sold (High Correlation)
• 5-Star Rating Counts – Number of 5-star ratings for a product had a 0.88 correlation to units sold (High Correlation)
• 1-Star Rating Counts – Number of 1-star ratings for a product had a 0.83 correlation to units sold (High Correlation)
Correlation Matrix
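For readers who want to reproduce this kind of check, the sketch below computes the Pearson correlation of each numerical column with units sold. It assumes Pearson correlation is the measure behind the matrix (the slides do not name it); the sample values and the `correlation_with_target` helper are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_with_target(columns, target):
    """Correlate every column except the target with the target column."""
    return {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }


# Invented sample data, not taken from the Wish dataset.
sample = {
    "units_sold": [100, 1000, 5000, 20000],
    "rating_count": [20, 150, 900, 4100],
    "price": [12.0, 5.0, 9.0, 7.0],
}
for name, value in correlation_with_target(sample, "units_sold").items():
    print(f"{name}: {value:.2f}")
```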
Categorical Variable Highlights
Upon investigation of the categorical variables, the team summarized several key data points below (each shown as Yes / No shares)
• % uses ad boosts
• % from local
• % has picture
• % receives consistent good evaluations
• % has multi color
• % has fashion tag
Categorical Variable Analysis
Graphical comparisons of the values in the categorical variables helped identify potential drivers of units sold
• Merchant_has_profile_picture
• Color_multicolor
• Badge_Product_Quality
• Color_Blue/Black/Purple
• Tag_Fashion
Other variables did not display as much variance between ‘Yes’ and ‘No’ data (e.g. ad_boosts, shipping_is_express, color_green, badge_fast_shipping)
Overview of Models
We built the following models to find the best fit to the data set. Selections are based on data conditions and on model implementation and interpretation needs.
Multiple Regression Model (target: Units Sold)
Linear Regression
Model
• Predict the target as $y = w^T x = w_0 + w_1 x_1 + \cdots + w_n x_n$
• Learn by minimizing the sum of squared errors
Benefits
• Easier to interpret the output coefficients
• Simple to implement
Classification Model (target: Unit Sales Ranges)
KNN
Model
• Predict that a new instance belongs to the majority class of its K nearest neighbors
• Instance-based model
Benefits
• No assumption on the shape of data
• Easy to implement for multiclass problem (Unit Sales Bins)
Logistic Regression
Model
• Calculate the probability of a new instance belonging to a certain class. If the probability > cut-off, the instance is assigned to that class
• Learn by minimizing logistic losses + penalty on large weights
Benefits
• Works well with sparse data
• Reasons for prediction are relatively easy to interpret
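The two prediction rules in this overview can be written out directly. The sketch below is illustrative only: it shows the linear prediction as a weighted sum and a binary logistic prediction with a cut-off; the weights are invented, and the four-class model used later would need one such score per class (for example one-vs-rest), which is not shown.

```python
from math import exp


def linear_predict(weights, features):
    """y = w_0 + w_1 x_1 + ... + w_n x_n, where weights[0] is the intercept w_0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def logistic_probability(weights, features):
    """Probability of the positive class: the sigmoid of the linear score."""
    return 1.0 / (1.0 + exp(-linear_predict(weights, features)))


def classify(weights, features, cutoff=0.5):
    """Assign the instance to the class when its probability exceeds the cut-off."""
    return logistic_probability(weights, features) > cutoff


# Invented weights and features for illustration.
example_weights = [1.0, 2.0, -1.0]
example_features = [3.0, 4.0]
print(linear_predict(example_weights, example_features))
print(logistic_probability(example_weights, example_features))
print(classify(example_weights, example_features))
```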
Model Building – Multiple Regression
The team built all three model types and evaluated them independently to determine which would produce the most accurate model with minimal underfitting or overfitting.
Multiple Regression Model
• Training Accuracy: 85.3%
• Test Accuracy: 71.6%
• Linear Regression model was created with a target variable of “Units Sold”
• Ridge and Lasso Regressions were used to reduce the complexity of the model and limit underfitting and overfitting
• Hyperparameters alpha and max iteration were tuned to obtain the optimum outputs
Data exploration used to adjust variables for classification model
Model Building – KNN
KNN Model
• Training Accuracy: 64.9%
• Test Accuracy: 48.3%
• KNN model was created with a target variable of “Unit Sales Range” instead of “Units Sold”
• The hyperparameter, number of nearest neighbors, was tuned to obtain the optimum outputs
[Figure: Hyperparameter tuning to determine best output. Accuracy (0% to 100%) versus Nearest Neighbor Hyperparameter (1 to 10)]
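A compact sketch of the KNN rule and of sweeping the nearest-neighbor hyperparameter. It is a from-scratch illustration on invented two-feature points, not the team's implementation; the function names and the "low"/"high" labels are hypothetical, and ties are left to whichever class `Counter` reports first.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to the query."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(train_points, train_labels, test_points, test_labels, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_points, train_labels, point, k) == label
               for point, label in zip(test_points, test_labels))
    return hits / len(test_points)


# Invented data: two well-separated groups of points.
TRAIN_POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
TRAIN_LABELS = ["low", "low", "low", "high", "high", "high"]
TEST_POINTS = [(2, 2), (9, 9)]
TEST_LABELS = ["low", "high"]

# Sweep the hyperparameter and compare accuracy for each value of k.
for k in range(1, 6):
    print(k, knn_accuracy(TRAIN_POINTS, TRAIN_LABELS, TEST_POINTS, TEST_LABELS, k))
```

A gap between training and test accuracy as large as the one reported for this model would show up in such a sweep as test accuracy that stays well below training accuracy for every k.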
Model Building – Logistic Regression
Logistic Regression Model
• Training Accuracy: 80.89%
• Test Accuracy: 73.39%
• Logistic Regression model was created with a target variable of “Unit Sales Range” instead of “Units Sold”
• Hyperparameter, C, was tuned to obtain the best model performance without overfitting
[Figure: Hyperparameter tuning to determine best output. Accuracy (0 to 1) versus Hyperparameter C (0.001 to 1000000, in powers of ten)]
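To make the role of C concrete, the sketch below fits a one-feature logistic regression by gradient descent for several values of C. It assumes C is the inverse of the regularization strength, as in scikit-learn's `LogisticRegression`; the slides do not state which library was used. The data, learning rate, and step count are invented for illustration.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, C, steps=2000, lr=0.05):
    """Minimise mean log loss + w^2 / (2 * C * n) for one feature by gradient descent."""
    n = len(xs)
    w = b = 0.0
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the log loss plus the gradient of the penalty on the weight.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + w / (C * n)
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Invented, perfectly separable data.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS = [0, 0, 0, 1, 1, 1]

# Small C means a strong penalty (small weight); large C lets the weight grow.
for C in (0.01, 1, 100):
    weight, bias = fit_logistic(XS, YS, C)
    print(C, round(weight, 3))
```

Under this convention a very small C pushes the model toward underfitting and a very large C allows overfitting, which is why accuracy is compared across a wide range of C values.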
Model Evaluation – Logistic Regression
Once the best model (Logistic Regression) was identified, it was evaluated against a series of scores to determine its accuracy.
Units Sold Analysis
• Overall accuracy of the model is 73% across the four category classes
• For the 5,000 category, recall is significantly low, meaning that few of the actual positive cases are predicted correctly
• Confusion Matrix evaluation shown in Appendix VI
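The scores discussed here follow directly from a confusion matrix. The sketch below builds one from invented labels (classes 0 to 3, matching the legend in Appendix VI) and derives overall accuracy and per-class recall; none of the numbers come from the team's model.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts instances of actual class a that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_per_class(matrix):
    """For each class: correct predictions of that class out of its actual cases."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Invented labels for ten products.
ACTUAL = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
PREDICTED = [0, 0, 1, 1, 2, 2, 2, 2, 3, 2]

matrix = confusion_matrix(ACTUAL, PREDICTED, 4)
print(matrix)
print(overall_accuracy(matrix))
print(recall_per_class(matrix))
```

In this toy example the last class has the lowest recall because most of its actual cases are predicted as the neighboring class, the same kind of pattern the slide describes.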
Recommendations When Using Wish
Our predictive analytical model has determined what factors impact total product sales on the Wish platform. How do these
compare to Wish’s own guidance? In general, our team has the following recommendations for merchants:
• Sell Women’s Fashion Items
• Maintain Good Product and Merchant Ratings
• Earn Local and Product Quality Badges
• Avoid Yellow & Brown Products
• Have a Profile Picture
• Use Ad Boosts to Market Products
• Offer Express Shipping
For a more customized approach, utilize our team’s model to determine if Wish is the right fit for your
products and business strategy by accurately predicting future sales volumes on the platform.
Future Directions
With additional resources, we believe the model could be improved as follows:
• More data – Use prior-year data in addition to the 2020 data to further generalize the factors impacting sales
• New features – Select new attributes which may better explain the relationship between the independent variables and the target variable
• Ensemble methods (Bagging / Boosting) – Combine the results of multiple weak models to produce better results
Appendix
Appendix I - Data Points Used
To evaluate which factors affect product sales, data on 2020 summer items was extracted from the Wish platform for analysis. 1,573 data points and over 40 different attributes were collected and stored in a ‘Summer Product Sales.csv’ file¹. Data types include object, float64 and int64.
• Title
• Price
• Retail_Price
• Currency
• Units_Sold
• Uses_Ad_Boosts
• Rating
• Rating_Count
• Rating_Five_Count
• Rating_Four_Count
• Rating_Three_Count
• Rating_Two_Count
• Rating_One_Count
• Badges_Count
• Badge_Local_Product
• Badge_Product_Quality
• Badge_Fast_Shipping
• Tags
• Product_Color
• Product_Variation_Size_ID
• Product_Variation_Inventory
• Shipping_Option_Name
• Shipping_Option_Price
• Shipping_Is_Express
• Countries_Shipped_To
• Inventory_Total
• Has_Urgency_Banner
• Urgency_Text
• Origin_Country
• Merchant_Title
• Merchant_Name
• Merchant_Info_Subtitle
• Merchant_Rating_Count
• Merchant_Rating
• Merchant_ID
• Merchant_Has_Profile_Picture
• Product_URL
• Product_Picture
• Product_ID (used as Index)
• Theme
¹ Summer Product Sales.csv data file provided by Professor Jie Li in K513 Class
Appendix II – Data Cleansing Example
Data Cleaned: Color; Method: Creating generalized buckets
New Color Group: Org. Colors
• 'Blue/Black/Purple': 'blue', 'coolblack', 'lightblue', 'navyblue', 'lakeblue', 'black', 'purple', 'offblack', 'lightpurple', 'black & stripe', 'darkblue', 'violet', 'navy', 'navy blue', 'prussianblue', 'Black', 'skyblue', 'Blue', 'denimblue'
• 'Orange/Red/Pink': 'red', 'Rose red', 'orange', 'claret', 'rose', 'dustypink', 'winered', 'jasper', 'rosered', 'watermelonred', 'rosegold', 'orange-red', 'coralred', 'pink', 'lightred', 'wine', 'burgundy', 'Pink', 'wine red', 'lightpink', 'RED'
• 'Multicolor': 'white & black', 'winered & yellow', 'black & green', 'pink & grey', 'white & green', 'pink & black', 'orange & camouflage', 'black & yellow', 'navyblue & white', 'pink & white', 'multicolor', 'blackwhite', 'black & blue', 'leopard', 'leopardprint', 'floral', 'pink & blue', 'whitefloral', 'black & white', 'brown & yellow', 'greysnakeskinprint', 'red & blue', 'rainbow', 'camouflage', 'blue & pink', 'gray & white', 'white & red'
• 'White/Grey': 'white', 'offwhite', 'gray', 'lightgrey', 'ivory', 'star', 'grey', 'whitestripe', 'lightgray', 'White', 'silver'
• 'Yellow/Brown': 'lightyellow', 'coffee', 'camel', 'khaki', 'beige', 'lightkhaki', 'gold', 'apricot', 'tan', 'nude', 'yellow', 'brown'
• 'Green': 'light green', 'armygreen', 'applegreen', 'army green', 'fluorescentgreen', 'green', 'darkgreen', 'mintgreen', 'army', 'lightgreen', 'Army green'
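The bucket mapping above lends itself to a lookup table followed by dummy variables. The sketch below uses only a small subset of the listed colors; lowercasing the raw value before lookup, the "Other" fallback for unlisted colors, and the helper names are assumptions for illustration rather than the team's method.

```python
# Subset of the Appendix II mapping, keyed by lowercased raw color.
COLOR_GROUPS = {
    "blue": "Blue/Black/Purple",
    "black": "Blue/Black/Purple",
    "navy blue": "Blue/Black/Purple",
    "red": "Orange/Red/Pink",
    "pink": "Orange/Red/Pink",
    "white & black": "Multicolor",
    "floral": "Multicolor",
    "white": "White/Grey",
    "grey": "White/Grey",
    "khaki": "Yellow/Brown",
    "army green": "Green",
}

ALL_GROUPS = ["Blue/Black/Purple", "Green", "Multicolor",
              "Orange/Red/Pink", "White/Grey", "Yellow/Brown"]


def color_group(raw_color):
    """Map a raw color string to its generalized bucket ('Other' if unlisted, by assumption)."""
    return COLOR_GROUPS.get(raw_color.strip().lower(), "Other")


def color_dummies(group):
    """One 0/1 dummy variable per bucket, named like the color_* model features."""
    return {f"color_{name}": int(name == group) for name in ALL_GROUPS}


for raw in ("RED", "Army green", " navy blue "):
    group = color_group(raw)
    print(raw.strip(), "->", group, color_dummies(group))
```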
Appendix III Numerical Variable Summary
Below is a key information summary of the numerical variables
variable | count | mean | std | min | 25% | 50% | 75% | max
price | 1,306 | 8.46 | 3.99 | 1.00 | 5.86 | 8.00 | 11.00 | 49.00
retail_price | 1,306 | 0.44 | 0.50 | - | - | - | 1.00 | 1.00
rating | 1,306 | 3.81 | 0.45 | 1.00 | 3.57 | 3.85 | 4.10 | 5.00
rating_count | 1,306 | 1,008.17 | 2,115.28 | 1.00 | 36.00 | 220.50 | 997.50 | 20,744.00
rating_five_count | 1,306 | 487.26 | 1,032.82 | - | 17.00 | 108.00 | 468.50 | 11,548.00
rating_one_count | 1,306 | 104.77 | 225.53 | - | 4.00 | 25.00 | 109.00 | 2,789.00
badges_count | 1,306 | 0.12 | 0.36 | - | - | - | - | 3.00
shipping_option_price | 1,306 | 0.01 | 0.12 | - | - | - | - | 1.00
countries_shipped_to | 1,306 | 40.62 | 19.96 | 6.00 | 32.00 | 40.00 | 43.00 | 140.00
inventory_total | 1,306 | 49.78 | 2.81 | 1.00 | 50.00 | 50.00 | 50.00 | 50.00
merchant_rating | 1,306 | 4.04 | 0.19 | 2.94 | 3.93 | 4.05 | 4.17 | 4.58
Appendix IV - Categorical Variable Summary
Below is a key information summary of the categorical variables (counts of 0 and 1 values)
Variables: uses_ad_boosts, badge_local_product, badge_product_quality, badge_fast_shipping, shipping_is_express, merchant_has_profile_picture, color_Blue/Black/Purple, color_Green, color_Multicolor, color_Orange/Red/Pink, color_White/Grey, color_Yellow/Brown, tag_summer, tag_women, tag_fashion, tag_dress
Counts (each pair sums to 1,306): 736 / 570; 1302 / 4; 1105 / 201; 820 / 486; 1196 / 110; 123 / 1183; 66 / 1240; 829 / 477; 1238 / 68; 1278 / 28; 1066 / 240; 1202 / 104; 1024 / 282; 1287 / 19; 1186 / 120; 160 / 1146
Appendix V – Model Coefficients
Target variables: Units sold (Linear regression, Lasso) and Unit Sold Ranges (Logistic Regression: 100 and less, 100-1K, 1K-5K, 5000+). Values in parentheses are negative.
Coefficients | Linear regression | Lasso | Logistic: 100 and less | Logistic: 100-1K | Logistic: 1K-5K | Logistic: 5000+
price | 51.24 | 47.22 | 0.40 | 0.22 | (0.11) | (0.32)
inventory_total | (6.76) | (0.20) | (6.31) | 0.25 | 1.27 | 0.13
retail_price | (14.30) | (14.24) | 0.37 | (0.13) | (0.22) | 0.01
merchant_rating_count | 0.00 | 0.00 | (0.47) | (0.30) | 0.02 | 0.01
uses_ad_boosts | 596.50 | 591.47 | (0.62) | 0.08 | (0.07) | 0.21
merchant_rating | 88.79 | 84.23 | 0.27 | 0.17 | 0.18 | (0.47)
rating | 293.65 | 286.17 | (0.35) | 0.07 | 0.08 | 0.42
merchant_has_profile_picture | 117.02 | 114.21 | (0.29) | 0.15 | (0.04) | 0.13
rating_count | 3.15 | 3.25 | (46.40) | (5.76) | (3.78) | 3.83
color_Blue/Black/Purple | 360.90 | 250.99 | (0.04) | (0.04) | (0.02) | (0.02)
rating_five_count | 0.45 | 0.31 | (40.00) | 1.99 | 2.42 | 2.98
color_Green | 201.37 | 89.72 | (0.11) | (0.01) | (0.16) | (0.04)
rating_one_count | 9.64 | 9.35 | (38.29) | 1.30 | 1.35 | 2.55
color_Multicolor | (543.31) | (641.80) | 0.29 | (0.01) | (0.18) | 0.23
badges_count | (633.22) | - | 0.09 | 0.04 | (0.18) | (0.19)
color_Orange/Red/Pink | | - | (0.13) | 0.03 | 0.09 | (0.01)
badge_local_product | 1,208.87 | 562.58 | (0.02) | 0.09 | 0.09 | (0.10)
color_White/Grey | 4.93 | (100.54) | 0.20 | 0.02 | 0.13 | 0.07
badge_product_quality | 359.71 | (257.73) | 0.01 | 0.07 | 0.11 | (0.04)
color_Yellow/Brown | (131.69) | (233.78) | (0.15) | 0.02 | 0.01 | (0.16)
badge_fast_shipping | (2,201.79) | (2,806.88) | 0.26 | (0.16) | (0.93) | (0.37)
tag_summer | (690.25) | (687.63) | 0.18 | (0.05) | 0.03 | (0.36)
shipping_option_price | (336.93) | (320.83) | 0.42 | 0.03 | 0.33 | (0.31)
tag_women | 507.20 | | (0.05) | 0.14 | 0.13 | 0.18
shipping_is_express | 1,859.93 | 1,792.23 | (0.44) | (0.32) | (0.11) | 0.38
tag_fashion | 1,075.49 | 1,067.39 | (0.13) | (0.03) | 0.12 | 0.10
countries_shipped_to | (9.20) | (9.08) | 0.15 | 0.01 | (0.01) | (0.18)
tag_dress | (430.93) | (426.47) | 0.51 | 0.00 | (0.18) | (0.22)
107.80
510.85
Appendix VI – Confusion Matrix
Legend
• 0 – “100 and less”
• 1 – 1,000
• 2 – 5,000
• 3 – 5,000+
References
• https://www.businessinsider.com/amazon-wish-how-shopping-experience-compares-2019-3
• https://www.vox.com/the-goods/2019/6/17/18679107/wish-shopping-app
• https://www.wish.com/company_info?hide_login_modal=true
• https://www.nba.com/lakers/wish
• https://merchantfaq.wish.com/hc/en-us
• https://deliverr.com/blog/what-is-wish/
• https://merchantfaq.wish.com/hc/en-us/articles/204531178-How-do-I-increase-sales-for-my-products
• https://medium.com/rants-on-machine-learning/7-ways-to-improve-your-predictive-models-753705eba3d6