Will a Product Sell Well with Wish?

What is Wish?
• 100M Monthly Active Users
• Global E-Commerce Platform & Most Downloaded Mobile Shopping Application Worldwide
• Direct-to-Consumer Service connecting buyers to (predominantly) low-cost Asian manufacturers, removing the overhead and markups of traditional retailers
• Business Model: Wish generates revenue by retaining a percentage of merchant sales
• Tradeoff: Lower Prices but Longer Delivery Lead Times than other E-Commerce Retailers
• 2M+ Products Sold Daily
• 500K+ Merchants
• 150M+ Items for Sale

Merchant Options on Wish
How can sellers boost sales volumes on the e-commerce platform?
• Keep Prices Low
• Add Sizes & Colors
• Add Extra Images
• Upload High-Quality Images
• Ship Worldwide

Wish Express
• Program offering express shipping to customers
• Available for products that can meet the order delivery deadlines of each designated region
• Merchants receive added benefits such as more product impressions

ProductBoost
• Using Wish’s proprietary algorithms, ProductBoost increases the exposure of products to the customers who are most likely to buy them
• Boosted products move higher up within product rankings if they are relevant to Wish customers
• Optimizes cost of impressions and promotes merchants' best products

Wish Promotions
• Verified by Wish – Products that receive exceptional customer feedback
• CollectionBoost – Allows merchants to promote products in customized collections
• Promoted Products – Wish products selected due to relevant audience, conversion, and popularity

Background Information
In order to evaluate which factors affect product sales, data on 2020 summer items was extracted from the Wish platform and stored in a ‘summer product sales.csv’¹ file for analysis.
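As a rough illustration of a first look at a file like this, the sketch below reads CSV rows with Python's standard library and reports the row count, repeated product IDs, and empty values per column. The sample rows are invented, the column names are assumptions modeled on the attribute list in Appendix I, and `profile_rows` is a hypothetical helper, not the team's code.

```python
import csv
import io
from collections import Counter

# Invented stand-in rows; the real input would be the CSV file named above.
# Column names are assumptions modeled on the attribute list in Appendix I.
SAMPLE_CSV = """product_id,title,price,units_sold,rating,rating_count
p1,Summer dress,8.00,100,3.8,54
p2,Beach shorts,5.50,1000,4.1,320
p2,Beach shorts,5.50,1000,4.1,320
p3,Sun hat,11.00,50,,
"""


def profile_rows(handle):
    """Return row count, duplicated product IDs, and empty values per column."""
    rows = list(csv.DictReader(handle))
    id_counts = Counter(row["product_id"] for row in rows)
    duplicate_ids = sorted(pid for pid, n in id_counts.items() if n > 1)
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat absent or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[column] += 1
    return {
        "rows": len(rows),
        "duplicate_ids": duplicate_ids,
        "missing": dict(missing),
    }


report = profile_rows(io.StringIO(SAMPLE_CSV))
print(report)
```

To profile a file on disk instead of the inline sample, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV happens to be stored.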
Data Info
• 2020 summer sales on the Wish platform
• 1,573 data points
• Over 40 different attributes
• Key Attributes: Product Info, Sales Records, Ratings & Badges, Shipping Details, Merchant Info
• Data Types: Object, Float64, Int64
¹ Summer Product Sales.csv data file provided by Professor Jie Li in K513 class

Data Cleansing
Upon analysis of the raw data, the team identified several data cleansing activities that needed to take place before processing the data through our predictive models.

Data Issues Identified
• Unnecessary descriptive data (e.g., Title, Currency, Product URLs)
• Duplicate data rows (e.g., duplicate product IDs)
• Missing data elements (e.g., product attributes)
• Inconsistent values (e.g., differing product sizes)

Data Corrections Applied
• Dropped descriptive data columns that did not help the model
• Filled in missing data elements in certain columns
• Removed rows with missing critical data elements (e.g., Total Rating Count)
• Mapped inconsistent data elements to generalized buckets or dummy variables

Finalized Dataset
• 29 Numerical Variables
• 16 Categorical Variables
• Target Variable:
  o Unit Sales
  o Range of Unit Sales
• 4 Binned Unit Sales Ranges:
  o < 100
  o 100 – 1000
  o 1000 – 5000
  o 5000+

Numerical Variable Highlights
• Price Distribution: Most product prices range from 0-10, with a mean of 8.46
• Rating Distribution: The majority of the products were rated between 0-4, with a mean of 3.81
• Shipping Countries Distribution: Most products were shipped to 32-43 different countries, with a mean of 40.62

Numerical Variable Analysis
A correlation matrix analysis of the numerical variables in the dataset highlights potential drivers of total units sold
• Rating Counts – Total rating counts for a product had a 0.90 correlation to units sold.
(High Correlation)
• 5-Star Rating Counts – Number of 5-star ratings for a product had a 0.88 correlation to units sold (High Correlation)
• 1-Star Rating Counts – Number of 1-star ratings for a product had a 0.83 correlation to units sold (High Correlation)
[Figure: Correlation Matrix]

Categorical Variable Highlights
Upon investigation of the categorical variables, the team summarized several key data points below
[Charts, Yes / No shares: % uses ad boosts, % from local, % has picture, % receives consistent good evaluations, % has multi color, % has fashion tag]

Categorical Variable Analysis
Graphical comparisons between the values of the categorical variables helped identify potential drivers of units sold
• Merchant_has_profile_picture
• Color_multicolor
• Badge_Product_Quality
• Color_Blue/Black/Purple
• Tag_Fashion
Other variables did not display as much variance between ‘Yes’ and ‘No’ data (e.g., ad_boosts, shipping_is_express, color_green, badge_fast_shipping).

Overview of Models
We built the following models to find the best fit to the data set. Selections are based on data conditions and on model implementation and interpretation needs.

Models (benefits are listed after the descriptions)

Linear Regression (Multiple Regression Model; target: Units Sold)
• Predict the target as $y = w^T x = w_0 + w_1 x_1 + \cdots + w_n x_n$
• Learn by minimizing the sum of squared errors

KNN (Classification Model; target: Unit Sales Ranges)
• Predict that a new instance belongs to the majority class of its K nearest neighbors
• Instance-based model

Logistic Regression (Classification Model; target: Unit Sales Ranges)
• Calculate the probability of a new instance belonging to a certain class.
If the probability exceeds the cut-off, the instance is assigned to that class
• Learn by minimizing logistic loss plus a penalty on large weights

Benefits
• Easier to interpret the output coefficients
• Simple to implement
• No assumption on the shape of data
• Easy to implement for multiclass problem (Unit Sales Bins)
• Works well with sparse data
• Reasons for prediction are relatively easy to interpret

Model Building – Multiple Regression
The team built all three model types and evaluated them independently to determine which would produce the most accurate model with minimal bias (e.g., underfitting or overfitting).

Multiple Regression Model
• 85.3% Training Accuracy
• 71.6% Test Accuracy
• Linear Regression model was created with a target variable of “Units Sold”
• Ridge and Lasso Regressions were used to reduce the complexity of the model and limit underfitting and overfitting
• Hyperparameters alpha and max iteration were tuned to obtain the optimum outputs
• Data exploration was used to adjust variables for the classification model

Model Building – KNN

KNN Model
• 64.9% Training Accuracy
• 48.3% Test Accuracy
• KNN model was created with a target variable of “Unit Sales Range” instead of “Units Sold”
• The nearest-neighbor hyperparameter was tuned to obtain the optimum outputs
[Figure: Hyperparameter tuning to determine best output. Accuracy (0% to 100%) vs. Nearest Neighbor Hyperparameter (1 to 10)]

Model Building – Logistic Regression
Logistic Regression Model
• 80.89% Training Accuracy
• 73.39% Test Accuracy
• Logistic Regression model was created with a target variable of “Unit Sales Range” instead of “Units Sold”
• Hyperparameter C was tuned to obtain the best model performance without overfitting
[Figure: Hyperparameter tuning to determine best output. Accuracy (0 to 1) vs. Hyperparameter C (0.001 to 1,000,000, in powers of 10)]

Model Evaluation – Logistic Regression
Once the best model (Logistic Regression) was identified, it was evaluated against a series of scores to determine its accuracy.

Units Sold Analysis
• The overall accuracy of the model is 73% across the four category classes
• For products in the 5,000 category, recall is markedly low, meaning that correct positive predictions make up a small share of the actual positive cases
• Confusion Matrix evaluation shown in Appendix VI

Recommendations When Using Wish
Our predictive analytical model has determined which factors impact total product sales on the Wish platform. How do these compare to Wish’s own guidance? In general, our team has the following recommendations for merchants:
• Sell Women’s Fashion Items
• Maintain Good Product and Merchant Ratings
• Earn Local and Product Quality Badges
• Avoid Yellow & Brown Products
• Have a Profile Picture
• Use Ad Boosts to Market Products
• Offer Express Shipping
For a more customized approach, use our team’s model to determine whether Wish is the right fit for your products and business strategy by accurately predicting future sales volumes on the platform.
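The accuracy and recall measures cited in the model evaluation can both be read off a confusion matrix. The sketch below shows the arithmetic under stated assumptions: the 4x4 matrix is made up (rows are actual classes, columns are predicted classes), and `overall_accuracy` and `recall_per_class` are illustrative helper names, not the team's code or results.

```python
# Hypothetical confusion matrix for the four unit-sales classes.
# Rows are actual classes, columns are predicted classes.
# The counts are invented for illustration only.
LABELS = ["<100", "100-1000", "1000-5000", "5000+"]
CONFUSION = [
    [40, 8, 2, 0],
    [6, 70, 12, 2],
    [1, 10, 55, 4],
    [0, 3, 12, 5],
]


def overall_accuracy(matrix):
    """Share of all instances that land on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix):
    """For each actual class: correct predictions divided by actual instances."""
    recalls = []
    for i, row in enumerate(matrix):
        actual = sum(row)
        recalls.append(row[i] / actual if actual else 0.0)
    return recalls


accuracy = overall_accuracy(CONFUSION)
recalls = recall_per_class(CONFUSION)
for label, value in zip(LABELS, recalls):
    print(f"recall[{label}] = {value:.2f}")
print(f"accuracy = {accuracy:.2f}")
```

In this invented matrix the last class is rarely predicted correctly even though overall accuracy stays fairly high, which is the same kind of pattern the evaluation describes for the highest sales category.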
Future Directions
With additional resources, we believe the model could be improved in the following ways:
• More data – Use prior-year data in addition to the 2020 data to further generalize the factors impacting sales
• New features – Select new attributes that may better explain the relationship between the independent variables and the target variable
• Ensemble methods (Bagging / Boosting) – Combine the results of multiple weak models to produce better results

Appendix

Appendix I - Data Points Used
In order to evaluate which factors affect product sales, data on 2020 summer items was extracted from the Wish platform for analysis. 1,573 data points and over 40 different attributes were collected and stored in a ‘summer product sales.csv’¹ file. Data types include object, float64 and int64.
• Title
• Price
• Retail_Price
• Currency
• Units_Sold
• Uses_Ad_Boosts
• Rating
• Rating_Count
• Rating_Five_Count
• Rating_Four_Count
• Rating_Three_Count
• Rating_Two_Count
• Rating_One_Count
• Badges_Count
• Badge_Local_Product
• Badge_Product_Quality
• Badge_Fast_Shipping
• Tags
• Product_Color
• Product_Variation_Size_ID
• Product_Variation_Inventory
• Shipping_Option_Name
• Shipping_Option_Price
• Shipping_Is_Express
• Countries_Shipped_To
• Inventory_Total
• Has_Urgency_Banner
• Urgency_Text
• Origin_Country
• Merchant_Title
• Merchant_Name
• Merchant_Info_Subtitle
• Merchant_Rating_Count
• Merchant_Rating
• Merchant_ID
• Merchant_Has_Profile_Picture
• Product_URL
• Product_Picture
• Product_ID (used as Index)
• Theme
¹ Summer Product Sales.csv data file provided by Professor Jie Li in K513 class

Appendix II – Data Cleansing Example
Data Cleaned: Color; Method: Creating generalized buckets
New Color Group: Org. Color values
• 'Blue/Black/Purple': 'blue', 'coolblack', 'lightblue', 'navyblue', 'lakeblue', 'black', 'purple', 'offblack', 'lightpurple', 'black & stripe', 'darkblue', 'violet', 'navy', 'navy blue', 'prussianblue', 'Black', 'skyblue', 'Blue', 'denimblue'
• 'Orange/Red/Pink': 'red', 'Rose red', 'orange', 'claret', 'rose', 'dustypink', 'winered', 'jasper', 'rosered', 'watermelonred', 'rosegold', 'orange-red', 'coralred', 'pink', 'lightred', 'wine', 'burgundy', 'Pink', 'wine red', 'lightpink', 'RED'
• 'Multicolor': 'white & black', 'winered & yellow', 'black & green', 'pink & grey', 'white & green', 'pink & black', 'orange & camouflage', 'black & yellow', 'navyblue & white', 'pink & white', 'multicolor', 'blackwhite', 'black & blue', 'leopard', 'leopardprint', 'floral', 'pink & blue', 'whitefloral', 'black & white', 'brown & yellow', 'greysnakeskinprint', 'red & blue', 'rainbow', 'camouflage', 'blue & pink', 'gray & white', 'white & red'
• 'White/Grey': 'white', 'offwhite', 'gray', 'lightgrey', 'ivory', 'star', 'grey', 'whitestripe', 'lightgray', 'White', 'silver'
• 'Yellow/Brown': 'lightyellow', 'coffee', 'camel', 'khaki', 'beige', 'lightkhaki', 'gold', 'apricot', 'tan', 'nude', 'yellow', 'brown'
• 'Green': 'light green', 'armygreen', 'applegreen', 'army green', 'fluorescentgreen', 'green', 'darkgreen', 'mintgreen', 'army', 'lightgreen', 'Army green'

Appendix III – Numerical Variable Summary
Below is a key information summary of the numerical variables
variable | count | mean | std | min | 25% | 50% | 75% | max
price | 1,306 | 8.46 | 3.99 | 1.00 | 5.86 | 8.00 | 11.00 | 49.00
retail_price | 1,306 | 0.44 | 0.50 | - | - | - | 1.00 | 1.00
rating | 1,306 | 3.81 | 0.45 | 1.00 | 3.57 | 3.85 | 4.10 | 5.00
rating_count | 1,306 | 1,008.17 | 2,115.28 | 1.00 | 36.00 | 220.50 | 997.50 | 20,744.00
rating_five_count | 1,306 | 487.26 | 1,032.82 | - | 17.00 | 108.00 | 468.50 | 11,548.00
rating_one_count | 1,306 | 104.77 | 225.53 | - | 4.00 | 25.00 | 109.00 | 2,789.00
badges_count | 1,306 | 0.12 | 0.36 | - | - | - | - | 3.00
shipping_option_price | 1,306 | 0.01 | 0.12 | - | - | - | - | 1.00
countries_shipped_to | 1,306 | 40.62 | 19.96 | 6.00 | 32.00 | 40.00 | 43.00 | 140.00
inventory_total | 1,306 | 49.78 | 2.81 | 1.00 | 50.00 | 50.00 | 50.00 | 50.00
merchant_rating | 1,306 | 4.04 | 0.19 | 2.94 | 3.93 | 4.05 | 4.17 | 4.58

Appendix IV - Categorical Variable Summary
Below is a key information summary of the categorical variables (counts of 0 and 1 values)
Variables: uses_ad_boosts, badge_local_product, badge_product_quality, badge_fast_shipping, shipping_is_express, merchant_has_profile_picture, color_Blue/Black/Purple, color_Green, color_Multicolor, color_Orange/Red/Pink, color_White/Grey, color_Yellow/Brown, tag_summer, tag_women, tag_fashion, tag_dress
Counts as (0, 1) pairs, in source order (the flattened layout does not preserve which pair belongs to which variable): (736, 570), (1302, 4), (1105, 201), (820, 486), (1196, 110), (123, 1183), (66, 1240), (829, 477), (1238, 68), (1278, 28), (1066, 240), (1202, 104), (1024, 282), (1287, 19), (1186, 120), (160, 1146)

Appendix V – Model Coefficients
Target variable for Linear regression and Lasso: Units sold. Target variable for Logistic Regression: Unit Sold Ranges (100 and less, 100-1K, 1K-5K, 5000+).
variable | Linear regression | Lasso | Logistic: 100 and less | Logistic: 100-1K | Logistic: 1K-5K | Logistic: 5000+
price | 51.24 | 47.22 | 0.40 | 0.22 | (0.11) | (0.32)
retail_price | (14.30) | (14.24) | 0.37 | (0.13) | (0.22) | 0.01
uses_ad_boosts | 596.50 | 591.47 | (0.62) | 0.08 | (0.07) | 0.21
rating | 293.65 | 286.17 | (0.35) | 0.07 | 0.08 | 0.42
rating_count | 3.15 | 3.25 | (46.40) | (5.76) | (3.78) | 3.83
rating_five_count | 0.45 | 0.31 | (40.00) | 1.99 | 2.42 | 2.98
rating_one_count | 9.64 | 9.35 | (38.29) | 1.30 | 1.35 | 2.55
badges_count | (633.22) | - | 0.09 | 0.04 | (0.18) | (0.19)
badge_local_product | 1,208.87 | 562.58 | (0.02) | 0.09 | 0.09 | (0.10)
badge_product_quality | 359.71 | (257.73) | 0.01 | 0.07 | 0.11 | (0.04)
badge_fast_shipping | (2,201.79) | (2,806.88) | 0.26 | (0.16) | (0.93) | (0.37)
shipping_option_price | (336.93) | (320.83) | 0.42 | 0.03 | 0.33 | (0.31)
shipping_is_express | 1,859.93 | 1,792.23 | (0.44) | (0.32) | (0.11) | 0.38
countries_shipped_to | (9.20) | (9.08) | 0.15 | 0.01 | (0.01) | (0.18)
inventory_total | (6.76) | (0.20) | (6.31) | 0.25 | 1.27 | 0.13
merchant_rating_count | 0.00 | 0.00 | (0.47) | (0.30) | 0.02 | 0.01
merchant_rating | 88.79 | 84.23 | 0.27 | 0.17 | 0.18 | (0.47)
merchant_has_profile_picture | 117.02 | 114.21 | (0.29) | 0.15 | (0.04) | 0.13
color_Blue/Black/Purple | 360.90 | 250.99 | (0.04) | (0.04) | (0.02) | (0.02)
color_Green | 201.37 | 89.72 | (0.11) | (0.01) | (0.16) | (0.04)
color_Multicolor | (543.31) | (641.80) | 0.29 | (0.01) | (0.18) | 0.23
color_Orange/Red/Pink | 107.80 | - | (0.13) | 0.03 | 0.09 | (0.01)
color_White/Grey | 4.93 | (100.54) | 0.20 | 0.02 | 0.13 | 0.07
color_Yellow/Brown | (131.69) | (233.78) | (0.15) | 0.02 | 0.01 | (0.16)
tag_summer | (690.25) | (687.63) | 0.18 | (0.05) | 0.03 | (0.36)
tag_women | 510.85 | 507.20 | (0.05) | 0.14 | 0.13 | 0.18
tag_fashion | 1,075.49 | 1,067.39 | (0.13) | (0.03) | 0.12 | 0.10
tag_dress | (430.93) | (426.47) | 0.51 | 0.00 | (0.18) | (0.22)

Appendix VI – Confusion Matrix Legend
• 0 – “100 and less”
• 1 – “100-1K”
• 2 – “1K-5K”
• 3 – “5000+”