SANDHAI – An E-shopping Service Aggregation Framework

Harikrishna Narayanan, Pranesh Parimala Ranganathan, Vijay Ramakrishnan, Siva Subbiah
902533226, 902505951, 902446624, 902538209
{harikrishna, pranesh, v.ramakrishnan, siva.subbiah}@gatech.edu

SANDHAI – E-shopping aggregation framework | CS8803 AIAD | Spring’09

Contents
Abstract
Some Sandhai Features
Architecture and Design
Currently supported APIs
Integration of Amazon APIs
Integration of Ebay APIs
Recommendation System
Apriori algorithm
Database Design & UI Functionality
Architecture Diagram
Testing and Evaluation
Challenges
Screenshots
Future Work
Conclusion
Project Planning
Acknowledgments
References

Abstract:

Sandhai is the Tamil (a South Indian language) word for a common marketplace where one can buy or sell anything. Sandhai is an e-shopping search engine system. With the current boom in e-commerce, integrating e-commerce search engines into a single point of service offers several advantages, such as fast and cost-effective search and better quality of results. In Sandhai we designed such an aggregation framework targeting major e-commerce service providers. The background and use cases below give a better picture of what Sandhai is about.

Background:

The background in service integration and e-commerce applications that led to this project is explained through the two simple use cases below.

Sample Use case #1:

Consider a user U who wishes to buy a product P online.
He has to search across several online e-commerce services to find the best deal in terms of both cost and quality. The factors that might influence his purchase include product cost, free offers, shipping charges, and taxes. He has to spend a considerable amount of time finding a good deal for the product. The user also has to be aware of the various services and of details such as products of category “C” being better offered by website W1 and products of category “D” by website W2. Nearly half of the people who settle on a deal through one online service later find a better offer for the same product from a different service. If instead there were a consolidated system that could talk to several online services and find the best offer among them all, the user would be happy to use it and more satisfied with the deal he found.

Sample Use case #2:

The idea of buying new products, goods, and gadgets often spreads within a circle of friends when they meet or get together. Say A, B, C, D and E are friends who get together once in a while for dinner. Suppose user A buys a product P at a good price. When the friends meet and casually talk about the product, some of them might like it and wish to buy it on similar terms, but the offer might have expired or turned unfavorable in the meantime. Suppose instead there were a system in which users keep track of their wish lists and, once they buy an item, check it off with the details of the deal they used. Their circle of friends could then be notified through a publish-subscribe (Pub Sub) framework. So in our system users create and maintain their wish lists. Friends can then subscribe to a wish list item of their friend.
When user A buys a product on his wish list, he fills out the wish list completion, which publishes the details of his purchase to all the subscribed friends. The existing social network infrastructure can also be used to accomplish this.

Other Popular Ecommerce Search Engines
www.shopzilla.com
www.kelkoo.com
www.MyShoppingpal.com
www.thefind.com

Some Sandhai Features:
• Easy integration of new web services
• Support for both SOAP and REST based product search APIs
• User customized search tuning
• Get information about users' preferences and interests [say favorite online shops, favorite brands, favorite color etc]

Architecture and Design:

1) What to look for in any E-Commerce Service?
• Product data: Product data includes information about product availability and pricing for items in the catalog.
• Content from customers: Content from customers includes reviews and product lists.
• Seller information: Seller information includes general information and customer feedback about the wide range of vendors.

2) The system allows users to run a single master search that fans out across various e-commerce players through their E-Shopping API interfaces and helps users find the right product.

3) Currently supported APIs:

4) Integration of Amazon APIs:
a. The API provides a well-defined mechanism for querying Amazon's database. It supports a variety of search queries and uses the REST or SOAP protocols. The API exposes Amazon's product data and e-commerce functionality. This allows developers, web site publishers and others to leverage the data that Amazon uses to power its own business, and potentially make money as an Amazon affiliate.
b. Representational state transfer (REST) is a style of software architecture for distributed hypermedia systems.
REST refers to a collection of network architecture principles which outline how resources are defined and addressed. The term is often used more loosely to describe any simple interface which transmits domain-specific data over HTTP without an additional messaging layer such as SOAP or session tracking via HTTP cookies.
c. Several groups of API operations can be used. Popular groups include BrowseNodeLookup, Customer Content, Items, List, Cart, Third Party Listings, and TransactionLookup. The restrictions are that only one call can be made per IP address per second, and that there is a limit on how long the data can be cached locally.
d. Amazon provides a WSDL for integrating with platforms such as .NET. A WSDL (Web Services Description Language) file is an XML document that defines the operations, parameters, requests, and responses used in web service interactions. It acts as the contract that defines the language and grammar used by web service clients and servers. The Amazon Associates Web Service WSDL, for example, contains all of the Amazon Associates Web Service operation names, parameters, and request and response structures.
e. Amazon Associates Web Service REST requests are URLs, as shown in the following example.
http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Operation=ItemSearch&AWSAccessKeyId=[Access Key ID]&AssociateTag=[ID]&SearchIndex=Apparel&Keywords=Shirt

5) Integration of Ebay APIs:
a. Ebay, like Amazon, provides APIs for performing a search. It supports SOAP; the sample query below, given for reference, uses the URL request format of the Shopping API with a JSON response.
http://open.api.ebay.com/shopping?appid=MyAppID&version=517&siteid=0&callname=FindItems&QueryKeywords=ipod&responseencoding=JSON&callback=true

6) Recommendation System:

Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine.
They have become fundamental applications in electronic commerce and information access, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. Current recommendation methods are usually classified into three major categories: content-based, collaborative, and hybrid approaches. We focused on only one of these, the collaborative filtering approach.

For the recommendation system we tried three different algorithms:
1) Apriori algorithm.
2) Naïve non-parametric Bayes classifier (NBC).
3) Additive content-based classification.

-> For the naive version of NBC we had a training set of labeled product data in four categories (Technology, Wearable, Books and Entertainment). If a user had bought a product from a category, he would be shown a different product from the same category. This had some issues because the number of products in each category is huge. Even though the classification accuracy was good, the recommendations were not very appropriate, since many products in the same category are not closely related to the current product. This was written in C#.

-> The additive algorithm looks up the products the user has already bought in news articles (stored in text files by date) and, based on the reviews and recommendations from other customers, suggests other products. This was written in Java.

-> The Apriori algorithm is the third algorithm we tried implementing. It uses association rule mining and gave the best results of the three.
However, since it draws only on the users' preference information (which was limited) and does not reflect other popular products in the market, we also fetched the rankings of popular products from the Amazon and eBay APIs and displayed them as well. The variety was good and the content was appropriate. This was written in C#.

Since the final integration needed a common platform, we dropped the first two algorithms, which had different requirements.

Apriori algorithm:

Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or records of website visits). It is used to find association rules. Association rule mining works as follows: given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k − 1. Then it prunes the candidates which have an infrequent sub-pattern. According to the downward closure lemma, the candidate set contains all frequent k-length item sets. After that, it scans the transaction database to determine the frequent item sets among the candidates.

Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms.
Candidate generation produces large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all 2^|S| − 1 of its proper subsets.

The goal of association rule mining is to find the association rules in a given database that satisfy a predefined minimum support and confidence. The problem is usually decomposed into two sub-problems. The first is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. In our case, itemsets are the products which a user has purchased. The second is to generate association rules from those large itemsets under the constraint of minimal confidence. Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules for this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}, and checking its confidence determines whether the rule is interesting. Further rules are generated by deleting the last item of the antecedent and inserting it into the consequent, and the confidence of each new rule is checked to determine its interestingness. This process iterates until the antecedent becomes empty. Since the second sub-problem is quite straightforward, most research focuses on the first.

The Apriori algorithm finds the frequent sets L in database D:
• Find the frequent set Lk−1.
• Join step.
◦ Ck is generated by joining Lk−1 with itself.
• Prune step.
◦ Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence candidates containing it should be removed.
where Ck is the candidate itemset of size k and Lk is the frequent itemset of size k.

The pseudo code for the algorithm is given below.
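Alongside the pseudo code, the join, prune, and count steps described above can be sketched as a short runnable Python function. This is a minimal illustration, not the project's C# implementation: the function name `apriori`, the `min_support` parameter (the minimum number of transactions, playing the role of the threshold C), and the sample purchase data are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    `min_support` transactions (hypothetical helper for illustration)."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up purchase histories, one set of products per user.
purchases = [
    {"ipod", "case"},
    {"ipod", "case", "charger"},
    {"ipod", "charger"},
    {"book"},
]
frequent = apriori(purchases, min_support=2)
```

With `min_support=2`, the pairs {ipod, case} and {ipod, charger} are frequent while {case, charger} is not, so the candidate {ipod, case, charger} is pruned without ever being counted against the transactions.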
Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in at least ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Here T is the transaction database and ε is the minimum support count.

With this algorithm we recommend products to a user based on other users' purchases. The technique is called association rule mining, and the method of recommendation is called collaborative filtering. Since no actual purchases were made through the system, we extrapolated values and filled in the database with them. We ran the algorithm on these values and stored the results in another database for easy retrieval. This was the first type of recommendation. The database for this design stored the user details and the product details. There were two other databases: one for matching users with their purchases, and a final one for storing the recommendations.

There is also another, simpler implementation in which we get the ranking of products from the web services, namely Amazon and eBay. This gives the popular searches for the day and brings variety to the recommended products. The user therefore has many options to choose from: some are personalized, based on his own purchases and the purchases of like-minded users, and the others come from the popular purchases of the day.

As a next step, we would like to extend the project to recommend using hybrid approaches, based on items and on social networks.

Database Design & UI Functionality

The database design was made with two goals in mind: simple relationships between the tables, and efficient querying of the underlying data. The main tables used in the database design are:
1. User
2. Product
3. Category
4. Friend_List
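A minimal sketch of how these four tables might be declared and queried, using Python's built-in sqlite3 module as a stand-in for the project's SQL database. Column names and types follow the table listings in this section; the `Product_name` spelling (the listing has a space), the sample rows, and the `purchases_for` helper are assumptions made for illustration.

```python
import sqlite3

# DDL for the four main tables; sqlite3 stands in for the real database.
SCHEMA = """
CREATE TABLE User (
    Username     VARCHAR PRIMARY KEY,
    Password     VARCHAR,
    Phone        VARCHAR,
    Email        VARCHAR,
    Zip          INTEGER,
    Amazon_pref  INTEGER,
    Ebay_pref    INTEGER,
    Bestbuy_pref INTEGER,
    Last_visit   DATETIME
);
CREATE TABLE Category (
    Categoryid   VARCHAR PRIMARY KEY,
    Categoryname VARCHAR,
    C_descp      VARCHAR
);
CREATE TABLE Product (
    Pid          VARCHAR PRIMARY KEY,
    Username     VARCHAR REFERENCES User(Username),
    Product_name VARCHAR,
    Vendorid     VARCHAR,
    Categoryid   VARCHAR,
    Unitprice    VARCHAR,
    P_descp      VARCHAR,
    P_cost       FLOAT
);
CREATE TABLE Friend_List (
    username   VARCHAR,
    F_username VARCHAR,
    F_email    VARCHAR
);
"""


def purchases_for(conn, username):
    """Hypothetical helper: a user's products with their category names,
    the kind of rows a recommender could take as input."""
    return conn.execute(
        """SELECT p.Product_name, c.Categoryname, p.P_cost
           FROM Product AS p
           JOIN Category AS c ON c.Categoryid = p.Categoryid
           WHERE p.Username = ?
           ORDER BY p.Pid""",
        (username,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows.
conn.execute("INSERT INTO User (Username, Email) VALUES (?, ?)",
             ("alice", "alice@example.com"))
conn.execute("INSERT INTO Category VALUES (?, ?, ?)",
             ("c1", "Technology", "Gadgets and electronics"))
conn.execute(
    "INSERT INTO Product (Pid, Username, Product_name, Vendorid, Categoryid, P_cost) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("p1", "alice", "iPod", "amazon", "c1", 199.0),
)
rows = purchases_for(conn, "alice")
```

A single join from Product to Category is enough to answer the question of what a given user bought and in which category, which reflects the stated goal of keeping the relationships simple.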
The database tables are primarily used in the following user operations:

Registration:
When a user registers on the Sandhai website, the data collected through the forms is stored in the underlying ‘User’ table. The data collected about the user during registration is used for recommendations and for ordering the search results.

Login:
Every user logging into the site is verified against the username and password details present in the database. Once a user successfully logs in, the session variables are set to denote the current user and the Last_visit field is updated with the current timestamp.

Recommendation:
When a user opts to check the recommendations suggested for him, the data from the User table and the Product table is fetched and fed into the recommendation system, which provides user recommendations based on the Apriori algorithm.

Send Mail Alerts:
The user can opt to receive email or SMS alerts whenever a product he requested becomes available at a particular cost differential. The sendMail program fetches the contact details of the currently logged-in user and sends an alert by email with the help of the underlying sendMail API.

Twitter:
The Twitter feature adds messages to friends' microblogs when a user searches for a particular product of interest. We use a separate table to maintain the friends list of every user. When the user selects the tweet option, the link to the currently searched item is sent as a tweet to that user's friends.

Database Connectivity

The primary database used to build the application is SQL. For testing purposes we used Access databases, which the C# application connects to with the help of the JET OLEDB 4.0 driver.
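The login operation described above can be sketched as follows, again with Python's sqlite3 module standing in for the SQL and Access back ends. This is an illustration under stated assumptions rather than the project's C# code: the function names, the returned dictionary standing in for session variables, and the salted password hashing (the report does not say how passwords are stored) are all hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3
from datetime import datetime, timezone


def hash_password(password, salt=None):
    """Return 'salt$digest' in hex (assumed storage format for the Password column)."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt.hex() + "$" + digest.hex()


def verify_password(password, stored):
    """Re-hash the supplied password with the stored salt and compare."""
    salt_hex, _ = stored.split("$")
    candidate = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, stored)


def register(conn, username, password, email):
    """Registration: store the form data in the User table."""
    conn.execute(
        "INSERT INTO User (Username, Password, Email) VALUES (?, ?, ?)",
        (username, hash_password(password), email),
    )


def login(conn, username, password):
    """Login: verify credentials, update Last_visit, return session data or None."""
    row = conn.execute(
        "SELECT Password FROM User WHERE Username = ?", (username,)
    ).fetchone()
    if row is None or not verify_password(password, row[0]):
        return None
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE User SET Last_visit = ? WHERE Username = ?", (now, username)
    )
    return {"current_user": username, "last_visit": now}


conn = sqlite3.connect(":memory:")
# Only the User columns needed for this sketch.
conn.execute(
    "CREATE TABLE User (Username VARCHAR PRIMARY KEY, Password VARCHAR, "
    "Email VARCHAR, Last_visit DATETIME)"
)
register(conn, "alice", "s3cret", "alice@example.com")
session = login(conn, "alice", "s3cret")
```

Parameterized queries (the `?` placeholders) keep user-supplied form values out of the SQL text, which matters for any form-driven registration and login flow.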
The Database Tables are as follows:

User
Attribute                          Datatype
Username (primary key)             Varchar
Password                           Varchar
Phone                              Varchar
Email                              Varchar
Zip                                Integer
Amazon_pref                        Integer
Ebay_pref                          Integer
Bestbuy_pref                       Integer
Last_visit                         Datetime

Product
Attribute                          Datatype
Pid (primary key)                  Varchar
Username (foreign key from User)   Varchar
Product name                       Varchar
Vendorid                           Varchar
Categoryid                         Varchar
Unitprice                          Varchar
P_descp                            Varchar
P_cost                             Float

Category
Attribute                          Datatype
Categoryid (primary key)           Varchar
Categoryname                       Varchar
C_descp                            Varchar

Friend_List
Attribute                          Datatype
username                           Varchar
F_username                         Varchar
F_email                            Varchar

Architecture Diagram:

[Figure: Sandhai architecture diagram]

Testing and Evaluation:

• How to test the effectiveness of a service aggregation system as a whole?
◦ Highly dependent on individual sub systems
• Bringing up a Sandhai pilot system and asking many users to perform searches in it will help in evaluating the system.
• Performance of the system at various stages of integration
• The QoS parameters are speed, number of results, and preference.
• Simulating web services to profile the integration framework code
◦ Created simple web service mockups
◦ Integrated these mock-up services into Sandhai's framework
◦ Triggered custom searches and calculated the request-response times for various ranges of queries.

Defining Quality of Service Parameters

Q: Search query
WS1, WS2, …, WSn: Web services
T1, T2, …, Tn: Time taken for the search by each web service
AT: Aggregator framework time

We aim to achieve performance in which Sandhai's search time AT is always better than that of the slowest product search engine among the integrated engines, that is, AT < max(T1, T2, …, Tn).

Challenges:

• Trying to aggregate different web services with a generic framework.
• Performance evaluation was a challenge.

Screenshots:

[Figure: Home page screenshot]

[Figure: Search results screenshot]

Future Work:

• Integrating and supporting more e-commerce sites to provide users with a wider search range
• Supporting a full-fledged recommendation system for the user profiles in the system
• Independent wish-list publisher-subscriber system
• We would like to implement this idea mainly for e-commerce services [buying and selling of goods online] and extend it to other service consolidations, such as web search services and social network services, thus giving the user flexibility across all services in a single place.
• Data mining and trend analysis based on product searches made by different user profiles

Conclusion:

Seamless web service aggregation is a powerful idea when implemented. In our project we successfully wrapped the APIs of 3 different e-commerce service providers and were able to use several of their features from a single point. We are interested in seeing how well the integration framework suits and scales to service aggregation across different domains.

Project Planning:

Resource Plan and Schedule:
Week 1: Reading related literature and Web Service APIs
Week 2: Design of Database and basic Classes for implementation
Week 3: Implementation of Shopping features to support various e-commerce services
Week 4: Implementation of wish list Pub Sub system
Week 5: Implementation of auxiliary services supported by the system
Week 6: Integration and System Testing
Week 7: Testing, Bug fixes and Regression
Week 8: Presentation, Release and Usage in Production

Acknowledgments:

We thank Prof. Ling Liu for her valuable comments and suggestions during the design and implementation phase of our project.

References:

1. Amazon API, http://docs.amazonwebservices.com/AWSECommerceService/2008-03-03/GSG/
2. Google Maps API, http://en.wikipedia.org/wiki/Google_Maps
3. Masand, Spiliopoulou, Srivastava, Ziane: “Web Mining for Usage Patterns & Profiles”, WEBKDD 2002.
4. Rayid Ghani, Carlos Soares: “Data Mining for Business Applications”, KDD 2006.
5. eBay Shopping API, http://developer.ebay.com/DevZone/shopping/docs/HowTo/JS_Shopping/JS_SearchGS_NV_JSON/JS_SearchGS_NV_JSON.html