Szabolcs Palinko (1 person group)
Advanced Internet Application Development
Ling Liu, CS8803, 2007 Spring

Project Proposal
Automatic Blog Comment Ranking

Summary

User-added content on websites has become extremely popular, for example comments on blogs or reviews in online shops. Popular blog entries often have several hundred comments, and users can only skim through them sequentially to find the ones they are interested in. Although there are some techniques for ranking these comments manually (by a moderator or by user voting), no algorithm is currently in use for ranking comments automatically. I am proposing an automatic comment ranking technique, based on the distinctive attributes of blog comments, that ranks comments by their estimated popularity: comments that are read more, and therefore stay on the screen longer, are ranked higher. Because it leverages the way comments are structured and displayed on websites, this technique ranks comments without any further user or moderator interaction.

User Behavior Analysis for Websites

There are currently three basic approaches for analyzing user behavior on websites:

- Analysis of inter-page interaction: the analysis is based on pages viewed, links followed, and the time between switching to different pages.
- Eye tracking: mainly used for usability evaluation in lab settings, because it requires a camera and special software while the user interacts with the system (see EyeTracking.com [1]).
- Tracking and saving the whole user interaction with the website for later replay and analysis: JavaScript-based; captures and saves every mouse movement, click, keystroke, and navigation step (see ClickTale.com [2]).

The first two approaches do not take in-page navigation into account, only inter-page navigation. At this time, the only widely used analysis method is inter-page interaction analysis, which is mostly performed using data mining algorithms.

Distinctive Attributes of Blog Comments

Blog comments have the following distinctive attributes that make automatic ranking feasible:

- Content is added by users continuously.
- Comments are most often displayed sequentially below each other, in one column; approximately 2-6 comments are visible on the screen at a time.
- Popular blog entries might have several hundred comments.
- All comments are shown on the same page, or are sometimes broken into several pages.
- The user has to scroll up and down the screen to go through the comments.
- The comments appear in the order in which they were added to the blog.

Current Solutions for Comment Ranking

There are three approaches currently used for comment ranking:

- No ranking: all comments are shown in the chronological order in which they were added to the blog entry. This is currently the most widely used approach.
- Human-moderated/ranked comments: an authorized person with special access privileges moderates and ranks the comments based on their intuition (see Slashdot.org as an example).
- User interaction/voting-based ranking: this technique generally provides two options for every comment, a positive vote and a negative vote (some systems also have an additional "report inappropriate" option). Users, who typically have to be logged in, can vote on others' comments if they wish. This method is used at Digg.com [3] and Amazon.com [4], for instance.

Proposed Solution

My proposed solution is an automatic comment ranking mechanism based on an estimate of how much time readers spend viewing a particular comment:

- The rank estimates how frequently the comment was read.
- The frequency of reading a particular comment is estimated from the time users spend reading the comment and the length of the comment.
- The time spent reading a comment is estimated from the time the comment is shown on the browser screen.
- We take advantage of the fact that only a few comments are shown on the screen at a time.

Possible Uses of Automatic Ranking

An automatic comment ranking system could be used in several ways:

- Show the rank next to each comment so that readers can read selectively, taking the popularity ranking into account.
- Display the comments in the order of their rank.
- Keep the chronological order but hide or shrink the comments with low ranks.

Evaluation and Testing

I will use two evaluation methods:

- Personal evaluation: this involves extracting real blog comments from other websites and applying the automatic ranking to those comments. I will read through the comments several times, as I would with a real blog, and examine the resulting ranking. This method is proposed because I do not have access to a popular blog with a large readership on which I could deploy my solution.
- Simulation: develop a probabilistic model of user behavior in comment reading, define the ranking model, and generate fake comments with a predefined user interest ranking. Then run simulations using the models and compare how well the resulting ranks align with the predefined user interest rankings.
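
The simulation-based evaluation described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the final user model or ranking model: the constants READING_SPEED and SKIM_SECONDS, the `window` parameter (number of comments on screen at once), the helper names, and the choice of Spearman rank correlation as the comparison measure are all hypothetical. The score divides the accumulated on-screen time by the time one full read of the comment would take, which is one possible reading of "time spent and length of the comment".

```python
import random
from dataclasses import dataclass

READING_SPEED = 20.0  # characters per second; an assumed constant
SKIM_SECONDS = 0.5    # assumed time spent glancing at any comment


@dataclass
class Comment:
    comment_id: int
    length: int                   # comment length in characters
    interest: float               # predefined probability that a reader reads it
    visible_seconds: float = 0.0  # accumulated on-screen time


def simulate_reader(comments, window, rng):
    """One reader scrolls top to bottom; on-screen comments accrue time."""
    for i, current in enumerate(comments):
        dwell = SKIM_SECONDS
        if rng.random() < current.interest:
            dwell += current.length / READING_SPEED
        # The comment being looked at and the next window-1 comments are on screen.
        for shown in comments[i:i + window]:
            shown.visible_seconds += dwell


def estimated_reads(comment):
    """On-screen time divided by the time one full read would take."""
    return comment.visible_seconds / (comment.length / READING_SPEED)


def rank_ids(comments, key):
    """Comment ids ordered from highest to lowest key value."""
    return [c.comment_id for c in sorted(comments, key=key, reverse=True)]


def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same ids (no ties)."""
    n = len(order_a)
    pos_b = {cid: r for r, cid in enumerate(order_b)}
    d2 = sum((r - pos_b[cid]) ** 2 for r, cid in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    rng = random.Random(42)
    # Fake comments with random lengths and predefined interest values.
    comments = [Comment(i, rng.randint(80, 600), rng.random()) for i in range(50)]
    for _ in range(1000):
        simulate_reader(comments, window=3, rng=rng)
    predicted = rank_ids(comments, estimated_reads)
    intended = rank_ids(comments, lambda c: c.interest)
    print("Spearman correlation:", round(spearman(predicted, intended), 3))
```

With a window larger than one, neighbors of an interesting comment also accumulate on-screen time, so the simulation can be used to explore how much this blurs the ranking for different window sizes and reader counts.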

Technology and Basic Architecture

I will be using the following technology to implement the system:

- My own Windows-based development environment
- Development focus and testing on the Firefox browser
- Apache web server
- MySQL relational database for storing experimental blog comments and rankings
- JavaScript running in the browser to determine how long the comments are visible on the screen. The script continuously tracks which comments are visible on the screen and sends periodic updates to the server.
- Asynchronous JavaScript calls to update the rank database
- PHP for generating HTML content, handling database access, and processing data provided by the asynchronous calls
- Python scripting for processing other blog pages and extracting their comments for evaluation
- Python script for simulation

Deliverables

In the course of this project, I will provide the following deliverables:

- A model of the user and the ranking mechanism
- Automatic blog comment ranking system implementation
- Working demo
- Simulation framework and results
- Final project deliverable

Timeline

Feb 12: Set up development environment (software and hardware)
Feb 19: Design software architecture and database schema
Feb 26: Create database, generate blog entry and comments for use in development, start JavaScript and PHP code implementation
Mar 5: Basic JavaScript and PHP code finished
Mar 12: Personal evaluation and refinement of implementation
Mar 19: Develop user model, semi-formal definition of ranking method
Mar 26: Simulation framework implementation starts
Apr 2: Simulation framework ready
Apr 9: Run simulations and compare results
Apr 16: Refine implementation and execute required iterations
Apr 23: Workshop, demo, ...
Apr 30: ... and final deliverable

Referenced Websites

- [1] EyeTracking.com
- [2] ClickTale.com
- [3] Digg.com
- [4] Amazon.com