Automatic Blog Comment Ranking Summary

Szabolcs Palinko (1 person group)
Advanced Internet Application Development
Ling Liu, CS8803, 2007 Spring
Project Proposal
Automatic Blog Comment Ranking
Summary
User-added content has become extremely popular on websites, which now let visitors add
comments to blogs or reviews to online shops. Popular blog entries often have several
hundred comments, and users have to skim through them sequentially to find the ones
that interest them. Although there are some techniques for ranking these comments
manually (by a moderator or by user voting), no algorithm is currently in use for
ranking comments automatically. I propose an automatic comment ranking technique, based
on the distinctive attributes of blog comments, that ranks comments by their estimated
popularity: comments that are read more, and therefore viewed longer (they stay on the
screen longer), are ranked higher. Because the technique leverages the particular
nature of comments and the way they are displayed on websites, it can rank comments
without further user or moderator interaction.
User Behavior Analysis for Websites
There are currently three basic approaches for analyzing user behavior on websites:
- Analysis of inter-page interaction: analysis is based on pages viewed, links
  followed, and the time between switching to different pages
- Eye tracking: mainly used for usability evaluation in lab settings, because it
  requires a camera and special software while the user interacts with the
  system (EyeTracking.com [1])
- Tracking and saving the whole user interaction with the website for later
  replay and analysis: JavaScript-based; captures and saves every mouse movement,
  click, keystroke, and navigation event (see ClickTale.com [2])
The first two approaches do not take in-page navigation into account, only inter-page
navigation. At this time, the only widely used method is inter-page interaction
analysis, which is mostly performed using data mining algorithms.
Distinctive Attributes of Blog Comments
Blog comments have the following distinctive attributes that make automatic
ranking feasible:
- Continuous user-added content
- Comments are most often displayed sequentially below each other, in one
  column; approximately 2-6 comments are visible on the screen at a time
- Popular blog entries might have several hundred comments
- All comments are shown on the same page, or are sometimes broken into several
  pages
- The user has to scroll up and down the screen to go through the comments
- The comments appear in the order in which they were added to the blog
Current Solutions for Comment Ranking
There are three current approaches used for comment ranking:
- No ranking: all comments are shown in the chronological order in which they were
  added to the blog entry. This is currently the most widely used approach.
- Human moderated/ranked comments: in this case, an authorized person with
  special access privileges moderates and ranks the comments based on their own
  judgment (see Slashdot.org as an example).
- User interaction/voting based ranking: this technique is generally implemented
  by providing two options for every comment: a positive vote and a negative vote
  (some systems have an additional "report inappropriate" option as well). Users,
  who typically have to be logged in, can vote on others' comments if they wish.
  This method is used at Digg.com [3] and Amazon.com [4], for instance.
Proposed Solution
My proposed solution is an automatic ranking mechanism for comments, based on an
estimate of how much time readers spend viewing a particular comment:
- The rank estimates how frequently the comment was read
- The frequency of reading a particular comment is estimated from the time users
  spend reading the comment and the length of the comment
- The time spent reading a comment is estimated from the time the comment is
  shown on the browser screen
- We take advantage of the fact that only a few comments are shown on the screen
  at a time
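The estimation chain above (on-screen time and comment length, then read frequency, then rank) could be sketched as follows. This is only a minimal illustration, not the ranking model the project will define: the reading speed constant, the `Comment` class, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed average reading speed; the proposal does not specify one.
WORDS_PER_MINUTE = 200


@dataclass
class Comment:
    comment_id: int
    text: str
    # Total time the comment was shown on screen, summed over all readers.
    visible_seconds: float = 0.0


def expected_reading_seconds(comment):
    """Time one full read of the comment would take at the assumed speed."""
    words = max(1, len(comment.text.split()))
    return words * 60 / WORDS_PER_MINUTE


def estimated_reads(comment):
    """Estimate how many times the comment was read: on-screen time
    divided by the time a single read would take."""
    return comment.visible_seconds / expected_reading_seconds(comment)


def rank_comments(comments):
    """Order comments from most to least frequently read (estimated)."""
    return sorted(comments, key=estimated_reads, reverse=True)
```

Dividing by the expected reading time is one way to keep long comments from ranking high simply because they take longer to read; other normalizations are possible.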
Possible Uses of Automatic Ranking
An automatic comment ranking system could be used in several ways:
- Show the rank of each comment next to it, so that readers can selectively read
  comments by considering their popularity ranking
- Display the comments in the order of their rank
- Keep the chronological order but hide or shrink the comments with low ranks
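The three uses listed above could look roughly like this in code. It is a sketch under assumptions: `scores` is a hypothetical mapping from comment id to an estimated popularity score, and the `keep_fraction` cutoff for collapsing low-ranked comments is an arbitrary illustrative choice.

```python
import math


def rank_positions(scores):
    """Map comment id -> rank position (1 = highest estimated popularity)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def display_by_rank(chronological_ids, scores):
    """Reorder the comments so the highest-scoring ones come first."""
    return sorted(chronological_ids, key=lambda cid: scores[cid], reverse=True)


def collapse_low_ranked(chronological_ids, scores, keep_fraction=0.5):
    """Keep chronological order; flag comments outside the top fraction
    as collapsed. Returns (comment_id, is_collapsed) pairs."""
    ranks = rank_positions(scores)
    cutoff = max(1, math.ceil(len(chronological_ids) * keep_fraction))
    return [(cid, ranks[cid] > cutoff) for cid in chronological_ids]
```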
Evaluation and Testing
I will use two evaluation methods:
- Personal evaluation: this involves extracting real blog comments from other
  websites and applying the automatic ranking to them. I will read through the
  comments several times, as I would on a real blog, and examine the resulting
  ranking. I propose this method because I do not have access to a popular blog
  with a large readership on which I could deploy my solution.
- Simulation: develop a probabilistic model of user behavior when reading comments,
  define the ranking model, and generate fake comments with predefined user-interest
  rankings. Then run simulations using the models and compare how well the
  resulting ranks align with the predefined user-interest rankings.
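A toy version of the simulation approach might look like the sketch below. The user model here is an assumption chosen for brevity (each reader fully reads a comment with probability equal to its predefined interest, and otherwise skims past it in a fixed time); the actual user model and ranking model are still to be developed. Agreement between predefined interest and the resulting scores is measured with Spearman rank correlation, implemented by hand and ignoring ties.

```python
import random


def simulate_dwell(interests, lengths, readers, rng,
                   skim_seconds=1.0, words_per_second=3.0):
    """Accumulate on-screen time per comment over many simulated readers.
    interests[i] is the probability a reader fully reads comment i;
    lengths[i] is its length in words. All parameters are illustrative."""
    dwell = [0.0] * len(interests)
    for _ in range(readers):
        for i, (p, words) in enumerate(zip(interests, lengths)):
            if rng.random() < p:
                dwell[i] += words / words_per_second  # full read
            else:
                dwell[i] += skim_seconds              # scrolled past
    return dwell


def positions(values):
    """Rank position of each value (1 = largest); ties broken by index."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        result[i] = pos
    return result


def spearman(a, b):
    """Spearman rank correlation between two equally long sequences."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(positions(a), positions(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def run_simulation(interests, lengths, readers=2000, seed=0):
    """Simulate readers, score comments by dwell time per word, and
    compare the scores with the predefined interest values."""
    rng = random.Random(seed)
    dwell = simulate_dwell(interests, lengths, readers, rng)
    scores = [d / words for d, words in zip(dwell, lengths)]
    return spearman(interests, scores)
```

A correlation close to 1 would indicate that the dwell-time ranking recovers the predefined interest ranking under this simplified model.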
Technology and Basic Architecture
I will be using the following technology to implement the system:
- My own Windows-based development environment
- Development focus and testing on the Firefox browser
- Apache web server
- MySQL relational database for storing experimental blog comments and rankings
- JavaScript running in the browser for determining how long the comments are
  visible on the screen. The script continuously tracks which comments are visible
  on the screen and sends periodic updates to the server.
- Asynchronous JavaScript calls to update the rank database
- PHP for generating HTML content, handling database access, and processing data
  provided by the asynchronous calls
- Python scripting for processing other blog pages and extracting their comments
  for evaluation
- Python script for simulation
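The visibility tracking described above is planned as JavaScript running in the browser; the following Python sketch only illustrates the bookkeeping such a script would need. The class and function names are hypothetical, comment positions are assumed to be known in page pixels, and a partially visible comment is counted as visible, which is an assumption rather than a decision made in this proposal.

```python
from dataclasses import dataclass, field


@dataclass
class CommentBox:
    comment_id: int
    top: float     # distance from the top of the page, in pixels
    height: float  # rendered height of the comment, in pixels


def visible_ids(boxes, scroll_top, viewport_height):
    """Ids of comments that overlap the visible window at all."""
    bottom = scroll_top + viewport_height
    return [b.comment_id for b in boxes
            if b.top < bottom and b.top + b.height > scroll_top]


@dataclass
class VisibilityTracker:
    boxes: list
    pending: dict = field(default_factory=dict)

    def sample(self, scroll_top, viewport_height, elapsed_seconds):
        """Credit the elapsed time to every comment currently on screen."""
        for cid in visible_ids(self.boxes, scroll_top, viewport_height):
            self.pending[cid] = self.pending.get(cid, 0.0) + elapsed_seconds

    def flush(self):
        """Return the accumulated times (the payload of a periodic update
        to the server) and start a fresh accumulation period."""
        update, self.pending = self.pending, {}
        return update
```

For example, sampling once per second and flushing every few samples would correspond to the periodic asynchronous updates to the rank database.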
Deliverables
In the course of this project, I will provide the following deliverables:
- A model of the user and the ranking mechanism
- Automatic blog comment ranking system implementation
- Working demo
- Simulation framework and results
- Final project deliverable
Timeline
Feb 12: Set up development environment (software and hardware)
Feb 19: Design software architecture and database schema
Feb 26: Create database, generate blog entry and comments for use in development, start
JavaScript code implementation and PHP code
Mar 5: Basic JavaScript and PHP code finished
Mar 12: Personal evaluation and refinement of implementation
Mar 19: Develop user model, semi-formal definition of ranking method
Mar 26: Simulation framework implementation starts
Apr 2: Simulation framework ready
Apr 9: Run simulations and compare results
Apr 16: Refine implementation and execute required iterations
Apr 23: Workshop, demo, ...
Apr 30: ... and final deliverable
Referenced Websites
[1] EyeTracking.com
[2] ClickTale.com
[3] Digg.com
[4] Amazon.com