Preventing Content Hotlinking on the Web
CS8803AIA
Rohit Sud

Motivation

Consider the page http://f1mail.rediff.com/quill/QuillPadWeb.html. It is a page that allows anyone to transliterate text from Hindi (a language) to English. What is so special about it? It is part of a multilingual email package provided by rediff.com, yet anyone on the Internet can access this resource freely and use an iframe to embed it on their own site. Worse, if I were to make a cool plug-in for Wordpress that provided free transliteration facilities, I could simply hotlink the script from the rediff.com servers and use it as my own. If my plug-in became really popular, it could end up using a substantial chunk of rediff.com's bandwidth. So this is not just a theft of intellectual property but also of web server resources! The aim of this proposal is to come up with an effective way, compliant with the technologies currently used on the web, to prevent hotlinking of data from web servers.

Illegal hotlinking has been a fundamental problem since the inception of the Internet. Hotlinking not only wastes the original publisher's server resources; it also amounts to theft of content as well as of resources like bandwidth and computational power. But the very nature of the Internet (providing constant URLs to access information from anywhere) is the root of the problem. Current solutions to hotlinking are naive and can be overcome easily. Any attempt to make the current system more robust to hotlinking results in many legitimate viewers being locked out of the content. In this proposal, we discuss a novel way to implement hotlinking protection on dynamically generated pages and evaluate the robustness of the proposed technique. The novelty of the application is that it goes beyond the current access log based approach to a more general and maintainable approach.
The solution will be provided initially as a Wordpress plug-in which can be used in blogs to protect content like images and flash videos from being hotlinked. The plug-in is specific to that particular software, but the concept behind it is general enough to be used in other Internet applications.

Related work

It is surprising to see so little work being done on hotlink prevention. A search for "hotlinking" on Google Scholar (http://scholar.google.com/scholar?q=hotlinking) turns up only about 2 results that deal with content hotlinking and none that deal with how to prevent it. There is a lot of interest from corporations like Adobe, who are investing to make their servers aware of hotlinked content [1]. Adobe's paper discusses various measures taken by the Flash Media Server to prevent hotlinking of videos, apart from providing general protection by preventing videos from being copied or recorded on the client side. It does mention that clever scripting techniques can be used to check requests and see whether they are coming from the correct source, but it leaves the choice and the implementation to the user. Given the value that a hotlinking prevention technique can provide to the web community, it is worthwhile spending time to research ways to do it.

Proposed work

The proposed work aims at devising and developing a deployable prototype to prevent hotlinking. In the proposal we first discuss the strategies currently used to counter hotlinking, classified by how they are deployed, and why they are not effective.

1. Server side methods - Hotlinkers thrive because they know that every object on the public web has a unique URL which can be accessed from anywhere and still returns the same object. To prevent this, people have used referrer matching before they serve data.
Unfortunately, this method has a number of drawbacks:

It is expensive - Checking the referrer for each and every HTTP request loads the server tremendously. Moreover, the check cannot be done only once in a while, as the server has to check every request to ascertain that it is coming from an allowed source.

Referrer lists are tedious to maintain - The author of the proposal is also the webmaster of a popular Formula One blog [2]. Due to the exclusiveness of the content (photographs) posted on the site, the site is prone to exploitation by hotlinkers. The author has been using a .htaccess based approach to redirect users who access a hotlinking page to another page, but the script has grown to 9.4KB. Maintaining such a huge list takes a lot of effort, especially with thousands of hits a day. To avoid huge block lists, people instead maintain a list of allowed referrers. This poses a problem because browsers often do not expose referrer information in the request header, which makes this method useless.

2. Client + Server side method - This involves the use of Javascript to perform validation before the user is allowed to view content. Usually this is coupled with a light server side component that transfers information about the source of the request back to the server. This approach is used for another purpose by sites like rapidshare.de, which couple their Javascript with their server to allow access only to those clients that pass authentication at both levels.

We propose a strategy to counter hotlinking based on rewriting URLs periodically. The site administrator can set an interval after which the URL to the content changes; the new URL is automatically reflected in newly generated content. The suggested approach can be strengthened using a session cookie for browsers that provide this feature. Our investigation will involve figuring out client side ways to make the process more robust and reliable.
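The periodic URL rewriting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plug-in's actual implementation. The plug-in is planned in PHP; Python is used here only for brevity. The names SECRET_KEY, ROTATION_INTERVAL, make_url and is_valid, the /protected/ URL prefix, the HMAC-based token, and the one-interval grace period are all hypothetical choices that do not appear in the proposal.

```python
import hashlib
import hmac
import time

# Hypothetical settings; a real deployment would load these from site config.
SECRET_KEY = b"change-me"   # per-site secret known only to the server
ROTATION_INTERVAL = 3600    # seconds after which content URLs change
PREFIX = "/protected/"      # assumed URL prefix for protected content


def _window(now, interval):
    """Index of the rotation interval that contains the timestamp `now`."""
    return int(now // interval)


def _token(path, window):
    """Short keyed digest binding a content path to one rotation window."""
    msg = f"{path}|{window}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


def make_url(path, now=None, interval=ROTATION_INTERVAL):
    """Build the URL that is currently valid for `path` (e.g. '/images/car.jpg')."""
    now = time.time() if now is None else now
    return f"{PREFIX}{_token(path, _window(now, interval))}{path}"


def is_valid(url, now=None, interval=ROTATION_INTERVAL, grace_windows=1):
    """Accept a URL whose token matches the current window or a recent one.

    The grace period (an assumption of this sketch) keeps pages that were
    rendered just before a rotation from breaking for legitimate visitors.
    """
    now = time.time() if now is None else now
    if not url.startswith(PREFIX):
        return False
    token, sep, rest = url[len(PREFIX):].partition("/")
    if not sep:
        return False
    path = "/" + rest
    current = _window(now, interval)
    return any(
        hmac.compare_digest(token.encode("utf-8"),
                            _token(path, current - back).encode("utf-8"))
        for back in range(grace_windows + 1)
    )
```

In this sketch, a page generated by the blog would embed make_url("/images/car.jpg") instead of the constant path, and the server would call is_valid on each incoming request. A URL copied to another site stops working after at most two rotation intervals, without any referrer list having to be maintained.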
Plan of action

The work will make use of a Wordpress blog hosted on an Apache server running PHP. The deliverable for the project will be a deployable prototype of a Wordpress plug-in, written as a PHP script, that prevents content on a Wordpress blog from being hotlinked. If time permits, we will investigate and develop a client side approach that works in tandem with the server script to authenticate the source of a request before serving the page. The proposed duration of the project is 4 weeks, including testing and deployment.

Evaluation and testing

The script will be tested by real world deployment on f1chronicles.com. Referrer logs will be collected from the server to see the list of sites from which users came and the percentage of users who were able to view the content even though they were not authorized to. In addition, we will report statistics on the number of users who were successfully redirected to the error page (and were thus caught by our system). We will clear the current .htaccess file to start from scratch, so that no one is banned statically.

Apart from the above, load analysis of the server will be done while using the proposed system. Specifically, we will compare the transactions per second and the latency of the system in 3 scenarios:

When referrer blocking is used.
When URL rewriting is used.

The URL rewriting method will be further evaluated on the basis of the time for which the content URL is not changed and the effect of this parameter on the 2 factors mentioned above (transactions per second and latency). As an extra step, the system will also be evaluated using a session cookie based authorization approach to prevent hotlinking.

Technical issues

The proposed system has advantages over currently deployed solutions in terms of server load and ease of deployment. However, it is best used in conjunction with maintaining blocked referrer lists and allowed referrer lists.
The reason is that most hotlinkers are (as yet) not savvy enough to write and deploy server side scripts that overcome tighter anti-hotlinking strategies. Many people still hotlink the old-fashioned way, using absolute URLs.

Future work will focus on a server side authentication mechanism that will release content only when the browser has identified itself as being referred from the correct site by answering some challenge. This challenge might be as simple as providing the correct referrer, or it might be complicated enough to demand authentication before releasing potentially hotlinked content.

Bibliography

The .htaccess notes. Hotlink protection, Time Based Redirection, Cookie Password Protection, http://www.unixcities.com/apache/index1.html

Video content protection measures enabled by Adobe® Flash® Media Server, http://www.adobe.com/devnet/flashmediaserver/articles/protecting_video_fms.pdf

Securing Web Interfaces, http://www.smart-techie.com/blog/2007/06/creating-secure-web-interfaces/

[1] http://www.adobe.com/devnet/flashmediaserver/articles/protecting_video_fms.pdf
[2] http://f1chronicles.com