Preventing Content Hotlinking on the web
CS8803AIA
Rohit Sud
Motivation
Consider the page http://f1mail.rediff.com/quill/QuillPadWeb.html. It is a page that allows anyone to transliterate
text from Hindi to English. What is so special about it? It is part of a multilingual email
package provided by rediff.com. But if you look closely, anyone on the Internet can access this resource freely
and use an iframe to embed it on their own site. Worse, if I were to make a Wordpress plug-in that
provides free transliteration, I could simply hotlink the script from the rediff.com servers and present it as my
own. If my plug-in became really popular, it could end up using a substantial chunk of rediff.com's bandwidth. So
this is a theft not just of intellectual property but also of web server resources. The aim of this proposal is to come
up with an effective method, compatible with the technologies currently used on the web, that prevents
hotlinking of data from web servers.
Illegal hotlinking has been a problem since the early days of the web. Hotlinking amounts to theft not only of
the original publisher's content but also of server resources such as bandwidth and computational power.
The root of the problem is the very nature of the web: a constant URL gives access to the same information
from anywhere. Current solutions to hotlinking are naive and can be overcome easily, and attempts to make
them more robust lock many legitimate viewers out of the content. In this proposal, we
discuss a novel way to implement hotlinking protection on dynamically generated pages and evaluate the
robustness of the proposed technique. The novelty of the approach is that it goes beyond the current access log
based approach to one that is more general and maintainable.
The solution will be provided initially as a Wordpress plug-in which can be used in blogs to protect content like
images and flash videos from being hotlinked. The plug-in is specific to the particular software but the concept
behind it is general enough to be used in other Internet applications.
Related work
It is surprising how little work has been done on hotlink prevention. A search for "hotlinking" on Google Scholar
(http://scholar.google.com/scholar?q=hotlinking) turns up only about 2 results that deal with content
hotlinking and none that deal with how to prevent it. There is, however, interest from corporations like Adobe, which
is investing in making its servers aware of hotlinked content [1]. The Adobe paper discusses various measures taken by
the Flash Media Server to prevent hotlinking of videos, in addition to general protection that prevents
videos from being copied or recorded on the client side. It mentions that clever scripting techniques can
check whether requests are coming from the correct source, but it leaves the choice and the implementation to
the user. Given the value that a hotlinking prevention technique can provide to the web community, it is
worthwhile to spend time researching ways to do it.
Proposed work
The proposed work aims at devising and developing a deployable prototype to prevent hotlinking.
In the proposal we first discuss the strategies currently used to counter hotlinking, classified by where they are
deployed, and why they are not effective.
1. Server side methods - Hotlinkers thrive because they know that every object on the public web has a
unique URL that can be requested from anywhere and still returns the same object. To
prevent this, people match the referrer before serving data. Unfortunately this method has a
number of drawbacks:
   a. It is expensive - Checking the referrer for each and every HTTP request adds load to the
      server. The check cannot be done only once in a while, as the server has to
      check every request to ascertain that it is coming from an allowed source.
   b. Referrer lists are tedious to maintain - The author of the proposal is also the webmaster of a popular
      Formula One blog [2]. Because of the exclusive content (photographs) posted on the site, the site
      is prone to exploitation by hotlinkers. The author has been using a .htaccess based approach to
      redirect users who access the hotlinked page to another page, but the file has grown to 9.4 KB.
      Maintaining such a long list takes a lot of effort, especially with thousands of hits a day.
   c. To avoid huge lists of blocked referrers, people instead maintain a list of allowed referrers. This poses
      a problem because browsers often do not send referrer information in the request header, which makes
      this method unreliable.
2. Client + Server side method - This involves the use of JavaScript to perform validation before the user is
allowed to view content. Usually this is coupled with a light server side component that transfers
information about the source of the request back to the server. This approach is used for a different purpose by
sites like rapidshare.de, which couple their JavaScript with their server to allow access only to those clients
that pass authentication at both levels.
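The referrer matching described in item 1 can be illustrated with a minimal sketch. It is written in Python rather than as .htaccess rules, and the names ALLOWED_HOSTS, referrer_allowed and allow_missing are hypothetical, introduced only for illustration. It shows the dilemma noted above: when the header is missing, the server must either admit the request (and let some hotlinkers through) or reject it (and lock out legitimate viewers).

```python
from urllib.parse import urlparse

# Hypothetical allow-list; a real deployment would list the site's own hosts.
ALLOWED_HOSTS = {"f1chronicles.com", "www.f1chronicles.com"}


def referrer_allowed(headers, allow_missing=True):
    """Return True if the request's Referer header names an allowed host."""
    referer = headers.get("Referer", "")
    if not referer:
        # No referrer was sent: the policy decides whether to admit
        # or reject the request, and both choices have a cost.
        return allow_missing
    host = urlparse(referer).hostname or ""
    return host in ALLOWED_HOSTS
```

Note that this check has to run on every request for protected content, which is the per-request cost the first drawback refers to.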
In the proposed work, we propose a strategy to counter hotlinking based on rewriting URLs periodically. The site
administrator can specify an interval after which the URL to the content changes, and the new URL is automatically
reflected in newly generated pages. The suggested approach can be strengthened using a session cookie for browsers that
provide this feature. Our investigation will involve figuring out client side ways to make the process more robust
and reliable.
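One possible way to realize periodic URL rewriting is sketched below in Python. The proposal does not specify how the rotated URLs are derived; this sketch assumes a token computed with HMAC over the content path and the current time window. SECRET_KEY, ROTATION_INTERVAL, the /protected/ prefix and the function names are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical server-side secret
ROTATION_INTERVAL = 3600  # seconds; the administrator-chosen interval
PREFIX = "/protected/"    # hypothetical URL prefix for protected content


def _token(path, window):
    """Derive a short token from the content path and the time window."""
    message = f"{path}:{window}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def current_url(path, now=None):
    """Build the URL under which `path` is served during the current window."""
    now = time.time() if now is None else now
    window = int(now // ROTATION_INTERVAL)
    return f"{PREFIX}{_token(path, window)}{path}"


def is_valid(url, now=None):
    """Check that `url` carries a token for the current or previous window."""
    now = time.time() if now is None else now
    window = int(now // ROTATION_INTERVAL)
    if not url.startswith(PREFIX):
        return False
    token, sep, rest = url[len(PREFIX):].partition("/")
    if not sep:
        return False
    path = "/" + rest
    # The previous window is accepted too, so that pages rendered just
    # before a rotation do not break for legitimate viewers.
    return any(
        hmac.compare_digest(token.encode(), _token(path, w).encode())
        for w in (window, window - 1)
    )
```

Pages generated by the site would call current_url when emitting image or video links, so legitimate viewers always receive a fresh URL, while a URL copied to another site stops working after at most two intervals.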
Plan of action
The work will make use of a Wordpress blog hosted on an Apache server running PHP.
The deliverable for the project will be a deployable prototype of a PHP plug-in for Wordpress that prevents
content hotlinking from a Wordpress blog.
If time permits, we can investigate and develop a client side approach that works in tandem with the server script
for authenticating the source of the request before serving the page.
The proposed duration of the project is 4 weeks including testing and deployment.
Evaluation and testing
The script will be tested by real world deployment on f1chronicles.com. Referrer logs will be collected from the
server to see the list of sites from which users came and the percentage of users who were able to view the content
even though they were not authorized to. We will also report statistics on the number of
users who were successfully redirected to the error page (and were thus caught by our system). We will
clear the current .htaccess file to start from scratch and not ban anyone statically.
Apart from the above, load analysis of the server will be done while using the proposed system. Specifically, we
will compare the transactions per second and the latency of the system in the following scenarios:
   a. When referrer blocking is used.
   b. When URL rewriting is used.
The URL rewriting method will be further evaluated on the basis of the time for which the content URL is not
changed and the effect of this parameter on the 2 factors mentioned above (transactions per second and latency).
As an extra step, the system will also be evaluated using a session cookie based authorization approach to prevent
hotlinking.
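The throughput and latency comparison described above could be carried out along the following lines. This is a minimal Python sketch under the assumption that each protection scheme can be exercised through a callable; the function measure and its parameters are hypothetical, and a real evaluation would drive the deployed server with an HTTP load-testing tool instead.

```python
import time


def measure(handler, requests):
    """Run `handler` on each request; report throughput and mean latency."""
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        handler(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        # Completed requests divided by total wall-clock time.
        "transactions_per_second": (
            len(latencies) / elapsed if elapsed > 0 else float("inf")
        ),
        # Average time spent handling a single request, in seconds.
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
    }
```

Running measure once with a referrer-checking handler and once with a URL-rewriting handler, over the same list of requests, gives the two figures for each scenario side by side.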
Technical issues
The proposed system has advantages over currently deployed solutions in terms of server load and ease of
deployment. However, it is best used in conjunction with lists of blocked and
allowed referrers. The reason is that most hotlinkers are not yet savvy enough to write server side scripts
that overcome tighter anti-hotlinking strategies. Many people still hotlink the old-fashioned way, using
absolute URLs. Future work will focus on a server side authentication mechanism that releases
content only when the browser has shown, by answering some challenge, that it was referred from the correct site.
This challenge might be as simple as providing the correct referrer, or it might be complicated enough to
demand authentication before releasing potentially hotlinked content.
Bibliography
The .htaccess notes. Hotlink protection, Time Based Redirection, Cookie Password Protection,
http://www.unixcities.com/apache/index1.html
Video content protection measures enabled by Adobe® Flash® Media Server,
http://www.adobe.com/devnet/flashmediaserver/articles/protecting_video_fms.pdf
Securing Web Interfaces, http://www.smart-techie.com/blog/2007/06/creating-secure-web-interfaces/
[1] http://www.adobe.com/devnet/flashmediaserver/articles/protecting_video_fms.pdf
[2] http://f1chronicles.com