Master Project Proposal

PhishLurk: A Mechanism for Classifying and Blocking Phishing Websites

by: Mohammed Alqahtani

1. Committee Members and Signatures:

Approved by                                              Date
__________________________________ Advisor: Dr. Edward Chow _____________
__________________________________ Committee member: Dr. Albert Glock _____________
__________________________________ Committee member: Dr. Chuan Yue _____________

Introduction

Phishing is a cybercrime in which a person or organization steals highly sensitive information such as usernames, passwords, and credit card details. Phishing attacks mostly come in two forms: emails and webpages that spoof a legitimate party or lure the user into entering sensitive information. In other words, phishing directs users to fraudulent websites in order to obtain their sensitive information. Users increasingly rely on the internet for daily tasks such as paying bills, banking, and socializing. As a result, more and more personal information is used online for different purposes, which expands the attack surface for phishing.

Figure: Sample of a phishing website (source: www.phishtank.com)

Phishing has been a major concern in IT security. In the U.S., companies lose more than $2 billion every year as a result of phishing attacks [6]. Phishing works for many reasons; one of the most common is users' carelessness and their lack of knowledge about how to tell whether a website is phishing or not [1]. Moreover, there are long lists of phishing websites that are hard to detect. Much research on anti-phishing has been proposed, using different filtering and detection methods such as blacklists, plug-ins, extensions, and browser toolbars [2]. Desktop browser developers try hard to provide solid protection, such as warning the user with a message box if a website is potentially a phishing website or has an invalid or expired SSL certificate.
In most cases, a third party and a blacklist are involved in identifying and flagging phishing websites [3].

Recently, users have gained more ways to access the internet, for example notebooks, PC games, handhelds, and smartphones. However, this wider variety of devices, each with different capabilities and features, makes providing full protection more complicated, especially against phishing attacks. So far, no such complete protection exists. Smartphones are among the most used of these devices. According to a ComScore, Inc. survey, the number of smartphone subscribers increased 60 percent in 2010 compared to 2009 [4]. Another report by the Nielsen Company indicates that by 2011 half of cell phone users would be using smartphones [5].

Figure: Rapid global growth of the smartphone market, 2009 - 2010

Users have started to use these types of access for their activities and tasks because of the advantages they provide; for example, smartphones are preferred for their ease of use, flexibility, and mobility. Some activities, such as online banking, paying bills, online shopping, and emailing [5], require users to enter sensitive information to complete the authentication and authorization process. Such sensitive information includes credit card numbers, passwords, and usernames. In fact, having many types of devices to access the internet expands the attack surface for phishers and complicates protection.

Related Work

PhishTank is a non-profit project that aims to build a dependable database of phishing websites [7]; it collects, verifies, tracks, and shares phishing data. To report a phishing link, a user has to register as a member, so that the administrators can track and judge each member's contributions. Phishing websites can be reported and submitted via email or through PhishTank's website. Submitted data are verified by a committee. PhishTank's database can be shared via its API.
The links in the original database are classified only as "phishing" or "unknown". We will classify the phishing sites in the PhishTank database into more precise categories and use them in the proposed project. PhishTank has been working effectively against phishing attacks: thousands of phishing links are detected and verified as valid phishing sites every month [9], using the public's effort and contributions to build a trustworthy and dependable database that is open for everyone to use and share. As a result, several well-known organizations and browsers have started using the PhishTank database, such as Yahoo! Mail, Opera, McAfee, and Mozilla Firefox [10]. In my prototype, I use PhishTank as the provider of phishing URLs.

In the paper titled "Large-Scale Automatic Classification of Phishing Pages" [2], Colin Whittaker, Brian Ryner, and Marria Nazif proposed an automatic classifier to detect phishing websites. The classifier maintains Google's phishing blacklist automatically and analyzes millions of pages a day, examining the URL and the contents to verify whether a page is phishing or not. The proposed classifier works automatically within a large-scale system, maintains a false positive rate below 0.1%, and reduces the lifetime of phishing pages. The authors used machine learning techniques to analyze web page content. In my project, the determination is based on PhishTank's blacklist; what I aim to propose is a methodology for classifying phishing websites. My ultimate goal is not to determine whether a page is phishing or not, since PhishLurk relies on PhishTank's blacklist for that, but to provide a new method to classify phishing links that takes two factors into account: consuming as little memory and as little screen space as possible, which eventually improves the overall classification efficiency.

In the paper titled "PhishGuard: A Browser Plug-in for Protection from Phishing" [8], Y. Joshi, S. Saklikar, D. Das, and S. Saha proposed a mechanism to detect a forged website by submitting fake credentials before the actual credentials during the website's login process. The responses to those submissions are then analyzed to determine whether the website is phishing or not. The mechanism was implemented on the browser side (the user side) as a Mozilla Firefox plug-in. However, the mechanism only detects phishing during a user's login process. If another user logs in to the same phishing website, they go through the same detection process again. In my project, if a website is reported as a phishing site, the reported link is blocked and no other user can access the reported website.

In the paper titled "BogusBiter: A Transparent Protection Against Phishing Attacks" [17], Chuan Yue and Haining Wang proposed a client-side tool called BogusBiter that sends a large number of bogus credentials to suspected phishing sites and hides the real credentials from phishers. BogusBiter is unique in that it also helps legitimate websites detect stolen credentials in a timely manner by having the phisher verify the collected credentials at the legitimate website. BogusBiter was implemented as a Firefox 2 extension; my project differs in that it uses the server side to provide the protection.

Most popular browsers provide a phishing filter that warns users about malicious websites, including phishing websites. These filters mainly depend on certain lists to detect malicious websites. IE 7 used the "Phishing Filter", which was improved into the SmartScreen Filter in later versions of IE because of the weak protection the Phishing Filter provided [15]. In IE 8 and IE 9, the "SmartScreen Filter" verifies visited websites against a list of malicious websites that Microsoft created and updates continuously [11] [12]. Similar to IE, the Safari browser has a filter that checks websites against a list of phishing sites while the user is browsing.
After PayPal warned its members that Safari was not safe for its service [13], Safari started to use Extended Validation certificates to support analyzing websites [14]. For safe browsing, Safari users need to use both. Earlier versions of Firefox took advantage of anti-phishing providers such as GeoTrust or PhishTank, using their lists to help identify malicious websites. The current version of Firefox has adopted Google's anti-phishing program to support its phishing protection.

Many research projects have proposed mechanisms against phishing attacks that are implemented as browser plug-ins and toolbars. The main problem with plug-ins and toolbars is that they need the users' cooperation. Users may not cooperate and install the tool. Some users occasionally prefer to turn their filter off to browse faster [16]. On some devices, plug-ins and toolbars may not be as effective as they are in desktop browsers because of limited performance and screen space, as is the case with smartphones. PhishLurk's mechanism aims to use as little screen space and memory as possible on the client side, using the server side to provide the classification and protection of phishing links. So even if phishing protection is disabled on the client side, PhishLurk still provides protected and classified links to the user.

Proposed Project

I propose a mechanism to protect the user from phishing attacks. The mechanism assesses and classifies sites on the server side, based on PhishTank's blacklist, and presents the result using a color scheme. The system also uses little screen space and memory so that it works even on small devices. The mechanism classifies links into four types using a color scheme that takes less space and requires less memory. I expanded the classification used in PhishTank as follows:

Phishing link (Red): A confirmed phishing link.
The link is disabled, so even if the user is unaware or browsing carelessly, as we saw in the survey [1], there is no way to access the link.

Unknown link (Orange): A suspicious link that might be a phishing link. For example, it could be a link that contains the name, or part of the name, of a real company and asks the user to provide sensitive information. The link has been submitted as a phishing link but has not been verified yet. The user is warned before accessing the link and can click through at their own risk.

Unlikely link (Gray): The same as an unknown link, except that the blacklist has received a report about a link that is unlikely to be phishing. For example, websites whose top-level domain (TLD) is .edu or .gov are unlikely to be used by attackers because these domains are reserved for official use by organizations. The link remains unlikely until it is verified by PhishTank. Note that someone may have reported the site in an attempt to denigrate the organization, so it is fair to keep the unlikely status until the link is verified and changed to a safe link. Alternatively, the site might actually have been compromised by a cross-site scripting or SQL injection attack.

Figure: Global Phishing Survey: Trends and Domain Name Use, April 2011

As the chart above shows, 60% of phishing attacks were launched from the TLDs .COM, .NET, .TK, and .CC.

Safe link (Green): These are safe links, not phishing. The user can access the link without triggering warning messages.

Providing the protection from the server side and using the color scheme for classification saves memory and screen space on the client side. The mechanism determines whether a website is phishing based on a provided blacklist of phishing websites that is updated periodically to achieve the maximum possible accuracy.
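
The four-way classification described above can be sketched in a few lines. This is only an illustration in Python, not the planned PHP/CSS prototype. It assumes the blacklist is available as an in-memory mapping from URL to a status string; the names `Category`, `classify_link`, `render_link`, `UNLIKELY_TLDS`, the status values "verified" and "unverified", and the `link-<color>` CSS class names are all hypothetical and are not part of PhishTank's API.

```python
from enum import Enum
from html import escape
from urllib.parse import urlsplit


class Category(Enum):
    PHISHING = "red"     # verified phishing link; rendered disabled
    UNKNOWN = "orange"   # reported but not yet verified; user is warned
    UNLIKELY = "gray"    # reported, unverified, on a TLD such as .edu or .gov
    SAFE = "green"       # not present in the blacklist


# Assumption: TLDs treated as unlikely to host phishing, per the text above.
UNLIKELY_TLDS = {"edu", "gov"}


def classify_link(url, blacklist):
    """Return the Category for url.

    blacklist is a hypothetical mapping from URL to a status string:
    "verified" for confirmed phishing, "unverified" for pending reports.
    """
    status = blacklist.get(url)
    if status is None:
        return Category.SAFE
    if status == "verified":
        return Category.PHISHING
    # Reported but unverified: separate unlikely TLDs from the rest.
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    if tld in UNLIKELY_TLDS:
        return Category.UNLIKELY
    return Category.UNKNOWN


def render_link(url, category):
    """Return an HTML fragment whose CSS class carries the color.

    A phishing link is emitted without an href, so it cannot be followed.
    """
    label = escape(url)
    css_class = f"link-{category.value}"
    if category is Category.PHISHING:
        return f'<span class="{css_class}">{label}</span>'
    return f'<a class="{css_class}" href="{label}">{label}</a>'
```

In this sketch the color lives only in a CSS class name, so the client receives a single extra attribute per link rather than a plug-in or toolbar. The warning shown before the user follows an orange or gray link is not modeled here.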
The plan

In this project, I will develop an anti-phishing search website called "PhishLurk", using PHP and CSS, that responds to the user's search queries with classified, protected links. If a result is a phishing link, the engine classifies it as risky, disables it, and warns the user by displaying a red link. If the link is classified as "unknown" or "suspicious", the engine lets the user choose whether to access it and warns them about the possible consequences. If the link is classified as "unlikely", the engine lets the user choose whether to access it at their own responsibility and warns that, although the link is unlikely to be phishing, the site might have been hacked or someone might be trying to denigrate the organization behind the website. Finally, when the link carries no risk or suspicion, the engine classifies it as a safe link. I use CSS to help classify the links because it does not consume much screen space or demand extensive computation. Besides performing the classification and returning safe results to the user, the PhishLurk system reads and updates the blacklist periodically from PhishTank.com to provide the most up-to-date results.

Figure: PhishLurk's Design

Metric for Evaluating the PhishLurk System

The proposed PhishLurk system can be evaluated by examining how effectively users are able to use it and by measuring its processing overhead. We will conduct a survey on the usage of PhishLurk and summarize the feedback. Stress tests will be performed on the system to collect statistics on the average processing time overhead for classifying URLs and modifying links.

Deliverables

The working software prototype, PhishLurk, with a user guide and installation manual.
A master's report documenting the design and implementation of PhishLurk, the implementation choices and their performance evaluation, and the lessons learned.

References:

1. Rachna Dhamija, J. D. Tygar, and Marti Hearst. 2006. Why phishing works. In Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI '06), Rebecca Grinter, Thomas Rodden, Paul Aoki, Ed Cutrell, Robin Jeffries, and Gary Olson (Eds.). ACM, New York, NY, USA, 581-590. DOI=10.1145/1124772.1124861 http://doi.acm.org/10.1145/1124772.1124861
2. Aaron Blum, Brad Wardman, Thamar Solorio, and Gary Warner. 2010. Lexical feature based phishing URL detection using online learning. In Proceedings of the 3rd ACM workshop on Artificial intelligence and security (AISec '10). ACM, New York, NY, USA, 54-60. DOI=10.1145/1866423.1866434 http://doi.acm.org/10.1145/1866423.1866434
3. Gross, Ben. "Smartphone Anti-Phishing Protection Leaves Much to Be Desired | Messaging News." Messaging News | The Technology of Email and Instant Messaging. 26 Feb. 2010. Web. <http://www.messagingnews.com/story/smartphone-antiphishing-protection-leaves-much-be-desired>.
4. ComScore, Inc. "Smartphone Subscribers Now Comprise Majority of Mobile Browser and Application Users in U.S." ComScore, Inc. - Measuring the Digital World. ComScore, Inc, 1 Oct. 2010. <http://www.comscore.com/Press_Events/Press_Releases/2010/10/Smartphone_Subscribers_Now_Comprise_Majority_of_Mobile_Browser_and_Application_Users_in_U.S>.
5. Entner, Roger. "Smartphones to Overtake Feature Phones in U.S. by 2011." Http://www.nielsen.com. Nielsen Wire, 26 Mar. 2010. Web. <http://blog.nielsen.com/nielsenwire/consumer/smartphones-to-overtake-feature-phones-in-u-s-by-2011/>.
6. Kerstein, Paul L. "How Can We Stop Phishing and Pharming Scams?" CSO Online - Security and Risk. CSO Magazine Security and Risk, 19 July 2005. Web. <http://www.csoonline.com/article/220491/how-can-we-stop-phishing-and-pharmingscams->.
7. OpenDNS, LLC. PhishTank: an Anti-phishing Site. [Online]. http://www.phishtank.com.
8. Joshi, Y.; Saklikar, S.; Das, D.; Saha, S. "PhishGuard: A browser plug-in for protection from phishing." Internet Multimedia Services Architecture and Applications, 2008. IMSAA 2008. 2nd International Conference on, pp. 1-6, 10-12 Dec. 2008. doi: 10.1109/IMSAA.2008.4753929. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4753929&isnumber=4753904
9. PhishTank. Statistics about phishing activity and PhishTank usage. http://www.phishtank.com/stats.php
10. PhishTank. Friends of PhishTank. http://www.phishtank.com/friends.php
11. "SmartScreen Filter: Frequently Asked Questions." Windows Home - Microsoft Windows. <http://windows.microsoft.com/enUS/windows7/SmartScreen-Filter-frequently-asked-questions-IE9>.
12. "SmartScreen Filter - Microsoft Windows." Windows Home - Microsoft Windows. Web. <http://windows.microsoft.com/enUS/internet-explorer/products/ie-9/features/smartscreen-filter>.
13. "Apple - Safari - Learn about the Features Available in Safari." Apple. <http://www.apple.com/ca/safari/features.html>.
14. TECH.BLORGE - Top Technology News. "Paypal warns buyers to avoid Safari browser from Apple." <http://tech.blorge.com/Structure:%20/2008/02/28/paypal-warns-buyers-to-avoid-safari-browser-from-apple/>.
15. "Firefox 2 Phishing Protection Effectiveness Testing." Home of the Mozilla Project. <http://www.mozilla.org/security/phishing-test.html>.
16. "AVIRA News - Anti-Virus Users Are Restless, Avira Survey Finds." Antivirus Software Solutions for Home and for Business. <http://www.avira.com/en/press-details/nid/482/>.
17. Chuan Yue and Haining Wang. 2010. BogusBiter: A transparent protection against phishing attacks. ACM Trans. Internet Technol. 10, 2, Article 6 (June 2010), 31 pages. DOI=10.1145/1754393.1754395 http://doi.acm.org/10.1145/1754393.1754395