Google Hacking
University of Sunderland
CSEM02
Harry R Erwin, PhD
Peter Dunne, PhD

Basics
• Web Search
• Newsgroups
• Images
• Preferences
• Language Tools

Google Queries
• Non-case sensitive
• * in a query stands for a word
• ‘.’ in a query is a single character wildcard
• Automatic stemming
• Ten-word limit
• AND (+) is assumed, OR (|) and NOT (-) must be entered
• “” for a phrase

More Queries
• You can control the language of the pages and the language of the reports
• You can restrict the search to specific countries

Controlling Searches
• Intitle, allintitle
• Inurl, allinurl
• Filetype
• Allintext
• Site
• Link
• Inanchor
• Daterange
• Cache
• Info
• Related
• Phonebook
• Rphonebook
• Bphonebook
• Author
• Group
• Msgid
• Insubject
• Stocks
• Define

Controlling Searches (II)
• These operators can be used to restrict searches.
• To restrict the search to the university: site:sunderland.ac.uk
• Or to search for seventh moon merlot in the uk: “seventh moon” merlot site:uk

Typical Filetypes
• Pdf
• Ps
• Xls
• Ppt
• Doc
• Rtf
• Txt

Why Google
• You access Google, not the original website.
• Most crackers access any site, even Google, via a proxy server.
• Why? If you access the cached web page and it contains images, you will get the images from the original site.

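The operators from the Controlling Searches slides can also be combined programmatically. The following is a minimal sketch, not a definitive tool: the function names `build_query` and `search_url` are hypothetical, and the search URL and `num` parameter are simply the ones that appear in the lynx example in this deck, so they may not behave the same way today. The code only builds the URL; it does not fetch anything.

```python
from urllib.parse import urlencode


def build_query(terms="", site=None, filetype=None, intitle=None, exclude=()):
    """Join free-text terms and search operators into one query string.

    All parameter names are illustrative assumptions, not an official API.
    """
    parts = []
    if terms:
        parts.append(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    # NOT (-) must be entered explicitly for each excluded term.
    parts.extend(f"-{item}" for item in exclude)
    return " ".join(parts)


def search_url(query, num=100):
    """Percent-encode the query into a search URL (URL shape assumed from the deck)."""
    return "https://www.google.com/search?" + urlencode({"q": query, "num": num})


if __name__ == "__main__":
    print(build_query('"seventh moon" merlot', site="uk"))
    print(search_url(build_query(site="sunderland.ac.uk")))
```

Keeping query construction separate from URL encoding means the same query string can be pasted into a browser or handed to whatever client is used.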
Directory Listings
• Search for intitle:index.of
• Or intitle:index.of “parent directory”
• Or intitle:index.of name size
• Or intitle:index.of inurl:admin
• Or intitle:index.of filename
• This can then lead to a directory traversal
• Look for filetype:bak, too, particularly if you want to expose SQL data generated on the fly

Commonly Available Sensitive Information
• HR files
• Helpdesk files
• Job listings
• Company information
• Employee names
• Personal websites and blogs
• E-mail and e-mail addresses

Network Mapping
• Site:domain name
• Site crawling, particularly by indicating negative searches for known domains
• Lynx is convenient if you want lots of hits:
  – lynx -dump "http://www.google.com/search?q=site:name+-knownsite&num=100" > test.html
• Or use a Perl script with the Google API

Link Mapping
• Explore the target site to see what it links to. The owners of the linked sites may be trusted and yet have weak security.
• The link operator supports this kind of search.
• Also check the newsgroups for questions from people at the organization.

Web-Enabled Network Devices
• The Google webspider often encounters web-enabled devices. These allow an administrator to query their status or manage their configuration using a web browser.
• You may also be able to access network statistics this way.

Searches to Worry About
• Site:
• Intitle:index.of
• Error|warning
• Login|logon
• Username|userid|employee.ID|“your username is”
• Password|passcode|“your password is”
• Admin|administrator
• -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
• Inurl:temp|inurl:tmp|inurl:backup|inurl:bak
• Intranet|help.desk

Protecting Yourselves
• Solid security policy
• Public web servers are Public!
• Disable directory listings
• Block crawlers with robots.txt
• <META NAME="ROBOTS" CONTENT="NOARCHIVE">
• NOSNIPPET is similar.

More Protection
• Passwords
• Delete anything you don’t need from the standard webserver configuration
• Keep your system patched.
• Hack yourself
• If sensitive data gets into Google, use the URL removal tools to delete it.
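One small way to "hack yourself" is to confirm that your robots.txt really disallows the paths you meant to keep away from crawlers. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the paths, and the helper name `blocked_paths` are made-up examples, not taken from any real site. It checks only what a well-behaved crawler would skip; it does not protect the files themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of paths that robots_txt disallows for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    candidates = ["/admin/index.html", "/index.html", "/backup/db.bak"]
    for path in blocked_paths(ROBOTS_TXT, candidates):
        print("disallowed for crawlers:", path)
```

Parsing the text directly, rather than fetching it, lets the same check run against a draft robots.txt before it is deployed.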