Google Hacking: An Intro for Beginners

Google hacking is the process of employing complex search engine queries to locate sensitive information.

Because of various web server misconfigurations, sensitive information gets indexed by search engines when their spiders crawl those servers. This sensitive information may include password files, confidential directories, logon portals, log files, etc.

BASIC SEARCH

The most basic search involves searching for the required terms via Google’s web interface. E.g.:

    • hacker
    • ethical hacker

The important thing to note here is that Google searches are case insensitive. Whether you search for (ethical hacker), (ETHICAL HACKER) or (EtHiCaL HaCkEr), Google returns the same number of results.
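You can also script these searches. The minimal Python sketch below makes two assumptions: Google’s results page lives at https://www.google.com/search, and it reads the query from the q parameter. The helper name search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Google's results page takes the query in the "q" parameter.
BASE = "https://www.google.com/search"

def search_url(query: str) -> str:
    """Return a Google search URL for a plain-text query (hypothetical helper)."""
    # urlencode escapes the query; spaces become "+".
    return BASE + "?" + urlencode({"q": query})

print(search_url("ethical hacker"))  # https://www.google.com/search?q=ethical+hacker
```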

PHRASE SEARCH

Phrase search involves enclosing the required terms in double quotes. Google searches for all the words in the phrase in the exact order you provide them. This is very useful when you are searching for a specific thing and want to omit extraneous results. Case insensitivity is maintained in phrase search too. E.g.:

    • “hacker”
    • “ethical hacker”

There are certain common words like “a”, “and”, “the”, “for”, etc. which Google ignores in basic search. These words are called stop words. Stop words used in a phrase search, however, are not excluded by Google.
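Here is a small Python sketch of phrase quoting, using a hypothetical helper named phrase. It uses plain ASCII double quotes. It also shows that the quotes survive URL encoding as %22, so the stop word inside the phrase is sent as part of the phrase.

```python
from urllib.parse import urlencode

def phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as an exact phrase."""
    # Strip quotes the caller may already have added, to avoid doubling them.
    return '"' + text.strip('"') + '"'

query = phrase("the ethical hacker")  # the stop word "the" stays inside the phrase
print(query)                          # "the ethical hacker"
print(urlencode({"q": query}))        # q=%22the+ethical+hacker%22
```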

ADVANCED SEARCH

Google wildcard

Google treats the asterisk (*) as a placeholder for any unknown term(s) and tries to find the best match(es) for it. It can be used in both basic and phrase search, but you have to separate it from the preceding and succeeding words with a space, e.g.:

    • you can * multiple *
    Here, Google will try to find the best match(es) for each wildcard.
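It is easy to glue a wildcard to a word by accident. The hypothetical Python checker below accepts a query only if every asterisk stands alone as its own word.

```python
def wildcards_ok(query: str) -> bool:
    """Return True if every asterisk in the query is a standalone word."""
    # Any whitespace-separated token that contains "*" must be exactly "*".
    return all(token == "*" for token in query.split() if "*" in token)

print(wildcards_ok("you can * multiple *"))  # True
print(wildcards_ok("you can* multiple"))     # False: "can*" is glued to a word
```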

Excluding words from search: the minus sign (-)

Sometimes you want to exclude pages containing certain words from the search results. You can do this by placing the minus sign (-) in front of the unwanted word. The minus sign should be preceded by a space and placed immediately before the unwanted term. A search such as (ethical hacker -cracker) will return pages containing ethical hacker but excluding the term cracker. If you want to exclude multiple terms, you can do so by placing the minus sign before each term. E.g.:

    • ethical hacker -cracker -blackhat
    • computer virus -antivirus -antispyware
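The exclusion rule is easy to encode: a space goes before the minus sign and none after it. Here is a minimal sketch with a hypothetical helper named exclude.

```python
def exclude(base: str, *unwanted: str) -> str:
    """Append "-term" for each unwanted word: space before the minus, none after."""
    return " ".join([base] + ["-" + term for term in unwanted])

print(exclude("ethical hacker", "cracker", "blackhat"))
# ethical hacker -cracker -blackhat
```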

Searching as is: the plus sign (+)

Sometimes you want to include stop words in your search. One way to do this is by using them in a phrase search, i.e. enclosing them in double quotes. Another way is to place a plus sign (+) before the stop word. The plus sign tells Google to include the word that follows it. As with the minus sign, the plus sign should be placed immediately before the word you want included and should be preceded by a space. E.g.:

    • +this +is +a test
    From this example, you can see that you can use several plus signs in a single query.
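The sketch below prefixes each stop word with a plus sign. The stop-word set is an illustrative assumption, not Google’s actual list, and force_stop_words is a hypothetical name.

```python
# Hypothetical, illustrative stop-word set (not Google's actual list).
STOP_WORDS = {"a", "an", "and", "the", "for", "is", "this", "of", "to"}

def force_stop_words(query: str) -> str:
    """Prefix every stop word with + so it is included in the search."""
    return " ".join("+" + w if w.lower() in STOP_WORDS else w for w in query.split())

print(force_stop_words("this is a test"))  # +this +is +a test
```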

Google’s Boolean operators

Boolean searches commonly use three operators: AND, OR and NOT. Google supports AND and OR; instead of NOT, it uses the minus sign.

The AND operator

The AND operator is used to search for multiple terms. E.g., if you want to search for pages containing the terms “google”, “hacking” and “tutorial”, you can construct your query as:

    • google AND hacking AND tutorial

Look at the above query carefully and you will see that it is just the basic Google query. Google includes the AND operator by default; you do not have to use it.

The NOT operator

The NOT operator is used to exclude words from a search. The NOT operator is not supported by Google. Instead, Google uses the minus sign to exclude terms.

The OR operator

The default Google search employs the AND operation. You can override this behavior using the OR operator. The OR operator (written in all caps) tells Google to locate any one of several words. You can use the pipe symbol (|) instead of OR to perform the OR operation. E.g.:

    • The query (google | microsoft) returns all pages containing either Google or Microsoft or both.
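Here is a minimal sketch of the AND and OR forms as Python string builders. The names all_of and any_of are hypothetical. AND is written out only to mirror the example, since Google applies it by default.

```python
def all_of(*terms: str) -> str:
    """Spaces alone would do (AND is the default); AND is spelled out for clarity."""
    return " AND ".join(terms)

def any_of(*terms: str, pipe: bool = False) -> str:
    """Join alternatives with upper-case OR, or with the | shorthand."""
    return (" | " if pipe else " OR ").join(terms)

print(all_of("google", "hacking", "tutorial"))   # google AND hacking AND tutorial
print(any_of("google", "microsoft"))             # google OR microsoft
print(any_of("google", "microsoft", pipe=True))  # google | microsoft
```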

Query string length limit

Ignoring stop words, you can search for only up to 32 words in a single Google query. Google ignores any words after the first 32 (excluding stop words) and displays a message telling you so.
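You can trim a query before sending it so that it stays under the limit. This sketch counts only non-stop words toward the 32-word limit described above. The stop-word set is an illustrative assumption, and truncate_query is a hypothetical name.

```python
# Hypothetical, illustrative stop-word set (not Google's actual list).
STOP_WORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "is"}
WORD_LIMIT = 32  # stop words do not count toward this limit

def truncate_query(query: str, limit: int = WORD_LIMIT) -> str:
    """Keep words up to and including the `limit`-th non-stop word."""
    kept, counted = [], 0
    for word in query.split():
        kept.append(word)
        if word.lower() not in STOP_WORDS:
            counted += 1
            if counted == limit:
                break  # every later word would be ignored
    return " ".join(kept)

long_query = " ".join(f"w{i}" for i in range(40))
print(len(truncate_query(long_query).split()))  # 32
```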

153,000,000 Results!..really?

Though Google claims to find thousands, or even millions, of results for a query, it lets you view only the first 1,000 of them. If you try to go beyond the first thousand results, Google displays an error message.

ADVANCED OPERATORS

Google provides a myriad of additional operators to enhance your search (or hacking!) experience. We will cover some of the most useful operators here.

All the advanced Google operators use the syntax operator:search_term(s)

• There should not be any space between the operator, the colon and the search term.

• The search term can be a single term or a phrase.
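The syntax rules above fit in one small formatter. This sketch assumes that multi-word terms should be quoted as phrases, as in the intitle examples below. The allin* operators are the exception, since they take unquoted words. The name op is hypothetical.

```python
def op(operator: str, term: str) -> str:
    """Format operator:search_term with no space around the colon."""
    term = term.strip()
    # Quote a multi-word term so it is treated as a phrase.
    if " " in term and not (term.startswith('"') and term.endswith('"')):
        term = f'"{term}"'
    return f"{operator.strip()}:{term}"

print(op("intitle", "google hacking"))  # intitle:"google hacking"
print(op("site", "sans.org"))           # site:sans.org
```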

Searching within a domain: site operator

Syntax: site:Domain

This is perhaps the most useful Google operator for reconnaissance. The site operator is used to limit the search to a particular domain. E.g.:

    • site:ethicalhacker.net
    This will return pages only from ethicalhacker.net
    • site:sans.org training
    This will search only “sans.org” for the term “training”.

You can also exclude results from specific subdomains with the help of the minus sign. E.g.:

    • site:sans.org training -site:www.sans.org
    This will search “sans.org” for the term “training” but omit results from the subdomain “www”.
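Here is a sketch of the site queries above, including the subdomain exclusion. The helper site_search is hypothetical.

```python
def site_search(domain: str, terms: str = "", exclude_hosts: tuple = ()) -> str:
    """Limit a search to `domain`, optionally dropping results from some subdomains."""
    parts = [f"site:{domain}"]
    if terms:
        parts.append(terms)
    parts += [f"-site:{host}" for host in exclude_hosts]
    return " ".join(parts)

print(site_search("sans.org", "training", ("www.sans.org",)))
# site:sans.org training -site:www.sans.org
```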

Searching the title: intitle and allintitle operators

Syntax: intitle:search_term

Syntax: allintitle:search_term(s)

The intitle operator is used to search the titles of pages. E.g.:

    • intitle:“google hacking”
    This will list all the pages with “google hacking” somewhere in their title.
    It can also be combined with the site operator to limit the search to a specific domain, e.g.:
    • site:ethicalhacker.net intitle:“google hacking”
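Combining intitle with site can be sketched the same way. The helper intitle is hypothetical and simply reproduces the two example queries, using plain ASCII quotes.

```python
def intitle(phrase: str, site: str = "") -> str:
    """Search page titles for an exact phrase, optionally within one domain."""
    query = f'intitle:"{phrase}"'
    return f"site:{site} {query}" if site else query

print(intitle("google hacking"))                            # intitle:"google hacking"
print(intitle("google hacking", site="ethicalhacker.net"))
# site:ethicalhacker.net intitle:"google hacking"
```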

Locating directory listings

One of the most sinister uses of the intitle operator is locating directory listings. Directory listings include the phrase “index of” in their title. So, we can search for (intitle:“index of”) or (intitle:index.of) to locate all the directory listings indexed by Google.

The period (.) in “index.of” acts as a single-character wildcard.

You can also use this operator to search for password files, e.g.:

    • intitle:Index.of etc shadow
    This will search for UNIX /etc/shadow password files
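On the defensive side, you can check whether pages on your own server are directory listings. This minimal sketch assumes that autogenerated listings start their title with “Index of”; check what your own server actually emits. The regex reuses the index.of trick, where "." matches any single character. The name looks_like_directory_listing is hypothetical.

```python
import re

# Assumption: autogenerated listings use a <title> beginning with "Index of".
TITLE_RE = re.compile(r"<title[^>]*>\s*index.of\b", re.IGNORECASE)

def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page title looks like an autogenerated directory listing."""
    return TITLE_RE.search(html) is not None

print(looks_like_directory_listing("<title>Index of /backup</title>"))  # True
print(looks_like_directory_listing("<title>My blog</title>"))           # False
```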

The allintitle operator also matches the title of a web page, but it searches for all the words that follow it. E.g.:

    • allintitle:penetration testing
    This query will search for web pages with the words “penetration” and “testing” in their title. Notice that, unlike the intitle operator, it does not require multiple words to be enclosed in quotes.

Like the intitle operator, the allintitle operator can also be used to discover directory listings. However, it does not combine well with other advanced operators; consequently, you should use the intitle operator instead.
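The allin* operators don’t mix well with other operators, so a query builder could flag such combinations. This hypothetical linter is sketched with Python’s re module. As a simplification, it treats any word:value token as an operator.

```python
import re

ALLIN_OPS = {"allintitle", "allinurl"}
# A token like "site:..." or "-site:..." at the start of the query or after whitespace.
OPERATOR_RE = re.compile(r"(?<!\S)-?([a-z]+):")

def mixes_allin_with_others(query: str) -> bool:
    """Return True when an allin* operator shares a query with another operator."""
    ops = OPERATOR_RE.findall(query.lower())
    return any(o in ALLIN_OPS for o in ops) and len(ops) > 1

print(mixes_allin_with_others("allintitle:penetration testing"))               # False
print(mixes_allin_with_others("allintitle:penetration testing site:sans.org")) # True
```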

Searching the URL: inurl and allinurl operators

Syntax: inurl:search_term

Syntax: allinurl:search_term(s)

The inurl operator is used to locate URLs containing the search term. E.g.:

    • inurl:hacker
    This will list all the URLs containing the term “hacker”.

The inurl: operator can also be combined with the site operator to search URLs associated with a specific domain. It can also be utilized to discover vulnerable scripts if the script names are included in the URL.

Akin to the inurl operator, allinurl is also used to match the URL of a web page, but it searches for all the words following it. The allinurl operator also does not combine well with other advanced operators, so its use should be avoided.
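Here is the inurl operator combined with site, sketched as a small query generator. The script names, the domain and the helper inurl_queries are all hypothetical.

```python
def inurl_queries(fragments: list, site: str = "") -> list:
    """Build one inurl: query per URL fragment, optionally limited to one domain."""
    prefix = f"site:{site} " if site else ""
    return [f"{prefix}inurl:{fragment}" for fragment in fragments]

# Hypothetical script names and domain, for illustration only.
for q in inurl_queries(["login.php", "admin.php"], site="example.com"):
    print(q)
# site:example.com inurl:login.php
# site:example.com inurl:admin.php
```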

Searching for a specific file type: filetype and ext operators

Syntax: filetype:type_of_file

Syntax: ext:type_of_file

Google supports two operators to search for a specific type of file based on the file extension: filetype and ext. You can use either of the two operators. E.g.:

    • filetype:pdf “google hacking”
    • ext:pdf “google hacking”
    This will list all the .pdf files containing the term “google hacking”.

You can use either of these operators together with the site operator to search for a specific type of file in a particular domain. E.g.:

    • site:sans.org ext:doc training
    This will search “sans.org” for all the Microsoft Word document files containing the term “training”.
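You can generate one query per file type in a loop. The helper filetype_searches is hypothetical. Its use_ext flag reflects that either operator can be used.

```python
def filetype_searches(terms: str, extensions: list, site: str = "", use_ext: bool = True) -> list:
    """Build one query per file extension, optionally limited to one domain."""
    op = "ext" if use_ext else "filetype"
    prefix = f"site:{site} " if site else ""
    # Accept extensions with or without a leading dot.
    return [f"{prefix}{op}:{e.lstrip('.')} {terms}" for e in extensions]

for q in filetype_searches("training", ["doc", ".pdf"], site="sans.org"):
    print(q)
# site:sans.org ext:doc training
# site:sans.org ext:pdf training
```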

Links to a URL: link operator

Syntax: link:URL

The link operator is used to find all the webpages that have links to the specified URL. E.g.:

    • link:www.ethicalhacker.net

Viewing modified pages: cache operator

Syntax: cache:URL

Whenever Googlebot crawls a webpage, it caches a snapshot of that page. This cached version can be very useful if the page has recently been deleted or is inaccessible owing to other internet problems: you can still view the cached version stored on Google’s servers.

For every result that Google returns for your query, it also provides a link to the cached version of that page. You can browse the cached version via the “Cached” link below the snippet for that result.


Another way to view the cached page is via the Google cache operator. E.g.:

    • cache:ethicalhacker.net
    This will show you the version of the page cached when Googlebot last crawled it. The current version of the page could be different from the cached version.

Google cache can be very useful for an attacker if the website has modified its original content, because it lets the attacker view the old content of the page.

Google cache can also be employed to use Google as a proxy! This and more advanced tricks will be covered in a separate article.

GOOGLE HACKING DEFENSES

Disable directory listings

Directory listings give away far more information than a visitor needs. Disabling them is always a good option.

Noindex your confidential files

Search engines can be told not to index a webpage with the “noindex” meta tag, and web crawlers can be told not to crawl a URL by listing it in the /robots.txt file. Note that /robots.txt is publicly available; anyone can read it, so it shouldn’t be used for hiding information.
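Python’s standard library includes urllib.robotparser, which evaluates robots.txt rules for a given user agent. The robots.txt content and URLs below are hypothetical. Note that the file that blocks crawling also tells any reader which path you consider private.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site with a private area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/private/report.pdf"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))           # True
```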

Noarchive to forbid caching

You can prevent search engines from caching a webpage by employing the “noarchive” meta tag.
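You may want to confirm that the noindex and noarchive meta tags are actually present on a page. This short sketch uses the standard library’s html.parser to collect the robots meta directives. RobotsMetaParser and robots_directives are hypothetical names.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())

def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
print(sorted(robots_directives(page)))  # ['noarchive', 'noindex']
```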

Employ Google-fu against your own site

You could perform these advanced searches against your own website to discover any vulnerabilities.
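Here is a sketch of a self-audit that prefixes a few checks with site: so they cover only your own domain. The query list is illustrative, and example.com stands in for your domain. The ext:log and inurl:admin entries are hypothetical additions, inspired by the log files and logon portals mentioned at the start of this article.

```python
# Illustrative checks; the first two come from this article, the rest are hypothetical.
AUDIT_TEMPLATES = [
    'intitle:"index of"',
    "intitle:index.of etc shadow",
    "ext:log",
    "inurl:admin",
]

def audit_queries(domain: str) -> list:
    """Prefix each check with site: so it only covers your own domain."""
    return [f"site:{domain} {t}" for t in AUDIT_TEMPLATES]

for q in audit_queries("example.com"):
    print(q)
```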

Additionally, you can make use of automated tools like Wikto and SiteDigger, which will thoroughly scan your site.

Removing an indexed page

If your confidential page has already been indexed by Google, you can remove it via Google’s URL removal tool.

ADDITIONAL RESOURCES

• Johnny “ihackstuff” Long maintains a Google Hacking Database, a list of numerous advanced Google queries which can be used to discover vulnerable targets.

• Johnny Long is also the author of Google Hacking for Penetration Testers, which is a must for any serious Google hacker.

• The SANS Institute maintains a very handy Google cheat sheet.
