How to Prevent Google from Indexing Certain Web Pages
In search engine optimization, the typical goal is to get as many pages of your website as possible crawled and indexed by search engines like Google.
A common misconception is that indexing more pages always results in better SEO rankings. That is not always the case. It is often necessary to deliberately prevent search engines from indexing certain pages of your website in order to boost SEO. One study found that organic search traffic increased by 22% after removing duplicate web pages, while Moz reported a 13.7% increase in organic search traffic after removing low-value pages.
Web Pages That Don’t Need to Be Indexed
As mentioned, not every page on your website needs to be indexed by search engines. Pages that typically don't need indexing include, but are not limited to, the following:
- Landing pages for ads
- Thank-you pages
- Privacy and other policy pages
- Admin pages
- Duplicate pages (e.g., similar content posted across multiple websites owned by one company)
- Low-value pages (e.g., outdated content from years back, but something valuable enough not to be deleted from your website)
Before de-indexing anything, conduct a thorough content audit of your website so that you have a systematic basis for deciding which pages to include and which to exclude.
How to Prevent Google from Indexing Certain Web Pages
There are four ways to keep web pages out of search engine results: a “noindex” metatag, an X-Robots-Tag HTTP header, a robots.txt file, and Google Webmaster Tools (now called Google Search Console).
1. Using a “noindex” metatag
The easiest and most effective way to prevent Google from indexing a web page is the “noindex” metatag. It is a directive that tells search engine crawlers not to index a page, so the page is not shown in search engine results.
How to add a “noindex” metatag:
All you need to do is to insert the following tag in the <head> section of a page’s HTML markup:
<meta name="robots" content="noindex">
Depending on your content management system (CMS), inserting this metatag should be fairly easy. For CMSs such as WordPress that don’t allow users to access the source code, use a plugin like Yoast SEO. Note that you need to add the tag to every page that you wish to de-index.
Additionally, if you want search engines to both de-index your web page and not follow the links on that page (such as in the case of thank-you pages, where you do not want search engines to follow the link to your offer), combine “noindex” with “nofollow” in the same metatag:
<meta name="robots" content="noindex,nofollow">
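Once the tag is in place, it can be worth confirming that a page's markup really carries the directive. The following is a minimal sketch using Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical helpers made up for this illustration, and the sample page is invented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page for illustration.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

This only inspects markup you pass in as a string; fetching the live page and handling crawler-specific tags (such as a metatag aimed at a single search engine) are left out of the sketch.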
2. Using an X-Robots-Tag HTTP header
Alternatively, you can use an X-Robots-Tag, which is added to the HTTP response header of a given URL. It has basically the same effect as a “noindex” tag, with the additional option of specifying different rules for different search engines. For more information, please see Google’s guide here.
How to add an X-Robots-Tag:
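Because the X-Robots-Tag is part of the HTTP response, it is set on your web server (for example, in the server configuration or in your site’s code), not in the web browser. Browser tools are still useful for checking the result: in Google Chrome, the Network panel of the developer tools shows the response headers for a page, and extensions like ModHeader or Modify Header Value let you experiment with headers locally. Here are examples of X-Robots-Tag for specific functions: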
- To de-index a web page:
X-Robots-Tag: noindex
- To set different de-indexing rules for different search engines:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
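Since the header is sent by whatever serves the page, one way to picture it is a small web application that attaches X-Robots-Tag to selected responses. The sketch below is a bare WSGI application in Python; the paths in NOINDEX_PATHS and the function name app are assumptions for illustration, not part of any particular CMS or server setup.

```python
# Hypothetical paths chosen for illustration only.
NOINDEX_PATHS = {"/thank-you", "/admin"}


def app(environ, start_response):
    """Minimal WSGI app that sends X-Robots-Tag: noindex for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # Same effect as the "noindex" metatag, but sent as a response header.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>Hello</body></html>"]


def response_headers(path):
    """Call the app directly (no network) and return its headers as a dict."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured


print(response_headers("/thank-you").get("X-Robots-Tag"))  # noindex
print(response_headers("/").get("X-Robots-Tag"))           # None
```

To try it in a browser, the app could be served locally with wsgiref.simple_server.make_server from the standard library. On a production site the same header is more commonly added in the web server's own configuration.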
3. Using a robots.txt file
A robots.txt file is mainly used to manage search engine crawler traffic so that crawlers don’t overload your website with requests. Note, however, that this type of file is not meant to hide web pages from Google: a page blocked by robots.txt can still be indexed if other pages link to it. Rather, robots.txt is used to prevent images, videos, and other media files from appearing in search results.
How to use a robots.txt file to hide media files from Google:
Using robots.txt is fairly technical. Basically, you need to use a text editor to create a standard ASCII or UTF-8 text file, and then add that file to the root folder of your website. To learn more about how to create a robots.txt file, check out Google’s guide here. Google has also created separate guides for hiding certain media files from appearing in search results.
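Before uploading a robots.txt file, its rules can be sanity-checked offline. The sketch below uses Python's standard urllib.robotparser module; the file contents, the example.com URLs, and the is_allowed helper are all made up for illustration. Googlebot-Image is the user agent Google uses for image crawling.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: keep a private image folder away from Google's
# image crawler, and keep all other crawlers out of /downloads/.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/private/

User-agent: *
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, url):
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("Googlebot-Image", "https://example.com/images/private/photo.jpg"),
    ("Googlebot-Image", "https://example.com/images/logo.png"),
    ("Googlebot", "https://example.com/downloads/file.pdf"),
]:
    print(agent, url, is_allowed(agent, url))
```

Python's parser follows the basic robots.txt conventions and may not match every detail of how Google interprets a file, so treat the result as a first check rather than a guarantee.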
4. Using Google Webmaster Tools
You can also temporarily block pages from Google search results using the Remove URLs tool in Google Webmaster Tools. Please note that this applies only to Google; other search engines have tools of their own. Keep in mind, too, that this removal is temporary. For permanent removal of web pages from search engine results, view Google’s instructions here.
How to use Google’s Remove URLs tool to temporarily exclude pages:
The procedure is quite easy. Open the Remove URLs Tool and select a property in Search Console that you own. Select Temporarily Hide and enter the page URL. Afterwards, choose Clear URL from cache and temporarily remove from Search. This hides the page from Google search results for 90 days, and also clears the cached copy of the page and snippets from the Google index. For more information, check out Google’s guide here.
Wrapping It Up
It may take time for a de-indexing change to take effect, often a few weeks. If you notice that your page is still appearing in Google’s search results, it’s most likely because Google hasn’t crawled your site since you made the change. You can ask Google to recrawl your page using the Fetch as Google tool (replaced by the URL Inspection tool in the current Search Console).
If you want to know more, or if you need help with any of your SEO needs, Ilfusion has the expertise and experience to lend you a hand. Give us a call at 888-420-5115, or send us an email to [email protected].