
An SEO’s Guide To Improving Your Website Using the Index Coverage Report


If SEO is your exam, Google Search Console is its report card. 

A report card that shows you where you underperformed and why your website isn’t ranking like it’s supposed to on search engines. 

It includes various technical SEO parameters that help you make informed decisions and fix the key mistakes holding your site back from ranking on search engines. 

One of the most useful features of Google Search Console is its set of reports, which give you insight into your web pages’ organic search performance and how you can improve it. 

One of those reports is the Index Coverage Report.

What is an Index Coverage Report?

The Index Coverage Report is a feature in Google Search Console that shows webmasters how Google is indexing their site. It lists the indexing status of every page Googlebot has discovered or visited. 

This information is vital because indexing is a key part of technical SEO: a page has to be crawled and added to Google’s index before it can rank on search engines. 

The report groups each page by status and shows whether it has been validated and added to the index. It also surfaces the underlying issues and errors plaguing your web pages and hampering your website’s growth. Check this report as part of your regular SEO audit to keep crawling and indexing on track. 

What Does an Index Coverage Report Include?

Google Search Console’s Index Coverage Report is divided into four sections that group pages by their indexing status; it also shows the date on which the report was last updated. 

  1. Errors

This section includes the web pages with errors preventing them from being indexed. They won’t show up on Search Engine Results Pages and, as a result, receive no organic traffic. There are various forms of errors, such as redirect errors and 404 errors. We’ll discuss them in detail below, along with how to fix them.  

  2. Valid with Warnings

This section contains pages that may or may not show up in the search results. They are indexable but blocked by a robots.txt directive, so Googlebot struggles to work out whether they should be indexed, and they may have problems appearing in search results. 

  3. Valid Pages

Valid pages are those that are indexed properly and displayed correctly in search results. On the Index Coverage Report, the pages you rely on to generate traffic for your website should fall into this category. 

  4. Excluded Pages

These are pages that do not appear on Google Search at all. The reasons vary: they may be duplicates of pages with a valid canonical, carry a noindex directive, or simply not be found, among others. 

Just because a page is categorised as excluded doesn’t mean something is wrong. You may have intended for it to be excluded, for example to avoid duplicate pages. But if the exclusion is unintentional, fix the underlying issues so Googlebot can index these pages. 

The Index Coverage Report gives you detailed insight into which pages are facing issues and exactly which issues are plaguing them. If you simply want to check the status of a particular page, you can also use the URL Inspection tool in Search Console to see why a specific URL is causing trouble for your SEO performance, then verify the findings against the coverage report and work on resolving them. 
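If you prefer to check pages programmatically, the Search Console API also exposes the URL Inspection tool. Below is a minimal sketch using the google-api-python-client library; the site and page URLs are placeholders, and it assumes you have already authorised a service account (added as a user on the property) with read access.

```python
# Minimal sketch: inspecting a single URL via the Search Console URL Inspection API.
# Assumes a service-account JSON key added as a user on the property;
# SITE_URL and PAGE_URL are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SITE_URL = "https://www.example.com/"            # verified Search Console property
PAGE_URL = "https://www.example.com/some-page/"  # URL to inspect

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

result = service.urlInspection().index().inspect(
    body={"inspectionUrl": PAGE_URL, "siteUrl": SITE_URL}
).execute()

status = result["inspectionResult"]["indexStatusResult"]
print("Coverage state:", status.get("coverageState"))
print("Robots.txt state:", status.get("robotsTxtState"))
print("Last crawl time:", status.get("lastCrawlTime"))
```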

Issues in Index Coverage Reports that Impact SEO and Ways to Fix Them

  1. Blocked by robots.txt

On your Index Coverage Report, you can find pages that are blocked by robots.txt. Robots.txt is a file containing instructions that tell search engine bots how they should crawl your website. 

If URLs are blocked in your robots.txt file, Google won’t be able to crawl them even when you submit them. Robots.txt essentially tells Googlebot not to crawl certain pages, but Google may still index a blocked page if it has other strong signals such as backlinks. 

If you are sure you don’t want a page indexed and want to block it from appearing in SERPs, don’t rely on robots.txt; use a noindex tag instead. If you do want a URL to be indexed, update your robots.txt file to allow crawling; if not, remove it from your XML sitemap as well. 
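As a quick sanity check, you can test whether Googlebot is allowed to crawl specific URLs against your live robots.txt file. The sketch below uses Python’s built-in robots.txt parser; the domain and URL list are placeholders.

```python
# Sketch: check whether Googlebot is allowed to crawl a set of URLs,
# using Python's built-in robots.txt parser. URLs below are placeholders.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://www.example.com/robots.txt"
urls_to_check = [
    "https://www.example.com/blog/index-coverage-report/",
    "https://www.example.com/private/thank-you/",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt file

for url in urls_to_check:
    if parser.can_fetch("Googlebot", url):
        print(f"ALLOWED  {url}")
    else:
        # Blocked from crawling. If the page should rank, update robots.txt;
        # if it should stay out of search results, use a noindex tag instead
        # and let Googlebot crawl it so the tag can be seen.
        print(f"BLOCKED  {url}")
```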

  2. Unauthorised Request (401)

A 401 Unauthorised status code in your Index Coverage Report means the request to crawl and index a page couldn’t be completed because Googlebot received a 401 HTTP status code. This typically happens with pages that sit behind a login, for example in a staging environment, so Google cannot access them without authorisation. 

These pages cannot be indexed because they are hidden behind logins and password protection. If a URL should be public, remove the authorisation requirement so Googlebot can access it. 

But if a page is in a staging environment and still a work in progress, keep the authorisation in place so that Google cannot access it. 
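A simple status-code sweep can reveal which submitted URLs still demand authorisation. The sketch below, with placeholder URLs, flags anything returning a 401 so you can decide whether to open it up or keep it behind the login.

```python
# Sketch: flag URLs that return 401 Unauthorised. Placeholder URLs; in practice,
# feed in the URLs listed under this error in the coverage report.
import requests

urls = [
    "https://www.example.com/",
    "https://staging.example.com/new-landing-page/",
]

for url in urls:
    status = requests.head(url, allow_redirects=True, timeout=10).status_code
    if status == 401:
        print(f"401 Unauthorised: {url}  (remove the login if this page should be public)")
    else:
        print(f"{status}: {url}")
```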

  3. Currently Not Indexed Pages

Googlebot can come across a few pages that aren’t yet indexed. These pages can fall into two categories: crawled but currently not indexed and discovered but not indexed. 

If your page is crawled but not indexed, chances are the URL was crawled recently and is still waiting to be indexed, or Google knows about it but hasn’t indexed it due to quality issues like duplicate content, thin content, or a lack of internal links. 

This information allows you to cross-check the quality parameters of your site and optimise it better for crawling and indexing. If your important URLs are crawled but not indexed yet, and the crawling has happened recently, you can simply wait it out. 

As for discovered but not indexed URLs, Google knows about them but they’re still queued for crawling, either because of crawl budget constraints or because Googlebot simply hasn’t got to them yet. To diagnose this, watch whether the number of such URLs keeps increasing; if it does, your site is demanding a higher crawl budget than you currently have. 
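For pages stuck in “crawled but not indexed”, a rough word-count check can help spot thin content worth improving. The sketch below is illustrative only: the URL list and the 300-word threshold are assumptions for the example, not a Google rule.

```python
# Sketch: rough thin-content check for URLs that were crawled but not indexed.
# Requires the requests and beautifulsoup4 packages; URLs and threshold are placeholders.
import requests
from bs4 import BeautifulSoup

urls = [
    "https://www.example.com/category/widgets/",
    "https://www.example.com/blog/short-update/",
]

for url in urls:
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    word_count = len(text.split())
    flag = "THIN?" if word_count < 300 else "OK"
    print(f"{flag:5} {word_count:5d} words  {url}")
```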

  4. Duplicate Pages and Issues with Accepting Canonical Pages 

Duplicate content can gravely impact your SEO. For some industries, like eCommerce and fashion stores, duplicate content is almost inevitable given the hundreds of product pages involved. 

You have to add canonical tags, which inform Google that a master copy of the page exists. Google Search Console’s Index Coverage Report tells you if there are any issues with accepting canonical pages and whether your website’s ranking is affected by duplicate content. 

With duplicate pages, there are three primary issues that the Index Coverage Report shows:

Duplicate Without User-Selected Canonical: Google considers the page a duplicate, but it isn’t marked with a clear canonical and is therefore excluded from the index. To fix this, mark the correct canonical using a rel=canonical link for every crawlable URL on your website. 

Duplicate, Submitted URL Not Selected as Canonical: This is the same issue as the previous one, except here you’ve explicitly asked Google to index the URL by submitting it in your XML sitemap. 

The Index Coverage Report shows you these URLs; explicitly mark the correct canonical using rel=canonical links for every crawlable URL and include only canonical pages in your XML sitemap. 

Duplicate, Google Chose Different Canonical Than User: Even when you mark a URL with a rel=canonical link, Google can disagree and choose a different URL as the canonical to index, leaving your preferred version treated as a duplicate. 

To understand why, inspect the URL Google has chosen as canonical. If you agree with Google’s choice, change your rel=canonical link to match it, or simplify your site architecture to reduce duplicate content overall. 
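To catch canonical problems before Google does, you can compare the rel=canonical each page declares against the URL itself. The sketch below uses the requests and BeautifulSoup libraries with placeholder URLs; in practice you would feed in the URLs flagged in the report.

```python
# Sketch: compare each page's declared rel=canonical with the page URL itself.
# Placeholder URLs; requires the requests and beautifulsoup4 packages.
import requests
from bs4 import BeautifulSoup

urls = [
    "https://www.example.com/product/blue-widget/",
    "https://www.example.com/product/blue-widget/?colour=navy",
]

def declared_canonical(html):
    """Return the href of the first <link rel="canonical"> tag, if any."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("link"):
        rel = link.get("rel") or []
        rel_values = rel if isinstance(rel, list) else rel.split()
        if "canonical" in (r.lower() for r in rel_values) and link.get("href"):
            return link["href"]
    return None

for url in urls:
    canonical = declared_canonical(requests.get(url, timeout=10).text)
    if canonical is None:
        print(f"MISSING canonical: {url}")
    elif canonical != url:
        print(f"{url}  ->  canonical {canonical}")
    else:
        print(f"Self-canonical OK: {url}")
```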

  5. Submitted URL Not Found (404)

Google Search Console’s Index Coverage Report flags URLs that you submitted via your XML sitemap but that no longer exist. 

These URLs are similar to the “Not Found (404)” error, where Google finds a URL you haven’t submitted but cannot index it because it returns an HTTP 404 status code. 

In this case, identify the important URLs listed and restore their content, or remove these URLs from your XML sitemap. 
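Because these errors come from your own XML sitemap, a quick script that pulls every URL entry and checks its status code will show exactly which submitted URLs now return 404. The sitemap URL below is a placeholder.

```python
# Sketch: list every URL in an XML sitemap and flag the ones that return 404.
# SITEMAP_URL is a placeholder; point it at your own sitemap.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"
NAMESPACE = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NAMESPACE)]

for url in urls:
    status = requests.head(url, allow_redirects=True, timeout=10).status_code
    if status == 404:
        print(f"404 Not Found (restore the page or drop it from the sitemap): {url}")
```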

  6. Server Error (500)

Under the error category, the Index Coverage Report shows all pages Google couldn’t index because the server returned a 500 HTTP response code. This type of Internal Server Error is caused by a wider server issue or a brief server disconnection, which prevents the page from loading. 

These errors are often rare and transient, and may resolve on their own after a while. If you can’t wait, you can request indexing via the URL Inspection tool. If server errors happen frequently, ask your web developers to keep SEO in mind and improve your server infrastructure. 
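Since 500 errors are often transient, re-checking the affected URLs a few times before escalating can save a round trip to your developers. The sketch below, with a placeholder URL, treats the error as persistent only if every attempt fails.

```python
# Sketch: re-check a URL that previously returned a 500 a few times before escalating.
# The URL, attempt count and delay are placeholders.
import time
import requests

URL = "https://www.example.com/checkout/"
ATTEMPTS = 3

statuses = []
for attempt in range(ATTEMPTS):
    statuses.append(requests.get(URL, timeout=10).status_code)
    if attempt < ATTEMPTS - 1:
        time.sleep(30)  # short pause between attempts

if all(code >= 500 for code in statuses):
    print(f"Persistent server error {statuses} - raise it with your developers: {URL}")
else:
    print(f"Looks transient {statuses}: {URL}")
```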

  7. Blocked Due to Prohibited Access (403)

You can use your Index Coverage Report to check which of your pages are blocked due to 403 errors. These are URLs submitted through an XML sitemap that Google isn’t allowed to access because the server returns a 403 HTTP response. 

The server understands the request but refuses to authorise it. If the page should be publicly accessible, remove the access restriction so anonymous users (and Googlebot) can reach it; if you don’t want Google to index these URLs, remove them from your sitemap and use a noindex tag. 

  8. Pages with Redirect

A redirect error occurs when the search engine bot is sent from an old URL to a new one but encounters issues along the way and therefore cannot crawl and index the new URL. 

Common forms of redirect errors include:

  • Redirect loops: infinite cycles caused by URLs redirecting to one another.
  • Redirect chains: more than one redirect between the initial URL and the final destination URL.
  • Redirects that take too long to load. 
  • Wrong or empty URLs in the redirect chain. 

To solve these problems (see the sketch after this list for a quick way to check your redirects):

  • Optimise the speed of your redirect URLs so that loading is not an issue. 
  • Determine the correct destination page to avoid redirect loops.
  • Redirect the initial URL directly to the final destination to resolve redirect chains. 
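A quick way to audit redirects is to follow each flagged URL and print the full hop chain: loops raise an exception, and chains longer than a single hop stand out immediately. The URLs below are placeholders.

```python
# Sketch: follow each redirect and print the full chain of hops.
# Placeholder URLs; requests raises TooManyRedirects on genuine loops.
import requests

urls = [
    "http://example.com/old-page/",
]

for url in urls:
    try:
        response = requests.get(url, allow_redirects=True, timeout=10)
    except requests.TooManyRedirects:
        print(f"REDIRECT LOOP: {url}")
        continue
    hops = [r.url for r in response.history] + [response.url]
    if len(hops) > 2:
        print(f"Redirect chain ({len(hops) - 1} hops): {' -> '.join(hops)}")
    else:
        print(f"OK ({response.status_code}): {' -> '.join(hops)}")
```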

Wrapping Up

Now that you know what errors and information Google Search Console’s Index Coverage Report contains, follow the required actions to improve the indexing side of your technical SEO. 

Any of these issues on your coverage report can contribute to poor SEO performance, technical SEO glitches and lower website traffic. The report lets you see which URLs have been crawled and indexed, and why the search engine has made a particular choice about a given URL. 

Using and leveraging Google Search Console is something you can outsource to an SEO agency that monitors your site health and technical SEO parameters for you. Contact us at Supple to learn more about how we can help you accelerate your website’s growth and traffic by assessing and improving your SEO framework and ensuring smooth crawling and indexing. 

Author
Hardy Desai

Hardy is the visionary founder of Supple Digital, a boutique SEO agency based in Melbourne, Australia. With a profound understanding of the digital landscape and a deep passion for innovation, Hardy has steered Supple Digital to become a leading name in the SEO domain.
