What is a Crawl Budget?
Did you know that Google crawls pages on websites in rough proportion to their PageRank? It’s a concept called crawl budget (or crawl rank), and it could be having a very big impact on your sites – especially if they’ve got a lot of pages.
According to a recent posting on CalvinAyre.com by SEO expert Nick Garner, crawl budgets mean that pages with low PageRank may not be getting crawled often – if at all. In effect, says Garner, “This means if competitors are being crawled more frequently for a similar page than you – they will out rank you.”
So what factors wind up influencing the crawl budget?
In a recent posting, Google Web Spam boss/SEO Oracle Matt Cutts explained:
So if you have a lot of incoming links on your root page, we’ll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we’ll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline.
Cutts goes on to say that every page on your site is in a never-ending battle for crawl budget with other, similarly ranked pages. Since most pages on the web have little or nothing in the way of PageRank, that can be a pretty fierce fight.
It also means that it may be time to take a closer look at how you handle sitemaps and robots.txt files. Garner points to a publisher who was asking Google to index around 1.5 million pages on her site. While that idea looked good on paper, Garner says, “The more they indexed, the less they crawled.”
This makes sense because, in all likelihood, most of those pages had very low PageRank – if they ranked at all – and weren’t getting crawled.
The good news here is that there are some workarounds that can help you point Google’s bots to the pages that really matter on your sites. The bad news is that getting there is going to involve a major site cleanup and plenty of quality time in the guts of your site.
Garner’s suggestions for creating a crawl budget-friendly site include:
- Check your sitemap.xml and make 100% certain that it’s categorized properly.
- Keep plenty of internal links pointed at the pages you think have the most potential.
- Make certain you’re using robots.txt effectively to push the important pages while pushing the less important ones to the side a bit.
- Learn about bots and what parts of your sites they’re crawling today so that you can get a better idea of where you want them crawling in the future.
- Spend lots of time on Webmaster Tools getting a better feel for what’s really going on with your sites.
- Avoid deep site architecture that sends bots through the less traveled sections of your site.
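To make the sitemap tip concrete, here’s a minimal sitemap.xml sketch following the standard sitemaps.org protocol. The URLs are placeholders – the point is that the file should list only the pages you actually want crawled, rather than every URL your CMS can generate:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only the pages worth crawling; example.com URLs are placeholders -->
  <url>
    <loc>https://www.example.com/key-landing-page/</loc>
    <lastmod>2014-06-01</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://www.example.com/category/top-content/</loc>
    <priority>0.6</priority>
  </url>
</urlset>
```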
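On the robots.txt front, a simple sketch along these lines keeps bots out of thin or duplicate sections so the crawl budget goes to pages that matter. The disallowed paths here are hypothetical examples – swap in whatever low-value sections exist on your own site:

```
User-agent: *
# Example low-value sections to keep out of the crawl (placeholders)
Disallow: /search/
Disallow: /tags/
Disallow: /print/

Sitemap: https://www.example.com/sitemap.xml
```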
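As for learning what the bots are actually crawling today, your server access logs are the most direct source. Here’s a minimal sketch, assuming the common Apache/Nginx “combined” log format, that tallies which top-level sections of a site Googlebot is requesting – the sample log lines are made up for illustration:

```python
import re
from collections import Counter

# Made-up sample lines in the standard "combined" access-log format.
LOG_LINES = [
    '66.249.66.1 - - [10/Jun/2014:06:25:01 +0000] "GET /news/article-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jun/2014:06:25:07 +0000] "GET /tags/old-tag HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Jun/2014:06:25:09 +0000] "GET /news/article-2 HTTP/1.1" 200 4800 "-" "Mozilla/5.0"',
]

# Pulls the request path out of the quoted "GET /path HTTP/1.1" portion.
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

def googlebot_sections(lines):
    """Count top-level path sections requested by Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # skip ordinary visitors and other bots
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # '/news/article-1' -> 'news'
        section = match.group("path").lstrip("/").split("/", 1)[0] or "(root)"
        counts[section] += 1
    return counts

if __name__ == "__main__":
    for section, hits in googlebot_sections(LOG_LINES).most_common():
        print(section, hits)
```

If a section you consider low-value dominates the counts, that’s budget being spent in the wrong place – exactly the situation the tips above are meant to fix.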
(For a complete list of Garner’s tips, you’ll definitely want to take a look at his full posting and video on CalvinAyre.com.)
Adjusting your SEO strategy to account for crawl budgets may seem counterintuitive because it draws attention away from large portions of your existing content.
By the same token, those pages might not be doing you much good anyway and could actually be harming your more-trafficked pages.