To fully understand how Google analyzes your website, you first need to understand how it explores it. Obviously, Google employees do not visit sites one by one: a robot takes care of that, Googlebot. How does it work? What is its purpose? How can you improve your pages to make its job easier? Can you make it crawl your site more often? You will find the answers to these questions in this article.
Googlebot: definition and how it works
Googlebot is simply a robot that explores (or “crawls”) your website. This kind of program is also called a “spider”.
This crawler matters to both site publishers and Google.
For you, as a website manager, it is the gateway to having your pages indexed in search results. Googlebot collects the information it needs, and Google then decides whether or not to index your page. In other words, if you want to improve the organic visibility of your website, your pages must first get through this step.
Google, for its part, uses it to separate the pages that are worth showing in its search results from those that are not.
It is only once your pages have been crawled and indexed that Google decides where and how they can appear (their organic ranking).
These are the 3 stages of page ranking: Crawling > Indexing > Ranking.
If Googlebot encounters a problem while crawling your site (page blocked in robots.txt, page canonicalized to another URL, page returning a 500, 404, 301 or 302 status code, noindex tag, etc.), the affected URLs will generally not be indexed as they are, and their organic visibility will suffer.
To review the main errors related to Google’s crawl, you can either use a professional crawl analysis tool such as Screaming Frog, SEOlyzer or OnCrawl, or consult the “Coverage” report dedicated to crawl errors in your Google Search Console account.
When does Googlebot visit? How often does it crawl pages?
Googlebot’s visit frequency varies from site to site.
It can range from every few minutes to every few days.
Crawl frequency depends on the size of the site, how often new pages are published and how often existing pages are updated.
If you only publish or update pages every 3 months, Googlebot may visit your site fairly rarely.
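To see how often Googlebot actually visits, and which status codes it receives (the crawl errors mentioned earlier), you can analyze your server’s access logs. Below is a minimal Python sketch, assuming the Apache/Nginx “combined” log format; the function name `googlebot_activity` and the sample lines are hypothetical. It matches the “Googlebot” token in the user agent only, which can be spoofed, so treat the counts as a rough indication rather than verified Googlebot traffic.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed "combined" log format; adapt the regex to your server's actual format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_activity(lines):
    """Count hits per day and per HTTP status for requests whose user agent mentions Googlebot.

    The user agent can be spoofed, so this is a rough view, not verified Googlebot traffic.
    """
    hits_per_day = Counter()
    statuses = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue  # skip malformed lines and other visitors
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits_per_day[when.date().isoformat()] += 1
        statuses[int(match.group("status"))] += 1
    return hits_per_day, statuses


# Usage with made-up log lines: two Googlebot hits and one ordinary visitor.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Mar/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Mar/2024:14:02:11 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "{BOT}"',
    '203.0.113.5 - - [11/Mar/2024:09:12:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
days, codes = googlebot_activity(sample)
print(dict(days))   # {'2024-03-10': 2}
print(dict(codes))  # {200: 1, 404: 1}
```

Tracking these counts over a few weeks shows both the real crawl frequency of your site and the share of crawl spent on error pages.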
What is the crawl budget and how can you optimize it?
Another aspect of Googlebot’s crawl to consider is the crawl budget.
As its name suggests, it is the amount of crawling Google allocates to a site: for each site, Google sets aside defined resources and a defined exploration time.
The goal of any SEO is therefore to steer Googlebot toward the most relevant pages and, conversely, to keep it from spending too many resources on pages that are not meant to rank.
Several strategies are possible, and they can be combined:
- Block, in robots.txt, the categories, URL parameters and other pages that you do not want Googlebot to crawl.
- Obfuscate internal links pointing to pages with no SEO value (e.g. my account, legal notices).
- Delete unnecessary pages that serve no purpose for either UX or SEO.
- Return an HTTP 304 (Not Modified) status for pages that have not changed since Googlebot’s last visit, for example pages untouched for several weeks.
- As far as possible, avoid internal links to pages that return HTTP 404, 500, 301 or 302. Every internal link should point directly to a page that returns 200.
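A 304 response only makes sense as the answer to a conditional request, for example one carrying an If-Modified-Since header. The minimal Python sketch below, built on the standard `email.utils` date helpers, shows the decision a server makes for such a request; the function name `conditional_status` is hypothetical, and in practice this logic usually lives in your web server, CDN or framework rather than in hand-written code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return (status_code, headers) for a page, honoring an If-Modified-Since header.

    `last_modified` must be a timezone-aware datetime in UTC (datetime.timezone.utc).
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it and serve the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)  # assume UTC if no zone given
            # HTTP dates have one-second precision, so drop microseconds before comparing.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers  # unchanged: no body needs to be sent
    return 200, headers


# Usage: a page last changed on 1 March 2024, requested again on 10 March.
changed = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(conditional_status("Sun, 10 Mar 2024 08:00:00 GMT", changed))
# (304, {'Last-Modified': 'Fri, 01 Mar 2024 12:00:00 GMT'})
print(conditional_status(None, changed))
# (200, {'Last-Modified': 'Fri, 01 Mar 2024 12:00:00 GMT'})
```

Sending Last-Modified on every 200 response is what lets a crawler send If-Modified-Since on its next visit.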
Googlebot is not the only user agent
Googlebot is what is called a user agent: a crawler that identifies itself by name when it requests your pages. Google does not rely on Googlebot alone to crawl your site. There is also AdsBot, which checks the quality of ad landing pages, as well as crawlers for AdSense and Google APIs. Googlebot also comes in dedicated versions for Google Images, Google News and Google Videos.
Each of these crawlers can be given its own instructions, in particular in the robots.txt file or in meta tags; you will find the details here: https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers?hl=fr&ref_topic=4610900&visit_id=637843187863289655-38369794&rd=1.
How can you improve Googlebot’s crawl of your pages? 6 tips to apply
Manage your robots.txt file well
This is one of the most important points to check. In this file, you give Google’s robots concrete directions about what to crawl and what not to crawl. To draw a parallel with the crawl budget, you are telling the robot where it should spend that budget.
If you give the robot no instructions, it will explore everything. It is therefore better to specify what it should crawl, depending on which pages you want to see indexed on Google.
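You can sanity-check a robots.txt file before publishing it with Python’s standard `urllib.robotparser` module (Python 3.8+ for `site_maps()`). The file below is a hypothetical example for www.example.com: one group for Googlebot-Image and one for all other crawlers. Be aware that this parser follows the original robots.txt conventions, not Google’s exact implementation: it ignores `*` and `$` wildcards inside paths and applies the first matching rule rather than the most specific one, so treat it as a quick check, not a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /private-images/

User-agent: *
Disallow: /my-account/
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "/blog/googlebot-guide"),
    ("Googlebot", "/my-account/orders"),
    ("Googlebot", "/search?q=seo"),
    ("Googlebot-Image", "/private-images/team.jpg"),
    ("Googlebot-Image", "/my-account/orders"),  # allowed: this agent has its own group
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, "https://www.example.com" + path)
    print(f"{agent:16} {path:28} {'crawl' if allowed else 'blocked'}")
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

Note the last check: a crawler that matches its own group ignores the `*` group entirely, so rules meant for every robot must be repeated in each specific group.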
Give instructions via Meta tags or X-Robots-Tag
In addition to robots.txt, you can give instructions directly in the code of your pages.
You can give these instructions in meta tags. Just add a meta tag in the head, with a “name” attribute that targets the robot you want to address and a “content” attribute that holds the directive. With `name="robots"`, you target all crawlers (user agents). For example, to tell Googlebot only that it should neither index a page nor follow its links, you can insert this tag in the head of the pages concerned: `<meta name="googlebot" content="noindex, nofollow">`
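To check which meta-tag directives a given crawler would pick up from a page, here is a minimal sketch using Python’s standard `html.parser`. The class and function names (`RobotsMetaParser`, `robots_directives`) are hypothetical. It reads tags named `robots` (all crawlers) plus the tag named after the crawler you pass in, and simply merges their directives; this is a simplification of how search engines resolve combined or conflicting rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="<crawler>"> tags."""

    def __init__(self, crawler):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        # "robots" targets every crawler; a token such as "googlebot" targets only that one.
        if name in ("robots", self.crawler):
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html, crawler="googlebot"):
    """Return the set of meta robots directives that apply to `crawler` in `html`."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return parser.directives


page = (
    '<html><head><meta name="googlebot" content="noindex, nofollow">'
    "</head><body>...</body></html>"
)
print(sorted(robots_directives(page)))             # ['nofollow', 'noindex']
print(sorted(robots_directives(page, "bingbot")))  # []
```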
It is also possible to pass these instructions in the HTTP response header, using X-Robots-Tag. It accepts the same directives, such as noindex (block indexing but not access to the content), nofollow (tell the robot not to follow the page’s links) and noarchive (prevent a cached copy from being shown). You can combine all three if needed.
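The X-Robots-Tag header is especially handy for files where you cannot add a meta tag, such as PDFs. Here is a minimal sketch as a plain Python WSGI application from the standard library; the rule “every .pdf gets noindex, nofollow” and the helper `capture_headers` are hypothetical, and on a real site you would usually set this header in your web server or framework configuration instead.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app: serves placeholder content and adds X-Robots-Tag to PDF files."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]  # placeholder content type
    # Hypothetical rule: keep PDFs out of the index and tell crawlers not to follow their links.
    if path.lower().endswith(".pdf"):
        headers.append(("X-Robots-Tag", "noindex, nofollow"))
    start_response("200 OK", headers)
    return [b"placeholder content"]


# Usage: simulate one request without starting a server.
def capture_headers(path):
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app(environ, start_response)
    return captured


print(capture_headers("/docs/price-list.pdf").get("X-Robots-Tag"))  # noindex, nofollow
print(capture_headers("/blog/").get("X-Robots-Tag"))                # None
```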
Create your sitemap.xml file
The sitemap file is an important element for guiding Googlebot’s crawl. It lists the pages of your site that you want Googlebot to discover, which makes crawling, and therefore indexing, of all the desired pages easier. Without it, Google may miss pages because of faulty internal linking or another technical problem.
It is a file in .xml format that you can declare in Search Console or in the robots.txt file covered above (with a “Sitemap:” line).
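Most CMSs generate the sitemap for you, but the format is simple enough to build yourself. Here is a minimal sketch using Python’s standard `xml.etree.ElementTree` (Python 3.8+ for `xml_declaration`), following the sitemaps.org namespace with `loc` and `lastmod` entries; the function name `build_sitemap` and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap.xml bytes for (url, lastmod) pairs; lastmod is a W3C date like 2024-03-10."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap = build_sitemap([
    ("https://www.example.com/", "2024-03-10"),
    ("https://www.example.com/blog/googlebot-guide", "2024-03-08"),
])
print(sitemap.decode("utf-8"))
# To publish it, write the bytes to sitemap.xml at the site root, for example:
# with open("sitemap.xml", "wb") as f: f.write(sitemap)
```

Keeping `lastmod` accurate (updated only when the content really changes) is what makes it a useful signal rather than noise.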
Provide fresh content all the time
As mentioned, crawl frequency depends in part on how often you publish new pages and update existing ones. The more content you update and the more new pages you publish, the more often the robot will crawl your website.
Improve the internal linking of the site
Internal linking is very important in SEO, for everything related to the transfer of “link equity” between pages. For Googlebot, these internal links are just as important: they are what guides the robot through your website. A page that receives no internal links will be harder to crawl and therefore risks not being indexed.
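To find pages that the robot cannot reach by following links, you can run a breadth-first traversal of your internal link graph from the home page. The minimal Python sketch below assumes you already have that graph (for example from a crawler export) as a dict mapping each page to the pages it links to; the function names and URLs are hypothetical. Strictly speaking it reports pages unreachable from the home page, which is what matters for crawling, rather than pages with zero incoming links.

```python
from collections import deque


def crawl_depths(links, start="/"):
    """Breadth-first traversal of internal links; returns {page: clicks from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first time reached = shortest click path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links, known_pages, start="/"):
    """Known pages (e.g. from the sitemap) that no chain of internal links reaches from `start`."""
    reachable = crawl_depths(links, start)
    return sorted(set(known_pages) - set(reachable))


# Hypothetical site: "/landing-2019" is in the sitemap but nothing links to it.
links = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/googlebot-guide"],
    "/services/": ["/"],
}
known = ["/", "/blog/", "/services/", "/blog/googlebot-guide", "/landing-2019"]
print(crawl_depths(links))         # {'/': 0, '/blog/': 1, '/services/': 1, '/blog/googlebot-guide': 2}
print(orphan_pages(links, known))  # ['/landing-2019']
```

The depth values are also worth watching: important pages buried many clicks away from the home page tend to be crawled less often.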
Ensure correct performance (loading time and technique)
A poorly performing site that is slow to respond will be crawled less often. Google gives more and more weight to user experience criteria, and website performance is one of them. The crawl budget also shrinks because Google avoids consuming so much of your server’s bandwidth that your site’s users would be penalized.
How to increase Google’s crawl frequency?
There is no miracle solution for increasing Googlebot’s crawl frequency. Publishing new pages and updating existing ones regularly remains the best lever.
If you follow the recommendations from the previous section, you should be able to optimize the exploration of your site and therefore your crawl budget. Strictly speaking, this does not increase the crawl frequency, but it leads to better-quality exploration.
Conversely, you can reduce Googlebot’s crawl rate if you notice that it hurts your website’s performance. Google advises against limiting it, but nothing prevents you from doing so. However, you will have to submit a “special request” if Google already considers your crawl rate optimal.
You can also reduce it by blocking crawling in your robots.txt file or by returning an HTTP 5XX or 429 status code.
You now know how Googlebot works and how to optimize the crawl of your site. If you have any questions, feel free to post them in the comments section below.
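For the status-code approach, here is a minimal sketch of a Python WSGI wrapper that answers 503 Service Unavailable while your server is overloaded. The names `overload_guard` and `is_overloaded` are hypothetical: you would supply your own overload check (load average, queue length, etc.). Retry-After is a standard HTTP header; this sketch does not assume anything about how Googlebot uses its value. Treat it as a short-term measure only, since Google’s documentation warns that serving these codes for a prolonged period can lead to URLs being dropped from the index.

```python
from http import HTTPStatus


def overload_guard(app, is_overloaded, retry_after_seconds=3600):
    """Wrap a WSGI app so it answers 503 with Retry-After while `is_overloaded()` is true.

    `is_overloaded` is a hypothetical zero-argument callable you provide.
    """
    def wrapped(environ, start_response):
        if is_overloaded():
            status = HTTPStatus.SERVICE_UNAVAILABLE
            start_response(
                f"{status.value} {status.phrase}",
                [("Content-Type", "text/plain; charset=utf-8"),
                 ("Retry-After", str(retry_after_seconds))],
            )
            return [b"Temporarily unavailable"]
        return app(environ, start_response)  # normal traffic goes to the real app
    return wrapped


# Usage with a trivial inner app and a fake start_response.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = overload_guard(hello_app, lambda: True)({}, fake_start_response)
print(captured["status"], captured["headers"]["Retry-After"], body)
# 503 Service Unavailable 3600 [b'Temporarily unavailable']
```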