Log analysis: Segment well to analyze properly


Log analysis is now an essential tool for any SEO agency optimizing a website, especially a site that offers Google several thousand pages to crawl.

It is an effective approach and should be carried out at least once a year to validate a website’s SEO health. It can improve the crawl of your site and the rankings of your strategic keywords, and in turn increase your traffic and conversions. It is also often the only way to unblock awkward SEO situations and to support decision-making.

Here we will focus on the art of segmentation in a log analysis. This step is essential and will have an impact on your entire study.

The backbone of your log analysis

That’s it, you are embarking on a log analysis: bravo! The log files have been recovered, they are usable, and you have selected your preferred analysis software.
The first step, and the most important because it ultimately shapes your whole analysis, is segmenting your logs by major category.

It will allow you to compare different types of pages according to SEO criteria and to analyze abnormal crawl behavior.

An analysis of well-segmented logs will allow you to easily answer these types of questions:

  • Are my product pages crawled more often than my category pages?
  • What is my crawl loss percentage?
  • What is my crawl window? (How long does it take Google to visit all the strategic pages of my site at least once?)
  • Which category of my catalog does Google visit the most?
  • Are my HTTP pages still crawled?
  • How can I optimize the performance of my .htaccess for old redirects?
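
To make the crawl window question concrete, here is a minimal sketch of how it could be computed once Googlebot hits have been extracted from the logs. The function name `crawl_window`, the `(timestamp, url)` input format and the sample data are assumptions made for illustration; they are not tied to any particular log analysis tool.

```python
from datetime import datetime, timedelta

def crawl_window(hits, strategic_urls, start):
    """Time from `start` until every strategic URL was hit at least once.

    `hits` is an iterable of (timestamp, url) tuples for Googlebot requests.
    Returns a timedelta, or None if some strategic URL was never crawled.
    """
    remaining = set(strategic_urls)
    for ts, url in sorted(hits):
        if ts < start:
            continue  # ignore hits before the analysis period
        remaining.discard(url)
        if not remaining:
            return ts - start
    return None

# Made-up sample data, for illustration only.
start = datetime(2024, 1, 1)
hits = [
    (datetime(2024, 1, 3), "/b"),
    (datetime(2024, 1, 2), "/a"),
    (datetime(2024, 1, 5), "/a"),
]
window = crawl_window(hits, {"/a", "/b"}, start)
```

A result of `None` is informative in itself: it means at least one strategic page was never visited during the period analyzed.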

Segmentation lets you group the pages of your site structure into categories and compare those categories with one another. The way to optimize a page for SEO differs depending on its nature, its role and the type of keyword it targets (long tail or generic).

Thus, your segmentation must reflect your SEO strategy and allow you to obtain reliable data to guide your future optimizations.
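
As an illustration, a segmentation can be sketched as an ordered list of URL rules where the first match wins. The URL patterns below (`/product/`, `/category/`, `/blog/`, and the filter parameters) are hypothetical and must be adapted to your own site structure; this is a sketch of the idea, not the way any specific tool implements it.

```python
import re
from collections import Counter

# Hypothetical rules, checked in order: the first match wins.
# The filter rule comes first so that filtered category URLs
# are not counted as regular category pages.
SEGMENT_RULES = [
    ("filter", re.compile(r"[?&](sort|color|size)=")),
    ("product", re.compile(r"^/product/")),
    ("category", re.compile(r"^/category/")),
    ("editorial", re.compile(r"^/blog/")),
]

def segment_of(path):
    """Return the name of the first matching segment, or 'other'."""
    for name, pattern in SEGMENT_RULES:
        if pattern.search(path):
            return name
    return "other"

def hits_per_segment(paths):
    """Count hits per segment for an iterable of crawled URL paths."""
    return Counter(segment_of(p) for p in paths)

# Made-up sample paths, for illustration only.
sample = [
    "/product/1",
    "/category/shoes",
    "/category/shoes?color=red",
    "/blog/log-analysis",
    "/about",
]
counts = hits_per_segment(sample)
```

The order of the rules is the main design choice here: it decides which segment a URL belongs to when several patterns could apply.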

Tools like OnCrawl let you configure these segmentations easily with filters on URLs, but also on on-site parameters such as a minimum amount of content, internal links, or the presence or absence of semantic markup.

All this information can then be used to compare the impact of SEO optimizations on a site’s crawl.

Our top 5 favorite segmentations

Here are our top 5 segmentations that should be included in any log analysis:

Strategic pages for SEO vs non-strategic

This is one of the first segmentations to implement. It lets you quickly make better use of your crawl budget, or even increase it, by detecting pages that are not strategic for your SEO but are still crawled.

With this segmentation, you can see how many hits are “wasted” on the site and could instead benefit more strategic pages.
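
One simple way to quantify this, assuming hits have already been counted per segment, is the share of crawled hits that land on non-strategic segments. The function name `crawl_waste`, the segment names and the figures below are made up for illustration.

```python
def crawl_waste(hits_by_segment, strategic_segments):
    """Percentage of hits spent on segments that are not strategic."""
    total = sum(hits_by_segment.values())
    if total == 0:
        return 0.0  # no hits at all: nothing is wasted
    wasted = sum(
        count
        for segment, count in hits_by_segment.items()
        if segment not in strategic_segments
    )
    return 100.0 * wasted / total

# Made-up hit counts, for illustration only.
hits_by_segment = {"product": 600, "category": 200, "filter": 150, "tracking": 50}
waste = crawl_waste(hits_by_segment, {"product", "category"})
```

Tracking this percentage before and after each optimization shows whether the hits are actually moving toward strategic pages.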

Examples of pages detected:

Most often, this technique detects resource pages or pages with duplicate content (filters, tracking, etc.). A keyword analysis and very good knowledge of the site are necessary to do this sorting.


  • Validate that the crawled page really is not strategic for your SEO.
  • Add a canonical tag on filtered pages. Be careful: Googlebot is very fond of canonical tags and can devote a significant part of your crawl budget to them. Use them only if you have no other choice.
  • Set up a meta robots “noindex, nofollow” tag
  • Remove the page from the sitemap
  • Reduce the number of internal links pointing to the page
  • As soon as the page is no longer indexed, block it with robots.txt
  • Monitor hits on these newly blocked pages.

HTTP vs HTTPS version

This second segmentation is of course linked to the site’s switch to HTTPS. It is sometimes surprising to see Google continue to crawl old HTTP URLs years after a migration. Even if the 301 redirects are in place, it is always better to have a site crawled overwhelmingly on its final version.

It is also, of course, a way to know whether Google has taken the redirects and the new version of a site into account after a migration.
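
As a rough sketch, the split between the two versions can be measured as the share of Googlebot hits per scheme. This assumes each parsed log record is a dictionary with `scheme` and `user_agent` keys, which is a hypothetical format: many access log formats do not record the scheme, so it may have to be derived from the port or from separate log files.

```python
from collections import Counter

def scheme_share(records):
    """Percentage of Googlebot hits per scheme ('http' / 'https')."""
    counts = Counter(
        r["scheme"]
        for r in records
        # Substring match only: user agents can be spoofed, so a real
        # study should also verify that the hits come from Google.
        if "Googlebot" in r["user_agent"]
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scheme: 100.0 * n / total for scheme, n in counts.items()}

# Made-up records, for illustration only.
records = [
    {"scheme": "https", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"scheme": "https", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"scheme": "https", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"scheme": "http", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"scheme": "http", "user_agent": "SomeOtherBot/1.0"},
]
share = scheme_share(records)
```

Computed month by month, this share should show the HTTP part shrinking after a migration.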


If a significant share of your crawl budget still goes to your old HTTP version, we advise you to:

  • Check the 301 redirects (are they in place? Are they accessible to Googlebot?)
  • Check your sitemap
  • Check your internal linking
  • Ask for your external backlinks to be updated (at least those from the most powerful sites)

Editorial pages vs e-commerce pages

The purpose of this segmentation is to detect whether your editorial pages or your e-commerce pages are regularly hit by search engines. An editorial page is meant to offer more content than average, covering a varied range of expressions, and we know that Google is very fond of that. An e-commerce page is very often the page that you want to see indexed, because it is the one that lets the user convert without multiplying clicks. Most of the time, these are the pages that generate the most business.

This analysis will therefore allow you to check the impact of your content on the crawl, in terms of volume, as well as the impact of your internal linking.


If your e-commerce pages, the most strategic ones, are crawled much less than your editorial pages, we advise you to:

  • Check the word count above which Google’s crawl increases
  • Create content on your product pages
  • Check that there are no orphan pages (pages absent from your internal linking)
  • Link intelligently to certain product pages from your editorial pages
  • Create a dedicated sitemap
  • Build backlinks to your e-commerce pages

These recommendations of course also apply to editorial pages, if you want to rank for non-transactional expressions. In that case, we recommend linking your editorial pages to one another and being particularly vigilant about the content.

Top sales vs new products / categories with high potential

To achieve this segmentation, it will be essential to know the flagship products of the site, those which convert the best.

The objective here is to compare the products that drive your site with the new products or new categories that you want to see convert. We will therefore look at:

  • The crawl level of top sellers vs new products
  • The level of internal links to top-selling products
  • The content offered by top sellers
  • The crawl frequency: how often Google visits your new products vs your top sellers. In principle, the interval between visits should be as short as possible so that Google quickly takes into account the optimizations you make on your new pages.
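
The crawl frequency comparison can be sketched as the mean interval between two consecutive Googlebot hits on the same URL, computed separately for each group. The function name `mean_revisit_interval`, the `(timestamp, url)` input format and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def mean_revisit_interval(hits):
    """Mean gap between consecutive hits on the same URL.

    `hits` is an iterable of (timestamp, url) tuples for one group of pages.
    Returns a timedelta, or None if no URL was hit at least twice.
    """
    by_url = defaultdict(list)
    for ts, url in hits:
        by_url[url].append(ts)
    gaps = []
    for stamps in by_url.values():
        stamps.sort()
        gaps.extend(later - earlier for earlier, later in zip(stamps, stamps[1:]))
    if not gaps:
        return None
    return sum(gaps, timedelta()) / len(gaps)

# Made-up hits, for illustration only.
top_sellers = [
    (datetime(2024, 1, 1), "/product/best"),
    (datetime(2024, 1, 2), "/product/best"),
    (datetime(2024, 1, 3), "/product/best"),
]
new_products = [
    (datetime(2024, 1, 1), "/product/new"),
    (datetime(2024, 1, 5), "/product/new"),
]
top_interval = mean_revisit_interval(top_sellers)
new_interval = mean_revisit_interval(new_products)
```

If the interval for new products is much longer than for top sellers, that gap is what the internal linking and sitemap work should aim to reduce.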


  • Take inspiration from your top-selling pages: title, meta description, metadata, photos, CTA, content
  • Develop internal links to your new products (highlights on the homepage, universe and category pages, plus cross-selling)
  • Check that the new products are in the sitemap. If necessary, give them a dedicated sitemap.

Segmentation by language

This segmentation is of course aimed at international sites. If you want to improve your rankings in several languages and check that no language is lagging behind for Googlebot, this is the segmentation for you!

Here, we will therefore separate the different languages of the site, but also the different types of pages offered. This way, we can compare the crawl rate of the Italian category pages with that of the French category pages.
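
A minimal sketch of this two-level segmentation, assuming URLs of the form `/<language>/<page type>/...`. That URL structure, the language codes and the page types below are hypothetical; sites that separate languages by subdomain or domain would need a different rule.

```python
from collections import Counter

LANGUAGES = {"fr", "it", "en"}        # hypothetical language codes
PAGE_TYPES = {"category", "product"}  # hypothetical page types

def language_and_type(path):
    """Return (language, page_type) for a path like /fr/category/shoes."""
    parts = [p for p in path.split("?")[0].split("/") if p]
    if not parts or parts[0] not in LANGUAGES:
        return ("unknown", "other")
    page_type = parts[1] if len(parts) > 1 and parts[1] in PAGE_TYPES else "other"
    return (parts[0], page_type)

def crawl_matrix(paths):
    """Count hits per (language, page_type) pair."""
    return Counter(language_and_type(p) for p in paths)

# Made-up crawled paths, for illustration only.
paths = [
    "/fr/category/shoes",
    "/fr/category/hats?sort=price",
    "/it/category/shoes",
    "/fr/product/1",
    "/about",
]
matrix = crawl_matrix(paths)
```

Comparing `matrix[("fr", "category")]` with `matrix[("it", "category")]` gives the kind of language-by-page-type comparison described above.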


  • Validate the international strategy: which languages are the most strategic? This should be reflected in the Google crawl
  • Check that international SEO is properly implemented on all pages of the site (lang attribute, hreflang, adapted content, etc.)
  • Check the sitemaps (at least 1 sitemap per language!)
  • Develop your backlinks accordingly

We hope we have made you want to carry out a log analysis on your site and clarified the possible segmentations. As always with log analyses, be careful to take the history of the website into account before selecting your analysis periods!

Do not hesitate to contact us if you want a complete log audit for the SEO of your site!

