Mastering SEO Essentials: A Guide to robots.txt and Sitemaps in Next.js
Learn how to optimize your site using robots.txt, leverage the robots meta-tag, and harness the power of sitemaps. Follow easy steps to implement these techniques, ensuring proper indexing and visibility.
robots.txt
A robots.txt is a simple text file that tells search engine crawlers which parts of your site they may crawl. It is placed in the root directory of the site, and a site can have only one robots.txt, which applies to every page. For example, https://www.example.com/robots.txt should return a status of 200, and its content should be structured correctly for the site to be crawled and indexed properly.
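You can quickly confirm this with a small script; a minimal sketch, assuming Node.js 18+ (for the built-in fetch) and your own domain in place of example.com:

// Check that robots.txt is reachable and returns HTTP 200.
async function checkRobots(url: string): Promise<void> {
  const response = await fetch(url);
  console.log(`${url} -> ${response.status}`); // expect 200
  console.log(await response.text()); // the robots.txt content
}

checkRobots("https://www.example.com/robots.txt").catch(console.error);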
Here is an example of a robots.txt:
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
This robots.txt contains a single group (rule): all user-agents (robots) are allowed to crawl the website, no directory is disallowed, and the location of the sitemap is specified.
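For comparison, if you wanted to keep crawlers out of a directory (the /admin/ path below is just a placeholder), you would add a Disallow directive; Google resolves conflicts by the most specific matching rule, so /admin/ stays blocked while the rest of the site remains crawlable:

User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml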
Generating robots.txt in Next.js
Create robots.ts in the root app directory and return a Robots object:
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
    },
    sitemap: "https://www.example.com/sitemap.xml",
  };
}
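To express the same kind of Disallow rule in Next.js (again, /admin/ is only a placeholder), the rules field also accepts a disallow entry, and an array of groups if you need per-crawler rules; a minimal sketch:

import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: "/admin/", // placeholder path you don't want crawled
      },
    ],
    sitemap: "https://www.example.com/sitemap.xml",
  };
}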
Robots meta-tag
Let's say you want to hide your privacy policy page from search results. If we add Disallow: /privacy to our robots.txt, the crawler won't crawl that page at all. As a result, it will never see a noindex meta-tag in the page's head, which means the URL might still show up in search results if it is linked from another page that is allowed to be crawled. In such scenarios, we can use the robots meta-tag to apply page-specific directives.
In Next.js, this takes two steps. First, remove /privacy from the Disallow list in robots.txt so the crawler can actually reach the page. Then, on the privacy page, add this metadata:
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "My page",
  robots: {
    index: false,
    follow: true,
  },
};
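With this metadata in place, Next.js should render a robots meta tag along these lines in the page's <head>, so the page can be crawled but is kept out of the index:

<meta name="robots" content="noindex, follow"/>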
Sitemaps
A sitemap is a file where you provide information about your site. It helps search engine crawlers index your site efficiently. You can also use a sitemap to provide information about specific kinds of content on your pages, such as video, image, and news content.
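Under the hood, a sitemap is typically an XML file that lists your URLs together with optional metadata; a minimal example might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>yearly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>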
When do you need a sitemap?
- Large sites: on a large site it is difficult to ensure that every page is linked from at least one other page, so some pages might not be discovered by Googlebot.
- New sites: Googlebot and other crawlers crawl the web by following links from one page to another. If no other pages link to your site, Googlebot might not discover your pages.
- Sites with a lot of rich media content: Googlebot can retrieve additional information from your sitemap for search.
Generating sitemap.xml in Next.js
Create sitemap.ts in the root app directory and return a Sitemap object:
import type { MetadataRoute } from "next";

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: "https://www.example.com",
      lastModified: new Date(),
      changeFrequency: "yearly",
      priority: 1,
    },
  ];
}
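Static entries are fine for a handful of pages, but you can also build the sitemap from your content. A sketch, assuming a hypothetical getPosts() helper that returns blog posts with a slug and an updatedAt date:

import type { MetadataRoute } from "next";

// Hypothetical helper; replace with however you load your content.
async function getPosts(): Promise<{ slug: string; updatedAt: Date }[]> {
  return [{ slug: "hello-world", updatedAt: new Date() }];
}

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getPosts();

  return [
    {
      url: "https://www.example.com",
      lastModified: new Date(),
      changeFrequency: "yearly",
      priority: 1,
    },
    // One entry per blog post, derived from the data above.
    ...posts.map((post) => ({
      url: `https://www.example.com/blog/${post.slug}`,
      lastModified: post.updatedAt,
      changeFrequency: "monthly" as const,
      priority: 0.8,
    })),
  ];
}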
Submitting your sitemap
Google provides several ways to submit a sitemap and to verify ownership of a site. Let's verify ownership with the HTML tag method. Go to Google Search Console and choose the URL prefix option, enter your site's URL in the input, and click Continue. Then expand the HTML tag section to find the verification meta tag.
Adding Google's ownership verification code in Next.js
In your root layout metadata, add the verification object:
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Home page",
  verification: {
    google: "VERIFICATION-CODE",
  },
};
After adding the verification code, redeploy your site. If you see something like this in the <head> element:
<meta name="google-site-verification" content="VERIFICATION-CODE"/>
You're all set! Head over to Google Search Console, click the verify button, and in a few days, your site will be crawled and indexed.