
The most common misconception is that robots.txt prevents indexing. It only controls crawling; use a noindex meta tag to keep pages out of search results.
The correct placement for robots.txt depends on your setup: place it in the app/ directory for the App Router or the public/ directory for the Pages Router.
A standard robots.txt for Next.js should disallow internal paths like /api/ and /_next/ to optimize your site's crawl budget.
For complex technical SEO challenges, Synscribe's Technical SEO Audit & Implementation service provides hands-on engineering expertise to ensure your site is perfectly optimized.
If you've ever struggled with configuring robots.txt for your Next.js application, you're not alone. Many developers find themselves confused about where to place the file, what to include, and whether to use a static or dynamic approach. This confusion can lead to search engine optimization (SEO) issues that affect your site's visibility and performance.
In this comprehensive guide, we'll demystify robots.txt implementation for Next.js applications and provide you with everything you need to properly configure and test your setup.
A robots.txt file provides instructions to web crawlers (like Googlebot) about which pages or files on your site they can or cannot access. Based on the Robots Exclusion Protocol (REP), which became an official internet standard in 2022 but has been in use since 1994, this simple text file sits at the root of your website.
For example, a basic robots.txt file might look like this:
User-agent: *
Allow: /
Disallow: /private/
The main purpose of robots.txt is to manage crawler traffic and prevent your server from being overwhelmed with requests. However, it's crucial to understand that robots.txt is not a security mechanism. It cannot be used to hide private information, as it's merely a set of suggestions that well-behaved crawlers follow.
One of the most common misconceptions is that disallowing a page in robots.txt prevents it from appearing in search results. This is incorrect. A page disallowed in robots.txt can still be indexed if linked from other sites.
To prevent a page from being indexed (appearing in search results), you must use:
A noindex meta tag in the page's HTML
HTTP response headers with X-Robots-Tag: noindex
Password protection
Remember: robots.txt controls crawling, not indexing.
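As a rough sketch of the header-based option: next.config.js can export an async headers() function that returns entries with a source pattern and a list of key/value headers. The helper below builds such entries. The name noindexHeaders and the example paths are illustrative assumptions, not part of the setup described in this guide.

```typescript
// Shape of one entry returned by `headers()` in next.config.js.
interface HeaderRule {
  source: string;
  headers: { key: string; value: string }[];
}

// Hypothetical helper: build one X-Robots-Tag rule per path pattern.
function noindexHeaders(sources: string[]): HeaderRule[] {
  return sources.map((source) => ({
    source,
    headers: [{ key: 'X-Robots-Tag', value: 'noindex' }],
  }));
}

// Placeholder paths; `:path*` is Next.js path-matching syntax.
const noindexRules = noindexHeaders(['/internal/:path*', '/drafts/:path*']);
console.log(JSON.stringify(noindexRules, null, 2));
```

To use it, return the resulting array from async headers() in next.config.js. The affected pages must stay crawlable: a crawler that is blocked by robots.txt never requests the page, so it never sees the header.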
You might wonder if robots.txt is still necessary when tools like Google Search Console exist for submitting sitemaps. The answer is yes, for several important reasons:
Search engines allocate a limited "crawl budget" to your site. With robots.txt, you can guide crawlers to prioritize your most important pages (like service pages for conversions) and avoid wasting this budget on non-essential pages.
You can use robots.txt to block crawling of URLs with parameters that might generate duplicate content, such as:
User-agent: *
Disallow: /*?*
robots.txt also keeps crawlers away from resources that have no place in search results, such as backend API endpoints or admin panels. Keep in mind that this limits crawling only; it does not guarantee the URLs stay out of the index.
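To see why the /*?* rule shown earlier catches parameterized URLs, here is a minimal sketch of how a crawler might evaluate rules. It assumes the matching behavior described in the Robots Exclusion Protocol standard: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The Rule type and function names are made up for illustration, and real crawlers may differ in edge cases.

```typescript
type Rule = { type: 'allow' | 'disallow'; pattern: string };

// Convert a robots.txt path pattern into a RegExp anchored at the start.
function patternToRegExp(pattern: string): RegExp {
  const anchoredEnd = pattern.slice(-1) === '$';
  const body = anchoredEnd ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + (anchoredEnd ? '$' : ''));
}

// Decide whether a path (including its query string) may be crawled.
function isAllowed(rules: Rule[], pathWithQuery: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (rule.pattern === '') continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.pattern).test(pathWithQuery)) continue;
    if (
      !best ||
      rule.pattern.length > best.pattern.length ||
      (rule.pattern.length === best.pattern.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }
  // No matching rule means the URL is allowed by default.
  return best ? best.type === 'allow' : true;
}

const sampleRules: Rule[] = [
  { type: 'allow', pattern: '/' },
  { type: 'disallow', pattern: '/*?*' },
];
console.log(isAllowed(sampleRules, '/products')); // true
console.log(isAllowed(sampleRules, '/products?sort=price')); // false
```

The second URL is blocked because /*?* is a longer, more specific match than /.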
One of the most confusing aspects of robots.txt in Next.js is determining where to place the file. The correct method depends on your Next.js version (App Router vs. Pages Router) and whether you need static or dynamic content.
For most websites, a static robots.txt file is sufficient and easier to maintain.
With the App Router, create a file named robots.txt directly in the app/ directory. Next.js will automatically serve it from the root.
Example app/robots.txt:
User-agent: *
Allow: /
Disallow: /private/
Sitemap: https://acme.com/sitemap.xml
With the Pages Router, create a file named robots.txt in the public/ directory. Any file in this directory is served statically from the root.
Example public/robots.txt:
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml
Dynamic generation is useful when you need different rules for different environments (development, staging, production).
With the App Router, create a file named app/robots.ts (or .js). This file exports a default function that returns a MetadataRoute.Robots object.
Basic Dynamic Example (app/robots.ts):
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: '/private/',
    },
    sitemap: 'https://acme.com/sitemap.xml',
  };
}
Advanced Dynamic Example (Multiple User Agents):
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  // NODE_ENV is 'production' for every `next build`, staging deployments included
  const isProduction = process.env.NODE_ENV === 'production';
  return {
    rules: [
      {
        userAgent: 'Googlebot',
        allow: isProduction ? ['/'] : [],
        disallow: isProduction ? ['/private/'] : ['/'],
      },
      {
        userAgent: ['Applebot', 'Bingbot'],
        disallow: ['/'],
      },
    ],
    sitemap: 'https://acme.com/sitemap.xml',
    host: 'https://acme.com', // Optional
  };
}
For dynamic generation with the Pages Router, you'll need to create an API route and use rewrites in next.config.js.
Step 1: Create the API Route (pages/api/robots.js):
// pages/api/robots.js
export default function handler(req, res) {
  // Set the appropriate content type
  res.setHeader('Content-Type', 'text/plain');
  // Define your robots.txt content (the template-literal lines stay
  // unindented so the served file has no leading whitespace)
  const robots = `User-agent: *
Disallow: /admin/
Allow: /`;
  // Send the response
  res.send(robots);
}
Step 2: Add a Rewrite in next.config.js:
// next.config.js
const nextConfig = {
  async rewrites() {
    return [{ source: '/robots.txt', destination: '/api/robots' }];
  },
};
module.exports = nextConfig;
For projects that also need a sitemap, the next-sitemap library can automate both tasks.
Step 1: Installation:
npm install next-sitemap
Step 2: Configuration (next-sitemap.config.js):
const config = {
  siteUrl: 'https://yourwebsite.com',
  generateRobotsTxt: true, // (optional)
  robotsTxtOptions: {
    policies: [
      { userAgent: '*', disallow: '/private/' },
      { userAgent: '*', allow: '/' },
    ],
    additionalSitemaps: [
      'https://yourwebsite.com/sitemap.xml',
    ],
  },
};
module.exports = config;
Step 3: Add Post-build Script (package.json):
"scripts": {
  "build": "next build",
  "postbuild": "next-sitemap"
}
Many developers wonder what a typical Next.js project should exclude in robots.txt. Here's a recommended baseline configuration:
User-agent: *
Allow: /
# Disallow API routes as they don't serve crawlable content
Disallow: /api/
# Disallow internal Next.js folders if they are somehow exposed
Disallow: /_next/
# Disallow private or administrative pages
Disallow: /admin/
Disallow: /profile/
# Add the location of your sitemap
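Sitemap: https://yourdomain.com/sitemap.xml
Allow: /: Explicitly states that everything is allowed by default.
Disallow: /api/: API routes contain backend logic, not content, so crawlers shouldn't access them.
Disallow: /_next/: This folder contains build assets and is generally not meant to be crawled directly. Be aware that Next.js serves its compiled CSS and JavaScript from /_next/static/, so if you keep this rule, pair it with Allow: /_next/static/ or Google may be unable to render your pages.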
Regarding specific file types:
Source files (.ts, .js) are not served publicly, so they don't need to be disallowed.
Images should generally be allowed so they can appear in image search, unless they contain sensitive information.
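If you would rather keep the baseline as data and generate the file from it, a small serializer is enough. The sketch below is one possible shape, not how Next.js or next-sitemap do it internally. RuleGroup and renderRobotsTxt are hypothetical names, and the paths are a subset of the baseline above.

```typescript
interface RuleGroup {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

// Render rule groups (and an optional sitemap URL) as robots.txt text.
function renderRobotsTxt(groups: RuleGroup[], sitemap?: string): string {
  const blocks = groups.map((group) =>
    [
      `User-agent: ${group.userAgent}`,
      ...(group.allow || []).map((path) => `Allow: ${path}`),
      ...(group.disallow || []).map((path) => `Disallow: ${path}`),
    ].join('\n'),
  );
  if (sitemap) blocks.push(`Sitemap: ${sitemap}`);
  // Groups are separated by a blank line; the file ends with a newline.
  return blocks.join('\n\n') + '\n';
}

const baseline = renderRobotsTxt(
  [{ userAgent: '*', allow: ['/'], disallow: ['/api/', '/admin/', '/profile/'] }],
  'https://yourdomain.com/sitemap.xml',
);
console.log(baseline);
```

The returned string could be sent from the Pages Router API route shown earlier, or written to public/robots.txt by a build script.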
Avoid these common pitfalls when configuring your robots.txt:
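Directive names such as User-agent are case-insensitive, but the paths in Allow and Disallow rules are case-sensitive, so Disallow: /Admin/ does not block /admin/. A typo in a directive name (Dissallow, for example) causes crawlers to ignore the rule.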
Accidentally disallowing CSS/JS files can prevent Google from rendering pages correctly, harming SEO. A rule like Disallow: /assets/ could be problematic.
Remember that robots.txt is public and malicious bots will ignore it. Use authentication or noindex tags for sensitive content.
Not having a robots.txt file means crawlers assume they can access everything, which may not be ideal.
If a disallowed URL is linked from another website, Google may still index it without visiting it. The search result will show the URL with a note like "No information is available for this page."
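Some of these pitfalls can be caught mechanically before deployment. The following is a minimal lint sketch, assuming only the four directives used in this guide (User-agent, Allow, Disallow, Sitemap) are expected. lintRobotsTxt is a hypothetical helper, not a substitute for Google's own tooling.

```typescript
const KNOWN_DIRECTIVES = ['user-agent', 'allow', 'disallow', 'sitemap'];

// Return a list of human-readable warnings for a robots.txt string.
function lintRobotsTxt(content: string): string[] {
  const warnings: string[] = [];
  let sawUserAgent = false;

  content.split(/\r?\n/).forEach((raw, index) => {
    const line = raw.split('#')[0].trim(); // drop comments and whitespace
    if (line === '') return;
    const lineNo = index + 1;

    const colon = line.indexOf(':');
    if (colon === -1) {
      warnings.push(`line ${lineNo}: missing ":" separator`);
      return;
    }
    // Directive names are compared case-insensitively.
    const directive = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (KNOWN_DIRECTIVES.indexOf(directive) === -1) {
      warnings.push(`line ${lineNo}: unknown directive "${directive}"`);
      return;
    }
    if (directive === 'user-agent') sawUserAgent = true;
    if (directive === 'allow' || directive === 'disallow') {
      if (!sawUserAgent) {
        warnings.push(`line ${lineNo}: rule appears before any User-agent line`);
      }
      const first = value.charAt(0);
      if (value !== '' && first !== '/' && first !== '*') {
        warnings.push(`line ${lineNo}: path should start with "/"`);
      }
    }
  });
  return warnings;
}

const sample = 'Dissallow: /admin/\nUser-agent: *\nDisallow: admin';
console.log(lintRobotsTxt(sample));
```

For the sample input, the sketch reports the misspelled directive on line 1 and the path without a leading slash on line 3.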
Before deployment, you'll want to ensure your robots.txt is properly configured to avoid potential crawl errors.
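Google Search Console's robots.txt testing tool lets you paste your robots.txt content and check whether specific URLs are blocked for different Google user-agents. Google has since retired the standalone tester in favor of the robots.txt report in Search Console, so use whichever your account offers to validate your configuration.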
After deployment, use the URL Inspection Tool in GSC. Enter a URL, and under the "Coverage" section, it will report if the page is "Blocked by robots.txt."
After deploying your site, simply navigate to https://yourdomain.com/robots.txt in your browser to ensure the file is being served correctly and contains the expected content.
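The manual browser check can also be scripted, for example as a post-deploy smoke test. The sketch below only inspects a response you have already fetched. checkRobotsResponse is a hypothetical helper, and the three checks are assumptions about what "served correctly" means here.

```typescript
interface RobotsCheck {
  ok: boolean;
  problems: string[];
}

// Inspect the status, Content-Type header, and body of a /robots.txt response.
function checkRobotsResponse(
  status: number,
  contentType: string | null,
  body: string,
): RobotsCheck {
  const problems: string[] = [];
  if (status !== 200) {
    problems.push(`unexpected status ${status}`);
  }
  if (!contentType || contentType.toLowerCase().indexOf('text/plain') === -1) {
    problems.push(`unexpected content type ${contentType}`);
  }
  if (!/^user-agent\s*:/im.test(body)) {
    problems.push('no User-agent line found');
  }
  return { ok: problems.length === 0, problems };
}

// Usage sketch (needs a runtime with global fetch, such as Node 18+):
//   const res = await fetch('https://yourdomain.com/robots.txt');
//   checkRobotsResponse(res.status, res.headers.get('content-type'), await res.text());
console.log(checkRobotsResponse(200, 'text/plain; charset=utf-8', 'User-agent: *\nAllow: /'));
```

An HTML body or a 404 status usually means the file sits in the wrong directory or a rewrite is not being applied.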
A well-configured robots.txt file is a cornerstone of technical SEO for any Next.js application. It helps manage crawl budget, prevent indexing issues, and guide search engines effectively.
Start with a simple configuration and only add more complex rules as needed. Regularly test your setup using the tools mentioned above, and keep your robots.txt updated as your site evolves.
By implementing the strategies outlined in this guide, you'll ensure search engines can efficiently crawl your Next.js application, helping to improve your site's visibility and performance in search results.
The primary purpose of a robots.txt file is to manage web crawler traffic and optimize your site's crawl budget. It instructs search engine bots on which pages or sections of your site to crawl and which to avoid, ensuring they spend their limited resources on your most important content rather than on non-essential pages like admin panels or API routes.
The location depends on your Next.js version and architecture. For modern Next.js applications using the App Router (v13.4+), you should place a static robots.txt file directly in the app/ directory. For older applications using the Pages Router, the file should be placed in the public/ directory.
To completely prevent a page from being indexed and appearing in search results, you must use a noindex directive. Blocking a page in robots.txt only prevents it from being crawled; it can still be indexed if linked from other sites. The most common method is to add a noindex meta tag (<meta name="robots" content="noindex">) to the HTML <head> of the specific page.
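In the App Router, the same tag is usually produced from a page's exported metadata object rather than written by hand. The sketch below mirrors a small subset of that object with a local type so it stays self-contained. RobotsMeta and robotsMetaContent are illustrative names, and in a real page you would import the Metadata type from 'next' and let Next.js render the tag.

```typescript
// Local stand-in for the `robots` field of Next.js page metadata (assumed subset).
interface RobotsMeta {
  index: boolean;
  follow: boolean;
}

// Build the content attribute of a robots meta tag from the flags.
function robotsMetaContent(meta: RobotsMeta): string {
  return [
    meta.index ? 'index' : 'noindex',
    meta.follow ? 'follow' : 'nofollow',
  ].join(', ');
}

// In an App Router page this object would be exported as `metadata`.
const metadata = { robots: { index: false, follow: true } };

console.log(`<meta name="robots" content="${robotsMetaContent(metadata.robots)}">`);
// prints: <meta name="robots" content="noindex, follow">
```

Whichever way the tag is generated, the page must not be disallowed in robots.txt, or crawlers will never fetch it and see the directive.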
A standard configuration for a Next.js app should disallow crawling of non-content and internal directories. Best practices suggest including rules like Disallow: /api/ to block backend API routes, Disallow: /_next/ to block internal build assets, and any private areas such as Disallow: /admin/ or Disallow: /profile/.
A page blocked by robots.txt can still be indexed if Google discovers it through a link from another website. In this case, Google indexes the URL without crawling its content, often resulting in a search result that says, "No information is available for this page." To remove it, you must use a noindex tag and temporarily allow crawling so Google can see the tag.
You can create environment-specific rules using a dynamic robots.ts (or .js) file in the App Router. This file can export a function that checks an environment variable and returns different rules. For example, you could disallow all crawling on development and staging environments while allowing it in production. Note that process.env.NODE_ENV is 'production' for every next build, so telling staging apart from production requires a separate variable set by your deployment platform.
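A minimal sketch of that idea follows, written as a pure function so it can be tested without Next.js. The deployEnv parameter stands for whatever variable your platform sets (an assumed APP_ENV, for instance), RobotsConfig is a local stand-in for the relevant part of MetadataRoute.Robots, and the URLs are placeholders.

```typescript
// Local stand-in for the subset of MetadataRoute.Robots used here.
interface RobotsConfig {
  rules: { userAgent: string; allow?: string; disallow?: string };
  sitemap?: string;
}

// Block everything unless the deployment is explicitly marked as production.
function robotsForEnv(deployEnv: string | undefined): RobotsConfig {
  if (deployEnv !== 'production') {
    return { rules: { userAgent: '*', disallow: '/' } };
  }
  return {
    rules: { userAgent: '*', allow: '/', disallow: '/private/' },
    sitemap: 'https://acme.com/sitemap.xml',
  };
}

console.log(robotsForEnv('staging'));
console.log(robotsForEnv('production'));
```

In app/robots.ts, the default export would simply return robotsForEnv(process.env.APP_ENV), where APP_ENV is the hypothetical variable. Treating a missing value as non-production means a misconfigured deployment fails closed instead of exposing a staging site to crawlers.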
What is the difference between a static robots.txt and a dynamic robots.ts file?
A static robots.txt is a simple, fixed text file that is easy to create and suitable for most websites with unchanging crawl rules. A dynamic robots.ts file is a function that generates the robots.txt content programmatically, which is useful for complex scenarios where you need conditional logic, such as different rules for different subdomains or environments.