SEO · April 19, 2026 · 8 min read

What Is robots.txt and How to Create One (Step-by-Step Guide)

TL;DR

A robots.txt file tells search engine crawlers which pages on your site they're allowed to visit and which ones they should skip. This guide covers what robots.txt does, how to create one, and what happens when you get it wrong.

Every website talks to search engines. Most of that conversation happens behind the scenes, through small text files and bits of code you never see. One of the most important files in that conversation is robots.txt.

If you run a website, you need to understand what robots.txt does, how to create one, and what happens when you get it wrong. This guide covers all of it in plain English.

What Is robots.txt?

A robots.txt file is a plain text file that tells search engine crawlers which pages on your site they're allowed to visit and which ones they should skip. Think of it as a set of house rules for bots. When Google, Bing, or any other crawler arrives at your site, the first thing it checks is your robots.txt file.

It doesn't block access like a password would. Crawlers can still visit pages you've disallowed. But well-behaved bots (like Googlebot) will respect the instructions. Malicious bots won't, but that's a separate problem.

The file exists so you can guide crawlers toward the content that matters and away from pages that don't belong in search results.

Where Does robots.txt Live on Your Site?

Your robots.txt file must sit at the root of your domain. That means it needs to be accessible at:

https://yourdomain.com/robots.txt

Not in a subfolder. Not renamed. Not on a subdomain (unless you want separate rules for that subdomain). If a crawler can't find the file at that exact URL, it assumes everything on your site is fair game.

You can check yours right now. Open a browser, type your domain followed by /robots.txt, and see what comes up. If you get a 404, you don't have one yet.
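
If you ever script that check, the canonical location can be derived from any page URL. Here's a minimal Python sketch (the domains are just placeholders):

```python
from urllib.parse import urlparse, urlunparse

def robots_url(page_url):
    """Return the canonical robots.txt URL for the site serving page_url."""
    parts = urlparse(page_url)
    # Keep only scheme and host: robots.txt must sit at the root,
    # so path, query, and fragment are all discarded.
    return urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))

print(robots_url("https://yourdomain.com/blog/some-post?ref=nav"))
# https://yourdomain.com/robots.txt
```

Note that a subdomain counts as its own root, so `https://shop.example.com/...` resolves to `https://shop.example.com/robots.txt`, not the main domain's file.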

Robots.txt Syntax Explained

The syntax is simple. There are only a handful of directives you need to know.

User-agent

This line specifies which crawler the rules apply to. Use * to target all crawlers, or name a specific one.

User-agent: *
User-agent: Googlebot

Disallow

This tells a crawler not to access a specific path.

Disallow: /admin/

The trailing slash matters. /admin/ blocks everything inside that folder but not the /admin page itself. /admin (no slash) blocks any path that starts with /admin, including /admin.html and /administrator.
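
You can see the difference with Python's standard-library robots.txt parser. A small sketch (example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The trailing slash blocks everything inside the folder...
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
# ...but not a URL that stops at /admin itself.
print(rp.can_fetch("*", "https://example.com/admin"))        # True
```

The stdlib parser approximates Google's matching rather than replicating it exactly, but it's handy for quick sanity checks like this.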

Allow

This overrides a Disallow rule for a specific path. Google supports it. Not every crawler does, but the major ones do.

Allow: /admin/public-page.html

Sitemap

This points crawlers to your XML sitemap so they can find all your pages efficiently.

Sitemap: https://yourdomain.com/sitemap.xml

You can include multiple Sitemap lines if you have more than one. It's a good practice to always include this directive. If you're not sure whether your sitemap is set up correctly, run it through an XML Sitemap Validator to catch errors before search engines do.
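
If you're on Python 3.8 or later, the standard library can pull the Sitemap lines out of a robots.txt file for you; a quick sketch with placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/blog-sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# site_maps() (Python 3.8+) returns every Sitemap URL, in file order.
print(rp.site_maps())
# ['https://yourdomain.com/sitemap.xml', 'https://yourdomain.com/blog-sitemap.xml']
```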

Common robots.txt Examples

Here are the setups you'll see most often. Pick the one closest to your situation and adjust from there.

Allow everything (default behavior)

User-agent: *
Disallow:

Leaving the Disallow value empty means "don't block anything." This is the most open configuration.

Block a specific folder

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /staging/

This keeps internal pages, admin panels, and staging areas out of search results. You probably don't want Google indexing your WordPress login page.

Block a specific bot

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:

This blocks the Ahrefs crawler from your entire site while letting everyone else through. Some site owners do this to prevent competitor analysis tools from crawling their content.
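
Because rules are grouped by User-agent, the same URL can be blocked for one bot and open to another. A quick check with Python's stdlib parser (example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# AhrefsBot matches its own group and is shut out entirely...
print(rp.can_fetch("AhrefsBot", "https://example.com/pricing"))  # False
# ...while every other crawler falls through to the open * group.
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))  # True
```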

Block everything except one folder

User-agent: *
Disallow: /
Allow: /public/

Useful during development or for sites with mostly private content and just a few public-facing pages.

Full production example

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /api/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

This robots.txt example blocks admin, cart, checkout, internal search results, and API endpoints, while allowing everything else. It also points to the sitemap.
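
You can sanity-check a file like this before deploying it. One caveat if you use Python's stdlib parser: it applies rules in file order (first match wins), whereas Google picks the most specific matching rule; for this file, where Allow: / comes last, both approaches agree.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /api/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://yourdomain.com/blog/robots-guide"))  # True
print(rp.can_fetch("*", "https://yourdomain.com/cart/item-42"))       # False
print(rp.can_fetch("*", "https://yourdomain.com/search?q=shoes"))     # False
```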

Robots.txt vs. Noindex Meta Tag

These two do different jobs, and mixing them up is one of the most common SEO mistakes.

robots.txt tells crawlers whether they should visit a page. It controls crawling.

A noindex meta tag tells crawlers not to include a page in search results even after they've visited it. It controls indexing.

Here's the important part: if you block a page with robots.txt, crawlers won't visit it. That means they'll never see a noindex tag on that page. So if other sites link to a blocked page, search engines might still index the URL based on those external links, even though they can't read the page itself.

If you want a page completely out of search results, use a noindex meta tag and make sure robots.txt allows the crawler to see it. Sounds backwards, but that's how it works.

You can add noindex tags quickly using a Meta Tag Generator if you need to set up several pages at once.

Common Mistakes to Avoid

Blocking your CSS and JavaScript files. Years ago, people blocked these to save crawl budget. Today, Google needs to render your pages to understand them. Blocking CSS/JS can hurt your rankings.

Forgetting the trailing slash. Disallow: /admin and Disallow: /admin/ behave differently. Be precise with your paths.

Using robots.txt as security. It's not a security measure. Anyone can read your robots.txt file by visiting it directly. Don't put sensitive paths there, and never rely on it to hide private content. Ironically, listing secret paths in robots.txt tells the world exactly where to look.

Blocking your sitemap. If your Disallow rules accidentally block the path to your sitemap, crawlers can't read it. Always double-check that your sitemap URL isn't caught by a Disallow rule.

Having no robots.txt at all. While technically fine (crawlers will just index everything), having one gives you control. It also helps crawlers find your sitemap.

How to Test Your robots.txt File

Before you push your file live, test it. Google provides a robots.txt report inside Google Search Console. You can enter any URL from your site and see whether your current rules block it or allow it.

Here's a quick testing checklist:

  • Visit yourdomain.com/robots.txt and confirm the file loads
  • Check that no critical pages (homepage, product pages, blog posts) are accidentally blocked
  • Verify your sitemap URL is correct and accessible
  • Check the file in Google Search Console's robots.txt report
  • Make sure CSS and JS files aren't blocked
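
Much of that checklist can be automated. Here's a rough sketch of a helper (blocked_urls and the sample rules are illustrative, not part of any tool mentioned here) that flags critical URLs your rules would block:

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the subset of urls that robots_txt blocks for the given agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

critical = [
    "https://yourdomain.com/",
    "https://yourdomain.com/products/widget",
    "https://yourdomain.com/assets/style.css",
]
print(blocked_urls(rules, critical))
# ['https://yourdomain.com/assets/style.css']
```

Here the check catches a blocked CSS file, exactly the kind of accidental rule the checklist is meant to surface.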

If you're also implementing structured data on your pages, a Schema Markup Generator can help search engines understand your content better alongside a clean robots.txt setup.

How to Create a robots.txt File

You have two options: write it by hand or generate one automatically.

Writing it by hand

  1. Open any plain text editor (Notepad, VS Code, TextEdit)
  2. Write your directives using the syntax covered above
  3. Save the file as robots.txt (no other extension, no capital letters)
  4. Upload it to the root directory of your website
  5. Test it by visiting yourdomain.com/robots.txt

This works fine for simple setups. But if you're managing multiple crawler rules or want to make sure you haven't made a syntax error, a generator saves time and catches mistakes.

Using a robots.txt generator

Morphkit's free Robots.txt Generator lets you build a complete file without memorizing any syntax. Pick which crawlers to target, set your allow/disallow rules, add your sitemap URL, and download the finished file. It takes about 30 seconds.

It's especially useful if you're setting up robots.txt for the first time or managing multiple sites. No signup required.

Quick Reference

Directive               What it does
User-agent: *           Applies rules to all crawlers
User-agent: Googlebot   Applies rules to Google only
Disallow: /path/        Blocks crawlers from that path
Allow: /path/           Overrides a Disallow for that path
Sitemap: URL            Points crawlers to your sitemap

Wrapping Up

A robots.txt file is small but important. It tells search engines where to go and where to stay away. Getting it right means better crawl efficiency, cleaner indexing, and fewer accidental pages showing up in search results.

If you don't have one yet, take five minutes and create one. You can generate a robots.txt file for free with Morphkit and have it ready to upload today.
