How to create llms.txt and why your site may need it

What is llms.txt? It is a proposed Markdown file at a site's root that gives LLMs and AI agents a curated map of important content. Publishing it does not guarantee more citations in ChatGPT or Google AI Overviews. The correct file name is llms.txt, not llm.txt.

For documentation, SaaS, and large information sites, the file can provide a useful navigation layer. This guide covers the llms.txt format, an example file, the difference between llms.txt and robots.txt, llms-full.txt, and validation. For broader context, read what AI visibility is and why SEO is no longer enough.

What is llms.txt

The llmstxt.org llms.txt is described as a proposal to standardize a /llms.txt file that helps the LLM use the site during inference, i.e. at the moment when the user is already querying the model. The specification was published by Jeremy Howard on September 3, 2024.

Consider a typical product site. A person can navigate the menu, benefit sections, forms, footer, and internal links. An agent trying to find documentation or pricing has to filter navigation, utility elements, JavaScript, and repeated interface text. The proposal at llmstxt.org addresses this with a concise source that provides context and links to important pages.

Therefore, llms.txt closer to a curated index than to another sitemap. This is a short sitemap for AI tools: Cursor, an agent in a browser, a chatbot with web search, or an internal assistant that needs to quickly understand the structure of the resource.

Kozak turns a noisy website into a short llms.txt map of key pages

Why llms.txt is being discussed now

The topic did not grow out of theory.

The file has already made its way into the practice of real documentation sites. For example, X Developer Docs publishes both llms.txt and llms-full.txt, as well as Markdown versions of individual pages through the .md suffix.

Chrome Lighthouse already has a separate audit for llms.txt. But this is not a sign of a "new must-have". On the contrary: the Chrome documentation explicitly says that if llms.txt returns a 404, the audit is marked as Not Applicable, because at the moment the file is optional.

And another reason is already closer to business: the market is looking for an easy way to make the site more understandable for AI. This is where the exaggerations begin. llms.txt can indeed reduce chaos for an agent, but it is not a substitute for quality content, technical SEO, or external brand mentions.

Who benefits most from llms.txt

The llms.txt works best where the site has many pages and where it is easy for an AI tool to go astray at the start.

Type of site	Where is the practical benefit of
Developer docs, APIs, and SDKs	Gives the agent a concise documentation index with links to the main sections
SaaS or service site	Shows canonical pages about the product, pricing, security, FAQs, and integrations
E-commerce	Helps you consolidate policies, categories, shipping and return certificates into one file
Educational or content project	Provides a shortcut to key materials without having to scan the entire archive
Corporate website	Helps the agent quickly find pages about the company, services, cases, and contacts

If you have a docs site or a complex SaaS, the benefits are almost direct: the agent gets to the right entry points faster. If it is a small business card site of 3-5 pages, the effect will be more modest. There, even without a separate index, everything is often clear.

What llms.txt doesn't do

This is where expectations usually go bad.

llms.txt does not replace robots.txt and does not control bot access.
llms.txt does not replace sitemap.xml and is not a complete list of indexed URLs.
llms.txt does not guarantee that ChatGPT, Gemini, Claude, or Google AI Overviews will start quoting the site.
llms.txt does not fix weak content, poor information architecture, or crawler blocking.
llms.txt is not a ranking signal that has been publicly confirmed as mandatory by Google or OpenAI.

It is better to align expectations with the team before publishing the file. Otherwise, people may expect llms.txt to deliver outcomes it cannot technically provide.

Google's position here is quite direct. Google Search Central says that there are no additional technical requirements for AI Overviews and AI Mode to appear, and there is no need to create new machine-readable AI text files or special schema.org markup. Hence the practical conclusion: llms.txt is a useful addition for discoverability and agent convenience, but not a prerequisite for visibility in Google AI.

How llms.txt differs from robots.txt and sitemap.xml

They are often mentioned side by side, but their tasks are different.

File	Main role	What not to expect from it
`robots.txt`	Shows what access for automated systems is considered allowed or undesirable	Does not explain the content of the site
`sitemap.xml`	Gives search engines a list of URLs to crawl and index	Is not a curated guide for LLM
`llms.txt`	Gives the LLM a brief context and a list of the most important pages	Not a crawl control tool or a complete map of all URLs

On llmstxt.org it is worded directly: llms.txt is designed to coexist with existing standards, not replace them.

Kozak separates robots.txt, sitemap.xml, and llms.txt into different roles

Recommended structure

Everything is simpler than it seems. The specification offers a short Markdown file, and only one part is required - H1 with the name of the site or project.

What should be in the file:

# Site name is the only mandatory section.
Short blockquote with description - recommended.
If necessary, several paragraphs without headings - to explain how to read the file.
One or more H2 sections.
Inside each H2 there is a list of links in Markdown format with a short description.

There is also an important nuance in the specification: the ## Optional section has a special meaning. References in it can be omitted if the agent needs a shorter context (llmstxt.org).

Kozak assembles the recommended llms.txt Markdown structure with summary, core pages, and optional links

A minimal template might look like this:

# Example Site

> A short description of the site and its purpose.

Use this space to explain which pages are canonical and where to find the most important answers.

## Core pages

- [About](https://example.com/about): Who we are and what we do.
- [Product](https://example.com/product): Product overview and primary use cases.
- [Pricing](https://example.com/pricing): Plans, packages, and terms.

## Documentation

- [Quick start](https://example.com/docs/quick-start.md): The shortest path to getting started.
- [API reference](https://example.com/docs/api.md): Technical API documentation.

## Optional

- [Blog](https://example.com/blog): Additional resources and analysis.

What pages should you add

Most often, they make mistakes in one place: they try to turn llms.txt into a dump of all site URLs. At this point, the file loses its meaning.

It is better to add only those pages that really answer key questions about the business or product:

product or service pages;
documentation, quick start, API reference;
pricing, payment terms, and SLA;
FAQ, support, reference materials;
pages about the company, contacts, security, policies;
some really strong Evergreen materials, if the blog is part of an expert reputation.

A blog also needs a selection. Do not drag the entire archive into llms.txt. It is better to leave 3-10 materials, after which the agent really understands the topic better. For example, for the topic of AI visibility, articles how to analyze sources relied on by AI, how structured data affects AI visibility, and which pages of the site are most often included in AI responses would be appropriate.

Markdown versions of pages and llms-full.txt

This is one of the most practical parts of the specification and is often forgotten. At llmstxt.org it is proposed that pages useful for LLMs have a clean Markdown version on the same URL with the suffix .md. If the URL does not end with the name of the file, but with a path, the specification suggests an option index.html.md.

In practice, this is convenient: an agent can read clean Markdown more easily than full HTML with menus, banners, and interface elements.

Separately, it is worth distinguishing between llms.txt and llms-full.txt.

llms.txt is a short index and navigation.
llms-full.txt is a complete summary context if the site really wants to give the agent a lot of documentation in one file.

In X Developer Docs, this is how it is implemented: a short root llms.txt for navigation, section-specific llms.txt files for documentation subtrees, and llms-full.txt for deeper context. This can work well for SaaS documentation but is often unnecessary for a small corporate site.

How to create a llms.txt without chaos

The first impulse is often to list 20-30 links manually. Start with selection instead: decide which pages deserve inclusion before writing the file.

Conduct a brief content audit. Identify 10-30 pages that really explain the site.
Decide which ones are canonical. If there are two similar pages, keep one stronger one.
Start the file with a short, factual summary without marketing filler.
Break the links into understandable sections: Core pages, Product, Docs, Policies, Optional.
Add a short description to each URL. A good description tells you why you should open this particular page.
If the site has valuable documentation, prepare .md versions of the pages.
Publish the file to the root of the site at https://domain.com/llms.txt and check that it gives 200 OK.

A useful rule: if the description of the page does not help a person understand why this URL is needed, it is unlikely to help the agent either.

Common mistakes

The file itself is simple. The problems are almost always not technical, but editorial.

Publish llm.txt, ai.txt or LLMS.txt instead of llms.txt.
Add all site URLs without selection.
Write an advertising text instead of a specific summary.
Insert links to pages that are outdated, duplicated, or closed to the user.
Fail to update the file after changes in pricing, documentation, or site structure.
They expect instant traffic growth just because of the fact of having a file.
Mix the purpose of llms.txt with access rules, which should actually live in the robots.txt.

A separate mistake is not to reconcile llms.txt with the real architecture of the site. If the file says that the canonical page is one, and the agent gets to another through redirects, noindex, or blocking in the CDN, there will be little use.

How to validate your llms.txt

Start with the basic technical check: does https://domain.com/llms.txt open without authentication and return 200 OK? Then review the content.

What to check	What is considered the norm
file URL	`https://domain.com/llms.txt` opens without authorization
HTTP response	`200 OK`, without a chain of strange redirects
Format	Valid Markdown, clear section order, no broken links
Table of Contents	The file contains only key pages, not the entire site
Expectations	The team understands that this is an auxiliary discovery file and not a guaranteed SEO factor

The Lighthouse audit for llms.txt is a useful additional check, but it should not replace editorial review. Chrome treats the file as optional. A site without llms.txt is not broken, and adding the file does not automatically make the site visible in AI answers.

Another practical check is to open the llms.txt in a real AI tool and see if it was able to correctly navigate the structure of the site. If the answer still relies on random pages, the problem is most likely not a missing file, but a weak content architecture or external signals.

Kozak validates llms.txt with 200 OK, Markdown, links, and an AI test

Conclusion

Create llms.txt because it is an inexpensive way to give agents a short, controlled map of important pages, not because it is supposedly mandatory for AI SEO. It is most useful for documentation, SaaS products, knowledge bases, and sites where pages are difficult to interpret without context.

The practical framework is simple: llms.txt does not replace SEO or robots.txt, remove the need for good content, or guarantee appearance in AI Overviews or ChatGPT Search. It can make a site easier to navigate for tools that choose to read the file. Site architecture, useful text, crawl access, and clear external brand signals still matter.

How to create llms.txt and why your site may need it

What is llms.txt

Why llms.txt is being discussed now

Who benefits most from llms.txt

What llms.txt doesn't do

How llms.txt differs from robots.txt and sitemap.xml

Recommended structure

What pages should you add

Markdown versions of pages and llms-full.txt

How to create a llms.txt without chaos

Common mistakes

How to validate your llms.txt

Conclusion

What else to read

See how AI sees your brand in VYDAI

What is llms.txt

Why llms.txt is being discussed now

Who benefits most from llms.txt

What llms.txt doesn't do

How llms.txt differs from robots.txt and sitemap.xml

Recommended structure

What pages should you add

Markdown versions of pages and llms-full.txt

How to create a llms.txt without chaos

Common mistakes

How to validate your llms.txt

Conclusion

What else to read

What to read next

AI crawlers in robots.txt: GPTBot, ClaudeBot, and more

AI trust sources: website, PR, directories, or reviews?

ChatGPT Shopping: how AI chooses ecommerce products

See how AI sees your brand in VYDAI