GEO
January 3, 2025

What is LLMs-Full.txt?

A complete guide to LLMs-Full.txt—how it streamlines documentation for AI, improves accuracy, reduces noise, and powers reliable LLM grounding.

If you want LLMs to answer accurately from your docs without scraping noisy HTML, LLMs-Full.txt is the simplest way to ship a clean, Markdown “source of truth” that models can fetch and use.

Quick answer: What is LLMs-Full.txt?

LLMs-Full.txt is a single, Markdown-formatted file (typically hosted at /llms-full.txt) that contains the full, machine-readable version of your core documentation for large language models to consume. Its goal is to bypass context-window and HTML-noise issues by giving models a compact, high-signal context file they can retrieve, parse, and cite directly.

Think of it as a canonical, LLM-ready bundle of your essentials—definitions, procedures, examples, and references—in one place.

In practice, teams host it at a stable URL, keep it under sensible token limits, and reference it in prompts or retrieval pipelines to ground answers reliably.

LLMs-Full.txt vs LLMs.txt vs llms-ctx-full.txt

Clear naming avoids confusion and mis-implementation. Think robots.txt vs sitemap.xml: LLMs.txt is the index, while LLMs-Full.txt is the content package. Use consistent paths and casing so tools and prompts can find and trust the right file every time.

LLMs.txt (curated, high-signal entry point)

LLMs.txt is a small index that points LLMs to your best sources: key docs, API references, changelogs, and optionally your /llms-full.txt. It mirrors sitemap.xml in spirit but is human- and model-readable Markdown with brief annotations.

Include scope notes, priorities, and short canonical descriptions so models know what to fetch first and why. For example, you might list:

  • API Reference (primary)
  • Getting Started (secondary)
  • Changelog (version context)
  • LLMs-Full.txt (complete bundle)

Treat it as the front door that orients models before they read deeper.
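The entries above could be expressed as a minimal /llms.txt like this (the domain, paths, and descriptions are placeholders; written here as a shell heredoc so you can drop it into a build script):

```shell
#!/bin/sh
# Sketch of a minimal /llms.txt index file (all URLs are placeholders).
cat > llms.txt <<'EOF'
# ExampleCo Docs

> Index of LLM-ready sources for ExampleCo. Fetch llms-full.txt for the complete bundle.

## Primary
- [API Reference](https://exampleco.dev/reference): endpoints, auth model, error codes
- [LLMs-Full.txt](https://exampleco.dev/llms-full.txt): complete docs bundle

## Secondary
- [Getting Started](https://exampleco.dev/start): install and first request
- [Changelog](https://exampleco.dev/changelog): version context
EOF
```

The one-line annotations after each link are what tell a model what to fetch first and why.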

LLMs-Full.txt (/llms-full.txt: complete docs in Markdown)

LLMs-Full.txt is the entire “just the facts” version of your docs in one Markdown file. It should include essential concepts, procedures, examples, and references so a model can answer common questions without crawling your whole site.

This file is ideal when you want a dependable, fetch-once package that supports both LLM browsing and RAG ingestion. Keep sections predictable, keep examples concise, and favor clarity over marketing language. The result is faster, more accurate answers and fewer hallucinations from noisy or fragmented sources.

llms-ctx-full.txt (tool-generated full context)

llms-ctx-full.txt is a variant some tools generate automatically—often chunked or pre-processed for agents. Treat it as an implementation detail: it’s not a canonical standard, but it can be useful for CI pipelines or offline agents that need split files, embeddings, or indexes while preserving the same content scope as /llms-full.txt.

If you use it, document how it’s produced, how it maps back to /llms-full.txt, and how downstream tools should interpret chunk boundaries. The takeaway: publish /llms-full.txt as the public source of truth, and use llms-ctx-full.txt internally when it improves performance or workflow.

When (and when not) to use LLMs-Full.txt

Use LLMs-Full.txt when you want reliable, fast answers from LLMs without custom crawlers or brittle scraping. It shines when your core knowledge can be expressed succinctly in Markdown and updated via your existing docs workflow.

The pattern works especially well for developer products, customer support, and standardized procedures. As your footprint grows, you can scale to multiple files or pair it with RAG while keeping /llms-full.txt as the canonical bundle.

Good fits: API docs, onboarding guides, support knowledge bases

LLMs-Full.txt works well for API references, SDK guides, product overviews, auth workflows, pricing explanations, and troubleshooting FAQs. It’s also ideal for “getting started” and integration guides that benefit from stepwise procedures and concise code snippets.

Teams frequently start by exporting their top 10–20 pages and tightening them into a single, de-duplicated file. Include representative examples and the most common edge cases to maximize coverage. This gives models a focused, high-signal source that answers the majority of user questions.

Avoid or adapt if: extreme size, PII/licensing, fast-changing content

If your docs are huge (hundreds of pages, multi-GB assets), you’ll need chunking and linking rather than one monolithic file. Don’t include PII, credentials, or licensed content you can’t redistribute; instead, link to gated sources and document terms.

For fast-moving UIs or APIs, ensure you version, timestamp, and automate updates to prevent stale answers. In these cases, /llms-full.txt can remain the high-level bundle while detailed or volatile content lives in referenced documents. This keeps the core file clean, safe, and current.

Recommended structure for /llms-full.txt

Structure matters because LLMs scan headings to build mental maps of content. A predictable outline reduces hallucinations and parsing errors. Start with required sections, add light metadata, and keep size within token budgets.

When content exceeds targets, split files and link them from a small index.

Required sections (Title, Overview, Key Concepts, Procedures, References)

  • Title: One H1 that names the product or docs package.
  • Overview: Scope, intended audience, and supported versions.
  • Key Concepts: Clear definitions (auth model, rate limits, data model).
  • Procedures: Task-oriented “How to” workflows with steps and examples.
  • References: API endpoints, CLI flags, configuration keys, and links.

Optional metadata (updated_at, language, source URLs, license)

Add lightweight provenance near the top:

  • updated_at (ISO date)
  • language
  • canonical source URLs
  • license
  • contact

This helps tools handle freshness, multilingual selection, and attribution. If you localize, include language tags in headings or provide one file per locale.

You can also include product version, ETag/commit SHA, or environment notes to help agents pin and cache the exact revision they used.

Size and token budgets (chunking and linking strategy)

  • Aim for 0.3–1.0 MB uncompressed (roughly 75k–250k tokens depending on text density) to avoid fetch timeouts and context truncation.
  • For larger sites, create a top-level /llms-full.txt that links to /llms-full-1.txt, /llms-full-2.txt, etc., each kept under ~100k tokens.
  • Prefer concise examples over long transcripts; link out for bulk reference tables.
  • Tokenization varies by model, so monitor real usage and adjust chunk sizes accordingly. The goal is quick fetches, clean parsing, and minimal truncation in downstream prompts.
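A quick budget check can run in CI. The sketch below uses the common rule of thumb of roughly 4 characters per token; real tokenization varies by model, and the demo file content is a stand-in:

```shell
#!/bin/sh
# Rough size/token budget check for llms-full.txt.
# Assumes ~4 characters per token (a rule of thumb, not exact).
FILE=llms-full.txt
printf 'x%.0s' $(seq 1 2000) > "$FILE"   # stand-in content for the demo

bytes=$(wc -c < "$FILE")
approx_tokens=$((bytes / 4))
echo "size: ${bytes} bytes, ~${approx_tokens} tokens"

# Warn past the ~1 MB ceiling suggested above.
if [ "$bytes" -gt 1000000 ]; then
  echo "WARN: consider chunking into /llms-full-1.txt, /llms-full-2.txt, ..."
fi
```

Swap the stand-in `printf` for your real file and wire the warning into your publish step.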

Linking policy (when to embed vs reference)

  • Embed: Core definitions, critical steps, small code samples.
  • Reference: Large tables, release archives, image-heavy tutorials, legal docs.
  • Use absolute URLs and stable anchors; provide 1–2 lines of context before each link so models know why it matters.

Template and minimal example

Use this minimal pattern to bootstrap your first version, then expand with your own concepts, procedures, and references. Keep the metadata near the top, and ensure headings are consistent so tools can chunk by section.

# ExampleCo Developer Docs (LLMs-Full)

updated_at: 2025-11-17
language: en
license: CC-BY-4.0
sources:
- https://exampleco.dev/docs
- https://api.exampleco.com

## Overview
ExampleCo API lets you create, read, update, and delete Projects and Tasks. This LLM-friendly file summarizes the canonical docs so models can answer questions without scraping HTML. Supported versions: v1 (current), v0 (deprecated).

## Key Concepts
- Authentication: Bearer token via Authorization header. Tokens are project-scoped.
- Rate Limits: 100 req/min default; burst to 200 req/min for 30s.
- Errors: JSON body with code, message, and request_id.

## Procedures
### Create a Task
1) Obtain a token from the dashboard.
2) POST to /v1/tasks with name and project_id.
3) On 201, store id for later updates.

Request:

curl -s -X POST https://api.exampleco.com/v1/tasks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"demo","project_id":"prj_123"}'

### Webhooks Setup
1) Create an endpoint in the dashboard.
2) Verify signatures using the signing_secret.
3) Retry policy: exponential backoff, max 24h.

## References
- REST API: https://exampleco.dev/reference
- SDKs: https://exampleco.dev/sdk
- Webhooks: https://exampleco.dev/webhooks
- Changelog: https://exampleco.dev/changelog

Generation workflows

There’s no single “right” workflow—pick the path that fits your stack and team. Start small, automate steadily, and ship updates via your existing docs pipeline.

The most effective setups combine a one-time bootstrap with regular CI-driven refreshes. Keep humans in the loop for pruning and clarity.

Manual and CMS-based assembly

Export your top pages to Markdown from your CMS (Docusaurus, MkDocs, Mintlify, Hugo) and paste into a single file, de-duplicating as you go. Keep headings consistent, add metadata, and normalize code blocks.

For small sites, this takes a few hours and gives you control over what models see first. As you refine, collapse repetitive examples, promote canonical workflows, and add links to heavy references. This approach creates an opinionated, high-signal baseline before you automate.

CLI and plugins (e.g., Firecrawl, dotenvx, static-site plugins)

Use a crawler that outputs Markdown to bootstrap a draft, then prune it by hand. This accelerates initial assembly while preserving your ability to curate what stays in the final file.

  • Example (exact command and flags vary by tool and version; check its docs): npx firecrawl crawl https://yourdomain.com/docs --format markdown --out llms-full.txt
  • Static-site generators often have export or print-to-Markdown plugins; run them in build to emit /llms-full.txt.
  • If you manage environment variables or secrets in builds (dotenvx or similar), ensure they never leak into the exported file.

CI/CD automation and incremental updates

Add a job that regenerates /llms-full.txt on content changes, runs a linter, and publishes atomically. Emit an ETag and updated_at, and fail the build on broken anchors or missing required sections.

For very large docs, maintain chunk manifests (e.g., /llms-full-index.txt) and only rebuild the changed chunks. Consider posting diffs to PRs so reviewers can spot unintended removals or size regressions. Over time, this keeps your file trustworthy, fresh, and predictable for agents.
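A CI job along these lines can enforce the required sections and derive an ETag from the content hash (the section names follow the list earlier in this article; the demo input and hash length are illustrative choices):

```shell
#!/bin/sh
# CI sketch: lint llms-full.txt for required sections, then emit a
# content-hash ETag. Uses sha256sum (coreutils; use shasum -a 256 on macOS).
set -e
FILE=llms-full.txt
printf '# Docs\n## Overview\n## Key Concepts\n## Procedures\n## References\n' > "$FILE"  # demo input

for section in 'Overview' 'Key Concepts' 'Procedures' 'References'; do
  grep -q "^## $section" "$FILE" || { echo "FAIL: missing section: $section"; exit 1; }
done

etag=$(sha256sum "$FILE" | cut -c1-16)
echo "ETag: \"$etag\""
```

Because the ETag is derived from content, any regenerated-but-unchanged file keeps its cache validity.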

Validation and troubleshooting

Quality control prevents silent failures where models read the file but miss critical sections. Validate format, links, and size before publishing.

Then run prompt-based tests to confirm models cite and use the file correctly. Monitor usage over time so you can fix drift and duplication early.

Linting checklist and common formatting mistakes

  • Required sections present: Title, Overview, Key Concepts, Procedures, References.
  • Headings use #, ##, ### consistently; no empty headings.
  • Code blocks fenced with language hints; no mixed tabs/spaces in lists.
  • Absolute URLs resolve (200/301) and include https.
  • File size within budget; no binary blobs; UTF-8 only.

Quick sanity regex ideas:

  • Headings: ^#{1,3}\s.+$
  • Links present: \[[^\]]+\]\(https?://[^)]+\)
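The checklist and regex ideas above can be combined into a small lint script (the demo input stands in for your real file):

```shell
#!/bin/sh
# Minimal lint pass over llms-full.txt using the regex ideas above.
FILE=llms-full.txt
printf '# Title\n## Overview\nSee [docs](https://example.com/docs).\n' > "$FILE"  # demo input

errors=0
# Exactly one H1.
h1=$(grep -c '^# [^#]' "$FILE")
[ "$h1" -eq 1 ] || { echo "FAIL: expected one H1, found $h1"; errors=1; }
# No empty headings.
grep -nE '^#{1,3}[[:space:]]*$' "$FILE" && { echo "FAIL: empty heading"; errors=1; }
# Links must use https.
grep -nE '\]\(http://' "$FILE" && { echo "FAIL: non-https link"; errors=1; }

[ "$errors" -eq 0 ] && echo "lint: OK"
```

Run it in CI before publishing; a nonzero `errors` value is your signal to block the deploy.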

Sanity tests with sample prompts

  • Ask: “Fetch and use https://yourdomain.com/llms-full.txt to answer: How do I authenticate?” Check for correct headers and steps.
  • Ask for a citation: “Cite the section and URL you used.” Verify it references your file or linked refs.
  • Negative test: “What’s the GraphQL endpoint?” If you don’t have one, the correct answer should say it’s not supported.

Monitoring for stale/duplicated content

Log requests to /llms-full.txt, track ETag hits, and alert when traffic rises but updated_at lags releases. Periodically diff /llms-full.txt against your source docs to catch drift.

Remove repeated examples that bloat the file and increase token cost. These lightweight signals help you prioritize updates where they matter most.

How AI tools consume it today

Provider behavior is evolving, but most tools benefit when you offer a clean Markdown source at a predictable path and reference it in prompts or indices. Make the file easy to fetch, small enough to load quickly, and explicit about scope and version.

When possible, include it in your LLMs.txt index to guide discovery. This ensures consistent grounding across chat, IDEs, and agents.

ChatGPT, Claude, Cursor: manual and plugin-driven approaches

  • Manual: Paste or link /llms-full.txt and instruct the model to use it as the primary source of truth.
  • Browsing/tools: If browsing or tool-use is enabled, the model can fetch the URL and ground answers.
  • Dev environments: Editors like Cursor can load URLs or local files to inform completions; keeping a single, well-structured file improves retrieval quality.

Self-hosted agents and RAG pipelines

RAG pipelines can ingest /llms-full.txt directly, chunk by headings, and embed with your vector DB. Pin the file version via ETag or commit SHA, and store chunk-level provenance for citations.

For offline or on-prem agents, mirror the file internally and cache by version to avoid external fetches. As your corpus grows, layer in chunk manifests and per-section freshness metadata for targeted re-embeds.
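Chunking by headings, as described above, can be as simple as an awk split on `## ` markers, with each chunk keeping its heading text as provenance (the demo input is a stand-in for the real file):

```shell
#!/bin/sh
# Sketch: split llms-full.txt into per-section chunks for embedding,
# one file per '## ' heading; each chunk retains its heading line.
printf '# Docs\n## Overview\ntext a\n## Key Concepts\ntext b\n' > llms-full.txt  # demo input

awk '/^## /{n++; close(out); out=sprintf("chunk-%02d.md", n)} n{print > out}' llms-full.txt
ls chunk-*.md
```

Feed the resulting `chunk-*.md` files to your embedding job, storing the chunk filename plus the file's ETag or commit SHA for citations.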

Security, licensing, and compliance

Treat /llms-full.txt like a public API for your knowledge: share what you intend to be reused, and nothing more. Set policy upfront so your automation can enforce it.

If in doubt, link to controlled sources instead of embedding sensitive material. Clear licensing reduces downstream ambiguity and improves adoption by tools.

PII redaction and scope control

Exclude secrets, internal emails, tokens, and customer data. Replace sensitive values with placeholders and add warnings where applicable.

If certain areas require auth, do not inline them—link to the gated docs and clearly mark access requirements. Consider an automated redaction pass in CI to catch accidental leaks. Scope discipline keeps your file safe to fetch and widely reusable.
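The automated redaction pass can start as a grep over common secret shapes. The patterns below are illustrative, not exhaustive; real pipelines usually add a dedicated scanner on top:

```shell
#!/bin/sh
# Sketch of a CI redaction scan: fail if llms-full.txt contains
# common secret shapes. Patterns are illustrative only.
FILE=llms-full.txt
printf '## Auth\nUse Authorization: Bearer <YOUR_TOKEN>\n' > "$FILE"  # demo input (placeholder, not a leak)

leaks=0
grep -nE '(sk|pk)_(live|test)_[A-Za-z0-9]{8,}' "$FILE" && leaks=1   # API-key shapes
grep -nE 'AKIA[0-9A-Z]{16}' "$FILE" && leaks=1                      # AWS access key IDs
grep -nE 'Bearer [A-Za-z0-9._-]{20,}' "$FILE" && leaks=1            # inlined real tokens

if [ "$leaks" -eq 0 ]; then echo "redaction scan: clean"; else echo "FAIL: possible secret"; exit 1; fi
```

Note the demo passes because `<YOUR_TOKEN>` is a placeholder, which is exactly the substitution the section above recommends.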

Licensing and attribution practices

State your license and attribution policy near the top (e.g., CC-BY-4.0 or “All rights reserved”). Include a short reuse statement so downstream tools know what’s permitted.

When quoting third-party material, link to the original and clarify terms. This reduces legal ambiguity and encourages proper citation in generated answers. Keep license changes versioned and visible via updated_at or a changelog.

Multilingual and multi-brand strategies

  • Prefer one file per language (e.g., /llms-full.en.txt, /llms-full.es.txt) and advertise them in /llms.txt with language tags.
  • For multi-brand orgs, publish one file per brand and keep product names unambiguous to reduce cross-contamination in LLM answers.
  • Use consistent section ordering and metadata across locales to simplify agent selection.
  • If you localize asynchronously, note the translation date and base version to avoid mixing mismatched content.

Measuring impact

Define success before you ship: improved answer accuracy, fewer escalations, faster time-to-resolution, or higher self-serve rates. Then instrument your pipeline so you can attribute changes to updates in /llms-full.txt.

Combine prompt-based QA with telemetry to validate real-world impact. Iterate on high-traffic sections first to maximize return.

Prompt-based QA tests and accuracy benchmarks

  • Create a fixed test set (e.g., 30–50 tasks) representing FAQs, edge cases, and error-handling.
  • Score answers for correctness, completeness, and citation quality before and after publishing /llms-full.txt.
  • Track deltas and flag regressions when content changes.
  • Rotate in new prompts as products evolve, but keep a stable core for trend analysis.

This gives you a repeatable signal on whether changes help or hurt.

Telemetry: request logs, file fetches, and outcome tracking

  • Measure requests to /llms-full.txt, ETag hit ratios, and average fetch size.
  • Tie support tickets and search queries to topics in the file to see which sections drive outcomes.
  • For RAG, log chunk IDs and citations to identify high-value or stale sections.
  • When you spot spikes in a topic, review that section for clarity, duplication, and token cost.

Use these insights to prioritize edits that reduce confusion and improve resolution rates.
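The fetch and ETag metrics above fall out of a standard access log. This sketch assumes combined log format, where the request path is field 7 and the status code is field 9, and counts 304 responses as ETag hits (the log lines are synthetic):

```shell
#!/bin/sh
# Sketch: fetch count and ETag hit count for /llms-full.txt from a
# combined-format access log (synthetic sample data below).
cat > access.log <<'EOF'
1.2.3.4 - - [17/Nov/2025:10:00:00 +0000] "GET /llms-full.txt HTTP/1.1" 200 84211
1.2.3.5 - - [17/Nov/2025:10:01:00 +0000] "GET /llms-full.txt HTTP/1.1" 304 0
1.2.3.6 - - [17/Nov/2025:10:02:00 +0000] "GET /docs HTTP/1.1" 200 1024
EOF

awk '$7 == "/llms-full.txt" {total++; if ($9 == 304) hits++}
     END {printf "fetches=%d etag_hits=%d\n", total, hits}' access.log
```

A rising fetch count with a stale updated_at is the alert condition described above.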

FAQs

  • What is the difference between LLMs.txt and LLMs-Full.txt? LLMs.txt is a curated index of high-signal links; LLMs-Full.txt is the full Markdown content for direct consumption.
  • How do I create a /llms-full.txt file? Export key docs to Markdown, consolidate into a single file with required sections, add metadata, lint, and host it at https://yourdomain.com/llms-full.txt.
  • Best practices for LLMs-Full.txt size and tokens? Keep it under ~1 MB uncompressed; chunk into multiple files if you exceed ~100k tokens per chunk.
  • How do ChatGPT and Claude use /llms-full.txt? Provide the URL in your prompt or via browsing/tools and instruct the model to rely on it as the primary source.
  • Should I use llms-ctx-full.txt or /llms-full.txt? Use /llms-full.txt as the canonical public file; llms-ctx-full.txt is a tool-specific variant for pipelines.
  • How often should I update /llms-full.txt? Update on every docs change that affects answers; automate in CI and bump updated_at.
  • Does /llms-full.txt create SEO duplicate-content risk? Minimize it by serving an X-Robots-Tag: noindex header if needed, avoiding full duplication of HTML pages, and treating the file as a machine-readable artifact.
  • What headers, caching, and rate limits are recommended? Serve Content-Type: text/markdown; set ETag and Cache-Control: max-age=3600; enable gzip/Brotli; allow CORS (Access-Control-Allow-Origin: *) and apply modest rate limits (e.g., 60 req/min/IP).
  • How do I validate and troubleshoot? Run a linter for headings/links, test sample prompts, check file size, and verify all URLs resolve with curl.
  • Is LLMs-Full.txt supported by major AI providers? There’s no official standard, but many tools and agents benefit when you provide a clean, fetchable Markdown source and reference it in prompts or indices.

References and further reading

Note: Provider support and best practices evolve quickly; check the spec and tool docs regularly and include updated_at in your /llms-full.txt to signal freshness.


© 2025 Searcle. All rights reserved.