Marketing · AI · May 2026 · Reading ~18 min
60% of Google searches already end without a click, and AI chats (ChatGPT, Claude, Perplexity, Gemini, Copilot) absorb a growing share of the informational queries that used to go to traditional SERPs. GEO (Generative Engine Optimization) is the discipline that makes your content appear inside the responses of those generative engines, not below them. This guide explains how to orient your site towards that new channel in 2026 without discarding what you have learned from classic SEO: leverage what already works (E-E-A-T, schema, topical authority) and add what is new (llms.txt, citation-worthy passages, FAQPage, Speakable, and crawler permissions for OpenAI, Anthropic and Perplexity).
What GEO is and why it matters in 2026
GEO — Generative Engine Optimization — is the practice of adjusting content, structured data and crawling permissions to maximise the probability that a generative engine (ChatGPT with web search, Claude with web search, Perplexity, Gemini, Copilot) will cite your site within the response it returns to the user. The term began circulating in 2023–2024 with academic papers such as GEO: Generative Engine Optimization (Aggarwal et al., Princeton) and was established as a professional category in 2025–2026.
The friction is real. Pew Research measured in 2024 that 26% of adult US users had used an AI chatbot to search for information, and SimilarWeb reported that ChatGPT grew from 100 million monthly users in January 2023 to more than 800 million in 2025. When a user asks Claude "what is the AI Act" or asks Perplexity "best CRM for SMEs (small and medium-sized enterprises)", the response is a synthesis with citations, and the only sites that appear are those the LLM was able to read, judged authoritative and was permitted to crawl.
Differences from classic SEO
| Dimension | Classic SEO | GEO |
|---|---|---|
| Objective | Appear in the SERP | Appear inside the LLM's response |
| Unit of relevance | URL / page | Passage / citable paragraph |
| Authority signal | Backlinks, E-E-A-T | Backlinks + unlinked mention + verifiable data |
| Access control | robots.txt Googlebot/Bingbot | robots.txt + llms.txt + headers |
| Metric | Average position, CTR, GSC impressions | Share of voice in LLMs, citations per query, referral traffic from chatgpt.com/perplexity.ai |
| Impact latency | 2–12 weeks | Hours to days (continuous index refresh by the LLM) |
GEO does not replace SEO: it extends it. Content that is already well-ranked in Google with strong E-E-A-T is a natural candidate for LLM citation. Content ranked through outdated SEO tactics (keyword stuffing, thin content, fake schema) will not convince any generative engine, because its citation criteria are closer to those of an editor than those of a crawler.
Why this changes the economics of traffic
The relevant change is not technical but economic. When Google answers directly with AI Overviews, the click disappears — Pew Research measured in July 2025 that searches with an AI Overview receive approximately half the clicks of equivalent searches without an AI Overview. SimilarWeb confirmed double-digit organic traffic declines on travel, health and education sites during 2024–2025. That loss is not reversible with classic SEO because there is no SERP to improve: the user never gets to see links.
The operational consequence is that a B2B site in 2026 needs two live organic visibility channels:
- Traditional SERP for transactional queries with clear intent ("hire a GDPR lawyer in Burgos") where the click does happen because the user wants to take action.
- Citation inside an LLM response for informational queries ("what is the AI Act", "how is a GDPR fine calculated") where the click disappears but the brand becomes associated with the correct answer.
Giving up the second channel is handing brand equity to competitors who have invested in GEO. It is the same strategic decision that brands made between 2008 and 2012 when they accepted that investing in social media was worthwhile: that channel generated little measurable traffic but a great deal of brand recall.
LLM crawlers: how OpenAI, Anthropic, Perplexity and Google see you
Before optimising, you need to understand which bots exist, what they do and how to allow or block them. Each provider publishes its user agent and its behaviour in robots.txt.
OpenAI
- GPTBot: training crawler. Visits sites to enrich future models. Whether to allow it is an editorial decision.
- OAI-SearchBot: live-search crawler for ChatGPT (when the user activates "Search"). If you block it, you will not appear in ChatGPT Search responses.
- ChatGPT-User: one-off fetch when a user pastes a URL into the chat.
Anthropic
- ClaudeBot: training crawler.
- Claude-Web and Claude-User: live fetch when Claude responds with web search.
- anthropic-ai: legacy agent still referenced in some docs.
Perplexity
- PerplexityBot: primary crawler. Indexes content to generate responses.
- Perplexity-User: real-time fetch for a specific query.
Google and Microsoft
- Google-Extended: specific token for controlling the use of content in Bard/Gemini without affecting the traditional Googlebot.
- Bingbot: also serves Copilot (Microsoft).
- DuckAssistBot: DuckDuckGo Assist.
Recommended policy for a B2B brand
| Bot | Allow | Reason |
|---|---|---|
| OAI-SearchBot | Yes | Appear in ChatGPT Search |
| GPTBot | Yes (if you want to be in future models) | Reinforces E-E-A-T in the base model |
| ClaudeBot | Yes | Citations in Claude |
| Claude-Web / Claude-User | Yes | Live fetch |
| PerplexityBot | Yes | Citations in Perplexity |
| Google-Extended | Yes | Training and grounding in Gemini (AI Overviews are part of Search and follow Googlebot, not this token) |
| Bingbot | Yes | Copilot and traditional Bing |
For an editorial site that sells subscriptions or licences, the policy can be reversed: block GPTBot and ClaudeBot (training) but allow OAI-SearchBot, Claude-Web and PerplexityBot (live search, which does generate referral traffic). This is the stance adopted in 2024–2025 by The New York Times, Reuters and Axel Springer.
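Before deploying a policy like this, it helps to check that the robots.txt says what you think it says. The sketch below uses Python's standard `urllib.robotparser` against an illustrative robots.txt for the editorial stance. The file content and the `bot_access` helper are assumptions made for this example, and each real crawler applies its own matching rules, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption for this sketch): training bots
# blocked, live-search bots allowed, everything else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
"""


def bot_access(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Return {bot: allowed} according to Python's robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    bots = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"]
    report = bot_access(ROBOTS_TXT, bots, "https://your-domain.com/blog/")
    for bot, allowed in report.items():
        print(f"{bot:15} {'allowed' if allowed else 'blocked'}")
```

Running the same check against the robots.txt you actually serve, for every bot in the table above, catches the common mistake of a blanket `Disallow` that also hides you from live search.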
The llms.txt standard in 2026
llms.txt is a proposal by Jeremy Howard (Answer.AI) published in September 2024 that standardises a markdown file at the domain root to guide LLMs towards the canonical pages of the site. It does not replace robots.txt or sitemap.xml: it complements them.
Minimum anatomy
The file lives at https://your-domain.com/llms.txt and takes this form:
# Site name
> One-line description of the site's purpose.
Optional additional context in one or two paragraphs.
## Key pages
- [Home](https://your-domain.com/): what the site is about.
- [About](https://your-domain.com/about/): biography and background.
- [Services](https://your-domain.com/services/): what you offer.
## Pillars
- [Compliance](https://your-domain.com/compliance/): hub for ISO, ENS (Spanish National Security Framework), GDPR.
- [Marketing](https://your-domain.com/marketing/): hub for branded content, neuromarketing.
## Optional
- [Blog](https://your-domain.com/blog/): latest articles.
The extended version llms-full.txt includes the full content of each key page in markdown, so that an LLM can process the entire site in a single request without needing to follow links. Useful for technical documentation (Anthropic publishes docs.anthropic.com/llms.txt, Vercel sdk.vercel.ai/llms.txt, Cursor docs.cursor.com/llms.txt).
Best practices in 2026
- Curate the list: only canonical pages (no tags, archives, pagination pages or URL parameters).
- Match your silo architecture: the hub for each vertical appears first, then its pillars, then satellite articles if you want to expose any emblematic ones.
- Do not duplicate sitemap.xml: llms.txt is selective. Listing 4,000 URLs turns it into noise.
- Keep it maintained: review it quarterly. If a page is no longer canonical, remove it.
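The curation rules above can be partly automated. This is a minimal sketch, assuming you keep the curated pages in a Python dictionary; `build_llms_txt`, `is_canonical` and the list of non-canonical path fragments are hypothetical names and heuristics chosen for illustration, not part of the llms.txt proposal.

```python
from urllib.parse import urlparse

# Hypothetical path fragments treated as non-canonical in this sketch.
NON_CANONICAL_PARTS = ("/tag/", "/page/", "/archive/")


def is_canonical(url: str) -> bool:
    """Heuristic: reject URLs with query strings or archive-style paths."""
    parsed = urlparse(url)
    if parsed.query:
        return False
    return not any(part in parsed.path for part in NON_CANONICAL_PARTS)


def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render llms.txt text; each entry is (title, url, description)."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        kept = [entry for entry in entries if is_canonical(entry[1])]
        if not kept:
            continue  # skip sections left empty after filtering
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- [{title}]({url}): {desc}" for title, url, desc in kept)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Site name",
        "One-line description of the site's purpose.",
        {
            "Key pages": [
                ("Home", "https://your-domain.com/", "what the site is about."),
                ("Services", "https://your-domain.com/services/", "what you offer."),
            ],
        },
    ))
```

Generating the file from a reviewed list, rather than from the sitemap, keeps it selective and makes the quarterly review a matter of editing one data structure.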
Schema and structured data for LLMs
LLMs with live search do not "reason" over HTML: they extract information through embedding and parsing. Schema.org remains the most reliable guide because it normalises entities (Person, Organization, Article, FAQPage, Product, LocalBusiness) that the LLM can map to its internal graph.
Essential types for a B2B consultancy brand
- Person with sameAs pointing to LinkedIn, X, Crunchbase, Wikidata where applicable.
- Article with headline, datePublished, dateModified, author (referencing the Person), publisher, image.
- FAQPage with real questions your clients ask, phrased the way a human would ask them.
- BreadcrumbList with a real URL hierarchy.
- WebSite with SearchAction.
- SpeakableSpecification inside Article for the TL;DR and direct answers: it signals to the LLM which paragraphs are self-contained and suitable for voice responses.
Speakable: the undervalued signal
SpeakableSpecification indicates which sections of the HTML are written to be read aloud (Google Assistant, live assistants). In 2026, LLMs use it as a hint that the passage is self-sufficient and therefore a good candidate for citation. Implementation:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".aoc-tldr", "h2 + p"]
  }
}
Passage indexing and citable content
Google's passage indexing (announced in 2020 and deployed in 2021) changed the unit of relevance: the engine can rank a specific passage within a long article, not just the URL as a whole. LLMs with live search inherited this logic: if your H2 "What is the AI Act" answers with a self-contained definition in the first paragraph, that passage appears cited in Claude's or Perplexity's response without the LLM reading the full article.
Recipe for a citable passage
- H2 with a natural question: use the exact form in which a human would ask it. "What is", "How is it calculated", "How much does it cost", "When does it apply".
- First paragraph = direct answer: 40–80 words. Define or quantify in a self-contained way. No "in the following section we will see".
- Figure with source: when possible, include a number with a citation to the official source (BOE, EUR-Lex, Eurostat, ENISA, OpenAI docs, Anthropic docs).
- Stable anchor: id="what-is-ai-act" on the H2 so the LLM can link to the exact passage.
- Visible nuances: if the answer has caveats ("only applies to high-risk systems"), state them in the first paragraph. The LLM will not go looking for the caveat at the end of the article.
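The recipe lends itself to a simple editorial lint. The sketch below checks a heading and its first paragraph against the rules above. The function name `lint_passage`, the list of question words and the list of forward-reference phrases are assumptions for illustration, and a digit check is only a crude proxy for "figure with source".

```python
import re

# Assumed vocabulary for this sketch; extend it for your own language and niche.
QUESTION_WORDS = ("what", "how", "when", "why", "which", "who", "where")
FORWARD_REFERENCES = (
    "in the following section",
    "as we will see",
    "later in this article",
)


def lint_passage(heading: str, first_paragraph: str) -> list[str]:
    """Return a list of issues; an empty list means the passage passes."""
    issues = []

    first_word = heading.lower().split(maxsplit=1)[:1]
    if not first_word or first_word[0] not in QUESTION_WORDS:
        issues.append("heading is not phrased as a question")

    word_count = len(first_paragraph.split())
    if not 40 <= word_count <= 80:
        issues.append(f"first paragraph has {word_count} words, expected 40-80")

    lowered = first_paragraph.lower()
    if any(phrase in lowered for phrase in FORWARD_REFERENCES):
        issues.append("first paragraph defers the answer to a later section")

    # Crude proxy: a self-contained answer usually carries a number or a date.
    if not re.search(r"\d", first_paragraph):
        issues.append("no figure or date in the first paragraph")

    return issues
```

A check like this does not judge whether the answer is correct or well sourced. It only flags passages that are structurally unlikely to be quoted on their own.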
FAQPage and zero-click answers
The FAQ block at the end of the article is not decorative: it is the format most cited by generative engines because it already comes in a normalised question-and-answer format. Some rules I learned writing this site:
- 5–7 questions maximum per article. More dilutes the value.
- Real questions, not marketing copy. People search for "how much does ISO 27001 certification cost", not "how to optimise your investment in cybersecurity".
- Self-contained answer of 60–100 words, with a figure or date where possible.
- FAQPage schema with a mainEntity array of Question objects with acceptedAnswer.text.
- Match the body: the same questions as H2s in the article. Avoid exact textual duplication: use rephrased variants.
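As a rough illustration of the schema rule, the following sketch serialises question and answer pairs into FAQPage JSON-LD and checks them against the limits listed above. `build_faq_jsonld` and `check_faqs` are hypothetical helper names. The property names (`mainEntity`, `Question`, `acceptedAnswer`, `Answer`, `text`) are the schema.org ones.

```python
import json


def build_faq_jsonld(faqs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def check_faqs(faqs: list[tuple[str, str]]) -> list[str]:
    """Warn when the block breaks the rules above (5-7 questions, 60-100 words)."""
    warnings = []
    if len(faqs) > 7:
        warnings.append(f"{len(faqs)} questions; the rule of thumb is 5-7")
    for question, answer in faqs:
        words = len(answer.split())
        if not 60 <= words <= 100:
            warnings.append(f"{question!r}: answer has {words} words, expected 60-100")
    return warnings
```

The JSON string goes inside a `<script type="application/ld+json">` element in the page template. Running `check_faqs` at build time keeps the block from drifting away from the editorial rules.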
Anchors, identifiers and linkable fragments
LLMs with live search learned in 2024–2025 to use text fragments (the #:~:text= syntax first shipped in Chrome) and traditional anchors (#id) to link to the exact passage they are citing. This means your anchors are not decorative: they are the minimum unit the engine can share with the end user.
Three practices that have demonstrably moved share of voice in GEO:
- Descriptive anchor on each H2 and H3. Slug in kebab-case reflecting the question or concept, not section-3. If you redesign, keep old anchors as aliases.
- One idea per H3. If an H3 contains two definitions, split it. The LLM will cite the first and leave out the second.
- Tables with thead. `<table>` elements with `<thead>` and `<tbody>` are extracted in full by Perplexity and cited as tables in the response. Tables without a header are ignored or mis-parsed.
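To make the anchor practice concrete, here is a small sketch that derives a kebab-case id from a heading and builds a text-fragment link to a passage. Both helper names are hypothetical. The sketch assumes ASCII headings (accented characters are simply dropped) and a page URL that does not already carry a fragment.

```python
import re
from urllib.parse import quote


def slugify(heading: str) -> str:
    """Kebab-case slug for use as a stable id on an H2 or H3 (ASCII only)."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def text_fragment_url(page_url: str, passage_start: str) -> str:
    """Build a #:~:text= link pointing at the start of a passage.

    "-", "," and "&" are syntax characters inside a text directive, so they
    must be percent-encoded. quote() leaves "-" untouched, hence the extra
    replace step.
    """
    encoded = quote(passage_start, safe="").replace("-", "%2D")
    return f"{page_url}#:~:text={encoded}"


if __name__ == "__main__":
    print(slugify("What is the AI Act?"))
    print(text_fragment_url("https://your-domain.com/ai-act/", "high-risk systems, only"))
```

Generating ids from headings once and then freezing them is what keeps anchors stable: if the heading text changes later, the stored id should not.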
Freshness and maintenance
The last parameter LLMs weigh is freshness. A page with a dateModified 18 months old loses ground against an equivalent page published last month. This is not about updating the date without changing anything (engines detect the falsification: they compare the delta of text), but about keeping the content current: adding new figures, clarifying regulatory nuances, updating examples. On this site we maintain a quarterly review schedule by silo.
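A quarterly review is easier to keep if stale pages are listed automatically. The sketch below assumes you can export each URL with its `dateModified` as an ISO date. The 90-day default mirrors the quarterly cadence and is an assumption, not a threshold published by any engine.

```python
from datetime import date


def stale_pages(pages: dict[str, str], today: date, max_age_days: int = 90) -> list[str]:
    """Return URLs whose dateModified is older than max_age_days, oldest first."""
    ages = {
        url: (today - date.fromisoformat(modified)).days
        for url, modified in pages.items()
    }
    overdue = [url for url, age in ages.items() if age > max_age_days]
    return sorted(overdue, key=lambda url: ages[url], reverse=True)


if __name__ == "__main__":
    inventory = {
        "/iso-27001/": "2025-01-10",
        "/ai-act/": "2026-04-01",
        "/gdpr-fines/": "2024-06-01",
    }
    for url in stale_pages(inventory, today=date(2026, 5, 1)):
        print("review:", url)
```

The output is a review queue, not a list of dates to bump: each page still needs a real content update before `dateModified` changes.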
How to measure your visibility in generative engines
The operational problem with GEO is that Google Search Console does not measure appearance in ChatGPT, Claude or Perplexity. The metrics available in 2026 are:
1. Referral traffic
In GA4, filter the session source dimension by chatgpt.com, perplexity.ai, claude.ai, copilot.microsoft.com, gemini.google.com. This measures actual visits generated by citations where the link was clicked, not impressions. It is the cleanest indicator but it underestimates reach: many responses do not lead to a click.
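If you export referrers from your analytics tool or your own logs, the same grouping can be done in a few lines. This sketch assumes a plain list of full referrer URLs (with scheme). The domain list comes from the paragraph above, and `ai_engine` and `count_ai_sessions` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Referrer domains named above, mapped to a display label.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
    "gemini.google.com": "Gemini",
}


def ai_engine(referrer: str) -> Optional[str]:
    """Return the engine label for a referrer URL, or None if it is not an AI chat."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRERS.items():
        # Match the domain itself and any subdomain such as www.perplexity.ai.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_ai_sessions(referrers: Iterable[str]) -> Counter:
    """Count sessions per AI engine, ignoring all other referrers."""
    return Counter(engine for engine in map(ai_engine, referrers) if engine)
```

Matching on the parsed hostname instead of a substring avoids counting unrelated domains that merely contain one of these names.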
2. Brand monitoring in LLMs
Specialist platforms (Profound, Otterly, Peec.ai, AthenaHQ, Goodie AI, Daydream) run thousands of prompts in parallel against ChatGPT, Claude, Perplexity, Gemini and Copilot, measure your brand's share of voice versus competitors, and detect tone and citation accuracy.
3. Mentions in responses (zero-click)
Even without generating a click, being cited has brand value. Measure it manually: run 20 representative prompts for your sector and count appearances. Repeat monthly.
4. Server logs
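Filter logs by user agent: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Perplexity-User. Google-Extended will not show up: it is a robots.txt control token, and the crawling itself is done by Google's regular user agents. Measure visit frequency. A site with good GEO receives daily visits from OAI-SearchBot and Perplexity-User.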
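A minimal way to do this without a log platform is sketched below. It assumes the Combined Log Format, where the user agent is the last double-quoted field on each line, and identifies bots by substring, which does not verify that a request really came from the provider. Google-Extended is left out because it is a robots.txt token rather than a user agent string.

```python
import re
from collections import Counter
from typing import Iterable

# Bot names taken from the crawler lists earlier in the article.
AI_BOTS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Perplexity-User",
)

# Combined Log Format: the user agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per AI bot, based on the user agent field."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1).lower()
        for bot in AI_BOTS:
            if bot.lower() in user_agent:
                hits[bot] += 1
                break
    return hits
```

Usage is a matter of passing an open file: `count_bot_hits(open("access.log", encoding="utf-8", errors="replace"))`, where the file name is whatever your server writes. Grouping the same counts by day shows whether the live-search bots really visit daily.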
5. KPIs that actually move the business
Beyond the vanity metric of "how often am I cited", three actionable indicators correlate with revenue:
- Weighted share of voice: the percentage of commercial prompts (those indicating purchase intent: "best", "how much does it cost", "how to hire", "comparison") in which your brand appears. Exclude purely informational prompts.
- Citation sentiment: the LLM may cite you in praise or as a warning. Platforms such as Profound and Goodie AI classify tone. A negative citation repeated across 200 prompts is a fire to put out.
- Citation depth: if the LLM cites you in the first paragraph of its response (primary citation), it generates more click-through than a footnote citation. Measure the distribution.
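The first and third indicators reduce to simple ratios once each prompt run is recorded. The sketch below is one possible way to compute them. The `PromptResult` structure and both function names are assumptions for illustration, and the sample values in the usage block are made up rather than taken from a real panel.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    commercial: bool       # True if the prompt signals purchase intent
    cited: bool            # True if the brand appears in the response
    primary: bool = False  # True if the citation is in the first paragraph


def weighted_share_of_voice(results: list[PromptResult]) -> float:
    """Fraction of commercial prompts in which the brand is cited."""
    commercial = [r for r in results if r.commercial]
    if not commercial:
        return 0.0
    return sum(r.cited for r in commercial) / len(commercial)


def primary_citation_rate(results: list[PromptResult]) -> float:
    """Fraction of citations that are primary (first-paragraph) citations."""
    cited = [r for r in results if r.cited]
    if not cited:
        return 0.0
    return sum(r.primary for r in cited) / len(cited)


if __name__ == "__main__":
    # Made-up sample panel, for illustration only.
    panel = [
        PromptResult("best CRM for SMEs", commercial=True, cited=True, primary=True),
        PromptResult("how much does ISO 27001 certification cost", commercial=True, cited=False),
        PromptResult("what is the AI Act", commercial=False, cited=True),
    ]
    print(f"share of voice: {weighted_share_of_voice(panel):.0%}")
    print(f"primary citations: {primary_citation_rate(panel):.0%}")
```

Keeping the prompt panel fixed from month to month is what makes these ratios comparable over time.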
Three real mini-cases that illustrate the method
Case 1 · ISO consultancy in Castilla y León
A site with 80 articles published between 2021 and 2025 on ISO 9001, 14001 and 27001. Organic Google traffic falling 18% year-on-year in 2025 while the volume of LLM queries about ISO certification was rising. Actions applied in six weeks: publication of llms.txt with 22 canonical URLs (pillars for each standard + 4 service landing pages), refactoring of the top 12 articles with H2-as-question and FAQPage schema, stable anchors and data tables with citations to official sources. Result after 60 days: 14 out of a panel of 30 commercial prompts already returned the domain as a primary citation in Perplexity and Claude, and referral traffic from perplexity.ai grew from 3 sessions/month to 47.
Case 2 · Personal brand of a B2B consultant
An independent professional with 60 long-form articles on strategy and GDPR. A replicable case on a low budget. Actions: consolidated Person schema with sameAs pointing to LinkedIn and X, FAQPage on the 15 most-read pillars, a minimalist llms.txt (12 URLs), and quarterly maintenance declared in dateModified. Result after 90 days: consistent appearance in Claude's responses for 8 prompts in the niche (5 of which with primary citation) and a 22% increase in inbound leads because several clients reported "I found you by asking ChatGPT".
Case 3 · Niche e-commerce for artisan products
A catalogue of 400 SKUs with short descriptions. Here GEO was applied at the product page level: Product schema with real aggregateRating, category-specific FAQs ("how to store this product", "allergens", "designation of origin"), llms.txt with a landing page per category (not per individual SKU), and blocking of GPTBot on the editorial blog section (to protect original content) while allowing OAI-SearchBot and PerplexityBot. The engine began recommending the site in queries such as "artisan gourmet gift from Ribera del Duero".
90-day GEO implementation roadmap
For an SME (small and medium-sized enterprise) with a standard WordPress site and 30–200 published articles, this is the realistic schedule we apply to consultancy clients:
| Week | Action | Deliverable |
|---|---|---|
| 1 | Initial audit | Canonical URL inventory, silo map, list of currently allowed/blocked bots |
| 2 | Crawler permissions | robots.txt with a balanced policy (search bots allowed, training bots per preference) |
| 3 | llms.txt v1 | File at root with 15–30 curated canonical URLs per silo |
| 4–6 | Passage refactor | Top 20 URLs by traffic/conversion migrated to H2-question + direct answer + figure with source structure |
| 7–8 | Schema | Article + Person + FAQPage + Speakable deployed in CMS template |
| 9 | Brand monitoring | Subscription to Profound or Otterly, share-of-voice baseline across the 5 engines |
| 10–12 | Iteration | Optimise the 10 URLs with the worst LLM coverage detected in monitoring |
The key point: do not abandon any classic SEO during the process. GEO adds; it does not subtract.
Frequently asked questions about GEO
What is the difference between SEO and GEO?
SEO optimises for appearing in SERPs (10 blue links) on search engines such as Google or Bing. GEO optimises for appearing inside the synthetic responses generated by ChatGPT, Claude, Perplexity, Gemini or Copilot. The key difference is the unit of relevance: SEO works with full URLs, GEO works with citable passages. Both overlap in authority signals (E-E-A-T, backlinks, schema) but diverge in architecture (llms.txt) and measurement (share of voice in LLMs).
Does blocking GPTBot or ClaudeBot hurt my SEO?
Not directly. Blocking OpenAI's training bot (GPTBot) or Anthropic's (ClaudeBot) does not affect Googlebot or Bingbot. However, if you also block the live-search crawlers (OAI-SearchBot, Claude-Web, PerplexityBot), you forfeit appearing in those engines' responses. The balanced policy is: block training bots if copyright is a concern, always allow live-search bots if you want referral traffic.
Is llms.txt mandatory or just a best practice?
It is not mandatory. llms.txt is an open proposal (Answer.AI, 2024) that no LLM currently requires. However, several technical platforms (Anthropic, Vercel, Cursor, Stripe) already publish it because it improves the quality of responses those LLMs give about their products. For a B2B site with a silo architecture, publishing it takes 10 minutes and sets you apart: the engine knows which pages are canonical and which to skip.
How do I know if ChatGPT or Claude are citing my site?
Two routes. Manual: run 15–20 representative prompts for your sector ("best SEO consultant in Burgos", "what is the AI Act", "ISO 27001 certification for SMEs") and check whether your domain appears in the response citations. Tool-assisted: platforms such as Profound, Otterly, Peec.ai or AthenaHQ automate thousands of monthly prompts, measure your share of voice against competitors and alert you to inaccurate citations. Complement this with GA4 filtering the referrer chatgpt.com and perplexity.ai.
How long does it take to see the impact of LLM optimisation?
Much faster than SEO. Live-search LLMs (ChatGPT Search, Claude with web search, Perplexity) fetch pages when a user submits a query, so their context can refresh within hours to days. Publishing an article with FAQPage schema, structured data and citable passages on Monday can translate into citations by Tuesday or Wednesday. Training models (when a new version is released) take months, but they are no longer the primary channel: live search is.
Is it worth publishing content specifically for LLMs?
Only if that content also works for human readers. Creating synthetic FAQs and long lists of definitions "so the LLM cites them" is a return to the keyword-stuffing pattern of the 2000s. The 2026 rule is: write for a curious professional reader, add schema and citable passages as a technical layer, and let the LLM choose. Content designed only for LLMs ages poorly and generative engines penalise it for superficiality.
What mistakes should I avoid when doing GEO in 2026?
Five recurring antipatterns. (1) Blocking all LLM bots "just in case" and disappearing from the channel. (2) Publishing llms.txt with 4,000 URLs, turning it into noise. (3) Hard-coding FAQPage with invented questions instead of the ones your clients actually ask. (4) Forgetting the date in dateModified — LLMs discount content without visible freshness. (5) Citing other sources without verifying them: if an LLM cites your site and finds your citation was incorrect, you will lose topical authority in future queries.