SEO/GEO · 2 min read

Semantic HTML as an LLM indexing strategy

LLMs parse HTML structure to understand content hierarchy. Semantic HTML is no longer just about accessibility; it is also about discoverability.

Sarah Kim
SEO/GEO Lead

We ran an experiment. We created two versions of the same article: one with semantic HTML5 elements (<article>, <section>, <time>, <figure>) and one with generic <div> soup. We measured how often each was cited by ChatGPT, Perplexity, and Claude.

The semantic version was cited about 3.2 times as often.

Why HTML structure matters to LLMs

LLM-based systems that process web pages do not just read the text. Their crawlers and retrieval pipelines can also use the DOM tree, where structure provides context:

  • <article> signals self-contained content
  • <section> with aria-labelledby creates topical boundaries
  • <time datetime="..."> provides machine-readable dates
  • <figure> + <figcaption> associates images with descriptions
  • <dl>, <dt>, <dd> create definition relationships

These elements act as chunk boundaries for retrieval systems. A well-structured page is easier to embed, chunk, and retrieve.
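To make the chunk-boundary idea concrete, here is a minimal sketch that splits a page into one chunk per top-level <section>, using only Python's standard-library html.parser. The class name SectionChunker, the function chunk_by_section, and the sample markup are hypothetical and chosen for illustration; we are not claiming that any particular LLM or retrieval system chunks pages this way.

```python
from html.parser import HTMLParser


class SectionChunker(HTMLParser):
    """Collects the text inside each top-level <section> as one chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # finished chunks, in document order
        self._depth = 0    # nesting depth of currently open <section> tags
        self._parts = []   # text fragments of the chunk being built

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # The outermost <section> just closed: emit its text.
                if self._parts:
                    self.chunks.append(" ".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a <section>.
        text = data.strip()
        if self._depth > 0 and text:
            self._parts.append(text)


def chunk_by_section(html):
    """Return a list of text chunks, one per top-level <section>."""
    parser = SectionChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample page, not the article used in the experiment.
SAMPLE = """
<article>
  <h1>Vector databases</h1>
  <section aria-labelledby="what">
    <h2 id="what">What they store</h2>
    <p>Embeddings and their metadata.</p>
  </section>
  <section aria-labelledby="when">
    <h2 id="when">When to use one</h2>
    <p>When you need similarity search.</p>
  </section>
</article>
"""

if __name__ == "__main__":
    for chunk in chunk_by_section(SAMPLE):
        print(chunk)
```

With <div> soup there is no equivalent tag to key on, so a chunker like this would have to fall back on class names or fixed-size windows.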

The experiment

We published two articles about vector databases:

Version A: Semantic HTML with proper heading hierarchy, <article> wrapper, <section> elements, and schema.org microdata.

Version B: Same text wrapped in <div> elements with class names.

Both had identical CSS styling. Both ranked similarly on Google. But LLM citations differed dramatically:

System      Version A      Version B
ChatGPT     47 citations   14 citations
Perplexity  38 citations   12 citations
Claude      29 citations   9 citations
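The per-system ratios can be checked directly from the citation counts above. The sketch below computes them two ways: as the mean of the three per-system ratios, and as a pooled ratio over all citations. The article does not state which aggregation produced the 3.2x headline figure, so both are shown as assumptions, and the variable names are ours.

```python
# Citation counts from the experiment: (Version A, Version B).
citations = {
    "ChatGPT": (47, 14),
    "Perplexity": (38, 12),
    "Claude": (29, 9),
}


def citation_ratios(counts):
    """Return the Version A / Version B citation ratio per system."""
    return {system: a / b for system, (a, b) in counts.items()}


ratios = citation_ratios(citations)

# Assumption 1: average the three per-system ratios.
mean_ratio = sum(ratios.values()) / len(ratios)

# Assumption 2: pool all citations, then take a single ratio.
total_a = sum(a for a, _ in citations.values())
total_b = sum(b for _, b in citations.values())
pooled_ratio = total_a / total_b

if __name__ == "__main__":
    for system, ratio in ratios.items():
        print(f"{system}: {ratio:.2f}x")
    print(f"mean of ratios: {mean_ratio:.2f}x")
    print(f"pooled ratio:   {pooled_ratio:.2f}x")
```

Both aggregates come out at roughly 3.25x, in line with the headline figure.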

Implementation checklist

For every article we publish, we verify:

  • Single <h1> per page
  • Logical heading hierarchy (no skipped levels)
  • <article> for main content
  • <section> with headings for topical divisions
  • <time datetime="..."> for all dates
  • <figure> and <figcaption> for images
  • <blockquote cite="..."> for quotations
  • <address> for author contact information
  • <nav> for table of contents
  • Schema.org JSON-LD for structured data
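The first two checklist items are easy to automate. The following is a minimal sketch, assuming the page is available as an HTML string; HeadingAudit and audit_headings are hypothetical names, and this is not the tooling we use internally. It uses Python's standard-library html.parser to flag pages without exactly one <h1> and pages whose headings skip a level.

```python
from html.parser import HTMLParser


class HeadingAudit(HTMLParser):
    """Records the level of every <h1>..<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Match h1..h6 only; tags such as <hr> or <header> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of problems; an empty list means both checks pass."""
    parser = HeadingAudit()
    parser.feed(html)
    parser.close()

    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")

    # A heading may go at most one level deeper than the one before it.
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading level jumps from h{prev} to h{cur}")
    return problems


if __name__ == "__main__":
    page = "<h1>Title</h1><h3>Skipped a level</h3>"
    for problem in audit_headings(page):
        print(problem)
```

The remaining items, such as <time> and <figure> usage, could be checked with the same parser pattern by recording the relevant tags and attributes.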

Accessibility and GEO alignment

This is the rare case where accessibility and GEO are perfectly aligned. Screen readers and LLMs both benefit from semantic structure. A page that passes WCAG 2.1 AA is likely well-optimized for LLM retrieval.

The future

We expect search engines to increasingly weight semantic structure. As LLM-based search grows, the incentives for clean HTML will strengthen. The best time to fix your markup was when you built the site. The second best time is now.

Tags: HTML, Accessibility, GEO, LLM, Semantic Web