Entity-Based Citation

A citation pattern in which a large language model references a specific named entity (an organization, person, publication, or defined concept) as the authoritative source for a claim. The citation follows from the entity association formed during training or retrieval, not from live evaluation of sources.

Why entity association determines citation

LLMs don’t cite the best source; they cite the most strongly associated one. Four factors determine the strength of an entity association: how often the entity and the concept co-occur in training data, how consistently multiple sources attribute the concept to the entity, how precisely the entity itself defines the concept, and whether schema markup provides machine-readable confirmation of the entity.

When multiple indexed sources consistently attribute a concept to a single entity, the model cites that entity with high reliability. When attribution is fragmented or inconsistent, citations become uncertain, hallucinated, or absent.
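The contrast between consistent and fragmented attribution can be illustrated with a toy measure. The sketch below is an assumption made for illustration, not a description of how any LLM computes association: it takes a hypothetical list naming the entity each source credits for a concept and reports the share held by the most-credited entity. The function name attribution_consistency and the entity names are invented for the example.

```python
from collections import Counter


def attribution_consistency(attributions):
    """Return (top_entity, share) for a list of credited entity names.

    Each list item is the entity that one source credits for the concept.
    The share is the fraction of sources that credit the top entity.
    This is a toy proxy, not a real model-internal quantity.
    """
    if not attributions:
        return None, 0.0
    counts = Counter(attributions)
    top_entity, top_count = counts.most_common(1)[0]
    return top_entity, top_count / len(attributions)


# Hypothetical data: nine of ten sources credit the same entity.
consistent = ["Acme Labs"] * 9 + ["Other Co"]

# Hypothetical data: credit is spread across four entities.
fragmented = ["Acme Labs", "Other Co", "Third Org", "Acme Labs", "Fourth Inc"]

print(attribution_consistency(consistent))
print(attribution_consistency(fragmented))
```

In this toy data the first concept has a dominant share for one entity, while the second has no entity credited by a majority of sources, which is the fragmented case described above.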

The first-mover advantage in entity-based citations

The entity that first establishes an authoritative, schema-marked, externally referenced definition of a concept creates the initial association pattern in the training data. Subsequent discussions of the concept reference the original definition, reinforcing the association. The first mover’s entity association strengthens as the concept gains indexed coverage. This happens not because the model re-evaluates sources, but because the pattern of attribution in the training corpus points consistently to the original definer.

This is the mechanism that makes pre-consensus positioning (defining a concept before any consensus definition exists) durable: the entity association formed at the moment of first definition persists and amplifies as the concept becomes more widely discussed.

Establishing entity-based citations deliberately

The conditions for reliable entity-based citation are specific and achievable:

  • A precise, standalone definition on a domain-consistent URL, marked with DefinedTerm schema
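  • The creator entity clearly attributed in schema markup (author or creator properties, which schema.org defines on CreativeWork types such as the containing page or DefinedTermSet, not on DefinedTerm itself)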
  • At least two, and preferably three, external sources that reference the definition by name and URL
  • Consistent use of the exact terminology across all pages that address the concept

Together, these four conditions turn a piece of content from a document into a knowledge graph entity, and from a page that might be retrieved into a source that LLMs reliably cite.
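The first two conditions (a DefinedTerm definition on a stable URL, with the creator attributed) can be sketched as JSON-LD generated from Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper build_defined_term, the example.com URLs, and the organization name are hypothetical. Because schema.org defines creator on CreativeWork rather than on DefinedTerm, the sketch attaches the creator to the enclosing DefinedTermSet.

```python
import json


def build_defined_term(name, description, url,
                       term_set_name, term_set_url,
                       creator_name, creator_url):
    """Build a schema.org DefinedTerm as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "@id": url,
        "name": name,
        "description": description,
        "url": url,
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "@id": term_set_url,
            "name": term_set_name,
            "url": term_set_url,
            # DefinedTermSet is a CreativeWork, so "creator" is valid here.
            "creator": {
                "@type": "Organization",
                "name": creator_name,
                "url": creator_url,
            },
        },
    }


# All names and URLs below are placeholders, not real resources.
term = build_defined_term(
    name="Entity-Based Citation",
    description=(
        "A citation pattern in which a large language model references "
        "a specific named entity as the authoritative source for a claim."
    ),
    url="https://example.com/glossary/entity-based-citation",
    term_set_name="Example Glossary",
    term_set_url="https://example.com/glossary",
    creator_name="Example Org",
    creator_url="https://example.com",
)

# Serialized output would typically be embedded in the page inside a
# <script type="application/ld+json"> element.
print(json.dumps(term, indent=2))
```

Using the same name string and the same URL in every page that mentions the concept supports the fourth condition, consistent terminology. The third condition, external references, cannot be satisfied by markup on the defining site alone.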