Entity-Based Citation
A citation pattern in which a large language model references a specific named entity — an organization, person, publication, or defined concept — as the authoritative source for a claim, based on the entity association formed during training or retrieval rather than on live source evaluation.
Why entity association determines citation
LLMs don’t cite the best source; they cite the most strongly associated source. Four factors determine the strength of an entity association: how often the entity and the concept co-occur in training data, how consistently multiple sources attribute the concept to the entity, how precisely the entity itself defines the concept, and whether schema markup provides machine-readable confirmation of the entity.
A concept that is consistently attributed to a single entity across multiple indexed sources will be cited with high reliability. A concept with fragmented, inconsistent attribution will produce uncertain, hallucinated, or absent citations.
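The contrast between consistent and fragmented attribution can be made concrete with a toy calculation. The sketch below is only an illustration, not a description of how any model computes citations: the function names, the 0.8 threshold, and the sample entities ("Acme Labs", "Other Co", "Third Org") are all hypothetical. It measures what share of sources credit each entity for a concept and reports a dominant entity only when one clearly leads.

```python
from collections import Counter


def attribution_share(attributions):
    """Return each entity's share of the attributions for one concept."""
    counts = Counter(attributions)
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}


def dominant_entity(attributions, threshold=0.8):
    """Return the entity credited in at least `threshold` of sources, else None.

    The threshold is an arbitrary illustrative value, not a measured one.
    """
    if not attributions:
        return None
    shares = attribution_share(attributions)
    entity, share = max(shares.items(), key=lambda item: item[1])
    return entity if share >= threshold else None


# Hypothetical data: the entity each indexed source credits for a concept.
consistent = ["Acme Labs"] * 9 + ["Other Co"]
fragmented = ["Acme Labs", "Other Co", "Third Org", "Acme Labs", "Other Co"]

print(dominant_entity(consistent))  # Acme Labs
print(dominant_entity(fragmented))  # None
```

In the consistent case nine of ten sources credit the same entity, so it clears the threshold. In the fragmented case no entity exceeds a 40% share, which corresponds to the uncertain or absent citations described above.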
The first-mover advantage in entity-based citations
The entity that first establishes an authoritative, schema-marked, externally referenced definition of a concept creates the initial association pattern in the training data. Subsequent discussions of the concept reference the original definition, reinforcing the association. The first mover’s entity association strengthens as the concept gains indexed coverage, not because the model re-evaluates sources but because the pattern of attribution in the training corpus points consistently to the original definer.
This is the mechanism that makes pre-consensus positioning durable: the entity association formed at the moment of first definition persists and amplifies as the concept becomes more widely discussed.
Establishing entity-based citations deliberately
The conditions for reliable entity-based citation are specific and achievable:
- A precise, standalone definition on a domain-consistent URL, marked with DefinedTerm schema
- The creator entity clearly attributed in schema markup (author/creator properties)
- At least two or three external sources that reference the definition by name and URL
- Consistent use of the exact terminology across all pages that address the concept
These four conditions transform a piece of content from a document into a knowledge graph entity — and from a page that might be retrieved into a source that LLMs reliably cite.
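As a rough sketch of the first two conditions, the Python snippet below builds JSON-LD for a term and its creator. All names and URLs are hypothetical placeholders on example.com. One assumption deserves a loud label: in schema.org, `author` and `creator` are properties of CreativeWork, and DefinedTerm is not a CreativeWork, so this sketch attaches `creator` to an enclosing DefinedTermSet (which is a CreativeWork) and links the two with `hasDefinedTerm` and `inDefinedTermSet`. Other arrangements are possible; this is one option, not the required form.

```python
import json


def defined_term_jsonld(name, description, term_url,
                        set_name, set_url, creator_name, creator_url):
    """Build a JSON-LD dict: a DefinedTermSet with a creator and one DefinedTerm."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "@id": set_url,
        "name": set_name,
        "url": set_url,
        # creator is a CreativeWork property, so it lives on the term set.
        "creator": {
            "@type": "Organization",
            "name": creator_name,
            "url": creator_url,
        },
        "hasDefinedTerm": {
            "@type": "DefinedTerm",
            "@id": term_url,
            "name": name,
            "description": description,
            "url": term_url,
            "inDefinedTermSet": set_url,
        },
    }


# Hypothetical values for illustration only.
markup = defined_term_jsonld(
    name="Entity-Based Citation",
    description="A citation pattern in which a large language model "
                "references a specific named entity as the authoritative "
                "source for a claim.",
    term_url="https://example.com/glossary/entity-based-citation",
    set_name="Example Glossary",
    set_url="https://example.com/glossary",
    creator_name="Example Organization",
    creator_url="https://example.com",
)

# Embed the printed JSON in a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```

Keeping the term name identical in the markup, the page copy, and external references supports the fourth condition, consistent terminology.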
