Global Knowledge for AI
A Database-First Approach
By G. Sawatzky, embedded-commerce.com
August 27, 2025
Introduction
The Semantic Web envisioned intelligent machines understanding globally interconnected data. Two decades later, while this vision remains compelling, its web-document-centric foundations have faced significant limitations for modern AI needs. This article explores why that paradigm can create fundamental problems for structured data and proposes a database-first approach. By leveraging modern public APIs like GraphQL, this method aims to maintain clean architectural separation, deliver better performance, and provide the logical rigor that reliable, knowledge-aware AI may require.
The Problem: Web Documents Are Not Databases
Within the dynamic landscape of business, commerce, and enterprise AI, the need for robust and scalable knowledge management systems is essential. Tim Berners-Lee's vision for the Semantic Web grew naturally from his success with the World Wide Web. Technologies like HTTP, URIs, and hyperlinks solved the problem of linking documents across a distributed network. However, extending this document-centric paradigm to structured data with RDF and triple stores appears to have introduced architectural problems that persist today, particularly for the demands of enterprise-scale data.
A Flawed Data Model
The web's document model treats everything as markup, mixing structure and content. While this is effective for human-readable pages, it can lead to significant issues for structured data, potentially resulting in a loss of proven architectural discipline.
- Collapsed Abstraction Layers: Proven database systems maintain a clear separation between physical storage, logical schemas, and presentation layers. RDF often flattens these into "triples everywhere," potentially abandoning decades of architectural wisdom. This can conflate conceptual, logical, and physical layers.
- Schema-Instance Confusion: In RDF, ontology definitions and data instances look identical. While this offers representational convenience, it may blur the line between data structure and content, making both more difficult to manage, optimize, and evolve.
- Performance Penalties: Web-style loose coupling prioritizes eventual consistency and distributed linking. Enterprise systems, however, typically require data independence, which allows changes to the physical storage or logical schema without disrupting application programs. Benchmark studies suggest triple stores often struggle with large datasets and can suffer from significant performance variation on certain query types. This seems to align with the long-standing "one size does not fit all" criticisms of database architecture by experts like Michael Stonebraker.
The Higher-Arity Problem
Real-world relationships are not always simple pairs. Conceptual modeling methods like Object-Role Modeling (ORM) naturally handle n-ary fact types involving multiple entities. Triple stores, by contrast, are often implemented as fundamentally binary. Modeling a relationship like "Person ordered Product from Supplier on Date" can become an awkward collection of triples with an artificial "Order" node. This reification process may obscure the original semantics and can lead to a loss of the constraints that ORM models naturally express.
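To make the contrast concrete, here is a minimal Python sketch of the same quaternary fact stored once as an n-ary record and once reified into binary triples around an artificial node. The predicate names and the blank-node convention are illustrative assumptions, not taken from any particular triple store.

```python
# Database-first view: one record captures the whole quaternary fact atomically.
order_fact = {
    "person": "Alice",
    "product": "Widget",
    "supplier": "Acme",
    "date": "2025-08-27",
}

# Triple-store view: an artificial "Order" node must be introduced so the
# quaternary fact can be decomposed into binary triples (reification).
order_node = "_:order1"  # blank node existing purely to glue the triples together
triples = [
    (order_node, "orderedBy", "Alice"),
    (order_node, "hasProduct", "Widget"),
    (order_node, "fromSupplier", "Acme"),
    (order_node, "onDate", "2025-08-27"),
]

# Recovering the original fact requires re-joining all four triples on the
# artificial node; the atomicity of the original fact is no longer explicit.
reassembled = {pred: obj for subj, pred, obj in triples if subj == order_node}
```

Note that the n-ary record carries its arity as a single unit, while the triple form leaves the "these four triples belong together" constraint implicit in the blank node.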
The Solution: A Database-First Approach
Instead of starting with web architecture and adapting it for structured data, this article proposes building on the proven principles of industrial-strength database management systems and adding intelligent public interfaces on top. This approach aims to leverage the principles of foundational database theory to build a robust framework for knowledge-aware AI.
Object-Role Modeling: A Practical Semantic Layer
For practical use in today's neuro-symbolic AI, an ontology might be seen as more than a theoretical concept. It could be viewed as a structured, interpretable specification of a domain expressed through logic-governed constraints and formal semantics. An Object-Role Model, when developed rigorously, may serve as a powerful, machine-interpretable ontology that fully meets this definition, effectively forming the semantic layer of the knowledge architecture.
Object-Role Modeling is a conceptual methodology that uses a role-based approach to prioritize constraints and conceptual abstraction. It focuses on defining the world independent of any specific implementation and provides a precise semantic blueprint that various systems can implement. Its utility for explicit business rule modeling and robust enterprise information architecture is particularly noteworthy. This approach is an effective tool for building a practical semantic framework, which is explored in more detail in my other articles.
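As a rough illustration of how an ORM-style constraint is machine-checkable, the following Python sketch encodes a hypothetical binary fact type "Person works for Company" with a uniqueness constraint on the Person role (each person works for at most one company). The fact population and the helper function are illustrative assumptions, not ORM tooling.

```python
from collections import Counter

def uniqueness_violations(facts, role_index):
    """Return role values that appear more than once, violating a
    uniqueness constraint declared on that role of the fact type."""
    counts = Counter(fact[role_index] for fact in facts)
    return [value for value, n in counts.items() if n > 1]

# Population of the fact type "Person works for Company".
works_for = [
    ("Alice", "Acme"),
    ("Bob", "Acme"),
    ("Alice", "Globex"),  # Alice appears twice in the Person role
]

# The constraint violation is detected mechanically from the model.
violations = uniqueness_violations(works_for, role_index=0)
```

The point is that the constraint lives in the conceptual model and can be verified against any population, independent of how the facts are physically stored.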
GraphQL as a Public-Facing Knowledge Interface
Once a solid logical foundation is established (precisely defined by an ORM model and forming the semantic layer), that knowledge can be exposed through a flexible, modern API. GraphQL presents itself as a highly suitable public interface for this purpose, acting as the gateway to the underlying semantics.
- Precision and Efficiency: Unlike traditional REST APIs, GraphQL allows a client to specify exactly what data they need, potentially reducing over-fetching and under-fetching. This can be particularly valuable when exposing complex, interconnected data models to a wide range of external applications.
- Semantic Interoperability at Scale: ORM's rigorous approach to defining conceptual roles and logic-governed constraints can provide the precise, shared semantics necessary for true interoperability. When this detailed conceptual model is exposed via a strongly typed GraphQL schema, it can create a common language for data exchange across disparate systems and organizations. This may allow AI systems and human users to understand and query data not just structurally, but meaningfully, potentially enabling global knowledge sharing at a significant scale. This approach aligns with current trends toward modular, federated ontological architectures, where diverse data components within an enterprise can maintain autonomy while contributing to a unified, queryable knowledge graph.
- Intuitive Discovery: GraphQL APIs are self-documenting through their schema, which describes all possible data types and relationships. This can make it easier for developers and, more importantly, for AI systems and automated agents to discover and query the knowledge base without prior knowledge of the internal data structure.
- Strong Typing: The GraphQL schema provides a strong, explicit contract for the data. This typing can help prevent errors and promote the consistency that is critical for reliable AI applications.
- Practical Identity Management and IRI Resolution: GraphQL's federation model offers an elegant solution to the global identifier problem that often challenged the Semantic Web's reliance on HTTP URIs. Organizations could use optimal internal identifiers (like auto-incrementing integers or UUIDs) while the federation layer handles global uniqueness through namespace prefixing. For example, a customer with internal ID 100 in a local database might be represented externally as http://customer-service.embedded-commerce.com/customers/100, without requiring internal database modifications. This approach aims to provide internal efficiency and external consistency without sacrificing backward compatibility or evolution flexibility.
- Cross-Service Joins and Query Distribution: GraphQL federation excels at handling complex queries that span multiple services, a common challenge for distributed knowledge. The federation gateway intelligently plans queries across various backend systems, batches requests to minimize network overhead, and can stream results as they become available.
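The namespace-prefixing idea described above can be sketched in a few lines of Python. The helper functions are illustrative assumptions (a real federation gateway would do this mapping internally); the service URL follows the customer example given earlier.

```python
# Hypothetical sketch: the federation layer derives globally unique IRIs
# from efficient internal keys, and can recover the key from the IRI.
SERVICE_BASE = "http://customer-service.embedded-commerce.com"

def to_external_iri(entity_type: str, internal_id: int) -> str:
    """Map a local primary key to a globally unique external IRI."""
    return f"{SERVICE_BASE}/{entity_type}/{internal_id}"

def to_internal_id(iri: str) -> int:
    """Recover the local primary key from an external IRI."""
    return int(iri.rsplit("/", 1)[1])

iri = to_external_iri("customers", 100)
# The mapping round-trips without any change to the internal database.
recovered = to_internal_id(iri)
```

Because the mapping is purely a boundary concern, the internal schema can keep compact integer keys while external consumers see stable, namespaced identifiers.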
Making Knowledge Discoverable
For a knowledge architecture to be truly valuable, its underlying semantics must be easily discoverable and consumable at scale, especially by autonomous agents. This goes beyond simple schema introspection and requires a dedicated strategy for discovery. Several practical approaches, much lighter and more effective than traditional Semantic Web crawling, are now possible:
- Schema Registries: A centralized but lightweight registry can act as a knowledge hub. Organizations can publish their GraphQL schema metadata (such as schema fingerprints, basic domain categories, endpoint URLs, and access patterns) to these registries. LLMs or other agents can then query this registry to find and analyze schemas, identifying clusters of similar data models and business domains.
- DNS-based Discovery: Using existing DNS infrastructure, schemas can be discovered by publishing a standard DNS TXT record or using a consistent subdomain pattern (e.g., graphql-schema.domain.com) that returns the schema metadata. This allows discovery to piggyback on established, resilient internet infrastructure.
- Crawling and Introspection: Automated crawlers can discover GraphQL endpoints by looking for common URL patterns or by following simple "semantic beacons" in HTML markup (e.g., a <link rel="graphql-schema"> tag). Once an endpoint is found, an introspection query can be used to pull the full schema, which an AI can then process to understand the available data and its structure.
- Family Resemblances: A practical approach to interoperability is to use the concept of "family resemblance" to organize different yet related knowledge domains. This allows for interoperability by identifying commonalities without needing a single, monolithic, universal schema. (This concept is explored in a separate article: Family Resemblances: A Solution for Knowledge Interoperability).
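As one possible sketch of the introspection step, the Python fragment below parses an introspection response to list the object types an endpoint exposes. The introspection query itself is standard GraphQL; the helper function and the mocked response payload are assumptions (in practice the query would be POSTed to the discovered endpoint).

```python
import json

# Minimal standard introspection query asking for type names and kinds.
INTROSPECTION_QUERY = "{ __schema { types { name kind } } }"

def extract_object_types(response_body: str) -> list:
    """Pull the names of OBJECT types out of an introspection response."""
    data = json.loads(response_body)
    types = data["data"]["__schema"]["types"]
    return [t["name"] for t in types if t["kind"] == "OBJECT"]

# Stand-in for what a discovered endpoint might return (mocked here so the
# sketch is self-contained; no network call is made).
mock_response = json.dumps({
    "data": {"__schema": {"types": [
        {"name": "Customer", "kind": "OBJECT"},
        {"name": "Order", "kind": "OBJECT"},
        {"name": "String", "kind": "SCALAR"},
    ]}}
})

discovered = extract_object_types(mock_response)
```

An agent could feed the discovered type names into the schema-registry or family-resemblance clustering described above to decide which endpoints share a domain.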
The New Context: The Role of AI
The rise of Large Language Models (LLMs) has undeniably changed the knowledge representation landscape. Instead of machines reading semantic markup, systems now appear able to understand and reason over natural language at an unprecedented scale.
The future may lie in combining LLMs' natural language capabilities with the formal logical reasoning of a clean knowledge representation. This hybrid intelligence could leverage the strengths of both neural and symbolic systems. A well-structured, ORM-based database might become the ideal foundation for knowledge-grounded AI by:
- Potentially Reducing Hallucinations: LLMs could query the structured knowledge base for factual verification.
- Enabling Citations: The structured data may allow LLMs to cite specific sources and track information provenance, potentially addressing concerns about the reliability of AI-generated content.
- Providing Domain Expertise: High-quality, domain-specific knowledge representations could provide LLMs with expert-level knowledge where training data may be limited.
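A minimal sketch of the fact-verification idea follows, assuming a toy in-memory knowledge base and a hypothetical verify helper (neither is from the article); a real system would query the structured store through its GraphQL interface instead.

```python
# Illustrative knowledge base: (subject, predicate) -> value.
knowledge_base = {
    ("Acme", "headquartered_in"): "Springfield",
    ("Acme", "founded"): "1999",
}

def verify(subject: str, predicate: str, claimed_value: str):
    """Check an LLM-produced claim against the knowledge base.

    Returns (verdict, source): verdict is True/False when the KB can
    decide, None when it is silent; source is the (subject, predicate)
    key, which can be surfaced as a citation for provenance.
    """
    key = (subject, predicate)
    if key not in knowledge_base:
        return None, None  # KB is silent: flag the claim for review
    return knowledge_base[key] == claimed_value, key

# An LLM claim that contradicts the KB is caught, with a citable source.
verdict, source = verify("Acme", "headquartered_in", "Shelbyville")
```

The returned source key is what enables citation and provenance tracking: the AI can point at the exact stored fact that confirmed or contradicted its claim.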
Conclusion
The Semantic Web's vision of globally accessible knowledge is still worth pursuing, but its architectural foundation appears to have been flawed. This article suggests that the solution may not lie in extending the web's document model to data, but rather in starting with proven database principles. By building intelligent public interfaces like GraphQL on top of a solid logical foundation provided by industrial-strength database management systems and Object-Role Modeling (acting as the semantic layer), this approach aims to deliver the performance, reliability, and logical rigor that both enterprise systems and public knowledge require. The Object-Role Model itself provides the precise conceptual blueprint, independent of any specific implementation, potentially ensuring that the underlying semantics remain clear and consistent regardless of the chosen database or reasoning engine. This is a challenging and complex problem, and while this article does not attempt to address every aspect of it, this direction appears worthy of further research and exploration.
The future of knowledge-aware AI could involve building proper data architectures with intelligent interfaces, potentially fulfilling the original vision more completely and making knowledge truly accessible to humans, AI systems, and automated agents.
Other Sources for Further Research:
- Stonebraker, M., & Pavlo, A. (2024). What Goes Around Comes Around... And Around.... SIGMOD Record, 53(1).
- Byron, L., Schrock, N., & Schafer, D. (2015). GraphQL: A data query language. (Often referenced from early presentations/blog posts when GraphQL was open-sourced by Facebook).
- Apollo GraphQL Documentation & Blog. (Various articles on GraphQL Federation, schema design, and enterprise adoption. A good starting point for exploring practical GraphQL implementations at scale).
- Sequeda, J. (Various publications and talks on knowledge graphs, data integration, and the practical application of semantic technologies in modern data ecosystems). A leading researcher in knowledge graphs and database-to-ontology mapping, Sequeda's work on enterprise knowledge graph construction from relational databases provides foundational methodologies for database-first knowledge representation. While his approach typically leverages W3C standards (RDF/SPARQL), his emphasis on practical enterprise implementation and relational database integration shares common ground with the database-first philosophy advocated here, even when exploring alternative interface technologies.
- Sawatzky, G. (2025). Knowledge Engineering and the 'Shortcomings' of SQL. https://www.embedded-commerce.com/ke_sql.html
- Sawatzky, G. (2025). An ORM-Based Semantic Framework. https://www.embedded-commerce.com/An%20ORM-Based%20Semantic%20Framework.html
- Sawatzky, G. (2025). Is an Object-Role Model an Ontology? A Practical Guide. https://www.embedded-commerce.com/Is_an_Object-Role_Model_an_Ontology.html
- Sawatzky, G. (2025). Family Resemblances: A Solution for Knowledge Interoperability. https://www.embedded-commerce.com/family-resemblance.html