The Relational Model Still Matters

By G. Sawatzky, embedded-commerce.com

August 15, 2025


This article explores the future of the relational model and its relevance in light of advances in artificial intelligence. A key motivation is the growing trend of querying data through natural language and LLMs, which raises a fundamental question: will SQL, or the relational model itself, remain relevant? While the relational model is almost universally associated with SQL and SQL DBMSs, this discussion takes a "back to fundamentals" approach, focusing on the original underpinnings of the model, the decisions Codd made and why, and the rigorous work of Chris Date and Hugh Darwen. It also looks at alternatives to SQL-based RDBMSs grounded in Date and Darwen's work, along with novel approaches that the capabilities of AI language models might make possible.

Codd's design wasn't just about data storage; it aimed to build a mathematically sound information system. His choices of first-order logic (FOL) and set theory were forward-thinking. FOL gave queries a strong mathematical base, ensuring results could be algorithmically determined and query optimization automated. This decidability (see the note below on exactly which fragment is decidable) was key for reliable systems. Set theory, in turn, offered a clean, abstract way to represent data. Thinking of relations as sets of tuples removed the messy, navigational problems of older database models. It also ensured mathematical closure: operations on relations always yield relations, which is crucial for building complex, compositional queries.
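
To make the closure property concrete, here is a minimal sketch in Python (the schemas and data are invented for illustration, not drawn from Codd's papers) that models a relation as a header plus a frozenset of tuples. Every operator returns that same shape, so operations compose freely:

```python
# A minimal sketch of relations as sets of tuples; schemas and data are
# invented. Each relation is (header, frozenset of rows), and every
# operator returns that same shape: the closure property.

def select(rel, pred):
    header, rows = rel
    return header, frozenset(r for r in rows if pred(dict(zip(header, r))))

def project(rel, attrs):
    header, rows = rel
    idx = [header.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(r[i] for i in idx) for r in rows)

def natural_join(r, s):
    (h1, rows1), (h2, rows2) = r, s
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in h1]
    out = set()
    for t1 in rows1:
        d1 = dict(zip(h1, t1))
        for t2 in rows2:
            d2 = dict(zip(h2, t2))
            if all(d1[a] == d2[a] for a in common):
                out.add(t1 + tuple(d2[a] for a in extra))
    return tuple(h1) + tuple(extra), frozenset(out)

employees = (("name", "dept"), frozenset({("ada", "eng"), ("lin", "ops")}))
depts = (("dept", "site"), frozenset({("eng", "berlin"), ("ops", "austin")}))

# Because every result is again a relation, the operators nest arbitrarily.
result = project(select(natural_join(employees, depts),
                        lambda t: t["site"] == "berlin"), ["name"])
print(result)  # (('name',), frozenset({('ada',)}))
```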

Logical-physical independence was perhaps Codd's most revolutionary idea. It separated what you query (the logical view) from how the system stores and retrieves it. This decoupling meant database implementers could innovate with storage and indexing without changing application code. It also enabled automated query reasoning, turning the optimizer into a kind of theorem prover that searches for the most efficient execution plan. The model's inherent simplicity and uniform abstraction ('everything is a relation') scaled easily, and programmers could work declaratively with data, avoiding complex internal structures. These principles created systems with formal bridges between human intent and mechanical execution, a separation that becomes critical as AI systems grow in complexity.


Codd's Pragmatic Choice: First-Order Logic over Second-Order Logic

When designing the relational model, Codd carefully considered various logical systems, including second-order logic. He ultimately chose First-Order Logic (FOL) for practical and computational reasons. Second-order logic is fundamentally undecidable: there is no general algorithm for determining the validity of a second-order formula, and therefore no way to guarantee that reasoning over a second-order query will terminate. For building automated database systems, this was a critical barrier. Query processors need to reliably terminate and allow for algorithmic reasoning about query equivalence and optimization.

Beyond decidability, second-order logic presents significant computational complexity. Even its decidable fragments often demand exponential time or space, making them impractical for the technology of the 1970s, and often even for today's large-scale systems. Codd needed a model that could be efficiently implemented. FOL mapped naturally to computable operations: relations align with finite sets, quantifiers with loops, and predicates with computable functions. This practical choice, prioritizing computational tractability over maximum expressive power, enabled the creation of working, high-performance database systems. This pragmatic approach remains crucial for AI, where generated queries must execute reliably and efficiently within real-world constraints.
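
As a small illustration of that mapping (the supplier/part data below is invented), both FOL quantifiers reduce to plain loops over finite sets in Python:

```python
# Quantifiers over finite relations become loops; data is illustrative.
supplies = {("acme", "bolt"), ("acme", "nut"), ("bolt_co", "bolt")}
suppliers = {"acme", "bolt_co"}
parts = {"bolt", "nut"}

# Existential quantifier (EXISTS p: ...) as an any() loop.
supplies_something = {s for s in suppliers
                      if any((s, p) in supplies for p in parts)}

# Universal quantifier (FOR ALL p: ...) as an all() loop; in relational
# algebra this is the division operator.
supplies_everything = {s for s in suppliers
                       if all((s, p) in supplies for p in parts)}

print(supplies_everything)  # {'acme'}
```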

Note: Full first-order logic (FOL) is only semi-decidable: you can confirm a theorem if it is valid, but you cannot always demonstrate invalidity in finite time. The relational model instead builds on relational algebra and (safe) relational calculus, which are decidable: every query terminates on finite relations.


Why the Relational Model Thrives in the AI Era

The relational model's foundational strengths offer distinct advantages, especially as AI systems become more sophisticated and data-intensive.

One key benefit is declarative reasoning. The ability to express what relationships should exist, without specifying how to find them, provides a clear separation between intent and computation that is invaluable for AI. Unlike procedural or navigational models, the relational model allows AI systems to reason about data relationships mathematically: proving query equivalences, inferring constraints, and optimizing access patterns automatically.

The compositional closure property, where every operation produces a result of the same type (a relation), is another critical advantage. Automated reasoning systems can build complex queries from simple, well-defined parts and transform them algebraically. Many NoSQL systems, in contrast, lack this mathematical closure; their operations often yield varying data types or need external coordination, complicating automated processing.

Furthermore, the model’s robust formal optimization theory fits AI perfectly. Query optimization in relational databases isn't just heuristic; it's rooted in mathematics. Cost-based optimizers can formally analyze equivalent expressions and select the most efficient execution strategy. This automation is far harder to achieve in models without a strong algebraic foundation. For AI, which increasingly generates queries programmatically for complex analytical workflows, this inherent optimization capability is a huge advantage.
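
A toy sketch of that idea, assuming made-up table sizes and a deliberately crude cost model (each operator costs the rows it reads): because selection commutes with join when the predicate touches only one input, both plans below compute the same relation, and a cost-based optimizer simply picks the cheaper one.

```python
# Two provably equivalent plans for "join orders to customers, keep 1%":
# filter after the join, or push the selection below it. Cardinalities
# and the cost model are assumptions for illustration only.

ORDERS, CUSTOMERS, SELECTIVITY = 1_000_000, 10_000, 0.01

def cost_filter_after_join():
    join_output = ORDERS                       # assume a key join: one customer per order
    return (ORDERS + CUSTOMERS) + join_output  # join reads both inputs, filter reads join output

def cost_filter_before_join():
    filtered = ORDERS * SELECTIVITY
    return ORDERS + (filtered + CUSTOMERS)     # filter reads orders, join reads both inputs

print(cost_filter_after_join())    # 2010000
print(cost_filter_before_join())   # 1020000.0 (roughly half the work)
```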

Crucially, the relational model serves as a mathematically rigorous intermediate representation for AI systems. When a Large Language Model (LLM) translates a natural language query into a database query, the target language's semantic clarity is paramount. Translating natural language into a First-Order Logic (FOL)/set-based language (like relational algebra) is fundamentally more reliable than translating to a less-structured navigational graph query language. The relational model's well-defined compositional semantics, its support for equivalence testing, and its logical completeness make it an ideal "semantic compilation target" that both humans can understand and machines can optimize. This greatly improves the reliability of AI-generated queries, as correctness can be formally verified.
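
To sketch what such a "semantic compilation target" could look like in practice (the node types and compiler below are hypothetical, not any shipping system), an LLM would emit a small algebraic IR, and a deterministic compiler would lower it to SQL:

```python
# A hypothetical, minimal relational-algebra IR with SQL lowering; this
# is an illustration of the idea, not a production translator.
from dataclasses import dataclass

@dataclass
class Table:
    name: str

@dataclass
class Select:
    child: object
    predicate: str   # assumed already validated, e.g. "dept = 'eng'"

@dataclass
class Project:
    child: object
    attrs: tuple

def to_sql(node, depth=0):
    if isinstance(node, Table):
        return f"SELECT * FROM {node.name}"
    if isinstance(node, Select):
        sub = to_sql(node.child, depth + 1)
        return f"SELECT * FROM ({sub}) AS t{depth} WHERE {node.predicate}"
    if isinstance(node, Project):
        sub = to_sql(node.child, depth + 1)
        return f"SELECT DISTINCT {', '.join(node.attrs)} FROM ({sub}) AS t{depth}"
    raise TypeError(node)

# "Names of everyone in engineering" as algebra, then as SQL.
ir = Project(Select(Table("employees"), "dept = 'eng'"), ("name",))
print(to_sql(ir))
```

Because the IR is a closed algebra, equivalence checks and rewrites can happen on the tree before any SQL is generated, which is exactly where formal verification of AI-generated queries would live.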


Refining the Model: The Third Manifesto

While Codd set the stage, Chris Date and Hugh Darwen's "Third Manifesto" offers crucial refinements. It directly addresses perceived deviations in SQL from Codd's original, purist vision, significantly strengthening the relational model’s relevance for automated systems.

A core contribution is their insistence on a proper type system. They argue types aren't mere implementation details; they're logical constructs essential for formal reasoning. Their distinction between scalar and nonscalar types, with proper type inheritance, builds a more robust mathematical foundation. This is critical for avoiding the semantic ambiguities that plague SQL, especially its problematic handling of `NULL` values and three-valued logic. For AI systems relying on precise, consistent data semantics, eliminating these ambiguities is a must.
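
To see the ambiguity concretely, here is a small Python sketch of SQL's three-valued comparison semantics, with None standing in for NULL (the modeling is mine, but the behavior mirrors the SQL standard):

```python
# SQL-style three-valued equality, sketched by hand; None plays NULL.
UNKNOWN = "unknown"

def sql_eq(a, b):
    if a is None or b is None:
        return UNKNOWN   # NULL = anything is UNKNOWN, even NULL = NULL
    return a == b

x = None
print(sql_eq(x, x))      # 'unknown', not True: x = x fails to hold

# A WHERE clause keeps a row only when its predicate is True, so both
# "WHERE x = x" and "WHERE NOT (x = x)" reject this row; the law of the
# excluded middle quietly disappears, which is exactly the kind of
# semantic ambiguity Date and Darwen object to.
```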

Date and Darwen also emphasize true relational closure and orthogonality. They advocate for a model where every operator strictly produces a relation, cutting out SQL's numerous special cases and inconsistent return types. This orthogonality is vital for AI systems that need to compose operations predictably. You simply can't build reliable inference engines on an inconsistent foundation, and their approach restores the logical coherence that SQL often compromises for practical reasons.

Beyond that, the Manifesto extends logical-physical independence to data modification, treating assignment as a logical operation. This creates a cleaner base for reasoning about state changes, increasingly important as AI systems need to understand and interact with dynamic data, temporal relationships, and causal effects. Their push for languages like Tutorial D, which truly reflect relational principles, shows how far SQL has drifted and highlights the need for a mathematically sound language interface for automated systems. By stopping implementation details from "leaking" into the logical model, the Third Manifesto provides the rigorous, consistent framework AI systems need for formal optimization and compositional reasoning.
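
A tiny sketch of that view of updates (relvar names and tuples are invented): every modification is an assignment of a whole new relation value to a relation variable, rather than an in-place mutation of rows.

```python
# Updates as relational assignment; data is illustrative.
db = {"S": frozenset({("s1", "acme"), ("s2", "bolt_co")})}

# "INSERT" as assignment: S := S UNION {new tuple}.
db["S"] = db["S"] | {("s3", "nutworks")}

# "DELETE" as assignment: S := S MINUS {tuples matching a predicate}.
db["S"] = frozenset(t for t in db["S"] if t[0] != "s1")

print(sorted(db["S"]))  # [('s2', 'bolt_co'), ('s3', 'nutworks')]
```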

These extensions, including Relation-Valued Attributes (RVAs), are also consistent with FOL. Date and Darwen argue that RVAs operate within an enriched type system where relation types are first-class citizens. This means the logical operations remain first-order; they simply quantify over a more complex domain that includes relations as atomic values. The approach gains expressiveness while preserving the critical decidability and computational tractability of the underlying logical framework; although the consistency of RVAs with FOL has been challenged over the years, those objections have been adequately addressed.
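
A small sketch of the point (course and roster data are invented): the rosters below are relation-valued attribute values, yet the query still quantifies only over tuples of the outer relation, staying first-order.

```python
# Each course tuple carries a roster that is itself a relation
# (a frozenset of tuples); data is illustrative.
courses = {
    ("db101", frozenset({("ada",), ("lin",)})),
    ("ml201", frozenset({("ada",)})),
}

# Quantification ranges over course tuples in the ordinary first-order
# way; each roster is just a value drawn from a richer domain.
big_courses = {name for (name, roster) in courses if len(roster) >= 2}
print(big_courses)  # {'db101'}
```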


Overcoming Limitations: The "Bridge" Approach

Despite the relational model's theoretical elegance, SQL, its most popular form, has practical limits, especially regarding composability. Queries can be verbose and hard to nest, impeding automated reasoning. However, as Michael Stonebraker points out, successful database innovations often re-enter the SQL ecosystem. This shows SQL's strong market pull, but it also raises a key question for AI: Will AI be stuck with SQL's limitations, or will it finally force a fundamental shift?

A promising architectural pattern addresses SQL's weaknesses while keeping the relational model's benefits: the "bridge approach." This involves creating higher-level abstractions that compile down to SQL. This strategy leverages decades of query optimization while offering a more principled logical interface.

Logica, a declarative logic programming language from Google, is a prime example. Part of the Datalog family, Logica extends classical logic programming with features like aggregation and compiles its queries to SQL. Logica provides Datalog's compositional advantages, letting developers define complex queries through logical rules that combine naturally, unlike SQL's more rigid structure. By compiling to SQL, Logica leverages the mature, performant SQL engines already available, avoiding the huge effort of building a new database engine. Its synergy with modern engines like DuckDB, which addresses practical concerns such as embedded deployment and columnar performance, makes this a powerful 'best of both worlds' solution.
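
To convey the flavor of rule-to-SQL compilation without reproducing Logica itself (the compiler below is a hand-rolled toy, and the parent table with columns x and y is an assumption), consider how a classic rule like Grandparent(a, c) :- Parent(a, b), Parent(b, c) becomes a self-join:

```python
# A toy rule-to-SQL compiler; not Logica's syntax or implementation.
# Assumes each body atom reads a two-column table with columns x and y.

def compile_rule(head_vars, body):
    froms, where, binding = [], [], {}
    for i, (table, vars_) in enumerate(body):
        alias = f"{table}_{i}"
        froms.append(f"{table} AS {alias}")
        for col, var in zip(("x", "y"), vars_):
            ref = f"{alias}.{col}"
            if var in binding:
                where.append(f"{ref} = {binding[var]}")  # shared variable => join condition
            else:
                binding[var] = ref
    cols = ", ".join(binding[v] for v in head_vars)
    sql = f"SELECT {cols} FROM {', '.join(froms)}"
    return sql + (f" WHERE {' AND '.join(where)}" if where else "")

# Grandparent(a, c) :- Parent(a, b), Parent(b, c).
print(compile_rule(("a", "c"),
                   [("parent", ("a", "b")), ("parent", ("b", "c"))]))
# SELECT parent_0.x, parent_1.y FROM parent AS parent_0, parent AS parent_1
#   WHERE parent_1.x = parent_0.y
```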

This "bridge" pattern suggests a viable evolution for relational systems in the AI era. Instead of abandoning the relational foundation, developers can build mathematically sound languages that provide superior compositional capabilities and cleaner semantic interfaces, all while using the robust, optimized SQL execution infrastructure. The value here lies not in a wholesale adoption of specific languages like Logica, Tutorial D, Malloy, or PRQL as complete solutions, but in borrowing their underlying ideas and principles to enhance relational systems. For example, Logica, by design, isn't built for general external integration and, in its current form, might not be a direct fit for the solution we seek. Furthermore, some of its language elements, in this author's opinion, still fall short; yet, its core concept remains highly relevant. This is especially powerful for AI, which needs sophisticated, programmatic analytical pipelines without sacrificing performance or reliability.


The Relational Model: A Semantic Compass for AI's Future

The relational model will stay highly relevant in an AI-driven future. Its ability to provide a mathematically rigorous intermediate representation underpins this. When AI models, especially Large Language Models (LLMs), handle data, they often translate natural language into queries. The relational model's clarity and formal properties, rooted in First-Order Logic and Set Theory, make it an ideal target for such translations. This stands in sharp contrast to systems like graph databases, where query languages often expose navigational details, making verification and optimization much harder.

Consider the challenge of training AI systems to generate queries for new or specialized functions, even those not yet developed. The solution lies in exploiting the relational model's mathematical basis. Instead of training LLMs on limited examples of new query languages, they can learn to translate natural language into universal mathematical expressions (e.g., set operations, logical quantifiers). These mathematical concepts are abundant in the academic literature and formal specifications LLMs have already seen. This "mathematical common language" then becomes a precise intermediate representation that can be algorithmically translated to relational constructs while preserving logical consistency and supporting optimization.

This approach works for any pure function: a deterministic mapping from input to output without side effects integrates seamlessly into the relational framework. For instance, an AI image recognition function, `ImageClassifier(image, criteria) -> label`, can be thought of relationally as the relation pairing each (image, criteria) input with its output label. This means AI-native operations can combine with traditional relational queries, benefiting from the model's compositional properties and optimization. The mathematical bridge effectively acts as a universal adapter, letting the relational model serve as a unifying semantic layer for hybrid AI-database systems.
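
A brief sketch of that composition (the classifier below is a trivial stand-in for a real model, and all names are invented): a pure function folds into a pipeline as an 'extend' step that adds a computed attribute, and the result is still an ordinary relation.

```python
# A pure stand-in for the hypothetical ImageClassifier(image, criteria).
def image_classifier(features, criteria):
    return "cat" if "whiskers" in features else "other"

images = {("img1", "whiskers,fur"), ("img2", "wings")}

# EXTEND images ADD (image_classifier(features, 'animal') AS label):
# applying the function yields a new attribute, and the result is a
# relation, so ordinary relational operators compose with it.
labeled = {(name, feats, image_classifier(feats, "animal"))
           for (name, feats) in images}

cats = {name for (name, _, label) in labeled if label == "cat"}
print(cats)  # {'img1'}
```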

Ultimately, Codd's vision, refined by Date and Darwen, offers a semantic compass for navigating data complexity in the AI age. While SQL will undoubtedly remain a dominant force for the foreseeable future, its underlying principles could inspire new, potentially more precise and compositionally powerful, implementations of the relational model given the demands and opportunities presented by LLMs. Its relevance isn't just enduring; it's growing because of AI's demands.

