August 14, 2025
In the world of software development, those of us deeply involved in database management systems (DBMS) built on SQL might sometimes feel a disconnect from the discussions within the Knowledge Engineering (KE) community. We often hear claims that SQL falls short in formal knowledge representation due to perceived limitations in its expressiveness and computational reasoning. For those of us who have effectively built complex systems with SQL, these claims can be puzzling; our immediate reaction might be, "that's not true, and here's how it can be done in SQL." This article aims to bridge that gap by exploring this nuanced perspective.
We will provide precise definitions, examine where SQL's practical implementations might indeed present challenges for robust knowledge representation, and, importantly, distinguish these points from the inherent strengths of the formal Relational Model. Finally, we will propose how an ORM (Object Relational Modeling) based toolkit can integrate various technologies to create a comprehensive knowledge engineering solution. These suggestions are general guidelines; their practical implementation should always be based on specific context, use cases, and volumes of data. Furthermore, these considerations do not exhaustively cover the use of LNNs and LTNs to handle uncertainty, which is another key aspect of modern knowledge representation.
As a single author, this article presents my individual interpretation and synthesis of knowledge engineering principles and their relationship to data management technologies. It does not claim to represent the views of the KE community. My perspective may be seen as coming from an individual operating at the intersection of these fields.
To understand the KE perspective, we first need a clear definition of what "knowledge representation" (KR) truly means in this context.
Knowledge Representation (KR) is the formalization and organization of information, along with the explicit rules, relationships, and constraints, in a way that enables computational reasoning, inference, and automated problem-solving. It goes beyond simply storing facts to encoding the logic by which new, implicit facts can be derived or verified by machines.
Reasoning is the process of manipulating a knowledge representation to derive conclusions that are implicitly contained within the represented knowledge. This includes deduction (guaranteed conclusions), induction (generalizing from specific examples), and abduction (inferring causes from effects).
The core distinction from mere "information" is that KR builds structures not just for retrieval, but for active, logical manipulation. For example, if we have information that "Mary is 30 years old" and "All persons over 18 are adults," a knowledge representation would encode the rule that allows us to infer "Mary is an adult" without explicitly stating it. This is a key capability KR aims to achieve.
Knowledge engineers work to build systems that capture and operationalize domain expertise.
Knowledge engineers primarily interact with a declarative model. They define:
Person
, Book
, Department
).Person works for Department
, Author wrote Book
).Person has birthDate
).Book
must have at least one Author
," "a Person
can only work for
one Department
at a time").X
is a kind of
Mammal
and Mammal
has
Hair
, then X
has
Hair
").This interaction is typically through high-level, human-readable (and machine-interpretable) languages that focus on the semantics and logic of the domain, rather than low-level storage or procedural steps.
Beyond defining the domain's facts and rules, Knowledge Engineers also require the ability to query and reason with the model itself. This means asking questions not just about the data instances (e.g., "Who wrote '1984'?"), but about the schema, concepts, and relationships defined in the knowledge representation (e.g., "What types of relationships can exist between a Person
and a Department
?", "What are the superclasses of Mammal
?").
The job of a knowledge engineer is done when the formalized knowledge can be effectively used by computational systems for its intended purpose. The value proposition includes:
Knowledge engineers often hand off their deliverables to AI developers, data scientists, software architects, and domain experts who consume the formalized knowledge in various forms, such as:
Just like any engineering discipline, KR requires rigorous testing. Pass/fail criteria include:
This involves unit tests for individual rules, integration tests for complex inferences, and validation against real-world data and expert judgments.
For knowledge engineers, certain characteristics of SQL's typical implementation can be seen as less ideal for formal KR, leading to the perception of "shortcomings."
NULL
values and its three-valued logic (true, false, unknown) can introduce complexities that deviate from standard two-valued logic, which is the foundation for most formal knowledge representation and reasoning.A significant point of contention can arise from the conflation of different modeling layers in SQL.
Traditional SQL database design often forces a blending of these layers. For instance, choosing a VARCHAR
size (physical) can inadvertently impact how a "name" (conceptual) is perceived or constrained. Knowledge engineers prefer tools where the conceptual model can be defined purely, with clear, loosely coupled mappings to logical and physical implementations, allowing changes at one layer without spiraling through all others.
As discussed, this is a key philosophical and practical distinction:
While SQL can simulate OWA behavior with constructs like LEFT JOIN
or NOT EXISTS
, its native behavior operates under CWA. KR languages often default to OWA or provide explicit mechanisms to define which parts of the knowledge base operate under which assumption, offering greater flexibility.
This is a crucial distinction. Many criticisms aimed at "SQL" are not necessarily criticisms of the Relational Model (RM) itself, as conceived by E.F. Codd[1] and rigorously extended by C.J. Date and Hugh Darwen[2].
The formal Relational Model is a powerful logical model. It defines relations, attributes, and, critically, a rich set of integrity constraints (e.g., entity integrity, referential integrity, and general assertions/business rules) that are declarative and logically pure. In its formal sense, the RM provides a robust framework for representing domain knowledge through these declarative constraints. A "relvar" (relation variable) is not just a table; it's a predicate stating a truth about the world.
So, while SQL (the language and its common implementations) often has perceived weaknesses, the underlying Relational Model is a much more capable and logically consistent system for knowledge representation than its practical SQL manifestations sometimes allow.
Date and Darwen, while champions of the RM, are famously critical of SQL's deviations from the formal model. Many of their criticisms align with the concerns of knowledge engineers regarding logical precision and expressiveness for KR:
NULL
introduces a problematic three-valued logic (true, false, unknown) that complicates logical reasoning and deviates from the two-valued logic foundational to most KR. Date argues NULL
s are a fundamental flaw.CHECK
constraints, its support for complex, cross-table "assertions" (general integrity constraints that evaluate to true or false across the entire database state) has historically been weak or absent. The formal RM allows for arbitrary assertions, which are crucial for declarative business rules.These points highlight that even within the database community's most rigorous thinkers, there's recognition that SQL is an imperfect vehicle for purely logical, declarative knowledge representation.
Given the limitations of SQL's practical implementations, let's explore some alternatives and specific SQL remedies that may be more aligned with KR tasks. Some of these approaches aim to mitigate the perceived shortcomings of SQL DBMSs, often by more closely adhering to the formal Relational Model or by integrating its strengths with other paradigms.
For the ORM (Object-Role Modeling)-based toolkit, the optimal path forward is not to replace, but to orchestrate these different strengths. The toolkit's ability to map ORM models to various targets is a significant advantage.
The perceived "shortcomings" of SQL for knowledge representation are often limitations in its practical implementation and common usage, rather than inherent flaws in the underlying Relational Model itself. The RM, in its formal sense, offers powerful declarative capabilities for knowledge representation.
Our path forward involves a more nuanced approach: recognizing SQL's strengths for data management and integrity, while simultaneously embracing other declarative languages and reasoning paradigms for their superior expressiveness in defining and inferring complex logical knowledge. By orchestrating robust RDBMS (like SQL, especially with tools like DuckDB), symbolic AI (like Datalog), and neuro-symbolic AI (like LNNs/LTNs) through a powerful conceptual modeling approach like ORM, we can build truly comprehensive and intelligent systems that meet the rigorous demands of modern knowledge engineering.
This checklist proposes a set of criteria for evaluating how well a language, tool, or technology aligns with the needs and principles of a Knowledge Engineer, prioritizing features that facilitate robust knowledge representation and reasoning. The aim is to remain agnostic to specific technologies while highlighting desirable characteristics.
Criteria | Description | Score (1-5, 5=High) | Weight | Weighted Score |
---|---|---|---|---|
1. Declarative Expression of Domain Logic | How easily can domain rules, relationships, and facts be expressed as direct logical statements (what is true), rather than procedural instructions (how to compute)? | (e.g., 20%) | ||
2. Native Support for Automated Inference | Does the system natively support automated derivation of new, implicit facts from existing knowledge and defined rules? (e.g., transitive closure, property inheritance). | (e.g., 15%) | ||
3. Semantic Richness & Expressiveness | Can the system capture complex semantic nuances like classification hierarchies, part-whole relationships, or N-ary associations directly and intuitively? | (e.g., 15%) | ||
4. Handling of Incomplete Knowledge (OWA) | Does the system provide robust mechanisms or a default assumption for dealing with missing or unknown information, distinguishing it from explicit falsehood? | (e.g., 10%) | ||
5. Schema Flexibility & Evolution | How easy is it to evolve the conceptual model and its associated schema to accommodate new knowledge or changing domain understanding without significant overhead or disruption? | (e.g., 10%) | ||
6. Clear Conceptual-Logical Mapping | Does the tool/language clearly separate the abstract domain meaning from its logical and physical implementation details, allowing for independent evolution of layers? | (e.g., 10%) | ||
7. Explainability of Reasoning | Can the system provide transparent explanations or traces for how it arrived at a particular conclusion or derived a new fact? | (e.g., 5%) | ||
8. Robust Constraint/Rule Definition | Does it natively support the definition and enforcement of complex, domain-specific integrity constraints that go beyond basic data type checks? | (e.g., 5%) | ||
9. Support for Complex Type Systems | Can it represent rich type hierarchies, abstract data types, and other sophisticated conceptual structures directly within the model? | (e.g., 5%) | ||
10. Composability of Knowledge Modules | How easily can distinct, smaller modules of represented knowledge and rules be combined or reused to build larger, more complex systems? | (e.g., 5%) | ||
11. Facilities for Knowledge Delivery | Does the technology offer built-in or easily integrable facilities for exposing and operationalizing the represented knowledge via common interfaces (e.g., Reasoning services, Knowledge Graph APIs, Rules engines, Intelligent Agent interfaces)? | (e.g., 5%) | ||
12. Querying and Reasoning Over the Model Itself | Does the system provide facilities to directly query and reason about the defined schema, concepts, and relationships of the knowledge model, independent of data instances? | (e.g., 5%) | ||
Total Score | (Sum of Weighted Scores) | 100% |
The perceived lack of composability in SQL, especially when building complex logical inferences from reusable components, is a point of contention that aligns the views of some database theorists with knowledge engineering principles. While SQL offers constructs like views and Common Table Expressions (CTEs) for organizing queries, their limitations often necessitate workarounds that compromise true logical composability.
C.J. Date and Hugh Darwen, in their extensive critiques of SQL (often found throughout their works, particularly in books like SQL and Relational Theory and The Relational Database Dictionary), frequently highlight how SQL deviates from the logical purity and mathematical elegance of the formal Relational Model. This deviation directly impacts composability:
SELECT
statement does not always produce a truly relational result (e.g., allowing duplicate rows, NULL
semantics), which can break this closure and make it harder to reliably compose complex queries.NULL
values and its multi-valued logic, mean that combining queries or views doesn't always yield logically predictable results. This makes it difficult to reason about the composition of queries, unlike the straightforward logical composition in relational calculus or Datalog.Language Integrated Query (LINQ), co-created by Anders Hejlsberg and Erik Meijer, and Reactive Extensions (Rx), also co-created by Erik Meijer, implicitly address SQL's composability challenges by offering alternative approaches to data manipulation. While these technologies originate from programming language integration, the underlying criticisms of SQL's composability apply more broadly to how logical rules are defined and combined:
A few examples of Logica have been provided below so we can see that it is a composable logic programming language which happens to compile to Sql. It is a part of the Datalog family.
For more information on Logica, see https://github.com/EvgSkv/logica
#For these examples, duckdb is used
@Engine("duckdb");
# Assume you have a predicate identifying graduate students,
# perhaps like this (using a name or ID):
GraduateStudent(person_id: 123);
GraduateStudent(person_id: 456);
# Rule: If someone is a GraduateStudent, then they are a Student.
Student(person_id:) :- GraduateStudent(person_id:);
# You might have other rules defining Students too:
Undergraduate(person_id: 789);
Student(person_id:) :- Undergraduate(person_id:);
# Rule: If someone is a GraduateStudent, then they have library access.
HasLibraryAccess(person_id:) :- GraduateStudent(person_id:);
logica students.l run Student
+-----------+
| person_id |
+-----------+
| 123 |
| 456 |
| 789 |
+-----------+
logica students.l run HasLibraryAccess
+-----------+
| person_id |
+-----------+
| 123 |
| 456 |
+-----------+
@Engine("duckdb");
IsSweet("orange");
IsSweet("apple");
IsNotSweet("lemon");
IsNotSweet("lime");
IsFruit("orange");
IsFruit("kiwi");
IsFruit("lemon");
IsFruit("apple");
IsFruit("lime");
UnknownSweetness(fruit:) :- IsFruit(fruit), ~IsSweet(fruit), ~IsNotSweet(fruit);
logica simulate_owa.l run UnknownSweetness
+-------+
| fruit |
+-------+
| kiwi |
+-------+
@Engine("duckdb");
# Return all rules where all conditions are met.
Rule("rule1");
Rule("rule2");
RuleCondition("rule1", "condition1");
RuleCondition("rule1", "condition2");
RuleCondition("rule1", "condition3");
RuleCondition("rule2", "condition2");
RuleCondition("rule2", "condition4");
RuleCondition("rule2", "condition5");
State("condition2");
State("condition4");
State("condition5");
State("condition6");
MatchingCondition(rule, condition) :- RuleCondition(rule, condition), State(condition);
UnmetCondition(rule, condition) :- RuleCondition(rule, condition), ~State(condition);
NoUnmetConditions(rule) :- Rule(rule), ~UnmetCondition(rule, condition);
# Different flavors of the same Rules (aka Queries)
RuleSatisfied(rule) :- Rule(rule), NoUnmetConditions(rule);
RuleSatisfied2(rule) :- Rule(rule), ~(RuleCondition(rule, condition), ~State(condition));
RuleSatisfied3(rule) :- Rule(rule), ~UnmetCondition(rule);
# With a twist
ExtraCondition(condition) :- State(condition), ~RuleCondition(rule, condition);
RuleAndAllConditionsSatisfied(rule) :- Rule(rule), ~ExtraCondition(condition), ~(RuleCondition(rule, condition), ~State(condition));
RuleAndAllConditionsSatisfied2(rule) :- Rule(rule), ~(State(condition), ~RuleCondition(rule, condition)), ~(RuleCondition(rule, condition), ~State(condition));
logica quantification.l run RuleSatisfied
+-------+
| col0 |
+-------+
| rule2 |
+-------+
logica quantification.l run RuleAndAllConditionsSatisfied
+------+
| col0 |
+------+
+------+
@Engine("duckdb");
#This is an example of a Taxonomy with Properties
SubclassOf("foods", "entity");
SubclassOf("animals", "entity");
SubclassOf("vehicles", "entity");
SubclassOf("fruits", "foods");
SubclassOf("vegetables", "foods");
SubclassOf("mammals", "animals");
SubclassOf("fish", "animals");
SubclassOf("land_vehicles", "vehicles");
SubclassOf("air_vehicles", "vehicles");
SubclassOf("citrus", "fruits");
SubclassOf("leafy_greens", "vegetables");
SubclassOf("canines", "mammals");
SubclassOf("saltwater_fish", "fish");
SubclassOf("cars", "land_vehicles");
HasProperty("foods", "has_origin");
HasProperty("foods", "is_perishable");
HasProperty("animals", "has_habitat");
HasProperty("animals", "is_domesticated");
HasProperty("vehicles", "has_manufacturer");
HasProperty("vehicles", "has_power_source");
HasProperty("fruits", "is_sweet");
HasProperty("fruits", "has_seed");
HasProperty("vegetables", "is_savory");
HasProperty("vegetables", "grows_underground");
HasProperty("mammals", "has_fur");
HasProperty("fish", "has_gills");
HasProperty("land_vehicles", "has_wheels");
HasProperty("air_vehicles", "can_fly");
HasProperty("citrus", "is_zesty");
HasProperty("leafy_greens", "is_nutritious");
HasProperty("canines", "has_tail");
HasProperty("cars", "is_fuel_efficient");
HasProperty("cars", "has_seats");
HasProperty("cars", "has_wheels");
HasProperty("vehicles", "has_capacity");
HasProperty("vehicles", "is_electric");
# Direct subclass relationship
TransitiveSubclass(x,y) :- SubclassOf(x, y);
# Indirect subclass relationship
TransitiveSubclass(x, y) :- TransitiveSubclass(x, z), TransitiveSubclass(z, y);
#This returns 1 because citrus is a fruit which is a food.
IsCitrusFood(is_food? += 1) distinct:- TransitiveSubclass("citrus", "foods");
#This returns X = mammals, X = fish, and X = canines.
Animals(animal:) :- TransitiveSubclass(animal, "animals");
#This returns all properties of vehicles
HasAllProperties(class, property) :- HasProperty(class, property), class == "vehicles";
# Find all properties inherited from a superclass
HasAllProperties(class, property) :-
SubclassOf(class, superclass),
HasAllProperties(superclass, property);
logica taxonomy.l run TransitiveSubclass
+----------------+---------------+
| col0 | col1 |
+----------------+---------------+
| foods | entity |
| animals | entity |
| vehicles | entity |
| fruits | foods |
| vegetables | foods |
| mammals | animals |
| fish | animals |
| land_vehicles | vehicles |
| air_vehicles | vehicles |
| citrus | fruits |
| leafy_greens | vegetables |
| canines | mammals |
| saltwater_fish | fish |
| cars | land_vehicles |
| fruits | entity |
| vegetables | entity |
| mammals | entity |
| fish | entity |
| land_vehicles | entity |
| air_vehicles | entity |
| citrus | foods |
| leafy_greens | foods |
| canines | animals |
| saltwater_fish | animals |
| cars | vehicles |
+----------------+---------------+
logica taxonomy.l run IsCitrusFood
+---------+
| is_food |
+---------+
| 1 |
+---------+
logica taxonomy.l run Animals
+----------------+
| animal |
+----------------+
| mammals |
| fish |
| canines |
| saltwater_fish |
+----------------+
logica taxonomy.l run HasAllProperties
+---------------+------------------+
| col0 | col1 |
+---------------+------------------+
| vehicles | has_manufacturer |
| vehicles | has_power_source |
| vehicles | has_capacity |
| vehicles | is_electric |
| air_vehicles | has_manufacturer |
| air_vehicles | has_power_source |
| air_vehicles | has_capacity |
| air_vehicles | is_electric |
| cars | has_manufacturer |
| cars | has_power_source |
| cars | has_capacity |
| cars | is_fuel_efficient|
| land_vehicles | has_manufacturer |
| land_vehicles | has_power_source |
| land_vehicles | has_capacity |
| land_vehicles | is_electric |
| land_vehicles | has_wheels |
+---------------+------------------+