When Prolog Beats the LLM That Created It

By G. Sawatzky, embedded-commerce.com

August 1, 2025


Abstract

A simple logic puzzle shows how a Logic Programming Machine (LPM) built with Prolog can outperform the Large Language Model (LLM) that created it. The LPM solved the puzzle faster and more accurately than the LLM itself. This reveals a practical approach: use LLMs to generate specialized reasoning tools rather than having them solve complex logic problems directly. The results point to a hybrid strategy where LLMs create LPMs for structured reasoning tasks.

1. Introduction

Large Language Models (LLMs) like GPT-4 are impressive at understanding and generating natural language. However, for tasks requiring structured reasoning, their performance can be inconsistent. Logic Programming Machines (LPMs), such as Prolog-based systems, provide deterministic and transparent solutions for structured reasoning tasks.

This informal experiment shows a practical approach: an LLM was used to generate a Prolog program to solve a logic puzzle. The Prolog program was faster and more reliable than the LLM that created it. This demonstrates how LLMs can build specialized logic solvers that outperform the LLMs themselves for specific reasoning tasks.

2. The Tea Party Puzzle

The puzzle involves three guests: Ada, Babbage, and Turing. Each guest brought a different kind of tea (Earl Grey, Darjeeling, or Chamomile), wore a different colored hat (red, blue, or green), and sat in a different chair (left, middle, right).

The clues were:

  1. The person in the middle seat wore the blue hat.
  2. Ada sat to the left of the person who brought Darjeeling.
  3. The guest in the red hat brought Chamomile.
  4. Babbage did not sit on the left.
  5. Turing wore the green hat.
  6. The person who brought Earl Grey sat immediately next to the person who wore the red hat.

The goal is to determine the full configuration of who sat where, what tea they brought, and which hat they wore.

3. Prolog Implementation (LPM)

The Prolog code for the puzzle was generated by the same LLM (ChatGPT 4.5) that was also tasked with solving the puzzle directly. After minor syntax adjustments, the generated Prolog code functioned as an effective LPM. When executed, this LPM quickly evaluated the constraints and produced the correct solution deterministically. This process serves as an example of a practical hybrid neuro-symbolic workflow: using a generative model to create a symbolic one.

4. LLM Interpretation and Performance

The LLM required multiple steps and external assistance to interpret the clues and resolve inconsistencies. Although it eventually reached the correct solution, the iterative approach was slower and less reliable than the Prolog implementation.

5. Comparative Analysis

Criterion              LPM (Prolog)            LLM (ChatGPT 4.5)
---------------------  ----------------------  ------------------
Speed                  Immediate               Slow and iterative
Correctness guarantee  Deterministic           Probabilistic
Interpretability       High                    Moderate
Scalability (symbols)  High for small domains  Moderate
Generalization         Low                     High

6. Implications and Recommendations

The key insight: don't use LLMs to solve complex logic problems directly. Instead, use them to generate specialized tools that can solve those problems better. This creates a practical workflow where LLMs act as programmers for logic solvers.

  1. LLM parses and structures natural language clues
  2. LPM evaluates the constraints
  3. LLM interprets and presents results
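The three-step workflow above can be sketched in code. The following is a minimal, self-contained stand-in: step 2 is implemented here as a brute-force permutation search in plain Python rather than a call out to a Prolog engine, and the LLM-handled steps 1 and 3 are reduced to hard-coded clues and a print loop. All function names are illustrative, not part of any real API.

```python
from itertools import permutations

SEATS = ("left", "middle", "right")
GUESTS = ("ada", "babbage", "turing")
TEAS = ("earl_grey", "darjeeling", "chamomile")
HATS = ("red", "blue", "green")

def satisfies_clues(guest, tea, hat):
    """guest/tea/hat map seat index (0=left, 1=middle, 2=right) to a value."""
    idx = {g: i for i, g in enumerate(guest)}       # guest -> seat index
    tea_at = {t: i for i, t in enumerate(tea)}      # tea   -> seat index
    hat_at = {h: i for i, h in enumerate(hat)}      # hat   -> seat index
    return (
        hat[1] == "blue"                                   # clue 1
        and idx["ada"] < tea_at["darjeeling"]              # clue 2
        and tea[hat_at["red"]] == "chamomile"              # clue 3
        and idx["babbage"] != 0                            # clue 4
        and hat[idx["turing"]] == "green"                  # clue 5
        and abs(tea_at["earl_grey"] - hat_at["red"]) == 1  # clue 6
    )

def solve():
    """Step 2 (LPM stand-in): enumerate every assignment, keep valid ones."""
    return [
        list(zip(SEATS, guest, tea, hat))
        for guest in permutations(GUESTS)
        for tea in permutations(TEAS)
        for hat in permutations(HATS)
        if satisfies_clues(guest, tea, hat)
    ]

solutions = solve()
# Step 3 stand-in: present the result.
for seat, who, tea, hat in solutions[0]:
    print(f"{seat}: {who}, Tea: {tea}, Hat: {hat}")
```

Running it prints the same unique assignment as the Prolog output in Appendix B; the point of the sketch is only the data flow, since a real pipeline would replace the brute-force step with a call to the generated LPM.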

7. Conclusion

The Tea Party Puzzle shows that LLMs can generate logic-based tools that outperform the LLMs themselves. This hybrid approach works: LLMs create specialized logic solvers for specific tasks. Even with a simple puzzle, the performance difference was significant. This strategy becomes more valuable as problems get more complex. While future LLMs may solve logic problems directly, the hybrid approach offers a practical solution today.

Appendix A: ChatGPT's Actual Attempt at the Logic Puzzle

Here's the actual conversation with ChatGPT 4.5 when asked to solve the puzzle directly:

Human: What do you, as an LLM, think the solution is?

ChatGPT: Splendid question! Let us now step into the LLM's silk slippers and reason through the solution...

ChatGPT then proceeded through multiple attempts, making several errors and contradictions along the way. After several failed attempts, it proposed this final solution:

seat(left, turing, chamomile, green)
seat(middle, ada, earl_grey, blue)
seat(right, babbage, darjeeling, red)

However, when shown the correct Prolog output, ChatGPT acknowledged:

"Ah-ha! Yes! That settles it—like the final drop of milk in a perfectly steeped cup of tea."

The correct solution was:

seat(left, ada, chamomile, red)
seat(middle, babbage, earl_grey, blue)
seat(right, turing, darjeeling, green)

Appendix B: Prolog-Based LPM Implementation

The Prolog implementation below is the LLM-generated code, adapted with only minor syntax changes made using Windsurf and an LLM. No logical modifications were needed. The solution was found immediately and with full accuracy, highlighting the strength of declarative logic for this kind of task.

% Prolog code for the Tea Party Puzzle
solve(Solution) :-
    % There are three guests, each with attributes: Position, Person, Tea, Hat
    Solution = [
        seat(left, _, _, _),
        seat(middle, _, _, _),
        seat(right, _, _, _)
    ],

    % Each person is one of Ada, Babbage, Turing
    member(seat(_, ada, _, _), Solution),
    member(seat(_, babbage, _, _), Solution),
    member(seat(_, turing, _, _), Solution),

    % Each tea is one of Earl Grey, Darjeeling, Chamomile
    member(seat(_, _, earl_grey, _), Solution),
    member(seat(_, _, darjeeling, _), Solution),
    member(seat(_, _, chamomile, _), Solution),

    % Each hat is one of red, blue, green
    member(seat(_, _, _, red), Solution),
    member(seat(_, _, _, blue), Solution),
    member(seat(_, _, _, green), Solution),

    % Clue 1: The person in the middle seat wore the blue hat
    member(seat(middle, _, _, blue), Solution),

    % Clue 2: Ada sat to the left of the person who brought Darjeeling
    member(seat(P1, ada, _, _), Solution),
    member(seat(P2, _, darjeeling, _), Solution),
    left_of(P1, P2),

    % Clue 3: The guest in the red hat brought Chamomile
    member(seat(_, _, chamomile, red), Solution),

    % Clue 4: Babbage did not sit on the left
    \+ member(seat(left, babbage, _, _), Solution),

    % Clue 5: Turing wore the green hat
    member(seat(_, turing, _, green), Solution),

    % Clue 6: The person who brought Earl Grey sat immediately next to the person who wore the red hat
    member(seat(P3, _, earl_grey, _), Solution),
    member(seat(P4, _, _, red), Solution),
    next_to(P3, P4).

% Define seating relations.
% left_of/2 is the transitive "somewhere to the left" relation needed by clue 2.
left_of(left, middle).
left_of(middle, right).
left_of(left, right).

% next_to/2 must mean *immediately* adjacent (clue 6), so it is built on the
% immediate neighbour pairs only, not on the transitive left_of/2 (otherwise
% next_to(left, right) would wrongly succeed).
adjacent(left, middle).
adjacent(middle, right).

next_to(P1, P2) :- adjacent(P1, P2).
next_to(P1, P2) :- adjacent(P2, P1).

% --- Print the solution when run as a script ---
:- initialization(main).

main :-
    (   solve(Solution)
    ->  writeln('Solution to the riddle:'),
        print_solution(Solution)
    ;   writeln('No solution found.'),
        halt(1)
    ),
    halt.

print_solution([]).
print_solution([seat(Position, Person, Tea, Hat)|T]) :-
    format('~w: ~w, Tea: ~w, Hat: ~w~n', [Position, Person, Tea, Hat]),
    print_solution(T).
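Assuming the listing above is saved as tea_party.pl (the filename is illustrative), it can be run with SWI-Prolog; the initialization/1 directive causes main to run as soon as the file is loaded:

```shell
swipl -q tea_party.pl
```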

Output:

Solution to the riddle:
left: ada, Tea: chamomile, Hat: red
middle: babbage, Tea: earl_grey, Hat: blue
right: turing, Tea: darjeeling, Hat: green

Keywords: logic programming, Prolog, language models, LPM, constraint satisfaction, neuro-symbolic AI, hybrid reasoning