Your LLM Shouldn't Be Handling If/Else Logic

4 min read
Tomasz Bilaszewski
Author of airules

Here's code that's running in production right now across thousands of applications:

from enum import Enum

from pydantic_ai import Agent


class Team(Enum):
    BILLING = "billing"
    AUTH = "auth"
    SHIPPING = "shipping"
    RETURNS = "returns"
    GENERAL = "general"


agent = Agent(
    "openai:gpt-4o-mini",
    output_type=Team,
    system_prompt="Route this support ticket to the correct team.",
)


async def route(subject: str, body: str) -> Team:
    result = await agent.run(f"Subject: {subject}\n\n{body}")
    return result.output

It's clean. It's flexible. And every time a user types "I want to unsubscribe" or "forgot my password," it burns tokens to answer a question the model already answered yesterday.

gpt-4o-mini costs $0.15 per million input tokens. At 50,000 tickets a day, with an average prompt of 300 tokens, that's 15 million input tokens a day - $2.25 daily, roughly $820 a year, before counting output tokens - on inputs that need zero intelligence. Just pattern matching.
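The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in the text; the function name `input_cost` is made up for illustration, and output tokens are not counted.

```python
# Back-of-the-envelope input-token cost, using the figures quoted in the text.
PRICE_PER_MILLION_INPUT_TOKENS = 0.15  # USD, the gpt-4o-mini input price cited above
TICKETS_PER_DAY = 50_000
AVG_PROMPT_TOKENS = 300


def input_cost(tickets_per_day: int, avg_tokens: int, price_per_million: float):
    """Return (tokens per day, USD per day, USD per year) for input tokens only."""
    tokens_per_day = tickets_per_day * avg_tokens
    daily = tokens_per_day / 1_000_000 * price_per_million
    return tokens_per_day, daily, daily * 365


tokens, daily, yearly = input_cost(
    TICKETS_PER_DAY, AVG_PROMPT_TOKENS, PRICE_PER_MILLION_INPUT_TOKENS
)
print(f"{tokens:,} tokens/day, ${daily:.2f}/day, ${yearly:.2f}/year")
```

Changing the three constants shows how quickly the bill scales with ticket volume or prompt length.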

You might think: fine, I'll add a quick if/elif pre-filter. That works until you have 15 conditions, three engineers who've each added their own special cases, a subtly wrong ordering that shadows a rule nobody noticed, and zero tests for the logic structure itself. The spaghetti grows fast.
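To make the shadowing problem concrete, here is a small, made-up if/elif router (the conditions and team names are illustrative, not taken from any real codebase). The third branch can never run, because the first branch already catches every subject containing "account".

```python
def route_with_if_chain(subject: str) -> str:
    """A hand-rolled pre-filter with a subtle ordering bug."""
    s = subject.lower()
    if "account" in s:
        return "auth"
    elif "refund" in s:
        return "returns"
    elif "account" in s and "charge" in s:
        # Unreachable: any subject containing "account" already returned above.
        return "billing"
    return "general"


# A billing complaint is silently routed to the auth team.
print(route_with_if_chain("Unexpected charge on my account"))  # prints "auth"
```

Nothing in the language flags the dead branch; it only shows up when someone notices billing tickets landing in the wrong queue.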

The smarter architecture

What's missing is a declarative, type-safe pre-filter that handles the predictable 80% of your traffic deterministically - cheap, instantaneous, auditable - and routes only the genuine edge cases to the LLM.

  Incoming ticket
         │
         ▼
┌─────────────────┐    match     ┌──────────────┐
│  TicketRouter   │ ──────────▶  │ Return Team  │  ← 0ms, $0
│    (airules)    │              └──────────────┘
│                 │   no match
│                 │ ──────────▶  ┌──────────────┐
└─────────────────┘              │     LLM      │  ← 800ms, $$$
                                 └──────────────┘

The key insight: most decisions are already known. If you can write down the right answer for a given input pattern, that knowledge belongs in a rule. AI earns its cost on the genuinely unknown tail - the inputs nobody anticipated.
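The shape of that pipeline can be sketched in plain Python, without airules. Everything here is a hypothetical stand-in: the `RULES` list, `match_rules`, `route`, and the `fake_llm` stub exist only to show the control flow of "rules first, LLM only on no match".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Ticket:
    subject: str
    body: str


# Each rule pairs a predicate with the team it routes to (illustrative rules only).
RULES = [
    (lambda t: "billing" in t.subject.lower() or "invoice" in t.body.lower(), "billing"),
    (lambda t: "password" in t.subject.lower() or "login" in t.subject.lower(), "auth"),
]


def match_rules(ticket: Ticket) -> Optional[str]:
    """Return the team of the first matching rule, or None if nothing matches."""
    for predicate, team in RULES:
        if predicate(ticket):
            return team
    return None


def route(ticket: Ticket, llm_fallback: Callable[[Ticket], str]) -> tuple:
    """Try the deterministic rules first; call the LLM only when they all miss."""
    team = match_rules(ticket)
    if team is not None:
        return team, "rule"
    return llm_fallback(ticket), "llm"


def fake_llm(ticket: Ticket) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return "general"


print(route(Ticket("Forgot my password", ""), fake_llm))
print(route(Ticket("Where is my parcel?", "It has not arrived"), fake_llm))
```

Returning the source ("rule" or "llm") alongside the team is a design choice of this sketch; it makes it easy to measure how much traffic still reaches the model.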

Type safety matters

The problem with if/elif chains isn't just readability. It's that ticket["subject"] is untyped - a missing key only surfaces as a KeyError at runtime, a typo in a field name goes undetected until production, and nothing warns you before deploy when you compare a float to a string.

airules brings static typing to decision logic. Facts are typed schemas, predicates are built from field accessors, and the engine is Generic[FactType, ReturnType] - your editor catches bad field references and type mismatches before you deploy.
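To illustrate the idea of typed facts and composable predicates, here is a toy version built only on the standard library. It is not how airules is implemented; `Predicate` and the `contains` helper are hypothetical names. The point is that a type checker such as mypy or pyright can flag a misspelled field like `t.subjcet` inside the accessor, because the accessor is typed against `Ticket`.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

F = TypeVar("F")


@dataclass(frozen=True)
class Predicate(Generic[F]):
    """A boolean test over a fact of type F that can be combined with `|`."""

    test: Callable[[F], bool]

    def __or__(self, other: "Predicate[F]") -> "Predicate[F]":
        return Predicate(lambda fact: self.test(fact) or other.test(fact))

    def __call__(self, fact: F) -> bool:
        return self.test(fact)


@dataclass(frozen=True)
class Ticket:
    subject: str
    body: str


def contains(get: Callable[[Ticket], str], needle: str) -> Predicate[Ticket]:
    """Case-insensitive substring test on one typed field of a Ticket."""
    return Predicate(lambda t: needle.lower() in get(t).lower())


# Writing `t.subjcet` here would be reported by a static type checker.
is_billing = contains(lambda t: t.subject, "billing") | contains(lambda t: t.body, "invoice")

print(is_billing(Ticket("Billing question", "")))
print(is_billing(Ticket("Hi", "there")))
```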

The @Default fallback

The LLM call moves into the @Default method - the one that fires only when the engine found no match:

import json

from pydantic_ai import Agent

# Ticket, Team, KnowledgeEngine, Rule and Default are assumed to be
# defined or imported elsewhere; their imports are not shown here.


class TicketRouter(KnowledgeEngine[Ticket, Team]):

    @Rule(
        Ticket.subject.contains("billing", case_insensitive=True)
        | Ticket.body.contains("invoice", case_insensitive=True)
    )
    def billing(self, ticket: Ticket) -> Team:
        return Team.BILLING

    @Rule(
        Ticket.subject.contains("password", case_insensitive=True)
        | Ticket.subject.contains("login", case_insensitive=True)
    )
    def auth(self, ticket: Ticket) -> Team:
        return Team.AUTH

    @Default
    async def llm_fallback(self, ticket: Ticket) -> Team:
        # Serialize the existing rules so the model can see what is already handled.
        rules_schema = json.dumps(type(self).describe(), indent=2)
        agent = Agent(
            "openai:gpt-4o-mini",
            output_type=Team,
            system_prompt=(
                "You only receive tickets that matched none of the deterministic rules.\n\n"
                f"Existing rules:\n{rules_schema}"
            ),
        )
        result = await agent.run(f"Subject: {ticket.subject}\n\n{ticket.body}")
        return result.output

Look at type(self).describe(). The system prompt includes the complete serialized schema of every rule that already exists in the engine, so the model can see precisely which patterns have already been handled. That makes it much less likely to contradict an existing rule, because the rules are right there in its context.

Without this, your LLM and your rules engine are two separate systems with no shared understanding. With it, they're one coherent pipeline.
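As a rough illustration of what the model ends up seeing, the sketch below builds the same kind of system prompt from a hand-written list of rule descriptions. The structure of `RULE_DESCRIPTIONS` is invented for this example; the actual output of describe() is not shown in this post and may look different.

```python
import json

# Hypothetical rule descriptions; the real structure returned by describe()
# is an assumption here.
RULE_DESCRIPTIONS = [
    {"rule": "billing", "returns": "billing",
     "when": "subject contains 'billing' OR body contains 'invoice'"},
    {"rule": "auth", "returns": "auth",
     "when": "subject contains 'password' OR subject contains 'login'"},
]


def build_system_prompt(rule_descriptions: list) -> str:
    """Embed the serialized rules in the fallback prompt."""
    rules_schema = json.dumps(rule_descriptions, indent=2)
    return (
        "You only receive tickets that matched none of the deterministic rules.\n\n"
        f"Existing rules:\n{rules_schema}"
    )


prompt = build_system_prompt(RULE_DESCRIPTIONS)
print(prompt)
```

Because the schema is generated from the rules themselves, the prompt stays in sync as rules are added, with no second copy to maintain by hand.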

The iterative loop

Once this is running, your @Default hits are a roadmap:

  1. Observe - track which inputs are hitting the LLM
  2. Analyze - cluster the default facts by field values and patterns
  3. Add rules - write a new @Rule for each pattern you can enumerate
  4. Repeat - default rate drops, token spend drops, latency drops
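The observe and analyze steps can start very small. The sketch below assumes you already log the subject line of every ticket that reached @Default; the `default_hits` sample, the stopword list, and `top_terms` are all made up for illustration. It counts how many default-path tickets mention each term, which is often enough to spot the next rule worth writing.

```python
import re
from collections import Counter

# Words too common to be useful as rule candidates (illustrative list).
STOPWORDS = {"where", "is", "my", "for", "the", "a"}


def top_terms(default_subjects: list, n: int = 3) -> list:
    """Count, per term, how many default-path tickets mention it."""
    counts = Counter()
    for subject in default_subjects:
        # Use a set so a term counts once per ticket, not once per occurrence.
        words = set(re.findall(r"[a-z]+", subject.lower())) - STOPWORDS
        counts.update(words)
    return counts.most_common(n)


# Hypothetical subjects logged from the @Default path.
default_hits = [
    "Where is my parcel",
    "Parcel arrived damaged",
    "Tracking number for my parcel",
    "Change delivery address",
    "Delivery is late",
]

print(top_terms(default_hits, 2))
```

In this sample, "parcel" and "delivery" dominate, which suggests a shipping rule as the next addition. Real traffic would likely call for something sturdier than word counts, but the loop stays the same.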

The LLM trains the rules engine. Every correct LLM classification is signal for a new rule. Over time, the engine handles more and more of the traffic - the LLM handles less and less, and stays reserved for inputs that genuinely benefit from its reasoning.

Optimizing LLM cost and latency isn't just about choosing smaller models or caching responses. It's about being intentional with what you send to the LLM in the first place. The pattern is simple: rules for what you know, LLM for what you don't.