Your LLM Shouldn't Be Handling If/Else Logic
Here's code that's running in production right now across thousands of applications:
from enum import Enum

from pydantic_ai import Agent


class Team(Enum):
    BILLING = "billing"
    AUTH = "auth"
    SHIPPING = "shipping"
    RETURNS = "returns"
    GENERAL = "general"


agent = Agent(
    "openai:gpt-4o-mini",
    output_type=Team,
    system_prompt="Route this support ticket to the correct team.",
)


async def route(subject: str, body: str) -> Team:
    # Every ticket, however trivial, goes through a model call.
    result = await agent.run(f"Subject: {subject}\n\n{body}")
    return result.output
It's clean. It's flexible. And every time a user types "I want to unsubscribe" or "forgot my password," it burns tokens to answer a question the model already answered yesterday.
gpt-4o-mini costs $0.15 per million input tokens. At 50,000 tickets a day,
with an average prompt of 300 tokens, that's 15 million input tokens a day - $2.25
daily, roughly $820 a year - on inputs that need zero intelligence. Just pattern
matching.
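The arithmetic is easy to reproduce. The sketch below uses only the figures quoted above; the constant and function names are made up for illustration, and it counts input tokens only, so it ignores output tokens and any per-request overhead.

```python
# Back-of-the-envelope input-token cost, using the figures quoted in the text.
PRICE_PER_MILLION_INPUT_TOKENS = 0.15  # USD, gpt-4o-mini input price from the text
TICKETS_PER_DAY = 50_000
AVG_PROMPT_TOKENS = 300


def daily_input_cost(tickets: int, tokens_per_ticket: int, price_per_million: float) -> float:
    """Return the daily spend in USD on input tokens alone."""
    total_tokens = tickets * tokens_per_ticket
    return total_tokens / 1_000_000 * price_per_million


daily = daily_input_cost(TICKETS_PER_DAY, AVG_PROMPT_TOKENS, PRICE_PER_MILLION_INPUT_TOKENS)
yearly = daily * 365
print(f"daily: ${daily:.2f}, yearly: ${yearly:.2f}")
```

Swapping in your own ticket volume and prompt length shows quickly whether a pre-filter is worth building for cost alone, or mainly for latency and auditability.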
You might think: fine, I'll add a quick if/elif pre-filter. That works until you have 15 conditions, three engineers who've each added their own special cases, a subtly wrong ordering that shadows a rule nobody noticed, and zero tests for the logic structure itself. The spaghetti grows fast.
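To make the shadowing problem concrete, here is a small invented example (the keywords and team names are hypothetical, not taken from a real codebase). The third branch can never run, because any subject containing "account" has already returned from the first branch.

```python
def route_with_if_chain(subject: str) -> str:
    """Hand-rolled pre-filter with a shadowed rule."""
    s = subject.lower()
    if "account" in s:
        return "auth"
    elif "refund" in s:
        return "returns"
    elif "account" in s and "charge" in s:
        # Unreachable: every subject containing "account" was caught above.
        return "billing"
    return "general"


# A billing complaint is silently routed to the auth team.
print(route_with_if_chain("Unexpected charge on my account"))
```

Nothing in the language flags this. The function runs, returns a plausible value, and the bug only shows up as misrouted tickets.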
The smarter architecture
What's missing is a declarative, type-safe pre-filter that handles the predictable 80% of your traffic deterministically - cheap, instantaneous, auditable - and routes only the genuine edge cases to the LLM.
Incoming ticket
        │
        ▼
┌─────────────────┐   match     ┌──────────────┐
│  TicketRouter   │ ──────────▶ │ Return Team  │  ← 0ms, $0
│    (airules)    │             └──────────────┘
│                 │  no match
│                 │ ──────────▶ ┌──────────────┐
└─────────────────┘             │     LLM      │  ← 800ms, $$$
                                └──────────────┘
The key insight: most decisions are already known. If you can write down the right answer for a given input pattern, that knowledge belongs in a rule. AI earns its cost on the genuinely unknown tail - the inputs nobody anticipated.
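Stripped of any library, the "rules first, model second" flow can be sketched in plain Python. This is not the airules API: the rule table, `match_rule`, `route`, and the `fake_llm` stand-in are all hypothetical names chosen for illustration, and the regular expressions are only examples.

```python
import re
from enum import Enum
from typing import Callable, Optional


class Team(Enum):
    BILLING = "billing"
    AUTH = "auth"
    GENERAL = "general"


# Hypothetical rule table: the first pattern that matches decides the team.
RULES = [
    (re.compile(r"\b(invoice|billing)\b", re.IGNORECASE), Team.BILLING),
    (re.compile(r"\b(password|login)\b", re.IGNORECASE), Team.AUTH),
]


def match_rule(text: str) -> Optional[Team]:
    """Return the team of the first matching rule, or None if nothing matches."""
    for pattern, team in RULES:
        if pattern.search(text):
            return team
    return None


def route(text: str, llm_fallback: Callable[[str], Team]) -> Team:
    """Answer deterministically when a rule matches; otherwise ask the model."""
    team = match_rule(text)
    if team is not None:
        return team
    return llm_fallback(text)


def fake_llm(text: str) -> Team:
    # Stand-in for a real model call, so the sketch runs offline.
    return Team.GENERAL


print(route("Forgot my password", fake_llm))
print(route("Where is my parcel?", fake_llm))
```

The fallback is passed in as a plain callable, which keeps the deterministic path testable without network access or API keys.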
Type safety matters
The problem with if/elif chains isn't just readability. It's that
ticket["subject"] is untyped - a missing key only surfaces as a KeyError
at runtime, a typo in a field name goes undetected until production, and nothing
stops you from comparing a float to a string.
airules brings static typing to decision logic. Facts are typed schemas,
predicates are built from field accessors, and the engine is
Generic[FactType, ReturnType] - your editor catches bad field references
and type mismatches before you deploy.
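To show what a generic, typed engine buys you, here is a toy version built only on the standard library. It is a sketch of the idea, not the airules implementation: `MiniEngine`, `add_rule`, and `evaluate` are hypothetical names, and the `Ticket` dataclass is an assumed shape with just the two fields used in this article.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

FactT = TypeVar("FactT")
ReturnT = TypeVar("ReturnT")


@dataclass(frozen=True)
class Ticket:
    subject: str
    body: str


class MiniEngine(Generic[FactT, ReturnT]):
    """Toy engine: the first matching predicate wins."""

    def __init__(self) -> None:
        self._rules: list[tuple[Callable[[FactT], bool], ReturnT]] = []

    def add_rule(self, predicate: Callable[[FactT], bool], result: ReturnT) -> None:
        self._rules.append((predicate, result))

    def evaluate(self, fact: FactT) -> Optional[ReturnT]:
        for predicate, result in self._rules:
            if predicate(fact):
                return result
        return None


engine: MiniEngine[Ticket, str] = MiniEngine()
engine.add_rule(lambda t: "password" in t.subject.lower(), "auth")

# A static checker such as mypy or pyright should reject the line below,
# because Ticket has no attribute `subjct`:
# engine.add_rule(lambda t: "password" in t.subjct.lower(), "auth")

print(engine.evaluate(Ticket(subject="Password reset", body="")))
```

Because the engine is parameterized by the fact type, the checker knows that `t` in each predicate is a `Ticket`, which is what turns a misspelled field from a production incident into an editor warning.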
The @Default fallback
The LLM call moves into the @Default method - the one that fires only when
the engine found no match:
import json

from pydantic_ai import Agent

# KnowledgeEngine, Rule, Default and the Ticket fact schema are assumed to be
# imported from airules or defined earlier; Team is the enum shown above.


class TicketRouter(KnowledgeEngine[Ticket, Team]):
    @Rule(
        Ticket.subject.contains("billing", case_insensitive=True)
        | Ticket.body.contains("invoice", case_insensitive=True)
    )
    def billing(self, ticket: Ticket) -> Team:
        return Team.BILLING

    @Rule(
        Ticket.subject.contains("password", case_insensitive=True)
        | Ticket.subject.contains("login", case_insensitive=True)
    )
    def auth(self, ticket: Ticket) -> Team:
        return Team.AUTH

    @Default
    async def llm_fallback(self, ticket: Ticket) -> Team:
        # Serialize the existing rules so the model can see what is already handled.
        rules_schema = json.dumps(type(self).describe(), indent=2)
        agent = Agent(
            "openai:gpt-4o-mini",
            output_type=Team,
            system_prompt=(
                "You only receive tickets that matched none of the deterministic rules.\n\n"
                f"Existing rules:\n{rules_schema}"
            ),
        )
        result = await agent.run(f"Subject: {ticket.subject}\n\n{ticket.body}")
        return result.output
Look at type(self).describe(). When the LLM receives the system prompt, it
also receives the complete serialized schema of every rule that already exists in
the engine. The model can see which patterns have already been handled, so it is
far less likely to contradict an existing rule.
Without this, your LLM and your rules engine are two separate systems with no shared understanding. With it, they're one coherent pipeline.
The iterative loop
Once this is running, your @Default hits are a roadmap:
- Observe - track which inputs are hitting the LLM
- Analyze - cluster the default facts by field values and patterns
- Add rules - write a new @Rule for each pattern you can enumerate
- Repeat - default rate drops, token spend drops, latency drops
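The Observe and Analyze steps can start very simply. The sketch below assumes you log each ticket subject that reached the fallback together with the team the LLM chose; the `default_hits` data and the `candidate_keywords` helper are invented for illustration, and real clustering would likely need more than word counts.

```python
import re
from collections import Counter

# Hypothetical log of subjects that fell through to the LLM, with its answer.
default_hits = [
    ("Where is my parcel?", "shipping"),
    ("Parcel not delivered", "shipping"),
    ("Parcel tracking link broken", "shipping"),
    ("Can I return these shoes?", "returns"),
]


def candidate_keywords(hits, min_count=2):
    """Count (word, team) pairs, once per ticket, and keep the recurring ones."""
    counts = Counter()
    for subject, team in hits:
        for word in set(re.findall(r"[a-z]+", subject.lower())):
            counts[(word, team)] += 1
    return [(word, team, n) for (word, team), n in counts.most_common() if n >= min_count]


# Each surviving pair is a candidate for a new deterministic rule.
print(candidate_keywords(default_hits))
```

Here "parcel" recurs across the shipping tickets, which suggests a new rule for the shipping team. A human should still review each candidate before it becomes a rule, since frequent words such as "my" can recur without carrying any routing signal.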
The LLM trains the rules engine. Every correct LLM classification is signal for a new rule. Over time, the engine handles more and more of the traffic - the LLM handles less and less, and stays reserved for inputs that genuinely benefit from its reasoning.
Optimizing LLM cost and latency isn't just about choosing smaller models or caching responses. It's about being intentional with what you send to the LLM in the first place. The pattern is simple: rules for what you know, LLM for what you don't.
