Rules vs ML vs LLMs: Picking the Right Tool (with Code)

One problem, three solutions

In the previous concept you learned the three tool families: rules, classical ML, and LLMs. Now let's make it concrete. Imagine you run a SaaS product and your support inbox gets 500 tickets a day. You want to auto-classify each ticket into one of four categories: billing, bug, feature request, or other. All three tools can do this. The difference is how much data you need, what it costs per ticket, and how reliable it is.

Approach 1: Rules

Look for keywords. If the ticket mentions "invoice," "payment," "charge," or "refund," classify it as billing. If it mentions "error," "crash," or "broken," classify it as bug. And so on.

Rules are deterministic, fast (microseconds), free, and you can test them with unit tests. The downside: as the vocabulary of your users grows, you end up adding more and more keywords, and the rules start to conflict. "I was charged for a feature that's broken" matches both billing and bug. The maintenance cost of keyword rules grows faster than the vocabulary.
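Here is a minimal sketch of the keyword approach. The billing and bug keywords come from the description above; the feature-request keywords and the function names are assumptions made up for illustration.

```python
import re

# Billing and bug keywords are from the text; the feature-request
# keywords are assumed for this sketch.
KEYWORDS = {
    "billing": ["invoice", "payment", "charge", "refund"],
    "bug": ["error", "crash", "broken"],
    "feature request": ["feature request", "would be nice", "please add"],
}


def matching_categories(ticket: str) -> list[str]:
    """Return every category whose keywords appear in the ticket."""
    text = ticket.lower()
    matches = []
    for category, words in KEYWORDS.items():
        # \b + keyword + \w* matches at a word start, so "charge" also hits "charged".
        if any(re.search(r"\b" + re.escape(word) + r"\w*", text) for word in words):
            matches.append(category)
    return matches


def classify_with_rules(ticket: str) -> str:
    """Pick the first matching category, or "other" when nothing matches."""
    matches = matching_categories(ticket)
    return matches[0] if matches else "other"


# The conflicting ticket from the text matches two categories at once.
print(matching_categories("I was charged for a feature that's broken"))
print(classify_with_rules("How do I export my data?"))
```

Because `classify_with_rules` simply takes the first match, the conflict is resolved by the order of the dictionary, which is exactly the kind of hidden decision that makes keyword rules hard to maintain.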

Approach 2: Classical ML

Train a model on 5,000 labeled tickets. The model learns patterns beyond keywords: it picks up on phrasing, sentence structure, and word combinations. Inference is fast (single-digit milliseconds) and cheap (CPU only). Once deployed, it can handle an ambiguous ticket like "500 error on billing page" correctly, provided the training data included similar ambiguous tickets.

The downside: you need those 5,000 labeled examples. Someone has to label them. The model needs retraining when your product changes (new categories, new features). And if a new kind of ticket shows up that is nothing like the training data, the model guesses poorly.

Classical ML is the right choice when you have clean labeled data, strict latency requirements, and high volume (millions of classifications per day).
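To show the train-then-predict shape of the classical ML approach, here is a toy sketch. It assumes a multinomial naive Bayes model written in plain Python and ten made-up tickets; the text does not say which model to use, and a real system would train on thousands of labeled tickets, most likely with a library such as scikit-learn.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no signal
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training tickets, standing in for the 5,000 labeled examples.
TRAINING = [
    ("I was charged twice on my invoice", "billing"),
    ("Please refund my last payment", "billing"),
    ("The billing page shows the wrong charge", "billing"),
    ("The app crashes with an error on login", "bug"),
    ("Getting a 500 error when I open the dashboard", "bug"),
    ("Export button is broken and throws an error", "bug"),
    ("Please add dark mode to the dashboard", "feature request"),
    ("It would be great to have CSV export support", "feature request"),
    ("How do I change my email address", "other"),
    ("Where can I find the documentation", "other"),
]

texts, labels = zip(*TRAINING)
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("Please refund the charge on my invoice"))
print(model.predict("The app throws an error and crashes"))
```

Notice that there are no keyword lists anywhere: the model weighs every word it saw during training, which is why its quality depends so heavily on how much labeled data you give it.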

Approach 3: LLM

Write a prompt. Send the ticket text and the list of categories to an LLM. It classifies on the first try without any labeled data, handles ambiguity naturally, and you can change the categories by editing a string.

The downside: each call costs money (fractions of a cent per ticket, which at 500 tickets/day comes to roughly $0.50-5/day depending on the model), takes 500ms to 2 seconds, and occasionally returns something unexpected. You need to validate the output and handle retries.
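The prompt, validation, and retry steps can be sketched like this. The `call_llm` parameter is a hypothetical stand-in for whatever client your LLM provider gives you (any function that takes a prompt string and returns the reply text); the prompt wording, the retry count, and the fallback to "other" are also assumptions.

```python
from typing import Callable

CATEGORIES = ["billing", "bug", "feature request", "other"]


def build_prompt(ticket: str) -> str:
    # Changing the categories really is just editing this list and string.
    options = ", ".join(CATEGORIES)
    return (
        "Classify the support ticket into exactly one of these categories: "
        f"{options}.\n"
        "Reply with the category name only.\n\n"
        f"Ticket: {ticket}"
    )


def classify_with_llm(
    ticket: str,
    call_llm: Callable[[str], str],  # hypothetical: prompt in, reply text out
    max_attempts: int = 3,
) -> str:
    prompt = build_prompt(ticket)
    for _ in range(max_attempts):
        reply = call_llm(prompt).strip().lower().rstrip(".")
        if reply in CATEGORIES:  # validate: accept only a known category
            return reply
    return "other"  # assumed fallback when every attempt is invalid


def fake_llm_factory(replies):
    """Build a fake LLM that returns canned replies, for demonstration only."""
    replies = iter(replies)
    return lambda prompt: next(replies)


# First reply is chatty and fails validation; the retry succeeds.
flaky = fake_llm_factory(["I think this is about money.", "Billing."])
print(classify_with_llm("I was charged twice", flaky))
```

Passing the model call in as a function keeps the validation logic testable without spending money on real API calls.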

Side by side: cost, speed, and data requirements

For the same ticket-classification task:

Rules: $0 per ticket. Microseconds latency. Zero data needed. Breaks on ambiguous input. Best for: crisp, well-defined requirements with few edge cases.

Classical ML: ~$0 per ticket (CPU inference). 1-5ms latency. Needs thousands of labeled examples. Handles ambiguity well. Best for: high-volume classification with stable categories and available training data.

LLM: ~$0.001-0.01 per ticket. 500ms-2s latency. Zero labeled data needed. Handles ambiguity and new categories out of the box. Best for: language-heavy tasks, cold-start situations, and cases where requirements change often.
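To turn the per-ticket figures into daily numbers, here is a quick back-of-the-envelope calculation. It uses the LLM figures quoted above and the 500 tickets/day from the scenario; the sequential-processing assumption for total time is ours.

```python
TICKETS_PER_DAY = 500

# Per-ticket LLM figures from the comparison above, as (low, high) ranges.
LLM_COST_PER_TICKET = (0.001, 0.01)   # dollars
LLM_LATENCY_SECONDS = (0.5, 2.0)      # seconds per call


def daily_range(per_ticket, tickets=TICKETS_PER_DAY):
    """Scale a (low, high) per-ticket range up to a full day of tickets."""
    low, high = per_ticket
    return (low * tickets, high * tickets)


cost_low, cost_high = daily_range(LLM_COST_PER_TICKET)
time_low, time_high = daily_range(LLM_LATENCY_SECONDS)

print(f"LLM cost per day: ${cost_low:.2f} to ${cost_high:.2f}")
# Assumes tickets are processed one at a time, with no parallel calls.
print(f"LLM time per day: {time_low / 60:.1f} to {time_high / 60:.1f} minutes")
```

At this volume the LLM bill is small. The picture changes at the millions-per-day scale mentioned earlier, where the same per-ticket price multiplies into thousands of dollars a day.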

In practice, many teams start with the LLM approach to get something working in a day, then graduate to classical ML once they have enough labeled data from the LLM's output. The LLM becomes a labeling tool that bootstraps the ML pipeline. This is a very common pattern.
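The bootstrapping pattern can be sketched in a few lines. Here `llm_label` is a hypothetical function that returns the LLM's category for a ticket, and the "enough data" check with `min_per_category` is an assumed rule of thumb, not a fixed threshold.

```python
from collections import Counter
from typing import Callable

CATEGORIES = {"billing", "bug", "feature request", "other"}


def collect_training_data(tickets, llm_label: Callable[[str], str]):
    """Label tickets with the LLM and keep only labels that pass validation."""
    dataset = []
    for ticket in tickets:
        label = llm_label(ticket).strip().lower()
        if label in CATEGORIES:  # drop anything the LLM answered off-script
            dataset.append((ticket, label))
    return dataset


def ready_to_train(dataset, min_per_category: int) -> bool:
    """Assumed heuristic: every category needs a minimum number of examples."""
    counts = Counter(label for _, label in dataset)
    return all(counts[category] >= min_per_category for category in CATEGORIES)


def stub_llm_label(ticket: str) -> str:
    """Stand-in for a real LLM call, for demonstration only."""
    return "Billing" if "refund" in ticket.lower() else "not sure"


dataset = collect_training_data(["Refund please", "App is slow"], stub_llm_label)
print(dataset)
print(ready_to_train(dataset, min_per_category=1))
```

In a real pipeline you would also spot-check a sample of the LLM's labels by hand before training on them, since any systematic mistake the LLM makes gets baked into the ML model.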

[Figure: how the three tools compose. Rules handle the crisp cases, classical ML handles high-volume scoring, and LLMs handle the messy language work.]

The simplest-tool principle

The beginner instinct is to reach for the fanciest tool first. The engineering instinct is the opposite: start with the simplest tool that works. If rules handle it, use rules. If you have labeled data and need speed, use ML. Reach for an LLM when the input is messy language and the other tools have already failed.
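One way to apply the simplest-tool principle in code is a cascade that tries the cheapest tool first and only escalates when it has to. This is a sketch under assumptions: the three callables, the confidence score returned by the ML model, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def route_ticket(
    ticket: str,
    rules: Callable[[str], Optional[str]],         # returns None when no rule fires
    ml_model: Callable[[str], Tuple[str, float]],  # returns (label, confidence)
    llm: Callable[[str], str],
    ml_threshold: float = 0.8,                     # assumed cutoff
) -> Tuple[str, str]:
    """Return (category, tool_used), trying the cheapest tool first."""
    label = rules(ticket)
    if label is not None:
        return label, "rules"
    label, confidence = ml_model(ticket)
    if confidence >= ml_threshold:
        return label, "ml"
    # Only the tickets the cheaper tools could not settle reach the LLM.
    return llm(ticket), "llm"


# Stand-ins for the three tools, for demonstration only.
def demo_rules(ticket):
    return "billing" if "refund" in ticket.lower() else None


def demo_ml(ticket):
    return ("bug", 0.95) if "crash" in ticket.lower() else ("other", 0.4)


def demo_llm(ticket):
    return "feature request"


for text in ["Refund my order", "App crash on start", "Could you support SSO?"]:
    print(text, "->", route_ticket(text, demo_rules, demo_ml, demo_llm))
```

The second element of the result tells you which tool made the call, which is handy for measuring how many tickets actually need the expensive path.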

Now that you have seen all three tools with real code, the next concept zooms in on LLMs specifically. You will learn the six things LLMs are genuinely good at in production, with a code example for each.
