Stop Burning Tokens: How to Avoid Feeding LLMs Broken Code (CyberDudeBivash Guide)

 


Executive summary 

If you feed buggy code to an LLM you’ll get back buggy suggestions — and you’ll pay for them. The secret: fix as much as possible locally first, then send the smallest, most precise context the LLM needs. This guide gives a practical system you can adopt now:

  • Local preflight: lint, unit tests, minimal reproducible example (MRE) generator.

  • Prompt hygiene: diff-only prompts, test-driven prompts, and strict output formats.

  • CI gating: only call LLM from CI when pre-checks pass or when a focused, failing-test payload is published.

  • Token-aware engineering: estimate tokens, calculate cost, and budget.

  • Developer tooling & templates: pre-commit hooks, Python/Node scripts, GitHub Actions examples.

Follow this and you’ll cut wasted tokens, shorten review cycles, and produce higher-quality LLM outputs.


Why engineers waste tokens 

Common anti-patterns:

  1. Dumping entire repositories into the prompt.

  2. Asking for “fix my code” without failing tests or clear error output.

  3. Sending code with unresolved syntax errors or missing imports.

  4. No preflight: you ask the LLM first, then debug its output manually.

Consequences: wasted tokens (money), longer iteration times, lower signal-to-noise from LLM output, and the risk of incorrect code being merged into production.


The CyberDudeBivash 5-Step Workflow

  1. Local preflight — lint + run tests + reproduce error.

  2. Minimize context — produce a minimal reproducible example (MRE).

  3. Prompt for a patch — use a strict template asking for patch only or diff only.

  4. Validate — run returned patch inside sandbox/tests automatically.

  5. CI gate & telemetry — only accept LLM-assisted changes when tests pass and the token-cost budget is respected.


Practical toolset

  • Linters: flake8 / pylint (Python), eslint (JS/TS).

  • Formatters: black, prettier.

  • Unit tests: pytest, unittest, jest.

  • Local sandbox: Docker + docker-compose or ephemeral VMs.

  • Pre-commit: pre-commit hooks.

  • Token estimation helper: small script (below).

  • CI: GitHub Actions (examples later).

Affiliate picks (recommended — use our affiliate links on your site):

  • JetBrains Fleet / IntelliJ (IDE productivity; affiliate link placeholder).

  • GitHub Copilot (assist, but use after preflight).

  • Replit / Gitpod (ephemeral dev sandboxes).
    (Include affiliate disclosure on publish.)


Preflight scripts & pre-prompt checklist

Pre-prompt checklist

  •  Code compiles / lints locally (flake8 / eslint)

  •  Unit tests reproduce the failing behavior (pytest / jest)

  •  Minimal Reproducible Example (MRE) created — unrelated code removed

  •  Expected vs actual output logged (include traceback)

  •  Token budget estimated for the prompt (see calculator below)

  •  CI/CD gating strategy defined (where LLM patch will be validated)


Minimal reproducible example (MRE) template

Create mre.py that contains only:

  • the function(s) under test

  • the failing test case (assert)

  • any minimal setup data (no large binary blobs)

Example (mre.py):

# mre.py
def add(a, b):
    return a + b  # failing due to edge-case elsewhere

def test_add():
    assert add(1, "2") == 3  # shows type error / failing case

Always include the test runner output (stack trace) with your prompt.
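
For example, a small helper can capture that output into a file you can paste into the prompt. This is an illustrative sketch, not part of the original checklist; a plain shell redirect such as pytest mre.py -q > pytest_output.txt 2>&1 works just as well.

import subprocess

# Run the MRE's tests and save the combined output for the prompt
result = subprocess.run(["pytest", "mre.py", "-q"], capture_output=True, text=True)
with open("pytest_output.txt", "w", encoding="utf-8") as fh:
    fh.write(result.stdout + result.stderr)
print(f"Saved failing test output to pytest_output.txt (exit code {result.returncode})")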


Prompt templates — be strict: ask for diff only

Template: "Patch-only prompt"

CONTEXT:
- Language: Python 3.11
- File: add_utils.py (shown below)
- Test: test_add_fails.py (shown below)
- Failing pytest output: (paste entire traceback)

TASK:
Return a unified diff (git-style) patch that fixes the bug so that `pytest -q` passes for the provided test. Only return the patch, nothing else.

FILES:
<<insert only the minimal files: add_utils.py, test_add_fails.py>>

Important: insist on “Only return the patch”, with no explanations. That avoids extra tokens and makes the patch easy to apply programmatically.
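
To avoid hand-assembling that payload every time, a small builder script can fill the template. This is a hypothetical sketch: prompt_template.txt, pytest_output.txt, and the placeholder strings it replaces are assumptions, not files defined in this guide.

from pathlib import Path

# Hypothetical helper: fills the patch-only template with the minimal files
# and the captured traceback, then writes prompt.txt for the LLM call.
template = Path("prompt_template.txt").read_text(encoding="utf-8")
traceback = Path("pytest_output.txt").read_text(encoding="utf-8")
files = ["add_utils.py", "test_add_fails.py"]

file_blocks = []
for name in files:
    body = Path(name).read_text(encoding="utf-8")
    file_blocks.append(f"# ===== {name} =====\n{body}")

prompt = template.replace("(paste entire traceback)", traceback)
prompt = prompt.replace("<<insert only the minimal files: add_utils.py, test_add_fails.py>>",
                        "\n\n".join(file_blocks))
Path("prompt.txt").write_text(prompt, encoding="utf-8")
print(f"prompt.txt written ({len(prompt)} characters)")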


Example — small Python patch flow

  1. Developer reproduces failing test:

$ pytest tests/test_add.py -q
F
================================= FAILURES ===================================
__________________________ test_add_with_string _____________________________

    def test_add_with_string():
>       assert add(1, "2") == 3
E       TypeError: unsupported operand type(s) for +: 'int' and 'str'

tests/test_add.py:5: TypeError

  2. Build the MRE and include only add_utils.py and tests/test_add.py in the prompt.

  3. Send the Patch-only prompt (above). The LLM returns a unified diff:

--- a/add_utils.py
+++ b/add_utils.py
@@ -1,2 +1,5 @@
 def add(a, b):
-    return a + b
+    try:
+        return int(a) + int(b)
+    except (TypeError, ValueError):
+        raise TypeError("add: both args must be numeric or numeric-strings")

  4. Apply the patch and run the tests automatically in CI.


Pre-commit & local automation

Add a pre-commit hook that runs lint and tests before letting you call the LLM:

.pre-commit-config.yaml

repos:
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.40.0
    hooks:
      - id: eslint
  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace

call-llm.sh (only after lint/tests pass)

#!/usr/bin/env bash
set -e

# Refuse to call the LLM until the local suite passes
pytest -q || { echo "Tests fail; fix locally first"; exit 1; }

# Check the token budget for the planned prompt
python estimate_tokens.py --files add_utils.py tests/test_add.py --prompt-template prompt.txt

# if token budget OK, call LLM
# call your LLM client here (curl / openai sdk)

CI pattern: GitHub Actions — only call LLM when tests reproduce AND MRE provided

.github/workflows/llm-assist.yml

name: LLM Assist Patch Flow

on:
  workflow_dispatch:
    inputs:
      token_budget:
        required: true
        default: '2000'

jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Check MRE present
        run: test -f mre.py || (echo "MRE missing" && exit 1)
      - name: Confirm the MRE reproduces the failure
        run: |
          if pytest -q mre.py; then
            echo "mre.py passes; there is nothing for the LLM to fix" && exit 1
          fi

  llm-call:
    needs: preflight
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Estimate tokens
        run: python estimate_tokens.py --files mre.py --prompt-template prompt.txt --budget "${{ github.event.inputs.token_budget }}"
      - name: Call LLM
        if: success()
        run: |
          # call your LLM using a secured API key (repository secret)
          python call_llm.py --prompt-file prompt.txt
      - name: Apply patch and run tests
        run: |
          git apply patch.diff
          pytest -q

This enforces: the failing test must reproduce (preflight), an MRE must exist, the token budget is checked, and the LLM is only called from CI with secured keys.
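
The workflow references a call_llm.py client that is not shown in this post. Here is a minimal sketch, assuming the OpenAI Python SDK (v1 client) and an OPENAI_API_KEY secret exposed to the job; the model name and output filename are placeholders, so adapt it to whatever provider you actually use.

# call_llm.py (illustrative sketch only)
import argparse

from openai import OpenAI  # assumes the OpenAI Python SDK v1+ is installed


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--prompt-file", required=True)
    ap.add_argument("--out", default="patch.diff")
    args = ap.parse_args()

    prompt = open(args.prompt_file, encoding="utf-8").read()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: use whichever model fits your budget
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    patch = resp.choices[0].message.content
    with open(args.out, "w", encoding="utf-8") as fh:
        fh.write(patch)
    print(f"Wrote {args.out} ({len(patch)} characters)")


if __name__ == "__main__":
    main()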


Token estimation & cost calculator (simple, exact arithmetic)

Estimating tokens from characters

A practical rule: 1 token ≈ 4 characters of English text (an approximation; use your model's tokenizer for exact counts).
Formula: estimated_tokens = ceil(total_chars / 4)

Example calculation (step-by-step):
Suppose your prompt (files + traces) is 42,372 characters long.

  1. Divide by 4: 42,372 / 4 = 10,593.

  2. Round up (if needed): estimated_tokens = 10,593 tokens.

Cost example

Assume model price = $0.02 per 1,000 tokens (example pricing used solely for illustration).

  1. Tokens = 10,593.

  2. Thousands-of-tokens = 10,593 / 1000 = 10.593.

  3. Cost = 10.593 * $0.02 = $0.21186.

  4. Rounded to cents = $0.21.

(Every arithmetic step above computed explicitly.)

Tip: keep prompts ≤ 2,000–3,000 tokens when possible to reduce cost and improve latency.
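
The estimate_tokens.py helper used in call-llm.sh and the workflow above is not shown in this post. A minimal sketch based on the 4-characters-per-token rule, with the same flags used earlier; swap in your model's tokenizer (e.g. tiktoken) for exact counts.

# estimate_tokens.py (illustrative sketch)
import argparse
import math
import sys


def count_chars(paths):
    total = 0
    for p in paths:
        with open(p, encoding="utf-8") as fh:
            total += len(fh.read())
    return total


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--files", nargs="+", required=True)
    ap.add_argument("--prompt-template", default=None)
    ap.add_argument("--budget", type=int, default=3000)
    args = ap.parse_args()

    files = list(args.files)
    if args.prompt_template:
        files.append(args.prompt_template)

    chars = count_chars(files)
    tokens = math.ceil(chars / 4)  # 1 token is roughly 4 characters of English
    print(f"chars={chars} estimated_tokens={tokens} budget={args.budget}")
    if tokens > args.budget:
        print("Token budget exceeded; trim the MRE or prompt", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()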


Smart prompt compression strategies

  • Send failure + single-file MRE, not whole repo.

  • Remove comments, large whitespace, and long sample data.

  • Send only failing test and relevant functions.

  • Send diffs instead of full files. If you must send a whole file, strip comments and trim it to the essential parts first (see the sketch after this list).

  • Use function signatures + types rather than full code when asking for algorithmic logic.
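
A crude sketch of that kind of trimming, illustrative only: it drops blank lines and comment-only lines from Python source before it goes into a prompt, and will also strip lines that happen to start with # inside multi-line strings, so review the output.

import sys


def compress(path: str) -> str:
    """Drop blank lines and comment-only lines from a Python file."""
    kept = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank and comment-only lines
            kept.append(line.rstrip())
    return "\n".join(kept)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"# ===== {name} =====")
        print(compress(name))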


Prompt engineering patterns that save tokens

  1. Test-first prompt

    I have the following failing pytest output (paste). Provide a git-style patch that fixes only the code necessary so tests pass. Only return the patch.
  2. Diff-only prompt
    Provide the current file and the desired behavior; ask for a unified diff patch.

  3. Small-step prompt
    Ask for a single small change (e.g., function fix) rather than end-to-end rewrite.

  4. Strict format enforcement
    “Return JSON only with fields {patch, tests_run, success}”; this is easier to parse and validate (see the sketch below).
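
A minimal sketch of validating that strict format before trusting it; the field names follow the example above, everything else (reading from stdin, exit codes) is an assumption.

import json
import sys

REQUIRED_FIELDS = {"patch", "tests_run", "success"}


def parse_llm_json(raw: str) -> dict:
    """Parse and sanity-check a strict-format LLM response."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if the model added prose
    if not isinstance(data, dict):
        raise ValueError("LLM response is not a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing fields: {sorted(missing)}")
    if not isinstance(data["patch"], str) or not data["patch"].strip():
        raise ValueError("LLM response contains an empty patch")
    return data


if __name__ == "__main__":
    try:
        result = parse_llm_json(sys.stdin.read())
    except ValueError as exc:
        print(f"Rejecting LLM output: {exc}", file=sys.stderr)
        sys.exit(1)
    print("patch accepted; tests_run:", result["tests_run"], "success:", result["success"])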


Validation harness — run returned patch automatically

validate_patch.py (conceptual)

import subprocess
import sys

# apply patch
subprocess.run(["git", "apply", "patch.diff"], check=True)

# run tests
r = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
print(r.stdout)
if r.returncode != 0:
    print("Patch failed tests", r.stderr)
    sys.exit(2)
print("Patch validated")

Use this as a CI step immediately after receiving the patch.


Defensive prompts & guardrails (reduce hallucinations)

  • Ask LLM to not invent imports or API calls. Provide the exact dependency list or require code to only use the existing project imports.

  • Request executable code only; require pytest to pass in CI.

  • If the LLM returns explanations, automatically reject the response and re-run with the “Only return the patch” instruction enforced (see the sketch below).
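
One simple enforcement sketch, a heuristic that is not part of the original post: reject anything that does not look like a bare unified diff before it ever reaches git apply.

# Heuristic guardrail: accept only responses that look like a git-style unified diff.
def looks_like_unified_diff(text: str) -> bool:
    lines = text.strip().splitlines()
    if not lines:
        return False
    starts_like_diff = lines[0].startswith(("diff --git", "--- "))
    has_hunk_header = any(line.startswith("@@") for line in lines)
    return starts_like_diff and has_hunk_header


if __name__ == "__main__":
    response = open("patch.diff", encoding="utf-8").read()
    if not looks_like_unified_diff(response):
        raise SystemExit("LLM returned prose, not a patch; re-run with patch-only enforcement")
    print("Response looks like a patch; proceeding to validate_patch.py")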


Common real-world patterns & examples

Pattern: Runtime type errors in Python

  • Preflight: run mypy / pytest.

  • Prompt: include failing traceback and function signature.

  • Patch: LLM suggests type coercion or validation.

  • Validation: run tests — success -> merge.

Pattern: Frontend CSS/JS regressions

  • Preflight: run npm run test, eslint, and visual regression (percy) or unit tests.

  • Prompt: include failing test and minimal component snippet.

  • Patch: LLM returns specific component diff.


FAQ 

Q: When should I NOT use an LLM for code?
A: Don’t use it to fix failing tests if you can’t produce an MRE, or when code involves secrets/crypto primitives you cannot validate locally. Use LLMs more for design/boilerplate than for security-critical code unless heavily validated.

Q: How often should I call an LLM?
A: Prefer fewer, highly focused calls. Use local automation to reduce repetitive prompts.

Q: What about using LLMs as pair-programming assistants?
A: Great, but keep the same disciplines: run tests locally first, then ask LLM to suggest concise changes.


Metrics & KPIs to track 

  • Tokens consumed per merged PR (baseline vs. post-adoption).

  • % of LLM-assisted patches that pass CI on first application.

  • Mean time to first green build (MTTFGB) for LLM-assisted PRs.

  • Token cost saved per sprint.
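
A tiny sketch of capturing that telemetry per merged PR; the file name, fields, and example values are assumptions, so wire it into whatever dashboard or data store you already use.

import csv
import datetime
import os


def record_llm_usage(pr_number: int, tokens: int, ci_green_first_try: bool,
                     path: str = "llm_telemetry.csv") -> None:
    """Append one telemetry row per LLM-assisted PR."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["date", "pr", "tokens", "ci_green_first_try"])
        writer.writerow([datetime.date.today().isoformat(), pr_number, tokens, ci_green_first_try])


# Example usage with made-up values
record_llm_usage(pr_number=1234, tokens=10593, ci_green_first_try=True)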


Integration checklist 

  •  Pre-commit hooks installed and enforced.

  •  MRE template created in /mre/ and required for LLM requests.

  •  CI workflow includes estimate_tokens.py and validate_patch.py.

  • Token budget per PR set and monitored.

  •  Post-merge telemetry enabled (tokens/PR, success rate).


#CyberDudeBivash #LLM #PromptEngineering #DevOps #CI #Precommit #TokenEfficiency #AIforDev #SoftwareEngineering #Productivity #MRE #Testing


Final quick checklist — 

  1. Run pre-commit and pytest.

  2. Create mre.py capturing the failing test.

  3. Run estimate_tokens.py to verify budget.

  4. Trigger llm-assist CI workflow to call LLM.

  5. Validate returned patch automatically (validate_patch.py).

  6. Merge only if CI green and token budget respected.
