Stop Burning Tokens: How to Avoid Feeding LLMs Broken Code (CyberDudeBivash Guide)

 


Executive summary 

If you feed buggy code to an LLM you’ll get back buggy suggestions — and you’ll pay for them. The secret: fix as much as possible locally first, then send the smallest, most precise context the LLM needs. This guide gives a practical system you can adopt now:

  • Local preflight: lint, unit tests, minimal reproducible example (MRE) generator.

  • Prompt hygiene: diff-only prompts, test-driven prompts, and strict output formats.

  • CI gating: only call LLM from CI when pre-checks pass or when a focused, failing-test payload is published.

  • Token-aware engineering: estimate tokens, calculate cost, and budget.

  • Developer tooling & templates: pre-commit hooks, Python/Node scripts, GitHub Actions examples.

Follow this and you’ll cut wasted tokens, shorten review cycles, and produce higher-quality LLM outputs.


Why engineers waste tokens 

Common anti-patterns:

  1. Dumping entire repositories into the prompt.

  2. Asking for “fix my code” without failing tests or clear error output.

  3. Sending code with unresolved syntax errors or missing imports.

  4. No preflight: you ask the LLM first, then debug its output manually.

Consequences: wasted tokens (money), longer iteration times, lower signal-to-noise from LLM output, and the risk of incorrect code being merged into production.


The CyberDudeBivash 5-Step Workflow

  1. Local preflight — lint + run tests + reproduce error.

  2. Minimize context — produce a minimal reproducible example (MRE).

  3. Prompt for a patch — use a strict template asking for patch only or diff only.

  4. Validate — run returned patch inside sandbox/tests automatically.

  5. CI gate & telemetry — only accept LLM-assisted changes when tests pass and the token-cost budget is respected.


Practical toolset

  • Linters: flake8 / pylint (Python), eslint (JS/TS).

  • Formatters: black, prettier.

  • Unit tests: pytest, unittest, jest.

  • Local sandbox: Docker + docker-compose or ephemeral VMs.

  • Pre-commit: pre-commit hooks.

  • Token estimation helper: small script (below).

  • CI: GitHub Actions (examples later).

Affiliate picks (recommended — use our affiliate links on your site):

  • JetBrains Fleet / IntelliJ (IDE productivity; affiliate link placeholder).

  • GitHub Copilot (assist, but use after preflight).

  • Replit / Gitpod (ephemeral dev sandboxes).
    (Include affiliate disclosure on publish.)


Preflight scripts & pre-prompt checklist

Pre-prompt checklist

  •  Code compiles / lints locally (flake8 / eslint)

  •  Unit tests reproduce the failing behavior (pytest / jest)

  •  Minimal Reproducible Example (MRE) created — unrelated code removed

  •  Expected vs actual output logged (include traceback)

  •  Token budget estimated for the prompt (see calculator below)

  •  CI/CD gating strategy defined (where LLM patch will be validated)


Minimal reproducible example (MRE) template

Create mre.py that contains only:

  • the function(s) under test

  • the failing test case (assert)

  • any minimal setup data (no large binary blobs)

Example (mre.py):

# mre.py
def add(a, b):
    return a + b  # failing due to edge-case elsewhere

def test_add():
    assert add(1, "2") == 3  # shows type error / failing case

Always include the test runner output (stack trace) with your prompt.
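
For example, a small helper can capture that output into a file you can paste into the prompt. This is an illustrative sketch, not part of the original checklist; a plain shell redirect such as pytest mre.py -q > pytest_output.txt 2>&1 works just as well.

import subprocess

# Run the MRE's tests and save the combined output for the prompt
result = subprocess.run(["pytest", "mre.py", "-q"], capture_output=True, text=True)
with open("pytest_output.txt", "w", encoding="utf-8") as fh:
    fh.write(result.stdout + result.stderr)
print(f"Saved failing test output to pytest_output.txt (exit code {result.returncode})")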


Prompt templates — be strict: ask for diff only

Template: "Patch-only prompt"

CONTEXT:
- Language: Python 3.11
- File: add_utils.py (shown below)
- Test: test_add_fails.py (shown below)
- Failing pytest output: (paste entire traceback)

TASK:
Return a unified diff (git-style) patch that fixes the bug so that `pytest -q` passes for the provided test. Only return the patch, nothing else.

FILES:
<<insert only the minimal files: add_utils.py, test_add_fails.py>>

Important: insist on “Only return the patch”, with no explanations. That avoids extra tokens and makes the patch easy to apply programmatically.
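
To avoid hand-assembling that payload every time, a small builder script can fill the template. This is a hypothetical sketch: prompt_template.txt, pytest_output.txt, and the placeholder strings it replaces are assumptions, not files defined in this guide.

from pathlib import Path

# Hypothetical helper: fills the patch-only template with the minimal files
# and the captured traceback, then writes prompt.txt for the LLM call.
template = Path("prompt_template.txt").read_text(encoding="utf-8")
traceback = Path("pytest_output.txt").read_text(encoding="utf-8")
files = ["add_utils.py", "test_add_fails.py"]

file_blocks = []
for name in files:
    body = Path(name).read_text(encoding="utf-8")
    file_blocks.append(f"# ===== {name} =====\n{body}")

prompt = template.replace("(paste entire traceback)", traceback)
prompt = prompt.replace("<<insert only the minimal files: add_utils.py, test_add_fails.py>>",
                        "\n\n".join(file_blocks))
Path("prompt.txt").write_text(prompt, encoding="utf-8")
print(f"prompt.txt written ({len(prompt)} characters)")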


Example — small Python patch flow

  1. Developer reproduces failing test:

$ pytest tests/test_add.py -q
F
================================= FAILURES ===================================
__________________________ test_add_with_string _____________________________

    def test_add_with_string():
>       assert add(1, "2") == 3
E       TypeError: unsupported operand type(s) for +: 'int' and 'str'

tests/test_add.py:5: TypeError

  2. Build the MRE and include only add_utils.py and tests/test_add.py in the prompt.

  3. Send the Patch-only prompt (above). The LLM returns a unified diff:

--- a/add_utils.py
+++ b/add_utils.py
@@ -1,2 +1,5 @@
 def add(a, b):
-    return a + b
+    try:
+        return int(a) + int(b)
+    except (TypeError, ValueError):
+        raise TypeError("add: both args must be numeric or numeric-strings")

  4. Apply the patch and run the tests automatically in CI.


Pre-commit & local automation

Add a pre-commit hook that runs lint and tests before letting you call the LLM:

.pre-commit-config.yaml

repos:
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.40.0
    hooks:
      - id: eslint
  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace

call-llm.sh (only after lint/tests pass)

#!/usr/bin/env bash
set -e

# Refuse to call the LLM until the local suite passes
pytest -q || { echo "Tests fail; fix locally first"; exit 1; }

# Check the token budget for the planned prompt
python estimate_tokens.py --files add_utils.py tests/test_add.py --prompt-template prompt.txt

# if token budget OK, call LLM
# call your LLM client here (curl / openai sdk)

CI pattern: GitHub Actions — only call LLM when tests reproduce AND MRE provided

.github/workflows/llm-assist.yml

name: LLM Assist Patch Flow

on:
  workflow_dispatch:
    inputs:
      token_budget:
        required: true
        default: '2000'

jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Check MRE present
        run: test -f mre.py || (echo "MRE missing" && exit 1)
      - name: Confirm the MRE reproduces the failure
        run: |
          if pytest -q mre.py; then
            echo "mre.py passes; there is nothing for the LLM to fix" && exit 1
          fi

  llm-call:
    needs: preflight
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Estimate tokens
        run: python estimate_tokens.py --files mre.py --prompt-template prompt.txt --budget "${{ github.event.inputs.token_budget }}"
      - name: Call LLM
        if: success()
        run: |
          # call your LLM using a secured API key (repository secret)
          python call_llm.py --prompt-file prompt.txt
      - name: Apply patch and run tests
        run: |
          git apply patch.diff
          pytest -q

This enforces: the failing test must reproduce (preflight), an MRE must exist, the token budget is checked, and the LLM is only called from CI with secured keys.
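
The workflow references a call_llm.py client that is not shown in this post. Here is a minimal sketch, assuming the OpenAI Python SDK (v1 client) and an OPENAI_API_KEY secret exposed to the job; the model name and output filename are placeholders, so adapt it to whatever provider you actually use.

# call_llm.py (illustrative sketch only)
import argparse

from openai import OpenAI  # assumes the OpenAI Python SDK v1+ is installed


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--prompt-file", required=True)
    ap.add_argument("--out", default="patch.diff")
    args = ap.parse_args()

    prompt = open(args.prompt_file, encoding="utf-8").read()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: use whichever model fits your budget
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    patch = resp.choices[0].message.content
    with open(args.out, "w", encoding="utf-8") as fh:
        fh.write(patch)
    print(f"Wrote {args.out} ({len(patch)} characters)")


if __name__ == "__main__":
    main()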


Token estimation & cost calculator (simple, exact arithmetic)

Estimating tokens from characters

A practical rule: 1 token ≈ 4 characters of English text (an approximation; use your model's tokenizer for exact counts).
Formula: estimated_tokens = ceil(total_chars / 4)

Example calculation (step-by-step):
Suppose your prompt (files + traces) is 42,372 characters long.

  1. Divide by 4: 42,372 / 4 = 10,593.

  2. Round up (if needed): estimated_tokens = 10,593 tokens.

Cost example

Assume model price = $0.02 per 1,000 tokens (example pricing used solely for illustration).

  1. Tokens = 10,593.

  2. Thousands-of-tokens = 10,593 / 1000 = 10.593.

  3. Cost = 10.593 * $0.02 = $0.21186.

  4. Rounded to cents = $0.21.

(Every arithmetic step above computed explicitly.)

Tip: keep prompts ≤ 2,000–3,000 tokens when possible to reduce cost and improve latency.
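
The estimate_tokens.py helper used in call-llm.sh and the workflow above is not shown in this post. A minimal sketch based on the 4-characters-per-token rule, with the same flags used earlier; swap in your model's tokenizer (e.g. tiktoken) for exact counts.

# estimate_tokens.py (illustrative sketch)
import argparse
import math
import sys


def count_chars(paths):
    total = 0
    for p in paths:
        with open(p, encoding="utf-8") as fh:
            total += len(fh.read())
    return total


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--files", nargs="+", required=True)
    ap.add_argument("--prompt-template", default=None)
    ap.add_argument("--budget", type=int, default=3000)
    args = ap.parse_args()

    files = list(args.files)
    if args.prompt_template:
        files.append(args.prompt_template)

    chars = count_chars(files)
    tokens = math.ceil(chars / 4)  # 1 token is roughly 4 characters of English
    print(f"chars={chars} estimated_tokens={tokens} budget={args.budget}")
    if tokens > args.budget:
        print("Token budget exceeded; trim the MRE or prompt", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()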


Smart prompt compression strategies

  • Send failure + single-file MRE, not whole repo.

  • Remove comments, large whitespace, and long sample data.

  • Send only failing test and relevant functions.

  • Send diffs instead of full files. If you must send a whole file, strip comments and trim it to the essential parts first (see the sketch after this list).

  • Use function signatures + types rather than full code when asking for algorithmic logic.
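
A crude sketch of that kind of trimming, illustrative only: it drops blank lines and comment-only lines from Python source before it goes into a prompt, and will also strip lines that happen to start with # inside multi-line strings, so review the output.

import sys


def compress(path: str) -> str:
    """Drop blank lines and comment-only lines from a Python file."""
    kept = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank and comment-only lines
            kept.append(line.rstrip())
    return "\n".join(kept)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"# ===== {name} =====")
        print(compress(name))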


Prompt engineering patterns that save tokens

  1. Test-first prompt

    I have the following failing pytest output (paste). Provide a git-style patch that fixes only the code necessary so tests pass. Only return the patch.
  2. Diff-only prompt
    Provide the current file and the desired behavior; ask for a unified diff patch.

  3. Small-step prompt
    Ask for a single small change (e.g., function fix) rather than end-to-end rewrite.

  4. Strict format enforcement
    “Return JSON only with fields {patch, tests_run, success}”; this is easier to parse and validate (see the sketch below).
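
A minimal sketch of validating that strict format before trusting it; the field names follow the example above, everything else (reading from stdin, exit codes) is an assumption.

import json
import sys

REQUIRED_FIELDS = {"patch", "tests_run", "success"}


def parse_llm_json(raw: str) -> dict:
    """Parse and sanity-check a strict-format LLM response."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if the model added prose
    if not isinstance(data, dict):
        raise ValueError("LLM response is not a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing fields: {sorted(missing)}")
    if not isinstance(data["patch"], str) or not data["patch"].strip():
        raise ValueError("LLM response contains an empty patch")
    return data


if __name__ == "__main__":
    try:
        result = parse_llm_json(sys.stdin.read())
    except ValueError as exc:
        print(f"Rejecting LLM output: {exc}", file=sys.stderr)
        sys.exit(1)
    print("patch accepted; tests_run:", result["tests_run"], "success:", result["success"])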


Validation harness — run returned patch automatically

validate_patch.py (conceptual)

import subprocess
import sys

# apply patch
subprocess.run(["git", "apply", "patch.diff"], check=True)

# run tests
r = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
print(r.stdout)
if r.returncode != 0:
    print("Patch failed tests", r.stderr)
    sys.exit(2)
print("Patch validated")

Use this as a CI step immediately after receiving the patch.


Defensive prompts & guardrails (reduce hallucinations)

  • Ask LLM to not invent imports or API calls. Provide the exact dependency list or require code to only use the existing project imports.

  • Request executable code only; require pytest to pass in CI.

  • If the LLM returns explanations, automatically reject the response and re-run with the “Only return the patch” instruction enforced (see the sketch below).
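
One simple enforcement sketch, a heuristic that is not part of the original post: reject anything that does not look like a bare unified diff before it ever reaches git apply.

# Heuristic guardrail: accept only responses that look like a git-style unified diff.
def looks_like_unified_diff(text: str) -> bool:
    lines = text.strip().splitlines()
    if not lines:
        return False
    starts_like_diff = lines[0].startswith(("diff --git", "--- "))
    has_hunk_header = any(line.startswith("@@") for line in lines)
    return starts_like_diff and has_hunk_header


if __name__ == "__main__":
    response = open("patch.diff", encoding="utf-8").read()
    if not looks_like_unified_diff(response):
        raise SystemExit("LLM returned prose, not a patch; re-run with patch-only enforcement")
    print("Response looks like a patch; proceeding to validate_patch.py")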


Common real-world patterns & examples

Pattern: Runtime type errors in Python

  • Preflight: run mypy / pytest.

  • Prompt: include failing traceback and function signature.

  • Patch: LLM suggests type coercion or validation.

  • Validation: run tests — success -> merge.

Pattern: Frontend CSS/JS regressions

  • Preflight: run npm run test, eslint, and visual regression (percy) or unit tests.

  • Prompt: include failing test and minimal component snippet.

  • Patch: LLM returns specific component diff.


FAQ 

Q: When should I NOT use an LLM for code?
A: Don’t use it to fix failing tests if you can’t produce an MRE, or when code involves secrets/crypto primitives you cannot validate locally. Use LLMs more for design/boilerplate than for security-critical code unless heavily validated.

Q: How often should I call an LLM?
A: Prefer fewer, highly focused calls. Use local automation to reduce repetitive prompts.

Q: What about using LLMs as pair-programming assistants?
A: Great, but keep the same disciplines: run tests locally first, then ask LLM to suggest concise changes.


Metrics & KPIs to track 

  • Tokens consumed per merged PR (baseline vs. post-adoption).

  • % of LLM-assisted patches that pass CI on first application.

  • Mean time to first green build (MTTFGB) for LLM-assisted PRs.

  • Token cost saved per sprint.
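
A tiny sketch of capturing that telemetry per merged PR; the file name, fields, and example values are assumptions, so wire it into whatever dashboard or data store you already use.

import csv
import datetime
import os


def record_llm_usage(pr_number: int, tokens: int, ci_green_first_try: bool,
                     path: str = "llm_telemetry.csv") -> None:
    """Append one telemetry row per LLM-assisted PR."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["date", "pr", "tokens", "ci_green_first_try"])
        writer.writerow([datetime.date.today().isoformat(), pr_number, tokens, ci_green_first_try])


# Example usage with made-up values
record_llm_usage(pr_number=1234, tokens=10593, ci_green_first_try=True)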


Integration checklist 

  •  Pre-commit hooks installed and enforced.

  •  MRE template created in /mre/ and required for LLM requests.

  •  CI workflow includes estimate_tokens.py and validate_patch.py.

  • Token budget per PR set and monitored.

  •  Post-merge telemetry enabled (tokens/PR, success rate).


#CyberDudeBivash #LLM #PromptEngineering #DevOps #CI #Precommit #TokenEfficiency #AIforDev #SoftwareEngineering #Productivity #MRE #Testing


Final quick checklist — 

  1. Run pre-commit and pytest.

  2. Create mre.py capturing the failing test.

  3. Run estimate_tokens.py to verify budget.

  4. Trigger llm-assist CI workflow to call LLM.

  5. Validate returned patch automatically (validate_patch.py).

  6. Merge only if CI green and token budget respected.
