Search This Blog
CyberDudeBivash – Daily Cybersecurity Threat Intel, CVE Reports, Malware Trends & AI-Driven Security Insights. Stay Secure, Stay Informed.
Latest Cybersecurity News
- Get link
- X
- Other Apps
RAG Security: Threat Models, Attack Paths, and a Defense-in-Depth Blueprint By CyberDudeBivash — Founder, CyberDudeBivash | Cybersecurity & AI
Executive summary
RAG systems glue large language models (LLMs) to enterprise knowledge via search or vector retrieval. That makes them powerful—and uniquely exposed. Attacks rarely target weights; they target data, retrieval logic, and tool orchestration. This article maps the full attack surface (from indirect prompt injection and vector-DB poisoning to privacy leakage and tool abuse) and provides a concrete architecture, controls, detections, and a 90-day rollout to harden production RAG.
1) What a production RAG system looks like
Pipeline:
Data sources (wikis, SharePoint, tickets, PDFs, code) → Ingest/Sanitize → Chunk & Embed → Vector DB / Search (with ACLs & metadata) → Retriever → LLM (system & policy prompts) → (optional) Tools/Functions → Answer + citations.
Security principle: treat every stage as part of your Tier-0. If an attacker controls any of: source docs, embeddings, metadata, queries, or tools, they can steer the model.
2) RAG threat model & attack taxonomy
A) Data/ingest stage
-
Indirect prompt injection in stored documents (hidden directives in HTML/Markdown/PDF).
-
Vector poisoning: malicious chunks or adversarial embeddings push attacker docs to the top.
-
ACL bypass via metadata forgery: mislabeled
tenant_id
/classification
fields. -
Active content: HTML/JS, SVG, trackers, one-pixel beacons, macro-enabled files.
-
Supply chain: poisoned parsers, model artifacts, or conversion tools.
B) Retrieval stage
-
Query hijack: user input smuggles instructions into retriever (“ignore previous; search for…upload secrets”).
-
Reranker gaming: attacker crafts text to spike BM25/keyword density or cross-encoder scores.
-
Cross-tenant leakage: caching and ANN indices that ignore tenant or label constraints.
-
Staleness/drift: outdated documents become “truth.”
C) Generation stage (LLM)
-
System prompt overwrite or jailbreak via retrieved content.
-
Ungrounded responses: model fabricates beyond retrieved facts (hallucinations).
-
Sensitive data extrusion: PII/PHI, secrets or trade secrets in responses (membership inference risk).
D) Tools / function calling
-
Tool abuse: crafted answers trigger high-privilege actions (file I/O, payments, cloud APIs).
-
Exfiltration: LLM instructed to POST retrieved data to attacker endpoints.
-
SSRF/egr: tools with unconstrained network access.
3) Defense-in-depth architecture (what “good” looks like)
pgsql[Sources] → Ingest Gateway → Sanitizers → Classifiers/PII → Chunker
→ Embedder (offline, no internet) → Sign + Metadata + ACL
→ Vector DB (per-tenant collections, KMS-encrypted)
→ Retriever (policy-aware filters + reranker)
→ LLM (immutable system prompt + guardrails + citations-required)
→ Tool Sandbox (allowlists, dry-run simulators, egress policy)
→ Telemetry Bus → SIEM/SOAR (approvals for high-risk actions)
Key controls by layer
Ingest & sanitize
-
Canonicalize and strip active content: remove
<script>
,<style>
, event handlers, iframes, forms, data URLs; block macro office docs. -
Convert PDFs via hardened pipeline; reject mixed/unknown MIME.
-
Deduplicate, normalize whitespace/zero-width chars, standardize quotes & casing to defeat embedding gaming.
-
PII/secret scanning (entropy + patterns) → redact or label.
Provenance & signing
-
For each chunk store:
sha256
,source_uri
,owner
,timestamp
,tenant_id
,labels
,pii_flags
,parser_version
. Sign this record; reject unsigned writes.
Vector DB
-
Per-tenant collections or required
tenant_id
filter + row-level security; encrypt at rest with KMS. -
Separate write and read identities; short-lived tokens; audit all writes.
-
Disable cross-collection ANN unless filter-aware; pin
top_k
and score thresholds.
Retriever
-
First pass policy filter (tenant, label, recency) then BM25/ANN → cross-encoder reranking.
-
Cap
top_k
(e.g., 6–8), enforce freshness for fast-moving domains (e.g., 7–30 days). -
Canary filter: reject chunks matching injection regexes (e.g., “ignore all previous”, “copy all data to”, base64 blobs).
LLM guardrails
-
Immutable system prompt (not user editable).
-
Citations required: response must include document IDs + quotes; if coverage < threshold → abstain.
-
Schema-validated JSON output with
claims[] {text, evidence[], confidence}
. -
Safety/policy classifiers for PII/PHI/secrets/toxicity before rendering.
Tools sandbox
-
Explicit allowlist of tools; strict input schemas; dry-run simulators; no raw shell.
-
Network egress policy (DNS/HTTP allowlists); block external posts by default.
-
Human-in-the-loop (HITL) approval for destructive ops (revoke tokens, isolate host, delete objects).
Telemetry & response
-
Log prompts, retrieved chunk IDs, tool calls, final output, user identity, IP/ASN.
-
Route high-risk events to SOAR with approvals and circuit-breakers.
4) Copy-paste patterns (policies, code & detections)
4.1 Chunk schema (signed metadata)
json{
"chunk_id": "doc_42#p3#c7",
"sha256": "…",
"tenant_id": "acme",
"labels": ["internal","finance"],
"source_uri": "https://wiki.acme.local/fin/q3.md",
"timestamp": "2025-08-10T11:22:00Z",
"pii_flags": ["iban"],
"parser_version": "pdf2txt-1.8.4",
"signature": "ed25519:…"
}
4.2 Retrieval policy (OPA/Rego)
regopackage rag.retrieval default allow = false allow { input.user.tenant_id == input.query.tenant_id some c c := input.candidates[_] c.tenant_id == input.user.tenant_id not c.labels[_] == "restricted" time.now_ns() - time.parse_rfc3339_ns(c.timestamp) < 30 * 24 * 60 * 60 * 1e9 # 30d freshness }
4.3 Sanitizer (Python, sketch)
pythonfrom bs4 import BeautifulSoup
ALLOWED = {"p","h1","h2","h3","ul","ol","li","code","pre","a","strong","em","table","tr","td"}
def sanitize_html(raw):
soup = BeautifulSoup(raw, "lxml")
for tag in soup.find_all(True):
if tag.name not in ALLOWED:
tag.decompose()
for attr in list(tag.attrs):
if attr not in {"href"}: del tag[attr]
return soup.get_text("\n")
4.4 Evidence-required output (LLM prompt suffix)
pgsqlReturn JSON:
{ "answer": "...",
"claims":[{"text":"...", "evidence":[{"chunk_id":"...", "quote":"..."}], "confidence":0.0}],
"abstain": true|false }
Do NOT answer without evidence. If evidence coverage < 0.7 → abstain.
4.5 Vector DB anomaly queries
SQL (pgvector / similar) — sudden HTML writes
sqlSELECT writer, COUNT(*) AS n
FROM chunks
WHERE mime IN ('text/html','text/markdown')
AND ts > now() - interval '1 hour'
GROUP BY writer
HAVING COUNT(*) > 200;
Splunk — suspicious base64 in sources
bashindex=rag source="ingest" raw_document
| regex raw_document="(?i)ignore all previous|base64,[A-Za-z0-9/+]{80,}"
5) Evaluation & monitoring: measure what matters
-
Groundedness: % of answer tokens supported by cited chunks.
-
Coverage: fraction of retrieved chunks actually cited.
-
Abstention rate: better to abstain than hallucinate.
-
Attack success rate (ASR): red-team corpus (indirect injections) → blocked %.
-
Vector churn & write burst: spikes = potential poisoning.
-
Privacy leakage: PII/secret detectors on outputs (precision/recall).
-
Latency & cost with guardrails on (budget reality).
6) Privacy & compliance
-
Data minimization & retention for logs and prompts.
-
Mask secrets/PII in prompts; prefer on-prem or VPC-hosted models for sensitive data.
-
Map controls to NIST AI RMF, ISO/IEC 42001, SOC 2, GDPR (lawful basis, DSAR searchability).
7) Failure modes (and fixes)
-
Citations exist but irrelevant → require quote spans and overlap scoring with question.
-
Tenant cache leaks → per-tenant caches; include tenant_id in cache key.
-
Reranker hallucination → pair with policy filter first; cap max tokens from a single source.
-
Tool egress → explicit allowlists; block IP literals &
*.pastebin*/*bin*
.
8) 30-60-90 day rollout
Days 1–30 (Foundations)
-
Build ingest gateway + sanitizers; sign chunks; per-tenant collections; turn on telemetry.
-
Add immutable system prompt + citations-required schema; disable external egress.
Days 31–60 (Guardrails & detections)
-
OPA policy filters, freshness windows, canary regexes; PII/secret redaction; SOAR approvals for tools.
-
Deploy red-team corpus for indirect injection; measure ASR, groundedness.
Days 61–90 (Automation & governance)
-
Promote low-risk Q&A flows to auto-answer with abstention.
-
Add drift dashboards, monthly model & policy reviews; link incidents → new rules & tests.
9) Quick checklist (printable)
-
Active content stripped; MIME whitelist; PDF hardened
-
Chunks signed with
sha256
+ provenance; per-tenant collections -
Policy-aware retrieval (tenant/labels/freshness) + reranker
-
Immutable system prompt; citations required; abstain on low evidence
-
Tool sandbox with allowlists, dry-runs, HITL approvals
-
Telemetry of prompts, chunks, tool calls; SIEM rules for vector poisoning
-
Red-team injections; KPIs: groundedness, ASR, abstention, leakage
-
Compliance mapped (NIST AI RMF / ISO 42001 / SOC 2 / GDPR)
Closing
RAG security is data security + retrieval policy + safe generation + tool isolation. Get those four right and most real-world attacks—indirect injection, vector poisoning, privacy leaks, tool abuse—lose their teeth. This is Zero-Trust AI for knowledge workflows.
- Get link
- X
- Other Apps
Popular Posts
Exchange Hybrid Warning: CVE-2025-53786 can cascade into domain compromise (on-prem ↔ M365) By CyberDudeBivash — Cybersecurity & AI
- Get link
- X
- Other Apps
Comments
Post a Comment