AI & NLP for Threat Intelligence (2025): Automate IOC/TTP Extraction, Summaries & ATT&CK Mapping
By CyberDudeBivash • September 21, 2025 (IST)
TL;DR
- What you’ll build: an end-to-end CTI pipeline that ingests reports and feeds → extracts IOCs and TTPs → normalizes and de-duplicates them → maps behaviors to MITRE ATT&CK → publishes STIX 2.1 to your TIP (MISP/OpenCTI) → pushes detections to SIEM/SOAR. ATT&CK is your lingua franca for adversary behavior. [MITRE ATT&CK]
- Why now: the building blocks are mature: spaCy and Hugging Face for NER, STIX/TAXII 2.1 for exchange, MISP/OpenCTI for knowledge graphs, and ATT&CK Navigator for coverage views. [MITRE ATT&CK; spacy.io; Hugging Face]
- Business win: shrink report-to-detection time from days to minutes, and prove it by measuring precision/recall on extractions and coverage deltas per ATT&CK technique. Follow CISA’s mapping best practices to keep analysts honest. [CISA]
1) What problems AI actually solves in CTI
- Speed: OCR/PDF → clean text → IOC/TTP extraction and entity linking at streaming speed.
- Normalization: inconsistent formats → STIX 2.1 objects (Indicator, Malware, Intrusion Set, Relationship). [OASIS Open]
- Prioritization: summarize long reports; rank IOCs by sightings (where they were observed) and confidence; map them to your detection gaps using ATT&CK. [MITRE ATT&CK]
- Distribution: auto-publish via TAXII 2.1 to TIPs and subscribers. [docs.oasis-open.org]
2) Reference pipeline
Ingest → Parse → NER/IOC extract → Validate → Normalize & De-dup → TTP extraction → ATT&CK mapping → STIX 2.1 pack → TAXII publish → SIEM/SOAR actions
2.1 Ingest & parsing
- Accept PDF, HTML, blog, and X (Twitter) feeds. Strip boilerplate, but preserve line breaks so pattern-based cues (e.g., command blocks) survive.
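For HTML sources, boilerplate stripping with preserved line breaks can be sketched with the standard library's `html.parser`. This is a minimal sketch that assumes the report arrives as raw HTML. The `SKIP_TAGS` and `BLOCK_TAGS` sets are illustrative assumptions rather than a complete boilerplate filter, and PDF/OCR and X/Twitter ingestion are out of scope here.

```python
from html.parser import HTMLParser

# Assumed page-chrome tags whose text is dropped; tune per source.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Block-level tags that end a line, so command blocks keep their shape.
BLOCK_TAGS = {"p", "div", "br", "li", "pre", "h1", "h2", "h3", "h4", "tr"}


class ReportText(HTMLParser):
    """Collects visible text, skipping boilerplate tags and keeping line breaks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = ReportText()
    parser.feed(html)
    parser.close()
    # Trim each line but keep the line structure; drop empty lines.
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Text inside `<pre>` keeps its internal newlines, so a command block such as `whoami` followed by `net user /domain` still arrives as two separate lines for the pattern rules downstream.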
2.2 IOC extraction (NER + rules)
- Use spaCy (fast, customizable) plus Hugging Face token-classification models to tag domains, IPs, hashes, URLs, and CVEs; backstop them with regex/heuristics for high-precision patterns. [spacy.io]
- Validate with shape checks (IPv4/IPv6 syntax, a TLD list), flag sinkholed infrastructure and typo-squats, and run active DNS lookups only from a quarantined resolver.
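The shape checks can be sketched with Python's standard `ipaddress` module and a few regexes. `KNOWN_TLDS` and the `sinkholes` argument are hypothetical placeholders: in practice you would load the full IANA root-zone TLD list and your own sinkhole data. Rejecting non-global IPs (private, loopback, documentation ranges) is one reasonable policy, not a rule from the source. Active DNS resolution is deliberately left out.

```python
import ipaddress
import re

# Hypothetical TLD allowlist; in production, load the full IANA root-zone list.
KNOWN_TLDS = {"com", "net", "org", "io", "info", "ru", "cn", "xyz"}
HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-f]+$")
LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")  # one DNS label


def classify(value, sinkholes=frozenset()):
    """Return the observable type if `value` passes shape checks, else None."""
    v = value.strip().lower()
    if v in sinkholes:
        return None  # sinkholed by defenders, not live attacker infrastructure
    try:
        ip = ipaddress.ip_address(v)
    except ValueError:
        ip = None
    if ip is not None:
        # Policy assumption: drop private, loopback, and documentation ranges.
        return f"ipv{ip.version}" if ip.is_global else None
    if len(v) in HASH_TYPES and HEX_RE.match(v):
        return HASH_TYPES[len(v)]
    labels = v.rstrip(".").split(".")
    if (len(labels) >= 2 and labels[-1] in KNOWN_TLDS
            and all(LABEL_RE.match(label) for label in labels)):
        return "domain"
    return None  # e.g. "report.pdf": a filename, not a domain
```

The TLD check is what screens out the most common regex false positive, filenames that look like domains.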
2.3 TTP extraction (behavior → techniques)
- Build a pattern library that maps common textual cues to ATT&CK techniques. For example, “mimikatz”/“LSASS dump” → OS Credential Dumping (T1003; LSASS Memory is sub-technique T1003.001), and “regsvr32 /s /u /i:http” → System Binary Proxy Execution: Regsvr32 (T1218.010), which older ATT&CK versions called Signed Binary Proxy Execution. Use the ATT&CK technique pages as your source of truth. [MITRE ATT&CK]
- Apply weak/medium/strong mapping rules and keep analyst review in the loop (see §5).
2.4 Normalize & de-dup
- Canonicalize domains (evil[.]com → evil.com), hashes, and CVEs; merge records by observable key; attach source and confidence.
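Canonicalization and merging can be sketched as follows. This assumes upstream extraction yields `(type, raw_value, source, confidence)` tuples. The defanging forms handled are a common subset rather than an exhaustive list. Keeping the maximum confidence on merge is a simple policy choice, not a standard. `Observable` is a hypothetical helper class.

```python
import re
from dataclasses import dataclass, field


def refang(value):
    """Undo common defanging: evil[.]com, evil(.)com, evil[dot]com, hxxp://, [:]."""
    v = re.sub(r"\[\.\]|\(\.\)|\[dot\]", ".", value.strip(), flags=re.I)
    v = v.replace("[:]", ":")
    return re.sub(r"^hxxp", "http", v, flags=re.I)


def canonicalize(obs_type, value):
    v = refang(value)
    if obs_type == "domain":
        return v.lower().rstrip(".")  # DNS is case-insensitive; drop root dot
    if obs_type in {"md5", "sha1", "sha256"}:
        return v.lower()
    if obs_type == "cve":
        return v.upper()
    return v  # URLs: path case can matter, so leave as-is


@dataclass
class Observable:
    type: str
    value: str
    sources: set = field(default_factory=set)
    confidence: int = 0  # 0-100


def merge(records):
    """Collapse (type, raw_value, source, confidence) tuples by canonical key."""
    merged = {}
    for obs_type, raw, source, confidence in records:
        key = (obs_type, canonicalize(obs_type, raw))
        obs = merged.setdefault(key, Observable(*key))
        obs.sources.add(source)
        obs.confidence = max(obs.confidence, confidence)
    return list(merged.values())
```

For example, `evil[.]com` from one vendor and `EVIL.com.` from another collapse into a single `evil.com` record that lists both sources.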
2.5 Package & publish
- Emit STIX 2.1 Indicator, Sighting, and Relationship objects and push them via TAXII 2.1 to MISP or OpenCTI; both speak STIX and have broad integrations. [docs.opencti.io; OASIS Open]
2.6 Close the loop
- Use ATT&CK Navigator layers to visualize which techniques the intel covers versus your detections, and feed the gaps into your SIEM/SOAR backlog. [MITRE ATT&CK]
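A Navigator layer is a JSON file, so building one is mostly a matter of assembling a dictionary. This sketch scores each technique that appears in ingested intel as 1 (a detection exists) or 0 (a gap). The layer-format version string and the colors are assumptions; check them against the layer-format documentation for the Navigator release you run. The technique sets are hypothetical inputs.

```python
import json


def navigator_layer(name, intel_techniques, detected_techniques):
    """Score each technique seen in intel: 1 = detection exists, 0 = gap."""
    techniques = []
    for tid in sorted(intel_techniques):
        covered = tid in detected_techniques
        techniques.append({
            "techniqueID": tid,
            "score": 1 if covered else 0,
            "comment": "detection exists" if covered else "gap: no detection",
        })
    return {
        "name": name,
        "domain": "enterprise-attack",
        "versions": {"layer": "4.5"},  # assumed; match your Navigator's layer format
        "description": "Techniques referenced in ingested intel vs. current detections",
        "techniques": techniques,
        # Red for gaps (score 0), green for covered techniques (score 1).
        "gradient": {"colors": ["#ff6666", "#66cc66"], "minValue": 0, "maxValue": 1},
    }


if __name__ == "__main__":
    layer = navigator_layer("Intel coverage (example)",
                            intel_techniques={"T1003.001", "T1218.010"},
                            detected_techniques={"T1003.001"})
    print(json.dumps(layer, indent=2))
```

Save the printed JSON to a file and load it in Navigator as an existing layer. The red cells are your detection backlog.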
3) Minimal working example (Python)
3.1 Extract IOCs with spaCy + Transformers
(Reference: the Hugging Face “token-classification”/NER pipeline docs.) [Hugging Face]
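The extraction step pairs an NER model with rules. Below is a minimal, standard-library sketch of the rule-based half only. Its regexes tolerate the common “[.]” and “hxxp” defanging, and the results are refanged and de-duplicated. The patterns are illustrative assumptions and will over-match (filenames such as report.pdf look like domains), which is why the validation step exists. A spaCy model or a Hugging Face `pipeline("token-classification", ...)` would sit alongside this to catch names and context that regexes miss; the model choice is left open.

```python
import re

DOT = r"(?:\.|\[\.\])"  # accept both "." and the defanged "[.]"
# Illustrative high-precision patterns for the rule-based backstop.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.I),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"\bh(?:tt|xx)ps?://[^\s\"'<>]+", re.I),
    "ipv4": re.compile(rf"\b(?:\d{{1,3}}{DOT}){{3}}\d{{1,3}}\b"),
    "domain": re.compile(rf"\b(?:[a-z0-9-]+{DOT})+[a-z]{{2,}}\b", re.I),
}


def refang(s):
    s = s.replace("[.]", ".")
    return re.sub(r"^hxxp", "http", s, flags=re.I)


def extract_iocs(text):
    """Return {ioc_type: sorted unique values}, refanged and lightly cleaned."""
    found = {}
    for ioc_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = refang(match.group(0))
            if ioc_type == "url":
                value = value.rstrip(".,;)")  # drop trailing sentence punctuation
            elif ioc_type == "cve":
                value = value.upper()
            elif ioc_type in {"md5", "sha1", "sha256", "domain"}:
                value = value.lower()
            found.setdefault(ioc_type, set()).add(value)
    return {k: sorted(v) for k, v in found.items()}
```

A URL also yields its host as a separate domain IOC. That duplication is usually wanted, since blocklists often key on the domain alone.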
3.2 Map text snippets to ATT&CK techniques (heuristics)
(Use the MITRE ATT&CK technique catalog to validate mappings and keep rules current.) [MITRE ATT&CK]
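Here is a minimal sketch of cue-based mapping that assumes a hand-written rule table. The rules, tiers, and regexes are hypothetical starters built from the examples in §2.3; verify every ID and name against the ATT&CK site before use. Each hit keeps the triggering sentence as its evidence string. When several rules fire for the same technique, the strongest tier wins.

```python
import re
from dataclasses import dataclass

TIERS = {"weak": 0, "medium": 1, "strong": 2}


@dataclass
class TechniqueHit:
    technique_id: str
    name: str
    strength: str  # "weak" | "medium" | "strong"
    evidence: str  # the sentence that triggered the rule


# Hypothetical starter rules: (pattern, technique ID, technique name, evidence tier).
RULES = [
    (re.compile(r"\blsass\b.{0,40}\bdump|\bdump\w*\b.{0,40}\blsass\b", re.I),
     "T1003.001", "OS Credential Dumping: LSASS Memory", "strong"),
    (re.compile(r"\bmimikatz\b", re.I),
     "T1003", "OS Credential Dumping", "medium"),
    (re.compile(r"\bregsvr32(\.exe)?\s+.*?/i:\s*https?://", re.I),
     "T1218.010", "System Binary Proxy Execution: Regsvr32", "strong"),
    (re.compile(r"\bregsvr32\b", re.I),
     "T1218.010", "System Binary Proxy Execution: Regsvr32", "weak"),
]


def map_ttps(text):
    """Return one TechniqueHit per technique, keeping the strongest evidence."""
    best = {}
    for sentence in re.split(r"(?<=[.!?])\s+|\n", text):
        sentence = sentence.strip()
        for pattern, tid, name, strength in RULES:
            if not pattern.search(sentence):
                continue
            current = best.get(tid)
            if current is None or TIERS[strength] > TIERS[current.strength]:
                best[tid] = TechniqueHit(tid, name, strength, sentence)
    return sorted(best.values(), key=lambda hit: hit.technique_id)
```

A bare mention of regsvr32 yields only a weak hit. The command-line form with `/i:http` is what upgrades it to strong.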
3.3 Emit a STIX 2.1 Indicator bundle (simplified)
(STIX 2.1 is the current exchange standard for CTI; see the OASIS spec and examples.) [OASIS Open]
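A simplified bundle can be built from plain dictionaries. This sketch assumes IOCs arrive as `(type, value)` pairs and that each batch maps to one ATT&CK technique. It emits an attack-pattern, one Indicator per IOC, and an `indicates` Relationship from each Indicator to the technique. Sightings are omitted because they record an observation in your environment, not a report. In production, the `stix2` Python library can validate objects, and you would likely reuse the attack-pattern objects MITRE publishes in its ATT&CK STIX data instead of minting new IDs as done here.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt=None):
    """UTC timestamp with a trailing Z, at millisecond precision."""
    dt = dt or datetime.now(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# STIX patterns per observable type (simplified: values are not quote-escaped).
PATTERN_TEMPLATES = {
    "domain": "[domain-name:value = '{}']",
    "ipv4": "[ipv4-addr:value = '{}']",
    "url": "[url:value = '{}']",
    "sha256": "[file:hashes.'SHA-256' = '{}']",
    "md5": "[file:hashes.MD5 = '{}']",
}


def make_bundle(iocs, technique_id, technique_name, confidence=70):
    now = stix_timestamp()
    attack_pattern = {
        "type": "attack-pattern", "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}", "created": now, "modified": now,
        "name": technique_name,
        "external_references": [{"source_name": "mitre-attack", "external_id": technique_id}],
    }
    objects = [attack_pattern]
    for ioc_type, value in iocs:
        indicator = {
            "type": "indicator", "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}", "created": now, "modified": now,
            "name": f"{ioc_type}: {value}",
            "pattern": PATTERN_TEMPLATES[ioc_type].format(value),
            "pattern_type": "stix", "valid_from": now, "confidence": confidence,
        }
        relationship = {
            "type": "relationship", "spec_version": "2.1",
            "id": f"relationship--{uuid.uuid4()}", "created": now, "modified": now,
            "relationship_type": "indicates",
            "source_ref": indicator["id"], "target_ref": attack_pattern["id"],
        }
        objects += [indicator, relationship]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    bundle = make_bundle([("domain", "evil.com")], "T1003.001",
                         "OS Credential Dumping: LSASS Memory")
    print(json.dumps(bundle, indent=2))
```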
3.4 TAXII 2.1 publish (conceptual)
- POST your bundle to a TAXII 2.1 collections/{id}/objects endpoint with an API token. (See the OASIS TAXII 2.1 spec for REST details.) [docs.oasis-open.org]
- On the receiving end, MISP or OpenCTI ingests and enriches the objects (sightings, relationships, graph). [misp-project.org]
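The publish call can be sketched with the standard library's `urllib.request`. The sketch builds the request without sending it. The API root, collection ID, and token are hypothetical. The TAXII 2.1 media type and the envelope body (`{"objects": [...]}`) follow the OASIS spec. The Bearer `Authorization` scheme is an assumption that depends on your server.

```python
import json
import urllib.request

TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"


def build_taxii_add_objects(api_root, collection_id, stix_objects, token):
    """Build (but do not send) a TAXII 2.1 'Add Objects' POST request."""
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/objects/"
    body = json.dumps({"objects": stix_objects}).encode("utf-8")  # envelope
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": TAXII_MEDIA_TYPE,
            "Content-Type": TAXII_MEDIA_TYPE,
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return a TAXII status resource that reports which objects the server accepted. Poll that status before assuming delivery.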
4) Integrations that matter (and why)
| Layer | Tooling | Why it helps |
|---|---|---|
| TIP | OpenCTI, MISP | Knowledge graphing, STIX in/out, connectors, collaboration. [docs.opencti.io] |
| Exchange | STIX 2.1 / TAXII 2.1 | Vendor-neutral, standards-based sharing and publishing. [OASIS Open] |
| Mapping/coverage | MITRE ATT&CK + Navigator | Normalized TTPs and visualization of detection gaps. [MITRE ATT&CK] |
| Extraction | spaCy, Transformers (HF) | Production-grade NER plus customizable models. [spacy.io] |
5) Human-in-the-loop (HITL) keeps you honest
- Analyst review gates: promote items to “published” only after a quick precision check (especially TTP mappings).
- CISA’s ATT&CK mapping guidance: avoid “wishful mapping” and analyst bias; require evidence strings that link the text to each technique ID. [CISA]
- Feedback loops: route false positives back into training (regex tweaks, prompt updates, model fine-tuning).
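One way to enforce the review gate in code is sketched below under a hypothetical policy. A TTP mapping is publishable only if it carries a non-empty evidence string and an analyst sign-off. Rejected mappings are collected for the feedback loop. The class and function names are illustrative and not part of any TIP's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TtpMapping:
    technique_id: str
    evidence: str                   # verbatim text linking the report to the technique
    reviewer: Optional[str] = None  # analyst who approved the mapping
    rejected: bool = False

    def approve(self, analyst: str) -> None:
        self.reviewer = analyst

    def reject(self) -> None:
        self.rejected = True

    @property
    def publishable(self) -> bool:
        # Evidence string required (per the CISA guidance) plus an analyst sign-off.
        return bool(self.evidence.strip()) and self.reviewer is not None and not self.rejected


def triage(mappings: List[TtpMapping]) -> Tuple[list, list, list]:
    """Split mappings into (publish, pending review, feedback for retraining)."""
    publish = [m for m in mappings if m.publishable]
    feedback = [m for m in mappings if m.rejected]
    pending = [m for m in mappings if not m.publishable and not m.rejected]
    return publish, pending, feedback
```

An approved mapping with an empty evidence string stays in the pending queue, which is exactly the “wishful mapping” case the gate is meant to catch.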
6) Quality & ROI: measure these, or it didn’t happen
- Extraction precision/recall/F1 for IOCs and TTPs (label 200–500 sentences; refresh the set quarterly).
- Latency: ingest→publish p50/p95.
- Coverage delta: techniques with active detections before vs. after intel import (Navigator layer diff). [MITRE ATT&CK]
- SOC impact: time saved per case, auto-enrichment hit rate, share of low-risk alerts auto-closed.
- Cost vs. value: GPU/CPU time against analyst hours saved.
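The extraction, latency, and coverage metrics reduce to a few lines of standard-library Python. This sketch assumes gold labels and predictions are sets of hashable items, such as `(sentence_id, type, value)` tuples. It also assumes coverage is tracked as sets of technique IDs, for example read from two Navigator layers.

```python
import statistics


def prf1(gold, predicted):
    """Precision, recall, and F1 over sets of labelled items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def latency_p50_p95(seconds):
    """p50/p95 of ingest-to-publish latencies (needs several samples)."""
    cuts = statistics.quantiles(seconds, n=20)  # 19 cut points at 5% steps
    return cuts[9], cuts[18]


def coverage_delta(detected_before, detected_after):
    """Technique IDs that gained an active detection after the intel import."""
    return sorted(set(detected_after) - set(detected_before))
```

With 4 gold items and 3 predictions, of which 2 are correct, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.57).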
7) Production safeguards
- Confidence scoring and source weighting (vendor reputation, age, sightings).
- De-dup and decay: older IOCs are auto-downgraded unless re-sighted.
- Toxic-data filters: block attack chains copy-pasted from Reddit or unknown gists unless they are corroborated.
- Tenant-aware exports: separate workforce and customer intel where licensing requires it.
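Source weighting and decay can be combined into a single score. The sketch below uses exponential decay with a half-life measured from the last sighting. The source weights, the 30-day half-life, and the expiry floor are hypothetical numbers to tune, not recommendations from the source. A re-sighting simply resets `last_seen`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values; tune per source and per IOC type (IPs churn faster than hashes).
SOURCE_WEIGHT = {"trusted-vendor": 1.0, "isac": 0.8, "open-blog": 0.5}
DEFAULT_WEIGHT = 0.2          # unknown sources, e.g. uncorroborated gists
HALF_LIFE = timedelta(days=30)
EXPIRY_FLOOR = 10             # below this score the IOC drops off blocklists


def effective_confidence(base, source, last_seen, now=None):
    """Scale a 0-100 base confidence by source weight, halving it every
    HALF_LIFE since the IOC was last sighted."""
    now = now or datetime.now(timezone.utc)
    age = max(now - last_seen, timedelta(0))
    weight = SOURCE_WEIGHT.get(source, DEFAULT_WEIGHT)
    return round(base * weight * 0.5 ** (age / HALF_LIFE))


def is_expired(base, source, last_seen, now=None):
    return effective_confidence(base, source, last_seen, now) < EXPIRY_FLOOR
```

With these numbers, an 80-confidence IOC from a trusted vendor drops to 40 after 30 days without a sighting and expires after about 90 days.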
8) 30/60/90-day rollout
Days 1–30 (Pilot)
- Stand up OpenCTI or MISP; wire up TAXII input and attach a small set of trusted sources. [docs.opencti.io]
- Ship IOC extraction plus basic ATT&CK heuristics; publish STIX 2.1 to a sandbox collection. [OASIS Open]
- Start a 200-sentence golden set for evaluation.
Days 31–60 (Harden)
- Add a HITL UI, confidence tiers, and auto-dedup; enrich with WHOIS/passive DNS; auto-create Navigator layers for coverage reviews. [MITRE ATT&CK]
- Begin SIEM/SOAR wiring: blocklists for high-confidence IOCs, analytics for common techniques.
Days 61–90 (Operate)
- Expand TTP rules; fine-tune models on domain-specific jargon; schedule weekly metrics; open TAXII to internal consumers. [docs.oasis-open.org]
9) Playbooks
IOC → Action (high-confidence)
- Publish a STIX Indicator (plus a Sighting if it has been seen).
- Create a SOAR task to block the URL/IP/hash and hunt across the last 30 days.
- Expire the indicator after N days without sightings.
TTP → Action
- Add the ATT&CK technique to Navigator and check for a detection gap. [MITRE ATT&CK]
- If a gap exists, create a SIEM rule, Sigma rule, or JEA script task.
- Run a backfill search and open a case.
10) Build vs buy (fast guidance)
- Buy the platform; build the extractors. Most teams win with a commercial or open-source TIP plus custom NLP on top.
- Red flags: no STIX/TAXII, no ATT&CK alignment, black-box ML without feedback loops, no export to SIEM/SOAR.
FAQs
Is LLM summarization safe for CTI?
Yes, with prompt constraints, source citations, and a human approval step for high-impact summaries.
Why not rely only on regex?
Rules give precision; ML adds recall and generalizes to unseen formats. Use both.
Can we auto-map techniques?
Use weak/strong evidence tiers plus analyst review. CISA’s paper highlights common mapping errors; treat it as policy. [CISA]
Sources & primers
- MITRE ATT&CK: Enterprise matrix, techniques, and tools (Navigator). [MITRE ATT&CK]
- OASIS: STIX 2.1 spec and examples; TAXII 2.1 spec and intro docs. [OASIS Open; oasis-open.github.io]
- MISP project docs; OpenCTI docs and repo. [misp-project.org; GitHub]
- spaCy NER API and spaCy 101; Hugging Face token-classification pipelines. [spacy.io]
- CISA: Best Practices for MITRE ATT&CK Mapping (analyst bias and evidence). [CISA]