Author: CyberDudeBivash
Powered by: CyberDudeBivash Brand | cyberdudebivash.com
Related: cyberbivash.blogspot.com
Zero-days, exploit breakdowns, IOCs, detection rules & mitigation playbooks.
By Authority of: CyberDudeBivash
The era of "Cryptojacking" has evolved. While hackers once scrambled for your CPU to mine Bitcoin, they are now hunting your GPU to run Large Language Models. This is LLMjacking.
In this guide, we’ll break down how this exploit works and, more importantly, how you can build a fortress around your Ollama or local AI instance.
1. What is LLMjacking?
LLMjacking occurs when an attacker gains unauthorized access to a local AI server (like Ollama) to steal its "inference power."
The Exploit Mechanism
Scanning: Attackers use automated tools to scan the internet for port 11434 (Ollama's default).
Infiltration: Because most users don't set up an authentication layer, the attacker finds an open API.
The Theft: The attacker sends complex prompts to your server. Your GPU works at 100% capacity to generate responses for their application.
The Cost: You pay the electricity bill and suffer massive system lag; the attacker gets a free, high-performance AI API.
2. The CyberDudeBivash "Steel Wall" Defense
To stop LLMjacking, we must move from a "Public" state to a "Hardened" state. Follow these five steps to secure your server.
Step 1: Bind to Localhost (The Foundation)
Never allow Ollama to listen to the open web directly. Ensure your environment variables are set so Ollama only talks to your own machine.
Linux/systemd: Set OLLAMA_HOST=127.0.0.1 in your service file.
Docker: Do not map port 11434:11434 to the host. Instead, use internal container networking.
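As a concrete sketch for the systemd case (paths assume a standard package install of Ollama), a drop-in override keeps the setting from being clobbered by updates:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Apply with: sudo systemctl daemon-reload && sudo systemctl restart ollama
[Service]
Environment="OLLAMA_HOST=127.0.0.1"
```

Verify the binding with `ss -tlnp | grep 11434`; the listener should show 127.0.0.1, not 0.0.0.0.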
Step 2: Deploy the Nginx "Bouncer"
Since Ollama has no built-in password, we put a "Bouncer" (Nginx) in front of it. This requires every visitor to show an ID card (Username/Password).
Refer to our previous guide on Nginx Basic Auth for the configuration details.
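For reference, a minimal Nginx server block implementing the "Bouncer" might look like this (the domain, certificate paths, and htpasswd file are placeholders for your own setup):

```nginx
server {
    listen 443 ssl;
    server_name yourdomain.com;

    # Certificates issued by Let's Encrypt (see Step 3)
    ssl_certificate     /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        # The "ID check": file created with `htpasswd -c /etc/nginx/.htpasswd youruser`
        auth_basic           "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;

        # Forward authenticated traffic to the localhost-only Ollama
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
```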
Step 3: Encrypt with SSL (The Secret Code)
Without SSL (HTTPS), your password is sent in plain text. Using Let’s Encrypt ensures that even if someone intercepts the traffic, they can't read your credentials.
Step 4: Rate Limiting (The Anti-Spam)
LLM queries are resource-heavy. By setting a rate limit in Nginx (e.g., 2 requests per second), you prevent an attacker from flooding your GPU with thousands of tokens, even if they somehow bypass your password.
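A minimal sketch of that limit in Nginx (the zone name and burst size are arbitrary choices, not requirements):

```nginx
# In the http {} block: track clients by IP, allowing 2 requests/second
limit_req_zone $binary_remote_addr zone=llm_limit:10m rate=2r/s;

# In the location {} block that proxies to Ollama:
limit_req zone=llm_limit burst=5 nodelay;
limit_req_status 429;  # Tell well-behaved clients to back off
```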
Step 5: Fail2Ban (The Ban Hammer)
Automate your defense. If an IP address tries to guess your password three times and fails, Fail2Ban should block that IP at the firewall level for 24 hours.
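A sketch of the corresponding jail (Fail2Ban ships an `nginx-http-auth` filter that matches Basic Auth failures in the error log; the log path assumes a default Debian/Ubuntu Nginx install):

```ini
# /etc/fail2ban/jail.local
[nginx-http-auth]
enabled  = true
port     = http,https
logpath  = /var/log/nginx/error.log
maxretry = 3
bantime  = 86400  ; 24 hours
```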
3. Verification Checklist
Run these tests to ensure you are safe:
Can I access http://[Your-IP]:11434? (Answer should be NO.)
Does https://yourdomain.com ask for a password? (Answer should be YES.)
Does my GPU usage spike when I'm not using it? (Check via nvidia-smi or htop.)
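The first check can be automated with a short Python probe. Run it from a machine outside your network and substitute your server's public IP; a hardened server should report the port as closed:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with your server's PUBLIC IP when testing from outside.
    print("Port 11434 reachable:", port_open("127.0.0.1", 11434))
```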
The Bottom Line
AI is the most expensive computing resource you own. Leaving an Ollama server unsecured in 2026 is the digital equivalent of leaving a gold bar on your front porch. Lock it down.
CyberDudeBivash Final Word: "Don't let your hardware work for the enemy. Encrypt, Authenticate, and Monitor."
To combat LLMjacking, we don't just want a passive firewall; we want an active alarm system. This script acts as a "tripwire"—if your GPU utilization stays above a certain threshold (e.g., 80%) for too long while you aren't using it, it sends an emergency alert to your phone via Telegram.
Step 1: Get Your Telegram Credentials
Bot Token: Message @BotFather on Telegram, send /newbot, and follow the prompts to get your API token.
Chat ID: Message @userinfobot to get your unique Chat ID.
Step 2: Install the Python Dependencies
We will use nvitop (or pynvml) to pull real-time NVIDIA data.
pip install nvitop requests
Step 3: The "CyberDudeBivash" Tripwire Script
Create a file named gpu_shield.py and paste the following:
import time

import requests
from nvitop import Device

# --- CONFIGURATION ---
TELEGRAM_TOKEN = "YOUR_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
THRESHOLD_PERCENT = 80.0  # Alert if GPU utilization exceeds 80%
CHECK_INTERVAL = 30       # Check every 30 seconds
STRIKE_LIMIT = 2          # Alert after 2 consecutive high readings (60 seconds)


def send_telegram_alert(message):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"}
    try:
        requests.post(url, json=payload, timeout=10)
    except requests.RequestException as e:
        print(f"Error sending alert: {e}")


def monitor_gpu():
    strikes = 0
    print("CyberDudeBivash GPU Shield active...")
    while True:
        for device in Device.all():
            utilization = device.gpu_utilization()
            if utilization > THRESHOLD_PERCENT:
                strikes += 1
                print(f"Warning: GPU {device.index} at {utilization}% (strike {strikes})")
            else:
                strikes = 0  # Reset if usage drops
            if strikes >= STRIKE_LIMIT:
                alert_msg = (
                    f"*LLMjacking Alert!*\n"
                    f"High GPU activity detected on {device.name()}.\n"
                    f"Current load: {utilization}%\n"
                    f"Check your Ollama logs immediately!"
                )
                send_telegram_alert(alert_msg)
                strikes = 0  # Reset after sending the alert
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    monitor_gpu()
Step 4: Running it as a Background Service
To ensure this script runs 24/7 even after you close your terminal, use PM2 or a systemd service.
Using PM2 (easiest):
sudo npm install -g pm2
pm2 start gpu_shield.py --interpreter python3
pm2 save
pm2 startup
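If you prefer systemd over PM2, an equivalent unit file might look like this (the script path and user name are assumptions; adjust them to your environment):

```ini
# /etc/systemd/system/gpu-shield.service
# Enable with: sudo systemctl enable --now gpu-shield
[Unit]
Description=CyberDudeBivash GPU Shield
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/gpu_shield.py
Restart=always
User=your_username

[Install]
WantedBy=multi-user.target
```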
Why this works against LLMjacking
Attackers don't just run one small query; they flood your server with high-token-count requests to maximize their "theft." This causes your GPU to stay at high utilization for minutes or hours.
Legitimate Use: You usually know when you are running a model.
LLMjacking: You get a notification while you're away or asleep.
This is the final tier of the CyberDudeBivash defense strategy: Active Countermeasures.
If the "Tripwire" script detects that your GPU has been pinned for a sustained period (indicating a high-token LLMjacking attack), it will automatically execute an "Emergency Shutdown" of the Ollama service and alert you.
The "Emergency Kill" Upgrade
We will update your previous script to include a Strike System with a hard-kill command.
Updated gpu_shield.py:

import subprocess
import time

import requests
from nvitop import Device

# --- CONFIGURATION ---
TELEGRAM_TOKEN = "YOUR_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
THRESHOLD_PERCENT = 85.0  # High-usage threshold
STRIKE_LIMIT = 10         # 10 strikes at 30s intervals = 5 minutes of constant high usage
CHECK_INTERVAL = 30


def send_telegram_alert(message):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"}
    try:
        requests.post(url, json=payload, timeout=10)
    except requests.RequestException:
        pass


def emergency_shutdown():
    """Shut down the Ollama service to protect hardware and stop the theft."""
    print("CRITICAL: Sustained attack detected. Shutting down Ollama...")
    try:
        # Stop the systemd service
        subprocess.run(["sudo", "systemctl", "stop", "ollama"], check=True)
        # Force-kill any lingering processes
        subprocess.run(["sudo", "pkill", "-9", "ollama"], check=False)
        return True
    except Exception as e:
        print(f"Failed to stop service: {e}")
        return False


def monitor_gpu():
    strikes = 0
    print("CyberDudeBivash Active Defense System engaged...")
    while True:
        for device in Device.all():
            utilization = device.gpu_utilization()
            if utilization > THRESHOLD_PERCENT:
                strikes += 1
                print(f"High usage: {utilization}% (strike {strikes}/{STRIKE_LIMIT})")
            else:
                if strikes > 0:
                    print("Usage normalized. Resetting strikes.")
                strikes = 0
            if strikes >= STRIKE_LIMIT:
                if emergency_shutdown():
                    msg = ("*EMERGENCY SHUTDOWN EXECUTED*\n"
                           "Sustained high GPU load (5+ minutes) detected. "
                           "Ollama has been killed to prevent further theft.")
                else:
                    msg = ("*SHUTDOWN FAILED*\n"
                           "Sustained attack detected but could not stop Ollama. "
                           "Check the server immediately!")
                send_telegram_alert(msg)
                strikes = 0  # Reset and wait for a manual restart
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    monitor_gpu()

Important: Granting "Kill" Permissions
Since the script needs sudo to stop a system service, you must allow your user to run systemctl stop ollama without a password; otherwise, the script will hang.
Run sudo visudo and add this line at the bottom (replace your_username with your Linux user):
your_username ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop ollama, /usr/bin/pkill -9 ollama
The "CyberDudeBivash" Hardened Stack Recap
Infrastructure: Ollama on Localhost.
Gateway: Nginx + SSL + Basic Auth.
Traffic Control: Rate Limiting (Nginx).
Intrusion Detection: Fail2Ban (Bans failed logins).
Active Countermeasure: gpu_shield.py (kills the service if theft occurs).
CyberDudeBivash Final Note: "Authentication keeps out the honest hackers; automation stops the smart ones. You've officially turned your server from a victim into a fortress."
Unlike a standard web hack, a compromised AI server involves unique risks like Model Poisoning (corrupting your AI's logic) and Resource Hijacking. Here is the definitive recovery checklist.
Post-Incident Recovery Checklist
Immediate Containment
Kill the Service: Stop the Ollama process immediately (sudo systemctl stop ollama) to sever any active attacker connections.
Sever Network Exposure: Bind Ollama to 127.0.0.1 and close port 11434 on your firewall.
Isolate GPU/NPU: In high-security environments, restart the machine to clear the GPU's VRAM, ensuring no malicious resident code remains in memory.
Eradication & Malware Hunting
Audit Model Integrity: Attackers can upload "poisoned" models. Delete all models in your ~/.ollama/models folder and re-download them from official sources (ollama pull).
Scan for RCE Footprints: Check /tmp and %TEMP% directories for suspicious executables. Exploits like CVE-2024-37032 can leave behind reverse shells or miners.
Check for Persistence: Review your crontab and systemd services for any new, unrecognized entries that might restart a miner or a backdoor.
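The /tmp sweep can be scripted. This sketch (Linux-only; the defaults are assumptions) lists recently modified files with the executable bit set, a common pattern for dropped miners:

```python
import os
import stat
import time


def suspicious_tmp_files(root: str = "/tmp", max_age_days: int = 7) -> list:
    """Return paths under `root` that are executable and recently modified."""
    cutoff = time.time() - max_age_days * 86400
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # File vanished or is unreadable; skip it
            if st.st_mode & stat.S_IXUSR and st.st_mtime > cutoff:
                hits.append(path)
    return hits


if __name__ == "__main__":
    for path in suspicious_tmp_files():
        print("Review:", path)
```

Anything it prints deserves a manual look; legitimate build artifacts also land in /tmp, so treat the output as leads, not verdicts.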
Forensics & Investigation
Analyze Ollama Logs: Look for high-volume requests in journalctl -u ollama. Note the IP addresses; these are your primary attackers.
Audit Tool-Calling: If you had "tools" or "functions" enabled, check your system logs for unauthorized API calls or database queries executed by the AI.
Monitor for Data Exfiltration: Review outbound network traffic for spikes. Attackers may have used your model to process and "leak" local files.
Hardening & Restoration
Update Ollama: Make sure you are on the latest release; older versions are affected by out-of-bounds write and path traversal vulnerabilities (including CVE-2024-37032).
Reset API Keys: If your Ollama server was connected to other apps (like LangChain or an Nginx proxy), rotate all associated API keys and passwords immediately.
Enable Logging: Configure Nginx to log not just the access, but the specific headers to better track future attempts.
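A sketch of such a log format (the field choice is illustrative; capture additional headers via `$http_*` variables as needed):

```nginx
# In the http {} block: capture User-Agent and timing for forensics
log_format llm_audit '$remote_addr - $remote_user [$time_local] '
                     '"$request" $status $body_bytes_sent '
                     '"$http_user_agent" rt=$request_time';

# In the Ollama server block:
access_log /var/log/nginx/ollama_audit.log llm_audit;
```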
The Clean Slate Strategy
If you suspect deep compromise (RCE), the safest path is to reimage the OS.
CyberDudeBivash Warning: "AI models are data, but they execute like code. If a model was swapped, your entire application's logic is now untrustworthy. When in doubt, wipe and rebuild."
Final Summary
| Incident Phase | Key Action |
| --- | --- |
| Detection | GPU usage spikes + port 11434 exposure. |
| Protection | Nginx reverse proxy + SSL + Basic Auth. |
| Monitoring | gpu_shield.py + Fail2Ban. |
| Recovery | Delete local models, update version, and rotate keys. |
#AISecurity #LLMSecurity #Ollama #GenerativeAI #ModelInversion #AdversarialAI #AIInfrastructure #CyberDudeBivash
