RTL / LTR Scripts & Browser Gaps: How Attackers Hide Malicious URLs
By CyberDudeBivash (Bivash Kumar Nayak)
cyberdudebivash.com | cyberbivash.blogspot.com | cryptobivash.code.blog
TL;DR
Attackers abuse Unicode bidirectional controls (e.g., RIGHT-TO-LEFT OVERRIDE U+202E), mixed-script homoglyphs, and browser rendering quirks to make malicious URLs look benign in addresses, filenames, emails and UIs. This allows silent phishing, file-name spoofing, and evasion of basic URL filtering. Defenders must normalize and inspect for invisible bidi characters, enforce IDN/punycode display rules, and add logging & detection for mixed-script URLs.
How the trick works — short & precise
- Bidi control characters (the U+202E override, the U+202A embedding, etc.) change the visual order of text. Example: evilexe\u202Egnp.exe may render as evilexeexe.png to a user, because everything after the override is drawn right to left, while the real filename still ends in .exe.
- Mixed-script homoglyphs replace characters (e.g., Latin a with Cyrillic а), so apple.com looks identical but the Unicode code points differ.
- Punycode / IDN tricks let attackers register domain names that visually match popular domains but are different under the hood (e.g., xn--pple-43d.com).
- Browser & app display differences: some browsers/panels render bidi markers or decode IDNs differently (address bar vs. tab title vs. link text), creating user confusion.
- Result: users click links that look safe; attackers get clicks into credential-harvesting pages, drive-by exploits, or spoofed download filenames.
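To make the invisible part visible, here is a minimal Python sketch that prints the code point behind each character. The helper name describe is made up for illustration; the strings are the examples from the list above.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]

filename = "evilexe\u202egnp.exe"   # the override sits right after "evilexe"
latin = "apple.com"                 # all Latin
spoof = "\u0430pple.com"            # first letter is Cyrillic

print(describe(filename)[7])   # U+202E RIGHT-TO-LEFT OVERRIDE
print(describe(latin)[0])      # U+0061 LATIN SMALL LETTER A
print(describe(spoof)[0])      # U+0430 CYRILLIC SMALL LETTER A
print(latin == spoof)          # False: same look, different code points
```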
Real-world attack patterns
- Phishing email with anchor text https://bank.example.com but the actual href uses mixed scripts or RTL overrides to point to hxxp://evil.example.
- Malicious attachment named invoice\u202Efdp.exe that appears as invoiceexe.pdf in some file managers.
- Fake login pages hosted on IDN domains that visually imitate a well-known name such as google.com.
- Adtech / redirected URLs that use URL shorteners containing bidi or homoglyphs so analysts misread the landing domain in logs.
Detection — SOC & devnotes (practical, deployable)
1) Quick detection regexes & checks
- Detect bidi control characters in URLs/filenames (U+202A..U+202E, U+200E, U+200F):
  - Regex (PCRE): [\x{202A}-\x{202E}\x{200E}\x{200F}]
- Detect mixed scripts (Latin + Cyrillic + Greek) in a single domain label:
  - Heuristic: if more than 1 script class is present in the same label → flag.
- Detect Punycode (IDN) domains:
  - Regex (case-insensitive): (^|\.)xn--[0-9a-z\-]+
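The three checks above could be sketched in Python roughly as follows. The function names are illustrative. Two assumptions go beyond the original list: the bidi pattern also covers the isolate controls U+2066..U+2069, and, because the standard library exposes no Unicode script property, the script class is approximated from the first word of each letter's Unicode name.

```python
import re
import unicodedata

# Bidi controls from the list above; the isolates U+2066..U+2069 are an
# addition made in this sketch (assumption).
BIDI_RE = re.compile(r"[\u202a-\u202e\u200e\u200f\u2066-\u2069]")
PUNYCODE_RE = re.compile(r"(^|\.)xn--[0-9a-z\-]+", re.IGNORECASE)

def has_bidi(text: str) -> bool:
    """True if text contains any of the bidi control characters."""
    return BIDI_RE.search(text) is not None

def script_classes(label: str) -> set[str]:
    """Approximate the scripts in a label: the first word of a letter's
    Unicode name (LATIN, CYRILLIC, GREEK, ...). Digits and hyphens are skipped."""
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in label
        if ch.isalpha()
    }

def is_mixed_script(host: str) -> bool:
    """True if any single label of the host mixes more than one script class."""
    return any(len(script_classes(label)) > 1 for label in host.split("."))

def is_punycode(host: str) -> bool:
    """True if any label of the host starts with the xn-- ACE prefix."""
    return PUNYCODE_RE.search(host) is not None
```

Expect false positives from the mixed-script heuristic on some legitimate internationalized names, so it is better suited to flagging than to blocking.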
2) Sigma-style hunt (pseudo)
3) Endpoint/EDR checks
- Alert on downloads whose filename contains bidi chars or more than one script class.
- Monitor browser navigation events where the destination host contains xn-- (Punycode) or suspicious mixed-script labels.
4) SIEM enrichment
- Normalize logged URLs to code point sequences and store both “visual” (rendered) and “raw” forms. Flag differences between link text and href. Correlate with user-click events.
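One way to approach this enrichment in Python is sketched below. The names to_codepoints, host_of and link_mismatch are hypothetical, and the comparison is deliberately simple: it only flags anchor text that itself looks like a hostname and differs from the href host.

```python
from urllib.parse import urlsplit

def to_codepoints(text: str) -> str:
    """Raw form for logs: printable ASCII stays readable, other code points
    become backslash escapes such as \\u202e."""
    return text.encode("unicode_escape").decode("ascii")

def host_of(value: str) -> str:
    """Lower-cased hostname of a URL or bare domain, or '' if none parses."""
    if "://" not in value:
        value = "//" + value
    try:
        return (urlsplit(value).hostname or "").lower()
    except ValueError:
        return ""

def link_mismatch(anchor_text: str, href: str) -> bool:
    """True when the anchor text names a host that differs from the href host."""
    shown = host_of(anchor_text.strip())
    actual = host_of(href.strip())
    return "." in shown and shown != actual

print(to_codepoints("evilexe\u202egnp.exe"))
print(link_mismatch("https://bank.example.com", "http://evil.example/login"))
```

Storing the escaped form next to the original string lets an analyst see a hidden override or homoglyph directly in the log line.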
Mitigation & hardening (short → mid → long)
Immediate (hours → days)
- Canonicalize & normalize incoming URLs in mail gateways and web proxies: remove or encode bidi control characters, and compare normalized hostnames to blocklists.
- Force display of raw IDN/punycode in admin/privileged UIs (show xn--), or show an unmistakable icon/tooltip when an IDN is used.
- Disable auto-execution of downloaded files and show the full file name, including hidden characters, in download dialogs.
- Email gateway rules: if the anchor text and the href point to different domains, treat the message as suspicious and quarantine it.
- User education: show examples of RLO tricks and instruct users to always hover over a link and inspect the full URL.
Mid-term (weeks)
- Policy: block or warn on IDNs in critical systems and require allowlisting of domains for admin users.
- Browser hardening: apply enterprise policies that force punycode display for IDNs and restrict permissive rendering of bidi markers, where the browser offers such settings (availability varies by browser).
- Dev/CI controls: sanitize filenames from uploads and downloads (strip bidi + invisible controls).
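A minimal sketch of the filename sanitizing step, assuming that dropping every code point in the Unicode categories Cc (control) and Cf (format) is acceptable. The function name sanitize_filename is illustrative.

```python
import unicodedata

def sanitize_filename(name: str) -> str:
    """Drop control (Cc) and format (Cf) code points. Cf covers the bidi
    controls (U+202A..U+202E, U+200E, U+200F, U+2066..U+2069) and zero-width
    characters such as U+200B."""
    return "".join(
        ch for ch in name if unicodedata.category(ch) not in ("Cc", "Cf")
    )

print(sanitize_filename("evilexe\u202egnp.exe"))  # evilexegnp.exe
```

This is a blunt filter: Cf also contains the zero-width joiner and non-joiner, which some scripts and emoji sequences use legitimately, so a narrower character list may suit user-facing names better.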
Long-term (months)
- Platform fixes: work with vendors (browser, mail client, file explorer vendors) to ensure consistent display of Unicode controls and to show raw machine-readable names on hover.
- Domain & trademark monitoring: proactively monitor IDN registrations for target brand look-alikes.
Defensive coding checklist (for dev teams)
- When validating URLs: check whether the href matches the visible text; if they differ, require user confirmation.
- Strip control characters from filenames and URL path segments before saving or executing.
- Convert IDN domains to punycode and validate against allowlists for sensitive flows.
- Log both rendered and raw forms of user-supplied URLs for incident triage.
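The punycode-plus-allowlist item could look roughly like this in Python. ALLOWED_HOSTS and both function names are hypothetical. The sketch uses the standard library's idna codec, which implements the older IDNA 2003 rules; a production system may prefer a library that follows IDNA 2008.

```python
# Hypothetical allowlist for a sensitive flow; entries are ASCII hostnames.
ALLOWED_HOSTS = {"bank.example.com"}

def to_ascii_host(host: str) -> str:
    """Convert a hostname to its ASCII (punycode) form with the stdlib
    IDNA 2003 codec. Raises UnicodeError for names the codec rejects."""
    return host.strip().rstrip(".").lower().encode("idna").decode("ascii")

def is_allowed(host: str) -> bool:
    """Allow a host only if its ASCII form is on the allowlist."""
    try:
        return to_ascii_host(host) in ALLOWED_HOSTS
    except UnicodeError:
        return False

print(to_ascii_host("\u0430pple.com"))   # xn--pple-43d.com
print(is_allowed("bank.example.com"))    # True
print(is_allowed("\u0430pple.com"))      # False
```

Comparing the ASCII form rather than the displayed string means a homoglyph look-alike can never match an allowlisted entry by appearance alone.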
For phishing analysts — quick triage workflow
- Hover over the link → copy the href and paste it into a text editor that shows invisible chars (e.g., hex view).
- If the URL contains xn-- or bidi chars, look up WHOIS for the punycode domain and use a controlled sandbox to screenshot the landing page.
- Check the certificate subject for a mismatch (IDN abuse often lacks a valid cert for the brand).
- Check web proxy logs for repeated short-lived IDN or mixed-script domains.
IoCs & triage rules (examples)
- Filenames containing \u202E or other bidi code points.
- Domains with xn-- labels that resolve to uncommon hosts.
- URL anchor text that visually equals a popular domain but whose href points elsewhere.
Incident response (if users clicked / infection suspected)
- Contain: isolate the affected host and capture browser process memory & network connections.
- Collect: browser history, download folder (show raw filenames), clipboard contents, and email source.
- Hunt: search the fleet for other users who received identical emails or who visited the same IDN domain.
- Remediate: rotate credentials, revoke sessions, remove any dropped payloads. Reimage if arbitrary code execution is found.
User awareness messaging (short, pasteable)
- “If a link looks like a trusted site but came from an email or ad, hover over it and check the actual href. If the address contains odd characters, or you see xn-- in the domain, don’t click and report it to security.”
Why browsers & apps differ (why this remains an issue)
- Unicode is complex and the Unicode Bidi algorithm was designed for correct rendering of mixed-direction text (Hebrew/Arabic + Latin). Browsers and apps historically prioritized user-friendly rendering over security; subtle differences in how address bars, tab titles, and link text are rendered cause spoofing opportunities. Vendors have made improvements, but new variants (homoglyphs + mixed-script) keep appearing.
Quick policy templates (for CISO/Security Ops)
- Blocklist policy: block all inbound emails with hrefs where the anchor_text domain != the href domain, or where the href contains bidi controls or xn--, unless pre-approved.
- Privileged user rule: admin consoles must be accessible only from devices with IDN display enforcement and no third-party ads.
#CyberDudeBivash #Bidi #RTL #Phishing #URLSpoofing #IDN #Punycode #DotNet #ThreatIntel #Cybersecurity