WebDecoy Bot Scanner: Product Roadmap 2025
WebDecoy Team
We’re excited to announce Bot Scanner, a new detection capability that extends WebDecoy’s bot defense beyond honeypots. This guide explains what Bot Scanner is, how it works, and our 6-week implementation timeline.
What is Bot Scanner?
Bot Scanner is a lightweight JavaScript-based detection system that identifies suspicious traffic patterns in real-time. Unlike passive honeypots that wait for bots to interact with decoys, Bot Scanner actively analyzes visitor behavior and reports threats immediately.
Key capabilities:
- Detects automation tools and headless browsers
- Identifies AI crawlers requesting your content
- Analyzes behavioral patterns to spot suspicious activity
- Enriches detections with threat intelligence
- Works alongside your existing honeypot defenses
Why Bot Scanner Matters
Current limitations of honeypots:
- Require bots to click on decoy links
- Don’t catch bots that read your sitemap or follow robots.txt
- Can’t detect headless browsers that don’t visit any decoys
- Miss AI crawlers that only request HTML without executing JavaScript
What Bot Scanner solves:
- Detects bots even if they never click on honeypots
- Catches headless browsers with 90%+ accuracy
- Identifies AI crawlers before they access your content
- Provides detailed enrichment data (IP threat level, datacenter detection, bot identity)
- Enables precise policy controls (allow/detect/block per bot type)
How Bot Scanner Works
The Detection Pipeline
┌─────────────────────────────────────────────────────────┐
│ Your Website + Bot Scanner JS Snippet │
│ (< 10KB gzipped, < 50ms overhead) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────┴────────────────┐
│ Suspicious Score >= Threshold? │
└────────────────┬────────────────┘
│ YES
┌────────────────▼────────────────┐
│ Send Signal Report (sendBeacon) │
│ POST /v1/detect │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ Ingest Service Enriches Detection │
│ • IP threat intelligence (GreyNoise, AbuseIPDB) │
│ • Reverse DNS bot verification │
│ • Datacenter/VPN/Proxy/Tor detection │
│ • AI crawler user-agent matching │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────┐
│ Rule Engine Evaluates │
│ (PostgreSQL NOTIFY: ~5-10ms) │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ Execute Response Actions (Parallel) │
│ • Cloudflare IP block │
│ • AWS WAF IP block │
│ • Slack/webhook notification │
│ • Datadog/Splunk events │
└─────────────────────────────────────────────────────┘
How Detection Works
Bot Scanner combines multiple analysis techniques:
Multi-layered detection:
- Client-side pattern recognition to identify suspicious visitor behavior
- Behavioral analysis of human vs. bot interaction patterns
- Device fingerprinting to detect spoofed or headless browsers
- AI crawler identification through multiple verification methods
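To make the first two layers concrete, here is a hypothetical sketch of classic client-side headless signals. This is not Bot Scanner’s actual signal set; it takes a navigator-like object as a parameter so the logic can run outside a browser (pass `window.navigator` in a real page):

```javascript
// Hypothetical sketch of client-side headless-browser signals.
// `nav` is a navigator-like object (pass `window.navigator` in a browser).
function headlessSignals(nav) {
  const signals = [];
  if (nav.webdriver) signals.push("webdriver-flag"); // set by Selenium/Playwright/Puppeteer
  if (!nav.plugins || nav.plugins.length === 0) signals.push("no-plugins");
  if (/HeadlessChrome/.test(nav.userAgent || "")) signals.push("headless-ua");
  if (Array.isArray(nav.languages) && nav.languages.length === 0) signals.push("no-languages");
  return signals;
}
```

Each signal alone is weak; a real detector accumulates them into a score rather than blocking on any single check.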
Real-time enrichment:
- Every detection is cross-referenced with threat intelligence databases
- IP reputation and threat classification from industry-standard services
- Geolocation and infrastructure analysis for context
Server-Side Enrichment
When Bot Scanner detects suspicious activity, the Ingest Service enriches the signal:
| Data Source | Information | Latency |
|---|---|---|
| GreyNoise | Threat classification (benign, unknown, malicious) | <100ms |
| AbuseIPDB | Abuse report count and score | <100ms |
| IPQualityScore | Fraud score, VPN/proxy/Tor detection | <100ms |
| MaxMind GeoLite2 | Geolocation and ASN lookup | <5ms (local) |
| Reverse DNS | Bot name verification with 24hr cache | <50ms |
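The reverse DNS row deserves a closer look, since forward-confirmed reverse DNS is the standard way to verify crawlers like Googlebot. The sketch below takes an injectable resolver so it can be tested without the network; in production you would pass `require("dns").promises`:

```javascript
// Sketch of forward-confirmed reverse DNS bot verification.
// `resolver` is injectable for testing; pass require("dns").promises in production.
async function verifyBotHost(ip, allowedSuffixes, resolver) {
  try {
    const hosts = await resolver.reverse(ip); // PTR lookup
    for (const host of hosts) {
      if (!allowedSuffixes.some((s) => host.endsWith(s))) continue;
      const addrs = await resolver.resolve4(host); // forward confirmation
      if (addrs.includes(ip)) return { verified: true, host };
    }
  } catch (err) {
    // NXDOMAIN or timeout: treat as unverified
  }
  return { verified: false, host: null };
}
```

The forward-confirmation step is what prevents spoofing: anyone can fake a User-Agent, but only the real operator controls both the PTR record and the forward A record.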
Performance Targets
Bot Scanner is designed for zero-impact integration:
| Metric | Target |
|---|---|
| JS snippet size | < 10KB gzipped |
| Page load overhead | < 50ms |
| Time to block (detection → Cloudflare) | < 1 second |
| Detection accuracy | > 95% |
| False positive rate | < 0.1% |
| Cache hit rate | > 80% |
Implementation Roadmap
Phase 1: JS Snippet Development (Week 1)
Deliverables:
- Minified `s.js` script (~8-10KB)
- CDN deployment at `https://cdn.webdecoy.io/s.js`
- Configurable via `data-account`, `data-scanner`, `data-threshold` attributes
Key features:
- Detection for Puppeteer, Playwright, Selenium (90%+ accuracy)
- AI crawler User-Agent matching
- Behavioral tracking (mouse entropy, interaction timing)
- Optional honeypot injection (hidden links/form fields)
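The mouse-entropy idea can be illustrated with Shannon entropy over movement deltas. This is a toy version for intuition only; the production heuristic is more involved:

```javascript
// Toy Shannon entropy over mouse-movement deltas. Scripted cursors often
// move in perfectly uniform steps (entropy 0); human motion does not.
function movementEntropy(deltas) {
  const counts = new Map();
  for (const d of deltas) counts.set(d, (counts.get(d) || 0) + 1);
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / deltas.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```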
Installation snippet:
```html
<script
  src="https://cdn.webdecoy.io/s.js"
  data-account="acc_7f8d9e3a"
  data-scanner="scn_4b5c6d"
  async
></script>
```
Phase 2: Ingest Service — /v1/detect Endpoint (Week 2)
Deliverables:
- New `POST /v1/detect` endpoint for JS snippet payloads
- IP enrichment pipeline (GreyNoise, AbuseIPDB, IPQS, MaxMind)
- Bot verification via reverse DNS
- Final scoring combining client + server signals
- Storage with full signal data
Performance targets:
- < 500ms from request to detection stored + NOTIFY
- > 80% cache hit rate after warmup
- Zero false positives on verified Googlebot/Bingbot
Acceptance criteria:
- Accurate IP threat classification
- Verified bots never falsely blocked
- Cache warming on startup
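In spirit, the “final scoring combining client + server signals” deliverable could look like the following sketch. The weights are placeholders, not production values; the verified-bot override reflects the “verified bots never falsely blocked” criterion:

```javascript
// Illustrative merge of the client-side score with server enrichment.
// Weights are assumptions; verified bots short-circuit to zero so that
// Googlebot/Bingbot can never be falsely blocked.
function finalScore(clientScore, enrichment) {
  if (enrichment.verifiedBot) return 0; // verified Googlebot/Bingbot: never block
  let score = clientScore; // 0-100 from the JS snippet
  if (enrichment.threatClass === "malicious") score += 40; // e.g. GreyNoise classification
  if (enrichment.datacenter) score += 20;
  if (enrichment.abuseReports > 10) score += 15;
  return Math.min(score, 100);
}
```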
Phase 3: Rule Engine — Response Actions (Week 3)
Deliverables:
- 6 response action integrations:
- Cloudflare IP list blocking
- AWS WAF IP set integration
- Slack notifications
- Webhook execution (HMAC-signed)
- Datadog events
- Splunk HEC
Features:
- Configurable per account with minimum threat level
- Retry logic with exponential backoff
- Parallel action execution
- Complete logging and audit trail
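The retry-with-exponential-backoff behavior can be sketched generically (the attempt count and delays are illustrative):

```javascript
// Retry an async action up to `attempts` times, doubling the delay after each
// failure (e.g. 200ms, 400ms, 800ms). Rethrows the last error when exhausted.
async function withRetry(action, attempts = 3, baseMs = 200) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```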
Acceptance criteria:
- < 1 second from detection to Cloudflare block
- 3x retry on failures
- 99% action execution success rate
Phase 4: Backend API + Dashboard (Weeks 4-5)
Deliverables:
- Bot Scanner in decoy creation flow
- Configuration UI (sensitivity, module toggles)
- Snippet generator with auto-copy
- Detection explorer with source filtering
- Bot policy management interface
- AI crawler analytics dashboard
Key capabilities:
- Sensitivity levels (low/medium/high) mapping to thresholds
- Per-module toggle (automation, headless, AI crawlers, behavioral, fingerprinting)
- Real-time detection list
- Detailed detection view with all signals
- Bot policies (allow/detect/block) per bot type
- Analytics on detection trends
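The sensitivity-to-threshold mapping might look like this; the numeric cutoffs are placeholders, not documented values:

```javascript
// Hypothetical mapping of dashboard sensitivity levels to score thresholds.
const SENSITIVITY_THRESHOLDS = { low: 80, medium: 60, high: 40 };

// A detection is reported only when its score clears the configured threshold.
function shouldReport(score, sensitivity) {
  const threshold = SENSITIVITY_THRESHOLDS[sensitivity] ?? SENSITIVITY_THRESHOLDS.medium;
  return score >= threshold;
}
```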
Phase 5: Testing & Documentation (Weeks 5-6)
Deliverables:
- Test suite (80%+ coverage on critical paths)
- Load testing results (1000 detections/min)
- E2E tests: Puppeteer → detection → Cloudflare block
- E2E tests: Fake GPTBot → detection → alert
- False positive testing on real browsers/mobile
- Customer integration guide
- API documentation
Supported Bot Detection Categories
AI Crawlers (20+ Patterns)
| Crawler | Company | Detection Method |
|---|---|---|
| GPTBot, ChatGPT-User, OAI-SearchBot | OpenAI | UA + reverse DNS |
| ClaudeBot, Anthropic-AI | Anthropic | UA + reverse DNS |
| Google-Extended | Google | UA + reverse DNS |
| Meta-ExternalAgent | Meta | UA + reverse DNS |
| PerplexityBot | Perplexity | UA + reverse DNS |
| ByteSpider | ByteDance | UA + reverse DNS |
| AmazonBot | Amazon | UA + reverse DNS |
| CCBot | Common Crawl | UA + reverse DNS |
| YouBot | You.com | UA + reverse DNS |
| Diffbot | Diffbot | UA + reverse DNS |
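A first-pass user-agent match against the crawler names above is straightforward; as the table’s detection-method column notes, a UA hit alone is only a hint (UAs are trivially spoofed), which is why it is paired with reverse DNS verification:

```javascript
// First-pass AI crawler UA matching, using the crawler names from the table.
// A match is only a hint: user agents are trivially spoofed, hence reverse DNS.
const AI_CRAWLER_UA = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Anthropic-AI|Google-Extended|Meta-ExternalAgent|PerplexityBot|ByteSpider|AmazonBot|CCBot|YouBot|Diffbot/i;

function matchAiCrawler(userAgent) {
  const m = (userAgent || "").match(AI_CRAWLER_UA);
  return m ? m[0] : null;
}
```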
Automation Frameworks
- Puppeteer (with/without stealth plugin)
- Playwright
- Selenium
- Nightmare
- WebDriver
- PhantomJS
Search Engines & SEO Tools
- Googlebot (verified & allowed)
- Bingbot (verified & allowed)
- Ahrefs, Semrush, Screaming Frog
- Custom robots classified as SEO tools
Threat Classification
Bot Scanner classifies detections into threat levels based on a combination of signals:
- None - Normal visitor behavior, no threat detected
- Low - Suspicious patterns detected, monitor
- Medium - Probable bot detected, consider blocking
- High - Confirmed bot threat, recommend action
- Critical - Malicious bot activity, immediate blocking recommended
Your policies determine what action is taken at each level.
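One plausible way to bucket the combined score into these levels (the cutoffs here are assumptions for illustration):

```javascript
// Illustrative score-to-threat-level bucketing; the cutoffs are assumed,
// not WebDecoy's production values.
function threatLevel(score) {
  if (score >= 90) return "critical";
  if (score >= 75) return "high";
  if (score >= 50) return "medium";
  if (score >= 25) return "low";
  return "none";
}
```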
Key Capabilities
1. Bot Policy Management
Control exactly how you want to handle each bot type:
- GPTBot → Allow (train on your content)
- ClaudeBot → Allow (support Claude integration)
- PerplexityBot → Block (competitive risk)
- ByteSpider → Block (data extraction)
- Headless Chrome → Block (likely automation)
- Googlebot → Allow + verify (SEO essential)
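Expressed as data, the policy examples above amount to a lookup table with a safe default. The table keys and the "detect" default are illustrative:

```javascript
// Hypothetical per-bot policy table mirroring the examples above.
const BOT_POLICIES = {
  gptbot: "allow",
  claudebot: "allow",
  perplexitybot: "block",
  bytespider: "block",
  "headless-chrome": "block",
};

// Verified search engines are always allowed; unknown bots default to
// "detect" (log without blocking) rather than a hard block.
function actionFor(botType, verified) {
  if ((botType === "googlebot" || botType === "bingbot") && verified) return "allow";
  return BOT_POLICIES[botType] || "detect";
}
```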
2. Rich Detection Context
Every bot detection includes contextual information:
- IP address and geolocation
- Threat level classification
- Detected bot type and identity
- Signals that triggered the detection
- Threat intelligence data (IP reputation, datacenter status, etc.)
- Timestamp and detection source
This context helps you understand what threats are targeting your site and where they’re coming from.
3. AI Crawler Analytics
New dashboard showing:
- Which crawlers visited - Breakdown of GPTBot, ClaudeBot, PerplexityBot, etc.
- Frequency trends - How often each crawler accesses your site
- Pages accessed - Which content is most targeted
- Block effectiveness - Success rate of your policies
- Training data impact - Estimated content extracted
4. Real-Time Alerts
Integrated with existing WebDecoy actions:
- Slack notifications with detection details
- PagerDuty escalation for critical threats
- Custom webhooks with full signal data
- Datadog/Splunk event streaming
- Email summaries (daily/weekly/monthly)
Security Considerations
Privacy-First Design
- Fingerprinting happens entirely on client-side
- No tracking of legitimate user behavior
- Detection only triggers on suspicious patterns
- IP enrichment cached for performance
- No retention of rejected detections
False Positive Prevention
- Conservative thresholds (high specificity)
- Verified bot whitelist (Googlebot, Bingbot always allowed)
- Real browser testing across Chrome, Firefox, Safari, Mobile
- Honeypot interaction required for highest confidence
- User-configurable sensitivity levels
Defense Against Evasion
- Layered detection (client + server signals required)
- Continuous updates to automation detection patterns
- IP threat intelligence reduces false passes
- Reverse DNS verification prevents spoofing
- Behavioral analysis harder to fake than UA checks alone
Frequently Asked Questions
Will Bot Scanner block legitimate users?
No. The false positive rate target is < 0.1%. Real browsers are extremely unlikely to trigger suspicious scores because:
- Mouse entropy from real human movement is distinctive
- Real browsers have proper WebGL/Canvas fingerprints
- Legitimate user agents match expected patterns
- Verified search engines (Google, Bing) are whitelisted
We start with conservative thresholds and provide sensitivity controls.
Does Bot Scanner impact page performance?
Minimally. The snippet is:
- < 10KB gzipped - Smaller than a typical image
- Async loading - Doesn’t block page rendering
- < 50ms overhead - Negligible impact on Core Web Vitals
- sendBeacon API - Fire-and-forget reporting (survives page navigation)
In our performance testing, the snippet had no measurable impact on page load times.
What about ad blockers?
We account for this by:
- Using innocuous filenames that bypass common blocklists
- Offering CNAME-based deployment (your domain, bypasses ad blockers)
- Fallback to honeypot-only detection if snippet blocked
- No reliance on external analytics services
Can I use Bot Scanner with my existing honeypots?
Absolutely. Bot Scanner complements your existing decoys:
- Honeypots - Catch bots that click (high confidence, low volume)
- Bot Scanner - Catch bots that don’t (moderate confidence, higher volume)
- Together - Multi-layered defense increases accuracy and coverage
How is this different from WAF solutions?
| Feature | Bot Scanner | WAF |
|---|---|---|
| Detection method | Behavioral + fingerprinting | Rules-based + signatures |
| AI crawler detection | ✓ (specialized) | ✗ |
| False positive rate | < 0.1% | 1-5% |
| Setup time | 5 minutes | Hours/days |
| Cost | $59-449/mo | $100-1000+/mo |
| Integration | 1 script tag | Network-level config |
WAFs are powerful but generate false positives. Bot Scanner is purpose-built for bot detection with minimal friction.
Getting Started Timeline
- Early access: Q4 2025 (current; Phases 1-2 complete)
- Beta release: Q4 2025 (Phase 3 complete)
- General availability: Q1 2026 (Phase 5 complete)
During early access:
- Free for all existing WebDecoy customers
- Early pricing locked in for life
- Direct input on feature priorities
- Premium support included
The Future of Bot Detection
Bot Scanner represents a fundamental shift in bot defense strategy:
- From reactive to proactive - Detect threats before honeypots are triggered
- From signatures to signals - Multiple correlated indicators, harder to evade
- From blocking to understanding - Rich data enables informed policies
- From one-size-fits-all to nuanced - Allow beneficial crawlers, block harmful ones
Conclusion
Bot Scanner is the next evolution of WebDecoy’s bot detection platform. By combining client-side behavioral analysis, server-side enrichment, and AI-crawler-specific detection, we’re building the most accurate bot detection system available.
Ready to try Bot Scanner?
- Current WebDecoy customers: Early access starting now
- New customers: Sign up for free tier and get instant access
- Questions: Contact our security team at security@webdecoy.com
WebDecoy Bot Scanner. Detect bots before they detect your content.