WebDecoy Bot Scanner: Product Roadmap 2025

We’re excited to announce Bot Scanner, a revolutionary new detection capability that extends WebDecoy’s bot defense beyond honeypots. This comprehensive guide explains what Bot Scanner is, how it works, and our 6-week implementation timeline.

What is Bot Scanner?

Bot Scanner is a lightweight JavaScript-based detection system that identifies suspicious traffic patterns in real-time. Unlike passive honeypots that wait for bots to interact with decoys, Bot Scanner actively analyzes visitor behavior and reports threats immediately.

Key capabilities:

  • Detects automation tools and headless browsers
  • Identifies AI crawlers requesting your content
  • Analyzes behavioral patterns to spot suspicious activity
  • Enriches detections with threat intelligence
  • Works alongside your existing honeypot defenses

Why Bot Scanner Matters

Current limitations of honeypots:

  • Require bots to click on decoy links
  • Don’t catch bots that read your sitemap or follow robots.txt
  • Can’t detect headless browsers that don’t visit any decoys
  • Miss AI crawlers that only request HTML without executing JavaScript

What Bot Scanner solves:

  • Detects bots even if they never click on honeypots
  • Catches headless browsers with 90%+ accuracy
  • Identifies AI crawlers before they access your content
  • Provides detailed enrichment data (IP threat level, datacenter detection, bot identity)
  • Enables precise policy controls (allow/detect/block per bot type)

How Bot Scanner Works

The Detection Pipeline

┌─────────────────────────────────────────────────────────┐
│ Your Website + Bot Scanner JS Snippet                   │
│ (< 10KB gzipped, < 50ms overhead)                       │
└────────────────────┬────────────────────────────────────┘

    ┌────────────────┴────────────────┐
    │ Suspicious Score >= Threshold?  │
    └────────────────┬────────────────┘
                     │ YES
    ┌────────────────▼────────────────┐
    │ Send Signal Report (sendBeacon) │
    │ POST /v1/detect                 │
    └────────────────┬────────────────┘

    ┌────────────────▼────────────────────────────────────┐
    │ Ingest Service Enriches Detection                   │
    │ • IP threat intelligence (GreyNoise, AbuseIPDB)     │
    │ • Reverse DNS bot verification                      │
    │ • Datacenter/VPN/Proxy/Tor detection                │
    │ • AI crawler user-agent matching                    │
    └────────────────┬────────────────────────────────────┘

    ┌────────────────▼────────────────┐
    │ Rule Engine Evaluates           │
    │ (PostgreSQL NOTIFY: ~5-10ms)    │
    └────────────────┬────────────────┘

    ┌────────────────▼────────────────────────────────────┐
    │ Execute Response Actions (Parallel)                 │
    │ • Cloudflare IP block                               │
    │ • AWS WAF IP block                                  │
    │ • Slack/webhook notification                        │
    │ • Datadog/Splunk events                             │
    └─────────────────────────────────────────────────────┘
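
The snippet's half of this pipeline can be sketched in a few lines. This is an illustration only, not the shipped s.js; the signal names, scoring weights, and payload shape are assumptions, and the network call is injected so the logic is testable outside a browser:

```javascript
// Minimal sketch of the client-side half of the pipeline.
// Signal names and scoring weights are illustrative assumptions.
function computeScore(signals) {
  let score = 0;
  if (signals.webdriver) score += 40;      // navigator.webdriver flagged
  if (signals.headlessUA) score += 30;     // "HeadlessChrome" in the user agent
  if (signals.noMouseEntropy) score += 20; // no human-like pointer movement
  if (signals.honeypotHit) score += 50;    // decoy link/form touched
  return score;
}

function report(signals, threshold, send) {
  const score = computeScore(signals);
  if (score < threshold) return null; // below threshold: nothing is sent
  const payload = { v: 1, score, signals };
  // In the browser this would be fire-and-forget:
  //   navigator.sendBeacon(ingestUrl + "/v1/detect", JSON.stringify(payload));
  send(JSON.stringify(payload)); // injected for testability
  return payload;
}
```

Because sendBeacon queues the request with the browser, the report survives page navigation, which is why the pipeline uses it rather than a blocking fetch.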

How Detection Works

Bot Scanner combines multiple analysis techniques:

Multi-layered detection:

  • Client-side pattern recognition to identify suspicious visitor behavior
  • Behavioral analysis of human vs. bot interaction patterns
  • Device fingerprinting to detect spoofed or headless browsers
  • AI crawler identification through multiple verification methods
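
A minimal sketch of what the client-side checks can look like. The specific heuristics below are assumptions, not the production detection logic; the function takes a navigator-like object so it runs outside a browser:

```javascript
// Sketch of client-side automation checks (heuristics are assumptions).
// Accepts a navigator-like object so it can be tested outside a browser.
function automationSignals(nav) {
  const ua = nav.userAgent || "";
  return {
    webdriver: nav.webdriver === true,        // set by Selenium/Puppeteer/Playwright
    headlessUA: /HeadlessChrome/.test(ua),    // default headless Chrome UA string
    noPlugins: (nav.plugins ? nav.plugins.length : 0) === 0,
    aiCrawlerUA: /(GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot)/i.test(ua),
  };
}
```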

Real-time enrichment:

  • Every detection is cross-referenced with threat intelligence databases
  • IP reputation and threat classification from industry-standard services
  • Geolocation and infrastructure analysis for context

Server-Side Enrichment

When Bot Scanner detects suspicious activity, the Ingest Service enriches the signal:

  Data Source        Information                                          Latency
  ────────────────   ──────────────────────────────────────────────────   ─────────────
  GreyNoise          Threat classification (benign, unknown, malicious)   < 100ms
  AbuseIPDB          Abuse report count and score                         < 100ms
  IPQualityScore     Fraud score, VPN/proxy/Tor detection                 < 100ms
  MaxMind GeoLite2   Geolocation and ASN lookup                           < 5ms (local)
  Reverse DNS        Bot name verification with 24hr cache                < 50ms
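
The 24hr reverse DNS cache in the table can be modeled as a simple TTL cache. This is a sketch under assumptions: the lookup function stands in for real GreyNoise/AbuseIPDB/rDNS calls, and the result shape is illustrative:

```javascript
// TTL cache sketch for enrichment lookups (the 24hr rDNS cache above).
class TtlCache {
  constructor(ttlMs) { this.ttlMs = ttlMs; this.map = new Map(); }
  get(key, now = Date.now()) {
    const e = this.map.get(key);
    if (!e || now - e.at > this.ttlMs) return undefined; // miss or expired
    return e.value;
  }
  set(key, value, now = Date.now()) { this.map.set(key, { value, at: now }); }
}

// Enrich an IP, hitting the upstream lookup only on a cache miss.
async function enrich(ip, lookup, cache) {
  const hit = cache.get(ip);
  if (hit) return { ...hit, cached: true }; // cache hit: no upstream call
  const data = await lookup(ip);            // e.g. threat level, rDNS name
  cache.set(ip, data);
  return { ...data, cached: false };
}
```

This is also where the 80% cache hit rate target comes from: repeat visitors resolve from the cache, so only new IPs pay the < 100ms upstream latency.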

Performance Targets

Bot Scanner is designed for zero-impact integration:

  Metric                                   Target
  ──────────────────────────────────────   ──────────────
  JS snippet size                          < 10KB gzipped
  Page load overhead                       < 50ms
  Time to block (detection → Cloudflare)   < 1 second
  Detection accuracy                       > 95%
  False positive rate                      < 0.1%
  Cache hit rate                           > 80%

Implementation Roadmap

Phase 1: JS Snippet Development (Week 1)

Deliverables:

  • Minified s.js script (~8-10KB)
  • CDN deployment at https://cdn.webdecoy.io/s.js
  • Configurable via data-account, data-scanner, data-threshold attributes

Key features:

  • Detection for Puppeteer, Playwright, Selenium (90%+ accuracy)
  • AI crawler User-Agent matching
  • Behavioral tracking (mouse entropy, interaction timing)
  • Optional honeypot injection (hidden links/form fields)

Installation snippet:

<script
  src="https://cdn.webdecoy.io/s.js"
  data-account="acc_7f8d9e3a"
  data-scanner="scn_4b5c6d"
  async
></script>
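
For illustration, here is one way a snippet can read those data-* attributes. In the browser the element would come from document.currentScript; it is passed in here so the parsing is testable, and the default threshold of 50 is an assumption:

```javascript
// Sketch of reading snippet configuration from data-* attributes.
// The browser maps data-account → dataset.account, etc.
function readConfig(scriptEl) {
  const d = scriptEl.dataset || {};
  return {
    account: d.account || null,            // data-account
    scanner: d.scanner || null,            // data-scanner
    threshold: Number(d.threshold || 50),  // data-threshold (default assumed)
  };
}
```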

Phase 2: Ingest Service — /v1/detect Endpoint (Week 2)

Deliverables:

  • New POST /v1/detect endpoint for JS snippet payloads
  • IP enrichment pipeline (GreyNoise, AbuseIPDB, IPQS, MaxMind)
  • Bot verification via reverse DNS
  • Final scoring combining client + server signals
  • Storage with full signal data

Performance targets:

  • < 500ms from request to detection stored + NOTIFY
  • 80% cache hit rate after warmup
  • Zero false positives on verified Googlebot/Bingbot

Acceptance criteria:

  • Accurate IP threat classification
  • Verified bots never falsely blocked
  • Cache warming on startup
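
One way to combine client and server signals while guaranteeing the "verified bots never falsely blocked" criterion is to short-circuit on verification. The weights and field names below are illustrative assumptions, not the production scoring:

```javascript
// Sketch of final scoring: client score + server-side enrichment.
// The one firm rule from the acceptance criteria is that verified
// bots (Googlebot/Bingbot via reverse DNS) are never blocked.
function finalScore(clientScore, enrichment) {
  if (enrichment.verifiedBot) return 0;          // short-circuit: always allowed
  let score = clientScore;
  if (enrichment.threat === "malicious") score += 30; // e.g. GreyNoise verdict
  if (enrichment.datacenter) score += 15;             // hosted, not residential
  if (enrichment.vpnOrProxy) score += 10;
  return Math.min(score, 100);
}
```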

Phase 3: Rule Engine — Response Actions (Week 3)

Deliverables:

  • 6 response action integrations:
    • Cloudflare IP list blocking
    • AWS WAF IP set integration
    • Slack notifications
    • Webhook execution (HMAC-signed)
    • Datadog events
    • Splunk HEC

Features:

  • Configurable per account with minimum threat level
  • Retry logic with exponential backoff
  • Parallel action execution
  • Complete logging and audit trail
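
The retry behavior can be sketched as follows: 3 attempts with exponentially growing delays. The base delay is an assumption, and the sleep function is injectable so tests run instantly:

```javascript
// Sketch of retry with exponential backoff (3x retry per the
// acceptance criteria). Delays grow 200ms, 400ms, ... by default.
async function withRetry(action, { retries = 3, baseMs = 200, sleep } = {}) {
  sleep = sleep || ((ms) => new Promise((r) => setTimeout(r, ms)));
  let lastErr;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastErr = err;
      if (attempt < retries - 1) await sleep(baseMs * 2 ** attempt);
    }
  }
  throw lastErr; // all attempts exhausted
}
```

Actions run in parallel, so one slow integration (say, a webhook timing out and retrying) does not delay the Cloudflare block.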

Acceptance criteria:

  • < 1 second from detection to Cloudflare block
  • 3x retry on failures
  • 99% action execution success rate

Phase 4: Backend API + Dashboard (Weeks 4-5)

Deliverables:

  • Bot Scanner in decoy creation flow
  • Configuration UI (sensitivity, module toggles)
  • Snippet generator with auto-copy
  • Detection explorer with source filtering
  • Bot policy management interface
  • AI crawler analytics dashboard

Key capabilities:

  • Sensitivity levels (low/medium/high) mapping to thresholds
  • Per-module toggle (automation, headless, AI crawlers, behavioral, fingerprinting)
  • Real-time detection list
  • Detailed detection view with all signals
  • Bot policies (allow/detect/block) per bot type
  • Analytics on detection trends
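
The sensitivity-to-threshold mapping might look like this; the exact cutoffs are assumptions (a lower threshold means more aggressive detection):

```javascript
// Sketch of sensitivity levels mapping to suspicion-score thresholds.
// Cutoff values are illustrative assumptions.
const SENSITIVITY_THRESHOLDS = { low: 80, medium: 60, high: 40 };

function thresholdFor(level) {
  const t = SENSITIVITY_THRESHOLDS[level];
  if (t === undefined) throw new Error(`unknown sensitivity: ${level}`);
  return t;
}
```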

Phase 5: Testing & Documentation (Weeks 5-6)

Deliverables:

  • Test suite (80%+ coverage on critical paths)
  • Load testing results (1000 detections/min)
  • E2E tests: Puppeteer → detection → Cloudflare block
  • E2E tests: Fake GPTBot → detection → alert
  • False positive testing on real browsers/mobile
  • Customer integration guide
  • API documentation

Supported Bot Detection Categories

AI Crawlers (20+ Patterns)

  Crawler                               Company        Detection Method
  ───────────────────────────────────   ────────────   ────────────────
  GPTBot, ChatGPT-User, OAI-SearchBot   OpenAI         UA + reverse DNS
  ClaudeBot, Anthropic-AI               Anthropic      UA + reverse DNS
  Google-Extended                       Google         UA + reverse DNS
  Meta-ExternalAgent                    Meta           UA + reverse DNS
  PerplexityBot                         Perplexity     UA + reverse DNS
  ByteSpider                            ByteDance      UA + reverse DNS
  AmazonBot                             Amazon         UA + reverse DNS
  CCBot                                 Common Crawl   UA + reverse DNS
  YouBot                                You.com        UA + reverse DNS
  Diffbot                               Diffbot        UA + reverse DNS

Automation Frameworks

  • Puppeteer (with/without stealth plugin)
  • Playwright
  • Selenium
  • Nightmare
  • Webdriver
  • Phantom.js

Search Engines & SEO Tools

  • Googlebot (verified & allowed)
  • Bingbot (verified & allowed)
  • Ahrefs, Semrush, Screaming Frog
  • Custom robots classified as SEO tools

Threat Classification

Bot Scanner classifies detections into threat levels based on a combination of signals:

  • None - Normal visitor behavior, no threat detected
  • Low - Suspicious patterns detected, monitor
  • Medium - Probable bot detected, consider blocking
  • High - Confirmed bot threat, recommend action
  • Critical - Malicious bot activity, immediate blocking recommended

Your policies determine what action is taken at each level.
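
As a sketch, mapping a 0-100 suspicion score onto these five levels could look like the following; the band boundaries are illustrative assumptions:

```javascript
// Sketch: map a combined suspicion score to the five threat levels.
// Band boundaries are illustrative assumptions.
function threatLevel(score) {
  if (score >= 90) return "critical";
  if (score >= 70) return "high";
  if (score >= 50) return "medium";
  if (score >= 25) return "low";
  return "none";
}
```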

Key Capabilities

1. Bot Policy Management

Control exactly how you want to handle each bot type:

  • GPTBot → Allow (train on your content)
  • ClaudeBot → Allow (support Claude integration)
  • PerplexityBot → Block (competitive risk)
  • ByteSpider → Block (data extraction)
  • Headless Chrome → Block (likely automation)
  • Googlebot → Allow + verify (SEO essential)
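
A policy table like the one above reduces to a simple lookup. The shape below is a sketch; treating unknown bots as "detect" (log but do not act) by default is an assumption:

```javascript
// Sketch of per-bot policies mirroring the examples above.
const POLICIES = {
  GPTBot: "allow",
  ClaudeBot: "allow",
  PerplexityBot: "block",
  ByteSpider: "block",
  HeadlessChrome: "block",
  Googlebot: "allow", // also verified via reverse DNS
};

function policyFor(botName) {
  return POLICIES[botName] || "detect"; // assumed default for unknown bots
}
```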

2. Rich Detection Context

Every bot detection includes contextual information:

  • IP address and geolocation
  • Threat level classification
  • Detected bot type and identity
  • Signals that triggered the detection
  • Threat intelligence data (IP reputation, datacenter status, etc.)
  • Timestamp and detection source

This context helps you understand what threats are targeting your site and where they’re coming from.

3. AI Crawler Analytics

New dashboard showing:

  • Which crawlers visited - Breakdown of GPTBot, ClaudeBot, PerplexityBot, etc.
  • Frequency trends - How often each crawler accesses your site
  • Pages accessed - Which content is most targeted
  • Block effectiveness - Success rate of your policies
  • Training data impact - Estimated content extracted

4. Real-Time Alerts

Integrated with existing WebDecoy actions:

  • Slack notifications with detection details
  • PagerDuty escalation for critical threats
  • Custom webhooks with full signal data
  • Datadog/Splunk event streaming
  • Email summaries (daily/weekly/monthly)

Security Considerations

Privacy-First Design

  • Fingerprinting happens entirely client-side
  • No tracking of legitimate user behavior
  • Detection only triggers on suspicious patterns
  • IP enrichment cached for performance
  • No retention of rejected detections

False Positive Prevention

  • Conservative thresholds (high specificity)
  • Verified bot whitelist (Googlebot, Bingbot always allowed)
  • Real browser testing across Chrome, Firefox, Safari, Mobile
  • Honeypot interaction required for highest confidence
  • User-configurable sensitivity levels

Defense Against Evasion

  • Layered detection (client + server signals required)
  • Continuous updates to automation detection patterns
  • IP threat intelligence reduces false passes
  • Reverse DNS verification prevents spoofing
  • Behavioral analysis harder to fake than UA checks alone

Frequently Asked Questions

Will Bot Scanner block legitimate users?

No. The false positive rate target is < 0.1%. Real browsers are extremely unlikely to trigger suspicious scores because:

  • Mouse entropy from real human movement is distinctive
  • Real browsers have proper WebGL/Canvas fingerprints
  • Legitimate user agents match expected patterns
  • Verified search engines (Google, Bing) are whitelisted

We start with conservative thresholds and provide sensitivity controls.
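
As an illustration of why mouse entropy is distinctive: binning pointer-movement directions and computing Shannon entropy yields near zero for scripted straight-line motion and higher values for human wandering. The 8-sector binning below is an assumption, not the production metric:

```javascript
// Sketch of "mouse entropy": Shannon entropy (in bits) of pointer-move
// directions binned into 8 compass sectors. Max is 3 bits for 8 bins.
function mouseEntropy(points) {
  const bins = new Array(8).fill(0);
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - points[i - 1][0];
    const dy = points[i][1] - points[i - 1][1];
    if (dx === 0 && dy === 0) continue;
    const angle = Math.atan2(dy, dx); // -PI..PI
    bins[Math.floor(((angle + Math.PI) / (2 * Math.PI)) * 8) % 8]++;
  }
  const total = bins.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  return bins.reduce((h, n) => {
    if (n === 0) return h;
    const p = n / total;
    return h - p * Math.log2(p);
  }, 0);
}
```

A bot dragging the cursor in one straight line scores 0 bits; a human tracing even a simple square already spreads across four direction bins.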

Does Bot Scanner impact page performance?

Minimally. The snippet is:

  • < 10KB gzipped - Smaller than a typical image
  • Async loading - Doesn’t block page rendering
  • < 50ms overhead - Negligible impact on Core Web Vitals
  • sendBeacon API - Fire-and-forget reporting (survives page navigation)

Performance testing shows zero measurable impact on page load times.

What about ad blockers?

We account for this by:

  • Using innocuous filenames that bypass common blocklists
  • Offering CNAME-based deployment (your domain, bypasses ad blockers)
  • Fallback to honeypot-only detection if snippet blocked
  • No reliance on external analytics services

Can I use Bot Scanner with my existing honeypots?

Absolutely. Bot Scanner complements your existing decoys:

  • Honeypots - Catch bots that click (high confidence, low volume)
  • Bot Scanner - Catch bots that don’t (moderate confidence, higher volume)
  • Together - Multi-layered defense increases accuracy and coverage

How is this different from WAF solutions?

  Feature                Bot Scanner                   WAF
  ────────────────────   ───────────────────────────   ────────────────────────
  Detection method       Behavioral + fingerprinting   Rules-based + signatures
  AI crawler detection   ✓ (specialized)               —
  False positive rate    < 0.1%                        1-5%
  Setup time             5 minutes                     Hours/days
  Cost                   $59-449/mo                    $100-1000+/mo
  Integration            1 script tag                  Network-level config

WAFs are powerful but generate false positives. Bot Scanner is purpose-built for bot detection with minimal friction.

Getting Started Timeline

  • Early access: Q4 2025 (current; Phases 1-2 complete)
  • Beta release: Q4 2025 (Phase 3 complete)
  • General availability: Q1 2026 (Phase 5 complete)

During early access:

  • Free for all existing WebDecoy customers
  • Early pricing locked in for life
  • Direct input on feature priorities
  • Premium support included

The Future of Bot Detection

Bot Scanner represents a fundamental shift in bot defense strategy:

  • From reactive to proactive - Detect threats before honeypots are triggered
  • From signatures to signals - Multiple correlated indicators, harder to evade
  • From blocking to understanding - Rich data enables informed policies
  • From one-size-fits-all to nuanced - Allow beneficial crawlers, block harmful ones

Conclusion

Bot Scanner is the next evolution of WebDecoy’s bot detection platform. By combining client-side behavioral analysis, server-side enrichment, and AI-crawler-specific detection, we’re building the most accurate bot detection system available.

Ready to try Bot Scanner?

  • Current WebDecoy customers: Early access starting now
  • New customers: Sign up for free tier and get instant access
  • Questions: Contact our security team at security@webdecoy.com

WebDecoy Bot Scanner. Detect bots before they detect your content.

Want to see WebDecoy in action?

Get a personalized demo from our team.

Request Demo