WebDecoy Bot Scanner: Product Roadmap 2025

We’re excited to announce Bot Scanner, a revolutionary new detection capability that extends WebDecoy’s bot defense beyond honeypots. This comprehensive guide explains what Bot Scanner is, how it works, and our 6-week implementation timeline.

What is Bot Scanner?

Bot Scanner is a lightweight JavaScript-based detection system that identifies suspicious traffic patterns in real-time. Unlike passive honeypots that wait for bots to interact with decoys, Bot Scanner actively analyzes visitor behavior and reports threats immediately.

Key capabilities:

  • Detects automation tools and headless browsers
  • Identifies AI crawlers requesting your content
  • Analyzes behavioral patterns to spot suspicious activity
  • Enriches detections with threat intelligence
  • Works alongside your existing honeypot defenses

Why Bot Scanner Matters

Current limitations of honeypots:

  • Require bots to click on decoy links
  • Don’t catch bots that read your sitemap or follow robots.txt
  • Can’t detect headless browsers that don’t visit any decoys
  • Miss AI crawlers that only request HTML without executing JavaScript

What Bot Scanner solves:

  • Detects bots even if they never click on honeypots
  • Catches headless browsers with 90%+ accuracy
  • Identifies AI crawlers before they access your content
  • Provides detailed enrichment data (IP threat level, datacenter detection, bot identity)
  • Enables precise policy controls (allow/detect/block per bot type)

How Bot Scanner Works

The Detection Pipeline

┌─────────────────────────────────────────────────────────┐
│ Your Website + Bot Scanner JS Snippet                   │
│ (< 10KB gzipped, < 50ms overhead)                       │
└────────────────────┬────────────────────────────────────┘

    ┌────────────────┴────────────────┐
    │ Suspicious Score >= Threshold?  │
    └────────────────┬────────────────┘
                     │ YES
    ┌────────────────▼────────────────┐
    │ Send Signal Report (sendBeacon) │
    │ POST /v1/detect                 │
    └────────────────┬────────────────┘

    ┌────────────────▼────────────────────────────────────┐
    │ Ingest Service Enriches Detection                   │
    │ • IP threat intelligence (GreyNoise, AbuseIPDB)     │
    │ • Reverse DNS bot verification                      │
    │ • Datacenter/VPN/Proxy/Tor detection                │
    │ • AI crawler user-agent matching                    │
    └────────────────┬────────────────────────────────────┘

    ┌────────────────▼────────────────┐
    │ Rule Engine Evaluates           │
    │ (PostgreSQL NOTIFY: ~5-10ms)    │
    └────────────────┬────────────────┘

    ┌────────────────▼────────────────────────────────────┐
    │ Execute Response Actions (Parallel)                 │
    │ • Cloudflare IP block                               │
    │ • AWS WAF IP block                                  │
    │ • Slack/webhook notification                        │
    │ • Datadog/Splunk events                             │
    └─────────────────────────────────────────────────────┘
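
The snippet's half of this pipeline can be sketched in a few lines. This is an illustration only, not the shipped s.js; the signal names, scoring weights, and payload shape are assumptions, and the network call is injected so the logic is testable outside a browser:

```javascript
// Minimal sketch of the client-side half of the pipeline.
// Signal names and scoring weights are illustrative assumptions.
function computeScore(signals) {
  let score = 0;
  if (signals.webdriver) score += 40;      // navigator.webdriver flagged
  if (signals.headlessUA) score += 30;     // "HeadlessChrome" in the user agent
  if (signals.noMouseEntropy) score += 20; // no human-like pointer movement
  if (signals.honeypotHit) score += 50;    // decoy link/form touched
  return score;
}

function report(signals, threshold, send) {
  const score = computeScore(signals);
  if (score < threshold) return null; // below threshold: nothing is sent
  const payload = { v: 1, score, signals };
  // In the browser this would be fire-and-forget:
  //   navigator.sendBeacon(ingestUrl + "/v1/detect", JSON.stringify(payload));
  send(JSON.stringify(payload)); // injected for testability
  return payload;
}
```

Because sendBeacon queues the request with the browser, the report survives page navigation, which is why the pipeline uses it rather than a blocking fetch.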

How Detection Works

Bot Scanner combines multiple analysis techniques:

Multi-layered detection:

  • Client-side pattern recognition to identify suspicious visitor behavior
  • Behavioral analysis of human vs. bot interaction patterns
  • Device fingerprinting to detect spoofed or headless browsers
  • AI crawler identification through multiple verification methods
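
A minimal sketch of what the client-side checks can look like. The specific heuristics below are assumptions, not the production detection logic; the function takes a navigator-like object so it runs outside a browser:

```javascript
// Sketch of client-side automation checks (heuristics are assumptions).
// Accepts a navigator-like object so it can be tested outside a browser.
function automationSignals(nav) {
  const ua = nav.userAgent || "";
  return {
    webdriver: nav.webdriver === true,        // set by Selenium/Puppeteer/Playwright
    headlessUA: /HeadlessChrome/.test(ua),    // default headless Chrome UA string
    noPlugins: (nav.plugins ? nav.plugins.length : 0) === 0,
    aiCrawlerUA: /(GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot)/i.test(ua),
  };
}
```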

Real-time enrichment:

  • Every detection is cross-referenced with threat intelligence databases
  • IP reputation and threat classification from industry-standard services
  • Geolocation and infrastructure analysis for context

Server-Side Enrichment

When Bot Scanner detects suspicious activity, the Ingest Service enriches the signal:

  Data Source        Information                                          Latency
  ────────────────   ──────────────────────────────────────────────────   ─────────────
  GreyNoise          Threat classification (benign, unknown, malicious)   < 100ms
  AbuseIPDB          Abuse report count and score                         < 100ms
  IPQualityScore     Fraud score, VPN/proxy/Tor detection                 < 100ms
  MaxMind GeoLite2   Geolocation and ASN lookup                           < 5ms (local)
  Reverse DNS        Bot name verification with 24hr cache                < 50ms
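
The 24hr reverse DNS cache in the table can be modeled as a simple TTL cache. This is a sketch under assumptions: the lookup function stands in for real GreyNoise/AbuseIPDB/rDNS calls, and the result shape is illustrative:

```javascript
// TTL cache sketch for enrichment lookups (the 24hr rDNS cache above).
class TtlCache {
  constructor(ttlMs) { this.ttlMs = ttlMs; this.map = new Map(); }
  get(key, now = Date.now()) {
    const e = this.map.get(key);
    if (!e || now - e.at > this.ttlMs) return undefined; // miss or expired
    return e.value;
  }
  set(key, value, now = Date.now()) { this.map.set(key, { value, at: now }); }
}

// Enrich an IP, hitting the upstream lookup only on a cache miss.
async function enrich(ip, lookup, cache) {
  const hit = cache.get(ip);
  if (hit) return { ...hit, cached: true }; // cache hit: no upstream call
  const data = await lookup(ip);            // e.g. threat level, rDNS name
  cache.set(ip, data);
  return { ...data, cached: false };
}
```

This is also where the 80% cache hit rate target comes from: repeat visitors resolve from the cache, so only new IPs pay the < 100ms upstream latency.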

Performance Targets

Bot Scanner is designed for zero-impact integration:

  Metric                                   Target
  ──────────────────────────────────────   ──────────────
  JS snippet size                          < 10KB gzipped
  Page load overhead                       < 50ms
  Time to block (detection → Cloudflare)   < 1 second
  Detection accuracy                       > 95%
  False positive rate                      < 0.1%
  Cache hit rate                           > 80%

Implementation Roadmap

Phase 1: JS Snippet Development (Week 1)

Deliverables:

  • Minified s.js script (~8-10KB)
  • CDN deployment at https://cdn.webdecoy.io/s.js
  • Configurable via data-account, data-scanner, data-threshold attributes

Key features:

  • Detection for Puppeteer, Playwright, Selenium (90%+ accuracy)
  • AI crawler User-Agent matching
  • Behavioral tracking (mouse entropy, interaction timing)
  • Optional honeypot injection (hidden links/form fields)

Installation snippet:

<script
  src="https://cdn.webdecoy.io/s.js"
  data-account="acc_7f8d9e3a"
  data-scanner="scn_4b5c6d"
  async
></script>
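
For illustration, here is one way a snippet can read those data-* attributes. In the browser the element would come from document.currentScript; it is passed in here so the parsing is testable, and the default threshold of 50 is an assumption:

```javascript
// Sketch of reading snippet configuration from data-* attributes.
// The browser maps data-account → dataset.account, etc.
function readConfig(scriptEl) {
  const d = scriptEl.dataset || {};
  return {
    account: d.account || null,            // data-account
    scanner: d.scanner || null,            // data-scanner
    threshold: Number(d.threshold || 50),  // data-threshold (default assumed)
  };
}
```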

Phase 2: Ingest Service — /v1/detect Endpoint (Week 2)

Deliverables:

  • New POST /v1/detect endpoint for JS snippet payloads
  • IP enrichment pipeline (GreyNoise, AbuseIPDB, IPQS, MaxMind)
  • Bot verification via reverse DNS
  • Final scoring combining client + server signals
  • Storage with full signal data

Performance targets:

  • < 500ms from request to detection stored + NOTIFY
  • 80% cache hit rate after warmup
  • Zero false positives on verified Googlebot/Bingbot

Acceptance criteria:

  • Accurate IP threat classification
  • Verified bots never falsely blocked
  • Cache warming on startup
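
One way to combine client and server signals while guaranteeing the "verified bots never falsely blocked" criterion is to short-circuit on verification. The weights and field names below are illustrative assumptions, not the production scoring:

```javascript
// Sketch of final scoring: client score + server-side enrichment.
// The one firm rule from the acceptance criteria is that verified
// bots (Googlebot/Bingbot via reverse DNS) are never blocked.
function finalScore(clientScore, enrichment) {
  if (enrichment.verifiedBot) return 0;          // short-circuit: always allowed
  let score = clientScore;
  if (enrichment.threat === "malicious") score += 30; // e.g. GreyNoise verdict
  if (enrichment.datacenter) score += 15;             // hosted, not residential
  if (enrichment.vpnOrProxy) score += 10;
  return Math.min(score, 100);
}
```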

Phase 3: Rule Engine — Response Actions (Week 3)

Deliverables:

  • 6 response action integrations:
    • Cloudflare IP list blocking
    • AWS WAF IP set integration
    • Slack notifications
    • Webhook execution (HMAC-signed)
    • Datadog events
    • Splunk HEC

Features:

  • Configurable per account with minimum threat level
  • Retry logic with exponential backoff
  • Parallel action execution
  • Complete logging and audit trail
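
The retry behavior can be sketched as follows: 3 attempts with exponentially growing delays. The base delay is an assumption, and the sleep function is injectable so tests run instantly:

```javascript
// Sketch of retry with exponential backoff (3x retry per the
// acceptance criteria). Delays grow 200ms, 400ms, ... by default.
async function withRetry(action, { retries = 3, baseMs = 200, sleep } = {}) {
  sleep = sleep || ((ms) => new Promise((r) => setTimeout(r, ms)));
  let lastErr;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastErr = err;
      if (attempt < retries - 1) await sleep(baseMs * 2 ** attempt);
    }
  }
  throw lastErr; // all attempts exhausted
}
```

Actions run in parallel, so one slow integration (say, a webhook timing out and retrying) does not delay the Cloudflare block.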

Acceptance criteria:

  • < 1 second from detection to Cloudflare block
  • 3x retry on failures
  • 99% action execution success rate

Phase 4: Backend API + Dashboard (Weeks 4-5)

Deliverables:

  • Bot Scanner in decoy creation flow
  • Configuration UI (sensitivity, module toggles)
  • Snippet generator with auto-copy
  • Detection explorer with source filtering
  • Bot policy management interface
  • AI crawler analytics dashboard

Key capabilities:

  • Sensitivity levels (low/medium/high) mapping to thresholds
  • Per-module toggle (automation, headless, AI crawlers, behavioral, fingerprinting)
  • Real-time detection list
  • Detailed detection view with all signals
  • Bot policies (allow/detect/block) per bot type
  • Analytics on detection trends
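
The sensitivity-to-threshold mapping might look like this; the exact cutoffs are assumptions (a lower threshold means more aggressive detection):

```javascript
// Sketch of sensitivity levels mapping to suspicion-score thresholds.
// Cutoff values are illustrative assumptions.
const SENSITIVITY_THRESHOLDS = { low: 80, medium: 60, high: 40 };

function thresholdFor(level) {
  const t = SENSITIVITY_THRESHOLDS[level];
  if (t === undefined) throw new Error(`unknown sensitivity: ${level}`);
  return t;
}
```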

Phase 5: Testing & Documentation (Weeks 5-6)

Deliverables:

  • Test suite (80%+ coverage on critical paths)
  • Load testing results (1000 detections/min)
  • E2E tests: Puppeteer → detection → Cloudflare block
  • E2E tests: Fake GPTBot → detection → alert
  • False positive testing on real browsers/mobile
  • Customer integration guide
  • API documentation

Supported Bot Detection Categories

AI Crawlers (20+ Patterns)

  Crawler                               Company        Detection Method
  ───────────────────────────────────   ────────────   ────────────────
  GPTBot, ChatGPT-User, OAI-SearchBot   OpenAI         UA + reverse DNS
  ClaudeBot, Anthropic-AI               Anthropic      UA + reverse DNS
  Google-Extended                       Google         UA + reverse DNS
  Meta-ExternalAgent                    Meta           UA + reverse DNS
  PerplexityBot                         Perplexity     UA + reverse DNS
  ByteSpider                            ByteDance      UA + reverse DNS
  AmazonBot                             Amazon         UA + reverse DNS
  CCBot                                 Common Crawl   UA + reverse DNS
  YouBot                                You.com        UA + reverse DNS
  Diffbot                               Diffbot        UA + reverse DNS

Automation Frameworks

  • Puppeteer (with/without stealth plugin)
  • Playwright
  • Selenium
  • Nightmare
  • Webdriver
  • Phantom.js

Search Engines & SEO Tools

  • Googlebot (verified & allowed)
  • Bingbot (verified & allowed)
  • Ahrefs, Semrush, Screaming Frog
  • Custom robots classified as SEO tools

Threat Classification

Bot Scanner classifies detections into threat levels based on a combination of signals:

  • None - Normal visitor behavior, no threat detected
  • Low - Suspicious patterns detected, monitor
  • Medium - Probable bot detected, consider blocking
  • High - Confirmed bot threat, recommend action
  • Critical - Malicious bot activity, immediate blocking recommended

Your policies determine what action is taken at each level.
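
As a sketch, mapping a 0-100 suspicion score onto these five levels could look like the following; the band boundaries are illustrative assumptions:

```javascript
// Sketch: map a combined suspicion score to the five threat levels.
// Band boundaries are illustrative assumptions.
function threatLevel(score) {
  if (score >= 90) return "critical";
  if (score >= 70) return "high";
  if (score >= 50) return "medium";
  if (score >= 25) return "low";
  return "none";
}
```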

Key Capabilities

1. Bot Policy Management

Control exactly how you want to handle each bot type:

  • GPTBot → Allow (train on your content)
  • ClaudeBot → Allow (support Claude integration)
  • PerplexityBot → Block (competitive risk)
  • ByteSpider → Block (data extraction)
  • Headless Chrome → Block (likely automation)
  • Googlebot → Allow + verify (SEO essential)
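
A policy table like the one above reduces to a simple lookup. The shape below is a sketch; treating unknown bots as "detect" (log but do not act) by default is an assumption:

```javascript
// Sketch of per-bot policies mirroring the examples above.
const POLICIES = {
  GPTBot: "allow",
  ClaudeBot: "allow",
  PerplexityBot: "block",
  ByteSpider: "block",
  HeadlessChrome: "block",
  Googlebot: "allow", // also verified via reverse DNS
};

function policyFor(botName) {
  return POLICIES[botName] || "detect"; // assumed default for unknown bots
}
```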

2. Rich Detection Context

Every bot detection includes contextual information:

  • IP address and geolocation
  • Threat level classification
  • Detected bot type and identity
  • Signals that triggered the detection
  • Threat intelligence data (IP reputation, datacenter status, etc.)
  • Timestamp and detection source

This context helps you understand what threats are targeting your site and where they’re coming from.

3. AI Crawler Analytics

New dashboard showing:

  • Which crawlers visited - Breakdown of GPTBot, ClaudeBot, PerplexityBot, etc.
  • Frequency trends - How often each crawler accesses your site
  • Pages accessed - Which content is most targeted
  • Block effectiveness - Success rate of your policies
  • Training data impact - Estimated content extracted

4. Real-Time Alerts

Integrated with existing WebDecoy actions:

  • Slack notifications with detection details
  • PagerDuty escalation for critical threats
  • Custom webhooks with full signal data
  • Datadog/Splunk event streaming
  • Email summaries (daily/weekly/monthly)

Security Considerations

Privacy-First Design

  • Fingerprinting happens entirely client-side
  • No tracking of legitimate user behavior
  • Detection only triggers on suspicious patterns
  • IP enrichment cached for performance
  • No retention of rejected detections

False Positive Prevention

  • Conservative thresholds (high specificity)
  • Verified bot whitelist (Googlebot, Bingbot always allowed)
  • Real browser testing across Chrome, Firefox, Safari, Mobile
  • Honeypot interaction required for highest confidence
  • User-configurable sensitivity levels

Defense Against Evasion

  • Layered detection (client + server signals required)
  • Continuous updates to automation detection patterns
  • IP threat intelligence reduces false passes
  • Reverse DNS verification prevents spoofing
  • Behavioral analysis harder to fake than UA checks alone

Frequently Asked Questions

Will Bot Scanner block legitimate users?

No. The false positive rate target is < 0.1%. Real browsers are extremely unlikely to trigger suspicious scores because:

  • Mouse entropy from real human movement is distinctive
  • Real browsers have proper WebGL/Canvas fingerprints
  • Legitimate user agents match expected patterns
  • Verified search engines (Google, Bing) are whitelisted

We start with conservative thresholds and provide sensitivity controls.
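
As an illustration of why mouse entropy is distinctive: binning pointer-movement directions and computing Shannon entropy yields near zero for scripted straight-line motion and higher values for human wandering. The 8-sector binning below is an assumption, not the production metric:

```javascript
// Sketch of "mouse entropy": Shannon entropy (in bits) of pointer-move
// directions binned into 8 compass sectors. Max is 3 bits for 8 bins.
function mouseEntropy(points) {
  const bins = new Array(8).fill(0);
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - points[i - 1][0];
    const dy = points[i][1] - points[i - 1][1];
    if (dx === 0 && dy === 0) continue;
    const angle = Math.atan2(dy, dx); // -PI..PI
    bins[Math.floor(((angle + Math.PI) / (2 * Math.PI)) * 8) % 8]++;
  }
  const total = bins.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  return bins.reduce((h, n) => {
    if (n === 0) return h;
    const p = n / total;
    return h - p * Math.log2(p);
  }, 0);
}
```

A bot dragging the cursor in one straight line scores 0 bits; a human tracing even a simple square already spreads across four direction bins.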

Does Bot Scanner impact page performance?

Minimally. The snippet is:

  • < 10KB gzipped - Smaller than a typical image
  • Async loading - Doesn’t block page rendering
  • < 50ms overhead - Negligible impact on Core Web Vitals
  • sendBeacon API - Fire-and-forget reporting (survives page navigation)

Performance testing shows zero measurable impact on page load times.

What about ad blockers?

We account for this by:

  • Using innocuous filenames that bypass common blocklists
  • Offering CNAME-based deployment (your domain, bypasses ad blockers)
  • Fallback to honeypot-only detection if snippet blocked
  • No reliance on external analytics services

Can I use Bot Scanner with my existing honeypots?

Absolutely. Bot Scanner complements your existing decoys:

  • Honeypots - Catch bots that click (high confidence, low volume)
  • Bot Scanner - Catch bots that don’t (moderate confidence, higher volume)
  • Together - Multi-layered defense increases accuracy and coverage

How is this different from WAF solutions?

  Feature                Bot Scanner                   WAF
  ────────────────────   ───────────────────────────   ────────────────────────
  Detection method       Behavioral + fingerprinting   Rules-based + signatures
  AI crawler detection   ✓ (specialized)               —
  False positive rate    < 0.1%                        1-5%
  Setup time             5 minutes                     Hours/days
  Cost                   $59-449/mo                    $100-1000+/mo
  Integration            1 script tag                  Network-level config

WAFs are powerful but generate false positives. Bot Scanner is purpose-built for bot detection with minimal friction.

Getting Started Timeline

  • Early access: Q4 2025 (current; Phases 1-2 complete)
  • Beta release: Q4 2025 (Phase 3 complete)
  • General availability: Q1 2026 (Phase 5 complete)

During early access:

  • Free for all existing WebDecoy customers
  • Early pricing locked in for life
  • Direct input on feature priorities
  • Premium support included

The Future of Bot Detection

Bot Scanner represents a fundamental shift in bot defense strategy:

  • From reactive to proactive - Detect threats before honeypots are triggered
  • From signatures to signals - Multiple correlated indicators, harder to evade
  • From blocking to understanding - Rich data enables informed policies
  • From one-size-fits-all to nuanced - Allow beneficial crawlers, block harmful ones

Conclusion

Bot Scanner is the next evolution of WebDecoy’s bot detection platform. By combining client-side behavioral analysis, server-side enrichment, and AI-crawler-specific detection, we’re building the most accurate bot detection system available.

Ready to try Bot Scanner?

  • Current WebDecoy customers: Early access starting now
  • New customers: Sign up for free tier and get instant access
  • Questions: Contact our security team at security@webdecoy.com

WebDecoy Bot Scanner. Detect bots before they detect your content.

Want to see WebDecoy in action?

Get a personalized demo from our team.

Request Demo