voro-scan (agent-builder)

voro-scan is the scanner CLI: a black-box static analysis engine that produces structured JSON consumed by voro-brain. It is installed as agent-builder.

Purpose

voro-scan is the detection layer of the VORO pipeline. It runs pattern-based static analysis, integrates 14 external SAST tools, and outputs structured findings in JSON. It is never imported as a Python library — all consumers call it via CLI subprocess.
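Because every consumer goes through the CLI, a caller typically wraps the command in a subprocess call. The sketch below is a minimal illustration, not voro-scan's own code. The helper names `build_audit_command` and `run_audit` are hypothetical, and it assumes that the `agent-builder` executable is on `PATH` and that `--scan-mode` and `--format json` can be combined in one invocation. Only the `audit` subcommand, the two flags, and the scan mode names come from this page.

```python
import subprocess
from pathlib import Path

# Scan mode names as listed on this page.
SCAN_MODES = {"patterns", "combined", "llm_light", "llm_deep", "unified"}


def build_audit_command(target: str, scan_mode: str = "unified") -> list[str]:
    """Build the argv list for an audit run (hypothetical helper)."""
    if scan_mode not in SCAN_MODES:
        raise ValueError(f"unknown scan mode: {scan_mode!r}")
    return [
        "agent-builder", "audit", target,
        "--scan-mode", scan_mode,
        "--format", "json",
    ]


def run_audit(target: str, workdir: Path, scan_mode: str = "unified") -> subprocess.CompletedProcess:
    """Run the scanner as a subprocess with workdir as its CWD."""
    return subprocess.run(
        build_audit_command(target, scan_mode),
        cwd=workdir,          # the JSON report is written relative to the CWD
        capture_output=True,
        text=True,
        check=False,          # exit-code semantics are not documented here
    )
```

Keeping the command construction separate from the process call makes the argument list easy to unit-test without the scanner installed.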

Architecture

agent-builder audit <target> --format json
  → Pattern matching (647+ YAML rules)
  → External scanner orchestration (14 tools)
  → Tiered verification (5-tier confidence system)
  → Deduplication
  → JSON output: CWD/audit-AUDIT-{id}.json
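
The deduplication stage is not specified further on this page. As a rough illustration only, the sketch below assumes that two findings are duplicates when they share the same `file`, `line`, and `category`, and that the most severe copy is kept. The `SEVERITY_RANK` ordering and the `deduplicate` helper are hypothetical and may differ from what voro-scan actually does.

```python
# Hypothetical severity ordering; the page only states that severities are UPPERCASE.
SEVERITY_RANK = {"INFO": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def deduplicate(findings: list[dict]) -> list[dict]:
    """Collapse findings that share (file, line, category), keeping the most severe."""
    best: dict[tuple, dict] = {}
    for finding in findings:
        key = (finding["file"], finding["line"], finding["category"])
        current = best.get(key)
        rank = SEVERITY_RANK.get(finding["severity"], -1)
        if current is None or rank > SEVERITY_RANK.get(current["severity"], -1):
            best[key] = finding
    # Dicts preserve insertion order, so results follow the first-seen order of each key.
    return list(best.values())
```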

26 Components Across 10 Subsystems

Subsystem	Key Components
CLI	Typer entry point, 11 sub-apps (including audit, fix, heal, workflow, dashboard, version)
Scanner Agent	Unified scanning: patterns → external tools → tiered verification → LLM → dedup
Pattern Registry	647+ YAML patterns with severity, confidence, regex/literal rules
LLM Provider	BYOK abstraction — Ollama (local, preferred) or Claude API (fallback)
Tiered Verification	5-tier confidence: symbolic proof → ensemble → high precision → low precision → LLM
External Scanners	14 integrations: Slither, Mythril, Echidna, Bandit, ESLint, Opengrep, and more
Benchmark System	Ground truth comparison, P/R/F1, regression detection
HITL Dashboard	FastAPI + HTMX at localhost:8081 for finding review
Fixer Pipeline	AST + LLM hybrid code fixing with approval workflow
Heal Debate	Multi-agent debate for complex vulnerability remediation

Language Coverage

Language	Pattern Count	Ground Truth
Solidity	135	SmartBugs, DeFiVulnLabs, DeFiHackLabs
Python	60	PyGoat
JavaScript	59	Juice Shop
Java	55	OWASP Benchmark (partial)
PHP	41	--
C#	37	--
C/C++	35	--
Motoko	34	--
Ruby	34	--
Go	33	Go Test Bench
Rust	33	--
Anchor (Solana)	32	--
Swift	22	--
Kotlin	21	--
Vyper	15	--
TypeScript	12	--
Total	647+	~32% validated

Scan Modes

Mode	Cost	Description
`patterns`	Free, instant	Regex/literal matching only
`combined`	Free, instant	Patterns + external SAST + SCA scanners (used by voro-web)
`llm_light`	Cheap	LLM on high-risk files only
`llm_deep`	Expensive	Full LLM analysis on all files
`unified`	Variable	Patterns + external tools + tiered LLM (default)

External Scanner Integrations

Python: Bandit, pip-audit
JavaScript: ESLint, npm-audit
Solidity: Slither, Mythril, Echidna, solhint
Go: gosec
Rust: cargo-audit, cargo-geiger, clippy
Universal: Opengrep (preferred over Semgrep), ast-grep

Plus ~800 bundled permissively-licensed Semgrep-format rules for Opengrep.

Tiered Verification System

Symbolic Proof — auto-accept (Mythril/Echidna verified, weight 0.95)
Ensemble Agreement — auto-accept (3+ scanners agree, weighted threshold 1.5)
High Precision — auto-accept (Bayesian CI lower bound > 90%, needs 50+ samples)
Low Precision — auto-reject (Bayesian CI upper bound < 10%, needs 20+ samples)
LLM Required — uncertain findings routed to LLM for verification

The LLM also spot-checks 10% of auto-decisions to detect drift.
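
The tier order above can be read as a chain of checks where the first matching tier wins. The following is a minimal sketch under stated assumptions, not the actual implementation: the `Evidence` structure and `route` function are hypothetical, the credible-interval bounds are assumed to be computed elsewhere, and the "weighted threshold 1.5" is interpreted here as the sum of the agreeing scanners' weights.

```python
from dataclasses import dataclass, field

SYMBOLIC_TOOLS = {"Mythril", "Echidna"}


@dataclass
class Evidence:
    """Inputs to the tier decision for one finding (hypothetical structure)."""
    verified_by: set[str] = field(default_factory=set)  # tools that proved the finding
    scanner_weights: dict[str, float] = field(default_factory=dict)  # agreeing scanner -> weight
    ci_lower: float = 0.0  # lower bound of the precision interval
    ci_upper: float = 1.0  # upper bound of the precision interval
    samples: int = 0       # labelled samples behind the interval


def route(e: Evidence) -> str:
    """Return the first tier whose condition holds, checked in tier order."""
    if e.verified_by & SYMBOLIC_TOOLS:
        return "symbolic_proof:accept"
    # Assumption: "3+ scanners agree" and "summed weight >= 1.5" must both hold.
    if len(e.scanner_weights) >= 3 and sum(e.scanner_weights.values()) >= 1.5:
        return "ensemble:accept"
    if e.samples >= 50 and e.ci_lower > 0.90:
        return "high_precision:accept"
    if e.samples >= 20 and e.ci_upper < 0.10:
        return "low_precision:reject"
    return "llm_required"
```

For example, a finding with no proof, two agreeing scanners, and only 10 samples falls through every automatic tier and is routed to the LLM.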

Key Capabilities

Offline-first: The patterns and combined scan modes require zero network access
BYOK LLM: Bring your own model — Ollama locally or Claude API
Sovereign: No cloud upload, no telemetry, runs in a terminal
Taxonomy coverage: 9 taxonomies (SWC, CWE, OWASP Smart Contract, Immunefi, OWASP LLM, OWASP Agentic, DASP, MITRE ATLAS, Code4rena)
Canonical crosswalks: vulnerability taxonomy matrix in voro-scan/data/taxonomy_matrix.json and joined compliance crosswalk in voro-scan/data/category_compliance_crosswalk.json
Fix pipeline: agent-builder audit <target> --fix --apply for automated remediation with approval workflow
Benchmark system: Ground truth YAMLs for precision/recall measurement and regression detection
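
The precision, recall, and F1 figures used by the benchmark system follow the standard definitions. The sketch below shows the arithmetic only. It assumes each finding is identified by a hashable key such as a `(file, line, category)` tuple, which is an illustrative choice and not necessarily how the ground truth YAMLs are matched. The function name is hypothetical.

```python
def precision_recall_f1(reported: set, expected: set) -> tuple[float, float, float]:
    """Compare reported findings with ground truth and return (precision, recall, F1)."""
    tp = len(reported & expected)   # reported and present in ground truth
    fp = len(reported - expected)   # reported but not in ground truth
    fn = len(expected - reported)   # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Regression detection can then compare these values against those stored from a previous run.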

CLI Reference

# Core scanning
agent-builder audit <target> --scan-mode patterns    # Pattern-only (instant, offline)
agent-builder audit <target> --scan-mode combined    # Patterns + external tools
agent-builder audit <target> --scan-mode unified     # Full scan (default)
agent-builder audit <target> --format json           # JSON output for pipeline consumption

# Fix pipeline
agent-builder fix <target>                           # Show fix diffs
agent-builder audit <target> --fix --apply           # Scan + fix + apply

# Dashboard
agent-builder dashboard                              # HITL UI at localhost:8081

Output Contract

The audit command writes CWD/audit-AUDIT-{id}.json containing:

ScanResult {
  resolved_path: string       // Absolute path to scanned repo
  success: boolean
  findings: ScanFinding[]     // Each with file, line, severity (UPPERCASE), category
  files_scanned: number
  languages_detected: dict
  duration_ms: number
}

Field naming: voro-scan uses file and line. Downstream consumers (voro-brain) translate to file_path and line_number via the adapter layer.
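
The field translation can be pictured as a small adapter. This is a hedged sketch rather than voro-brain's actual adapter layer: the `BrainFinding` class and the `adapt` and `load_report` functions are hypothetical, and it assumes the JSON file's top-level object is the `ScanResult` shown above. Only the `file` to `file_path` and `line` to `line_number` renames come from this page.

```python
import json
from dataclasses import dataclass


@dataclass
class BrainFinding:
    """Downstream shape with translated field names (hypothetical class)."""
    file_path: str
    line_number: int
    severity: str
    category: str


def adapt(scan_result: dict) -> list[BrainFinding]:
    """Rename voro-scan's file/line fields to file_path/line_number."""
    return [
        BrainFinding(
            file_path=f["file"],
            line_number=f["line"],
            severity=f["severity"],
            category=f["category"],
        )
        for f in scan_result["findings"]
    ]


def load_report(path: str) -> list[BrainFinding]:
    """Read an audit-AUDIT-{id}.json report and adapt its findings."""
    with open(path, encoding="utf-8") as fh:
        return adapt(json.load(fh))
```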

Current State

Version: 0.3.0
Tests: 1,873 across 84 test files
Codebase: ~95K LOC in 12 packages (Python 3.12)
Validation: ~32% of patterns validated against ground truth benchmarks
Focus: Benchmark-driven validation and pattern tuning
