voro-scan (agent-builder)
The scanner CLI. A black-box static analysis engine that produces structured JSON consumed by voro-brain. Installed as agent-builder.
Purpose
voro-scan is the detection layer of the VORO pipeline. It runs pattern-based static analysis, integrates 14 external SAST tools, and outputs structured findings in JSON. It is never imported as a Python library — all consumers call it via CLI subprocess.
Architecture
agent-builder audit <target> --format json
→ Pattern matching (647+ YAML rules)
→ External scanner orchestration (14 tools)
→ Tiered verification (5-tier confidence system)
→ Deduplication
→ JSON output: CWD/audit-AUDIT-{id}.json
26 Components Across 10 Subsystems
| Subsystem | Key Components |
|---|---|
| CLI | Typer entry point, 11 sub-apps (including audit, fix, heal, workflow, dashboard, version) |
| Scanner Agent | Unified scanning: patterns → external tools → tiered verification → LLM → dedup |
| Pattern Registry | 647+ YAML patterns with severity, confidence, regex/literal rules |
| LLM Provider | BYOK abstraction — Ollama (local, preferred) or Claude API (fallback) |
| Tiered Verification | 5-tier confidence: symbolic proof → ensemble → high precision → low precision → LLM |
| External Scanners | 14 integrations: Slither, Mythril, Echidna, Bandit, ESLint, Opengrep, and more |
| Benchmark System | Ground truth comparison, P/R/F1, regression detection |
| HITL Dashboard | FastAPI + HTMX at localhost:8081 for finding review |
| Fixer Pipeline | AST + LLM hybrid code fixing with approval workflow |
| Heal Debate | Multi-agent debate for complex vulnerability remediation |
Language Coverage
| Language | Pattern Count | Ground Truth |
|---|---|---|
| Solidity | 135 | SmartBugs, DeFiVulnLabs, DeFiHackLabs |
| Python | 60 | PyGoat |
| JavaScript | 59 | Juice Shop |
| Java | 55 | OWASP Benchmark (partial) |
| PHP | 41 | -- |
| C# | 37 | -- |
| C/C++ | 35 | -- |
| Motoko | 34 | -- |
| Ruby | 34 | -- |
| Go | 33 | Go Test Bench |
| Rust | 33 | -- |
| Anchor (Solana) | 32 | -- |
| Swift | 22 | -- |
| Kotlin | 21 | -- |
| Vyper | 15 | -- |
| TypeScript | 12 | -- |
| Total | 647+ | ~32% validated |
Scan Modes
| Mode | Cost | Description |
|---|---|---|
| patterns | Free, instant | Regex/literal matching only |
| combined | Free, instant | Patterns + external SAST + SCA scanners (used by voro-web) |
| llm_light | Cheap | LLM on high-risk files only |
| llm_deep | Expensive | Full LLM analysis on all files |
| unified | Variable | Patterns + external tools + tiered LLM (default) |
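A caller that picks a mode programmatically could model the table as an enum. The sketch below is illustrative only: `ScanMode`, `uses_llm`, and `scan_mode_args` are hypothetical names, and passing `llm_light` or `llm_deep` to `--scan-mode` is an assumption, since the CLI reference only shows `patterns`, `combined`, and `unified` as values.

```python
from enum import Enum


class ScanMode(str, Enum):
    """The five scan modes listed in the table."""

    PATTERNS = "patterns"
    COMBINED = "combined"
    LLM_LIGHT = "llm_light"
    LLM_DEEP = "llm_deep"
    UNIFIED = "unified"


# Modes the table marks as free: they run no LLM stage.
LLM_FREE_MODES = frozenset({ScanMode.PATTERNS, ScanMode.COMBINED})


def uses_llm(mode: ScanMode) -> bool:
    """Return True if the mode involves an LLM stage according to the table."""
    return mode not in LLM_FREE_MODES


def scan_mode_args(mode: ScanMode) -> list[str]:
    """Return the CLI arguments that select the given mode."""
    return ["--scan-mode", mode.value]
```

A caller with no LLM configured could, for example, refuse any mode for which `uses_llm` returns `True` before spawning the scanner.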
External Scanner Integrations
- Python: Bandit, pip-audit
- JavaScript: ESLint, npm-audit
- Solidity: Slither, Mythril, Echidna, solhint
- Go: gosec
- Rust: cargo-audit, cargo-geiger, clippy
- Universal: Opengrep (preferred over Semgrep), ast-grep
Plus ~800 bundled permissively-licensed Semgrep-format rules for Opengrep.
Tiered Verification System
- Symbolic Proof — auto-accept (Mythril/Echidna verified, weight 0.95)
- Ensemble Agreement — auto-accept (3+ scanners agree, weighted threshold 1.5)
- High Precision — auto-accept (Bayesian CI lower bound > 90%, needs 50+ samples)
- Low Precision — auto-reject (Bayesian CI upper bound < 10%, needs 20+ samples)
- LLM Required — uncertain findings routed to LLM for verification
The LLM also spot-checks 10% of auto-decisions to detect drift.
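The tier rules above can be read as a cascade of checks. The following is a sketch under stated assumptions rather than the scanner's actual logic: the `Evidence` fields and `assign_tier` are hypothetical, the order of the checks is assumed, and "3+ scanners agree, weighted threshold 1.5" is interpreted as at least three agreeing scanners whose weights sum to 1.5 or more. The confidence-interval bounds are taken as inputs; how they are computed is not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SYMBOLIC_PROOF = "symbolic_proof"      # auto-accept
    ENSEMBLE_AGREEMENT = "ensemble"        # auto-accept
    HIGH_PRECISION = "high_precision"      # auto-accept
    LOW_PRECISION = "low_precision"        # auto-reject
    LLM_REQUIRED = "llm_required"          # routed to the LLM


@dataclass
class Evidence:
    """Hypothetical summary of what is known about one finding."""

    symbolically_verified: bool = False      # e.g. confirmed by Mythril or Echidna
    scanner_weights: tuple[float, ...] = ()  # one weight per agreeing scanner
    ci_lower: float = 0.0                    # lower bound of the precision interval
    ci_upper: float = 1.0                    # upper bound of the precision interval
    samples: int = 0                         # labelled samples behind the interval


def assign_tier(ev: Evidence) -> Tier:
    """Apply the documented thresholds in an assumed top-to-bottom order."""
    if ev.symbolically_verified:
        return Tier.SYMBOLIC_PROOF
    if len(ev.scanner_weights) >= 3 and sum(ev.scanner_weights) >= 1.5:
        return Tier.ENSEMBLE_AGREEMENT
    if ev.samples >= 50 and ev.ci_lower > 0.90:
        return Tier.HIGH_PRECISION
    if ev.samples >= 20 and ev.ci_upper < 0.10:
        return Tier.LOW_PRECISION
    return Tier.LLM_REQUIRED
```

Note that a pattern with a strong interval but too few samples falls through to `LLM_REQUIRED`, which matches the sample minimums stated in the list.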
Key Capabilities
- Offline-first: Pattern and combined scan modes require zero network access
- BYOK LLM: Bring your own model — Ollama locally or Claude API
- Sovereign: No cloud upload, no telemetry, runs in a terminal
- Taxonomy coverage: 9 taxonomies (SWC, CWE, OWASP Smart Contract, Immunefi, OWASP LLM, OWASP Agentic, DASP, MITRE ATLAS, Code4rena)
- Canonical crosswalks: vulnerability taxonomy matrix in voro-scan/data/taxonomy_matrix.json and joined compliance crosswalk in voro-scan/data/category_compliance_crosswalk.json
- Fix pipeline: agent-builder audit --fix --apply for automated remediation with approval workflow
- Benchmark system: Ground truth YAMLs for precision/recall measurement and regression detection
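For the precision/recall measurement mentioned above, the standard definitions can be sketched as below. This is not the benchmark system's code: the ground truth YAML schema is not shown here, so matching findings by an exact `(file, line, category)` tuple is an assumption, and `precision_recall_f1` is a hypothetical name.

```python
Finding = tuple[str, int, str]  # (file, line, category), an assumed match key


def precision_recall_f1(
    reported: set[Finding], expected: set[Finding]
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 of reported findings against ground truth."""
    true_positives = len(reported & expected)
    false_positives = len(reported - expected)
    false_negatives = len(expected - reported)

    precision = (
        true_positives / (true_positives + false_positives)
        if true_positives + false_positives
        else 0.0
    )
    recall = (
        true_positives / (true_positives + false_negatives)
        if true_positives + false_negatives
        else 0.0
    )
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

Regression detection could then compare these numbers between two runs of the same benchmark, although the thresholds voro-scan applies are not documented here.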
CLI Reference
# Core scanning
agent-builder audit <target> --scan-mode patterns # Pattern-only (instant, offline)
agent-builder audit <target> --scan-mode combined # Patterns + external tools
agent-builder audit <target> --scan-mode unified # Full scan (default)
agent-builder audit <target> --format json # JSON output for pipeline consumption
# Fix pipeline
agent-builder fix <target> # Show fix diffs
agent-builder audit <target> --fix --apply # Scan + fix + apply
# Dashboard
agent-builder dashboard # HITL UI at localhost:8081
Output Contract
The audit command writes audit-AUDIT-{id}.json to the current working directory (CWD). The file contains:
ScanResult {
resolved_path: string // Absolute path to scanned repo
success: boolean
findings: ScanFinding[] // Each with file, line, severity (UPPERCASE), category
files_scanned: number
languages_detected: dict
duration_ms: number
}
Field naming: voro-scan uses file and line. Downstream consumers (voro-brain) translate to file_path and line_number via the adapter layer.
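The renaming described above amounts to a small key mapping. The sketch below shows the idea only; it is not voro-brain's adapter, and `FIELD_RENAMES`, `to_brain_finding`, and `adapt_findings` are hypothetical names. It assumes the report has already been loaded into a dict shaped like the `ScanResult` above.

```python
# Keys voro-scan emits, mapped to the names downstream consumers expect.
FIELD_RENAMES = {"file": "file_path", "line": "line_number"}


def to_brain_finding(finding: dict) -> dict:
    """Return a copy of one finding with file/line renamed; other keys pass through."""
    return {FIELD_RENAMES.get(key, key): value for key, value in finding.items()}


def adapt_findings(scan_result: dict) -> list[dict]:
    """Rename the fields of every finding in a loaded ScanResult dict."""
    return [to_brain_finding(finding) for finding in scan_result.get("findings", [])]
```

For example, `{"file": "src/Vault.sol", "line": 42, "severity": "HIGH"}` becomes `{"file_path": "src/Vault.sol", "line_number": 42, "severity": "HIGH"}`, with the uppercase severity left untouched.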
Current State
- Version: 0.3.0
- Tests: 1,873 across 84 test files
- Codebase: ~95K LOC in 12 packages (Python 3.12)
- Validation: ~32% of patterns validated against ground truth benchmarks
- Focus: Benchmark-driven validation and pattern tuning