voro-scan (agent-builder)

The scanner CLI. A black-box static analysis engine that produces structured JSON consumed by voro-brain. Installed as agent-builder.

Purpose

voro-scan is the detection layer of the VORO pipeline. It runs pattern-based static analysis, integrates 14 external SAST tools, and outputs structured findings in JSON. It is never imported as a Python library — all consumers call it via CLI subprocess.

Architecture

agent-builder audit <target> --format json
→ Pattern matching (647+ YAML rules)
→ External scanner orchestration (14 tools)
→ Tiered verification (5-tier confidence system)
→ Deduplication
→ JSON output: CWD/audit-AUDIT-{id}.json

26 Components Across 10 Subsystems

Subsystem | Key Components
CLI | Typer entry point, 11 sub-apps (including audit, fix, heal, workflow, dashboard, version)
Scanner Agent | Unified scanning: patterns → external tools → tiered verification → LLM → dedup
Pattern Registry | 647+ YAML patterns with severity, confidence, regex/literal rules
LLM Provider | BYOK abstraction: Ollama (local, preferred) or Claude API (fallback)
Tiered Verification | 5-tier confidence: symbolic proof → ensemble → high precision → low precision → LLM
External Scanners | 14 integrations: Slither, Mythril, Echidna, Bandit, ESLint, Opengrep, and more
Benchmark System | Ground truth comparison, P/R/F1, regression detection
HITL Dashboard | FastAPI + HTMX at localhost:8081 for finding review
Fixer Pipeline | AST + LLM hybrid code fixing with approval workflow
Heal Debate | Multi-agent debate for complex vulnerability remediation

Language Coverage

Language | Pattern Count | Ground Truth
Solidity | 135 | SmartBugs, DeFiVulnLabs, DeFiHackLabs
Python | 60 | PyGoat
JavaScript | 59 | Juice Shop
Java | 55 | OWASP Benchmark (partial)
PHP | 41 | --
C# | 37 | --
C/C++ | 35 | --
Motoko | 34 | --
Ruby | 34 | --
Go | 33 | Go Test Bench
Rust | 33 | --
Anchor (Solana) | 32 | --
Swift | 22 | --
Kotlin | 21 | --
Vyper | 15 | --
TypeScript | 12 | --
Total | 647+ | ~32% validated

Scan Modes

Mode | Cost | Description
patterns | Free, instant | Regex/literal matching only
combined | Free, instant | Patterns + external SAST + SCA scanners (used by voro-web)
llm_light | Cheap | LLM on high-risk files only
llm_deep | Expensive | Full LLM analysis on all files
unified | Variable | Patterns + external tools + tiered LLM (default)

External Scanner Integrations

  • Python: Bandit, pip-audit
  • JavaScript: ESLint, npm-audit
  • Solidity: Slither, Mythril, Echidna, solhint
  • Go: gosec
  • Rust: cargo-audit, cargo-geiger, clippy
  • Universal: Opengrep (preferred over Semgrep), ast-grep

Plus ~800 bundled permissively-licensed Semgrep-format rules for Opengrep.

Tiered Verification System

  1. Symbolic Proof — auto-accept (Mythril/Echidna verified, weight 0.95)
  2. Ensemble Agreement — auto-accept (3+ scanners agree, weighted threshold 1.5)
  3. High Precision — auto-accept (Bayesian CI lower bound > 90%, needs 50+ samples)
  4. Low Precision — auto-reject (Bayesian CI upper bound < 10%, needs 20+ samples)
  5. LLM Required — uncertain findings routed to LLM for verification

The LLM also spot-checks 10% of auto-decisions (tiers 1 to 4) to detect drift.
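
The tier rules can be read as a first-match cascade. The sketch below is illustrative only and is not the scanner's implementation: the Evidence class and its field names are hypothetical, the confidence-interval bounds are taken as precomputed inputs, and reading "weighted threshold 1.5" as "the summed weights of the agreeing scanners reach 1.5" is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical per-finding evidence; field names are illustrative only."""
    symbolically_verified: bool = False  # e.g. confirmed by Mythril or Echidna
    scanner_weights: dict[str, float] = field(default_factory=dict)  # agreeing scanners
    ci_lower: float = 0.0  # lower bound of the pattern's precision interval
    ci_upper: float = 1.0  # upper bound of the pattern's precision interval
    samples: int = 0       # number of labeled samples behind the interval


def assign_tier(ev: Evidence) -> tuple[int, str]:
    """Return (tier, decision) for the first tier whose condition holds."""
    if ev.symbolically_verified:
        return 1, "accept"
    # Assumption: 3+ agreeing scanners AND summed weight of at least 1.5.
    if len(ev.scanner_weights) >= 3 and sum(ev.scanner_weights.values()) >= 1.5:
        return 2, "accept"
    if ev.samples >= 50 and ev.ci_lower > 0.90:
        return 3, "accept"
    if ev.samples >= 20 and ev.ci_upper < 0.10:
        return 4, "reject"
    return 5, "llm"


def needs_spot_check(tier: int, rng: random.Random, rate: float = 0.10) -> bool:
    """Sample roughly `rate` of auto-decisions (tiers 1 to 4) for LLM review."""
    return tier < 5 and rng.random() < rate
```

For example, assign_tier(Evidence(ci_upper=0.05, samples=25)) lands in tier 4 and is rejected, while the same interval with only 10 samples falls through to tier 5 and goes to the LLM.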

Key Capabilities

  • Offline-first: The patterns and combined scan modes require no network access
  • BYOK LLM: Bring your own model — Ollama locally or Claude API
  • Sovereign: No cloud upload, no telemetry, runs in a terminal
  • Taxonomy coverage: 9 taxonomies (SWC, CWE, OWASP Smart Contract, Immunefi, OWASP LLM, OWASP Agentic, DASP, MITRE ATLAS, Code4rena)
  • Canonical crosswalks: vulnerability taxonomy matrix in voro-scan/data/taxonomy_matrix.json and joined compliance crosswalk in voro-scan/data/category_compliance_crosswalk.json
  • Fix pipeline: agent-builder audit <target> --fix --apply for automated remediation with approval workflow
  • Benchmark system: Ground truth YAMLs for precision/recall measurement and regression detection
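
As a rough illustration of the benchmark measurement, precision, recall, and F1 can be computed by comparing the set of reported findings with the ground-truth set. This is a sketch under stated assumptions, not the benchmark system's code: matching findings on a (file, line, category) tuple and the tolerance used for regression detection are both hypothetical choices.

```python
Finding = tuple[str, int, str]  # (file, line, category); an assumed matching key


def precision_recall_f1(
    expected: set[Finding], reported: set[Finding]
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from ground truth and reported findings."""
    true_positives = len(expected & reported)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def is_regression(baseline_f1: float, current_f1: float, tolerance: float = 0.01) -> bool:
    """Flag a run whose F1 drops below the baseline by more than `tolerance`."""
    return current_f1 < baseline_f1 - tolerance
```

With three expected findings and two reported findings of which one matches, precision is 0.5 and recall is one third.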

CLI Reference

# Core scanning
agent-builder audit <target> --scan-mode patterns # Pattern-only (instant, offline)
agent-builder audit <target> --scan-mode combined # Patterns + external tools
agent-builder audit <target> --scan-mode unified # Full scan (default)
agent-builder audit <target> --format json # JSON output for pipeline consumption

# Fix pipeline
agent-builder fix <target> # Show fix diffs
agent-builder audit <target> --fix --apply # Scan + fix + apply

# Dashboard
agent-builder dashboard # HITL UI at localhost:8081

Output Contract

The audit command writes audit-AUDIT-{id}.json to the current working directory (CWD). The file contains:

ScanResult {
  resolved_path: string       // Absolute path to scanned repo
  success: boolean
  findings: ScanFinding[]     // Each with file, line, severity (UPPERCASE), category
  files_scanned: number
  languages_detected: dict
  duration_ms: number
}

Field naming: voro-scan uses file and line. Downstream consumers (voro-brain) translate to file_path and line_number via the adapter layer.
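
The renaming described above could look roughly like this on the consumer side. It is a minimal sketch, not the actual voro-brain adapter: the function name is hypothetical, and passing all other keys through unchanged is an assumption.

```python
def to_brain_finding(finding: dict) -> dict:
    """Rename voro-scan's file/line keys to file_path/line_number.

    Other keys (for example severity and category) are copied unchanged.
    Raises KeyError if the scanner finding lacks file or line.
    """
    renamed = dict(finding)  # shallow copy so the scanner output is not mutated
    renamed["file_path"] = renamed.pop("file")
    renamed["line_number"] = renamed.pop("line")
    return renamed


def adapt_findings(scan_result: dict) -> list[dict]:
    """Translate every entry of ScanResult.findings."""
    return [to_brain_finding(f) for f in scan_result.get("findings", [])]
```

Keeping the translation in one small function means a change to the scanner's field names only has to be absorbed in a single place.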

Current State

  • Version: 0.3.0
  • Tests: 1,873 across 84 test files
  • Codebase: ~95K LOC in 12 packages (Python 3.12)
  • Validation: ~32% of patterns validated against ground truth benchmarks
  • Focus: Benchmark-driven validation and pattern tuning