Peruse AI — Local-First Web Agent
An autonomous web agent powered by local Vision-Language Models that explores web applications and generates structured reports on data, UX, and bugs.
01. Project Overview
Overview
Peruse AI is a local-first universal web agent that autonomously explores web applications and produces structured reports. Give it a URL and a goal — it navigates, clicks, scrolls, and analyzes the page using a Vision-Language Model running locally on your machine.
pip install peruse-ai
peruse run --url "https://example.com/dashboard" \
--task "Explore the dashboard and summarize all visible data"
Key Capabilities
- Autonomous Web Exploration — The agent plans and executes multi-step browser interactions to accomplish a given task
- Dual-Channel Perception — Combines DOM extraction and visual screenshots for robust element detection, handling cases where one modality fails
- 100% Local — All processing stays on your machine. Runs on Ollama, LM Studio, or any OpenAI-compatible local endpoint
- Multi-Output Pipeline — Generates three report types from a single session:
  - Data Insights: Summaries of charts, tables, and visible data
  - UX/UI Review: Contrast, layout, accessibility, and usability critique
  - Bug Report: Console errors, failed requests, and reproduction steps
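To make the multi-output idea concrete, here is a minimal sketch of how notes collected during one session could be split into the three report types. The `Observation` structure, the `REPORT_TITLES` mapping, and `render_reports` are hypothetical names chosen for illustration; they are not taken from the peruse-ai codebase.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One note recorded during a session (hypothetical structure)."""
    kind: str  # "data", "ux", or "bug"
    step: int  # step index at which the note was recorded
    text: str


# Hypothetical mapping from observation kind to report title.
REPORT_TITLES = {
    "data": "Data Insights",
    "ux": "UX/UI Review",
    "bug": "Bug Report",
}


def render_reports(observations):
    """Group observations by kind and render one Markdown report per kind."""
    reports = {}
    for kind, title in REPORT_TITLES.items():
        lines = [f"# {title}", ""]
        matching = [o for o in observations if o.kind == kind]
        if not matching:
            lines.append("_No findings recorded._")
        for obs in matching:
            lines.append(f"- (step {obs.step}) {obs.text}")
        reports[kind] = "\n".join(lines)
    return reports
```

Calling `render_reports` on a session's observations returns a dictionary with one Markdown string per report type, so a single exploration run feeds all three outputs.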
Architecture
The agent runs as an async Python process with three main layers:
- Browser Control — Playwright manages a Chromium instance, capturing screenshots and extracting DOM state at each step
- VLM Decision Engine — The Vision-Language Model receives screenshots + DOM context and outputs a structured action (click, type, scroll, navigate)
- Report Generator — Accumulated observations are synthesized into structured Markdown reports
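The decision step above depends on turning the model's reply into a structured action. The sketch below shows one way this could be validated, assuming the model is prompted to answer with a JSON object that has `action`, `target`, and `value` keys. That schema, the `Action` class, and `parse_action` are assumptions made for illustration, not the project's actual format.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four action names listed in the architecture description.
ALLOWED_ACTIONS = {"click", "type", "scroll", "navigate"}


@dataclass
class Action:
    """A validated browser action (hypothetical schema)."""
    name: str
    target: Optional[str] = None  # e.g. a CSS selector or URL
    value: Optional[str] = None   # e.g. text to type


def parse_action(raw: str) -> Action:
    """Parse a model reply into an Action, rejecting unknown action names."""
    # Models often wrap JSON in prose, so keep only the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(raw[start:end + 1])
    name = data.get("action")
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return Action(name=name, target=data.get("target"), value=data.get("value"))
```

Rejecting anything outside the allowed set keeps a malformed or hallucinated reply from reaching the browser layer; the caller can catch `ValueError` and ask the model again.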
The system supports configurable backends (Ollama, LM Studio, Jina VLM), retry logic for GPU crashes, and adjustable context windows for different hardware capabilities.
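Retry logic of the kind mentioned above is commonly written as an async wrapper with exponential backoff. The following is a sketch under that assumption; `BackendError` and `call_with_retry` are hypothetical names, and the retry policy peruse-ai actually uses may differ.

```python
import asyncio


class BackendError(RuntimeError):
    """Hypothetical error raised when the local model backend fails."""


async def call_with_retry(call, retries=3, base_delay=0.5):
    """Await call() up to `retries` times, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return await call()
        except BackendError:
            # Out of attempts: surface the last error to the caller.
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Here `call` is any zero-argument coroutine function, for example one that sends a screenshot to the model. Only `BackendError` is retried, so programming errors still fail immediately.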
Tech Stack
- Python, AsyncIO, Playwright
- Ollama / LM Studio / Jina VLM (Vision-Language Models)
- Published on PyPI as peruse-ai
Technologies
Python, Playwright, Ollama, Vision-Language Models, PyPI, Async
Role
ML Engineer & Software Architect
Timeline
Feb 2026
Category
AI Agents / Developer Tools