Raj

PocketLedger

in-progress

Local-first personal finance tracker with OCR-powered statement parsing

2 progress reports
ETL, Research, Engineering, Modeling, Launch
FastAPI, SQLAlchemy 2.0, Alembic, Pydantic 2.9+, Tesseract (pytesseract), OpenCV, Pillow, pdf2image
Overview

What It Is

PocketLedger is a local-first personal finance tracker that automatically extracts and categorizes transactions from bank statement images and PDFs. It provides spending analytics, budgeting insights, and receipt management, all running on your own machine without requiring cloud integration or external financial APIs.


Architecture

A single-tier web application with server-side rendering:

  • Backend: FastAPI with Jinja2 templates, serving HTML pages and JSON API endpoints. SQLite database in WAL (write-ahead logging) mode, so readers are not blocked while a write is in progress.
  • Parsing Pipeline: Auto-detecting parser discovery inspects OCR text and routes to bank-specific parsers (Chase, Bank of America, Discover). Image preprocessing (grayscale, denoise, adaptive threshold, deskew) precedes Tesseract OCR.
  • Categorization: SQL LIKE pattern matching against merchant rules runs first, an Ollama LLM fallback handles unknown merchants, and manual override is available.
  • Receipt Processing: An Ollama vision model extracts store name, items, prices, and categories from receipt images, with retry logic and JSON validation.
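
The auto-detecting parser discovery described above can be sketched as a keyword-scoring router. This is a minimal illustration under stated assumptions, not the project's actual code. The marker phrases, the `PARSERS` registry, and the placeholder parser functions are all hypothetical, and a real detector would probably look for more distinctive statement text.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical parser signature: OCR text in, list of transaction dicts out.
Parser = Callable[[str], List[dict]]


def parse_chase(text: str) -> List[dict]:
    return []  # placeholder: bank-specific regex extraction would go here


def parse_bofa(text: str) -> List[dict]:
    return []  # placeholder


def parse_discover(text: str) -> List[dict]:
    return []  # placeholder


# Assumed marker phrases per bank; real statements may need different markers.
PARSERS: Dict[str, Tuple[List[str], Parser]] = {
    "chase": ([r"\bchase\b", r"jpmorgan chase"], parse_chase),
    "bofa": ([r"bank of america", r"bankofamerica\.com"], parse_bofa),
    "discover": ([r"\bdiscover\b", r"discover\.com"], parse_discover),
}


def detect_bank(ocr_text: str) -> Optional[str]:
    """Score each bank by marker hits in the OCR text; None if nothing matches."""
    lowered = ocr_text.lower()
    scores = {
        bank: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for bank, (patterns, _) in PARSERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def route(ocr_text: str) -> List[dict]:
    """Dispatch OCR text to the detected bank's parser."""
    bank = detect_bank(ocr_text)
    if bank is None:
        raise ValueError("unrecognized statement format")
    _, parser = PARSERS[bank]
    return parser(ocr_text)
```

Scoring by hit count means a stray mention of another bank, such as a Discover payment on a Chase statement, does not outweigh the issuer's own repeated branding.
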
Key Features

Data Sources

  • Bank Statements: PDF and image uploads from Chase, Bank of America, and Discover.
  • Receipt Images: JPG/PNG uploads for line-item expense tracking.
  • Seeded Data: 16 spending categories with merchant pattern rules.

Current State

Version 0.1.0 in active development. Statement upload, bank parser detection and extraction, CSV import, rule-based categorization, dashboard aggregations, receipt OCR with line-item extraction, and theme/font customization are all working. Outstanding items include receipt-to-transaction linking UI, advanced filtering, recurring transaction detection, and export to additional formats.

Progress Reports
Report #2 of 2
PocketLedger: Dashboard Redesign & CSV Import
Mar 13, 2026

Devlog

Two substantial updates in one day: a dashboard redesign that makes the spending data actually useful for decision-making, and a CSV import feature that sidesteps the OCR pipeline entirely for banks that offer clean export files.

The dashboard redesign focused on drill-down navigation and comparative context. The overview now shows KPI cards with month-over-month deltas so you can see at a glance whether spending is trending up or down in each category. I added a daily view that breaks down individual transactions for any selected day, which is useful for reconciling against bank statements. The monthly view got a calendar layout with spending totals per day.
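
The KPI cards' month-over-month category deltas could be computed roughly as follows. The transaction dict shape (`date`, `category`, `amount_cents`) and the function names are assumptions for illustration. Amounts are kept in integer cents so that totals do not accumulate floating-point error.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def previous_month(year: int, month: int) -> Tuple[int, int]:
    """Step back one calendar month, wrapping January to December."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def monthly_totals(txns: List[dict], year: int, month: int) -> Dict[str, int]:
    """Sum spending per category for one month (amounts in integer cents)."""
    totals: Dict[str, int] = defaultdict(int)
    for t in txns:
        if t["date"].year == year and t["date"].month == month:
            totals[t["category"]] += t["amount_cents"]
    return dict(totals)


def category_deltas(txns: List[dict], year: int, month: int) -> Dict[str, dict]:
    """Month-over-month change per category; pct is None when last month was zero."""
    current = monthly_totals(txns, year, month)
    prior = monthly_totals(txns, *previous_month(year, month))
    deltas = {}
    for category in sorted(current.keys() | prior.keys()):
        cur, old = current.get(category, 0), prior.get(category, 0)
        deltas[category] = {
            "current": cur,
            "previous": old,
            "delta": cur - old,
            "pct": None if old == 0 else (cur - old) * 100 / old,
        }
    return deltas


if __name__ == "__main__":
    sample = [
        {"date": date(2026, 1, 15), "category": "Groceries", "amount_cents": 10000},
        {"date": date(2026, 2, 10), "category": "Groceries", "amount_cents": 12000},
    ]
    print(category_deltas(sample, 2026, 2))
```

Returning `None` instead of a percentage for categories with no prior spending lets the UI show "new" rather than a misleading infinite increase.
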

The recurring transaction detection was a feature I had been wanting: the system scans for the same merchant appearing across consecutive months and auto-flags those transactions. This matters for budgeting because recurring charges (subscriptions, utilities, insurance) behave differently from discretionary spending, and knowing the recurring baseline tells you how much of your monthly budget is already committed before you spend anything.
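
Here is a minimal sketch of the consecutive-month rule, assuming transactions are dicts with `merchant` and `date` keys. Two choices are hypothetical, not necessarily what PocketLedger does: the merchant normalization, which lowercases and strips digits and punctuation so that `NETFLIX.COM 1234` and `NETFLIX.COM 5678` compare equal, and the two-month threshold.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set


def normalize_merchant(raw: str) -> str:
    """Hypothetical normalization: lowercase, drop digits/punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z ]", " ", raw.lower())
    return " ".join(cleaned.split())


def month_index(d: date) -> int:
    """Absolute month number, so Dec 2025 -> Jan 2026 differs by exactly 1."""
    return d.year * 12 + (d.month - 1)


def flag_recurring(txns: List[dict], min_run: int = 2) -> Set[str]:
    """Flag merchants seen in at least `min_run` consecutive months."""
    months_by_merchant: Dict[str, Set[int]] = defaultdict(set)
    for t in txns:
        months_by_merchant[normalize_merchant(t["merchant"])].add(month_index(t["date"]))

    recurring = set()
    for merchant, months in months_by_merchant.items():
        ordered = sorted(months)
        run = longest = 1
        for prev, cur in zip(ordered, ordered[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        if longest >= min_run:
            recurring.add(merchant)

    # Mark every transaction from a recurring merchant.
    for t in txns:
        t["is_recurring"] = normalize_merchant(t["merchant"]) in recurring
    return recurring
```

Mapping each month to `year * 12 + month` makes December-to-January count as consecutive without special cases.
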

The CSV import was motivated by a practical problem. Bank of America's PDF statements have inconsistent formatting that makes OCR unreliable, but their CSV export is perfectly structured. So rather than fighting the OCR, I built a CSV parser that supports six formats: dedicated layouts for Chase, Bank of America, Discover, Capital One, and Citi, plus a generic auto-detect mode that heuristically identifies column layouts. The auto-detect works by scanning for known column header patterns (Transaction Date, Description, Amount, Debit, Credit) and mapping them to the internal schema. CSV imports skip the OCR overhead entirely, parse almost instantly, and go through the same SHA-256 deduplication and auto-categorization pipeline as statement uploads.
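
The generic mode's header-matching heuristic might look like this. The alias sets and the function names are assumptions. So is the sign convention for separate Debit/Credit columns (credits positive, debits negative). Banks disagree on sign conventions, so a real importer would need per-bank handling.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, List

# Assumed header aliases; a real auto-detector would likely recognize more variants.
HEADER_ALIASES = {
    "date": {"transaction date", "date", "posted date", "post date"},
    "description": {"description", "payee", "merchant"},
    "amount": {"amount"},
    "debit": {"debit"},
    "credit": {"credit"},
}


def map_columns(header: List[str]) -> Dict[str, int]:
    """Map internal field names to column indexes; the first matching column wins."""
    mapping: Dict[str, int] = {}
    for idx, name in enumerate(header):
        key = name.strip().lower()
        for field, aliases in HEADER_ALIASES.items():
            if key in aliases and field not in mapping:
                mapping[field] = idx
    if "date" not in mapping or "description" not in mapping:
        raise ValueError(f"unrecognized CSV layout: {header}")
    if "amount" not in mapping and not {"debit", "credit"}.issubset(mapping):
        raise ValueError("need an Amount column or Debit/Credit columns")
    return mapping


def parse_generic_csv(text: str) -> List[dict]:
    """Parse CSV text into transaction dicts using the detected column mapping."""
    reader = csv.reader(io.StringIO(text))
    mapping = map_columns(next(reader))
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if "amount" in mapping:
            amount = Decimal(row[mapping["amount"]].replace(",", ""))
        else:
            # Assumed sign convention: credits positive, debits negative.
            debit = row[mapping["debit"]].replace(",", "").strip() or "0"
            credit = row[mapping["credit"]].replace(",", "").strip() or "0"
            amount = Decimal(credit) - Decimal(debit)
        rows.append({
            "date": row[mapping["date"]].strip(),
            "description": row[mapping["description"]].strip(),
            "amount": amount,
        })
    return rows
```

Using `Decimal` instead of `float` keeps cent values exact through parsing and any later aggregation.
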

The upload page was redesigned into a three-column layout: PDF/image statement upload on the left, CSV import in the center, and receipt upload on the right. Each column has bank-specific instructions showing users how to export their data from their bank's website.

I also added a CSV export endpoint so you can get your categorized transactions back out, filtered by year and month. Data should flow both directions.
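
Here is a sketch of the export side. It assumes a simplified, hypothetical `transactions` table with ISO-8601 `date` strings and `description`, `amount`, and `category` columns. Wiring it to `/export/csv` would typically mean returning the string from a FastAPI route as a `Response` with `media_type="text/csv"`. That part is left out so the example runs with only the standard library.

```python
import csv
import io
import sqlite3
from typing import Optional


def export_transactions_csv(
    conn: sqlite3.Connection, year: Optional[int] = None, month: Optional[int] = None
) -> str:
    """Return transactions as CSV text, optionally filtered by year and/or month."""
    query = "SELECT date, description, amount, category FROM transactions"
    clauses, params = [], []
    if year is not None:
        clauses.append("strftime('%Y', date) = ?")
        params.append(f"{year:04d}")
    if month is not None:
        clauses.append("strftime('%m', date) = ?")
        params.append(f"{month:02d}")
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    query += " ORDER BY date"

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "description", "amount", "category"])
    writer.writerows(conn.execute(query, params))  # parameters bound, never interpolated
    return buf.getvalue()
```
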

What's next: Receipt-to-transaction linking in the UI, advanced filtering and search on the transaction list, and potentially a theme system for visual customization.

Changelog

Added

  • Add dashboard V1 redesign with KPI cards and month-over-month category deltas
  • Add daily transaction drill-down view (day.html)
  • Add recurring transaction auto-detection based on merchant patterns across months
  • Add CSV export endpoint (/export/csv) with year/month filtering
  • Add CSV bank statement import for 6 formats (Chase, BofA, Discover, Capital One, Citi, generic)
  • Add generic CSV auto-detection with heuristic column mapping
  • Add bank-specific CSV export instructions in upload UI
  • Add breadcrumb navigation across dashboard views

Changed

  • Update upload page to 3-column layout (PDF/image, CSV, receipt)
  • Update settings page with expanded appearance panel
  • Update CSS with responsive breakpoints and drill-down UI components (+428 lines)
  • Update JavaScript with interactive category and account filters
dashboard-redesign, csv-import, recurring-transactions, data-export, kpi-cards, month-over-month, daily-drilldown, multi-bank-csv, auto-detection, capital-one, citi, responsive-design, breadcrumb-navigation
Report #1 of 2
PocketLedger: Full-Featured MVP
Mar 7 - Mar 8, 2026

Devlog

I shipped PocketLedger as a complete working application in a single commit: a local-first personal finance tracker that automatically extracts transactions from bank statement images and PDFs, categorizes them, and presents spending analytics through a web dashboard.

The application is a FastAPI backend with Jinja2 server-side rendering and SQLite for storage. I chose SQLite with WAL journaling over PostgreSQL because this is a personal tool that runs on your own machine. No cloud database, no external APIs beyond the optional Ollama integration for LLM-powered categorization. Everything stays local.
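
Enabling WAL takes a single PRAGMA. This sketch uses the standard-library `sqlite3` module with a hypothetical `connect` helper. With SQLAlchemy 2.0, the same statement is commonly issued from a `connect` event listener on the engine. WAL mode is recorded in the database file, so once set it persists for later connections.

```python
import os
import sqlite3
import tempfile


def connect(db_path: str) -> sqlite3.Connection:
    """Open the ledger database and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        # e.g. ":memory:" databases cannot use WAL and report "memory" instead
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = connect(os.path.join(tmp, "ledger.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        conn.close()
```

WAL requires a file-backed database. The helper therefore checks the mode that SQLite actually returns instead of assuming the PRAGMA succeeded.
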

The bank statement parsing pipeline supports Chase, Bank of America, and Discover out of the box. Each parser uses bank-specific regex patterns to extract transactions from OCR text. The image preprocessing chain (grayscale, denoise, adaptive threshold, deskew) runs before Tesseract OCR, and a parser auto-detection function inspects the OCR output to route to the correct bank-specific extractor. PDF support comes via pdf2image for multi-page statements, and SHA-256 file hashing prevents duplicate imports.
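
The regex extraction and duplicate-detection steps can be sketched together. The line format below (`MM/DD  DESCRIPTION  AMOUNT`) is an assumed shape, not the real pattern for any of the three banks, and the function names are hypothetical. In the app, the set of seen hashes would presumably be backed by the `statements` table rather than kept in memory.

```python
import hashlib
import re
from datetime import date
from decimal import Decimal
from typing import List, Set

# Assumed line shape: "MM/DD  DESCRIPTION  AMOUNT", e.g. "02/14  STARBUCKS STORE 123  5.75".
LINE_RE = re.compile(
    r"^(?P<month>\d{2})/(?P<day>\d{2})\s+(?P<desc>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
)


def extract_transactions(ocr_text: str, statement_year: int) -> List[dict]:
    """Pull transaction rows out of OCR text; non-matching lines are skipped."""
    txns = []
    for line in ocr_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # headers, totals, OCR noise
        try:
            when = date(statement_year, int(m["month"]), int(m["day"]))
        except ValueError:
            continue  # OCR misread produced an impossible date like 02/31
        txns.append({
            "date": when,
            "description": m["desc"],
            "amount": Decimal(m["amount"].replace(",", "")),
        })
    return txns


def statement_fingerprint(data: bytes) -> str:
    """SHA-256 of the uploaded file bytes, used as a duplicate-import key."""
    return hashlib.sha256(data).hexdigest()


def register_upload(data: bytes, seen: Set[str]) -> bool:
    """Return False if these exact bytes were imported before; otherwise record them."""
    digest = statement_fingerprint(data)
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

Lines that do not match the pattern are skipped rather than raising an error, because OCR output routinely includes headers, page footers, and garbled fragments. The sketch also assumes a single statement year, so it does not handle statements that span December and January.
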

Categorization is a two-stage process. First, the rule engine checks merchant names against SQL LIKE patterns (16 seeded categories with common merchant patterns). For transactions that do not match any rule, an Ollama LLM fallback classifies them based on the merchant name and transaction context. Each transaction tracks its categorization source (manual, rule, or ollama) so I can monitor how well the rules cover the common cases.
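
The following sketch shows how the two-stage flow with per-transaction source tracking could work. It assumes a `merchant_rules` table with `pattern` and `category` columns. The table name comes from the schema in the changelog, but the column names are hypothetical. `llm_classify` is a placeholder callable standing in for the Ollama request, not a real Ollama client API.

```python
import sqlite3
from typing import Callable, Optional, Tuple

Classifier = Callable[[str], Optional[str]]


def categorize(
    conn: sqlite3.Connection,
    merchant: str,
    llm_classify: Optional[Classifier] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (category, source), where source is "rule", "ollama", or None."""
    # Stage 1: rule engine. The merchant string is tested against each stored
    # LIKE pattern (e.g. '%starbucks%'); SQLite's LIKE is case-insensitive for ASCII.
    # Assumption: when several rules match, the longest (most specific) pattern wins.
    row = conn.execute(
        "SELECT category FROM merchant_rules WHERE ? LIKE pattern "
        "ORDER BY length(pattern) DESC LIMIT 1",
        (merchant,),
    ).fetchone()
    if row is not None:
        return row[0], "rule"
    # Stage 2: LLM fallback; llm_classify is a stand-in for the Ollama call.
    if llm_classify is not None:
        category = llm_classify(merchant)
        if category:
            return category, "ollama"
    # Unmatched: left for the user to categorize (recorded as "manual" once set).
    return None, None
```

Storing the returned source alongside the category is what makes it possible to measure rule coverage later, for example by counting how many transactions still fall through to `"ollama"`.
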

The receipt processing pipeline is separate from statements: it uses an Ollama vision model to extract the store name, line items with quantities and prices, and categories from receipt images. This is more experimental than the statement parsing but works well for simple receipts.
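
The retry-and-validate loop mentioned in the overview can be sketched independently of Ollama itself. Here `call_model` is a hypothetical callable that returns the model's raw text reply. The receipt schema is also an assumption based on the fields listed above: `store_name`, plus `items` with `name`, `quantity`, `price`, and `category`.

```python
import json
from typing import Callable

REQUIRED_ITEM_KEYS = {"name", "quantity", "price", "category"}


class ReceiptParseError(Exception):
    """Raised when the model never returns usable receipt JSON."""


def validate_receipt(payload: object) -> dict:
    """Check the assumed receipt schema; raise ValueError describing the first problem."""
    if not isinstance(payload, dict):
        raise ValueError("top-level JSON must be an object")
    store = payload.get("store_name")
    if not isinstance(store, str) or not store.strip():
        raise ValueError("missing store_name")
    items = payload.get("items")
    if not isinstance(items, list) or not items:
        raise ValueError("items must be a non-empty list")
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each item must be an object")
        missing = REQUIRED_ITEM_KEYS.difference(item)
        if missing:
            raise ValueError(f"item missing {sorted(missing)}")
        price = item["price"]
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError(f"non-numeric price: {price!r}")
    return payload


def extract_receipt(
    image: bytes, call_model: Callable[[bytes], str], max_attempts: int = 3
) -> dict:
    """Ask the vision model for receipt JSON, retrying until it validates."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(image)  # hypothetical stand-in for the Ollama vision request
        try:
            return validate_receipt(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise ReceiptParseError(f"no valid receipt JSON after {max_attempts} attempts: {last_error}")
```

Treating malformed JSON and schema violations the same way keeps the retry logic simple. Either way, the model gets another attempt, up to a fixed limit.
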

The dashboard has eight pages: overview with KPI cards, monthly spending breakdown with category doughnut charts, yearly trends, transaction list with search and filtering, category management with budgets, receipt gallery, statement upload, and settings. I seeded it with February 2026 transaction data from three credit cards so the dashboard has something to show on first launch.

What's next: Redesigning the dashboard for better drill-down navigation, adding recurring transaction detection, and supporting CSV bank exports for banks that make statement PDFs difficult to parse.

Changelog

Added

  • Add FastAPI backend with Jinja2 server-side rendering and SQLite (WAL mode)
  • Add Alembic migration with 7-table schema (accounts, transactions, categories, merchant_rules, statements, receipts, receipt_line_items)
  • Add bank statement OCR parsers for Chase, Bank of America, and Discover
  • Add image preprocessing pipeline: grayscale, denoise, adaptive threshold, deskew
  • Add auto-detecting parser discovery from OCR text content
  • Add PDF multi-page statement support via pdf2image
  • Add SHA-256 duplicate detection for uploaded statements
  • Add dual categorization: merchant rule engine + Ollama LLM fallback
  • Add categorization source tracking (manual, rule, ollama) per transaction
  • Add receipt OCR with Ollama vision model for line-item extraction
  • Add 8 dashboard pages: overview, month, year, transactions, categories, receipts, settings, upload
  • Add seed scripts for 16 spending categories with merchant patterns and February 2026 test data
  • Add technical specification document
fastapi, sqlalchemy, alembic, sqlite, jinja2, tesseract, opencv, pdf2image, ollama, ocr, bank-parsers, receipt-scanning, categorization, personal-finance, local-first, python