APEX
Status: in progress. Multi-signal ensemble algorithmic trading framework for US equities.
What It Is
APEX (Algorithmic Prediction & Execution System) is a multi-signal ensemble-architecture algorithmic trading framework targeting US equities. It integrates signals across eight data domains using a layered model stack, designed to operate in paper mode before transitioning to live trading on modest seed capital with conservative position sizing.
The system is built for F-1 compliance and prioritizes risk management through circuit breakers and Kelly-criterion sizing over aggressive returns.
Architecture
The system follows an eight-layer pipeline:
- Raw Data Sources: Market data (Alpaca), SEC filings (EDGAR 8-K and Form 4), macroeconomic indicators (FRED), and financial news (Finnhub).
- Ingestion Engine: Nine ingestors following a fetch/transform/store pattern with idempotent upserts and rate limiting.
- Signal Engineering: 47 features across eight domains (technical, sentiment, fundamental, macro, social, alternative, microstructure, cross-signal).
- Feature Store: DuckDB star schema with 15 tables, point-in-time correctness enforcement, and staleness tracking.
- Regime Classifier: Four market regimes (crisis, high-volatility, trending, mean-reverting) that gate all downstream predictions.
- Prediction Ensemble: Multiple model types combined via a stacking meta-learner.
- Risk Gate: Nine circuit breakers covering confidence thresholds, position limits, sector concentration, drawdown, volatility, and liquidity.
- Execution: Broker integration with fill tracking and slippage logging.
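The ingestion layer's fetch/transform/store pattern with idempotent upserts can be sketched roughly as below. Class, method, and table names here are hypothetical, and sqlite3 stands in for DuckDB so the snippet runs anywhere (both engines accept `INSERT OR REPLACE`):

```python
import sqlite3
import time
from abc import ABC, abstractmethod

class BaseIngestor(ABC):
    """Sketch of the fetch/transform/store pattern (names are illustrative)."""
    RATE_LIMIT_S = 0.5  # crude per-call rate limit

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    @abstractmethod
    def fetch(self):
        """Pull raw payloads from the upstream API."""

    @abstractmethod
    def transform(self, raw):
        """Normalize raw payloads into (ticker, ts, close) rows."""

    def store(self, rows):
        # Keyed on (ticker, ts), so re-running a backfill is idempotent.
        self.conn.executemany(
            "INSERT OR REPLACE INTO bars (ticker, ts, close) VALUES (?, ?, ?)",
            rows,
        )
        self.conn.commit()

    def run(self):
        time.sleep(self.RATE_LIMIT_S)
        self.store(self.transform(self.fetch()))
```

Because the upsert is keyed on the natural key, a failed run can simply be retried from the top without creating duplicate rows.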
Data Pipeline
Data flows through multiple ingestion channels with different update frequencies:
- Market data via Alpaca: 1-minute OHLCV bars with incremental updates, yfinance as fallback.
- SEC filings via EDGAR: 8-K filings and Form 4 insider trades parsed from XML, filtered for material signals.
- Macroeconomic via FRED: Yield curve, credit spread, VIX, dollar index, and a composite financial conditions indicator.
- News via Finnhub: Company-level news for NLP sentiment extraction.
All data lands in DuckDB with a star schema. Feature normalization uses rolling 252-day percentile ranks to handle non-stationary distributions.
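The rolling percentile rank maps each raw feature value to its rank within the trailing window, which keeps non-stationary series comparable over time. A minimal pandas sketch (the function name is ours, not APEX's):

```python
import pandas as pd

def pct_rank(series: pd.Series, window: int = 252) -> pd.Series:
    """Fraction of the trailing window at or below today's value, in (0, 1]."""
    return series.rolling(window, min_periods=window).apply(
        lambda w: (w <= w.iloc[-1]).mean(), raw=False
    )
```

With a 252-day window this needs a full trading year of history before emitting any values, which is one reason the historical backfill has to run before feature computation.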
Stock Universe
Twenty liquid large-cap equities across six sectors (technology, financials, healthcare, energy, consumer, industrials) plus SPY for regime classification.
What It Does Not Do
- Not a high-frequency trading system. Targets 2-3 trades per day on longer horizons.
- No options, futures, or crypto trading.
- No social trading, copy trading, or community features.
- No betting tips or financial advice output.
Current State
The foundation is complete: configuration, data types, DuckDB schema, ingestion layer (7 of 9 ingestors working), technical signal generation, feature store with point-in-time correctness, the primary gradient boosting model, backtesting engine, risk gate skeleton, execution layer with broker integration, and a five-page Streamlit dashboard.
Remaining work: integrating the risk layer with live data, testing the execution layer with live orders, training the regime classifier, and end-to-end integration testing across the full pipeline.
Devlog
With the spec in hand, I built the entire APEX codebase in a single evening session, roughly four hours from first file to last. The goal was to get every layer of the 8-layer pipeline to a functional state, even if some domains within each layer were stubbed for later.
The build followed the pipeline order. Configuration and core types first: a settings loader reading from .env, a constants file with all risk thresholds and regime boundaries, the 20-stock universe with sector mappings, and the DuckDB connection factory. Then the feature store schema: 15 tables in a star schema covering raw data (bars, news, filings, insider trades, macro, social) and computed features across each signal domain.
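The star-schema idea can be sketched as below. Table and column names are illustrative, not APEX's actual 15-table schema, and sqlite3 stands in for DuckDB so the snippet is self-contained:

```python
import sqlite3

# Illustrative star schema: one dimension table, one raw-data fact table,
# one computed-feature table. (APEX's real schema lives in DuckDB.)
DDL = """
CREATE TABLE dim_ticker (ticker TEXT PRIMARY KEY, sector TEXT);
CREATE TABLE fact_bars (
    ticker TEXT REFERENCES dim_ticker(ticker),
    ts     TEXT,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (ticker, ts)
);
CREATE TABLE feat_technical (
    ticker TEXT, ts TEXT, feature TEXT, value REAL,
    computed_at TEXT,  -- recorded to support point-in-time correctness checks
    PRIMARY KEY (ticker, ts, feature)
);
"""

def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn
```

Storing `computed_at` alongside each feature row is what lets a backtest ask "what did the system know at time t?" rather than leaking future data.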
Ingestion came next. I built the BaseIngestor pattern (fetch/transform/store with rate limiting and INSERT OR REPLACE deduplication) and implemented 5 of the 7 planned ingestors: Alpaca for 1-minute OHLCV bars, FRED for 5 macro series (yield curve, credit spread, VIX, dollar index, and a composite financial conditions indicator), yfinance as a price data fallback, Finnhub for company news, and SEC EDGAR for 8-K filings and Form 4 insider trades with XML parsing. Reddit and StockTwits are stubbed pending API credentials.
Technical signal generation is pure pandas and NumPy (no pandas-ta dependency, keeping Python 3.14 compatibility): EMAs, ADX, RSI, Bollinger Bands, MACD, ATR, OBV, MFI, VWAP deviation, log returns, and multi-horizon realized volatility. The normalizer applies rolling 252-day percentile ranks. The other 7 signal domains (sentiment, fundamental, macro, social, alternative, microstructure, cross-signal) are in the spec but not yet implemented.
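As one example of the pure-pandas approach, a Wilder-smoothed RSI needs nothing beyond `diff`, `clip`, and `ewm` (this is a standard formulation, not necessarily APEX's exact implementation):

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder-smoothed RSI using only pandas ops (no pandas-ta dependency)."""
    delta = close.diff()
    # Wilder smoothing is an EWM with alpha = 1/period and adjust=False.
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)
```

Skipping pandas-ta keeps the dependency surface small, which is the Python 3.14 compatibility argument above.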
The LightGBM directional model is built with DART dropout for regularization, supporting 1-hour, 4-hour, and 1-day prediction horizons. The backtest engine wraps vectorbt with Kelly-sized entries and the full risk gate evaluation. All 9 circuit breakers are implemented: they evaluate sequentially, with hard blocks (daily loss, drawdown) halting trading and soft gates (sector, correlation) adjusting position size. The execution layer integrates with Alpaca's SDK for paper and live modes, with fill tracking and slippage logging.
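The hard-block versus soft-gate split described above can be sketched as a sequential scan over the breakers (names and thresholds here are hypothetical, not APEX's actual values):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Breaker:
    name: str
    tripped: Callable[[dict], bool]  # predicate over current portfolio state
    hard: bool                       # hard blocks halt trading entirely
    scale: float = 1.0               # size multiplier applied when a soft gate trips

def evaluate_gates(breakers, state: dict, size: float):
    """Sequential evaluation: the first hard trip blocks the trade outright;
    soft trips compound as position-size reductions."""
    for b in breakers:
        if b.tripped(state):
            if b.hard:
                return 0.0, b.name
            size *= b.scale
    return size, None
```

Evaluating hard blocks and soft gates in one ordered pass keeps the gate deterministic and makes every blocked or resized trade attributable to a named breaker in the logs.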
The Streamlit dashboard has 5 pages: equity curve overview, signal analysis, regime timeline, trade log, and diagnostics (feature coverage, staleness, SHAP). It reads from DuckDB so it reflects the actual system state.
I stopped just before running the first end-to-end data ingestion. Some API connectivity errors came up and I was spent after the build sprint. The codebase is complete but has not been tested as a running system yet.
What's next: Resolving the ingestion errors, running a full historical backfill, training the LightGBM model on real data, and building the XGBoost regime classifier. That will be the integration session that proves the pipeline works end-to-end.
Changelog
Added
- Add configuration layer: settings, constants, tickers, .env management
- Add DuckDB star schema with 15 tables (raw + feature domains)
- Add BaseIngestor pattern with rate limiting and deduplication
- Add Alpaca ingestor for 1-minute OHLCV bars (backfill + incremental)
- Add FRED ingestor for 5 macro series (yield curve, credit spread, VIX, dollar index, composite financial conditions)
- Add yfinance ingestor as price data fallback
- Add Finnhub news ingestor (code complete, free tier returns no data)
- Add SEC EDGAR ingestor for 8-K filings and Form 4 insider trades (XML parsing)
- Add Reddit and StockTwits ingestor stubs (credentials pending)
- Add technical signal generators: EMA, ADX, RSI, BB, MACD, ATR, OBV, MFI, VWAP dev, realized vol
- Add 252-day rolling percentile rank normalizer
- Add feature store with staleness tracking and point-in-time correctness
- Add LightGBM binary directional model with DART dropout (1h/4h/1d horizons)
- Add vectorbt backtesting engine with risk-gated Kelly sizing
- Add half-Kelly position sizer with TFT quantile support
- Add portfolio state manager with sector exposure and correlation tracking
- Add 9 circuit breaker risk gates (confidence, position, sector, correlation, daily loss, drawdown, VIX, liquidity, streak)
- Add Alpaca broker wrapper with paper/live toggle
- Add order manager: signal-to-order pipeline with risk gate evaluation
- Add fill tracker with slippage logging and per-ticker reporting
- Add Streamlit dashboard with 5 pages (overview, signals, regime, trades, diagnostics)
- Add historical backfill CLI script
Devlog
I completed the full architectural specification for APEX today: an 873-line technical document covering every layer of a multi-signal ensemble algorithmic trading system for US equities.
The design is an 8-layer pipeline: raw data sources feeding into an ingestion engine, signal engineering across 8 domains, a DuckDB feature store with point-in-time correctness, a regime classifier that gates all downstream predictions, a prediction ensemble combining multiple model families, a risk gate with circuit breakers, and an execution layer with broker integration. The spec covers 47 signal sources across technical, sentiment, fundamental, macro, social, alternative, microstructure, and cross-signal domains.
The model stack is a 4-layer ensemble: a gradient boosting model for tabular features, a sequential model for temporal patterns, NLP embeddings for text signals from SEC filings and news, and a logistic meta-learner that fuses base predictions with regime context and signal coherence scores. Each base model produces calibrated probabilities that feed into Kelly-criterion position sizing.
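The fusion step can be pictured as a single logistic unit over the stacked inputs. The feature layout below (three base probabilities, a four-regime one-hot, one coherence score) and the untrained weights are purely illustrative; the real meta-learner is fitted:

```python
import numpy as np

def fuse(base_probs: np.ndarray, regime_onehot: np.ndarray,
         coherence: float, w: np.ndarray, b: float) -> float:
    """Logistic meta-learner: fuse base-model probabilities with regime
    context and a signal-coherence score into one calibrated probability."""
    x = np.concatenate([base_probs, regime_onehot, [coherence]])
    return float(1.0 / (1.0 + np.exp(-(w @ x + b))))
```

Whatever the base models disagree about, the output is a single probability in (0, 1), which is the quantity the Kelly sizer consumes.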
Risk management is the part I spent the most time on. Nine circuit breakers cover confidence thresholds, position limits, sector concentration, correlated positions, daily loss limits, drawdown halts, volatility-triggered reductions, liquidity checks, and losing streak dampening. The system targets 58-65% directional accuracy on modest seed capital with half-Kelly sizing, which is deliberately conservative. The entire design is F-1 compliant.
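The sizing rule behind this: for a binary bet with win probability p and win/loss ratio b, the standard Kelly fraction is f* = p - (1 - p)/b, and APEX takes half of it. A minimal sketch (the 5% cap is an assumption for illustration, not a value from the spec):

```python
def half_kelly(p_win: float, win_loss_ratio: float = 1.0, cap: float = 0.05) -> float:
    """Half-Kelly fraction of capital, floored at 0 (no short-the-signal sizing)
    and capped per position. The cap value here is illustrative."""
    f_star = p_win - (1.0 - p_win) / win_loss_ratio  # full Kelly for a binary bet
    return max(0.0, min(cap, 0.5 * f_star))
```

At 58-65% accuracy with even win/loss ratios, full Kelly would already suggest 16-30% of capital per trade; halving and capping is what makes the sizing survivable when the calibration is off.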
I also mapped out the cost phasing: Phase 1 at $0/month using free-tier APIs (Alpaca, FRED, EDGAR), scaling to $75/month with premium news feeds, and eventually $350+/month with full social and options data. This matters because the system needs to prove itself on paper before any real capital or recurring API costs are committed.
The spec is rendered as a Data Terminal themed HTML report for portfolio presentation. It is the most detailed design document I have produced for any project.
What's next: Building the codebase from this spec, starting with the configuration layer, DuckDB schema, and data ingestion pipeline.
Changelog
Added
- Add 873-line technical specification covering all 8 pipeline layers
- Add signal taxonomy: 47 sources across 8 domains with update frequencies
- Add 4-layer model stack design (gradient boosting, sequential, NLP, meta-learner)
- Add 9 circuit breaker risk management framework
- Add stock universe definition (20 liquid large-caps across 6 sectors)
- Add cost phasing roadmap ($0 to $350+/month across 4 phases)
- Add Data Terminal themed HTML report for portfolio presentation