Guides

User guides and tutorials for QuantMini’s Medallion Architecture data pipeline.

Quickstart

Get started with QuantMini in 10 minutes.

Installation

# Clone repository
git clone https://github.com/nittygritty-zzy/quantmini.git
cd quantmini

# Install with uv
uv sync

# Configure credentials
cp config/credentials.yaml.example config/credentials.yaml

Download Data

# Activate environment
source .venv/bin/activate

# Download stocks data (Bronze Layer)
uv run python -m src.cli.main data ingest -t stocks_daily -s 2024-01-01 -e 2024-12-31

# Download news articles (8+ years available)
uv run python scripts/download/download_news_1year.py

Query Data

from src.utils.data_loader import DataLoader

# Initialize loader
loader = DataLoader()

# Load stocks data
df = loader.load_stocks_daily(
    symbols=['AAPL', 'MSFT'],
    start_date='2024-01-01',
    end_date='2024-12-31'
)

Medallion Architecture

QuantMini uses a structured data lake pattern:

Landing Layer          Bronze Layer         Silver Layer          Gold Layer
(Raw Sources)         (Validated)          (Enriched)            (ML-Ready)
      ↓                    ↓                    ↓                     ↓
Polygon.io         →  Validated Parquet  →  Feature-Enriched  →  Qlib Binary
  REST API             (Schema Check)        (Indicators)         (Backtesting)
      ↓                    ↓                    ↓                     ↓
landing/              bronze/{type}/      silver/{type}/        gold/qlib/

Data Quality Progression: Raw → Validated → Enriched → ML-Ready

Batch Downloader

High-performance parallel downloads using Polygon REST API.

Features: - Batch request optimization (100+ concurrent requests) - Automatic retries with exponential backoff - Incremental saving to avoid data loss - Date-first Hive partitioning

Example:

# Download ticker events for all CS tickers (optimized)
uv run python scripts/download/download_ticker_events_optimized.py

# Download 8+ years of news articles
uv run python scripts/download/download_news_1year.py --start-date 2017-04-10

See detailed guide at docs/guides/batch-downloader.md

Data Loader

Query and analyze data from the bronze layer efficiently.

Features: - DuckDB-powered SQL queries on Parquet - Automatic partition pruning - Multiple output formats (Polars, Pandas, PyArrow)

Example:

from src.utils.data_loader import DataLoader

loader = DataLoader()

# Load stocks daily data
df = loader.load_stocks_daily(
    symbols=['AAPL', 'TSLA'],
    start_date='2024-01-01',
    end_date='2024-12-31',
    columns=['date', 'close', 'volume']
)

# Filter by conditions
df_filtered = df.filter(pl.col('volume') > 1_000_000)

See detailed guide at docs/guides/data-loader.md

Alpha158 Features

Generate 158 technical indicators for ML backtesting.

Features: - KBAR (Open-close-high-low features) - KDJ (Stochastic indicators) - RSI (Relative strength) - MACD (Moving average convergence divergence) - And 154 more technical features

Example:

# Generate Alpha158 features for silver layer
uv run python scripts/features/generate_alpha158.py

See detailed guide at docs/guides/ALPHA158_FEATURES.md

Additional Guides

For comprehensive guides, see the docs/ directory:

  • Delisted Stocks: docs/guides/delisted-stocks.md

  • Benchmark Data: docs/guides/BENCHMARK_DATA_GUIDE.md

  • Trading Signals: docs/guides/TRADING_SIGNALS_GUIDE.md

  • Testing: docs/guides/testing.md