Getting Started

Welcome to QuantMini! This guide will help you get up and running with the high-performance Medallion Architecture data pipeline.

Prerequisites

  • Python 3.10 or higher

  • Polygon.io API key and S3 credentials (for data ingestion)

  • External storage (recommended: 500GB+ for comprehensive data)

  • Basic understanding of quantitative trading concepts

Installation

Configuration

  1. Copy the credentials template:

cp config/credentials.yaml.example config/credentials.yaml
  1. Add your API credentials:

polygon:
  api_key: "YOUR_POLYGON_API_KEY"
  s3:
    access_key_id: "YOUR_S3_ACCESS_KEY"
    secret_access_key: "YOUR_S3_SECRET_KEY"
  1. Configure data paths:

Edit config/pipeline_config.yaml to set your data storage location:

data_root: /Volumes/sandisk/quantmini-lake

Medallion Architecture Overview

QuantMini uses a structured data lake pattern:

Landing Layer          Bronze Layer         Silver Layer          Gold Layer
(Raw Sources)         (Validated)          (Enriched)            (ML-Ready)
      ↓                    ↓                    ↓                     ↓
Polygon.io         →  Validated Parquet  →  Feature-Enriched  →  Qlib Binary
  REST API             (Schema Check)        (Indicators)         (Backtesting)
      ↓                    ↓                    ↓                     ↓
landing/              bronze/{type}/      silver/{type}/        gold/qlib/

Quick Start

1. Download Data (Bronze Layer)

Choose your ingestion strategy:

Option A: Initial Batch Load (First Time)

# Activate environment
source .venv/bin/activate

# Download 1 year of stocks daily data
uv run python -m src.cli.main data ingest -t stocks_daily -s 2024-01-01 -e 2025-10-18

# Download 1 year of options data
uv run python -m src.cli.main data ingest -t options_daily -s 2024-01-01 -e 2025-10-18

# Download 1 year of news articles (8+ years available)
uv run python scripts/download/download_news_1year.py --start-date 2024-01-01

Option B: Incremental Updates (Daily)

# Update last 7 days with --incremental flag (skips existing dates)
uv run python -m src.cli.main data ingest \
  -t stocks_daily \
  -s $(date -v-7d +%Y-%m-%d) \
  -e $(date +%Y-%m-%d) \
  --incremental

uv run python -m src.cli.main data ingest \
  -t options_daily \
  -s $(date -v-7d +%Y-%m-%d) \
  -e $(date +%Y-%m-%d) \
  --incremental

# Download yesterday's news
uv run python scripts/download/download_news_1year.py \
  --start-date $(date -v-1d +%Y-%m-%d)

Option C: Bulk Download (All Data Types)

# Download everything at once
bash scripts/bulk_download_all_data.sh

See Data Ingestion Strategies for complete workflows including backfill.

2. Enrich with Features (Silver Layer)

Generate technical indicators:

# Generate Alpha158 features
uv run python scripts/features/generate_alpha158.py

3. Convert to Qlib Format (Gold Layer)

Prepare data for ML backtesting:

# Convert to Qlib binary format
uv run python scripts/qlib/convert_to_qlib.py

4. Query and Analyze

Use the data loader to query data:

from src.utils.data_loader import DataLoader

# Initialize loader
loader = DataLoader()

# Load stocks data
df = loader.load_stocks_daily(
    symbols=['AAPL', 'MSFT'],
    start_date='2024-01-01',
    end_date='2024-12-31'
)

Next Steps

  • Batch Downloader Guide: High-performance parallel downloads

  • Data Loader Guide: Query and analyze data

  • Alpha158 Features: Generate technical indicators

  • API Reference: Explore the full API documentation

Common Issues

Issue: Polygon API rate limits

Use batch downloaders for efficient parallel requests:

uv run python scripts/download/download_ticker_events_optimized.py

Issue: External drive setup

If using an external drive, set copy mode for uv:

export UV_LINK_MODE=copy

Issue: Missing data

Check download status:

uv run python scripts/check_download_status.py

Support