Getting Started
Welcome to QuantMini! This guide will help you get up and running with the high-performance Medallion Architecture data pipeline.
Prerequisites
Python 3.10 or higher
Polygon.io API key and S3 credentials (for data ingestion)
External storage (recommended: 500GB+ for comprehensive data)
Basic understanding of quantitative trading concepts
Installation
From Source (Recommended)
# Clone repository
git clone https://github.com/nittygritty-zzy/quantmini.git
cd quantmini
# Install with uv
uv sync
# Configure credentials
cp config/credentials.yaml.example config/credentials.yaml
# Edit config/credentials.yaml with your Polygon.io API key and S3 credentials
Configuration
Copy the credentials template:
cp config/credentials.yaml.example config/credentials.yaml
Add your API credentials:
polygon:
api_key: "YOUR_POLYGON_API_KEY"
s3:
access_key_id: "YOUR_S3_ACCESS_KEY"
secret_access_key: "YOUR_S3_SECRET_KEY"
Configure data paths:
Edit config/pipeline_config.yaml to set your data storage location:
data_root: /Volumes/sandisk/quantmini-lake
Medallion Architecture Overview
QuantMini uses a structured data lake pattern:
Landing Layer Bronze Layer Silver Layer Gold Layer
(Raw Sources) (Validated) (Enriched) (ML-Ready)
↓ ↓ ↓ ↓
Polygon.io → Validated Parquet → Feature-Enriched → Qlib Binary
REST API (Schema Check) (Indicators) (Backtesting)
↓ ↓ ↓ ↓
landing/ bronze/{type}/ silver/{type}/ gold/qlib/
Quick Start
1. Download Data (Bronze Layer)
Choose your ingestion strategy:
Option A: Initial Batch Load (First Time)
# Activate environment
source .venv/bin/activate
# Download 1 year of stocks daily data
uv run python -m src.cli.main data ingest -t stocks_daily -s 2024-01-01 -e 2025-10-18
# Download 1 year of options data
uv run python -m src.cli.main data ingest -t options_daily -s 2024-01-01 -e 2025-10-18
# Download 1 year of news articles (8+ years available)
uv run python scripts/download/download_news_1year.py --start-date 2024-01-01
Option B: Incremental Updates (Daily)
# Update last 7 days with --incremental flag (skips existing dates)
uv run python -m src.cli.main data ingest \
-t stocks_daily \
-s $(date -v-7d +%Y-%m-%d) \
-e $(date +%Y-%m-%d) \
--incremental
uv run python -m src.cli.main data ingest \
-t options_daily \
-s $(date -v-7d +%Y-%m-%d) \
-e $(date +%Y-%m-%d) \
--incremental
# Download yesterday's news
uv run python scripts/download/download_news_1year.py \
--start-date $(date -v-1d +%Y-%m-%d)
Option C: Bulk Download (All Data Types)
# Download everything at once
bash scripts/bulk_download_all_data.sh
See Data Ingestion Strategies for complete workflows including backfill.
2. Enrich with Features (Silver Layer)
Generate technical indicators:
# Generate Alpha158 features
uv run python scripts/features/generate_alpha158.py
3. Convert to Qlib Format (Gold Layer)
Prepare data for ML backtesting:
# Convert to Qlib binary format
uv run python scripts/qlib/convert_to_qlib.py
4. Query and Analyze
Use the data loader to query data:
from src.utils.data_loader import DataLoader
# Initialize loader
loader = DataLoader()
# Load stocks data
df = loader.load_stocks_daily(
symbols=['AAPL', 'MSFT'],
start_date='2024-01-01',
end_date='2024-12-31'
)
Next Steps
Batch Downloader Guide: High-performance parallel downloads
Data Loader Guide: Query and analyze data
Alpha158 Features: Generate technical indicators
API Reference: Explore the full API documentation
Common Issues
Issue: Polygon API rate limits
Use batch downloaders for efficient parallel requests:
uv run python scripts/download/download_ticker_events_optimized.py
Issue: External drive setup
If using an external drive, set copy mode for uv:
export UV_LINK_MODE=copy
Issue: Missing data
Check download status:
uv run python scripts/check_download_status.py
Support
GitHub Issues: Report bugs or request features
Documentation: See
docs/directory for comprehensive guidesExamples: Example scripts