Guides
User guides and tutorials for QuantMini’s Medallion Architecture data pipeline.
Quickstart
Get started with QuantMini in 10 minutes.
Installation
# Clone repository
git clone https://github.com/nittygritty-zzy/quantmini.git
cd quantmini
# Install with uv
uv sync
# Configure credentials
cp config/credentials.yaml.example config/credentials.yaml
Download Data
# Activate environment (optional: `uv run` already uses the project's .venv)
source .venv/bin/activate
# Download stocks data (Bronze Layer)
uv run python -m src.cli.main data ingest -t stocks_daily -s 2024-01-01 -e 2024-12-31
# Download news articles (8+ years available)
uv run python scripts/download/download_news_1year.py
Query Data
from src.utils.data_loader import DataLoader
# Initialize loader
loader = DataLoader()
# Load stocks data
df = loader.load_stocks_daily(
symbols=['AAPL', 'MSFT'],
start_date='2024-01-01',
end_date='2024-12-31'
)
Medallion Architecture
QuantMini uses a structured data lake pattern:
Landing Layer      Bronze Layer          Silver Layer         Gold Layer
(Raw Sources)      (Validated)           (Enriched)           (ML-Ready)
      ↓                  ↓                     ↓                   ↓
Polygon.io    →    Validated Parquet  →  Feature-Enriched  →  Qlib Binary
REST API           (Schema Check)        (Indicators)         (Backtesting)
      ↓                  ↓                     ↓                   ↓
landing/           bronze/{type}/        silver/{type}/       gold/qlib/
Data Quality Progression: Raw → Validated → Enriched → ML-Ready
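As a rough illustration of how the four layers map to directories, the sketch below resolves a layer name to its path. The directory templates are taken from the diagram; the `layer_path` helper and the `data` root are hypothetical names used only for this example and are not part of QuantMini's API.

```python
from pathlib import PurePosixPath

# Directory templates copied from the layer diagram. The data-lake root
# ("data") is a hypothetical default, not a documented QuantMini setting.
LAYER_TEMPLATES = {
    "landing": "landing",
    "bronze": "bronze/{type}",
    "silver": "silver/{type}",
    "gold": "gold/qlib",
}


def layer_path(layer: str, data_type: str = "", root: str = "data") -> PurePosixPath:
    """Return the directory for a layer, filling in {type} where the layer uses it."""
    template = LAYER_TEMPLATES[layer]
    if "{type}" in template and not data_type:
        raise ValueError(f"layer {layer!r} needs a data type, e.g. 'stocks_daily'")
    return PurePosixPath(root) / template.format(type=data_type)


print(layer_path("bronze", "stocks_daily"))  # data/bronze/stocks_daily
print(layer_path("gold"))                    # data/gold/qlib
```

Only the bronze and silver layers are keyed by data type, so the helper rejects a missing type for those two and ignores it elsewhere.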
Batch Downloader
High-performance parallel downloads using Polygon REST API.
Features:
- Batch request optimization (100+ concurrent requests)
- Automatic retries with exponential backoff
- Incremental saving to avoid data loss
- Date-first Hive partitioning
Example:
# Download ticker events for all CS tickers (optimized)
uv run python scripts/download/download_ticker_events_optimized.py
# Download 8+ years of news articles
uv run python scripts/download/download_news_1year.py --start-date 2017-04-10
See detailed guide at docs/guides/batch-downloader.md
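The "automatic retries with exponential backoff" feature can be pictured with a minimal sketch like the one below. It is not the downloader's actual code: `retry_with_backoff` and all of its parameters are hypothetical, and the real scripts may use different delays, limits, or error handling.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    jitter: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay grows as base_delay * 2**attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, jitter * delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` argument is injectable so the wait schedule can be checked without actually pausing. In a real downloader, `fn` would wrap a single REST request and the `except` clause would usually be narrowed to transient network errors.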
Data Loader
Query and analyze data from the bronze layer efficiently.
Features:
- DuckDB-powered SQL queries on Parquet
- Automatic partition pruning
- Multiple output formats (Polars, Pandas, PyArrow)
Example:
import polars as pl
from src.utils.data_loader import DataLoader
loader = DataLoader()
# Load stocks daily data
df = loader.load_stocks_daily(
    symbols=['AAPL', 'TSLA'],
    start_date='2024-01-01',
    end_date='2024-12-31',
    columns=['date', 'close', 'volume']
)
# Filter by conditions (pl.col requires a Polars DataFrame)
df_filtered = df.filter(pl.col('volume') > 1_000_000)
See detailed guide at docs/guides/data-loader.md
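Partition pruning means that files whose partition values fall outside the requested date range are never opened. DuckDB handles this internally for Hive-partitioned Parquet; the pure-Python sketch below only illustrates the idea. The `date=YYYY-MM-DD` directory layout and both helper functions are assumptions made for the example, not the actual bronze-layer layout or the `DataLoader` implementation.

```python
from datetime import date


def partition_date(path: str) -> date:
    """Extract the date from the 'date=YYYY-MM-DD' segment of a Hive-style path."""
    for segment in path.split("/"):
        key, _, value = segment.partition("=")
        if key == "date" and value:
            return date.fromisoformat(value)
    raise ValueError(f"no date partition in {path!r}")


def prune_partitions(paths: list[str], start_date: str, end_date: str) -> list[str]:
    """Keep only files whose partition date lies in [start_date, end_date]."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return [p for p in paths if start <= partition_date(p) <= end]


# Hypothetical file listing; only the two 2024 partitions survive pruning.
files = [
    "bronze/stocks_daily/date=2023-12-29/data.parquet",
    "bronze/stocks_daily/date=2024-01-02/data.parquet",
    "bronze/stocks_daily/date=2024-12-31/data.parquet",
    "bronze/stocks_daily/date=2025-01-02/data.parquet",
]
selected = prune_partitions(files, "2024-01-01", "2024-12-31")
print(selected)
```

Because the decision uses only the path, the skipped files cost no I/O, which is why a date-first partition layout pairs well with date-bounded queries.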
Alpha158 Features
Generate 158 technical indicators for ML backtesting.
Features:
- KBAR (Open-close-high-low features)
- KDJ (Stochastic indicators)
- RSI (Relative strength)
- MACD (Moving average convergence divergence)
- And 154 more technical features
Example:
# Generate Alpha158 features for silver layer
uv run python scripts/features/generate_alpha158.py
See detailed guide at docs/guides/ALPHA158_FEATURES.md
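To give a feel for the KBAR group, the sketch below computes a few candlestick-shape features from a single OHLC bar. The formulas follow the KBAR definitions as they are commonly given for Qlib's Alpha158 feature set; whether `generate_alpha158.py` computes them identically is an assumption, and `kbar_features` is a hypothetical helper, not a QuantMini function.

```python
def kbar_features(
    open_: float, high: float, low: float, close: float, eps: float = 1e-12
) -> dict[str, float]:
    """Compute a subset of KBAR features for one OHLC bar."""
    span = high - low + eps  # eps avoids division by zero on flat bars
    return {
        "KMID": (close - open_) / open_,             # body size relative to open
        "KLEN": (high - low) / open_,                # full bar range relative to open
        "KMID2": (close - open_) / span,             # body as a share of the range
        "KUP": (high - max(open_, close)) / open_,   # upper shadow
        "KLOW": (min(open_, close) - low) / open_,   # lower shadow
        "KSFT": (2 * close - high - low) / open_,    # close position within the range
    }


bar = kbar_features(open_=100.0, high=110.0, low=95.0, close=105.0)
print(bar["KMID"], bar["KLEN"])
```

Every feature is normalized by the open price or the bar range, so the values are comparable across symbols with very different price levels.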
Additional Guides
For comprehensive guides, see the docs/ directory:
Delisted Stocks: docs/guides/delisted-stocks.md
Benchmark Data: docs/guides/BENCHMARK_DATA_GUIDE.md
Trading Signals: docs/guides/TRADING_SIGNALS_GUIDE.md
Testing: docs/guides/testing.md