Guides
======

User guides and tutorials for QuantMini's Medallion Architecture data pipeline.

.. toctree::
   :maxdepth: 2

   quickstart
   medallion-architecture
   batch-downloader
   data-loader
   alpha158-features

Quickstart
----------

Get started with QuantMini in 10 minutes.

**Installation**

.. code-block:: bash

   # Clone repository
   git clone https://github.com/nittygritty-zzy/quantmini.git
   cd quantmini

   # Install with uv
   uv sync

   # Configure credentials
   cp config/credentials.yaml.example config/credentials.yaml

**Download Data**

.. code-block:: bash

   # Activate environment
   source .venv/bin/activate

   # Download stocks data (Bronze Layer)
   uv run python -m src.cli.main data ingest -t stocks_daily -s 2024-01-01 -e 2024-12-31

   # Download news articles (8+ years available)
   uv run python scripts/download/download_news_1year.py

**Query Data**

.. code-block:: python

   from src.utils.data_loader import DataLoader

   # Initialize loader
   loader = DataLoader()

   # Load stocks data
   df = loader.load_stocks_daily(
       symbols=['AAPL', 'MSFT'],
       start_date='2024-01-01',
       end_date='2024-12-31'
   )

Medallion Architecture
----------------------

QuantMini organizes its data lake into four layers, each refining the output of the previous one:

.. code-block:: text

   Landing Layer      Bronze Layer           Silver Layer          Gold Layer
   (Raw Sources)      (Validated)            (Enriched)            (ML-Ready)
        ↓                  ↓                      ↓                     ↓
   Polygon.io     →   Validated Parquet  →   Feature-Enriched  →   Qlib Binary
   REST API           (Schema Check)         (Indicators)          (Backtesting)
        ↓                  ↓                      ↓                     ↓
   landing/           bronze/{type}/         silver/{type}/        gold/qlib/

**Data Quality Progression**: Raw → Validated → Enriched → ML-Ready

Batch Downloader
----------------

High-performance parallel downloads using the Polygon REST API.

**Features**:

- Batch request optimization (100+ concurrent requests)
- Automatic retries with exponential backoff
- Incremental saving to avoid data loss
- Date-first Hive partitioning

**Example**:

.. code-block:: bash

   # Download ticker events for all CS tickers (optimized)
   uv run python scripts/download/download_ticker_events_optimized.py

   # Download 8+ years of news articles
   uv run python scripts/download/download_news_1year.py --start-date 2017-04-10

See the detailed guide at ``docs/guides/batch-downloader.md``.

Data Loader
-----------

Query and analyze data from the bronze layer efficiently.

**Features**:

- DuckDB-powered SQL queries on Parquet
- Automatic partition pruning
- Multiple output formats (Polars, Pandas, PyArrow)

**Example**:

.. code-block:: python

   import polars as pl

   from src.utils.data_loader import DataLoader

   loader = DataLoader()

   # Load stocks daily data
   df = loader.load_stocks_daily(
       symbols=['AAPL', 'TSLA'],
       start_date='2024-01-01',
       end_date='2024-12-31',
       columns=['date', 'close', 'volume']
   )

   # Filter by conditions (Polars expression syntax; assumes df is a Polars DataFrame)
   df_filtered = df.filter(pl.col('volume') > 1_000_000)

See the detailed guide at ``docs/guides/data-loader.md``.

Alpha158 Features
-----------------

Generate 158 technical indicators for ML backtesting.

**Features**:

- KBAR (Open-close-high-low features)
- KDJ (Stochastic indicators)
- RSI (Relative strength)
- MACD (Moving average convergence divergence)
- And 154 more technical features

**Example**:

.. code-block:: bash

   # Generate Alpha158 features for silver layer
   uv run python scripts/features/generate_alpha158.py

See the detailed guide at ``docs/guides/ALPHA158_FEATURES.md``.

Additional Guides
-----------------

For comprehensive guides, see the ``docs/`` directory:

- **Delisted Stocks**: ``docs/guides/delisted-stocks.md``
- **Benchmark Data**: ``docs/guides/BENCHMARK_DATA_GUIDE.md``
- **Trading Signals**: ``docs/guides/TRADING_SIGNALS_GUIDE.md``
- **Testing**: ``docs/guides/testing.md``
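
The automatic retries with exponential backoff listed under Batch Downloader can be sketched as follows. This is a minimal, generic illustration and not QuantMini's actual implementation: the function ``retry_with_backoff`` and its parameters (``max_attempts``, ``base_delay``, ``max_delay``, ``sleep``) are hypothetical names chosen for this example.

.. code-block:: python

   import random
   import time


   def retry_with_backoff(func, max_attempts=5, base_delay=0.5,
                          max_delay=30.0, sleep=time.sleep):
       """Call func() until it succeeds or max_attempts calls have failed.

       Hypothetical helper for illustration; not part of QuantMini.
       """
       for attempt in range(1, max_attempts + 1):
           try:
               return func()
           except Exception:
               # Out of attempts: re-raise the last error to the caller.
               if attempt == max_attempts:
                   raise
               # The wait doubles after each failure (base, 2x, 4x, ...),
               # capped at max_delay.
               delay = min(max_delay, base_delay * 2 ** (attempt - 1))
               # Add up to 10% random jitter so that parallel workers
               # do not all retry at the same instant.
               sleep(delay + random.uniform(0, delay * 0.1))

A caller would wrap a single request in a zero-argument callable, for example ``retry_with_backoff(lambda: fetch(url))``, where ``fetch`` stands in for whatever function performs the download. Passing ``sleep`` as a parameter keeps the helper easy to test without real waiting.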