## **Revision Document – v3 Output Contract & Figure Specifications**
This single guide merges **I/O plumbing**, **logging**, **CI hooks**, **artefact paths**, and **figure design** into one actionable playbook.
Apply the steps **in order**, submitting small PRs so CI remains green throughout.
---
### 0 ▪ Foundations
| Step | File(s) | Action |
|------|---------|--------|
| 0.1 | **`config.yaml`** | Add:<br>`base_dirs: {results: results, models: models, logs: logs}`<br>`output: {figure_dpi: 150, figure_size: [16, 9], log_level: INFO}` |
| 0.2 | `src/utils/run_id.py` | `make_run_id()` → `"20250418_152310_ab12cd"` (timestamp + short git‑hash). |
| 0.3 | `src/__init__.py` | Expose `__version__`, `GIT_SHA`, `BUILD_DATE`. |
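A minimal sketch of step 0.2, assuming the short hash comes from `git rev-parse`; the `subprocess` fallback and hash length are illustrative, not prescriptive:

```python
# src/utils/run_id.py — minimal sketch; the git lookup and fallback are assumptions.
import subprocess
from datetime import datetime

def make_run_id() -> str:
    """Return e.g. '20250418_152310_ab12cd' (timestamp + short git hash)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short=6", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        sha = "nogit"  # fallback when run outside a git checkout
    return f"{stamp}_{sha}"
```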
---
### 1 ▪ Core I/O & Logging
| File | Content |
|------|---------|
| **`src/io_manager.py`** | `IOManager(cfg, run_id)`<br>• `path(section, name)`: returns the full path under `results\|models\|logs\|figures`.<br>• `save_json`, `save_df` (CSV if ≤ 100 MB, else Parquet), `save_figure` (uses the cfg DPI/size). |
| **`src/logger_setup.py`** | `setup_logger(cfg, run_id, io)` with a colourised console handler (INFO) and a rotating file handler (DEBUG) writing to `logs/<run_id>/`. |
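A minimal sketch of the `IOManager` contract above; the run‑scoped directory layout and the `.txt` extension for JSON reports follow the artefact tables, while everything else (constructor shape, Parquet cut‑over logic) is an assumption:

```python
# src/io_manager.py — sketch only; internal details are assumptions.
import json
from pathlib import Path

import pandas as pd

class IOManager:
    def __init__(self, cfg: dict, run_id: str):
        self.cfg, self.run_id = cfg, run_id
        self.base = {k: Path(v) for k, v in cfg["base_dirs"].items()}
        self.base["figures"] = self.base["results"]  # assumption: figures live under results

    def path(self, section: str, name: str) -> Path:
        """Full run-scoped path, e.g. path('results', 'x.csv') -> results/<run_id>/x.csv."""
        p = self.base[section] / self.run_id / name
        p.parent.mkdir(parents=True, exist_ok=True)
        return p

    def save_json(self, obj: dict, name: str) -> Path:
        # JSON payload; the .txt extension matches the artefact tables below.
        p = self.path("results", f"{name}.txt")
        p.write_text(json.dumps(obj, indent=2, default=str))
        return p

    def save_df(self, df: pd.DataFrame, name: str) -> Path:
        # CSV for small frames, Parquet above the 100 MB threshold from the spec.
        if df.memory_usage(deep=True).sum() <= 100 * 2**20:
            p = self.path("results", f"{name}.csv")
            df.to_csv(p, index=False)
        else:
            p = self.path("results", f"{name}.parquet")
            df.to_parquet(p)
        return p

    def save_figure(self, fig, name: str) -> Path:
        p = self.path("figures", f"{name}.png")
        fig.savefig(p, dpi=self.cfg["output"]["figure_dpi"])
        return p
```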
**`run.py` entry banner**
```python
run_id = make_run_id()
cfg = load_config(args.config)
io = IOManager(cfg, run_id)
logger = setup_logger(cfg, run_id, io)
logger.info(f"GRU‑SAC v{__version__} | commit {GIT_SHA} | run {run_id}")
logger.info(f"Loaded config file: {args.config}")
```
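And a matching sketch of `setup_logger`, assuming stdlib `logging`; the colourised console formatter is elided and `pipeline.log` is a placeholder name:

```python
# src/logger_setup.py — sketch; handler details are assumptions.
import logging
from logging.handlers import RotatingFileHandler

def setup_logger(cfg: dict, run_id: str, io) -> logging.Logger:
    logger = logging.getLogger("gru_sac")
    logger.setLevel(logging.DEBUG)

    # Console handler at the configured level (colourisation omitted here).
    console = logging.StreamHandler()
    console.setLevel(getattr(logging, cfg["output"].get("log_level", "INFO")))
    console.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # DEBUG-level rotating file under logs/<run_id>/.
    file_handler = RotatingFileHandler(
        io.path("logs", "pipeline.log"), maxBytes=10 * 2**20, backupCount=3
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```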
---
### 2 ▪ Stage Outputs
| Stage | Implementation notes | Artefacts |
|-------|---------------------|-----------|
| **Data load & preprocess** | After sampling/NaN purge, save:<br>`io.save_json(summary, "preprocess_summary")`<br>`io.save_df(df.head(20), "head_preprocessed")` | `results/<run_id>/preprocess_summary.txt`<br>`head_preprocessed.csv` |
| **Feature engineering** | Generate the correlation heat‑map (see figure table) → `io.save_figure(..., "feature_corr_heatmap")` | `feature_corr_heatmap.png` |
| **Label generation** | Log the class distribution; produce a histogram figure. | `label_histogram.png` |
| **Baseline 1 & 2** | Consolidate in `baseline_checker.py`; each baseline returns a dict with accuracy, CI, etc. (sketched after this table).<br>`io.save_json(report, "baseline1_report")` (likewise for baseline 2). | `baseline1_report.txt` / `baseline2_report.txt` |
| **Feature whitelist** | Save JSON to `models/<run_id>/final_whitelist_<run_id>.json`. | — |
| **GRU training** | Use the Keras `CSVLogger` callback to write `logs/<run_id>/gru_history.csv`; after training, plot the learning curve. | `gru_learning_curve.png` + `.keras` model |
| **Calibration (vector scaling)** | Save `calibrator_vec_<run_id>.npy`; plot the reliability curve. | `reliability_curve_val_<run_id>.png` |
| **SAC training** | Write `episode_rewards.csv`, plot the reward curve, and save the final agent under `models/sac_train_<run_id>/`. | `sac_reward_plot.png` |
| **Back‑test** | Save the step‑level CSV, metrics JSON, and a summary figure. | `backtest_results_<run_id>.csv`<br>`performance_metrics_<run_id>.txt`<br>`backtest_summary_<run_id>.png` |
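As referenced in the Baseline row, a sketch of one stage honouring the contract; the `wilson_ci` helper and the exact dict shape are assumptions, but the keys mirror those the unit tests validate:

```python
# Sketch of the baseline stage's output contract — not the final implementation.
import numpy as np

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def run_baseline1(y_true, y_pred, io):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(np.sum(y_true == y_pred))          # correct predictions
    acc = k / len(y_true)
    lo, hi = wilson_ci(k, len(y_true))
    report = {"accuracy": acc, "ci_lower": lo, "ci_upper": hi, "n": len(y_true)}
    io.save_json(report, "baseline1_report")   # lands in results/<run_id>/
    return report
```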
---
### 3 ▪ Figure Specifications
| File | Visualises | Layout / Details |
|------|-------------|------------------|
| **feature_corr_heatmap.png** | Pearson correlation of engineered features (pre‑prune). | Square heat‑map, features sorted by \|ρ\| vs target; diverging palette centred at 0; annotate cells with \|ρ\| > 0.5; include colour‑bar. |
| **label_histogram.png** | Direction‑label class mix (train split). | Bar chart: Down / Flat / Up (binary labels show two bars). Percentages on bar tops; title shows the ε value. |
| **gru_learning_curve.png** | GRU training progress. | 3 stacked panes: total loss (log‑y), val dir3 accuracy, and a vertical dashed marker at the early‑stop epoch; panes share the epoch axis. |
| **reliability_curve_val_*.png** | Calibration quality post‑Vector scaling. | Left 70 %: reliability diagram (10 equal‑freq bins). Right 30 %: histogram of predicted p_up. Title shows ECE & Brier. |
| **sac_reward_plot.png** | Offline SAC learning curve. | Smoothed episode reward (EMA 0.2) vs steps; action‑variance on twin y‑axis; checkpoint ticks. |
| **backtest_summary_*.png** | Live back‑test overview. | 3 stacked plots:<br>1) Price line + blue/red background shading where edge ≥ 0.1.<br>2) Position‑size step graph.<br>3) Equity curve with shaded draw‑downs; textbox shows Sharpe & Max DD. |
_All figs_: 16 × 9 in, 150 DPI, `plt.tight_layout()`, footer `"© GRU‑SAC v3"` right‑bottom.
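A small helper pair capturing those shared conventions; the function names and footer coordinates are assumptions:

```python
# Shared figure conventions — sketch only.
import matplotlib.pyplot as plt

def new_figure(cfg: dict):
    """Create a figure at the standard size (cfg gives [16, 9] inches)."""
    return plt.figure(figsize=tuple(cfg["output"]["figure_size"]))

def finalise_figure(fig):
    """Apply the shared layout and footer before handing off to io.save_figure."""
    fig.tight_layout()
    # right-bottom footer required on every figure
    fig.text(0.99, 0.01, "© GRU-SAC v3", ha="right", va="bottom", fontsize=8)
    return fig
```

Callers create axes on the figure as usual, then pass the finalised figure to `io.save_figure`, which applies the configured 150 DPI.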
---
### 4 ▪ Unit Tests
* `tests/test_output_contract.py`
  * Run the mini‑pipeline (`tests/smoke.yaml`); assert each required file exists and is larger than 2 KB.
  * Validate JSON keys (`accuracy`, `ci_lower`, etc.).
  * Assert `softmax(logits)` matches the stored probabilities for the logits view (e.g. `np.testing.assert_allclose(softmax(logits), probs)`).
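A sketch of the contract test under the assumptions above (entry point, artefact names); the latest‑run lookup relies on run‑ids sorting chronologically:

```python
# tests/test_output_contract.py — sketch; the artefact list is illustrative.
import json
import subprocess
from pathlib import Path

REQUIRED = ["preprocess_summary", "baseline1_report", "baseline2_report"]

def test_output_contract():
    # Run the mini-pipeline end to end on the smoke config.
    subprocess.run(
        ["python", "run.py", "--config", "tests/smoke.yaml"], check=True
    )
    # run_ids start with a timestamp, so lexicographic max is the latest run.
    run_dir = sorted(Path("results").iterdir())[-1]
    for name in REQUIRED:
        matches = list(run_dir.glob(f"{name}.*"))
        assert matches, f"missing artefact: {name}"
        assert matches[0].stat().st_size > 2048, f"{name} under 2 KB"
    report = json.loads(next(run_dir.glob("baseline1_report.*")).read_text())
    assert {"accuracy", "ci_lower"} <= report.keys()
```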
---
### 5 ▪ CI Workflow (`.github/workflows/pipeline.yml`)
```yaml
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with: {python-version: "3.10"}
      - run: pip install -r requirements.txt
      - run: black --check .
      - run: ruff .
      - run: pytest -q
      - name: Smoke e2e
        run: python run.py --config tests/smoke.yaml
      - name: Upload artefacts
        uses: actions/upload-artifact@v4
        with:
          name: run-${{ github.sha }}
          path: |
            results/*/*
            logs/*/*
```
---
### 6 ▪ Documentation Updates
* **`README.md`** → new *Outputs* section reproducing the artefact table.
* **`docs/v3_changelog.md`** → one‑pager summarising v3 versus v2 differences (labels, calibration, outputs).
---
### 7 ▪ Roll‑out Plan (5‑PR cadence)
1. **PR #1** – run‑id, IOManager, logger, CI log upload.
2. **PR #2** – data & feature stage outputs + tests.
3. **PR #3** – GRU training outputs + calibration figure.
4. **PR #4** – SAC & back‑test outputs, reward & summary figs.
5. **PR #5** – docs & README refresh.
Tag `v3.0.0` after PR #5 passes.
---
### 8 ▪ Success Criteria for CI
Fail the pipeline when **any** occurs:
* `baseline1_report.txt` confidence‑interval lower bound (`ci_lower`) < 0.52
* `edge_filtered_accuracy` (val) < 0.60
* Back‑test Sharpe < 1.2 or Max DD > 15 %
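A sketch of a gate script CI could run after the smoke e2e step; artefact locations and metric key names (`edge_filtered_accuracy`, `sharpe`, `max_drawdown`) are assumptions:

```python
# ci_gates.py — sketch; adapt key names to the real metrics JSON.
import json
import sys
from pathlib import Path

def check_gates(run_dir: Path) -> list[str]:
    failures = []
    b1 = json.loads(next(run_dir.glob("baseline1_report*")).read_text())
    if b1["ci_lower"] < 0.52:
        failures.append(f"baseline1 ci_lower {b1['ci_lower']:.3f} < 0.52")
    m = json.loads(next(run_dir.glob("performance_metrics*")).read_text())
    if m["edge_filtered_accuracy"] < 0.60:
        failures.append("edge_filtered_accuracy (val) below 0.60")
    if m["sharpe"] < 1.2 or m["max_drawdown"] > 0.15:
        failures.append("back-test Sharpe < 1.2 or Max DD > 15 %")
    return failures

if __name__ == "__main__":
    problems = check_gates(Path(sys.argv[1]))
    for p in problems:
        print(f"GATE FAIL: {p}")
    sys.exit(1 if problems else 0)
```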
---
Implementing this **single integrated revision** provides:
* **Deterministic artefact paths** for every run.
* **Rich, shareable figures** for quick diagnostics.
* **Audit‑ready logs/reports** for research traceability.
Merge each step once CI is green; you’ll have a reproducible, fully instrumented pipeline ready for iterative accuracy pushes toward the 65 % target.