## **Revision Document – v3 Output Contract & Figure Specifications**
This single guide merges **I/O plumbing**, **logging**, **CI hooks**, **artefact paths**, and **figure design** into one actionable playbook.
Apply the steps **in order**, submitting small PRs so CI remains green throughout.
---
### 0 ▪ Foundations
| Step | File(s) | Action |
|------|---------|--------|
| 0.1 | **`config.yaml`** | Add:<br>`base_dirs: {results: results, models: models, logs: logs}`<br>`output: {figure_dpi: 150, figure_size: [16, 9], log_level: INFO}` |
| 0.2 | `src/utils/run_id.py` | `make_run_id()` → `"20250418_152310_ab12cd"` (timestamp + short git‑hash). |
| 0.3 | `src/__init__.py` | Expose `__version__`, `GIT_SHA`, `BUILD_DATE`. |
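A minimal sketch of step 0.2, assuming the short hash comes from `git rev-parse`; the `subprocess` fallback and hash length are illustrative, not prescriptive:

```python
# src/utils/run_id.py — minimal sketch; the git lookup and fallback are assumptions.
import subprocess
from datetime import datetime

def make_run_id() -> str:
    """Return e.g. '20250418_152310_ab12cd' (timestamp + short git hash)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short=6", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        sha = "nogit"  # fallback when run outside a git checkout
    return f"{stamp}_{sha}"
```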
---
### 1 ▪ Core I/O & Logging
| File | Content |
|------|---------|
| **`src/io_manager.py`** | `IOManager(cfg, run_id)`<br>• `path(section, name)`: returns the full path under `results\|models\|logs\|figures`.<br>• `save_json`, `save_df` (CSV if ≤ 100 MB, else Parquet), `save_figure` (uses the cfg DPI/size). |
| **`src/logger_setup.py`** | `setup_logger(cfg, run_id, io)` with a colourised console handler (INFO) and a rotating file handler (DEBUG) writing to `logs/<run_id>/`. |
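A minimal sketch of the `IOManager` contract above; the run‑scoped directory layout and the `.txt` extension for JSON reports follow the artefact tables, while everything else (constructor shape, Parquet cut‑over logic) is an assumption:

```python
# src/io_manager.py — sketch only; internal details are assumptions.
import json
from pathlib import Path

import pandas as pd

class IOManager:
    def __init__(self, cfg: dict, run_id: str):
        self.cfg, self.run_id = cfg, run_id
        self.base = {k: Path(v) for k, v in cfg["base_dirs"].items()}
        self.base["figures"] = self.base["results"]  # assumption: figures live under results

    def path(self, section: str, name: str) -> Path:
        """Full run-scoped path, e.g. path('results', 'x.csv') -> results/<run_id>/x.csv."""
        p = self.base[section] / self.run_id / name
        p.parent.mkdir(parents=True, exist_ok=True)
        return p

    def save_json(self, obj: dict, name: str) -> Path:
        # JSON payload; the .txt extension matches the artefact tables below.
        p = self.path("results", f"{name}.txt")
        p.write_text(json.dumps(obj, indent=2, default=str))
        return p

    def save_df(self, df: pd.DataFrame, name: str) -> Path:
        # CSV for small frames, Parquet above the 100 MB threshold from the spec.
        if df.memory_usage(deep=True).sum() <= 100 * 2**20:
            p = self.path("results", f"{name}.csv")
            df.to_csv(p, index=False)
        else:
            p = self.path("results", f"{name}.parquet")
            df.to_parquet(p)
        return p

    def save_figure(self, fig, name: str) -> Path:
        p = self.path("figures", f"{name}.png")
        fig.savefig(p, dpi=self.cfg["output"]["figure_dpi"])
        return p
```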
**`run.py` entry banner**
```python
run_id = make_run_id()
cfg = load_config(args.config)
io = IOManager(cfg, run_id)
logger = setup_logger(cfg, run_id, io)
logger.info(f"GRU‑SAC v{__version__} | commit {GIT_SHA} | run {run_id}")
logger.info(f"Loaded config file: {args.config}")
```
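And a matching sketch of `setup_logger`, assuming stdlib `logging`; the colourised console formatter is elided and `pipeline.log` is a placeholder name:

```python
# src/logger_setup.py — sketch; handler details are assumptions.
import logging
from logging.handlers import RotatingFileHandler

def setup_logger(cfg: dict, run_id: str, io) -> logging.Logger:
    logger = logging.getLogger("gru_sac")
    logger.setLevel(logging.DEBUG)

    # Console handler at the configured level (colourisation omitted here).
    console = logging.StreamHandler()
    console.setLevel(getattr(logging, cfg["output"].get("log_level", "INFO")))
    console.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # DEBUG-level rotating file under logs/<run_id>/.
    file_handler = RotatingFileHandler(
        io.path("logs", "pipeline.log"), maxBytes=10 * 2**20, backupCount=3
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```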
---
### 2 ▪ Stage Outputs
| Stage | Implementation notes | Artefacts |
|-------|---------------------|-----------|
| **Data load & preprocess** | After sampling/NaN purge, save:<br>`io.save_json(summary, "preprocess_summary")`<br>`io.save_df(df.head(20), "head_preprocessed")` | `results/<run_id>/preprocess_summary.txt`<br>`head_preprocessed.csv` |
| **Feature engineering** | Generate the correlation heat‑map (see figure table) → `io.save_figure(..., "feature_corr_heatmap")` | `feature_corr_heatmap.png` |
| **Label generation** | Log the class distribution; produce a histogram figure. | `label_histogram.png` |
| **Baseline 1 & 2** | Consolidate in `baseline_checker.py`; each baseline returns a dict with accuracy, CI, etc. (sketched after this table).<br>`io.save_json(report, "baseline1_report")` (likewise for baseline 2). | `baseline1_report.txt` / `baseline2_report.txt` |
| **Feature whitelist** | Save JSON to `models/<run_id>/final_whitelist_<run_id>.json`. | — |
| **GRU training** | Use the Keras `CSVLogger` callback to write `logs/<run_id>/gru_history.csv`; after training, plot the learning curve. | `gru_learning_curve.png` + `.keras` model |
| **Calibration (vector scaling)** | Save `calibrator_vec_<run_id>.npy`; plot the reliability curve. | `reliability_curve_val_<run_id>.png` |
| **SAC training** | Write `episode_rewards.csv`, plot the reward curve, and save the final agent under `models/sac_train_<run_id>/`. | `sac_reward_plot.png` |
| **Back‑test** | Save the step‑level CSV, metrics JSON, and a summary figure. | `backtest_results_<run_id>.csv`<br>`performance_metrics_<run_id>.txt`<br>`backtest_summary_<run_id>.png` |
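As referenced in the Baseline row, a sketch of one stage honouring the contract; the `wilson_ci` helper and the exact dict shape are assumptions, but the keys mirror those the unit tests validate:

```python
# Sketch of the baseline stage's output contract — not the final implementation.
import numpy as np

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def run_baseline1(y_true, y_pred, io):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(np.sum(y_true == y_pred))          # correct predictions
    acc = k / len(y_true)
    lo, hi = wilson_ci(k, len(y_true))
    report = {"accuracy": acc, "ci_lower": lo, "ci_upper": hi, "n": len(y_true)}
    io.save_json(report, "baseline1_report")   # lands in results/<run_id>/
    return report
```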
---
### 3 ▪ Figure Specifications
| File | Visualises | Layout / Details |
|------|-------------|------------------|
| **feature_corr_heatmap.png** | Pearson correlation of engineered features (pre‑prune). | Square heat‑map, features sorted by \|ρ\| vs target; diverging palette centred at 0; annotate cells with \|ρ\| > 0.5; include colour‑bar. |
| **label_histogram.png** | Direction‑label class mix (train split). | Bar chart: Down / Flat / Up (binary labels show two bars). Percentages on bar tops; title shows the ε value. |
| **gru_learning_curve.png** | GRU training progress. | 3 stacked panes: total loss (log‑y), val dir3 accuracy, and a vertical dashed marker at the early‑stop epoch; panes share the epoch axis. |
| **reliability_curve_val_*.png** | Calibration quality post‑Vector scaling. | Left 70 %: reliability diagram (10 equal‑freq bins). Right 30 %: histogram of predicted p_up. Title shows ECE & Brier. |
| **sac_reward_plot.png** | Offline SAC learning curve. | Smoothed episode reward (EMA 0.2) vs steps; action‑variance on twin y‑axis; checkpoint ticks. |
| **backtest_summary_*.png** | Live back‑test overview. | 3 stacked plots:<br>1) Price line + blue/red background shading where edge ≥ 0.1.<br>2) Position‑size step graph.<br>3) Equity curve with shaded draw‑downs; textbox shows Sharpe & Max DD. |
_All figs_: 16 × 9 in, 150 DPI, `plt.tight_layout()`, footer `"© GRU‑SAC v3"` right‑bottom.
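A small helper pair capturing those shared conventions; the function names and footer coordinates are assumptions:

```python
# Shared figure conventions — sketch only.
import matplotlib.pyplot as plt

def new_figure(cfg: dict):
    """Create a figure at the standard size (cfg gives [16, 9] inches)."""
    return plt.figure(figsize=tuple(cfg["output"]["figure_size"]))

def finalise_figure(fig):
    """Apply the shared layout and footer before handing off to io.save_figure."""
    fig.tight_layout()
    # right-bottom footer required on every figure
    fig.text(0.99, 0.01, "© GRU-SAC v3", ha="right", va="bottom", fontsize=8)
    return fig
```

Callers create axes on the figure as usual, then pass the finalised figure to `io.save_figure`, which applies the configured 150 DPI.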
---
### 4 ▪ Unit Tests
* `tests/test_output_contract.py`
  * Run the mini‑pipeline (`tests/smoke.yaml`); assert each required file exists and is larger than 2 KB.
  * Validate JSON keys (`accuracy`, `ci_lower`, etc.).
  * Assert `softmax(logits)` matches the stored probabilities for the logits view (e.g. `np.testing.assert_allclose(softmax(logits), probs)`).
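A sketch of the contract test under the assumptions above (entry point, artefact names); the latest‑run lookup relies on run‑ids sorting chronologically:

```python
# tests/test_output_contract.py — sketch; the artefact list is illustrative.
import json
import subprocess
from pathlib import Path

REQUIRED = ["preprocess_summary", "baseline1_report", "baseline2_report"]

def test_output_contract():
    # Run the mini-pipeline end to end on the smoke config.
    subprocess.run(
        ["python", "run.py", "--config", "tests/smoke.yaml"], check=True
    )
    # run_ids start with a timestamp, so lexicographic max is the latest run.
    run_dir = sorted(Path("results").iterdir())[-1]
    for name in REQUIRED:
        matches = list(run_dir.glob(f"{name}.*"))
        assert matches, f"missing artefact: {name}"
        assert matches[0].stat().st_size > 2048, f"{name} under 2 KB"
    report = json.loads(next(run_dir.glob("baseline1_report.*")).read_text())
    assert {"accuracy", "ci_lower"} <= report.keys()
```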
---
### 5 ▪ CI Workflow (`.github/workflows/pipeline.yml`)
```yaml
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with: {python-version: "3.10"}
      - run: pip install -r requirements.txt
      - run: black --check .
      - run: ruff .
      - run: pytest -q
      - name: Smoke e2e
        run: python run.py --config tests/smoke.yaml
      - name: Upload artefacts
        uses: actions/upload-artifact@v4
        with:
          name: run-${{ github.sha }}
          path: |
            results/*/*
            logs/*/*
```
---
### 6 ▪ Documentation Updates
* **`README.md`** → new *Outputs* section reproducing the artefact table.
* **`docs/v3_changelog.md`** → one‑pager summarising v3 versus v2 differences (labels, calibration, outputs).
---
### 7 ▪ Roll‑out Plan (5‑PR cadence)
1. **PR #1** – run‑id, IOManager, logger, CI log upload.
2. **PR #2** – data & feature stage outputs + tests.
3. **PR #3** – GRU training outputs + calibration figure.
4. **PR #4** – SAC & back‑test outputs, reward & summary figs.
5. **PR #5** – docs & README refresh.
Tag `v3.0.0` after PR #5 passes.
---
### 8 ▪ Success Criteria for CI
Fail the pipeline when **any** occurs:
* `baseline1_report.txt` confidence‑interval lower bound (`ci_lower`) < 0.52
* `edge_filtered_accuracy` (val) < 0.60
* Back‑test Sharpe < 1.2 or Max DD > 15 %
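A sketch of a gate script CI could run after the smoke e2e step; artefact locations and metric key names (`edge_filtered_accuracy`, `sharpe`, `max_drawdown`) are assumptions:

```python
# ci_gates.py — sketch; adapt key names to the real metrics JSON.
import json
import sys
from pathlib import Path

def check_gates(run_dir: Path) -> list[str]:
    failures = []
    b1 = json.loads(next(run_dir.glob("baseline1_report*")).read_text())
    if b1["ci_lower"] < 0.52:
        failures.append(f"baseline1 ci_lower {b1['ci_lower']:.3f} < 0.52")
    m = json.loads(next(run_dir.glob("performance_metrics*")).read_text())
    if m["edge_filtered_accuracy"] < 0.60:
        failures.append("edge_filtered_accuracy (val) below 0.60")
    if m["sharpe"] < 1.2 or m["max_drawdown"] > 0.15:
        failures.append("back-test Sharpe < 1.2 or Max DD > 15 %")
    return failures

if __name__ == "__main__":
    problems = check_gates(Path(sys.argv[1]))
    for p in problems:
        print(f"GATE FAIL: {p}")
    sys.exit(1 if problems else 0)
```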
---
Implementing this **single integrated revision** provides:
* **Deterministic artefact paths** for every run.
* **Rich, shareable figures** for quick diagnostics.
* **Audit‑ready logs/reports** for research traceability.
Merge each step once CI is green; you’ll have a reproducible, fully instrumented pipeline ready for iterative accuracy pushes toward the 65 % target.