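## Revision Instructions for AI Dev-Agent

Implement end-to-end missing-bar handling in GRU and SAC. Apply the steps below in sequence, with small PRs and CI green at each stage.

---

### 1 | Config updates

**File:** `config.yaml`

Add under `data`, and create new `gru` and `sac` sections:

```yaml
data:
  bar_frequency: "1T"            # pandas offset alias; pandas >= 2.2 prefers "1min"
  missing:
    strategy: "neutral"          # drop | neutral | ffill | interpolate
    max_gap: 5                   # max consecutive missing bars allowed
    interpolate:
      method: "linear"
      limit: 10

gru:
  drop_imputed_sequences: true   # drop any sequence containing imputed bars

sac:
  imputed_handling: "hold"       # hold | skip | penalty
  action_penalty: 0.05           # used if imputed_handling=penalty
```

---

### 2 | Detect & fill missing bars

**File:** `src/data_loader.py`

1. **Import** at top:
   ```python
   import pandas as pd

   from .io_manager import IOManager
   ```
2. **Implement** the helpers `find_missing_bars(df, freq)`, which returns the timestamps absent from the regular `freq` grid, and `_consecutive_gaps`, which returns the length of each run of consecutive missing bars.
3. **Implement** `report_missing(missing, cfg, io, logger)`: emit the data-load warning and save `missing_bars_summary.json`, both as specified in section 6.
4. **Implement** `fill_missing_bars(df, cfg, io, logger)`:
   - Detect missing timestamps with `find_missing_bars(df, cfg.data.bar_frequency)`.
   - Call `report_missing`.
   - **Raise an error** if the longest gap exceeds `cfg.data.missing.max_gap`.
   - Reindex to the full `pd.date_range(df.index.min(), df.index.max(), freq=cfg.data.bar_frequency)` grid; call the result `df_full`.
   - Apply `cfg.data.missing.strategy`:
     - `drop`: return the original `df` (no rows added).
     - `neutral`: forward-fill `close`; on the imputed rows only, set `open = high = low = close` and `volume = 0`.
     - `ffill`: `df_full.ffill().bfill()`.
     - `interpolate`: `df_full.interpolate(method=cfg.data.missing.interpolate.method, limit=cfg.data.missing.interpolate.limit)`.
   - **After filling**, add the flag column to the frame being returned (all `False` under `drop`):
     ```python
     df["bar_imputed"] = df.index.isin(missing)
     ```
5. **Integrate** in `TradingPipeline.load_and_preprocess_data` **before** feature engineering:
   ```python
   df = fill_missing_bars(df, self.cfg, io, logger)
   ```

---

### 3 | Sequence creation respects imputed bars

**File:** `src/trading_pipeline.py`

1. In `create_sequences`, after building `X_seq` and `y_seq`, **build** `mask_seq` of shape `(n, lookback)` from `df["bar_imputed"]`, using the same windows as `X_seq`.
2. **Conditionally drop** sequences:
   ```python
   orig_n = X_seq.shape[0]  # count before filtering, used in the log below
   if self.cfg.gru.drop_imputed_sequences:
       valid = ~mask_seq.any(axis=1)  # True for windows with no imputed bar
       X_seq, y_seq = X_seq[valid], y_seq[valid]
   ```
3. **Log**:
   ```python
   logger.info(f"Generated {orig_n} sequences, dropped {orig_n - X_seq.shape[0]} with imputed bars")
   ```
4. **Include** `bar_imputed` as a feature column in `minimal_whitelist`.

---

### 4 | GRU model input channel

**File:** `src/model_gru_v3.py` (or `src/model_gru.py`, if that file holds the v3 model)

1. **Update the input shape**: increase `n_features` by 1 to include `bar_imputed`.
2. **No further architectural change** is needed; the model now sees the imputed-flag channel. This channel only carries information when `drop_imputed_sequences: false`; with `true`, every retained sequence has an all-zero flag.

---

### 5 | SAC environment handles imputed bars

**File:** `src/trading_env.py`

1. **Read** `bar_imputed` into `self.bar_imputed`, aligned so that `self.bar_imputed[self.current_step]` is the flag of the bar the env is currently stepping through.
2. **In `step(action)`**, at the top:
   ```python
   imputed = bool(self.bar_imputed[self.current_step])
   if imputed:
       mode = self.cfg.sac.imputed_handling
       if mode == "skip":
           # Advance without trading; flag the transition for the training loop.
           self.current_step += 1
           next_state = self._get_observation()  # assumed existing helper
           done = self.current_step >= len(self.bar_imputed) - 1  # mirror the env's normal termination check
           return next_state, 0.0, done, {"skipped_imputed": True}
       if mode == "hold":
           # Keep the current position, then fall through to the normal step.
           action = self.position
       elif mode == "penalty":
           # Replace the reward with a quadratic penalty on any position change.
           reward = -self.cfg.sac.action_penalty * (action - self.position) ** 2
           self._update_position(action)
           self.current_step += 1
           next_state = self._get_observation()  # assumed existing helper
           done = self.current_step >= len(self.bar_imputed) - 1  # mirror the env's normal termination check
           return next_state, reward, done, {}
   # existing normal step follows
   ```
3. **Ensure** imputed transitions are added to the replay buffer only when `mode` is not `skip`. The env does not own the buffer, so the training loop should skip the buffer insert whenever `info.get("skipped_imputed")` is true (`skipped_imputed` is a suggested key, not an existing one).
4. **Log** inside the `if imputed:` branch:
   ```python
   logger.debug(f"SAC on imputed bar at step {self.current_step}: handling={mode}")
   ```

---

### 6 | Logging & artefacts

1. **Data load** warning:
   ```
   WARNING Detected {total} missing bars, longest gap {longest}; applied strategy={strategy}
   ```
2. **Sequence creation** info:
   ```
   INFO Generated {orig_n} sequences, dropped {dropped} with imputed bars
   ```
3. **SAC training** debug:
   ```
   DEBUG SAC on imputed bar at step {step}: handling={mode}
   ```
4. **Report** files saved under `results//`:
   - `missing_bars_summary.json`
   - `imputed_sequence_summary.json` with the generated and dropped sequence counts.
   - `sac_imputed_transitions.csv` (optional detailed log).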
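The detection, fill, and summary steps from sections 2 and 6 can be checked with plain pandas before wiring in `cfg`, `IOManager`, and the pipeline. This is a minimal sketch under assumptions, not the project's implementation: config values are passed as plain arguments, the frame has numeric OHLCV columns named `open`, `high`, `low`, `close`, `volume` on an index already aligned to the `freq` grid, `_consecutive_gaps` takes a boolean mask (a signature chosen here for illustration), and the returned `summary` dict with keys `total_missing`, `n_gaps`, `longest_gap`, `strategy` is hypothetical.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)

# Sketch only (assumption): config values are plain arguments here, not cfg/IOManager.


def _consecutive_gaps(is_missing: np.ndarray) -> list[int]:
    """Length of each run of consecutive True values in a boolean mask."""
    runs, current = [], 0
    for flag in is_missing:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def fill_missing_bars(
    df: pd.DataFrame,
    freq: str,
    strategy: str = "neutral",
    max_gap: int = 5,
    interp_method: str = "linear",
    interp_limit: int = 10,
) -> tuple[pd.DataFrame, dict]:
    """Reindex OHLCV bars to a regular grid, fill gaps, and flag imputed rows."""
    full = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    is_missing = ~full.isin(df.index)
    missing = full[is_missing]
    gaps = _consecutive_gaps(is_missing)
    summary = {
        "total_missing": int(is_missing.sum()),
        "n_gaps": len(gaps),
        "longest_gap": max(gaps, default=0),
        "strategy": strategy,
    }
    # Fail fast before touching the data if any gap is too long.
    if summary["longest_gap"] > max_gap:
        raise ValueError(f"longest gap {summary['longest_gap']} exceeds max_gap={max_gap}")
    if summary["total_missing"]:
        logger.warning(
            "Detected %d missing bars, longest gap %d; applied strategy=%s",
            summary["total_missing"], summary["longest_gap"], strategy,
        )

    if strategy == "drop":
        out = df.copy()  # keep only the observed bars
    else:
        out = df.reindex(full)  # missing timestamps become all-NaN rows
        if strategy == "neutral":
            # Flat, zero-volume bars at the last observed close.
            out["close"] = out["close"].ffill()
            for col in ("open", "high", "low"):
                out.loc[missing, col] = out.loc[missing, "close"]
            out.loc[missing, "volume"] = 0.0
        elif strategy == "ffill":
            out = out.ffill().bfill()
        elif strategy == "interpolate":
            out = out.interpolate(method=interp_method, limit=interp_limit)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    out["bar_imputed"] = out.index.isin(missing)
    return out, summary
```

Running the `max_gap` check before any filling means an oversized gap raises instead of producing a half-filled frame. The `summary` dict contains only built-in types, so it can be written with `json.dump` as a `missing_bars_summary.json` artefact.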
---

### 7 | Unit tests

**Files:** `tests/test_data_loader.py`, `tests/test_sequence_creation.py`, `tests/test_trading_env.py`

1. **`test_data_loader.py`**:
   - Build a synthetic gappy DataFrame; assert the `bar_imputed` flags and each strategy's output.
2. **`test_sequence_creation.py`**:
   - Build a toy DataFrame with `bar_imputed`; assert that sequences containing imputed bars are dropped when `drop_imputed_sequences=True`.
3. **`test_trading_env.py`**:
   - Create a `TradingEnv` with known imputed steps; for each `imputed_handling` mode, assert the `step()` behavior:
     - `skip` advances the step without adding a transition to the replay buffer;
     - `hold` leaves the position unchanged;
     - `penalty` returns a reward equal to `-action_penalty * (action - position)**2` (negative whenever the position changes, zero otherwise).

---

### 8 | Documentation

1. **README.md** → add a **Data Quality** section describing missing-bar handling, the config keys, and recommended defaults.
2. **docs/v3_changelog.md** → note the new missing-bar feature and config flags.

---

**Roll-out Plan:**

- **PR 1:** Config + `data_loader` missing-bar detection & fill + tests.
- **PR 2:** Sequence creation & GRU channel update + tests.
- **PR 3:** SAC env updates + tests.
- **PR 4:** Logging/artefacts + docs.

Merge each after CI passes.
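The per-mode behavior from section 5, and the `test_trading_env.py` cases listed in section 7, can be exercised against a toy stepper before touching the real `TradingEnv`. This is a minimal sketch, not the project's environment: the class `ImputedBarStepper`, the `returns` array, the position-times-return reward, the `skipped_imputed` info key, and the `collect` loop (with a plain list standing in for the replay buffer) are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical toy env: names and reward are assumptions, not the project's TradingEnv.


@dataclass
class ImputedBarStepper:
    """Stand-in for TradingEnv: only the imputed-bar branch of step() is modeled."""

    bar_imputed: np.ndarray          # per-bar imputed flag
    returns: np.ndarray              # hypothetical per-bar returns for the normal reward
    imputed_handling: str = "hold"   # hold | skip | penalty
    action_penalty: float = 0.05
    position: float = 0.0
    current_step: int = 0

    def observation(self) -> np.ndarray:
        return np.array([self.current_step, self.position], dtype=float)

    def done(self) -> bool:
        return self.current_step >= len(self.bar_imputed) - 1

    def step(self, action: float):
        if self.bar_imputed[self.current_step]:
            mode = self.imputed_handling
            if mode == "skip":
                # Advance one bar without trading; tell the caller not to store it.
                self.current_step += 1
                return self.observation(), 0.0, self.done(), {"skipped_imputed": True}
            if mode == "hold":
                action = self.position  # ignore the agent and keep the position
            elif mode == "penalty":
                # Quadratic cost on any position change made on an imputed bar.
                reward = -self.action_penalty * (action - self.position) ** 2
                self.position = action
                self.current_step += 1
                return self.observation(), reward, self.done(), {}
        # Normal step: P&L of the position held over this bar.
        self.position = action
        reward = float(self.position * self.returns[self.current_step])
        self.current_step += 1
        return self.observation(), reward, self.done(), {}


def collect(env: ImputedBarStepper, actions, buffer: list) -> None:
    """Roll out fixed actions, storing every transition except skipped imputed ones."""
    obs = env.observation()
    for action in actions:
        next_obs, reward, done, info = env.step(action)
        if not info.get("skipped_imputed", False):
            buffer.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
```

Returning the skip marker through `info` keeps the environment unaware of the buffer; the decision to store a transition stays in the training loop.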