gru_sac_predictor/prompts/missing_data.txt

## Revision Instructions for AI Dev‑Agent
Implement end‑to‑end missing‑bar handling in GRU and SAC. Apply the steps below in sequence, with small PRs and CI green at each stage.

---

### 1 | Config updates

**File:** `config.yaml`
Add a `missing` block under `data`, and create new `gru` and `sac` sections:

```yaml
data:
  bar_frequency: "1min"           # pandas offset alias; "1T" is deprecated since pandas 2.2
  missing:
    strategy:      "neutral"      # drop | neutral | ffill | interpolate
    max_gap:       5              # max consecutive missing bars allowed
    interpolate:
      method:      "linear"
      limit:       10

gru:
  drop_imputed_sequences: true    # drop any sequence containing imputed bars

sac:
  imputed_handling: "hold"        # hold | skip | penalty
  action_penalty: 0.05            # used if imputed_handling=penalty
```
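
A minimal sketch of how these keys could be validated at startup, so that a typo in `strategy` or `imputed_handling` fails early rather than deep in training. It assumes the config has already been parsed into a plain nested `dict`; the function name `validate_missing_cfg` is hypothetical and not part of the existing codebase.

```python
# Allowed values, copied from the comments in the YAML block above.
VALID_STRATEGIES = {"drop", "neutral", "ffill", "interpolate"}
VALID_SAC_MODES = {"hold", "skip", "penalty"}


def validate_missing_cfg(cfg: dict) -> None:
    """Raise ValueError if the missing-bar settings are out of range.

    Hypothetical helper: `cfg` is assumed to be the parsed config as a dict.
    """
    missing = cfg["data"]["missing"]
    if missing["strategy"] not in VALID_STRATEGIES:
        raise ValueError(f"unknown data.missing.strategy: {missing['strategy']!r}")
    if int(missing["max_gap"]) < 0:
        raise ValueError("data.missing.max_gap must be >= 0")

    sac = cfg["sac"]
    if sac["imputed_handling"] not in VALID_SAC_MODES:
        raise ValueError(f"unknown sac.imputed_handling: {sac['imputed_handling']!r}")
    # The penalty coefficient only matters in "penalty" mode.
    if sac["imputed_handling"] == "penalty" and float(sac["action_penalty"]) < 0:
        raise ValueError("sac.action_penalty must be >= 0")
```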

---

### 2 | Detect & fill missing bars

**File:** `src/data_loader.py`

1. **Import** at top:
   ```python
   import pandas as pd
   from .io_manager import IOManager
   ```

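2. **Implement** `find_missing_bars(df, freq)`, which returns the timestamps expected at `freq` but absent from `df.index`, and a `_consecutive_gaps` helper that groups those timestamps into runs of consecutive missing bars.

3. **Implement** `report_missing(missing, cfg, io, logger)`, which emits the data-load warning and writes `missing_bars_summary.json` (see section 6).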

4. **Implement** `fill_missing_bars(df, cfg, io, logger)`:
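   - Detect missing timestamps with `find_missing_bars`.
   - Call `report_missing`.
   - Reindex to the full `pd.date_range` and keep the result as `df_full`.
   - Apply `strategy`:
     - `drop`: return the original `df`; no rows are added, so `bar_imputed` is `False` for every row.
     - `neutral`: ffill `close`, set `open` = `high` = `low` = `close`, and set `volume` = 0.
     - `ffill`: `df_full.ffill().bfill()`.
     - `interpolate`: use `df_full.interpolate(...)` with `method` and `limit` taken from `cfg.data.missing.interpolate`.
   - **After filling**, add the flag column to the frame that is returned:
     ```python
     df_full['bar_imputed'] = df_full.index.isin(missing)
     ```
   - **Raise an error** if the longest gap exceeds `cfg.data.missing.max_gap`. This check depends only on the detected gaps, so it can run before any filling.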

5. **Integrate** in `TradingPipeline.load_and_preprocess_data` **before** feature engineering:
   ```python
   df = fill_missing_bars(df, self.cfg, io, logger)
   ```
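
The detection and `neutral` fill steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes `df` has a sorted `DatetimeIndex` and `open`/`high`/`low`/`close`/`volume` columns, that `freq` is a fixed-width alias such as `"1min"`, and it omits the `io`/`logger` reporting. `longest_gap` and `fill_neutral` are hypothetical names.

```python
import pandas as pd


def find_missing_bars(df: pd.DataFrame, freq: str) -> pd.DatetimeIndex:
    """Timestamps on the regular grid between first and last bar that df lacks."""
    full = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    return full.difference(df.index)


def longest_gap(missing: pd.DatetimeIndex, freq: str) -> int:
    """Length (in bars) of the longest run of consecutive missing timestamps."""
    if len(missing) == 0:
        return 0
    stamps = missing.to_series()
    # A new run starts wherever the step from the previous missing bar is not
    # exactly one bar (the first element compares NaT != step, which is True).
    new_run = stamps.diff() != pd.Timedelta(freq)
    return int(new_run.cumsum().value_counts().max())


def fill_neutral(df: pd.DataFrame, freq: str, max_gap: int) -> pd.DataFrame:
    """Reindex to the full grid and fill gaps with flat, zero-volume bars."""
    missing = find_missing_bars(df, freq)
    gap = longest_gap(missing, freq)
    if gap > max_gap:
        raise ValueError(f"longest gap {gap} exceeds max_gap {max_gap}")

    full_index = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    out = df.reindex(full_index)
    out["close"] = out["close"].ffill()
    for col in ("open", "high", "low"):
        # Only the newly created rows are NaN, so real bars keep their values.
        out[col] = out[col].fillna(out["close"])
    out["volume"] = out["volume"].fillna(0)
    out["bar_imputed"] = out.index.isin(missing)
    return out
```

Checking `max_gap` before reindexing keeps the failure cheap and avoids returning a partially filled frame.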

---

### 3 | Sequence creation respects imputed bars

**File:** `src/trading_pipeline.py`

1. In `create_sequences`, after building `X_seq` and `y_seq`, **build** `mask_seq` of shape `(n, lookback)` from `df['bar_imputed']`.

2. **Conditionally drop** sequences:
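   ```python
   orig_n = X_seq.shape[0]  # count before dropping, used in the log line
   if self.cfg.gru.drop_imputed_sequences:
       # Keep only windows that contain no imputed bar.
       valid = ~mask_seq.any(axis=1)
       X_seq = X_seq[valid]
       y_seq = y_seq[valid]
   ```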
3. **Log**:
   ```python
   logger.info(f"Generated {orig_n} sequences, dropped {orig_n - X_seq.shape[0]} with imputed bars")
   ```
4. **Include** `bar_imputed` as a feature column in `minimal_whitelist`.
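
As an illustration of steps 1 and 2, here is a self-contained NumPy sketch that builds `mask_seq` alongside the sequences and then filters them. The function names are hypothetical, and the target alignment (each window predicts the bar immediately after it) is an assumption; the real `create_sequences` may align targets differently.

```python
import numpy as np


def build_sequences(features, targets, imputed, lookback):
    """Return (X_seq, y_seq, mask_seq) for sliding windows of length `lookback`.

    Assumes len(features) > lookback and that window i predicts targets[i + lookback].
    """
    n = len(features) - lookback
    X_seq = np.stack([features[i:i + lookback] for i in range(n)])
    y_seq = targets[lookback:lookback + n]
    # mask_seq[i, j] is True when bar j of window i was imputed.
    mask_seq = np.stack([imputed[i:i + lookback] for i in range(n)])
    return X_seq, y_seq, mask_seq


def drop_imputed(X_seq, y_seq, mask_seq):
    """Drop every window touching an imputed bar; also return the dropped count."""
    valid = ~mask_seq.any(axis=1)
    dropped = int((~valid).sum())
    return X_seq[valid], y_seq[valid], dropped
```

A single imputed bar removes up to `lookback` windows, so the dropped count can be much larger than the number of missing bars.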

---

### 4 | GRU model input channel

**File:** `src/model_gru_v3.py` (or `model_gru.py`, whichever file defines the v3 model)

1. **Update input shape**: increase `n_features` by 1 to include `bar_imputed`.

2. **No further architectural change**; the model now sees the imputed‑flag channel.

---

### 5 | SAC environment handles imputed bars

**File:** `src/trading_env.py`

1. **Read** `bar_imputed` into `self.bar_imputed` aligned with your sequences.

2. **In `step(action)`**, at the top:
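   ```python
   imputed = self.bar_imputed[self.current_step]
   if imputed:
       mode = self.cfg.sac.imputed_handling
       if mode == "skip":
           # Advance without trading; the caller must not store this transition.
           self.current_step += 1
           next_state = self._get_state()  # hypothetical helper: use the env's own observation builder
           return next_state, 0.0, False, {}
       elif mode == "hold":
           # Ignore the requested action, keep the current position,
           # then fall through to the normal step logic below.
           action = self.position
       elif mode == "penalty":
           # Quadratic penalty on changing position during an imputed bar.
           reward = -self.cfg.sac.action_penalty * (action - self.position) ** 2
           self._update_position(action)
           self.current_step += 1
           next_state = self._get_state()  # hypothetical helper, as above
           return next_state, reward, False, {}
   # existing normal step follows
   ```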

3. **Ensure** imputed transitions are added to replay buffer only when `mode` ≠ `skip`.

4. **Log** inside the `if imputed:` branch, before any early return (`mode` is only defined there):
   ```python
   logger.debug(f"SAC step {self.current_step} on imputed bar: handling={mode}")
   ```
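
To make the three modes concrete, below is a toy environment that mirrors the branching in step 2. It is a sketch, not the project's `TradingEnv`: the class name, the observation layout, the `info["store"]` flag, and the normal-step reward (`position * return`) are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


class ToyImputedEnv:
    """Toy stand-in for TradingEnv showing hold / skip / penalty handling."""

    def __init__(self, returns, bar_imputed, mode="hold", action_penalty=0.05):
        self.returns = np.asarray(returns, dtype=float)
        self.bar_imputed = np.asarray(bar_imputed, dtype=bool)
        self.mode = mode
        self.action_penalty = action_penalty
        self.current_step = 0
        self.position = 0.0

    def _state(self):
        # Hypothetical observation: current position and next bar index.
        return np.array([self.position, float(self.current_step)])

    def _done(self):
        return self.current_step >= len(self.returns)

    def step(self, action):
        imputed = bool(self.bar_imputed[self.current_step])
        # "store" tells the training loop whether to push to the replay buffer.
        info = {"imputed": imputed, "store": True}
        if imputed:
            if self.mode == "skip":
                self.current_step += 1
                info["store"] = False
                return self._state(), 0.0, self._done(), info
            if self.mode == "penalty":
                reward = -self.action_penalty * (action - self.position) ** 2
                self.position = action
                self.current_step += 1
                return self._state(), reward, self._done(), info
            if self.mode == "hold":
                action = self.position  # fall through with position unchanged
        # Normal step (assumed reward: position times the bar's return).
        self.position = action
        reward = self.position * self.returns[self.current_step]
        self.current_step += 1
        return self._state(), reward, self._done(), info
```

Returning a `store` flag in `info` is one way to satisfy step 3 without giving the environment a reference to the replay buffer; the training loop checks the flag before adding the transition.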

---

### 6 | Logging & artefacts

1. **Data load** warning:
   ```
   WARNING Detected {total} missing bars, longest gap {longest}; applied strategy={strategy}
   ```

2. **Sequence creation** info:
   ```
   INFO Generated {orig_n} sequences, dropped {dropped} with imputed bars
   ```

3. **SAC training** debug:
   ```
   DEBUG SAC step {step} on imputed bar: handling={mode}
   ```

4. **Report** saved under `results/<run_id>/`:
   - `missing_bars_summary.json`
   - `imputed_sequence_summary.json` with counts of generated and dropped sequences.
   - `sac_imputed_transitions.csv` (optional detailed log).
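
A possible shape for the first artefact, written with the standard library only. The field names (`total_missing`, `longest_gap`, `strategy`) are assumptions derived from the placeholders in the warning message, and the real code would presumably route the write through `IOManager` rather than `pathlib`.

```python
import json
from pathlib import Path


def write_missing_summary(run_dir, total, longest, strategy):
    """Write missing_bars_summary.json under run_dir and return its path.

    Hypothetical helper; field names are illustrative, not an agreed schema.
    """
    path = Path(run_dir) / "missing_bars_summary.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    summary = {
        "total_missing": int(total),
        "longest_gap": int(longest),
        "strategy": strategy,
    }
    path.write_text(json.dumps(summary, indent=2))
    return path
```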

---

### 7 | Unit tests

**Files:** `tests/test_data_loader.py`, `tests/test_sequence_creation.py`, `tests/test_trading_env.py`

1. **`test_data_loader.py`**:
   - Synthetic gappy DataFrame → assert `bar_imputed` flags and each strategy’s output.

2. **`test_sequence_creation.py`**:
   - Build toy DataFrame with `bar_imputed`; assert sequences dropped when `drop_imputed_sequences=True`.

3. **`test_trading_env.py`**:
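   - Create `TradingEnv` with known imputed steps; for each `imputed_handling` mode assert the behavior of `step()`:
     - `skip` advances `current_step`, returns zero reward, and the transition is not added to the replay buffer;
     - `hold` leaves the position unchanged regardless of the requested action;
     - `penalty` returns a reward equal to `-action_penalty * (action - position)**2`.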

---

### 8 | Documentation

1. **README.md** → add **Data Quality** section describing missing‑bar handling, config keys, and recommended defaults.

2. **docs/v3_changelog.md** → note new missing‑bar feature and cfg flags.

---

**Roll‑out Plan:**

- **PR 1:** Config + data_loader missing‑bar detection & fill + tests.
- **PR 2:** Sequence creation & GRU channel update + tests.
- **PR 3:** SAC env updates + tests.
- **PR 4:** Logging/artefacts + docs.

Merge each after CI passes.