## Revision Instructions for AI Dev‑Agent

Implement end‑to‑end missing‑bar handling in GRU and SAC. Apply the steps below in sequence, with small PRs and CI green at each stage.

---

### 1 | Config updates

**File:** `config.yaml`

Add the `missing` block under `data`, and create new top‑level sections for `gru` and `sac`:

```yaml
data:
  bar_frequency: "1T"          # 1-minute bars; pandas >= 2.2 prefers the alias "1min"
  missing:
    strategy: "neutral"        # drop | neutral | ffill | interpolate
    max_gap: 5                 # max consecutive missing bars allowed
    interpolate:
      method: "linear"
      limit: 10

gru:
  drop_imputed_sequences: true # drop any sequence containing imputed bars

sac:
  imputed_handling: "hold"     # hold | skip | penalty
  action_penalty: 0.05         # used if imputed_handling=penalty
```
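
Typos in these keys would otherwise surface only deep inside training, so it may help to validate them once at load time. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict; `MissingBarConfig` and `parse_missing_config` are hypothetical names and not part of the existing codebase.

```python
from dataclasses import dataclass

ALLOWED_STRATEGIES = {"drop", "neutral", "ffill", "interpolate"}
ALLOWED_SAC_MODES = {"hold", "skip", "penalty"}


@dataclass(frozen=True)
class MissingBarConfig:
    """Hypothetical container for the keys added to config.yaml."""
    strategy: str
    max_gap: int
    drop_imputed_sequences: bool
    imputed_handling: str
    action_penalty: float


def parse_missing_config(cfg: dict) -> MissingBarConfig:
    """Validate the missing-bar keys of an already-parsed config dict."""
    missing = cfg["data"]["missing"]
    parsed = MissingBarConfig(
        strategy=missing["strategy"],
        max_gap=int(missing["max_gap"]),
        drop_imputed_sequences=bool(cfg["gru"]["drop_imputed_sequences"]),
        imputed_handling=cfg["sac"]["imputed_handling"],
        action_penalty=float(cfg["sac"]["action_penalty"]),
    )
    if parsed.strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown data.missing.strategy: {parsed.strategy!r}")
    if parsed.imputed_handling not in ALLOWED_SAC_MODES:
        raise ValueError(f"unknown sac.imputed_handling: {parsed.imputed_handling!r}")
    if parsed.max_gap < 0 or parsed.action_penalty < 0:
        raise ValueError("max_gap and action_penalty must be non-negative")
    return parsed


# Example dict mirroring the YAML above (interpolate sub-keys omitted for brevity).
example_cfg = {
    "data": {"bar_frequency": "1T", "missing": {"strategy": "neutral", "max_gap": 5}},
    "gru": {"drop_imputed_sequences": True},
    "sac": {"imputed_handling": "hold", "action_penalty": 0.05},
}
parsed = parse_missing_config(example_cfg)
```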

---

### 2 | Detect & fill missing bars

**File:** `src/data_loader.py`

1. **Import** at top:
   ```python
   import pandas as pd
   from .io_manager import IOManager
   ```

2. **Implement** `find_missing_bars(df, freq)` and `_consecutive_gaps` helpers.

3. **Implement** `report_missing(missing, cfg, io, logger)` as described in section 6: log the data‑load warning and write `missing_bars_summary.json`.

4. **Implement** `fill_missing_bars(df, cfg, io, logger)`:
   - Detect missing timestamps.
   - Call `report_missing`.
   - Reindex to the full `pd.date_range` at `cfg.data.bar_frequency`.
   - Apply `strategy`:
     - `drop`: return the original df without reindexing (`bar_imputed` is then all `False`).
     - `neutral`: ffill close, set open=high=low=close, volume=0.
     - `ffill`: `df_full.ffill().bfill()`.
     - `interpolate`: use `df_full.interpolate(...)`.
   - **After filling**, add column:
     ```python
     df['bar_imputed'] = df.index.isin(missing)
     ```
   - **Raise an error** if the longest gap exceeds `cfg.data.missing.max_gap`.

5. **Integrate** in `TradingPipeline.load_and_preprocess_data` **before** feature engineering:
   ```python
   df = fill_missing_bars(df, self.cfg, io, logger)
   ```

---

### 3 | Sequence creation respects imputed bars

**File:** `src/trading_pipeline.py`

1. In `create_sequences`, after building `X_seq` and `y_seq`, **build** a boolean `mask_seq` of shape `(n, lookback)` from `df['bar_imputed']`, using the same windows as `X_seq`.

2. **Conditionally drop** sequences:
   ```python
   orig_n = X_seq.shape[0]  # count before filtering, used in the log below
   if self.cfg.gru.drop_imputed_sequences:
       valid = ~mask_seq.any(axis=1)  # requires a boolean mask_seq
       X_seq = X_seq[valid]
       y_seq = y_seq[valid]
   ```

3. **Log**:
   ```python
   logger.info(f"Generated {orig_n} sequences, dropped {orig_n - X_seq.shape[0]} with imputed bars")
   ```

4. **Include** `bar_imputed` as a feature column in `minimal_whitelist`.
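
How `mask_seq` lines up with `X_seq` depends on how `create_sequences` windows the data. The sketch below assumes sequence `i` covers rows `i` to `i + lookback - 1` and predicts row `i + lookback`, so `n = len(df) - lookback`. The toy arrays and the choice of column 0 as the target are illustrative only.

```python
import numpy as np

lookback = 3
features = np.arange(16, dtype=np.float32).reshape(8, 2)  # 8 bars, 2 features
bar_imputed = np.array([False, False, True, False, False, False, False, False])

n = len(features) - lookback  # one target bar must follow each window

# Build windows for features, targets, and the imputed flag with identical indexing.
X_seq = np.stack([features[i:i + lookback] for i in range(n)])      # (n, lookback, 2)
y_seq = features[lookback:lookback + n, 0]                          # (n,)
mask_seq = np.stack([bar_imputed[i:i + lookback] for i in range(n)])  # (n, lookback)

orig_n = X_seq.shape[0]
valid = ~mask_seq.any(axis=1)  # True where the window holds no imputed bar
X_kept, y_kept = X_seq[valid], y_seq[valid]
dropped = orig_n - X_kept.shape[0]
```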

---

### 4 | GRU model input channel

**File:** `src/model_gru_v3.py` (or `model_gru.py` if v3)

1. **Update input shape**: increase `n_features` by 1 to include `bar_imputed`.

2. **No further architectural change**; the model now sees the imputed‑flag channel.
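
Whichever way the flag reaches the input tensor, the model's input size can be read from the data rather than hard-coded. The sketch below assumes the flag is appended at the array level as one extra float channel; if it already arrives through `minimal_whitelist`, only the last line applies. Array sizes are arbitrary.

```python
import numpy as np

n, lookback, n_price_features = 4, 3, 5
X_seq = np.zeros((n, lookback, n_price_features), dtype=np.float32)
mask_seq = np.zeros((n, lookback), dtype=bool)
mask_seq[1, 2] = True  # one imputed bar, for illustration

# Append the flag as one extra float channel on the last axis.
flag_channel = mask_seq[..., None].astype(np.float32)       # (n, lookback, 1)
X_with_flag = np.concatenate([X_seq, flag_channel], axis=-1)  # (n, lookback, 6)

# Derive the GRU input size from the data instead of hard-coding it.
n_features = X_with_flag.shape[-1]
```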

---

### 5 | SAC environment handles imputed bars

**File:** `src/trading_env.py`

1. **Read** `bar_imputed` into `self.bar_imputed`, aligned with your sequences.

2. **In `step(action)`**, at the top:
   ```python
   imputed = self.bar_imputed[self.current_step]
   if imputed:
       mode = self.cfg.sac.imputed_handling
       if mode == "skip":
           self.current_step += 1
           next_state = self._get_state()  # placeholder: use the env's existing observation builder
           return next_state, 0.0, False, {}
       if mode == "hold":
           action = self.position  # override the agent's action, then fall through to the normal step
       if mode == "penalty":
           reward = -self.cfg.sac.action_penalty * (action - self.position) ** 2
           self._update_position(action)
           self.current_step += 1
           next_state = self._get_state()  # placeholder, as above
           return next_state, reward, False, {}
   # existing normal step follows
   ```

3. **Ensure** imputed transitions are added to the replay buffer only when `mode` ≠ `skip`.

4. **Log** inside the `if imputed:` branch, before `current_step` is incremented:
   ```python
   logger.debug(f"SAC on imputed bar at step {self.current_step}: handling={mode}")
   ```
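
To make the three modes concrete, here is a self-contained toy environment. It is a sketch under stated assumptions, not the real `TradingEnv`: the reward on a normal step is simply position times the next price change, the observation is a `(price, position)` tuple, and the `imputed` and `store` keys in `info` are hypothetical. The `store` flag is one possible way for the training loop to keep `skip` transitions out of the replay buffer.

```python
class ToyImputedEnv:
    """Toy stand-in for TradingEnv; only the imputed-bar branch is modelled."""

    def __init__(self, prices, bar_imputed, mode, action_penalty=0.05):
        self.prices = list(prices)
        self.bar_imputed = list(bar_imputed)
        self.mode = mode
        self.action_penalty = action_penalty
        self.current_step = 0
        self.position = 0.0

    def _state(self):
        # Observation at the (already advanced) step counter, clamped to the last bar.
        i = min(self.current_step, len(self.prices) - 1)
        return (self.prices[i], self.position)

    def _done(self):
        return self.current_step >= len(self.prices) - 1

    def step(self, action):
        info = {"imputed": bool(self.bar_imputed[self.current_step]), "store": True}
        if info["imputed"]:
            if self.mode == "skip":
                self.current_step += 1
                info["store"] = False  # caller should not push this transition
                return self._state(), 0.0, self._done(), info
            if self.mode == "hold":
                action = self.position  # ignore the agent's action on this bar
            elif self.mode == "penalty":
                reward = -self.action_penalty * (action - self.position) ** 2
                self.position = action
                self.current_step += 1
                return self._state(), reward, self._done(), info
        # Normal step (also reached in "hold" mode): one-bar PnL of the new position.
        self.position = action
        t = self.current_step
        reward = self.position * (self.prices[t + 1] - self.prices[t])
        self.current_step += 1
        return self._state(), reward, self._done(), info


def run(mode):
    """Go long on a real bar, then try to flip short on an imputed bar."""
    env = ToyImputedEnv([100.0, 101.0, 101.0, 102.0], [False, True, False, False], mode)
    env.step(1.0)
    return env.step(-1.0)


hold_result = run("hold")
skip_result = run("skip")
penalty_result = run("penalty")
```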

---

### 6 | Logging & artefacts

1. **Data load** warning:
   ```
   WARNING Detected {total} missing bars, longest gap {longest}; applied strategy={strategy}
   ```

2. **Sequence creation** info:
   ```
   INFO Generated {orig_n} sequences, dropped {dropped} with imputed bars
   ```

3. **SAC training** debug:
   ```
   DEBUG SAC on imputed bar at step {step}: handling={mode}
   ```

4. **Reports** saved under `results/<run_id>/`:
   - `missing_bars_summary.json`
   - `imputed_sequence_summary.json` with counts.
   - `sac_imputed_transitions.csv` (optional detailed log).
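
The JSON artefacts can be written with the standard library alone. A minimal sketch for `missing_bars_summary.json`, assuming the summary carries the same three values as the data-load warning; the field names and the `write_missing_summary` helper are hypothetical, and the real code would presumably go through `IOManager`.

```python
import json
import tempfile
from pathlib import Path


def write_missing_summary(run_dir, total, longest, strategy):
    """Write missing_bars_summary.json under run_dir and return its path."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "missing_bars_summary.json"
    summary = {"total_missing": total, "longest_gap": longest, "strategy": strategy}
    path.write_text(json.dumps(summary, indent=2))
    return path


# Demo in a temporary directory standing in for results/<run_id>/.
with tempfile.TemporaryDirectory() as tmp:
    out_path = write_missing_summary(Path(tmp) / "results" / "demo_run", 7, 3, "neutral")
    loaded = json.loads(out_path.read_text())
```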

---

### 7 | Unit tests

**Files:** `tests/test_data_loader.py`, `tests/test_sequence_creation.py`, `tests/test_trading_env.py`

1. **`test_data_loader.py`**:
   - Synthetic gappy DataFrame → assert `bar_imputed` flags and each strategy’s output.

2. **`test_sequence_creation.py`**:
   - Build toy DataFrame with `bar_imputed`; assert sequences dropped when `drop_imputed_sequences=True`.

3. **`test_trading_env.py`**:
   - Create `TradingEnv` with known imputed steps; for each `imputed_handling` mode assert `step()` behavior:
     - `skip` moves without adding to buffer;
     - `hold` returns same position;
     - `penalty` returns negative reward equal to penalty formula.

---

### 8 | Documentation

1. **README.md** → add **Data Quality** section describing missing‑bar handling, config keys, and recommended defaults.

2. **docs/v3_changelog.md** → note the new missing‑bar feature and config flags.

---

**Roll‑out Plan:**

- **PR 1:** Config + data_loader missing‑bar detection & fill + tests.
- **PR 2:** Sequence creation & GRU channel update + tests.
- **PR 3:** SAC env updates + tests.
- **PR 4:** Logging/artefacts + docs.

Merge each after CI passes.