gru_sac_predictor/prompts/missing_data.txt
2025-04-20 17:52:49 +00:00

## Revision Instructions for AI DevAgent
Implement end-to-end missing-bar handling in GRU and SAC. Apply the steps below in sequence, with small PRs and CI green at each stage.
---
### 1 | Config updates
**File:** `config.yaml`
Add under `data` and create new sections for `gru` and `sac`:
```yaml
data:
  bar_frequency: "1T"
  missing:
    strategy: "neutral"   # drop | neutral | ffill | interpolate
    max_gap: 5            # max consecutive missing bars allowed
    interpolate:
      method: "linear"
      limit: 10
gru:
  drop_imputed_sequences: true   # drop any sequence containing imputed bars
sac:
  imputed_handling: "hold"       # hold | skip | penalty
  action_penalty: 0.05           # used if imputed_handling=penalty
```
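The keys above accept only a fixed set of values, so it may help to validate them once at startup. The following is a minimal sketch, assuming the YAML has already been parsed into a plain nested `dict`; `validate_missing_cfg` and `example_cfg` are hypothetical names, not part of the existing codebase.

```python
# Hypothetical validator for the missing-bar config keys (not existing project code).
ALLOWED_STRATEGIES = {"drop", "neutral", "ffill", "interpolate"}
ALLOWED_SAC_MODES = {"hold", "skip", "penalty"}


def validate_missing_cfg(cfg: dict) -> None:
    """Raise ValueError if a missing-bar setting has an unsupported value."""
    missing = cfg["data"]["missing"]
    if missing["strategy"] not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown data.missing.strategy: {missing['strategy']!r}")
    if int(missing["max_gap"]) < 0:
        raise ValueError("data.missing.max_gap must be >= 0")
    mode = cfg["sac"]["imputed_handling"]
    if mode not in ALLOWED_SAC_MODES:
        raise ValueError(f"unknown sac.imputed_handling: {mode!r}")
    if mode == "penalty" and float(cfg["sac"]["action_penalty"]) < 0:
        raise ValueError("sac.action_penalty must be >= 0")


# Mirrors the YAML defaults shown above.
example_cfg = {
    "data": {"bar_frequency": "1T",
             "missing": {"strategy": "neutral", "max_gap": 5,
                         "interpolate": {"method": "linear", "limit": 10}}},
    "gru": {"drop_imputed_sequences": True},
    "sac": {"imputed_handling": "hold", "action_penalty": 0.05},
}
validate_missing_cfg(example_cfg)  # passes silently for the defaults
```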
---
### 2 | Detect & fill missing bars
**File:** `src/data_loader.py`
1. **Import** at top:
```python
import pandas as pd
from .io_manager import IOManager
```
2. **Implement** `find_missing_bars(df, freq)` and `_consecutive_gaps` helpers.
3. **Implement** `report_missing(missing, cfg, io, logger)` as described.
4. **Implement** `fill_missing_bars(df, cfg, io, logger)`:
   - Detect missing timestamps.
   - Call `report_missing`.
   - Reindex to the full `date_range`.
   - Apply `strategy`:
     - `drop`: return original df.
     - `neutral`: ffill close, set open=high=low=close, volume=0.
     - `ffill`: `df_full.ffill().bfill()`.
     - `interpolate`: use `df_full.interpolate(...)`.
   - **After filling**, add column:
     ```python
     df['bar_imputed'] = df.index.isin(missing)
     ```
   - **Error** if longest gap > `cfg.data.missing.max_gap`.
5. **Integrate** in `TradingPipeline.load_and_preprocess_data` **before** feature engineering:
```python
df = fill_missing_bars(df, self.cfg, io, logger)
```
---
### 3 | Sequence creation respects imputed bars
**File:** `src/trading_pipeline.py`
1. In `create_sequences`, after building `X_seq` and `y_seq`, **build** `mask_seq` of shape `(n, lookback)` from `df['bar_imputed']`.
2. **Conditionally drop** sequences:
```python
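orig_n = X_seq.shape[0]  # sequence count before filtering, used in the log line
if self.cfg.gru.drop_imputed_sequences:
    valid = ~mask_seq.any(axis=1)  # keep windows with no imputed bar
    X_seq = X_seq[valid]
    y_seq = y_seq[valid]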
```
3. **Log**:
```python
logger.info(f"Generated {orig_n} sequences, dropped {orig_n - X_seq.shape[0]} with imputed bars")
```
4. **Include** `bar_imputed` as a feature column in `minimal_whitelist`.
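
One way to build `mask_seq` is with NumPy's sliding-window view. This is a minimal sketch, assuming one window per start position over the `bar_imputed` column; `build_mask_seq` is a hypothetical helper, and the result may need trimming so its first dimension matches how `create_sequences` aligns `X_seq` with its targets.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def build_mask_seq(bar_imputed, lookback: int) -> np.ndarray:
    """Return a (n_windows, lookback) boolean array of imputed flags per window."""
    flags = np.asarray(bar_imputed, dtype=bool)
    return sliding_window_view(flags, lookback)


# Six bars, the third one imputed, lookback of 3 -> four windows.
flags = [False, False, True, False, False, False]
mask_seq = build_mask_seq(flags, lookback=3)
valid = ~mask_seq.any(axis=1)  # True only for windows with no imputed bar
```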
---
### 4 | GRU model input channel
**File:** `src/model_gru_v3.py` (or `model_gru.py` if v3 is not in use)
1. **Update input shape**: increase `n_features` by 1 to include `bar_imputed`.
2. **No further architectural change**; the model now sees the imputed-flag channel.
---
### 5 | SAC environment handles imputed bars
**File:** `src/trading_env.py`
1. **Read** `bar_imputed` into `self.bar_imputed` aligned with your sequences.
2. **In `step(action)`**, at the top:
```python
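imputed = self.bar_imputed[self.current_step]
if imputed:
    mode = self.cfg.sac.imputed_handling
    if mode == "skip":
        # Advance past the imputed bar with zero reward.
        # next_state must be the observation at the new current_step.
        self.current_step += 1
        return next_state, 0.0, False, {}
    if mode == "hold":
        # Keep the current position, then fall through to the normal step.
        action = self.position
    if mode == "penalty":
        # Quadratic penalty on the position change requested on an imputed bar.
        reward = -self.cfg.sac.action_penalty * (action - self.position) ** 2
        self._update_position(action)
        self.current_step += 1
        return next_state, reward, False, {}
# existing normal step follows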
```
3. **Ensure** imputed transitions are added to replay buffer only when `mode` ≠ `skip`.
4. **Log**:
```python
logger.debug(f"SAC on imputed bar at step {self.current_step}: handling={mode}")
```
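The three modes can be exercised on a toy environment before touching `TradingEnv`. The sketch below is illustrative only: `ToyEnv`, its reward (position times the next price change), and the `info["store"]` flag that tells the training loop whether to push a transition to the replay buffer are all assumptions, not the project's actual API.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in for TradingEnv with a one-dimensional position."""
    prices: list
    bar_imputed: list
    imputed_handling: str = "hold"  # hold | skip | penalty
    action_penalty: float = 0.05
    position: float = 0.0
    current_step: int = 0

    def _obs(self):
        return (self.prices[self.current_step], self.position)

    def _done(self):
        return self.current_step >= len(self.prices) - 1

    def step(self, action: float):
        t = self.current_step
        # info["store"] is False only for skipped transitions.
        info = {"imputed": bool(self.bar_imputed[t]), "store": True}
        if info["imputed"]:
            mode = self.imputed_handling
            if mode == "skip":
                info["store"] = False
                self.current_step += 1
                return self._obs(), 0.0, self._done(), info
            if mode == "hold":
                action = self.position  # ignore the agent's action
            elif mode == "penalty":
                reward = -self.action_penalty * (action - self.position) ** 2
                self.position = action
                self.current_step += 1
                return self._obs(), reward, self._done(), info
        # Normal step (also reached in hold mode with the overridden action).
        self.position = action
        reward = self.position * (self.prices[t + 1] - self.prices[t])
        self.current_step += 1
        return self._obs(), reward, self._done(), info


# The second bar is imputed; the agent goes long, then tries to flip short on it.
prices = [100.0, 101.0, 101.0, 103.0]
flags = [False, True, False, False]
env = ToyEnv(prices, flags, imputed_handling="hold")
env.step(1.0)
obs, reward, done, info = env.step(-1.0)  # position stays at 1.0 in hold mode
```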
---
### 6 | Logging & artefacts
1. **Data load** warning:
```
WARNING Detected {total} missing bars, longest gap {longest}; applied strategy={strategy}
```
2. **Sequence creation** info:
```
INFO Generated {orig_n} sequences, dropped {dropped} with imputed bars
```
3. **SAC training** debug:
```
DEBUG SAC on imputed bar at step {step}: handling={mode}
```
4. **Report** saved under `results/<run_id>/`:
   - `missing_bars_summary.json`
   - `imputed_sequence_summary.json` with counts.
   - `sac_imputed_transitions.csv` (optional detailed log).
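
Writing the first artefact might look like the sketch below. The field names (`total_missing`, `longest_gap`, `strategy`) and the helper `write_missing_summary` are assumptions chosen to mirror the warning message above; the real report would go through the project's `IOManager` instead of writing files directly.

```python
import json
from pathlib import Path


def write_missing_summary(run_dir, total: int, longest: int, strategy: str) -> Path:
    """Hypothetical helper: dump the missing-bar summary as JSON under run_dir."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "missing_bars_summary.json"
    payload = {
        "total_missing": total,   # number of missing bars detected
        "longest_gap": longest,   # longest run of consecutive missing bars
        "strategy": strategy,     # fill strategy that was applied
    }
    path.write_text(json.dumps(payload, indent=2))
    return path
```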
---
### 7 | Unit tests
**Files:** `tests/test_data_loader.py`, `tests/test_sequence_creation.py`, `tests/test_trading_env.py`
1. **`test_data_loader.py`**:
   - Synthetic gappy DataFrame → assert `bar_imputed` flags and each strategy's output.
2. **`test_sequence_creation.py`**:
   - Build toy DataFrame with `bar_imputed`; assert sequences dropped when `drop_imputed_sequences=True`.
3. **`test_trading_env.py`**:
   - Create `TradingEnv` with known imputed steps; for each `imputed_handling` mode assert `step()` behavior:
     - `skip` moves without adding to buffer;
     - `hold` returns same position;
     - `penalty` returns negative reward equal to penalty formula.
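
A shared fixture for the synthetic gappy DataFrame keeps these tests short. The sketch below assumes pytest-style test functions and OHLCV column names; `make_gappy_df` is a hypothetical helper, and the example test only checks the fixture itself and does not yet exercise `fill_missing_bars`.

```python
import pandas as pd


def make_gappy_df(n_bars: int, drop_positions: list, freq: str = "1min"):
    """Hypothetical fixture: regular OHLCV bars with the given positions removed.

    Returns the gappy frame and the timestamps that were dropped.
    """
    idx = pd.date_range("2025-01-01", periods=n_bars, freq=freq)
    close = pd.Series(range(100, 100 + n_bars), index=idx, dtype="float64")
    df = pd.DataFrame({"open": close, "high": close + 1.0,
                       "low": close - 1.0, "close": close, "volume": 10.0})
    dropped = idx[drop_positions]
    return df.drop(dropped), dropped


def test_fixture_has_expected_gaps():
    df, dropped = make_gappy_df(10, [3, 4, 7])
    assert len(df) == 7
    assert not df.index.isin(dropped).any()
    # First and last bars are kept, so the full range is still recoverable.
    assert df.index.min() == pd.Timestamp("2025-01-01 00:00")
    assert df.index.max() == pd.Timestamp("2025-01-01 00:09")
```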
---
### 8 | Documentation
1. **README.md** → add **Data Quality** section describing missing-bar handling, config keys, and recommended defaults.
2. **docs/v3_changelog.md** → note new missing-bar feature and cfg flags.
---
**Rollout Plan:**
- **PR 1:** Config + data_loader missing-bar detection & fill + tests.
- **PR 2:** Sequence creation & GRU channel update + tests.
- **PR 3:** SAC env updates + tests.
- **PR 4:** Logging/artefacts + docs.
Merge each after CI passes.