gru_sac_predictor/prompts/calibrating_edge_baseline.txt

### Streamlined Calibration for Baseline LR Gates

If you’re mainly struggling with mis‑calibrated confidence on your edge‑filtered checks, here’s a **minimal** integration to fix it without heavy lifting.

---

#### 1. Add a toggle and holdout in `config.yaml`
```yaml
baseline:
  calibration_enabled: true      # turn on/off easily
  calibration_method: "isotonic"  # or "sigmoid"; both handle multiclass
  calibration_holdout: 0.2       # 20% of the training portion (after the val split)
  random_state: 42
```
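
Because the toggle and holdout drive the split in the next step, it can help to sanity-check them once at startup. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict; `validate_baseline_config` and `ALLOWED_METHODS` are hypothetical names, not part of the existing codebase.

```python
# Method names CalibratedClassifierCV has long accepted; newer releases may add more.
ALLOWED_METHODS = {"isotonic", "sigmoid"}


def validate_baseline_config(config):
    """Return the `baseline` section after basic sanity checks (hypothetical helper)."""
    baseline = config.get("baseline", {})
    enabled = baseline.get("calibration_enabled", False)
    if not isinstance(enabled, bool):
        raise ValueError("baseline.calibration_enabled must be true or false")
    if not enabled:
        return baseline  # nothing else to check when calibration is off

    method = baseline.get("calibration_method")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported calibration_method: {method!r}")

    holdout = baseline.get("calibration_holdout")
    is_number = isinstance(holdout, (int, float)) and not isinstance(holdout, bool)
    if not is_number or not 0.0 < holdout < 1.0:
        raise ValueError("baseline.calibration_holdout must be a fraction in (0, 1)")
    return baseline
```

Failing fast here avoids a confusing error from `train_test_split` later if the holdout is mistyped as, say, `20` instead of `0.2`.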

---

#### 2. Quick split for calibration
In your `run_baseline_checks` (before any CI gates):
```python
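from sklearn.model_selection import train_test_split

# original train/val split (X_val / y_val stay untouched for the gates)
X_main, X_val, y_main, y_val = train_test_split(
    X_pruned, y_labels, test_size=0.2, random_state=seed
)

if self.config['baseline']['calibration_enabled']:
    # carve the calibration set out of the training portion only,
    # so the calibrator never sees validation data
    X_train, X_cal, y_train, y_cal = train_test_split(
        X_main, y_main,
        test_size=self.config['baseline']['calibration_holdout'],
        random_state=self.config['baseline']['random_state']
    )
else:
    # calibration disabled: train on the full training portion
    X_train, y_train = X_main, y_main
    X_cal, y_cal = None, None
```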
```
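
Note that the two splits compound: the calibration holdout is a fraction of the training portion, not of the full dataset. The arithmetic can be sketched as follows; `split_fractions` is a hypothetical helper written only to illustrate the numbers above.

```python
def split_fractions(val_size, calibration_holdout, calibration_enabled=True):
    """Fraction of the full dataset that lands in each split (illustrative only)."""
    main = 1.0 - val_size  # what remains after the train/val split
    cal = main * calibration_holdout if calibration_enabled else 0.0
    return {"train": main - cal, "cal": cal, "val": val_size}


fractions = split_fractions(val_size=0.2, calibration_holdout=0.2)
print({name: round(frac, 2) for name, frac in fractions.items()})
# {'train': 0.64, 'cal': 0.16, 'val': 0.2}
```

With the defaults shown in this guide, the raw LR trains on roughly 64% of the data and the calibrator is fit on roughly 16%, which is worth keeping in mind if the dataset is small.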

---

#### 3. Fit an isotonic calibrator only when needed
```python
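from sklearn.calibration import CalibratedClassifierCV

# train raw LR on the training portion only
lr = LogisticRegression(...).fit(X_train, y_train)

if X_cal is not None:
    # cv='prefit' calibrates the already-fitted model on the held-out set.
    # Recent scikit-learn releases deprecate cv='prefit' in favor of wrapping
    # the model in sklearn.frozen.FrozenEstimator; check your installed version.
    calibrator = CalibratedClassifierCV(
        lr, method=self.config['baseline']['calibration_method'], cv='prefit'
    )
    calibrator.fit(X_cal, y_cal)
else:
    # calibration disabled: fall back to the raw model
    calibrator = lr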
```

---

#### 4. Use calibrated probabilities in your gates
Replace all `lr.predict_proba(X)` calls with:
```python
probs = calibrator.predict_proba(X)  # shape (n_samples, n_classes)
# binary:  edge = np.abs(probs[:, 1] - 0.5)
# ternary: edge = probs.max(axis=1) - 1/3
```
Then run your existing CI lower‑bound checks as usual.
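
For reference, here is a rough sketch of how calibrated probabilities might feed an edge-filtered gate. It is not the project's actual gate: the Wilson score interval, the `threshold` parameter, and the names `edge_from_probs`, `wilson_lower_bound`, and `edge_gate` are all assumptions chosen for illustration, and it uses plain lists instead of NumPy arrays to stay self-contained.

```python
import math


def edge_from_probs(row):
    """Top-class probability minus the uniform baseline 1/K.

    For two classes this equals |p1 - 0.5|; for three it is max(p) - 1/3.
    """
    return max(row) - 1.0 / len(row)


def wilson_lower_bound(hits, n, z=1.96):
    """Lower end of the Wilson score interval for a hit rate (assumed CI method)."""
    if n == 0:
        return 0.0
    p = hits / n
    denom = 1.0 + z * z / n
    centre = p + z * z / (2.0 * n)
    margin = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return (centre - margin) / denom


def edge_gate(probs, labels, threshold):
    """Keep rows whose edge clears `threshold`; return (hits, kept, CI lower bound)."""
    hits = kept = 0
    for row, label in zip(probs, labels):
        if edge_from_probs(row) < threshold:
            continue  # not confident enough to count toward the gate
        kept += 1
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax class index
        hits += int(predicted == label)
    return hits, kept, wilson_lower_bound(hits, kept)
```

Usage would look like `hits, kept, lb = edge_gate(probs.tolist(), y_val, threshold=0.1)`, assuming integer class labels that match the column order of `predict_proba`; the gate then passes if `lb` exceeds whatever bar your existing checks use.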

---

#### 5. (Optional) Skip persistence
For a quick fix, you can skip saving and loading the calibrator: build it and use it in the same process.

---

With these five steps, you’ll correct your edge‑confidence estimates with minimal code and configuration. If your gates then pass, proceed to GRU training; if they still fail, the issue is likely weak features rather than calibration.