Below is a consolidated set of revision instructions, plus the key code snippets you'll need, to switch your GRU/SAC pipeline to supervise **log-returns** (so your "ret"/`mu` and `gauss_params` heads both target the log-return), and to wire everything end-to-end so you get a clean 55%+ edge before SAC.

---

## 🛠 Revision Playbook

1. **Compute forward log-returns in your data pipeline**

   In `TradingPipeline.define_labels_and_align` (or wherever you set up `df`):

   ```python
   # Replace any raw-return calculation with the forward log-return
   N = config['gru']['prediction_horizon']
   df['fwd_log_ret'] = np.log(df['close'].shift(-N) / df['close'])
   df['direction_label'] = (df['fwd_log_ret'] > 0).astype(int)

   # If you use a ternary label, derive a per-row flat band from rolling volatility
   flat_thr = config['gru']['flat_sigma_multiplier'] * df['fwd_log_ret'].rolling(…).std()
   # flat_thr varies per row, so pd.cut (which needs scalar bin edges) won't work;
   # np.select handles row-wise thresholds: 0 = down, 1 = flat, 2 = up
   df['dir3_label'] = np.select(
       [df['fwd_log_ret'] < -flat_thr, df['fwd_log_ret'] > flat_thr],
       [0, 2], default=1
   )
   ```

2. **Align your targets**

   Drop the last `N` rows so `fwd_log_ret` has no NaNs:

   ```python
   df = df.iloc[:-N]
   ```

3. **Pass the log-return into both heads**

   When you build your sequences and target dicts:

   ```python
   y_ret_seq = ...   # shape (n_seq, 1), built from fwd_log_ret
   y_dir3_seq = ...  # one-hot encoding of dir3_label, shape (n_seq, 3)
   y_train = {'mu': y_ret_seq,            # Huber head
              'gauss_params': y_ret_seq,  # NLL head uses the same target
              'dir3': y_dir3_seq}         # classification head
   ```

4. **Update your GRU builder to match**

   Make sure your v3 model has exactly three outputs:

   ```python
   model = Model(inputs, outputs=[mu_output, gauss_params_output, dir3_output])
   model.compile(
       optimizer=Adam(lr),
       loss={
           'mu': Huber(delta),
           'gauss_params': gaussian_nll,
           'dir3': categorical_focal_loss
       },
       loss_weights={'mu': 1.0, 'gauss_params': 0.2, 'dir3': 0.4},
       metrics={'dir3': 'accuracy'}
   )
   ```

5. **Train with the new targets dict**

   In `GRUModelHandler.train(...)`, replace your fit call with:

   ```python
   history = model.fit(
       X_train_seq, y_train_dict,
       validation_data=(X_val_seq, y_val_dict),
       …  # callbacks etc.; the full call is shown in snippet 3 below
   )
   ```

6. **Calibrate on the `dir3` softmax outputs**

   Your calibrator (temperature or vector scaling) must now consume the 3-class logits or probabilities (a minimal temperature-scaling sketch follows this list):

   ```python
   raw_logits = handler.predict_logits(X_val_seq)
   calibrator.fit(raw_logits, y_val_dir3)
   ```

7. **Feed SAC the log-return μ and σ**

   In your `TradingEnv`, when you construct the state:

   ```python
   mu, log_sigma, probs = gru_handler.predict(X_step)
   sigma = np.exp(log_sigma)
   edge = 2 * calibrated_p_up - 1   # if binary; with dir3 probs, p_up - p_down is one option
   z_score = np.abs(mu) / sigma     # signal strength in volatility units
   state = [mu, sigma, edge, z_score, prev_position]
   ```

8. **Re-run the baseline check on log-returns**

   Your logistic baseline in `run_baseline_checks` should now be trained on `X_train_pruned` vs. `y_dir3_label` (or the binary label), ensuring the CI lower bound is ≥ 0.52 before you even build your GRU.

9. **Validate end-to-end edge**

   After these changes, you should see:

   - baseline logistic CI lower bound ≥ 0.52,
   - GRU "edge" hit-rate ≥ 0.55 on validation, and
   - a SAC backtest that clears meaningful Sharpe/win-rate gates.
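Step 6 references a calibrator without showing one, so here is a minimal temperature-scaling sketch. The class name `TemperatureCalibrator` is hypothetical, and it assumes `raw_logits` has shape `(n, 3)` with `y_val_dir3` one-hot; swap in your own calibrator API as needed:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax

class TemperatureCalibrator:
    """Hypothetical sketch: fit one scalar T > 0 so softmax(logits / T) is calibrated."""

    def __init__(self):
        self.T = 1.0

    def fit(self, logits, y_onehot):
        # Minimize the validation NLL of the true class as a function of T
        def nll(T):
            logp = log_softmax(logits / T, axis=-1)
            return -np.mean(np.sum(y_onehot * logp, axis=-1))

        self.T = minimize_scalar(nll, bounds=(0.05, 10.0), method='bounded').x
        return self

    def predict_proba(self, logits):
        return softmax(logits / self.T, axis=-1)
```

With this in place, step 6 becomes `calibrator = TemperatureCalibrator().fit(raw_logits, y_val_dir3)`, and the calibrated `p_up` used in step 7 would be `calibrator.predict_proba(logits)[:, 2]` (assuming class 2 = up, per the label encoding in step 1).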
---

## 🔧 Example Code Snippets

### 1) Gaussian NLL stays the same:

```python
import tensorflow as tf
from tensorflow.keras import saving  # TF >= 2.12; on Keras 3 use `from keras import saving`

@saving.register_keras_serializable(package='GRU')
def gaussian_nll(y_true, y_pred):
    # y_pred packs [mu, log_sigma]; NLL of y_true under N(mu, sigma^2), constants dropped
    mu, log_sigma = tf.split(y_pred, 2, axis=-1)
    y_true = tf.reshape(y_true, tf.shape(mu))
    inv_var = tf.exp(-2.0 * log_sigma)  # 1 / sigma^2
    return tf.reduce_mean(0.5 * inv_var * tf.square(y_true - mu) + log_sigma)
```

### 2) Build & compile v3 model:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.losses import Huber
from tensorflow.keras.optimizers import Adam

def build_gru_model_v3(...):
    inp = layers.Input((lookback, n_features))
    x = layers.GRU(gru_units, return_sequences=True)(inp)
    x = layers.LayerNormalization()(x)
    if attention_units > 0:
        x = layers.MultiHeadAttention(...)(x, x)
    x = layers.GlobalAveragePooling1D()(x)

    gauss = layers.Dense(2, name='gauss_params')(x)            # [mu, log_sigma]
    mu = layers.Lambda(lambda z: z[:, 0:1], name='mu')(gauss)  # mu slice feeds the Huber head
    dir3_logits = layers.Dense(3, name='dir3_logits')(x)
    dir3 = layers.Activation('softmax', name='dir3')(dir3_logits)

    model = Model(inp, [mu, gauss, dir3])
    model.compile(
        optimizer=Adam(lr),
        loss={'mu': Huber(delta),
              'gauss_params': gaussian_nll,
              'dir3': categorical_focal_loss},
        loss_weights={'mu': 1.0, 'gauss_params': 0.2, 'dir3': 0.4},
        metrics={'dir3': 'accuracy'}
    )
    return model
```

### 3) Fitting in your handler:

```python
history = self.model.fit(
    X_train_seq, y_train_dict,
    validation_data=(X_val_seq, y_val_dict),
    epochs=max_epochs,
    batch_size=batch_size,
    callbacks=[early_stop, csv_logger, TqdmCallback()]
)
```

---

### Why these changes boost edge

- **Log-returns** stabilize variance and symmetrize up/down moves (a price that rises 10% and falls back to its start needs a -9.1% raw return, but exactly -log(1.1) in log space).
- **NLL + Huber on the log-return** gives the model both distributional uncertainty (σ) and a robust point-error measure.
- A **proper softmax head** over three classes (down/flat/up) cleans up the classification target.
- **Calibration + an optimized edge threshold** ensure your SAC agent only sees high-confidence signals (edge ≥ thr; a threshold-sweep sketch is in the P.S. below).

Together, these changes should get your baseline GRU above 55% "edge" on validation, so the SAC agent can learn a meaningful sizing policy rather than fight noise.

Let me know if you need any further refinements!
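P.S. Since the "optimized edge threshold" above has no snippet, here's a minimal sweep sketch. The names `calibrated_p_up` and `fwd_log_ret_val` are assumptions (calibrated up-probabilities and the forward log-returns aligned to your validation sequences); the 0.55 gate matches step 9:

```python
import numpy as np

def optimize_edge_threshold(calibrated_p_up, fwd_log_ret_val, min_hit_rate=0.55):
    """Sketch: keep the loosest threshold whose directional hit-rate clears the gate."""
    edge = 2 * calibrated_p_up - 1              # signed edge in [-1, 1]
    best = None                                 # (threshold, hit_rate, coverage)
    for thr in np.arange(0.05, 0.95, 0.05):
        mask = np.abs(edge) >= thr              # signals the SAC agent would act on
        if mask.sum() < 30:                     # too few samples for a stable estimate
            continue
        hit_rate = (np.sign(edge[mask]) == np.sign(fwd_log_ret_val[mask])).mean()
        if hit_rate >= min_hit_rate and (best is None or mask.mean() > best[2]):
            best = (thr, hit_rate, mask.mean())
    return best  # None means no threshold clears the gate
```

Maximizing coverage subject to a hit-rate gate is just one choice; maximizing mean `edge * fwd_log_ret_val` over the masked samples would target expected value instead.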