gru_sac_predictor/prompts/pipeline_refactor.txt

Refactoring Plan for trading_pipeline.py
========================================

Goal: Break down the large `TradingPipeline` class into smaller, dedicated modules to improve maintainability and readability, while minimizing disruption to the existing end-to-end (E2E) system.

Strategy:

1.  **Keep `TradingPipeline` as the Orchestrator:**
    *   The main `TradingPipeline` class in `trading_pipeline.py` remains.
    *   Responsibilities:
        *   Load configuration (`__init__`).
        *   Initialize core components (`DataLoader`, `FeatureEngineer`, etc.).
        *   Manage overall state (instance variables like `df_raw`, `gru_model`, etc.).
        *   Run the main execution flow (`execute`), including the walk-forward loop.
        *   Delegate stage-specific work to functions in the new stage modules.
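A minimal sketch of the orchestrator shape described above, assuming an expanding-window walk-forward split. Only `TradingPipeline`, `__init__`, `execute` and `df_raw` come from the plan; `load_data`, `walk_forward_bounds`, `run_fold` and the `n_rows`, `n_folds` and `test_size` config keys are hypothetical stand-ins for the real stage functions and configuration.

```python
from typing import Any, Dict, Iterator, List, Tuple


# Hypothetical stand-ins for functions that would live in src/pipeline_stages/.
def load_data(config: Dict[str, Any]) -> List[float]:
    """Pretend data loader: returns a simple increasing series."""
    return [float(i) for i in range(config["n_rows"])]


def walk_forward_bounds(n_rows: int, n_folds: int, test_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (test_start, test_end) for expanding-window walk-forward folds."""
    first_start = n_rows - n_folds * test_size
    if first_start <= 0:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        start = first_start + k * test_size
        yield start, start + test_size


def run_fold(fold: int, train: List[float], test: List[float]) -> Dict[str, Any]:
    """Placeholder for per-fold training and evaluation."""
    return {"fold": fold, "n_train": len(train), "n_test": len(test)}


class TradingPipeline:
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config                    # loaded configuration
        self.df_raw: List[float] = []           # cross-stage state kept on the instance
        self.fold_results: List[Dict[str, Any]] = []

    def execute(self) -> List[Dict[str, Any]]:
        self.df_raw = load_data(self.config)    # delegate loading to a stage function
        bounds = walk_forward_bounds(
            len(self.df_raw), self.config["n_folds"], self.config["test_size"]
        )
        for fold, (start, end) in enumerate(bounds):
            # Train on everything before the test window (expanding window).
            train, test = self.df_raw[:start], self.df_raw[start:end]
            self.fold_results.append(run_fold(fold, train, test))
        return self.fold_results
```

With `{"n_rows": 10, "n_folds": 2, "test_size": 2}`, `execute()` produces two folds whose training sets hold 6 and 8 rows. The orchestrator owns the loop and the state; everything inside a fold is a function call.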

2.  **Create Stage-Specific Modules:**
    *   Create a new package, `src/pipeline_stages` (a subdirectory with an `__init__.py`), so the stage modules can be imported.
    *   Move stage-specific logic from `TradingPipeline` methods into functions within these modules.
    *   Proposed Modules:
        *   `src/pipeline_stages/data_processing.py`: Loading, feature engineering, label generation, alignment, splitting.
        *   `src/pipeline_stages/feature_processing.py`: Scaling, feature selection, pruning.
        *   `src/pipeline_stages/sequence_creation.py`: GRU sequence creation.
        *   `src/pipeline_stages/modelling.py`: GRU/SAC training/loading, calibration, SAC aggregation.
        *   `src/pipeline_stages/evaluation.py`: Baseline checks, backtesting, results saving, metric aggregation, final decision.
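To make the split concrete, here is one possible shape for a stage function in `sequence_creation.py`. It assumes features and labels arrive already aligned row-for-row (alignment happens in `data_processing.py`) and that each window takes the label of its last row. The name `create_sequences` and the plain-list types are illustrative only; an array-based version would use the same indexing.

```python
from typing import List, Sequence, Tuple


def create_sequences(
    features: Sequence[Sequence[float]], labels: Sequence[int], seq_len: int
) -> Tuple[List[List[List[float]]], List[int]]:
    """Build sliding windows of `seq_len` consecutive rows for a GRU (hypothetical helper)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must be aligned row-for-row")
    if seq_len < 1:
        raise ValueError("seq_len must be >= 1")
    X: List[List[List[float]]] = []
    y: List[int] = []
    for end in range(seq_len, len(features) + 1):
        # Window covers rows [end - seq_len, end); copy rows so callers can't alias inputs.
        X.append([list(row) for row in features[end - seq_len:end]])
        # Assumption: the label of the window's last row is the training target.
        y.append(labels[end - 1])
    return X, y
```

The function takes everything it needs as arguments and returns its outputs, which is the pattern the other stage modules would follow.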

3.  **Refactor `TradingPipeline` Methods:**
    *   Simplify existing methods in `TradingPipeline` (e.g., `load_and_preprocess_data`, `engineer_features`).
    *   These methods will now primarily:
        *   Import the corresponding function from the relevant `src/pipeline_stages` module.
        *   Call the imported function, passing necessary data and components (`config`, `data_loader`, `io`, state variables).
        *   Receive results and update the `TradingPipeline` instance's state.
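A sketch of what a simplified method might look like after the move. The stage function here is a hypothetical stand-in that computes lagged simple returns with an assumed `return_lag` config key; in the real code it would be imported from `src/pipeline_stages/data_processing.py`, and the pipeline would hold its actual data structures rather than a list of prices.

```python
from typing import Any, Dict, List, Optional


# Stand-in for a function in src/pipeline_stages/data_processing.py (hypothetical logic).
def engineer_features(config: Dict[str, Any], prices: List[float]) -> List[float]:
    """Pure stage function: compute lagged simple returns; touches no pipeline state."""
    lag = config["return_lag"]
    return [prices[i] / prices[i - lag] - 1.0 for i in range(lag, len(prices))]


class TradingPipeline:
    def __init__(self, config: Dict[str, Any], prices: List[float]) -> None:
        self.config = config
        self.prices = prices
        self.features: Optional[List[float]] = None

    def engineer_features(self) -> None:
        # Thin wrapper: pass config and state in, store the result back on the instance.
        # The bare name resolves to the module-level stage function, not this method
        # (in trading_pipeline.py this would read data_processing.engineer_features).
        self.features = engineer_features(self.config, self.prices)
```

The method body shrinks to a single call plus a state update, so the logic being tested and the orchestration being read live in different places.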

4.  **Data Flow:**
    *   Pass data between stages explicitly through function arguments and return values; stage functions should not read or modify `TradingPipeline` attributes directly.
    *   The `TradingPipeline.execute` method orchestrates this flow.
    *   State required across multiple stages/folds remains as `TradingPipeline` instance attributes.
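One way to keep hand-offs explicit is for a stage to return a small immutable container instead of setting attributes. This sketch assumes a chronological train/validation/test split by fraction; `DataSplits`, `split_chronologically` and its parameters are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class DataSplits:
    """Immutable result of the splitting stage (hypothetical container)."""
    train: List[Any]
    val: List[Any]
    test: List[Any]


def split_chronologically(rows: Sequence[Any], val_frac: float, test_frac: float) -> DataSplits:
    """Split time-ordered rows without shuffling: oldest -> train, newest -> test."""
    if not (0 <= val_frac < 1 and 0 <= test_frac < 1 and val_frac + test_frac < 1):
        raise ValueError("fractions must be in [0, 1) and sum to less than 1")
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train_end = n - n_val - n_test
    return DataSplits(
        train=list(rows[:train_end]),
        val=list(rows[train_end:train_end + n_val]),
        test=list(rows[train_end + n_val:]),
    )
```

The orchestrator decides which fields to keep as instance attributes; the stage itself never sees `self`.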

5.  **Dependencies:**
    *   Pass `config`, `io`, and component instances (`DataLoader`, `FeatureEngineer`, etc.) as arguments to the stage functions that need them.
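Passing dependencies in as arguments means a stage accepts anything with the right methods. In this sketch the injected `io` object is only assumed to expose a `save_json(obj, name)` method; that method, the `run_name` config key, `save_fold_metrics` and the `InMemoryIO` fake are hypothetical and not the project's actual `io` interface.

```python
import json
from typing import Any, Dict, Protocol


class ResultWriter(Protocol):
    """Assumed interface of the injected io component (hypothetical)."""

    def save_json(self, obj: Dict[str, Any], name: str) -> None: ...


def save_fold_metrics(
    config: Dict[str, Any], io: ResultWriter, fold: int, metrics: Dict[str, float]
) -> str:
    """Stage function: builds a file name from config and writes via the injected io."""
    name = f"{config['run_name']}_fold{fold}_metrics.json"
    io.save_json(metrics, name)
    return name


class InMemoryIO:
    """Test double: records JSON strings instead of touching the filesystem."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def save_json(self, obj: Dict[str, Any], name: str) -> None:
        self.files[name] = json.dumps(obj, sort_keys=True)
```

Because the stage never constructs its own `io`, the E2E run can pass the real component while tests pass the in-memory fake.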

Benefits:

*   **Readability:** `trading_pipeline.py` becomes a clearer orchestrator.
*   **Maintainability:** Easier to isolate and modify specific stages.
*   **Testability:** Stage functions receive their inputs as arguments, so they can be unit tested in isolation without constructing a full `TradingPipeline`.
*   **Reduced Risk:** Moves existing logic rather than rewriting it, so the E2E system is less likely to break than with a full rewrite.
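As an example of the testability point, a stage function with no hidden state can be checked with plain pytest-style test functions. `passes_baseline` is a hypothetical evaluation helper that compares model accuracy with a majority-class baseline; the actual baseline checks planned for `evaluation.py` may use different metrics.

```python
from collections import Counter
from typing import Sequence


def passes_baseline(y_true: Sequence[int], y_pred: Sequence[int], margin: float = 0.0) -> bool:
    """Return True if accuracy beats the majority-class baseline by at least `margin`."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Accuracy of always predicting the most frequent true label.
    majority_acc = Counter(y_true).most_common(1)[0][1] / len(y_true)
    return accuracy >= majority_acc + margin


def test_majority_predictor_fails_with_margin() -> None:
    y_true = [1, 1, 1, 0]
    assert not passes_baseline(y_true, [1, 1, 1, 1], margin=0.01)


def test_perfect_predictions_pass() -> None:
    assert passes_baseline([1, 1, 1, 0], [1, 1, 1, 0], margin=0.1)


def test_length_mismatch_raises() -> None:
    try:
        passes_baseline([1, 0], [1])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

No config files, data loaders or trained models are needed; the tests build tiny inputs inline.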

Implementation Steps:

1.  Create the `src/pipeline_stages` directory and module files.
2.  Incrementally move logic for each stage into the corresponding module's functions.
3.  Update `TradingPipeline` methods to import and call these new functions.
4.  Adjust imports and function signatures as needed.
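5.  Proceed one stage at a time, verifying structure and data flow and confirming that the E2E run still completes with unchanged outputs before moving the next stage.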
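For the verification step, a small parity check can compare metrics from runs before and after each stage is moved. `compare_metrics` and its tolerance defaults are assumptions; it expects flat `{name: float}` dictionaries. Note that `math.isclose` treats NaN as unequal to NaN, so NaN metrics would be reported as differences.

```python
import math
from typing import Dict, List


def compare_metrics(
    before: Dict[str, float],
    after: Dict[str, float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[str]:
    """List differences between pre- and post-refactor metrics (empty list = parity)."""
    problems: List[str] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            problems.append(f"{key}: only in refactored run")
        elif key not in after:
            problems.append(f"{key}: missing from refactored run")
        elif not math.isclose(before[key], after[key], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{key}: {before[key]!r} -> {after[key]!r}")
    return problems
```

An empty list means the refactored run matched; anything else names the metrics to investigate before moving on to the next stage.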