Refactoring Plan for trading_pipeline.py
========================================

Goal: Break down the large `TradingPipeline` class into smaller, dedicated modules to improve maintainability and readability, while minimizing disruption to the existing E2E system.

Strategy:

1. **Keep `TradingPipeline` as the Orchestrator:**
    * The main `TradingPipeline` class in `trading_pipeline.py` remains.
    * Responsibilities:
        * Load configuration (`__init__`).
        * Initialize core components (`DataLoader`, `FeatureEngineer`, etc.).
        * Manage overall state (instance variables such as `df_raw`, `gru_model`, etc.).
        * Run the main execution flow (`execute`), including the walk-forward loop.
        * Call functions from the new stage-specific modules.

2. **Create Stage-Specific Modules:**
    * Create a new subdirectory: `src/pipeline_stages`.
    * Move stage-specific logic from `TradingPipeline` methods into functions within these modules.
    * Proposed Modules:
        * `src/pipeline_stages/data_processing.py`: Loading, feature engineering, label generation, alignment, splitting.
        * `src/pipeline_stages/feature_processing.py`: Scaling, feature selection, pruning.
        * `src/pipeline_stages/sequence_creation.py`: GRU sequence creation.
        * `src/pipeline_stages/modelling.py`: GRU/SAC training/loading, calibration, SAC aggregation.
        * `src/pipeline_stages/evaluation.py`: Baseline checks, backtesting, results saving, metric aggregation, final decision.

3. **Refactor `TradingPipeline` Methods:**
    * Simplify the existing methods in `TradingPipeline` (e.g., `load_and_preprocess_data`, `engineer_features`).
    * These methods will now primarily:
        * Import the corresponding function from the `pipeline_stages` module.
        * Call the imported function, passing the necessary data and components (`config`, `data_loader`, `io`, state variables).
        * Receive the results and update the `TradingPipeline` instance's state.

4. **Data Flow:**
    * Pass data explicitly between stages via function arguments and return values.
    * The `TradingPipeline.execute` method orchestrates this flow.
    * State required across multiple stages or folds remains in `TradingPipeline` instance attributes.

5. **Dependencies:**
    * Pass `config`, `io`, and component instances (`DataLoader`, `FeatureEngineer`, etc.) as arguments to the stage functions that need them.

Benefits:

* **Readability:** `trading_pipeline.py` becomes a clearer orchestrator.
* **Maintainability:** Specific stages are easier to isolate and modify.
* **Testability:** Stage functions should be easier to unit test, since they can be called with explicit inputs without constructing a full `TradingPipeline`.
* **Reduced Risk:** Moving existing logic, rather than rewriting it, minimizes the chance of E2E breakage.

Implementation Steps:

1. Create the `src/pipeline_stages` directory and module files.
2. Incrementally move the logic for each stage into the corresponding module's functions.
3. Update `TradingPipeline` methods to import and call these new functions.
4. Adjust imports and function signatures as needed.
5. Proceed stage by stage, verifying structure and data flow after each move.
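
The thin-wrapper pattern from Strategy point 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `build_features`, the `add_features` method, the `feature_window` config key, and the `df_features` attribute are hypothetical names, the `FeatureEngineer` here is a toy stand-in, and plain lists of dicts replace DataFrames so the sketch runs without pandas. The stage function receives everything as arguments and returns its result; the `TradingPipeline` method only passes state in and stores what comes back.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Plain lists of dicts stand in for pandas DataFrames so the sketch runs anywhere.
Rows = List[Dict[str, float]]


class FeatureEngineer:
    """Toy stand-in for the real component: adds a simple moving average of 'close'."""

    def add_features(self, rows: Rows, window: int) -> Rows:  # hypothetical method
        out = []
        for i, row in enumerate(rows):
            recent = [r["close"] for r in rows[max(0, i - window + 1): i + 1]]
            out.append({**row, "sma": sum(recent) / len(recent)})
        return out


# --- Would live in src/pipeline_stages/data_processing.py ---
def build_features(df_raw: Rows, config: Dict[str, Any], feature_engineer: FeatureEngineer) -> Rows:
    """Stage function (hypothetical name): inputs arrive as arguments, the result is returned.

    It reads and writes no pipeline state, so it can be tested in isolation.
    """
    window = config.get("feature_window", 3)  # hypothetical config key
    return feature_engineer.add_features(df_raw, window)


# --- Stays in trading_pipeline.py ---
@dataclass
class TradingPipeline:
    config: Dict[str, Any]
    feature_engineer: FeatureEngineer = field(default_factory=FeatureEngineer)
    df_raw: Optional[Rows] = None
    df_features: Optional[Rows] = None  # hypothetical attribute name

    def engineer_features(self) -> None:
        # In the real module this import would sit at the top of trading_pipeline.py:
        # from src.pipeline_stages.data_processing import build_features
        if self.df_raw is None:
            raise RuntimeError("load_and_preprocess_data must run before engineer_features")
        # Thin wrapper: pass state and components in, store the returned result as state.
        self.df_features = build_features(self.df_raw, self.config, self.feature_engineer)
```

Because `build_features` touches no pipeline state, a unit test can call it directly, e.g. `build_features([{"close": 2.0}, {"close": 4.0}], {}, FeatureEngineer())`, without constructing a `TradingPipeline`.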
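
The orchestrator side of Strategy points 1 and 4 can be sketched the same way: `execute` runs a walk-forward loop, passes each fold's data explicitly from one stage function to the next, and keeps only cross-fold state on the instance. This is a hedged illustration, not the project's actual loop: `split_fold`, `train_model`, `evaluate`, the `train_size`/`test_size` config keys, and `fold_metrics` are hypothetical, and the "model" is a placeholder that predicts the training-window mean rather than a GRU or SAC model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Series = List[float]


# --- Hypothetical stage functions; each would live in src/pipeline_stages/ ---
def split_fold(data: Series, start: int, train_size: int, test_size: int) -> Tuple[Series, Series]:
    """data_processing: slice one walk-forward fold into train and test windows."""
    train_end = start + train_size
    return data[start:train_end], data[train_end:train_end + test_size]


def train_model(train: Series) -> float:
    """modelling: placeholder 'model' that just learns the training-window mean."""
    return sum(train) / len(train)


def evaluate(model: float, test: Series) -> float:
    """evaluation: mean absolute error of the constant prediction on the test window."""
    return sum(abs(x - model) for x in test) / len(test)


@dataclass
class TradingPipeline:
    config: Dict[str, Any]
    data: Series
    fold_metrics: List[float] = field(default_factory=list)  # state kept across folds

    def execute(self) -> float:
        train_size = self.config["train_size"]  # hypothetical config keys
        test_size = self.config["test_size"]
        self.fold_metrics.clear()  # start each run with fresh cross-fold state
        start = 0
        # Walk-forward loop: each fold's data is passed explicitly from stage to stage.
        while start + train_size + test_size <= len(self.data):
            train, test = split_fold(self.data, start, train_size, test_size)
            model = train_model(train)
            self.fold_metrics.append(evaluate(model, test))
            start += test_size  # roll the window forward by one test period
        if not self.fold_metrics:
            raise ValueError("not enough data for a single walk-forward fold")
        # Cross-fold aggregation: the mean of the per-fold metrics.
        return sum(self.fold_metrics) / len(self.fold_metrics)
```

Here only `fold_metrics`, which must survive across folds, lives on `TradingPipeline`; the per-fold train and test slices and the fitted placeholder model stay local to the loop body.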