gru_sac_predictor/prompts/pipeline_refactor.txt
2025-04-20 17:52:49 +00:00

Refactoring Plan for trading_pipeline.py
=======================================
Goal: Break the large `TradingPipeline` class into smaller, dedicated modules that are easier to maintain and read, while minimizing disruption to the existing E2E system.

Strategy:
1. **Keep `TradingPipeline` as the Orchestrator:**
    * The main `TradingPipeline` class in `trading_pipeline.py` remains.
    * Responsibilities:
        * Load configuration (`__init__`).
        * Initialize core components (`DataLoader`, `FeatureEngineer`, etc.).
        * Manage overall state (instance variables like `df_raw`, `gru_model`, etc.).
        * Run the main execution flow (`execute`), including the walk-forward loop.
        * Call functions from the new stage-specific modules.
2. **Create Stage-Specific Modules:**
    * Create a new sub-directory: `src/pipeline_stages`.
    * Move stage-specific logic from `TradingPipeline` methods into functions within these modules.
    * Proposed Modules:
        * `src/pipeline_stages/data_processing.py`: Loading, feature engineering, label generation, alignment, splitting.
        * `src/pipeline_stages/feature_processing.py`: Scaling, feature selection, pruning.
        * `src/pipeline_stages/sequence_creation.py`: GRU sequence creation.
        * `src/pipeline_stages/modelling.py`: GRU/SAC training/loading, calibration, SAC aggregation.
        * `src/pipeline_stages/evaluation.py`: Baseline checks, backtesting, results saving, metric aggregation, final decision.
3. **Refactor `TradingPipeline` Methods:**
    * Simplify existing methods in `TradingPipeline` (e.g., `load_and_preprocess_data`, `engineer_features`).
    * These methods will now primarily:
        * Import the corresponding function from the `pipeline_stages` module.
        * Call the imported function, passing necessary data and components (`config`, `data_loader`, `io`, state variables).
        * Receive results and update the `TradingPipeline` instance's state.
4. **Data Flow:**
    * Pass data between stages explicitly, via function arguments and return values.
    * The `TradingPipeline.execute` method orchestrates this flow.
    * State required across multiple stages/folds remains as `TradingPipeline` instance attributes.
5. **Dependencies:**
    * Pass `config`, `io`, and component instances (`DataLoader`, `FeatureEngineer`, etc.) as arguments to the stage functions that need them.
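
The strategy above can be sketched as follows. This is a minimal, self-contained illustration rather than the project's actual code: the stage function `scale_features`, the method name `run_feature_scaling`, the config key `feature_range`, and the list-based data are all assumptions, chosen so the example runs without the real `DataLoader` or `FeatureEngineer`. In the proposed layout the function would live in `src/pipeline_stages/feature_processing.py` and be imported by `trading_pipeline.py`.

```python
from typing import Any


# Hypothetical stage function; in the proposed layout it would live in
# src/pipeline_stages/feature_processing.py.
def scale_features(
    train: list[float],
    test: list[float],
    config: dict[str, Any],
) -> tuple[list[float], list[float], dict[str, float]]:
    """Min-max scale both splits using statistics from the training split only."""
    low, high = config.get("feature_range", (0.0, 1.0))  # assumed config key
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero for constant features

    def _scale(values: list[float]) -> list[float]:
        return [(v - lo) / span * (high - low) + low for v in values]

    # Everything the caller needs comes back as return values; no hidden state.
    return _scale(train), _scale(test), {"min": lo, "max": hi}


class TradingPipeline:
    """Thin orchestrator: holds state and delegates the work to stage functions."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config
        self.train_values: list[float] = []
        self.test_values: list[float] = []
        self.train_scaled: list[float] = []
        self.test_scaled: list[float] = []
        self.scaler_params: dict[str, float] = {}

    def run_feature_scaling(self) -> None:
        # In the real layout this would be preceded by:
        #   from src.pipeline_stages.feature_processing import scale_features
        train_scaled, test_scaled, params = scale_features(
            self.train_values, self.test_values, self.config
        )
        # The method's only remaining job is to update instance state.
        self.train_scaled = train_scaled
        self.test_scaled = test_scaled
        self.scaler_params = params
```

Because the stage function receives `config` and its data as arguments and reports results only through its return value, the orchestrator stays the single place where pipeline state is mutated.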

Benefits:
* **Readability:** `trading_pipeline.py` becomes a clearer orchestrator.
* **Maintainability:** Easier to isolate and modify specific stages.
* **Testability:** Stage functions take explicit inputs and return explicit outputs, so they should be easier to unit test without constructing the full pipeline.
* **Reduced Risk:** The plan moves existing logic rather than rewriting it, which should cause less E2E breakage than a full rewrite.
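
To illustrate the testability point, here is a hedged sketch of a stage function tested in isolation. The function `split_train_test` and its `train_fraction` parameter are hypothetical (the plan only says that splitting belongs in `data_processing.py`), and the tests use plain `assert` statements so they need nothing beyond the standard library.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


# Hypothetical stage function; under the plan it would belong in
# src/pipeline_stages/data_processing.py, which covers splitting.
def split_train_test(
    rows: Sequence[T], train_fraction: float
) -> tuple[list[T], list[T]]:
    """Split time-ordered rows into train/test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def test_split_keeps_chronological_order() -> None:
    train, test = split_train_test(list(range(10)), 0.8)
    assert train == [0, 1, 2, 3, 4, 5, 6, 7]
    assert test == [8, 9]


def test_split_rejects_invalid_fraction() -> None:
    try:
        split_train_test([1, 2, 3], 1.5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for train_fraction=1.5")
```

Neither test needs a `TradingPipeline` instance, a config file, or market data, which is the practical gain from moving logic out of the class. The test functions can be called directly or collected by a runner such as pytest.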

Implementation Steps:
1. Create the `src/pipeline_stages` directory and module files.
2. Incrementally move logic for each stage into the corresponding module's functions.
3. Update `TradingPipeline` methods to import and call these new functions.
4. Adjust imports and function signatures as needed.
5. Proceed stage by stage, verifying structure and data flow after each move.
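
Step 1 could be scripted along these lines. The module names come from the plan above; the helper name `scaffold_stage_modules` and the extra `__init__.py` (added so the directory can be imported as a regular package) are assumptions, not something the plan specifies.

```python
from pathlib import Path

# Module names taken from the plan; __init__.py is an added assumption
# so that the directory is importable as a regular package.
STAGE_MODULES = (
    "__init__.py",
    "data_processing.py",
    "feature_processing.py",
    "sequence_creation.py",
    "modelling.py",
    "evaluation.py",
)


def scaffold_stage_modules(project_root: Path) -> list[Path]:
    """Create src/pipeline_stages and any missing module files.

    Existing files are left untouched; only newly created paths are returned.
    """
    stage_dir = project_root / "src" / "pipeline_stages"
    stage_dir.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []
    for name in STAGE_MODULES:
        path = stage_dir / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold_stage_modules(Path("."))` from the project root would create the empty files. Running it a second time returns an empty list, so it is safe to repeat while logic is being moved stage by stage.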