# GRU-SAC Trading Pipeline: Step-by-Step Walkthrough

This notebook demonstrates how to instantiate and run the refactored `TradingPipeline` class **sequentially**, executing each major step individually.

**Goal:** Run the complete pipeline (data loading, feature engineering, sequence preparation, GRU training/loading, calibration, SAC loading, backtesting, and saving results) from a configuration file, inspecting the inputs and outputs at each stage.

## 1. Imports and Setup

Import necessary libraries and configure path variables to locate the project code.

In [1]:
import os
import sys
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import logging

print(f'Initial sys.path: {sys.path}')

# --- Path Setup ---
# Initialize project_root to None
project_root = None
package_root_for_imports = None # Initialize separately for clarity
try:
    notebook_dir = os.path.abspath('') # Get current directory (should be notebooks/)
    print(f'Notebook directory (notebook_dir): {notebook_dir}')

    # Go up ONE level to get the package root directory
    # Since notebook is in .../gru_sac_predictor/notebooks/, parent is .../gru_sac_predictor/
    package_root_for_imports = os.path.dirname(notebook_dir)
    print(f'Calculated path for imports (package_root_for_imports): {package_root_for_imports}')

    # Add the calculated path to sys.path to allow imports
    print(f'Checking if {package_root_for_imports} is in sys.path...')
    if package_root_for_imports not in sys.path:
        print(f'Path not found. Adding {package_root_for_imports} to sys.path.')
        sys.path.insert(0, package_root_for_imports)
        print(f'sys.path after insert: {sys.path}')
    else:
        print(f'Path {package_root_for_imports} already in sys.path.')

    # Define project_root consistently, used later for finding config.yaml
    # It is the *outer* directory that contains the package; config.yaml sits inside the package
    project_root = os.path.dirname(package_root_for_imports) # Go up one more level
    print(f'Project root for config/data (project_root): {project_root}')

except Exception as e:
    print(f'Error during path setup: {e}')

# --- Import the main pipeline class ---
print("Attempting to import TradingPipeline...")
try:
    # Import relative to the package root added to sys.path
    from src.trading_pipeline import TradingPipeline
    print('Successfully imported TradingPipeline.')
except ImportError as e:
    print(f'ERROR: Failed to import TradingPipeline: {e}')
    print(f'Final sys.path before error: {sys.path}')
    print("Please verify the project structure and the paths added to sys.path.")
except Exception as e: # Catch other potential errors
    print(f'An unexpected error occurred during import: {e}')
    print(f'Final sys.path before error: {sys.path}')

# Configure basic logging for the notebook
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Set pandas display options for better inspection
pd.set_option('display.max_columns', None) # Show all columns
pd.set_option('display.max_rows', 100)    # Show more rows if needed
pd.set_option('display.width', 1000)    # Wider display

Initial sys.path: ['/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '', '/home/yasha/develop/gru_sac_predictor/.venv/lib/python3.10/site-packages']
Notebook directory (notebook_dir): /home/yasha/develop/gru_sac_predictor/gru_sac_predictor/notebooks
Calculated path for imports (package_root_for_imports): /home/yasha/develop/gru_sac_predictor/gru_sac_predictor
Checking if /home/yasha/develop/gru_sac_predictor/gru_sac_predictor is in sys.path...
Path not found. Adding /home/yasha/develop/gru_sac_predictor/gru_sac_predictor to sys.path.
sys.path after insert: ['/home/yasha/develop/gru_sac_predictor/gru_sac_predictor', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '', '/home/yasha/develop/gru_sac_predictor/.venv/lib/python3.10/site-packages']
Project root for config/data (project_root): /home/yasha/develop/gru_sac_predictor
Attempting to import TradingPipeline...


2025-04-18 03:17:10.421895: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744946230.439676  157301 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744946230.445571  157301 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
  from .autonotebook import tqdm as notebook_tqdm


Successfully imported TradingPipeline.
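
The path setup above hard-codes how many directory levels to climb, so it stops working if the notebook is moved. A minimal alternative sketch, assuming the package root can be recognized by the `src` directory it contains (as the `from src.trading_pipeline import TradingPipeline` line suggests), walks upward until that marker is found. `find_package_root` is a hypothetical helper, not part of the project.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_package_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: assumes the package root is the directory holding `src/`.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


# Demo on a throwaway tree that mimics <package>/src and <package>/notebooks.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "gru_sac_predictor"
    (package / "src").mkdir(parents=True)
    (package / "notebooks").mkdir()
    found = find_package_root(package / "notebooks")
    demo_ok = found == package.resolve()
    print(f"Package root found: {demo_ok}")
```

In the notebook this could be called as `find_package_root(Path.cwd())`, inserting the result into `sys.path` only when it is not `None`.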


## 2. Configuration

Specify the path to the configuration file (`config.yaml`). This file defines all parameters for the data, models, training, and backtesting.

In [2]:
# Path to the configuration file
# config.yaml lives inside the package directory: <project_root>/gru_sac_predictor/config.yaml
config_rel_path = 'gru_sac_predictor/config.yaml' # Relative to project_root defined above
config_abs_path = None

# Construct absolute path relative to the project root identified earlier
if 'project_root' in locals() and project_root: # Check if project_root was successfully determined
    config_abs_path = os.path.join(project_root, config_rel_path)
else:
    print('ERROR: project_root not defined. Cannot find config file.')

if config_abs_path:
    print(f'Using config file: {config_abs_path}')
    # Verify the config file exists
    if not os.path.exists(config_abs_path):
        print(f'ERROR: Config file not found at {config_abs_path}')
    else:
        print('Config file found.')
        # Optionally load and display config for verification
        try:
            with open(config_abs_path, 'r') as f:
                config_data = yaml.safe_load(f)
            # print('\nConfiguration:')
            # print(yaml.dump(config_data, default_flow_style=False)) # Pretty print
        except Exception as e:
            print(f'Error reading config file: {e}')

Using config file: /home/yasha/develop/gru_sac_predictor/gru_sac_predictor/config.yaml
Config file found.
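
Confirming that the file exists does not guarantee it contains what later steps need. As a hedged sketch, the loaded dictionary could be checked for required top-level sections before the pipeline is instantiated. The section names below (`data`, `gru`, `sac`, `backtest`) and the example values are placeholders, not the project's actual schema.

```python
from typing import Any, Iterable, List, Mapping


def missing_sections(config: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required top-level keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Placeholder section names; replace with the keys your config.yaml really uses.
REQUIRED_SECTIONS = ("data", "gru", "sac", "backtest")

# Stand-in for the dict returned by yaml.safe_load(f).
example_config = {"data": {"interval": "1min"}, "gru": {"lookback": 60}, "sac": {}}

problems = missing_sections(example_config, REQUIRED_SECTIONS)
print(f"Missing or empty sections: {problems}")
```

Here `sac` is reported because it is empty and `backtest` because it is absent.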


## 3. Instantiate the Pipeline

Create an instance of the `TradingPipeline` class, passing the path to the configuration file. This initializes the pipeline object but does not run any steps yet.

In [3]:
pipeline_instance = None # Define outside try block
if 'TradingPipeline' in locals() and config_abs_path and os.path.exists(config_abs_path):
    try:
        # Instantiate the pipeline
        print('Instantiating TradingPipeline...')
        pipeline_instance = TradingPipeline(config_path=config_abs_path)
        print('TradingPipeline instantiated successfully.')
        print(f'Run ID: {pipeline_instance.run_id}')
        print(f'Results Dir: {pipeline_instance.dirs["results"]}')
        print(f'Log Dir: {pipeline_instance.dirs["logs"]}')
        print(f'Models Dir: {pipeline_instance.dirs["models"]}')

    except FileNotFoundError as e:
        print(f'ERROR during pipeline instantiation (FileNotFound): {e}')
    except Exception as e:
        print(f'An error occurred during pipeline instantiation: {e}')
        logging.error('Pipeline instantiation failed.', exc_info=True) # Log traceback
else:
    print('TradingPipeline class not imported, config path invalid, or config file not found. Cannot instantiate pipeline.')

Instantiating TradingPipeline...
2025-04-18 03:17:13,554 - root - INFO - Using Base Models Directory: /home/yasha/develop/gru_sac_predictor/models
2025-04-18 03:17:13,555 - root - INFO - Using results directory: /home/yasha/develop/gru_sac_predictor/results/20250418_031713
2025-04-18 03:17:13,555 - root - INFO - Using logs directory: /home/yasha/develop/gru_sac_predictor/logs/20250418_031713
2025-04-18 03:17:13,556 - root - INFO - Using models directory: /home/yasha/develop/gru_sac_predictor/models/20250418_031713
2025-04-18 03:17:13,557 - root - INFO - Logging setup complete. Log file: /home/yasha/develop/gru_sac_predictor/logs/20250418_031713/pipeline_20250418_031713.log
2025-04-18 03:17:13,558 - root - INFO - --- Starting Pipeline Run: 20250418_031713 ---
2025-04-18 03:17:13,559 - root - INFO - Using config: /home/yasha/develop/gru_sac_predictor/gru_sac_predictor/config.yaml
2025-04-18 03:17:13,560 - root - INFO - Resolved relative db_dir '../../data/crypto_market_data' to absolute 
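
The last log line reports a relative `db_dir` being resolved to an absolute path. The sketch below works through that path arithmetic under the assumption that the base directory is the project root; which base the pipeline actually uses is not shown here. `posixpath` is used so the result is the same on any operating system, and `resolve_relative` is an illustrative name.

```python
import posixpath


def resolve_relative(base_dir: str, relative: str) -> str:
    """Join `relative` onto `base_dir` and collapse any `..` segments."""
    return posixpath.normpath(posixpath.join(base_dir, relative))


# Assumption: the pipeline resolves db_dir against the project root.
project_root_example = "/home/yasha/develop/gru_sac_predictor"
resolved = resolve_relative(project_root_example, "../../data/crypto_market_data")
print(resolved)
```

Under that assumption the result is `/home/yasha/data/crypto_market_data`, which matches the directory the data loader scans in Step 1. Each `..` climbs one level, so two of them leave the project directory entirely; an absolute `db_dir` in `config.yaml` would avoid the ambiguity.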

## 4. Step 1: Load Data

Call the `load_data` method to fetch the raw market data based on the configuration.

In [4]:
%tb
if pipeline_instance:
    try:
        print('\n=== Running Step 1: Load Data ===')
        pipeline_instance.load_data()
        print('load_data() finished.')

        print('\n--- Inspecting Raw Data ---')
        if pipeline_instance.raw_data is not None:
            print(f'Shape of raw_data: {pipeline_instance.raw_data.shape}')
            display(pipeline_instance.raw_data.head())
            display(pipeline_instance.raw_data.tail())
            display(pipeline_instance.raw_data.isnull().sum()) # Check for NaNs
        else:
            print('raw_data attribute is None.')

    except Exception as e:
        print(f'An error occurred during Load Data step: {e}')
        logging.error('Load Data step failed.', exc_info=True)
else:
    print('Pipeline not instantiated. Cannot run step.')


=== Running Step 1: Load Data ===
2025-04-18 03:17:15,747 - root - INFO - --- Notebook Step: Load Data (Calling load_and_preprocess_data) ---
2025-04-18 03:17:15,749 - root - INFO - --- Stage: Loading and Preprocessing Data ---
2025-04-18 03:17:15,751 - gru_sac_predictor.src.data_loader - INFO - Loading data for SOL-USDT (bnbspot) from 2024-06-01 to 2025-03-10, interval 1min
2025-04-18 03:17:15,767 - gru_sac_predictor.src.data_loader - INFO - Scanning for DB files recursively in: /home/yasha/data/crypto_market_data
2025-04-18 03:17:15,769 - gru_sac_predictor.src.data_loader - ERROR - Database directory /home/yasha/data/crypto_market_data does not exist
2025-04-18 03:17:15,773 - gru_sac_predictor.src.data_loader - ERROR - No relevant DB files found and no fallback files available.
2025-04-18 03:17:15,774 - gru_sac_predictor.src.data_loader - ERROR - No relevant database files found for the specified date range.
2025-04-18 03:17:15,779 - root - ERROR - Failed to load data. Exiting.


No traceback available to show.


SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
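
The output above ends with `SystemExit: 1` rather than the cell's own error message. `SystemExit` derives from `BaseException`, not `Exception`, so the `except Exception` clause never intercepts it. Below is a minimal sketch of a wrapper that converts an exit request into an ordinary exception. `run_step` and `failing_step` are illustrative names, and the assumption is that the pipeline calls `sys.exit(1)` when loading fails.

```python
import sys
from typing import Any, Callable


def run_step(step: Callable[[], Any], name: str) -> Any:
    """Run `step`, turning a SystemExit into a RuntimeError the notebook can handle."""
    try:
        return step()
    except SystemExit as exc:
        raise RuntimeError(f"{name} requested exit with code {exc.code}") from exc


def failing_step() -> None:
    # Stand-in for a pipeline method that gives up by exiting the interpreter.
    sys.exit(1)


try:
    run_step(failing_step, "Load Data")
    outcome = "completed"
except Exception as exc:  # Now reachable, because RuntimeError is an Exception.
    outcome = str(exc)

print(outcome)
```

In the notebook, `run_step(pipeline_instance.load_data, 'Load Data')` would let the existing `except Exception` branch log the failure instead of stopping the kernel's cell abruptly.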


## 5. Step 2: Engineer Features

Call the `engineer_features` method to create technical indicators and other features from the raw data.

In [None]:
if pipeline_instance and pipeline_instance.raw_data is not None:
    try:
        print('\n=== Running Step 2: Engineer Features ===')
        pipeline_instance.engineer_features()
        print('engineer_features() finished.')

        print('\n--- Inspecting Features DataFrame ---')
        if pipeline_instance.features_df is not None:
            print(f'Shape of features_df: {pipeline_instance.features_df.shape}')
            display(pipeline_instance.features_df.head())
            display(pipeline_instance.features_df.tail())
            display(pipeline_instance.features_df.isnull().sum()) # Check for NaNs introduced by features
        else:
            print('features_df attribute is None.')

    except Exception as e:
        print(f'An error occurred during Engineer Features step: {e}')
        logging.error('Engineer Features step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or raw_data missing. Cannot run step.')
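
Rolling-window indicators cannot be computed for the first few rows, which is why the cell counts nulls after feature engineering. The sketch below illustrates this with a simple moving average in plain Python. It is not the pipeline's actual feature set, and `rolling_mean` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # Not enough history yet (pandas would give NaN).
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = rolling_mean(closes, window=3)
warmup_rows = sum(v is None for v in sma3)
print(sma3, warmup_rows)
```

A window of 3 leaves the first 2 rows empty, so longer windows mean more rows to drop or fill before sequences are built.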

## 6. Step 3: Prepare Sequences

Call the `prepare_sequences` method to split the data into train/validation/test sets and create sequences suitable for the GRU model.

In [None]:
if pipeline_instance and pipeline_instance.features_df is not None:
    try:
        print('\n=== Running Step 3: Prepare Sequences ===')
        pipeline_instance.prepare_sequences()
        print('prepare_sequences() finished.')

        print('\n--- Inspecting Sequences and Targets ---')
        # Assuming attributes like train_sequences, val_targets etc. exist
        for name in ['train_sequences', 'val_sequences', 'test_sequences',
                     'train_targets', 'val_targets', 'test_targets',
                     'train_indices', 'val_indices', 'test_indices']:
            attr = getattr(pipeline_instance, name, None)
            if attr is not None:
                # Check if it's numpy array or pandas series/df before getting shape
                if hasattr(attr, 'shape'):
                    print(f'{name} shape: {attr.shape}')
                else:
                    print(f'{name} type: {type(attr)}, length: {len(attr)}') # For lists like indices
            else:
                print(f'{name} attribute is None.')

    except Exception as e:
        print(f'An error occurred during Prepare Sequences step: {e}')
        logging.error('Prepare Sequences step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or features_df missing. Cannot run step.')
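
To make the printed shapes concrete, here is a hedged sketch of how sliding-window sequences and a chronological split are commonly built. The lookback length and split fractions are illustrative; the pipeline's own values come from `config.yaml`, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sequences(
    rows: Sequence[float], lookback: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `lookback` consecutive rows with the row that follows it."""
    windows, targets = [], []
    for end in range(lookback, len(rows)):
        windows.append(list(rows[end - lookback : end]))
        targets.append(rows[end])
    return windows, targets


def chronological_split(
    n: int, train_frac: float, val_frac: float
) -> Tuple[range, range, range]:
    """Split indices 0..n-1 in time order, without shuffling."""
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return range(0, train_end), range(train_end, val_end), range(val_end, n)


series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_sequences(series, lookback=3)
train_idx, val_idx, test_idx = chronological_split(8, train_frac=0.5, val_frac=0.25)
print(X, y, list(train_idx), list(val_idx), list(test_idx))
```

Keeping the split in time order matters for market data: shuffling would let the model train on rows that come after the ones it is validated on.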


## 7. Step 4: Train or Load GRU Model

Call `train_or_load_gru` to either train a new GRU model or load a pre-trained one, based on the configuration flags.

In [None]:
if pipeline_instance and pipeline_instance.train_sequences is not None: # Check if sequences are ready
    try:
        print('\n=== Running Step 4: Train or Load GRU ===')
        pipeline_instance.train_or_load_gru()
        print('train_or_load_gru() finished.')

        print('\n--- Inspecting GRU Handler ---')
        if pipeline_instance.gru_handler is not None:
            print(f'GRU Handler instantiated: {pipeline_instance.gru_handler}')
            # Potentially inspect model summary if handler exposes it
            # print(pipeline_instance.gru_handler.model.summary())
            print(f'GRU Predictions available (val): {hasattr(pipeline_instance.gru_handler, "val_predictions")}')
            print(f'GRU Predictions available (test): {hasattr(pipeline_instance.gru_handler, "test_predictions")}')
        else:
            print('gru_handler attribute is None.')

    except Exception as e:
        print(f'An error occurred during Train/Load GRU step: {e}')
        logging.error('Train/Load GRU step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or sequences missing. Cannot run step.')
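
The train-or-load decision can be pictured as a small branch on a configuration flag and the presence of a saved artifact. This is a generic sketch, not the pipeline's implementation: `train_or_load`, the `train_new` flag, and the stand-in callables are all hypothetical.

```python
import os
import tempfile
from typing import Callable, TypeVar

Model = TypeVar("Model")


def train_or_load(
    train_new: bool,
    model_path: str,
    train_fn: Callable[[], Model],
    load_fn: Callable[[str], Model],
) -> Model:
    """Train when asked to; otherwise load, failing loudly if the file is missing."""
    if train_new:
        return train_fn()
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"No saved model at {model_path}; set train_new=True")
    return load_fn(model_path)


# Demo with stand-in callables instead of a real GRU.
with tempfile.TemporaryDirectory() as tmp:
    saved = os.path.join(tmp, "gru_model.bin")
    with open(saved, "w") as fh:
        fh.write("weights")
    loaded = train_or_load(False, saved, lambda: "trained", lambda p: "loaded")
    trained = train_or_load(True, saved, lambda: "trained", lambda p: "loaded")

print(loaded, trained)
```

Raising when the file is missing, rather than silently training, keeps a misconfigured path from triggering an unexpected long training run.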

## 8. Step 5: Calibrate Predictions

Call `calibrate_predictions` to use the validation set predictions from the GRU to find an optimal probability threshold or apply other calibration techniques.

In [None]:
if pipeline_instance and pipeline_instance.gru_handler is not None and hasattr(pipeline_instance.gru_handler, 'val_predictions'):
    try:
        print('\n=== Running Step 5: Calibrate Predictions ===')
        pipeline_instance.calibrate_predictions()
        print('calibrate_predictions() finished.')

        print('\n--- Inspecting Calibration Results ---')
        if pipeline_instance.calibrator is not None:
            print(f'Calibrator object: {pipeline_instance.calibrator}')
            print(f'Optimal threshold: {getattr(pipeline_instance, "optimal_threshold", "Not set")}')
            print(f'Calibrated Val Probs exist: {hasattr(pipeline_instance.calibrator, "calibrated_val_probabilities")}')
            print(f'Calibrated Test Probs exist: {hasattr(pipeline_instance.calibrator, "calibrated_test_probabilities")}')
        else:
            print('calibrator attribute is None.')

    except Exception as e:
        print(f'An error occurred during Calibrate Predictions step: {e}')
        logging.error('Calibrate Predictions step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or GRU validation predictions missing. Cannot run step.')
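
One common way to pick a probability threshold is to scan a grid on the validation set and keep the value that maximizes a chosen metric. The sketch below uses accuracy for simplicity. Whether `calibrate_predictions` uses this criterion is not stated in this notebook, so treat it as an illustration only; `best_threshold` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def best_threshold(probs: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Scan thresholds 0.05..0.95 and return (threshold, accuracy) of the first best one."""
    grid: List[float] = [i / 100 for i in range(5, 100, 5)]
    best_t, best_acc = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        acc = sum(int(a == b) for a, b in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # Strict comparison keeps the lowest threshold on ties.
            best_t, best_acc = t, acc
    return best_t, best_acc


val_probs = [0.2, 0.4, 0.6, 0.8]
val_labels = [0, 0, 1, 1]
threshold, accuracy = best_threshold(val_probs, val_labels)
print(threshold, accuracy)
```

The threshold is chosen on validation data only, so the test set stays untouched until the backtest.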


## 9. Step 6: Prepare SAC Agent for Backtest

Call `train_or_load_sac`. This step might involve triggering offline SAC training (if configured) or simply identifying and setting the path to the pre-trained SAC agent policy to be used in the backtest.

In [None]:
# Note: Actual SAC training might be complex to run directly inline.
# This step often just prepares the necessary info (like the agent path) for the backtester.
if pipeline_instance:
    try:
        print('\n=== Running Step 6: Train or Load SAC (Prepare for Backtest) ===')
        # This might just set an attribute like sac_agent_path based on config
        pipeline_instance.train_or_load_sac()
        print('train_or_load_sac() finished.')

        print('\n--- Inspecting SAC Agent Info ---')
        # Check the attribute storing the path or relevant SAC info
        print(f'SAC Agent Path for backtest: {getattr(pipeline_instance, "sac_agent_path", "Not set")}')

    except Exception as e:
        print(f'An error occurred during Train/Load SAC step: {e}')
        logging.error('Train/Load SAC step failed.', exc_info=True)
else:
    print('Pipeline not instantiated. Cannot run step.')

## 10. Step 7: Run Backtest

Call `run_backtest` to execute the trading simulation using the test data, the calibrated GRU predictions, and the loaded SAC agent policy.

In [None]:
# Check if necessary components are ready
backtest_ready = (
    pipeline_instance and
    pipeline_instance.test_sequences is not None and
    pipeline_instance.test_targets is not None and
    pipeline_instance.test_indices is not None and
    pipeline_instance.gru_handler is not None and
    pipeline_instance.calibrator is not None and # Ensure calibration ran
    getattr(pipeline_instance, "optimal_threshold", None) is not None and
    getattr(pipeline_instance, "sac_agent_path", None) is not None
)

if backtest_ready:
    try:
        print('\n=== Running Step 7: Run Backtest ===')
        pipeline_instance.run_backtest()
        print('run_backtest() finished.')

        print('\n--- Inspecting Backtest Results ---')
        if pipeline_instance.backtest_metrics:
            print('\n--- Backtest Metrics --- ')
            metrics = pipeline_instance.backtest_metrics
            metrics['Run ID'] = pipeline_instance.run_id # Add run ID for context
            for key, value in metrics.items():
                # Multi-line values are printed on their own lines below the key
                if key == "Confusion Matrix (GRU Signal vs Actual Dir)":
                    print(f'{key}:\n{np.array(value)}')
                elif key == "Classification Report (GRU Signal)":
                    print(f'{key}:\n{value}')
                elif isinstance(value, float):
                    print(f'{key}: {value:.4f}')
                else:
                    print(f'{key}: {value}')
        else:
            print('Backtest metrics not available.')

        if pipeline_instance.backtest_results_df is not None:
            print('\n--- Backtest Results DataFrame (Head) --- ')
            display(pipeline_instance.backtest_results_df.head())
            print('\n--- Backtest Results DataFrame (Tail) --- ')
            display(pipeline_instance.backtest_results_df.tail())
            print('\n--- Backtest Results DataFrame (Description) --- ')
            display(pipeline_instance.backtest_results_df.describe())
        else:
            print('Backtest results DataFrame not available.')


    except Exception as e:
        print(f'An error occurred during Run Backtest step: {e}')
        logging.error('Run Backtest step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or prerequisites for backtest are missing. Cannot run step.')
    print(f"Prerequisites check: pipeline={bool(pipeline_instance)}, test_sequences={pipeline_instance.test_sequences is not None if pipeline_instance else False}, "
          f"test_targets={pipeline_instance.test_targets is not None if pipeline_instance else False}, test_indices={pipeline_instance.test_indices is not None if pipeline_instance else False}, "
          f"gru_handler={pipeline_instance.gru_handler is not None if pipeline_instance else False}, calibrator={pipeline_instance.calibrator is not None if pipeline_instance else False}, "
          f"optimal_T={getattr(pipeline_instance, 'optimal_threshold', None) is not None}, sac_path={getattr(pipeline_instance, 'sac_agent_path', None) is not None}")
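
The prerequisite check above repeats the same `is not None` test many times, and direct attribute access would raise `AttributeError` if the class never initialized one of those attributes. A compact sketch using `getattr` with a default avoids both issues. `missing_prerequisites` is a hypothetical helper; the attribute names are the ones this notebook already assumes.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional

BACKTEST_PREREQUISITES = (
    "test_sequences", "test_targets", "test_indices",
    "gru_handler", "calibrator", "optimal_threshold", "sac_agent_path",
)


def missing_prerequisites(pipeline: Optional[object], names: Iterable[str]) -> List[str]:
    """List the attributes that are absent or None; a None pipeline misses everything."""
    return [name for name in names if getattr(pipeline, name, None) is None]


# Stand-in object with only some attributes populated.
fake_pipeline = SimpleNamespace(test_sequences=[1], test_targets=[0], gru_handler=None)
missing = missing_prerequisites(fake_pipeline, BACKTEST_PREREQUISITES)
print(f"Missing: {missing}")
```

With this helper, the readiness test reduces to `not missing`, and the diagnostic message can simply print the list.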


## 11. Step 8: Save Results

Call `save_results` to write the calculated metrics, the detailed backtest results DataFrame, and any generated plots to the run-specific output directory.

In [None]:
if pipeline_instance and pipeline_instance.backtest_results_df is not None and pipeline_instance.backtest_metrics:
    try:
        print('\n=== Running Step 8: Save Results ===')
        pipeline_instance.save_results()
        print('save_results() finished.')
        print(f'Results should be saved in: {pipeline_instance.dirs["results"]}')

    except Exception as e:
        print(f'An error occurred during Save Results step: {e}')
        logging.error('Save Results step failed.', exc_info=True)
else:
    print('Pipeline not instantiated or backtest results/metrics missing. Cannot run step.')
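
The instantiation log shows each run using `results/`, `logs/`, and `models/` subdirectories named after a timestamp run ID such as `20250418_031713`. Below is a sketch of how such a layout could be derived. The function is illustrative, creates nothing on disk, and is not taken from the pipeline's code.

```python
import posixpath
from datetime import datetime
from typing import Dict


def run_directories(base_dir: str, started_at: datetime) -> Dict[str, str]:
    """Build per-run paths of the form <base>/<kind>/<YYYYMMDD_HHMMSS>."""
    run_id = started_at.strftime("%Y%m%d_%H%M%S")
    dirs = {
        kind: posixpath.join(base_dir, kind, run_id)
        for kind in ("results", "logs", "models")
    }
    dirs["run_id"] = run_id
    return dirs


example = run_directories(
    "/home/yasha/develop/gru_sac_predictor", datetime(2025, 4, 18, 3, 17, 13)
)
print(example["results"])
```

Because the run ID is part of every path, repeated runs never overwrite each other's metrics, logs, or models.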


## 12. Display Saved Plots

Load and display the plots generated and saved during the pipeline execution (especially during calibration and backtesting/saving).

In [None]:
# This code assumes plots were generated and saved by previous steps (like calibrate or save_results)
if pipeline_instance is not None and pipeline_instance.dirs.get('results'):
    results_dir = pipeline_instance.dirs['results']
    run_id = pipeline_instance.run_id
    print(f'\nLooking for plots in: {results_dir}\n')

    plot_files = [
        f'backtest_summary_{run_id}.png',
        f'confusion_matrix_{run_id}.png',
        f'reliability_curve_val_{run_id}.png', # Generated by calibration
        f'calibration_curve_test_{run_id}.png' # Potentially generated by backtester/save_results
        # Add any other plot filenames generated by your pipeline
    ]

    plot_found = False
    for plot_file in plot_files:
        plot_path = os.path.join(results_dir, plot_file)
        if os.path.exists(plot_path):
            plot_found = True
            print(f'--- Displaying: {plot_file} ---')
            try:
                img = mpimg.imread(plot_path)
                # Determine appropriate figure size based on plot type
                figsize = (15, 12) if 'summary' in plot_file else (8, 7)
                plt.figure(figsize=figsize)
                plt.imshow(img)
                plt.axis('off') # Hide axes for image display
                plt.title(plot_file)
                plt.show()
            except Exception as e:
                print(f'  Error loading/displaying plot {plot_file}: {e}')
        else:
            print(f'Plot not found: {plot_path}')

    if not plot_found:
        print("No standard plots found in the results directory.")

else:
    print('\nPipeline object not found or results directory is not available. Cannot display plots.')


## 13. Conclusion

This notebook walked through the `TradingPipeline` workflow one step at a time so that intermediate outputs can be inspected at each stage. To experiment with different parameters, data ranges, or control flags, edit `config.yaml` and re-run the relevant steps. The final results (metrics, plots, detailed CSV) are saved in the run-specific directory under the main project's `results/` folder.