# Pairs Trading Visualization Notebook

This notebook allows you to visualize pairs trading strategies on individual instrument pairs.
You can examine the relationship between two instruments, their dis-equilibrium, and trading signals.

### 🎯 Key Features:

1. **Interactive Configuration**:
   - Easy switching between CRYPTO and EQUITY configurations
   - Simple parameter adjustment for thresholds and training periods

2. **Single Pair Focus**:
   - Focuses on one pair at a time instead of running multiple pairs
   - Allows deep analysis of the relationship between two instruments

3. **Step-by-Step Visualization**:
   - **Raw price data**: Individual prices, normalized comparison, and price ratios
   - **Training analysis**: Cointegration testing and VECM model fitting
   - **Dis-equilibrium visualization**: Both raw and scaled dis-equilibrium with threshold lines
   - **Strategy execution**: Trading signal generation and visualization
   - **Prediction analysis**: Actual vs predicted prices with trading signals overlaid

4. **Rich Analytics**:
   - Cointegration status and VECM model details
   - Statistical summaries for all stages
   - Threshold crossing analysis
   - Trading signal breakdown

5. **Interactive Experimentation**:
   - Easy parameter modification
   - Re-run capabilities for different configurations
   - Runs StaticFitStrategy; SlidingFitStrategy is imported, but the usage notes below state that only StaticFitStrategy is supported

### 🚀 How to Use:

1. **Start Jupyter**:
   ```bash
   cd src/notebooks
   jupyter notebook pairs_trading_visualization.ipynb
   ```

2. **Customize Your Analysis**:
   - Change `SYMBOL_A` and `SYMBOL_B` to your desired trading pair
   - Switch between `CRYPTO_CONFIG` and `EQT_CONFIG`
   - Only **StaticFitStrategy** is supported.
   - Adjust thresholds and parameters as needed

3. **Run and Visualize**:
   - Execute cells step by step to see the analysis unfold
   - Rich matplotlib visualizations show relationships and signals
   - Comprehensive summary at the end

The notebook visualizes the relationship between two instruments and their scaled dis-equilibrium, and it displays and analyzes each stage of the pairs trading strategy in turn.


## Setup and Imports

In [None]:
import sys
import os
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Optional

# Import our modules
from strategies import StaticFitStrategy, SlidingFitStrategy
from tools.data_loader import load_market_data
from tools.trading_pair import TradingPair
from results import BacktestResult

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

print("Setup complete!")

## Configuration
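Both configuration dictionaries in the cell below share the same threshold and training keys, so a small sanity check can catch a mistyped value before any data is loaded. The sketch below is only an illustration: `check_config` is a hypothetical helper that is not part of the project, and the rule that the open threshold should exceed the close threshold is an assumption about how the thresholds are meant to be used.

```python
from typing import List

# Key names copied from the configuration dictionaries in this notebook.
REQUIRED_KEYS = (
    "dis-equilibrium_open_trshld",
    "dis-equilibrium_close_trshld",
    "training_minutes",
    "instruments",
)


def check_config(config: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if problems:
        return problems

    open_threshold = config["dis-equilibrium_open_trshld"]
    close_threshold = config["dis-equilibrium_close_trshld"]
    # Assumption: positions open far from equilibrium and close nearer to it.
    if close_threshold < 0 or open_threshold <= close_threshold:
        problems.append("open threshold should exceed a non-negative close threshold")
    if config["training_minutes"] <= 0:
        problems.append("training_minutes should be positive")
    if len(config["instruments"]) < 2:
        problems.append("at least two instruments are needed to form a pair")
    return problems


sample = {
    "dis-equilibrium_open_trshld": 2.0,
    "dis-equilibrium_close_trshld": 1.0,
    "training_minutes": 120,
    "instruments": ["COIN", "GBTC"],
}
print(check_config(sample))
```

Calling `check_config(CONFIG)` right after choosing a configuration would surface these problems early instead of during training.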

In [None]:
# Configuration - Choose between CRYPTO_CONFIG or EQT_CONFIG

CRYPTO_CONFIG = {
 "security_type": "CRYPTO",
 "data_directory": "../../data/crypto",
 "datafiles": [
 "20250519.mktdata.ohlcv.db",
 ],
 "db_table_name": "bnbspot_ohlcv_1min",
 "exchange_id": "BNBSPOT",
 "instrument_id_pfx": "PAIR-",
 "instruments": [
 "BTC-USDT",
 "BCH-USDT",
 "ETH-USDT",
 "LTC-USDT",
 "XRP-USDT",
 "ADA-USDT",
 "SOL-USDT",
 "DOT-USDT",
 ],
 "trading_hours": {
 "begin_session": "00:00:00",
 "end_session": "23:59:00",
 "timezone": "UTC",
 },
 "price_column": "close",
 "min_required_points": 30,
 "zero_threshold": 1e-10,
 "dis-equilibrium_open_trshld": 2.0,
 "dis-equilibrium_close_trshld": 0.5,
 "training_minutes": 120,
 "funding_per_pair": 2000.0,
}

EQT_CONFIG = {
 "security_type": "EQUITY",
 "data_directory": "../../data/equity",
 "datafiles": {
 "0508": "20250508.alpaca_sim_md.db",
 "0509": "20250509.alpaca_sim_md.db",
 "0510": "20250510.alpaca_sim_md.db",
 "0511": "20250511.alpaca_sim_md.db",
 "0512": "20250512.alpaca_sim_md.db",
 "0513": "20250513.alpaca_sim_md.db",
 "0514": "20250514.alpaca_sim_md.db",
 "0515": "20250515.alpaca_sim_md.db",
 "0516": "20250516.alpaca_sim_md.db",
 "0517": "20250517.alpaca_sim_md.db",
 "0518": "20250518.alpaca_sim_md.db",
 "0519": "20250519.alpaca_sim_md.db",
 "0520": "20250520.alpaca_sim_md.db",
 "0521": "20250521.alpaca_sim_md.db",
 "0522": "20250522.alpaca_sim_md.db",
 },
 "db_table_name": "md_1min_bars",
 "exchange_id": "ALPACA",
 "instrument_id_pfx": "STOCK-",
 "instruments": [
 "COIN",
 "GBTC",
 "HOOD",
 "MSTR",
 "PYPL",
 ],
 "trading_hours": {
 "begin_session": "9:30:00",
 "end_session": "16:00:00",
 "timezone": "America/New_York",
 },
 "price_column": "close",
 "min_required_points": 30,
 "zero_threshold": 1e-10,
 "dis-equilibrium_open_trshld": 2.0,
 "dis-equilibrium_close_trshld": 1.0, #0.5,
 "training_minutes": 120,
 "funding_per_pair": 2000.0,
}

# Choose your configuration
CONFIG = EQT_CONFIG # Change to CRYPTO_CONFIG if you want to use crypto data

print(f"Using {CONFIG['security_type']} configuration")
print(f"Available instruments: {CONFIG['instruments']}")

## Select Trading Pair and Data File
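Since the notebook analyzes one pair at a time, it can help to list the candidate pairs first. The sketch below enumerates every unordered pair from the equity instrument list using the standard library. Whether `TradingPair` treats (A, B) and (B, A) differently is not shown in this notebook, so treating the pairs as unordered is an assumption.

```python
from itertools import combinations

# Same symbols as EQT_CONFIG["instruments"] in this notebook.
instruments = ["COIN", "GBTC", "HOOD", "MSTR", "PYPL"]

# Each unordered pair appears once: ("COIN", "GBTC") but not ("GBTC", "COIN").
candidate_pairs = list(combinations(instruments, 2))

for symbol_a, symbol_b in candidate_pairs:
    print(f"{symbol_a} & {symbol_b}")
```

Any of the printed combinations can be copied into `SYMBOL_A` and `SYMBOL_B` in the cell below.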

In [None]:
# Select your trading pair and strategy
SYMBOL_A = "COIN" # Change these to your desired symbols
SYMBOL_B = "GBTC"
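# EQT_CONFIG maps date keys to file names, while CRYPTO_CONFIG holds a plain list,
# so index by key for equities and by position for crypto.
datafiles = CONFIG["datafiles"]
DATA_FILE = datafiles["0509"] if isinstance(datafiles, dict) else datafiles[0]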

# Choose strategy
STRATEGY = StaticFitStrategy()

print(f"Selected pair: {SYMBOL_A} & {SYMBOL_B}")
print(f"Data file: {DATA_FILE}")
print(f"Strategy: {type(STRATEGY).__name__}")

## Load Market Data

In [None]:
# Load market data
datafile_path = f"{CONFIG['data_directory']}/{DATA_FILE}"
print(f"Current working directory: {os.getcwd()}")
print(f"Loading data from: {datafile_path}")

market_data_df = load_market_data(datafile_path, config=CONFIG)

print(f"Loaded {len(market_data_df)} rows of market data")
print(f"Symbols in data: {market_data_df['symbol'].unique()}")
print(f"Time range: {market_data_df['tstamp'].min()} to {market_data_df['tstamp'].max()}")

# Display first few rows
market_data_df.head()

## Create Trading Pair and Analyze

In [None]:
# Create trading pair
pair = TradingPair(
 market_data=market_data_df,
 symbol_a=SYMBOL_A,
 symbol_b=SYMBOL_B,
 price_column=CONFIG["price_column"]
)

print(f"Created trading pair: {pair}")
print(f"Market data shape: {pair.market_data_.shape}")
print(f"Column names: {pair.colnames()}")

# Display first few rows of pair data
pair.market_data_.head()

## Split Data into Training and Testing
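How `pair.get_datasets` splits the data is not shown in this notebook. A plausible reading of `training_minutes`, sketched below as an assumption, is that the first N minutes of the session form the training set and everything after that forms the testing set. The helper `split_by_minutes` and the synthetic rows are illustrative only, and plain Python is used to keep the sketch dependency-free.

```python
from datetime import datetime, timedelta


def split_by_minutes(rows, training_minutes):
    """Split time-ordered (timestamp, price) rows at start + training_minutes."""
    if not rows:
        return [], []
    cutoff = rows[0][0] + timedelta(minutes=training_minutes)
    training = [row for row in rows if row[0] < cutoff]
    testing = [row for row in rows if row[0] >= cutoff]
    return training, testing


# Synthetic one-minute bars for a 390-minute session starting at 09:30.
start = datetime(2025, 5, 9, 9, 30)
rows = [(start + timedelta(minutes=i), 100.0 + i) for i in range(390)]

training, testing = split_by_minutes(rows, training_minutes=120)
print(len(training), len(testing))  # 120 270
```

Splitting strictly by time keeps the testing rows out of the statistics that are estimated during training, which matters because later cells compare testing data against the training mean and standard deviation.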

In [None]:
# Get training and testing datasets
training_minutes = CONFIG["training_minutes"]
pair.get_datasets(training_minutes=training_minutes)

print(f"Training data: {len(pair.training_df_)} rows")
print(f"Testing data: {len(pair.testing_df_)} rows")
print(f"Training period: {pair.training_df_['tstamp'].iloc[0]} to {pair.training_df_['tstamp'].iloc[-1]}")
print(f"Testing period: {pair.testing_df_['tstamp'].iloc[0]} to {pair.testing_df_['tstamp'].iloc[-1]}")

# Check for any missing data
print(f"Training data null values: {pair.training_df_.isnull().sum().sum()}")
print(f"Testing data null values: {pair.testing_df_.isnull().sum().sum()}")

## Visualize Raw Price Data

In [None]:
# Plot raw price data
fig, axes = plt.subplots(3, 1, figsize=(15, 12))

# Combined price plot
colname_a, colname_b = pair.colnames()
all_data = pd.concat([pair.training_df_, pair.testing_df_]).reset_index(drop=True)

# Plot individual prices
axes[0].plot(all_data['tstamp'], all_data[colname_a], label=f'{SYMBOL_A}', alpha=0.8)
axes[0].plot(all_data['tstamp'], all_data[colname_b], label=f'{SYMBOL_B}', alpha=0.8)
axes[0].axvline(x=pair.training_df_['tstamp'].iloc[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test Split')
axes[0].set_title(f'Price Comparison: {SYMBOL_A} vs {SYMBOL_B}')
axes[0].set_ylabel('Price')
axes[0].legend()
axes[0].grid(True)

# Normalized prices for comparison
norm_a = all_data[colname_a] / all_data[colname_a].iloc[0]
norm_b = all_data[colname_b] / all_data[colname_b].iloc[0]

axes[1].plot(all_data['tstamp'], norm_a, label=f'{SYMBOL_A} (normalized)', alpha=0.8)
axes[1].plot(all_data['tstamp'], norm_b, label=f'{SYMBOL_B} (normalized)', alpha=0.8)
axes[1].axvline(x=pair.training_df_['tstamp'].iloc[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test Split')
axes[1].set_title('Normalized Price Comparison')
axes[1].set_ylabel('Normalized Price')
axes[1].legend()
axes[1].grid(True)

# Price ratio
price_ratio = all_data[colname_a] / all_data[colname_b]
axes[2].plot(all_data['tstamp'], price_ratio, label=f'{SYMBOL_A}/{SYMBOL_B} Ratio', color='green', alpha=0.8)
axes[2].axvline(x=pair.training_df_['tstamp'].iloc[-1], color='red', linestyle='--', alpha=0.7, label='Train/Test Split')
axes[2].set_title('Price Ratio')
axes[2].set_ylabel('Ratio')
axes[2].set_xlabel('Time')
axes[2].legend()
axes[2].grid(True)

plt.tight_layout()
plt.show()

## Train the Pair and Check Cointegration
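`pair.train_pair()` hides the cointegration test and the VECM fit. For intuition only, the sketch below estimates a single hedge ratio by ordinary least squares. This is the first step of the simpler Engle-Granger approach and is not the VECM fit used by this notebook. The function name and the synthetic data are illustrative assumptions.

```python
import random
from statistics import mean


def ols_hedge_ratio(prices_a, prices_b):
    """Slope and intercept of the least-squares line prices_a ~ intercept + slope * prices_b."""
    mean_a, mean_b = mean(prices_a), mean(prices_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(prices_a, prices_b))
    var_b = sum((b - mean_b) ** 2 for b in prices_b)
    slope = cov / var_b
    return slope, mean_a - slope * mean_b


# Synthetic pair: B follows a random walk, A tracks 2 * B plus stationary noise.
rng = random.Random(42)
prices_b, level = [], 50.0
for _ in range(500):
    level += rng.gauss(0.0, 0.5)
    prices_b.append(level)
prices_a = [2.0 * b + 10.0 + rng.gauss(0.0, 0.3) for b in prices_b]

slope, intercept = ols_hedge_ratio(prices_a, prices_b)

# The residual spread is the quantity a cointegration test checks for stationarity.
spread = [a - (intercept + slope * b) for a, b in zip(prices_a, prices_b)]
print(f"hedge ratio ~ {slope:.3f}")
```

The beta coefficients printed by the cell below play a similar role: they define the linear combination of the two prices that is expected to revert to a stable level.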

In [None]:
# Train the pair and check cointegration
try:
    is_cointegrated = pair.train_pair()
    print(f"Pair {pair} cointegration status: {is_cointegrated}")

    if is_cointegrated:
        print(f"VECM Beta coefficients: {pair.vecm_fit_.beta.flatten()}")
        print(f"Training dis-equilibrium mean: {pair.training_mu_:.6f}")
        print(f"Training dis-equilibrium std: {pair.training_std_:.6f}")

        # Display VECM summary
        print("\nVECM Model Summary:")
        print(pair.vecm_fit_.summary())
    else:
        print("Pair is not cointegrated. Cannot proceed with strategy.")

except Exception as e:
    print(f"Training failed: {str(e)}")
    is_cointegrated = False

## Visualize Training Period Dis-equilibrium
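The plot in this section draws the mean of the scaled series at 0 and its bands at +1 and -1. That is consistent with a z-score computed from the training mean and standard deviation (`pair.training_mu_` and `pair.training_std_`), although how `TradingPair` computes it is not shown here. A minimal sketch under that assumption, using made-up prices and a hypothetical `beta`:

```python
from statistics import mean, pstdev


def disequilibrium(prices_a, prices_b, beta):
    """Linear combination beta[0] * A + beta[1] * B for each observation."""
    return [beta[0] * a + beta[1] * b for a, b in zip(prices_a, prices_b)]


def scale(values, mu, sigma):
    """Z-score each value against a fixed mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


beta = (1.0, -2.0)  # hypothetical cointegrating vector, not a fitted value
train_a, train_b = [201.0, 199.0, 202.0, 198.0], [100.0, 100.0, 100.0, 100.0]
test_a, test_b = [204.0, 200.0], [100.0, 100.0]

train_dis = disequilibrium(train_a, train_b, beta)  # [1.0, -1.0, 2.0, -2.0]
mu, sigma = mean(train_dis), pstdev(train_dis)

# Testing data is scaled with the training statistics, never with its own.
scaled_test = scale(disequilibrium(test_a, test_b, beta), mu, sigma)
print(mu, sigma, scaled_test)
```

The sketch uses the population standard deviation (`pstdev`). pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`), so values computed with pandas would differ slightly. Which convention the project uses is not visible from this notebook.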

In [None]:
if is_cointegrated:
 # fig, axes = plt.subplots(, 1, figsize=(15, 10))

 # # Raw dis-equilibrium
 # axes[0].plot(pair.training_df_['tstamp'], pair.training_df_['dis-equilibrium'],
 # color='blue', alpha=0.8, label='Raw Dis-equilibrium')
 # axes[0].axhline(y=pair.training_mu_, color='red', linestyle='--', alpha=0.7, label='Mean')
 # axes[0].axhline(y=pair.training_mu_ + pair.training_std_, color='orange', linestyle='--', alpha=0.5, label='+1 Std')
 # axes[0].axhline(y=pair.training_mu_ - pair.training_std_, color='orange', linestyle='--', alpha=0.5, label='-1 Std')
 # axes[0].set_title('Training Period: Raw Dis-equilibrium')
 # axes[0].set_ylabel('Dis-equilibrium')
 # axes[0].legend()
 # axes[0].grid(True)

 # Scaled dis-equilibrium
 fig, axes = plt.subplots(1, 1, figsize=(15, 5))
 axes.plot(pair.training_df_['tstamp'], pair.training_df_['scaled_dis-equilibrium'],
 color='green', alpha=0.8, label='Scaled Dis-equilibrium')
 axes.axhline(y=0, color='red', linestyle='--', alpha=0.7, label='Mean (0)')
 axes.axhline(y=1, color='orange', linestyle='--', alpha=0.5, label='+1 Std')
 axes.axhline(y=-1, color='orange', linestyle='--', alpha=0.5, label='-1 Std')
 axes.axhline(y=CONFIG['dis-equilibrium_open_trshld'], color='purple',
 linestyle=':', alpha=0.7, label=f"Open Threshold ({CONFIG['dis-equilibrium_open_trshld']})")
 axes.axhline(y=CONFIG['dis-equilibrium_close_trshld'], color='brown',
 linestyle=':', alpha=0.7, label=f"Close Threshold ({CONFIG['dis-equilibrium_close_trshld']})")
 axes.set_title('Training Period: Scaled Dis-equilibrium')
 axes.set_ylabel('Scaled Dis-equilibrium')
 axes.set_xlabel('Time')
 axes.legend()
 axes.grid(True)

 plt.tight_layout()
 plt.show()

 # Print statistics
 print(f"Training dis-equilibrium statistics:")
 print(f" Mean: {pair.training_df_['dis-equilibrium'].mean():.6f}")
 print(f" Std: {pair.training_df_['dis-equilibrium'].std():.6f}")
 print(f" Min: {pair.training_df_['dis-equilibrium'].min():.6f}")
 print(f" Max: {pair.training_df_['dis-equilibrium'].max():.6f}")

 print(f"\nScaled dis-equilibrium statistics:")
 print(f" Mean: {pair.training_df_['scaled_dis-equilibrium'].mean():.6f}")
 print(f" Std: {pair.training_df_['scaled_dis-equilibrium'].std():.6f}")
 print(f" Min: {pair.training_df_['scaled_dis-equilibrium'].min():.6f}")
 print(f" Max: {pair.training_df_['scaled_dis-equilibrium'].max():.6f}")
else:
 print("The pair is not cointegrated")

## Generate Predictions and Run Strategy
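How `StaticFitStrategy.run_pair` turns the scaled dis-equilibrium into trades is not shown in this notebook. A common rule for this kind of strategy, sketched below as an assumption and not as a description of the project code, opens a position when the absolute scaled value reaches the open threshold and closes it once the value falls back to the close threshold. The function name and action labels are hypothetical.

```python
def generate_signals(scaled, open_threshold, close_threshold):
    """Return (index, action) tuples for a simple open/close threshold rule."""
    signals = []
    position = 0  # 0 = flat, +1 = long the spread, -1 = short the spread
    for i, z in enumerate(scaled):
        if position == 0 and abs(z) >= open_threshold:
            # Spread is stretched: bet on reversion toward zero.
            position = -1 if z > 0 else 1
            action = "OPEN_SHORT_SPREAD" if position == -1 else "OPEN_LONG_SPREAD"
            signals.append((i, action))
        elif position != 0 and abs(z) <= close_threshold:
            # Spread has reverted far enough: flatten the position.
            signals.append((i, "CLOSE"))
            position = 0
    return signals


scaled = [0.2, 1.1, 2.3, 1.7, 0.9, 0.4, -0.3, -2.1, -1.2, -0.1]
signals = generate_signals(scaled, open_threshold=2.0, close_threshold=0.5)
for index, action in signals:
    print(index, action)
```

Keeping the close threshold below the open threshold leaves a gap between the two levels, so a value hovering around a single level does not open and close positions on every bar.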

In [None]:
pair_trades = None  # stays None when the pair is not cointegrated or a step fails

if is_cointegrated:
    try:
        # Generate predictions
        pair.predict()
        print(f"Generated predictions for {len(pair.predicted_df_)} rows")

        # Display prediction data structure
        print(f"Prediction columns: {list(pair.predicted_df_.columns)}")
        print(f"Prediction period: {pair.predicted_df_['tstamp'].iloc[0]} to {pair.predicted_df_['tstamp'].iloc[-1]}")

        # Run strategy
        bt_result = BacktestResult(config=CONFIG)
        pair_trades = STRATEGY.run_pair(config=CONFIG, pair=pair, bt_result=bt_result)

        if pair_trades is not None and len(pair_trades) > 0:
            print(f"\nGenerated {len(pair_trades)} trading signals:")
            print(pair_trades)
        else:
            print("\nNo trading signals generated")

    except Exception as e:
        print(f"Prediction/Strategy failed: {str(e)}")
        pair_trades = None

## Visualize Predictions and Dis-equilibrium
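The statistics printed at the end of the cell below count rows at or beyond each threshold, which is not the same as counting distinct crossing events. If the number of separate excursions is of interest, an upward crossing can be counted as a transition from below the threshold to at or above it. The helper below is a hypothetical illustration on a plain list.

```python
def count_upward_crossings(values, threshold):
    """Count transitions from below the threshold to at or above it."""
    crossings = 0
    for previous, current in zip(values, values[1:]):
        if previous < threshold <= current:
            crossings += 1
    return crossings


scaled = [0.5, 2.1, 2.4, 2.2, 1.0, 2.5, 0.3]

rows_at_or_above = sum(1 for z in scaled if z >= 2.0)
upward_crossings = count_upward_crossings(scaled, 2.0)
print(rows_at_or_above, upward_crossings)  # 4 2
```

In this example the series spends four rows at or above 2.0 but only rises through that level twice, so the two numbers answer different questions.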

In [None]:
if is_cointegrated and hasattr(pair, 'predicted_df_'):
    fig, axes = plt.subplots(4, 1, figsize=(16, 16))

    # Actual vs Predicted Prices
    colname_a, colname_b = pair.colnames()

    axes[0].plot(pair.predicted_df_['tstamp'], pair.predicted_df_[colname_a],
                 label=f'{SYMBOL_A} Actual', alpha=0.8)
    axes[0].plot(pair.predicted_df_['tstamp'], pair.predicted_df_[f'{colname_a}_pred'],
                 label=f'{SYMBOL_A} Predicted', alpha=0.8, linestyle='--')
    axes[0].set_title('Actual vs Predicted Prices - Symbol A')
    axes[0].set_ylabel('Price')
    axes[0].legend()
    axes[0].grid(True)

    axes[1].plot(pair.predicted_df_['tstamp'], pair.predicted_df_[colname_b],
                 label=f'{SYMBOL_B} Actual', alpha=0.8)
    axes[1].plot(pair.predicted_df_['tstamp'], pair.predicted_df_[f'{colname_b}_pred'],
                 label=f'{SYMBOL_B} Predicted', alpha=0.8, linestyle='--')
    axes[1].set_title('Actual vs Predicted Prices - Symbol B')
    axes[1].set_ylabel('Price')
    axes[1].legend()
    axes[1].grid(True)

    # Raw dis-equilibrium
    axes[2].plot(pair.predicted_df_['tstamp'], pair.predicted_df_['disequilibrium'],
                 color='blue', alpha=0.8, label='Dis-equilibrium')
    axes[2].axhline(y=pair.training_mu_, color='red', linestyle='--', alpha=0.7, label='Training Mean')
    axes[2].set_title('Testing Period: Raw Dis-equilibrium')
    axes[2].set_ylabel('Dis-equilibrium')
    axes[2].legend()
    axes[2].grid(True)

    # Scaled dis-equilibrium with trading signals
    axes[3].plot(pair.predicted_df_['tstamp'], pair.predicted_df_['scaled_disequilibrium'],
                 color='green', alpha=0.8, label='Scaled Dis-equilibrium')

    # Add threshold lines
    axes[3].axhline(y=CONFIG['dis-equilibrium_open_trshld'], color='purple',
                    linestyle=':', alpha=0.7, label=f"Open Threshold ({CONFIG['dis-equilibrium_open_trshld']})")
    axes[3].axhline(y=CONFIG['dis-equilibrium_close_trshld'], color='brown',
                    linestyle=':', alpha=0.7, label=f"Close Threshold ({CONFIG['dis-equilibrium_close_trshld']})")

    # Add trading signals if they exist
    if pair_trades is not None and len(pair_trades) > 0:
        seen_labels = set()
        for _, trade in pair_trades.iterrows():
            is_buy = 'BUY' in trade['action']
            label = f"{trade['action']} {trade['symbol']}"
            # Label each action/symbol combination once so the legend stays short;
            # this does not depend on the DataFrame index being 0, 1, 2, ...
            axes[3].scatter(trade['time'], trade['scaled_disequilibrium'],
                            color='red' if is_buy else 'blue',
                            marker='^' if is_buy else 'v', s=100, alpha=0.8,
                            label=label if label not in seen_labels else '_nolegend_')
            seen_labels.add(label)

    axes[3].set_title('Testing Period: Scaled Dis-equilibrium with Trading Signals')
    axes[3].set_ylabel('Scaled Dis-equilibrium')
    axes[3].set_xlabel('Time')
    axes[3].legend()
    axes[3].grid(True)

    plt.tight_layout()
    plt.show()

    # Print prediction statistics
    print("\nTesting dis-equilibrium statistics:")
    print(f" Mean: {pair.predicted_df_['disequilibrium'].mean():.6f}")
    print(f" Std: {pair.predicted_df_['disequilibrium'].std():.6f}")
    print(f" Min: {pair.predicted_df_['disequilibrium'].min():.6f}")
    print(f" Max: {pair.predicted_df_['disequilibrium'].max():.6f}")

    print("\nTesting scaled dis-equilibrium statistics:")
    print(f" Mean: {pair.predicted_df_['scaled_disequilibrium'].mean():.6f}")
    print(f" Std: {pair.predicted_df_['scaled_disequilibrium'].std():.6f}")
    print(f" Min: {pair.predicted_df_['scaled_disequilibrium'].min():.6f}")
    print(f" Max: {pair.predicted_df_['scaled_disequilibrium'].max():.6f}")

    # Count rows at or beyond each threshold (observations, not distinct crossing events)
    open_crossings = (pair.predicted_df_['scaled_disequilibrium'] >= CONFIG['dis-equilibrium_open_trshld']).sum()
    close_crossings = (pair.predicted_df_['scaled_disequilibrium'] <= CONFIG['dis-equilibrium_close_trshld']).sum()
    print("\nRows at or beyond thresholds:")
    print(f" At or above open threshold ({CONFIG['dis-equilibrium_open_trshld']}): {open_crossings} rows")
    print(f" At or below close threshold ({CONFIG['dis-equilibrium_close_trshld']}): {close_crossings} rows")

## Summary and Analysis

In [None]:
print("=" * 60)
print("PAIRS TRADING ANALYSIS SUMMARY")
print("=" * 60)

print(f"\nPair: {SYMBOL_A} & {SYMBOL_B}")
print(f"Strategy: {type(STRATEGY).__name__}")
print(f"Data file: {DATA_FILE}")
print(f"Training period: {training_minutes} minutes")

print(f"\nCointegration Status: {'✓ COINTEGRATED' if is_cointegrated else '✗ NOT COINTEGRATED'}")

if is_cointegrated:
    print(f"\nVECM Model:")
    print(f" Beta coefficients: {pair.vecm_fit_.beta.flatten()}")
    print(f" Training mean: {pair.training_mu_:.6f}")
    print(f" Training std: {pair.training_std_:.6f}")

    if pair_trades is not None and len(pair_trades) > 0:
        print(f"\nTrading Signals: {len(pair_trades)} generated")
        unique_times = pair_trades['time'].unique()
        print(f" Unique trade times: {len(unique_times)}")

        # Group by time to see paired trades
        for trade_time in unique_times:
            trades_at_time = pair_trades[pair_trades['time'] == trade_time]
            print(f"\n Trade at {trade_time}:")
            for _, trade in trades_at_time.iterrows():
                print(f" {trade['action']} {trade['symbol']} @ ${trade['price']:.2f} (dis-eq: {trade['scaled_disequilibrium']:.2f})")
    else:
        print(f"\nTrading Signals: None generated")
        print(" Possible reasons:")
        print(" - Dis-equilibrium never exceeded open threshold")
        print(" - Insufficient testing data")
        print(" - Strategy-specific conditions not met")

else:
    print("\nCannot proceed with trading strategy - pair is not cointegrated")
    print("Consider:")
    print(" - Trying different symbol pairs")
    print(" - Adjusting training period length")
    print(" - Using different data timeframe")

print("\n" + "=" * 60)

## Interactive Analysis (Optional)

You can modify the parameters below and re-run the analysis:

In [None]:
# Interactive parameter adjustment
print("Current parameters:")
print(f" Open threshold: {CONFIG['dis-equilibrium_open_trshld']}")
print(f" Close threshold: {CONFIG['dis-equilibrium_close_trshld']}")
print(f" Training minutes: {CONFIG['training_minutes']}")

# Uncomment and modify these to experiment:
# CONFIG['dis-equilibrium_open_trshld'] = 1.5
# CONFIG['dis-equilibrium_close_trshld'] = 0.3
# CONFIG['training_minutes'] = 180

print("\nTo re-run with different parameters:")
print("1. Modify the parameters above")
print("2. Re-run from the 'Split Data into Training and Testing' cell")
print("3. Or try different symbol pairs by changing SYMBOL_A and SYMBOL_B")
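
Editing `CONFIG` in place also changes `EQT_CONFIG` or `CRYPTO_CONFIG`, because `CONFIG` is another name for the same dictionary. One way to keep the originals intact, sketched here with a hypothetical helper `with_overrides` that is not part of the project, is to experiment on a deep copy:

```python
import copy


def with_overrides(config, overrides):
    """Return a deep copy of config with the given keys replaced."""
    unknown = set(overrides) - set(config)
    if unknown:
        # Catch typos such as a misspelled threshold key.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    updated = copy.deepcopy(config)
    updated.update(overrides)
    return updated


# Reduced stand-in for EQT_CONFIG, using key names from this notebook.
base = {
    "dis-equilibrium_open_trshld": 2.0,
    "dis-equilibrium_close_trshld": 1.0,
    "training_minutes": 120,
}

trial = with_overrides(base, {"dis-equilibrium_open_trshld": 1.5, "training_minutes": 180})
print(trial["dis-equilibrium_open_trshld"], base["dis-equilibrium_open_trshld"])
```

Assigning the result to `CONFIG` and re-running from the 'Split Data into Training and Testing' cell would apply the trial values while leaving the original dictionaries available for comparison.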