{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sliding Fit Strategy Visualization Notebook\n",
"\n",
"This notebook is specifically designed for the SlidingFitStrategy, which uses a sliding window approach.\n",
"It re-trains the model every minute and shows how cointegration, model parameters, and trading signals evolve over time.\n",
"You can visualize the dynamic nature of the sliding window and how the relationship between instruments changes."
]
},
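{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (not part of the original pipeline): how a sliding training\n",
"# window advances one minute at a time. The window size and the toy price series\n",
"# below are assumptions for demonstration only; the real notebook uses\n",
"# CONFIG['training_minutes'] and the loaded market data.\n",
"import numpy as np\n",
"\n",
"toy_prices = np.cumsum(np.random.default_rng(0).normal(size=30)) + 100.0\n",
"window = 10  # stands in for training_minutes\n",
"\n",
"for start in range(len(toy_prices) - window):\n",
"    train = toy_prices[start:start + window]  # re-fit the model on this slice\n",
"    test = toy_prices[start + window:start + window + 1]  # predict the next minute\n",
"    if start < 3:\n",
"        print(f'iteration {start}: train [{start}, {start + window}) -> test index {start + window}')"
]
},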
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### \ud83c\udfaf Key Features:\n",
"\n",
"1. **Interactive Configuration**:\n",
"   - Easy switching between CRYPTO and EQUITY configurations\n",
"   - Simple parameter adjustment for thresholds and training periods\n",
"\n",
"2. **Single Pair Focus**:\n",
"   - Instead of running multiple pairs, focuses on one pair at a time\n",
"   - Allows deep analysis of the relationship between two instruments\n",
"\n",
"3. **Step-by-Step Visualization**:\n",
"   - **Raw price data**: Individual prices, normalized comparison, and price ratios\n",
"   - **Training analysis**: Cointegration testing and VECM model fitting\n",
"   - **Dis-equilibrium visualization**: Both raw and scaled dis-equilibrium with threshold lines\n",
"   - **Strategy execution**: Trading signal generation and visualization\n",
"   - **Prediction analysis**: Actual vs. predicted prices with trading signals overlaid\n",
"\n",
"4. **Rich Analytics**:\n",
"   - Cointegration status and VECM model details\n",
"   - Statistical summaries for all stages\n",
"   - Threshold crossing analysis\n",
"   - Trading signal breakdown\n",
"\n",
"5. **Interactive Experimentation**:\n",
"   - Easy parameter modification\n",
"   - Re-run capabilities for different configurations\n",
"   - Support for both StaticFitStrategy and SlidingFitStrategy\n",
"\n",
"### \ud83d\ude80 How to Use:\n",
"\n",
"1. **Start Jupyter**:\n",
"   ```bash\n",
"   cd src/notebooks\n",
"   jupyter notebook pairs_trading_visualization.ipynb\n",
"   ```\n",
"\n",
"2. **Customize Your Analysis**:\n",
"   - Change `SYMBOL_A` and `SYMBOL_B` to your desired trading pair\n",
"   - Switch between the `crypto` and `equity` configurations via `CONFIG_FILE`\n",
"   - Choose your strategy (StaticFitStrategy or SlidingFitStrategy)\n",
"   - Adjust thresholds and parameters as needed\n",
"\n",
"3. **Run and Visualize**:\n",
"   - Execute cells step by step to see the analysis unfold\n",
"   - Rich matplotlib visualizations show relationships and signals\n",
"   - Comprehensive summary at the end\n",
"\n",
"The notebook provides a way to visualize the relationship between the two instruments and their scaled dis-equilibrium, with every stage of the pairs trading strategy clearly displayed and analyzed.\n"
]
},
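{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch of what 'scaled dis-equilibrium' means here. The exact definition\n",
"# lives in the project code; this assumes the common convention of z-scoring the\n",
"# raw dis-equilibrium by the training-window mean and standard deviation, which\n",
"# matches how pair.training_mu_ and pair.training_std_ are used later in this notebook.\n",
"import numpy as np\n",
"\n",
"raw_diseq = np.array([0.4, -0.1, 0.9, 1.6, 0.2, -0.7])  # toy values, illustration only\n",
"train_mu, train_std = raw_diseq.mean(), raw_diseq.std(ddof=1)\n",
"scaled_diseq = (raw_diseq - train_mu) / train_std\n",
"print('scaled dis-equilibrium:', np.round(scaled_diseq, 2))"
]
},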
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup and Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Trading Parameters Configuration\n",
"# Specify your configuration file, trading symbols, and date here\n",
"\n",
"# Configuration file selection\n",
"CONFIG_FILE = \"equity\"  # Options: \"equity\", \"crypto\", or custom filename (without .cfg extension)\n",
"\n",
"# Trading pair symbols\n",
"SYMBOL_A = \"COIN\"  # Change this to your desired symbol A\n",
"SYMBOL_B = \"MSTR\"  # Change this to your desired symbol B\n",
"\n",
"# Date for data file selection (format: YYYYMMDD)\n",
"TRADING_DATE = \"20250605\"  # Change this to your desired date\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"sys.path.append('..')\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from typing import Dict, List, Optional\n",
"from IPython.display import clear_output\n",
"\n",
"# Import our modules\n",
"from strategies import SlidingFitStrategy, PairState\n",
"from tools.data_loader import load_market_data\n",
"from tools.trading_pair import TradingPair\n",
"from results import BacktestResult\n",
"\n",
"# Set plotting style\n",
"plt.style.use('seaborn-v0_8')\n",
"sns.set_palette(\"husl\")\n",
"plt.rcParams['figure.figsize'] = (15, 10)\n",
"\n",
"print(\"Setup complete!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load configuration from configuration files using HJSON\n",
"import hjson\n",
"import os\n",
"\n",
"def load_config_from_file(config_type=\"equity\"):\n",
"    \"\"\"Load configuration from configuration files using HJSON.\"\"\"\n",
"    config_file = f\"../../configuration/{config_type}.cfg\"\n",
"\n",
"    try:\n",
"        with open(config_file, 'r') as f:\n",
"            # HJSON handles comments, trailing commas, and other human-friendly features\n",
"            config = hjson.load(f)\n",
"\n",
"        # Convert relative paths to absolute paths from the notebook's perspective\n",
"        if 'data_directory' in config:\n",
"            data_dir = config['data_directory']\n",
"            if data_dir.startswith('./'):\n",
"                config['data_directory'] = os.path.abspath(f\"../../{data_dir[2:]}\")\n",
"\n",
"        return config\n",
"\n",
"    except FileNotFoundError:\n",
"        print(f\"Configuration file not found: {config_file}\")\n",
"        return None\n",
"    except hjson.HjsonDecodeError as e:\n",
"        print(f\"HJSON parsing error in {config_file}: {e}\")\n",
"        return None\n",
"    except Exception as e:\n",
"        print(f\"Unexpected error loading config from {config_file}: {e}\")\n",
"        return None\n"
]
},
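{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Small, self-contained HJSON example (illustrative only; the snippet below is a\n",
"# made-up config fragment, not the project's actual equity.cfg). It shows why\n",
"# hjson.load() is used above: keys need no quotes and comments are allowed\n",
"# (per the cell above, trailing commas are tolerated as well).\n",
"import hjson\n",
"\n",
"example_cfg = '''\n",
"{\n",
"    # comments are allowed in HJSON\n",
"    security_type: \"EQUITY\"\n",
"    training_minutes: 60\n",
"    instruments: [\"COIN\", \"MSTR\"]\n",
"}\n",
"'''\n",
"parsed = hjson.loads(example_cfg)\n",
"print(dict(parsed))"
]
},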
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Trading Parameters:\")\n",
"print(f\"  Configuration: {CONFIG_FILE}\")\n",
"print(f\"  Symbol A: {SYMBOL_A}\")\n",
"print(f\"  Symbol B: {SYMBOL_B}\")\n",
"print(f\"  Trading Date: {TRADING_DATE}\")\n",
"\n",
"# Load the specified configuration\n",
"print(f\"\\nLoading {CONFIG_FILE} configuration using HJSON...\")\n",
"CONFIG = load_config_from_file(CONFIG_FILE)\n",
"\n",
"if CONFIG:\n",
"    print(f\"\u2713 Successfully loaded {CONFIG['security_type']} configuration\")\n",
"    print(f\"  Data directory: {CONFIG['data_directory']}\")\n",
"    print(f\"  Database table: {CONFIG['db_table_name']}\")\n",
"    print(f\"  Exchange: {CONFIG['exchange_id']}\")\n",
"    print(f\"  Training window: {CONFIG['training_minutes']} minutes\")\n",
"    print(f\"  Open threshold: {CONFIG['dis-equilibrium_open_trshld']}\")\n",
"    print(f\"  Close threshold: {CONFIG['dis-equilibrium_close_trshld']}\")\n",
"\n",
"    # Automatically construct the data file name based on the date and config type\n",
"    # if CONFIG['security_type'] == \"CRYPTO\":\n",
"    DATA_FILE = f\"{TRADING_DATE}.mktdata.ohlcv.db\"\n",
"    # elif CONFIG['security_type'] == \"EQUITY\":\n",
"    #     DATA_FILE = f\"{TRADING_DATE}.alpaca_sim_md.db\"\n",
"    # else:\n",
"    #     DATA_FILE = f\"{TRADING_DATE}.mktdata.db\"  # Default fallback\n",
"\n",
"    # Update CONFIG with the specific data file and instruments\n",
"    CONFIG[\"datafiles\"] = [DATA_FILE]\n",
"    CONFIG[\"instruments\"] = [SYMBOL_A, SYMBOL_B]\n",
"\n",
"    print(f\"\\nData Configuration:\")\n",
"    print(f\"  Data File: {DATA_FILE}\")\n",
"    print(f\"  Security Type: {CONFIG['security_type']}\")\n",
"\n",
"    # Verify that the data file exists\n",
"    data_file_path = f\"{CONFIG['data_directory']}/{DATA_FILE}\"\n",
"    if os.path.exists(data_file_path):\n",
"        print(f\"  \u2713 Data file found: {data_file_path}\")\n",
"    else:\n",
"        print(f\"  \u26a0 Data file not found: {data_file_path}\")\n",
"        print(f\"  Please check that the date and file exist in the data directory\")\n",
"\n",
"        # List available files in the data directory\n",
"        try:\n",
"            data_dir = CONFIG['data_directory']\n",
"            if os.path.exists(data_dir):\n",
"                available_files = [f for f in os.listdir(data_dir) if f.endswith('.db')]\n",
"                print(f\"  Available files in {data_dir}:\")\n",
"                for file in sorted(available_files)[:5]:  # Show the first 5 files\n",
"                    print(f\"    - {file}\")\n",
"                if len(available_files) > 5:\n",
"                    print(f\"    ... and {len(available_files)-5} more files\")\n",
"        except Exception as e:\n",
"            print(f\"  Could not list files in data directory: {e}\")\n",
"else:\n",
"    print(\"\u26a0 Failed to load configuration. Please check the configuration file.\")\n",
"    print(\"Available configuration files:\")\n",
"    config_dir = \"../../configuration\"\n",
"    if os.path.exists(config_dir):\n",
"        config_files = [f for f in os.listdir(config_dir) if f.endswith('.cfg')]\n",
"        for file in config_files:\n",
"            print(f\"  - {file}\")\n",
"    else:\n",
"        print(f\"  Configuration directory not found: {config_dir}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Select Trading Pair and Initialize Strategy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize Strategy\n",
"# The trading pair and data file were defined in the previous cells\n",
"\n",
"# Initialize SlidingFitStrategy\n",
"STRATEGY = SlidingFitStrategy()\n",
"\n",
"print(f\"Strategy Initialization:\")\n",
"print(f\"  Selected pair: {SYMBOL_A} & {SYMBOL_B}\")\n",
"print(f\"  Data file: {DATA_FILE}\")\n",
"print(f\"  Strategy: {type(STRATEGY).__name__}\")\n",
"print(f\"\\nStrategy characteristics:\")\n",
"print(f\"  - Sliding window training every minute\")\n",
"print(f\"  - Dynamic cointegration testing\")\n",
"print(f\"  - State-based position management\")\n",
"print(f\"  - Continuous model re-training\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and Prepare Market Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load market data\n",
"datafile_path = f\"{CONFIG['data_directory']}/{DATA_FILE}\"\n",
"print(f\"Loading data from: {datafile_path}\")\n",
"\n",
"market_data_df = load_market_data(datafile_path, config=CONFIG)\n",
"\n",
"print(f\"Loaded {len(market_data_df)} rows of market data\")\n",
"print(f\"Symbols in data: {market_data_df['symbol'].unique()}\")\n",
"print(f\"Time range: {market_data_df['tstamp'].min()} to {market_data_df['tstamp'].max()}\")\n",
"\n",
"# Create trading pair\n",
"pair = TradingPair(\n",
"    market_data=market_data_df,\n",
"    symbol_a=SYMBOL_A,\n",
"    symbol_b=SYMBOL_B,\n",
"    price_column=CONFIG[\"price_column\"]\n",
")\n",
"\n",
"print(f\"\\nCreated trading pair: {pair}\")\n",
"print(f\"Market data shape: {pair.market_data_.shape}\")\n",
"print(f\"Column names: {pair.colnames()}\")\n",
"\n",
"# Calculate the maximum possible number of iterations for the sliding window\n",
"training_minutes = CONFIG[\"training_minutes\"]\n",
"max_iterations = len(pair.market_data_) - training_minutes\n",
"print(f\"\\nSliding window analysis:\")\n",
"print(f\"  Training window size: {training_minutes} minutes\")\n",
"print(f\"  Maximum iterations: {max_iterations}\")\n",
"print(f\"  Total analysis time: ~{max_iterations} minutes\")\n",
"\n",
"# Display sample data\n",
"print(f\"\\nSample data:\")\n",
"display(pair.market_data_.head())"
]
},
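{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional quick look (added illustration, not part of the original pipeline):\n",
"# plot both legs normalized to their first observation, mirroring the\n",
"# 'normalized comparison' described in the feature list above. Assumes\n",
"# pair.market_data_ and pair.colnames() behave as used elsewhere in this notebook.\n",
"colname_a, colname_b = pair.colnames()\n",
"norm_a = pair.market_data_[colname_a] / pair.market_data_[colname_a].iloc[0]\n",
"norm_b = pair.market_data_[colname_b] / pair.market_data_[colname_b].iloc[0]\n",
"\n",
"fig, ax = plt.subplots(figsize=(15, 5))\n",
"ax.plot(pair.market_data_['tstamp'], norm_a, label=f'{SYMBOL_A} (normalized)', alpha=0.8)\n",
"ax.plot(pair.market_data_['tstamp'], norm_b, label=f'{SYMBOL_B} (normalized)', alpha=0.8)\n",
"ax.set_title('Normalized Price Comparison')\n",
"ax.set_xlabel('Time')\n",
"ax.set_ylabel('Price / first price')\n",
"ax.legend()\n",
"ax.grid(True)\n",
"plt.show()"
]
},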
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run SlidingFitStrategy with Real-Time Visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the sliding strategy with detailed tracking\n",
"print(f\"Running SlidingFitStrategy on {pair}...\")\n",
"print(f\"This will process {max_iterations} minutes of data with sliding training windows.\\n\")\n",
"\n",
"# Initialize tracking variables\n",
"iteration_data = []\n",
"cointegration_history = []\n",
"beta_history = []\n",
"alpha_history = []\n",
"state_history = []\n",
"disequilibrium_history = []\n",
"scaled_disequilibrium_history = []\n",
"timestamp_history = []\n",
"training_mu_history = []\n",
"training_std_history = []\n",
"\n",
"# Initialize the strategy state\n",
"pair.user_data_['state'] = PairState.INITIAL\n",
"pair.user_data_[\"trades\"] = pd.DataFrame(columns=STRATEGY.TRADES_COLUMNS)\n",
"pair.user_data_[\"is_cointegrated\"] = False\n",
"\n",
"bt_result = BacktestResult(config=CONFIG)\n",
"training_minutes = CONFIG[\"training_minutes\"]\n",
"open_threshold = CONFIG[\"dis-equilibrium_open_trshld\"]\n",
"close_threshold = CONFIG[\"dis-equilibrium_close_trshld\"]\n",
"\n",
"# Limit iterations for demonstration (change this to max_iterations for a full run)\n",
"max_demo_iterations = min(200, max_iterations)  # Process the first 200 minutes\n",
"print(f\"Processing first {max_demo_iterations} iterations for demonstration...\\n\")\n",
"\n",
"for curr_training_start_idx in range(max_demo_iterations):\n",
"    if curr_training_start_idx % 20 == 0:\n",
"        print(f\"Processing iteration {curr_training_start_idx}/{max_demo_iterations}...\")\n",
"\n",
"    # Get the datasets for this iteration\n",
"    pair.get_datasets(\n",
"        training_minutes=training_minutes,\n",
"        training_start_index=curr_training_start_idx,\n",
"        testing_size=1\n",
"    )\n",
"\n",
"    if len(pair.training_df_) < training_minutes:\n",
"        print(f\"Iteration {curr_training_start_idx}: Not enough training data. Stopping.\")\n",
"        break\n",
"\n",
"    # Record the timestamp for this iteration\n",
"    current_timestamp = pair.testing_df_['tstamp'].iloc[0] if len(pair.testing_df_) > 0 else None\n",
"    timestamp_history.append(current_timestamp)\n",
"\n",
"    # Train and test cointegration\n",
"    try:\n",
"        is_cointegrated = pair.train_pair()\n",
"        cointegration_history.append(is_cointegrated)\n",
"\n",
"        if is_cointegrated:\n",
"            # Record model parameters\n",
"            beta_history.append(pair.vecm_fit_.beta.flatten())\n",
"            alpha_history.append(pair.vecm_fit_.alpha.flatten())\n",
"            training_mu_history.append(pair.training_mu_)\n",
"            training_std_history.append(pair.training_std_)\n",
"\n",
"            # Generate the prediction for the current minute\n",
"            pair.predict()\n",
"\n",
"            if len(pair.predicted_df_) > 0:\n",
"                current_disequilibrium = pair.predicted_df_['disequilibrium'].iloc[0]\n",
"                current_scaled_disequilibrium = pair.predicted_df_['scaled_disequilibrium'].iloc[0]\n",
"                disequilibrium_history.append(current_disequilibrium)\n",
"                scaled_disequilibrium_history.append(current_scaled_disequilibrium)\n",
"            else:\n",
"                disequilibrium_history.append(np.nan)\n",
"                scaled_disequilibrium_history.append(np.nan)\n",
"        else:\n",
"            # No cointegration\n",
"            beta_history.append(None)\n",
"            alpha_history.append(None)\n",
"            training_mu_history.append(np.nan)\n",
"            training_std_history.append(np.nan)\n",
"            disequilibrium_history.append(np.nan)\n",
"            scaled_disequilibrium_history.append(np.nan)\n",
"\n",
"    except Exception as e:\n",
"        print(f\"Iteration {curr_training_start_idx}: Training failed: {str(e)}\")\n",
"        cointegration_history.append(False)\n",
"        beta_history.append(None)\n",
"        alpha_history.append(None)\n",
"        training_mu_history.append(np.nan)\n",
"        training_std_history.append(np.nan)\n",
"        disequilibrium_history.append(np.nan)\n",
"        scaled_disequilibrium_history.append(np.nan)\n",
"\n",
"    # Record the current state\n",
"    current_state = pair.user_data_.get('state', PairState.INITIAL)\n",
"    state_history.append(current_state)\n",
"\n",
"print(f\"\\nCompleted {len(cointegration_history)} iterations\")\n",
"print(f\"Cointegration rate: {sum(cointegration_history)/len(cointegration_history)*100:.1f}%\")"
]
},
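{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added convenience (not part of the original notebook flow): collect the tracked\n",
"# histories into a single DataFrame for easier inspection and export. Assumes the\n",
"# lists below were all populated in lockstep by the loop above.\n",
"iteration_summary_df = pd.DataFrame({\n",
"    'tstamp': timestamp_history,\n",
"    'cointegrated': cointegration_history,\n",
"    'disequilibrium': disequilibrium_history,\n",
"    'scaled_disequilibrium': scaled_disequilibrium_history,\n",
"    'state': [str(s) for s in state_history],\n",
"})\n",
"display(iteration_summary_df.head(10))\n",
"print(f'Rows collected: {len(iteration_summary_df)}')"
]
},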
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize Sliding Window Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a comprehensive visualization of the sliding window results\n",
"fig, axes = plt.subplots(6, 1, figsize=(18, 24))\n",
"\n",
"# Filter valid timestamps\n",
"valid_timestamps = [ts for ts in timestamp_history if ts is not None]\n",
"n_points = len(valid_timestamps)\n",
"\n",
"if n_points == 0:\n",
"    print(\"No valid data points to visualize\")\n",
"else:\n",
"    # 1. Cointegration Status Over Time\n",
"    cointegration_values = [1 if coint else 0 for coint in cointegration_history[:n_points]]\n",
"    axes[0].plot(valid_timestamps, cointegration_values, 'o-', alpha=0.7, markersize=3)\n",
"    axes[0].fill_between(valid_timestamps, cointegration_values, alpha=0.3)\n",
"    axes[0].set_title('Cointegration Status Over Time (1=Cointegrated, 0=Not Cointegrated)')\n",
"    axes[0].set_ylabel('Cointegrated')\n",
"    axes[0].set_ylim(-0.1, 1.1)\n",
"    axes[0].grid(True)\n",
"\n",
"    # 2. Beta Coefficients Evolution\n",
"    valid_betas = []\n",
"    beta_timestamps = []\n",
"    for i, beta in enumerate(beta_history[:n_points]):\n",
"        if beta is not None and i < len(valid_timestamps):\n",
"            valid_betas.append(beta)\n",
"            beta_timestamps.append(valid_timestamps[i])\n",
"\n",
"    if valid_betas:\n",
"        beta_array = np.array(valid_betas)\n",
"        axes[1].plot(beta_timestamps, beta_array[:, 1], 'o-', alpha=0.7, markersize=2,\n",
"                     label='Beta[1]', color='red')\n",
"        axes[1].set_title('VECM Beta[1] Coefficient Evolution (Beta[0] = 1.0 by normalization)')\n",
"        axes[1].set_ylabel('Beta[1] Value')\n",
"        axes[1].legend()\n",
"        axes[1].grid(True)\n",
"\n",
"    # 3. Training Mean and Std Evolution\n",
"    valid_mu = [mu for mu in training_mu_history[:n_points] if not np.isnan(mu)]\n",
"    valid_std = [std for std in training_std_history[:n_points] if not np.isnan(std)]\n",
"    mu_timestamps = [valid_timestamps[i] for i, mu in enumerate(training_mu_history[:n_points]) if not np.isnan(mu)]\n",
"\n",
"    if valid_mu:\n",
"        axes[2].plot(mu_timestamps, valid_mu, 'b-', alpha=0.7, label='Training Mean', linewidth=1)\n",
"        ax2_twin = axes[2].twinx()\n",
"        ax2_twin.plot(mu_timestamps, valid_std, 'r-', alpha=0.7, label='Training Std', linewidth=1)\n",
"        axes[2].set_title('Training Dis-equilibrium Statistics Evolution')\n",
"        axes[2].set_ylabel('Mean', color='b')\n",
"        ax2_twin.set_ylabel('Std', color='r')\n",
"        axes[2].grid(True)\n",
"        axes[2].legend(loc='upper left')\n",
"        ax2_twin.legend(loc='upper right')\n",
"\n",
"    # 4. Raw Dis-equilibrium Over Time\n",
"    valid_diseq = [diseq for diseq in disequilibrium_history[:n_points] if not np.isnan(diseq)]\n",
"    diseq_timestamps = [valid_timestamps[i] for i, diseq in enumerate(disequilibrium_history[:n_points]) if not np.isnan(diseq)]\n",
"\n",
"    if valid_diseq:\n",
"        axes[3].plot(diseq_timestamps, valid_diseq, 'g-', alpha=0.7, linewidth=1)\n",
"        # Add a rolling mean\n",
"        if len(valid_diseq) > 10:\n",
"            rolling_mean = pd.Series(valid_diseq).rolling(window=10, min_periods=1).mean()\n",
"            axes[3].plot(diseq_timestamps, rolling_mean, 'r-', alpha=0.8, linewidth=2, label='10-period MA')\n",
"            axes[3].legend()\n",
"        axes[3].set_title('Raw Dis-equilibrium Over Time')\n",
"        axes[3].set_ylabel('Dis-equilibrium')\n",
"        axes[3].grid(True)\n",
"\n",
"    # 5. Scaled Dis-equilibrium with Thresholds\n",
"    valid_scaled_diseq = [diseq for diseq in scaled_disequilibrium_history[:n_points] if not np.isnan(diseq)]\n",
"    scaled_diseq_timestamps = [valid_timestamps[i] for i, diseq in enumerate(scaled_disequilibrium_history[:n_points]) if not np.isnan(diseq)]\n",
"\n",
"    if valid_scaled_diseq:\n",
"        axes[4].plot(scaled_diseq_timestamps, valid_scaled_diseq, 'purple', alpha=0.7, linewidth=1)\n",
"        axes[4].axhline(y=open_threshold, color='red', linestyle='--', alpha=0.8,\n",
"                        label=f'Open Threshold ({open_threshold})')\n",
"        axes[4].axhline(y=close_threshold, color='blue', linestyle='--', alpha=0.8,\n",
"                        label=f'Close Threshold ({close_threshold})')\n",
"        axes[4].axhline(y=0, color='black', linestyle='-', alpha=0.5, linewidth=0.5)\n",
"        axes[4].set_title('Scaled Dis-equilibrium with Trading Thresholds')\n",
"        axes[4].set_ylabel('Scaled Dis-equilibrium')\n",
"        axes[4].legend()\n",
"        axes[4].grid(True)\n",
"\n",
"    # 6. Price Data with Training Windows\n",
"    # Show the original price data with an indication of the training windows\n",
"    colname_a, colname_b = pair.colnames()\n",
"    price_data = pair.market_data_[:n_points + training_minutes].copy()\n",
"\n",
"    axes[5].plot(price_data['tstamp'], price_data[colname_a], alpha=0.7, label=f'{SYMBOL_A}', linewidth=1)\n",
"    axes[5].plot(price_data['tstamp'], price_data[colname_b], alpha=0.7, label=f'{SYMBOL_B}', linewidth=1)\n",
"\n",
"    # Highlight training windows\n",
"    for i in range(0, min(n_points, 10), max(1, n_points//20)):  # Show a small sample of the windows\n",
"        start_idx = i\n",
"        end_idx = i + training_minutes\n",
"        if end_idx < len(price_data):\n",
"            window_data = price_data.iloc[start_idx:end_idx]\n",
"            axes[5].axvspan(window_data['tstamp'].iloc[0], window_data['tstamp'].iloc[-1],\n",
"                            alpha=0.1, color='gray')\n",
"\n",
"    axes[5].set_title(f'Price Data with Training Windows (Gray bands show some training windows)')\n",
"    axes[5].set_ylabel('Price')\n",
"    axes[5].set_xlabel('Time')\n",
"    axes[5].legend()\n",
"    axes[5].grid(True)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Print summary statistics\n",
"print(f\"\\n\" + \"=\"*80)\n",
"print(f\"SLIDING WINDOW ANALYSIS SUMMARY\")\n",
"print(f\"=\"*80)\n",
"print(f\"Total iterations processed: {n_points}\")\n",
"print(f\"Cointegration episodes: {sum(cointegration_history[:n_points])}\")\n",
"print(f\"Cointegration rate: {sum(cointegration_history[:n_points])/n_points*100:.1f}%\")\n",
"if valid_betas:\n",
"    print(f\"Beta coefficient stability: Std = {np.std(beta_array, axis=0)}\")\n",
"if valid_scaled_diseq:\n",
"    threshold_breaches = sum(1 for x in valid_scaled_diseq if abs(x) > open_threshold)\n",
"    print(f\"Open threshold breaches: {threshold_breaches} ({threshold_breaches/len(valid_scaled_diseq)*100:.1f}%)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyze Training Window Evolution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Detailed analysis of how the training windows evolve\n",
"print(\"TRAINING WINDOW EVOLUTION ANALYSIS\")\n",
"print(\"=\"*50)\n",
"\n",
"# Analyze cointegration stability\n",
"if len(cointegration_history) > 1:\n",
"    # Find cointegration change points\n",
"    change_points = []\n",
"    for i in range(1, len(cointegration_history)):\n",
"        if cointegration_history[i] != cointegration_history[i-1]:\n",
"            change_points.append((i, cointegration_history[i], valid_timestamps[i] if i < len(valid_timestamps) else None))\n",
"\n",
"    print(f\"\\nCointegration Change Points:\")\n",
"    if change_points:\n",
"        for idx, status, timestamp in change_points[:10]:  # Show the first 10\n",
"            status_str = \"GAINED\" if status else \"LOST\"\n",
"            print(f\"  Iteration {idx}: {status_str} cointegration at {timestamp}\")\n",
"        if len(change_points) > 10:\n",
"            print(f\"  ... and {len(change_points)-10} more changes\")\n",
"    else:\n",
"        print(f\"  No cointegration changes detected\")\n",
"\n",
"# Analyze beta stability when cointegrated\n",
"if valid_betas and len(valid_betas) > 10:\n",
"    beta_df = pd.DataFrame(valid_betas, columns=[f'Beta_{i}' for i in range(len(valid_betas[0]))])\n",
"    beta_df['timestamp'] = beta_timestamps\n",
"\n",
"    print(f\"\\nBeta Coefficient Analysis:\")\n",
"    print(f\"  Number of valid beta estimates: {len(valid_betas)}\")\n",
"    print(f\"  Beta statistics:\")\n",
"    for col in beta_df.columns[:-1]:  # Exclude the timestamp column\n",
"        print(f\"    {col}: Mean={beta_df[col].mean():.4f}, Std={beta_df[col].std():.4f}\")\n",
"\n",
"    # Check for beta regime changes\n",
"    beta_changes = []\n",
"    threshold = 0.1  # absolute change threshold\n",
"    for i in range(1, len(valid_betas)):\n",
"        if np.any(np.abs(np.array(valid_betas[i]) - np.array(valid_betas[i-1])) > threshold):\n",
"            beta_changes.append(i)\n",
"\n",
"    print(f\"  Significant beta changes (absolute change > {threshold}): {len(beta_changes)}\")\n",
"    if beta_changes:\n",
"        print(f\"  Change frequency: {len(beta_changes)/len(valid_betas)*100:.1f}% of cointegrated periods\")\n",
"\n",
"# Analyze dis-equilibrium characteristics\n",
"if valid_scaled_diseq:\n",
"    scaled_diseq_series = pd.Series(valid_scaled_diseq)\n",
"\n",
"    print(f\"\\nDis-equilibrium Analysis:\")\n",
"    print(f\"  Mean: {scaled_diseq_series.mean():.4f}\")\n",
"    print(f\"  Std: {scaled_diseq_series.std():.4f}\")\n",
"    print(f\"  Min: {scaled_diseq_series.min():.4f}\")\n",
"    print(f\"  Max: {scaled_diseq_series.max():.4f}\")\n",
"\n",
"    # Threshold analysis\n",
"    open_breaches = sum(1 for x in valid_scaled_diseq if abs(x) >= open_threshold)\n",
"    close_opportunities = sum(1 for x in valid_scaled_diseq if abs(x) <= close_threshold)\n",
"\n",
"    print(f\"  Open threshold breaches: {open_breaches} ({open_breaches/len(valid_scaled_diseq)*100:.1f}%)\")\n",
"    print(f\"  Close opportunities: {close_opportunities} ({close_opportunities/len(valid_scaled_diseq)*100:.1f}%)\")\n",
"\n",
"    # Mean reversion analysis\n",
"    zero_crossings = 0\n",
"    for i in range(1, len(valid_scaled_diseq)):\n",
"        if (valid_scaled_diseq[i-1] * valid_scaled_diseq[i]) < 0:  # Sign change\n",
"            zero_crossings += 1\n",
"\n",
"    print(f\"  Zero crossings (mean reversion events): {zero_crossings}\")\n",
"    if zero_crossings > 0:\n",
"        print(f\"  Average time between mean reversions: {len(valid_scaled_diseq)/zero_crossings:.1f} minutes\")"
]
},
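{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added illustration (not part of the original analysis): a rolling cointegration\n",
"# rate as a simple early-warning signal for cointegration breakdown, using the\n",
"# cointegration_history list collected above. The 30-minute window and the 0.5\n",
"# alert level are arbitrary choices for demonstration.\n",
"coint_series = pd.Series([1 if c else 0 for c in cointegration_history])\n",
"rolling_rate = coint_series.rolling(window=30, min_periods=10).mean()\n",
"\n",
"fig, ax = plt.subplots(figsize=(15, 4))\n",
"ax.plot(rolling_rate, label='30-minute rolling cointegration rate')\n",
"ax.axhline(y=0.5, color='red', linestyle='--', alpha=0.8, label='Example alert level (0.5)')\n",
"ax.set_xlabel('Iteration')\n",
"ax.set_ylabel('Rolling rate')\n",
"ax.set_title('Rolling Cointegration Rate (illustrative breakdown monitor)')\n",
"ax.legend()\n",
"ax.grid(True)\n",
"plt.show()"
]
},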
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run Complete Strategy (Optional)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: Run the complete strategy to generate actual trades\n",
"# Warning: This may take several minutes depending on data size\n",
"\n",
"RUN_COMPLETE_STRATEGY = False  # Set to True to run the full strategy\n",
"\n",
"if RUN_COMPLETE_STRATEGY:\n",
"    print(\"Running complete SlidingFitStrategy...\")\n",
"    print(\"This may take several minutes...\")\n",
"\n",
"    # Reset the strategy state\n",
"    STRATEGY.curr_training_start_idx_ = 0\n",
"\n",
"    # Create new pair and result objects\n",
"    pair_full = TradingPair(\n",
"        market_data=market_data_df,\n",
"        symbol_a=SYMBOL_A,\n",
"        symbol_b=SYMBOL_B,\n",
"        price_column=CONFIG[\"price_column\"]\n",
"    )\n",
"\n",
"    bt_result_full = BacktestResult(config=CONFIG)\n",
"\n",
"    # Run the strategy\n",
"    pair_trades = STRATEGY.run_pair(config=CONFIG, pair=pair_full, bt_result=bt_result_full)\n",
"\n",
"    if pair_trades is not None and len(pair_trades) > 0:\n",
"        print(f\"\\nGenerated {len(pair_trades)} trading signals:\")\n",
"        display(pair_trades)\n",
"\n",
"        # Analyze the trades\n",
"        trade_times = pair_trades['time'].unique()\n",
"        print(f\"\\nTrade Analysis:\")\n",
"        print(f\"  Unique trade times: {len(trade_times)}\")\n",
"        print(f\"  Trade frequency: {len(trade_times)/max_iterations*100:.2f}% of total periods\")\n",
"\n",
"        # Group trades by time\n",
"        for trade_time in trade_times[:5]:  # Show the first 5 trade times\n",
"            trades_at_time = pair_trades[pair_trades['time'] == trade_time]\n",
"            print(f\"\\n  Trade at {trade_time}:\")\n",
"            for _, trade in trades_at_time.iterrows():\n",
"                print(f\"    {trade['action']} {trade['symbol']} @ ${trade['price']:.2f} \"\n",
"                      f\"(dis-eq: {trade['scaled_disequilibrium']:.2f})\")\n",
"    else:\n",
"        print(\"\\nNo trading signals generated\")\n",
"        print(\"Possible reasons:\")\n",
"        print(\"  - Insufficient cointegration periods\")\n",
"        print(\"  - Dis-equilibrium never exceeded thresholds\")\n",
"        print(\"  - Strategy-specific conditions not met\")\n",
"else:\n",
"    print(\"Complete strategy execution is disabled.\")\n",
"    print(\"Set RUN_COMPLETE_STRATEGY = True to run the full strategy.\")\n",
"    print(\"Note: This may take several minutes depending on your data size.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interactive Parameter Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Interactive analysis for parameter optimization\n",
"print(\"PARAMETER SENSITIVITY ANALYSIS\")\n",
"print(\"=\"*40)\n",
"\n",
"print(f\"Current parameters:\")\n",
"print(f\"  Training window: {CONFIG['training_minutes']} minutes\")\n",
"print(f\"  Open threshold: {CONFIG['dis-equilibrium_open_trshld']}\")\n",
"print(f\"  Close threshold: {CONFIG['dis-equilibrium_close_trshld']}\")\n",
"\n",
"# Recommendations based on the observed data\n",
"if valid_scaled_diseq:\n",
"    diseq_stats = pd.Series(valid_scaled_diseq).describe()\n",
"    print(f\"\\nObserved scaled dis-equilibrium statistics:\")\n",
"    print(f\"  75th percentile: {diseq_stats['75%']:.2f}\")\n",
"    print(f\"  95th percentile: {np.percentile(valid_scaled_diseq, 95):.2f}\")\n",
"    print(f\"  99th percentile: {np.percentile(valid_scaled_diseq, 99):.2f}\")\n",
"\n",
"    # Suggest candidate thresholds\n",
"    suggested_open = np.percentile(np.abs(valid_scaled_diseq), 85)\n",
"    suggested_close = np.percentile(np.abs(valid_scaled_diseq), 30)\n",
"\n",
"    print(f\"\\nSuggested threshold optimization:\")\n",
"    print(f\"  Suggested open threshold: {suggested_open:.2f} (85th percentile)\")\n",
"    print(f\"  Suggested close threshold: {suggested_close:.2f} (30th percentile)\")\n",
"\n",
"    if suggested_open != open_threshold or suggested_close != close_threshold:\n",
"        print(f\"\\nTo test these parameters, modify the CONFIG dictionary:\")\n",
"        print(f\"  CONFIG['dis-equilibrium_open_trshld'] = {suggested_open:.2f}\")\n",
"        print(f\"  CONFIG['dis-equilibrium_close_trshld'] = {suggested_close:.2f}\")\n",
"\n",
"# Training window recommendations\n",
"if len(cointegration_history) > 0:\n",
"    cointegration_rate = sum(cointegration_history)/len(cointegration_history)\n",
"    print(f\"\\nTraining window analysis:\")\n",
"    print(f\"  Current cointegration rate: {cointegration_rate*100:.1f}%\")\n",
"\n",
"    if cointegration_rate < 0.3:\n",
"        print(f\"  Recommendation: Consider increasing the training window (current: {training_minutes})\")\n",
"        print(f\"  Suggested: {int(training_minutes * 1.5)} minutes\")\n",
"    elif cointegration_rate > 0.8:\n",
"        print(f\"  Recommendation: Consider decreasing the training window for a more responsive model\")\n",
"        print(f\"  Suggested: {int(training_minutes * 0.75)} minutes\")\n",
"    else:\n",
"        print(f\"  The current training window appears appropriate\")\n",
"\n",
"print(f\"\\nTo re-run the analysis with different parameters:\")\n",
"print(f\"1. Modify the CONFIG dictionary above\")\n",
"print(f\"2. Re-run from the 'Run SlidingFitStrategy' cell\")\n",
"print(f\"3. Compare results with the current analysis\")"
]
},
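{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added usage sketch (illustrative): apply the suggested thresholds from the cell\n",
"# above and recount how often they would have been breached in the processed window.\n",
"# This only touches in-memory values; re-run from the 'Run SlidingFitStrategy' cell\n",
"# to see the effect on the full analysis.\n",
"if valid_scaled_diseq:\n",
"    trial_open = round(float(suggested_open), 2)\n",
"    trial_close = round(float(suggested_close), 2)\n",
"\n",
"    trial_open_breaches = sum(1 for x in valid_scaled_diseq if abs(x) >= trial_open)\n",
"    trial_close_hits = sum(1 for x in valid_scaled_diseq if abs(x) <= trial_close)\n",
"    print(f'With open={trial_open} and close={trial_close}:')\n",
"    print(f'  open breaches: {trial_open_breaches} ({trial_open_breaches/len(valid_scaled_diseq)*100:.1f}%)')\n",
"    print(f'  close opportunities: {trial_close_hits} ({trial_close_hits/len(valid_scaled_diseq)*100:.1f}%)')\n",
"\n",
"    # Uncomment to adopt the trial thresholds before re-running the loop:\n",
"    # CONFIG['dis-equilibrium_open_trshld'] = trial_open\n",
"    # CONFIG['dis-equilibrium_close_trshld'] = trial_close"
]
},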
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary and Conclusions\n",
"\n",
"This notebook demonstrates the SlidingFitStrategy's dynamic approach to pairs trading.\n",
"Key insights from the sliding window analysis:\n",
"\n",
"1. **Cointegration Stability**: How often the pair maintains cointegration\n",
"2. **Model Parameter Evolution**: How VECM coefficients change over time\n",
"3. **Threshold Effectiveness**: How well current thresholds capture trading opportunities\n",
"4. **Mean Reversion Patterns**: Frequency and timing of dis-equilibrium corrections\n",
"\n",
"The sliding approach allows for:\n",
"- **Adaptive modeling**: Responds to changing market conditions\n",
"- **Dynamic thresholding**: Can be optimized based on observed patterns\n",
"- **Real-time monitoring**: Provides continuous assessment of pair relationships\n",
"- **Risk management**: Early detection of cointegration breakdown"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3.12-venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}