Pairs Trading Backtest
This document provides a guide to understanding, configuring, and running the pairs trading backtest system.
Overview
The system is designed to backtest pairs trading strategies on historical market data. It allows users to select different strategies, configure parameters, and analyze the performance of these strategies.
Core Concepts
Trading Pair
A trading pair consists of two financial instruments (e.g., stocks or cryptocurrencies) whose prices are believed to have a long-term statistical relationship (cointegration). The strategy aims to profit from temporary deviations from this relationship.
Strategy
The system supports different strategies for identifying and exploiting trading opportunities. Each strategy has its own set of configurable parameters.
Trading Signals
Trading signals indicate when to open or close a position based on the configured strategy and parameters. These signals are typically generated when the "dis-equilibrium" (the deviation from the long-term relationship) crosses certain thresholds.
Running a Backtest
1. Configuration
The primary configuration for the backtest is managed in the src/pt_backtest.py file. Here, you will define which dataset to use (cryptocurrencies or equities) and which strategy to employ.
Choosing a Dataset:
You can switch between CRYPTO_CONFIG and EQT_CONFIG by uncommenting the desired configuration block:
# CONFIG = CRYPTO_CONFIG # For cryptocurrency data
CONFIG = EQT_CONFIG # For equity data
Each configuration dictionary specifies:
- data_directory: Path to the data files.
- datafiles: A list of database files to process. You can comment/uncomment specific files to include/exclude them from the backtest.
- db_table_name: The name of the table within the SQLite database.
- instruments: A list of symbols to consider for forming trading pairs.
- trading_hours: Defines the session start and end times, crucial for equity markets.
- stat_model_price: The column in the data to be used as the price (e.g., "close").
- dis-equilibrium_open_trshld: The threshold (in standard deviations) of the dis-equilibrium for opening a trade.
- dis-equilibrium_close_trshld: The threshold (in standard deviations) of the dis-equilibrium for closing an open trade.
- training_minutes: The length of the rolling window (in minutes) used to train the model (e.g., calculate cointegration, mean, and standard deviation of the dis-equilibrium).
- funding_per_pair: The amount of capital allocated to each trading pair.
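As an illustration of the keys listed above, a configuration dictionary might look roughly like the sketch below. Only the key names come from this document; every value (paths, file names, table name, symbols, the shape of trading_hours, the numbers) is a placeholder assumption, and the missing_keys() helper is hypothetical and not part of the project.

```python
# Hypothetical configuration sketch. Key names follow the list above;
# all values are illustrative placeholders, not the project's real settings.
EXAMPLE_CONFIG = {
    "data_directory": "./data/equity",        # assumed path
    "datafiles": [
        "day1.mktdata.db",                     # assumed file name
        # "day2.mktdata.db",                   # commented out: excluded from the run
    ],
    "db_table_name": "bars_1min",              # assumed table name
    "instruments": ["AAA", "BBB", "CCC"],      # placeholder symbols
    "trading_hours": {                          # assumed structure
        "begin_session": "09:30:00",
        "end_session": "16:00:00",
    },
    "stat_model_price": "close",
    "dis-equilibrium_open_trshld": 2.0,        # open at 2 standard deviations
    "dis-equilibrium_close_trshld": 0.5,       # close at 0.5 standard deviations
    "training_minutes": 120,
    "funding_per_pair": 2000.0,
}

REQUIRED_KEYS = {
    "data_directory", "datafiles", "db_table_name", "instruments",
    "trading_hours", "stat_model_price", "dis-equilibrium_open_trshld",
    "dis-equilibrium_close_trshld", "training_minutes", "funding_per_pair",
}


def missing_keys(config):
    """Return the sorted list of required keys absent from a config dict."""
    return sorted(REQUIRED_KEYS - set(config))
```

A quick check such as missing_keys(EXAMPLE_CONFIG) before a long run can catch a misspelled key early.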
Choosing a Strategy:
The system currently offers two main strategies: StaticFitStrategy and SlidingFitStrategy. You select a strategy by instantiating it:
# STRATEGY = StaticFitStrategy()
STRATEGY = SlidingFitStrategy()
- StaticFitStrategy: This strategy fits the cointegration model once at the beginning of each trading day (or for the entire dataset if run on a single file without a rolling window logic in the strategy itself). The parameters (mean, standard deviation of dis-equilibrium) derived from this initial fit are used for generating trading signals throughout the day.
  - Pros: Simpler, computationally less intensive.
  - Cons: May not adapt well to changing market conditions during the day.
- SlidingFitStrategy: This strategy uses a rolling window approach. The cointegration model and its parameters are re-estimated at regular intervals (defined by training_minutes and how the strategy implements the sliding window). This allows the strategy to adapt to evolving market dynamics.
  - Pros: More adaptive to changing market conditions.
  - Cons: Computationally more intensive. The training_minutes parameter is crucial here as it defines the look-back period for each re-estimation.
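The difference between the two strategies can be pictured as a difference in how training and trading windows are laid out over one day of data. The sketch below is only an illustration: it assumes one row per minute, and the function names and the step parameter are hypothetical. How SlidingFitStrategy actually advances its window is not specified in this document.

```python
def static_windows(n_rows, training_minutes):
    """One fit: train on the first `training_minutes` rows, trade the rest.

    Returns a list of (train_start, train_end, test_start, test_end)
    half-open row ranges; empty if there is no data left to trade on.
    """
    if n_rows <= training_minutes:
        return []
    return [(0, training_minutes, training_minutes, n_rows)]


def sliding_windows(n_rows, training_minutes, step=1):
    """Repeated fits: each fit uses the most recent `training_minutes` rows
    and is applied to the next `step` rows (step is an assumed parameter)."""
    windows = []
    train_end = training_minutes
    while train_end < n_rows:
        test_end = min(train_end + step, n_rows)
        windows.append((train_end - training_minutes, train_end, train_end, test_end))
        train_end = test_end
    return windows
```

With 10 rows and training_minutes=4, the static layout produces a single fit, while the sliding layout with step=3 produces two fits, each trained on the four rows immediately preceding the rows it trades.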
2. Parameters for Trading Signals
The key parameters that determine trading signals are primarily found within the CONFIG dictionaries:
- dis-equilibrium_open_trshld: This is the number of standard deviations the current dis-equilibrium must move away from its mean (calculated during the training period) to trigger an opening signal.
  - A higher value means the strategy will wait for a more significant deviation before entering a trade, leading to fewer but potentially more robust signals.
  - A lower value means the strategy will enter trades on smaller deviations, leading to more frequent signals but potentially more false positives.
- dis-equilibrium_close_trshld: This is the number of standard deviations the current dis-equilibrium must revert towards its mean (from its peak deviation) to trigger a closing signal.
  - A higher value (closer to the dis-equilibrium_open_trshld) means the strategy will close trades more quickly as the dis-equilibrium starts to revert.
  - A lower value (closer to zero) means the strategy will hold onto trades longer, waiting for the dis-equilibrium to revert more significantly towards the mean.
- training_minutes:
  - For StaticFitStrategy, this determines the initial period of data used to establish the cointegration relationship and calculate the baseline dis-equilibrium statistics for the entire trading day (or dataset portion being processed).
  - For SlidingFitStrategy, this defines the length of the rolling window. The model is refit using data from the most recent training_minutes period. A shorter window makes the strategy more responsive to recent price action but might be more prone to noise. A longer window provides a more stable model but might be slower to adapt to new trends.
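To make the interaction of the two thresholds concrete, here is a minimal sketch of a signal generator working on already-scaled dis-equilibrium values. It rests on assumptions that this document does not confirm: a position is opened when the scaled value reaches the open threshold in either direction, closed when its absolute value falls to or below the close threshold, and the labels open_short / open_long are hypothetical names.

```python
def threshold_signals(scaled_values, open_threshold, close_threshold):
    """Turn scaled dis-equilibrium values into open/close/hold labels.

    position: 0 = flat, -1 = short the spread, +1 = long the spread.
    """
    position = 0
    labels = []
    for z in scaled_values:
        if position == 0:
            if z >= open_threshold:
                position = -1            # spread is rich: bet on it falling
                labels.append("open_short")
            elif z <= -open_threshold:
                position = 1             # spread is cheap: bet on it rising
                labels.append("open_long")
            else:
                labels.append("hold")
        elif abs(z) <= close_threshold:
            position = 0                 # reverted close enough to the mean
            labels.append("close")
        else:
            labels.append("hold")
    return labels
```

Raising close_threshold towards open_threshold makes the "close" branch fire sooner after entry, which matches the description above of closing trades more quickly.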
3. Running the Script
Once the configuration is set, you can run the backtest from your terminal:
python src/pt_backtest.py
The script will process each datafile specified in the CONFIG, create all possible unique pairs from the instruments list, and apply the chosen strategy.
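Forming all unique pairs from an instrument list can be done with the standard library, as in the sketch below. The function name is hypothetical and the script's own pairing code may differ; note that n instruments yield n(n-1)/2 pairs, so the run time grows quickly with the list length.

```python
from itertools import combinations


def unique_pairs(instruments):
    """Return every unordered pair of distinct instruments, in list order."""
    return list(combinations(instruments, 2))
```

For example, three placeholder symbols ["AAA", "BBB", "CCC"] give three pairs, and four symbols give six.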
4. Interpreting Results
The script will output:
- Progress messages for each datafile being processed.
- A summary of trades taken.
- Grand totals of performance metrics (PnL, etc.).
- A list of any outstanding positions at the end of the backtest.
The core logic for a pair involves:
- Data Preparation: For each pair, relevant price series are extracted.
- Training Phase (for SlidingFitStrategy, this happens repeatedly; for StaticFitStrategy, typically once per day/file):
  - The get_datasets() method in TradingPair splits data into training and testing sets.
  - check_cointegration() uses the Johansen test to see if the pair's price series are cointegrated within the current training window. If not, the pair is often skipped for that window.
  - If cointegrated, fit_VECM() estimates a Vector Error Correction Model (VECM). The beta coefficients from this model define the cointegrating relationship (the "spread" or "dis-equilibrium series").
  - training_mu_ (mean) and training_std_ (standard deviation) of this dis-equilibrium series are calculated. These are crucial for scaling the dis-equilibrium and setting trade thresholds.
- Prediction/Trading Phase:
  - The strategy iterates through the "testing" data points.
  - For each point, the current dis-equilibrium is calculated using the beta from the VECM.
  - This dis-equilibrium is then scaled: (current_disequilibrium - training_mu_) / training_std_.
  - This scaled value is compared against dis-equilibrium_open_trshld and dis-equilibrium_close_trshld to generate buy/sell/close signals.
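The scaling step in the prediction phase can be sketched with plain Python. The example assumes a two-element beta vector applied as a linear combination of the two prices, and it uses the sample standard deviation; whether the project uses the sample or population form is not stated here. The function names are hypothetical, and the beta values are made up for the illustration rather than produced by a VECM fit.

```python
from statistics import mean, stdev


def disequilibrium(prices_a, prices_b, beta):
    """Linear combination beta[0] * A + beta[1] * B for each pair of prices."""
    return [beta[0] * a + beta[1] * b for a, b in zip(prices_a, prices_b)]


def fit_scaling(training_series):
    """Mean and sample standard deviation of the training dis-equilibrium."""
    return mean(training_series), stdev(training_series)


def scale(current_disequilibrium, training_mu, training_std):
    """(current_disequilibrium - training_mu_) / training_std_ from the text."""
    return (current_disequilibrium - training_mu) / training_std


# Made-up training prices and beta, for illustration only.
train = disequilibrium([11.0, 12.0, 13.0], [10.0, 10.0, 10.0], (1.0, -1.0))
mu, sigma = fit_scaling(train)
```

Only training data feeds fit_scaling(); each testing point is then scaled with those fixed statistics, which is what keeps the testing phase free of information from its own future.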
Customizing and Extending
- Adding New Strategies: Create a new class that inherits from a base strategy class (if one exists) or implements a similar interface to StaticFitStrategy or SlidingFitStrategy. The core method to implement would be run_pair().
- Modifying Data Loading: The tools/data_loader.py can be modified to support different data formats or sources.
- Changing Cointegration/Model Parameters: The TradingPair class houses the VECM fitting and cointegration checks. You can adjust parameters like k_ar_diff in coint_johansen or the VECM model itself.
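As a rough picture of what adding a strategy could involve, the sketch below defines a hypothetical abstract base class with a run_pair() method and one toy subclass. The actual run_pair() signature and return type in this project are not documented here, so both the arguments (a list of scaled values plus a config dictionary) and the return value (indices of would-be entries) are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod


class PairsStrategy(ABC):
    """Hypothetical base class; the project may or may not define one."""

    @abstractmethod
    def run_pair(self, scaled_values, config):
        """Process one pair and return the strategy's result for it."""


class OpenOnlyStrategy(PairsStrategy):
    """Toy strategy: report the indices at which an entry would be triggered."""

    def run_pair(self, scaled_values, config):
        threshold = config["dis-equilibrium_open_trshld"]
        return [i for i, z in enumerate(scaled_values) if abs(z) >= threshold]
```

Keeping every strategy behind the same method name means the backtest loop can swap strategies by changing only the line that instantiates STRATEGY.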
Important Considerations
- Data Quality: Ensure your market data is clean, accurate, and properly formatted. Gaps or errors in data can significantly impact backtest results.
- Transaction Costs: The current backtest might not explicitly model transaction costs (brokerage fees, slippage). These can have a significant impact on the profitability of high-frequency strategies. Consider adding a cost model to BacktestResult or within the strategy execution.
- Look-ahead Bias: Be extremely careful to avoid look-ahead bias. Ensure that decisions at any point in time are made using only information that would have been available at that time. The use of training_df_ and testing_df_ in TradingPair is designed to help prevent this.
- Overfitting: When optimizing parameters (dis-equilibrium_open_trshld, training_minutes, etc.), be mindful of overfitting to the historical data. A strategy that performs exceptionally well on past data may not perform well in the future. Use out-of-sample testing or walk-forward optimization for more robust validation.
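For the transaction-cost point above, one simple cost model charges a fixed number of basis points on the notional of every executed leg. The sketch below is an assumption about how such a model could look, not a description of BacktestResult; the function name and the cost_bps parameter are hypothetical.

```python
def net_pnl(gross_pnl, traded_notionals, cost_bps):
    """Subtract proportional costs from gross PnL.

    traded_notionals: notional value of each executed leg.
    cost_bps: assumed all-in cost per leg in basis points (1 bp = 0.01%).
    """
    costs = sum(notional * cost_bps / 10_000 for notional in traded_notionals)
    return gross_pnl - costs
```

A round trip in one pair involves four legs (two to open, two to close), so with 1000.0 of notional per leg and 5 bps per leg, a gross profit of 50.0 shrinks to 48.0. For strategies that trade often on small deviations, this drag can exceed the average gross profit per trade.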
This tutorial should provide a solid foundation for working with the pairs trading backtest system. Experiment with different configurations and strategies to find what works best for your chosen markets and instruments.