Feature Engineering
Overview
Feature engineering transforms raw data into predictors that can be used to forecast trading opportunities. In Merlin, it extracts variables from MesoSim backtest data and enriches them with additional calculations to build a broad set of candidate predictors.
Feature quality often matters more to a trading model's effectiveness than the choice of algorithm. Well-engineered features capture market dynamics and relationships that are not apparent in the raw data.
Feature Categories
Merlin can extract and calculate various types of features from option strategy data:
Implied Volatility (IV) and Historical Volatility (HV) Features
Implied volatility reflects the market's expectation of future price movement and risk, while historical volatility measures the movement that has already been realized:
| Feature Type | Description | Example |
|---|---|---|
| Leg IV | IV of individual option legs | entry_leg1_iv, entry_leg2_iv |
| Underlying IV | IV of the underlying asset | entry_underlying_iv |
| IV Ratios | Ratios between different IVs | entry_leg1_iv_by_underlying_iv, entry_leg1_iv_by_leg2_iv |
| Underlying HV | Historical volatility of the underlying | entry_underlying_hv |
| IV and HV Ratios | Ratios between an IV and the underlying HV | entry_leg1_iv_by_underlying_hv, entry_underlying_iv_by_hv_ratio |
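The ratio features in the table are simple divisions of one column by another. The following is a minimal sketch of how such a column could be derived with pandas; it is not Merlin's implementation. The helper name `add_ratio` is hypothetical, the sample values are made up, and mapping a zero denominator to NaN is an assumption made here to avoid infinite values.

```python
import numpy as np
import pandas as pd


def add_ratio(df: pd.DataFrame, numerator: str, denominator: str, name: str) -> pd.DataFrame:
    """Return a copy of df with a ratio column; a zero denominator yields NaN, not inf."""
    out = df.copy()
    denom = out[denominator].replace(0, np.nan)  # assumption: treat zero as missing
    out[name] = out[numerator] / denom
    return out


# Illustrative values only, not real backtest output.
trades = pd.DataFrame({
    "entry_leg1_iv": [0.20, 0.30, 0.25],
    "entry_leg2_iv": [0.10, 0.15, 0.00],
    "entry_underlying_iv": [0.16, 0.20, 0.25],
})

trades = add_ratio(trades, "entry_leg1_iv", "entry_underlying_iv",
                   "entry_leg1_iv_by_underlying_iv")
trades = add_ratio(trades, "entry_leg1_iv", "entry_leg2_iv",
                   "entry_leg1_iv_by_leg2_iv")
print(trades)
```

The same pattern applies to the Greek and price ratios listed further down, since they are also column-by-column divisions.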
Greeks Features
Option Greeks measure the sensitivity of option prices to various factors:
| Feature Type | Description | Example |
|---|---|---|
| Delta | Price sensitivity to underlying movement | entry_leg1_delta, pos_delta |
| Gamma | Delta sensitivity to underlying movement | entry_leg1_gamma, pos_gamma |
| Theta | Price sensitivity to the passage of time (time decay) | entry_leg1_theta, pos_theta |
| Vega | Price sensitivity to IV changes | entry_leg1_vega, pos_vega |
| Greek Ratios | Derived metrics from Greeks | entry_theta_by_delta, entry_gamma_by_theta |
Price Features
Price-based features capture market valuation metrics:
| Feature Type | Description | Example |
|---|---|---|
| Leg Prices | Option prices for each leg | entry_leg1_price, entry_leg2_price |
| Price Ratios | Derived metrics from prices | entry_leg1_price_by_leg2_price |
| Underlying Price | Price of the underlying asset | entry_underlying_price |
User Variables
Custom variables defined in the strategy definition file can be included as features, capturing strategy-specific logic.
Feature Transformations
Merlin can apply transformations to features, which may improve their predictive power:
Z-Score Normalization
Standardizes a feature by subtracting its rolling mean and dividing by its rolling standard deviation, which puts features with different scales on a comparable footing:
    z_score(feature) = (feature - rolling_mean(feature)) / rolling_std(feature)
The z-score is computed over multiple lookback periods.
Enable transformations with the --add-feature-transforms flag.
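The z-score formula above can be sketched with pandas rolling windows. This is an illustration under assumptions, not Merlin's code: the function name `add_rolling_zscores`, the lookback lengths `(3, 5)`, and the `_zscore_<n>` column suffix are all hypothetical, and the window here includes the current row.

```python
import numpy as np
import pandas as pd


def add_rolling_zscores(df: pd.DataFrame, column: str, lookbacks=(3, 5)) -> pd.DataFrame:
    """Add one rolling z-score column per lookback window (hypothetical naming)."""
    out = df.copy()
    for n in lookbacks:
        mean = out[column].rolling(n).mean()
        # A flat window has zero std; map it to NaN so the z-score is NaN, not inf.
        std = out[column].rolling(n).std().replace(0, np.nan)
        out[f"{column}_zscore_{n}"] = (out[column] - mean) / std
    return out


# Illustrative values only.
frame = pd.DataFrame({"entry_underlying_iv": [0.10, 0.20, 0.30, 0.40, 0.50]})
frame = add_rolling_zscores(frame, "entry_underlying_iv")
print(frame)
```

The first `n - 1` rows of each z-score column are NaN because the window is not yet full. pandas uses the sample standard deviation (`ddof=1`) by default.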
Adding External Features
Using CSV Files
Merlin can import external features from CSV files via the AddCsv config.
The CSV file should contain:
- A 'DateTime' column matching the backtest timestamps
- Feature columns with predictive variables
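To show what a CSV in this shape looks like and how its columns line up with backtest rows, here is a small pandas sketch. It only illustrates a timestamp join; how Merlin performs the join internally is not described here. The feature name `vix_close`, the timestamps, and all values are made up for the example.

```python
import io

import pandas as pd

# A CSV with the required 'DateTime' column plus one hypothetical feature column.
csv_text = """DateTime,vix_close
2023-01-03 10:00:00,22.9
2023-01-04 10:00:00,22.0
"""
external = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateTime"])

# Stand-in for backtest rows (illustrative values).
backtest = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2023-01-03 10:00:00",
        "2023-01-04 10:00:00",
        "2023-01-05 10:00:00",
    ]),
    "entry_underlying_price": [3824.0, 3853.0, 3808.0],
})

# A left join keeps every backtest row; timestamps missing from the CSV become NaN.
merged = backtest.merge(external, on="DateTime", how="left")
print(merged)
```

The third row has no match in the CSV, so its `vix_close` is NaN. This is why the timestamps in the file need to match the backtest timestamps exactly.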
Using Code Extension
For more complex feature engineering, you can extend the feature_engineering.py module:
- Add new feature calculation functions
- Register them in the feature extraction pipeline
- Expose them via configuration options
Example of extending with a new feature category:
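    def add_custom_indicators(df, prefix=""):
        """Add custom technical indicators."""
        # calculate_indicator_1 / calculate_indicator_2 are placeholders for your own functions.
        df[f'{prefix}custom_indicator_1'] = calculate_indicator_1(df)
        df[f'{prefix}custom_indicator_2'] = calculate_indicator_2(df)
        return df

    # Then call it from the main extraction function, guarded by a boolean option.
    # The flag name (include_custom_indicators, hypothetical) must differ from the
    # function name: testing the function object itself is always truthy.
    if include_custom_indicators:
        df = add_custom_indicators(df, prefix="entry_")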
Pandas-TA Integration
Merlin integrates with pandas-ta-openbb, a technical analysis library that provides more than a hundred indicators. These can be incorporated as features by extending the feature engineering code:
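    import pandas_ta as ta

    def add_ta_indicators(df, column, prefix=""):
        """Add technical analysis indicators from pandas-ta."""
        # RSI over a 14-period window
        df[f'{prefix}rsi_14'] = ta.rsi(df[column], length=14)
        # MACD returns a DataFrame with the MACD line, histogram and signal columns;
        # it can be None when there are too few rows, so guard before joining.
        macd = ta.macd(df[column])
        if macd is not None:
            df = df.join(macd.add_prefix(prefix))
        return df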
Feature Selection
After generating features, Merlin uses Cramér's V, a measure of association between categorical variables, to identify those with the strongest predictive power.
Configure feature selection with command-line parameters:
    merlin discover-predictors <strategy-json> --cramer-feature-grouping=tails --cramer-bins-or-tails=0.1
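For intuition about the statistic, the sketch below computes Cramér's V from a contingency table with pandas and numpy. It is not Merlin's implementation. It assumes that `tails` grouping with a value of 0.1 means labelling the bottom and top 10% of a feature against the middle, which is a guess at the flag semantics rather than documented behavior. The `win` outcome column and both helper names are hypothetical, and no bias correction is applied.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V from the chi-square statistic of the x-by-y contingency table."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def tail_groups(feature: pd.Series, tail: float = 0.1) -> pd.Series:
    """Label each value as 'low', 'middle' or 'high' by quantile (assumed semantics)."""
    lo, hi = feature.quantile(tail), feature.quantile(1 - tail)
    labels = np.where(feature <= lo, "low", np.where(feature >= hi, "high", "middle"))
    return pd.Series(labels, index=feature.index)


# Synthetic data: the outcome is 1 only in the top tail of the feature.
feature = pd.Series(np.arange(20, dtype=float))
win = (feature >= 18).astype(int)

groups = tail_groups(feature, tail=0.1)
print(groups.value_counts())
print(cramers_v(groups, win))
```

Cramér's V ranges from 0 (no association) to 1 (the grouping fully determines the outcome). The synthetic data is built so that the top tail fully determines `win`, which puts it at the upper end of that range.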