
Feature Engineering

Overview

Feature engineering is a critical process in quantitative trading that transforms raw data into meaningful predictors that can be used to forecast trading opportunities. In Merlin, feature engineering extracts and transforms variables from MesoSim backtest data, enriching them with additional calculations to create a comprehensive set of potential predictors.

The quality of features often determines the effectiveness of trading models more than the model algorithm itself. Well-engineered features capture market dynamics and relationships that may not be apparent in raw data.

Feature Categories

Merlin can extract and calculate various types of features from option strategy data:

Implied Volatility (IV) and Historical Volatility Based Features

Implied volatility (IV) metrics reflect the market's expectations of future price movement and risk, while historical volatility (HV) measures the movement the underlying has actually realized:

| Feature Type | Description | Example |
| --- | --- | --- |
| Leg IV | IV of individual option legs | entry_leg1_iv, entry_leg2_iv |
| Underlying IV | IV of the underlying asset | entry_underlying_iv |
| IV Ratios | Derived metrics from different IVs | entry_leg1_iv_by_underlying_iv, entry_leg1_iv_by_leg2_iv |
| Underlying HV | Historical volatility of the underlying | entry_underlying_hv |
| IV and HV Ratios | Derived metrics between HV and different IVs | entry_leg1_iv_by_underlying_hv, entry_underlying_iv_by_hv_ratio |
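To illustrate how the HV and IV/HV features relate, the sketch below computes an annualized close-to-close historical volatility and an IV/HV ratio with pandas. It is not Merlin's implementation: the column names (`underlying_price`, `underlying_iv`), the 20-row window, and the 252-periods-per-year annualization are assumptions, and it expects one row per trading day with IV as an annualized decimal.

```python
import numpy as np
import pandas as pd


def add_hv_features(df: pd.DataFrame, window: int = 20,
                    periods_per_year: int = 252) -> pd.DataFrame:
    """Add close-to-close historical volatility and an IV/HV ratio.

    Hypothetical schema: daily rows with 'underlying_price' and
    'underlying_iv' columns (IV as an annualized decimal, e.g. 0.25).
    """
    out = df.copy()
    # Log returns of the underlying between consecutive rows
    log_ret = np.log(out["underlying_price"]).diff()
    # Annualized rolling standard deviation of log returns (sample std, ddof=1)
    out["underlying_hv"] = log_ret.rolling(window).std() * np.sqrt(periods_per_year)
    # Ratio > 1: options price in more volatility than was recently realized
    out["underlying_iv_by_hv"] = out["underlying_iv"] / out["underlying_hv"]
    return out
```

An IV/HV ratio above 1 is often read as options being "rich" relative to realized movement, which is why such ratios are natural candidates for predictors.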

Greeks Features

Option Greeks measure the sensitivity of option prices to various factors:

| Feature Type | Description | Example |
| --- | --- | --- |
| Delta | Price sensitivity to underlying movement | entry_leg1_delta, pos_delta |
| Gamma | Delta sensitivity to underlying movement | entry_leg1_gamma, pos_gamma |
| Theta | Price sensitivity to the passage of time (time decay) | entry_leg1_theta, pos_theta |
| Vega | Price sensitivity to IV changes | entry_leg1_vega, pos_vega |
| Greek Ratios | Derived metrics from Greeks | entry_theta_by_delta, entry_gamma_by_theta |
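A minimal sketch of how position-level Greeks and Greek ratios could be derived from leg-level columns. The schema (a signed `legN_qty` column per leg, negative meaning short), the omission of the contract multiplier, and computing the ratios from position Greeks are all assumptions for illustration; Merlin's own definitions of `pos_delta` or `entry_theta_by_delta` may differ.

```python
import numpy as np
import pandas as pd


def add_position_greeks(df, legs, greeks=("delta", "gamma", "theta", "vega"),
                        prefix="entry_"):
    """Aggregate per-leg Greeks into position Greeks and add two ratios.

    Hypothetical schema: each leg N has columns such as 'entry_legN_delta'
    and a signed contract count 'legN_qty' (negative = short).
    The contract multiplier is ignored.
    """
    out = df.copy()
    for g in greeks:
        # Position Greek = sum over legs of (signed quantity * leg Greek)
        out[f"pos_{g}"] = sum(
            out[f"leg{n}_qty"] * out[f"{prefix}leg{n}_{g}"] for n in legs
        )
    # Ratios; a zero denominator yields NaN instead of +/-inf
    out[f"{prefix}theta_by_delta"] = out["pos_theta"] / out["pos_delta"].replace(0, np.nan)
    out[f"{prefix}gamma_by_theta"] = out["pos_gamma"] / out["pos_theta"].replace(0, np.nan)
    return out
```

Mapping zero denominators to NaN matters for delta-neutral structures, where `pos_delta` can be exactly zero and an infinite ratio would distort any downstream binning.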

Price Features

Price-based features capture market valuation metrics:

| Feature Type | Description | Example |
| --- | --- | --- |
| Leg Prices | Option prices for each leg | entry_leg1_price, entry_leg2_price |
| Price Ratios | Derived metrics from prices | entry_leg1_price_by_leg2_price |
| Underlying Price | Price of the underlying asset | entry_underlying_price |

User Variables

Custom variables defined in the strategy definition file can be included as features, capturing strategy-specific logic.

Feature Transformations

Merlin supports applying transformations to features to potentially improve their predictive power:

Z-Score Normalization

Standardizes each feature by subtracting its rolling mean and dividing by its rolling standard deviation, putting features with different scales on a comparable footing:

z_score(feature) = (feature - rolling_mean(feature)) / rolling_std(feature)

The standardization is done across multiple lookback periods.
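The z-score formula maps directly onto pandas rolling windows. This sketch assumes hypothetical lookbacks of 10, 20 and 50 rows and a `<column>_z<window>` naming scheme; Merlin's actual lookback periods and feature names may differ. Because each window ends at the current row, only past and present values are used, so the z-score introduces no look-ahead.

```python
import pandas as pd


def add_zscore_features(df, columns, lookbacks=(10, 20, 50)):
    """Add rolling z-scores of each column for several lookback windows.

    The window lengths and the '<col>_z<window>' naming are illustrative
    assumptions, not Merlin's actual settings.
    """
    out = df.copy()
    for col in columns:
        for n in lookbacks:
            roll = out[col].rolling(window=n)
            # NaN until n observations exist; NaN (0/0) if the window is constant
            out[f"{col}_z{n}"] = (out[col] - roll.mean()) / roll.std()
    return out
```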

Enable transformations with the --add-feature-transforms flag.

Adding External Features

Using CSV Files

Merlin can import external features from CSV files through the AddCsv config.

The CSV file should contain:

  • A 'DateTime' column matching the backtest timestamps
  • Feature columns with predictive variables
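One way to attach such a CSV to backtest rows is a left join on `DateTime`. The helper `add_csv_features`, the `ext_` column prefix, and the toy values below are hypothetical, not Merlin's loader; the sketch assumes timestamps match exactly, as required above, so backtest rows without a matching timestamp simply receive NaN.

```python
import io

import pandas as pd


def add_csv_features(trades: pd.DataFrame, csv_source, prefix="ext_"):
    """Left-join external CSV features onto backtest rows by DateTime.

    Hypothetical helper: both tables need a 'DateTime' column, and CSV
    timestamps are assumed to match backtest timestamps exactly.
    """
    ext = pd.read_csv(csv_source, parse_dates=["DateTime"])
    # Prefix feature columns so they cannot collide with existing ones
    ext = ext.rename(columns={c: f"{prefix}{c}" for c in ext.columns if c != "DateTime"})
    # Left join keeps every backtest row; duplicate CSV timestamps raise an error
    return trades.merge(ext, on="DateTime", how="left", validate="many_to_one")


# Usage with toy data: the second trade has no matching CSV row
csv_text = "DateTime,vix_close\n2024-01-02 09:45:00,13.2\n2024-01-03 09:45:00,14.0\n"
trades = pd.DataFrame({
    "DateTime": pd.to_datetime(["2024-01-02 09:45:00", "2024-01-04 09:45:00"]),
    "pnl": [120.0, -80.0],
})
merged = add_csv_features(trades, io.StringIO(csv_text))
```

`validate="many_to_one"` makes pandas raise if the CSV repeats a timestamp, which would otherwise silently duplicate backtest rows.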

Using Code Extension

For more complex feature engineering, you can extend the feature_engineering.py module:

  1. Add new feature calculation functions
  2. Register them in the feature extraction pipeline
  3. Expose them via configuration options

Example of extending with a new feature category:

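def add_custom_indicators(df, prefix=""):
    """Add custom technical indicators."""
    # calculate_indicator_1/2 are placeholders for your own calculations
    df[f'{prefix}custom_indicator_1'] = calculate_indicator_1(df)
    df[f'{prefix}custom_indicator_2'] = calculate_indicator_2(df)
    return df

# Then call it from the main extraction function, gated by a separate option
# (hypothetical name below): `if add_custom_indicators:` would test the
# function object itself, which is always truthy.
if use_custom_indicators:
    df = add_custom_indicators(df, prefix="entry_")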
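Steps 2 and 3 of the extension process (registering a function and exposing it through configuration) could be wired up with a small registry keyed by option name. This is a hypothetical sketch, not Merlin's actual pipeline: `FEATURE_REGISTRY`, `register_feature`, `extract_features`, the `enabled` options dict, and the placeholder indicator (a 3-row rolling mean of an assumed `underlying_price` column) are all illustrative names.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical registry: maps a config option name to a feature function
FEATURE_REGISTRY: Dict[str, Callable[..., pd.DataFrame]] = {}


def register_feature(name: str):
    """Decorator that registers a feature function under an option name."""
    def decorator(func):
        FEATURE_REGISTRY[name] = func
        return func
    return decorator


@register_feature("add_custom_indicators")
def add_custom_indicators(df: pd.DataFrame, prefix: str = "") -> pd.DataFrame:
    # Placeholder indicator: 3-row rolling mean of the underlying price
    df[f"{prefix}custom_indicator_1"] = df["underlying_price"].rolling(3).mean()
    return df


def extract_features(df: pd.DataFrame, enabled: Dict[str, bool],
                     prefix: str = "entry_") -> pd.DataFrame:
    """Apply every registered feature function whose option is enabled."""
    for name, func in FEATURE_REGISTRY.items():
        if enabled.get(name, False):
            df = func(df, prefix=prefix)
    return df
```

The placeholder uses a market input rather than a trade outcome such as P&L; building entry features from outcomes would leak the target into the predictors.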

Pandas-TA Integration

Merlin integrates with pandas-ta-openbb, a technical analysis library with a large collection of indicators. These can be incorporated as features by extending the feature engineering code:

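import pandas_ta as ta

def add_ta_indicators(df, column, prefix=""):
    """Add technical analysis indicators from pandas-ta."""
    # 14-period RSI of the chosen column
    df[f'{prefix}rsi_14'] = ta.rsi(df[column], length=14)

    # MACD returns a DataFrame (MACD line, histogram, signal); prefix its
    # columns as well, and skip it if the series is too short (returns None)
    macd = ta.macd(df[column])
    if macd is not None:
        df = df.join(macd.add_prefix(prefix))

    return df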

Feature Selection

After generating features, Merlin uses Cramér's V, a chi-squared-based measure of association between two categorical variables, to identify the features with the strongest predictive power.
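Cramér's V is computed from a contingency table as V = sqrt(chi2 / (n * (min(r, k) - 1))), where chi2 is the chi-squared statistic, n the number of observations, and r and k the numbers of rows and columns. The sketch below applies it to a continuous feature and a categorical outcome such as win/loss. The equal-count quantile binning (`bins=5`) is an assumption; Merlin's own grouping options, such as `tails`, are not reproduced here. The chi-squared statistic is computed directly with NumPy, without a continuity correction.

```python
import numpy as np
import pandas as pd


def cramers_v(feature: pd.Series, outcome: pd.Series, bins: int = 5) -> float:
    """Cramér's V between a quantile-binned feature and a categorical outcome."""
    # Discretize the continuous feature into roughly equal-count bins (integer codes)
    binned = pd.qcut(feature, q=bins, labels=False, duplicates="drop")
    obs = pd.crosstab(binned, outcome).to_numpy(dtype=float)
    # Drop empty rows/columns so no expected count is zero
    obs = obs[obs.sum(axis=1) > 0][:, obs.sum(axis=0) > 0]
    r, k = obs.shape
    if min(r, k) < 2:
        return float("nan")  # association is undefined with a single category
    n = obs.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))
```

V ranges from 0 (no association) to 1 (the outcome is fully determined by the feature's bin), so features can be ranked by it regardless of their original scale.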

Configure the feature selection process with parameters:

merlin discover-predictors <strategy-json> --cramer-feature-grouping=tails --cramer-bins-or-tails=0.1