MIFS: Mutual Information based Feature Selection Model

Overview

The Mutual Information based Feature Selection (MIFS) model is a feature selection algorithm designed to identify the most predictive features while minimizing redundancy. Unlike simple correlation-based methods that can select highly redundant features, MIFS uses information theory principles to systematically build an optimal set of non-redundant predictors that maximize predictive power.

The model addresses a critical challenge in quantitative analysis: features that appear nearly uncorrelated with each other can still carry largely redundant information about the target variable. For example, two momentum indicators may show low pairwise correlation yet still explain the same return variance.

MIFS solves this by iteratively selecting features that add the most new predictive information to the already selected set.

Key Concepts

Mutual Information

Mutual Information is a fundamental concept from information theory that measures the amount of information one random variable contains about another. In the context of feature selection, it quantifies how much uncertainty about the target variable is reduced by knowing the value of a feature.

Higher mutual information scores indicate stronger predictive relationships, making it an excellent measure for feature selection in trading and financial modeling. The algorithm estimates MI by discretizing (binning) the data and applying entropy-based estimators to the resulting frequency counts.
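As an illustration, MI between two continuous series can be estimated by binning each series and comparing the joint bin frequencies against the product of the marginals. This is a minimal sketch, not necessarily the exact estimator used by the service:

```python
import numpy as np

def mutual_information(x, y, bins=3):
    """Estimate mutual information (in nats) between two series by
    discretizing each into equal-frequency bins."""
    edges = lambda a: np.quantile(a, np.linspace(0, 1, bins + 1)[1:-1])
    xb, yb = np.digitize(x, edges(x)), np.digitize(y, edges(y))
    joint, _, _ = np.histogram2d(xb, yb, bins=bins)
    pxy = joint / joint.sum()            # joint distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
noise = rng.normal(size=1000)
print(mutual_information(x, x + 0.1 * noise))  # strongly related: high MI
print(mutual_information(x, noise))            # independent: near zero
```

A predictive feature scores well above zero; an unrelated one scores near zero, up to a small positive finite-sample bias.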

Cramer's V

The MIFS model incorporates Cramer's V, a measure of association between categorical (nominal) variables that ranges from 0 (no relationship) to 1 (perfect relationship). This provides a normalized measure of predictive power that complements the mutual information score. Because Cramer's V is already adjusted for degrees of freedom, values are comparable across contingency-table sizes.
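For reference, Cramer's V for an r-by-c contingency table is sqrt(chi2 / (n * (min(r, c) - 1))), where the min(r, c) - 1 divisor is the degrees-of-freedom adjustment mentioned above. A minimal sketch:

```python
import numpy as np

def cramers_v(table):
    """Cramer's V from a contingency table:
    sqrt(chi2 / (n * (min(r, c) - 1)))."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / n                       # counts under independence
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Perfect association: each row concentrated in one column
print(cramers_v([[50, 0], [0, 50]]))    # 1.0
# No association: identical row distributions
print(cramers_v([[25, 25], [25, 25]]))  # 0.0
```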

Monte Carlo Permutation Test

The Monte Carlo Permutation Test is a statistical method used to assess the significance of the observed predictive power. This test evaluates whether a feature's contribution is statistically meaningful or could have occurred by random chance.

MIFS runs 200 random permutations, so p-value estimates are resolved in steps of roughly 0.005 (0.5% granularity).

How the Monte Carlo Permutation Test Works

The following steps outline the Monte Carlo Permutation Test process used in MIFS:

  1. Define the Test Statistic: The improvement in predictive power when adding the new feature to the existing set
  2. Compute Observed Value: Calculate the actual improvement on the original dataset
  3. Generate Permuted Datasets: Randomly shuffle the feature values to break any real association with the target, creating datasets that reflect the null hypothesis (no predictive relationship)
  4. Recompute Statistic Repeatedly: For each permuted dataset, recalculate the improvement statistic. Repeat this process hundreds of times to build a "null distribution"
  5. Estimate p-Value: Determine the proportion of permuted statistics that are at least as extreme as the observed statistic
  6. Draw Conclusions: If the p-value is below a chosen threshold (e.g., 0.05), conclude that the feature adds significant predictive power

This approach provides rigorous statistical validation of each feature's contribution, helping prevent the inclusion of spuriously correlated predictors.
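The steps above can be sketched as follows. This is a generic illustration with a stand-in test statistic (absolute correlation), not the service's internal implementation:

```python
import numpy as np

def permutation_p_value(statistic, feature, target, n_perm=200, seed=0):
    """Monte Carlo permutation test: shuffle the feature to break any real
    association with the target, rebuild the statistic's null distribution,
    and return the fraction of null values at least as extreme as observed."""
    rng = np.random.default_rng(seed)
    observed = statistic(feature, target)
    null = np.array([statistic(rng.permutation(feature), target)
                     for _ in range(n_perm)])
    # Add-one smoothing avoids reporting an impossible p-value of exactly 0
    return (1 + (null >= observed).sum()) / (n_perm + 1)

abs_corr = lambda f, t: abs(np.corrcoef(f, t)[0, 1])  # stand-in statistic

rng = np.random.default_rng(1)
t = rng.normal(size=500)
p_real = permutation_p_value(abs_corr, t + rng.normal(size=500), t)
p_null = permutation_p_value(abs_corr, rng.normal(size=500), t)
print(p_real, p_null)
```

With 200 permutations the smallest reportable p-value is about 0.005, matching the granularity noted above.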

How MIFS Works

The MIFS algorithm operates through an iterative selection process:

  1. Initial Selection: Evaluate all candidate features individually and select the one with the highest mutual information with the target variable

  2. Iterative Addition: For each subsequent selection:

    • Calculate the additional mutual information each remaining candidate would contribute when combined with already selected features
    • Select the candidate that provides the maximum incremental predictive power
    • Validate the selection using Monte Carlo Permutation Testing
  3. Redundancy Minimization: The algorithm specifically considers only the information component related to the target variable, ensuring that redundant predictive information is minimized

This process continues until no remaining candidates provide statistically significant improvement.
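The selection loop can be sketched end-to-end. Equal-frequency binning is assumed, names such as `greedy_select` are illustrative, and the sketch stops on a fixed gain threshold, whereas MIFS itself stops via the Monte Carlo permutation test:

```python
import numpy as np

def discretize(x, bins=3):
    """Equal-frequency binning of a 1-D array into integer codes."""
    return np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))

def joint_codes(columns):
    """Collapse several discrete columns into a single joint code."""
    codes = np.zeros(len(columns[0]), dtype=int)
    for col in columns:
        codes = codes * (int(col.max()) + 1) + col
    return codes

def mi(a, b):
    """Mutual information (nats) between two discrete code arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1)
    pxy = joint / joint.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def greedy_select(features, target, min_gain=0.02):
    """Forward selection: each step adds the feature whose joint MI with the
    target improves most; stops when no candidate adds at least min_gain."""
    binned = {name: discretize(col) for name, col in features.items()}
    t = discretize(target)
    selected, score = [], 0.0
    while True:
        gains = {n: mi(joint_codes([binned[s] for s in selected] + [b]), t) - score
                 for n, b in binned.items() if n not in selected}
        if not gains:
            return selected
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            return selected
        selected.append(best)
        score += gains[best]

rng = np.random.default_rng(2)
f1 = rng.normal(size=4000)
target = f1 + 0.3 * rng.normal(size=4000)
features = {"f1": f1,
            "f1_copy": f1 + 0.01 * rng.normal(size=4000),  # redundant with f1
            "noise": rng.normal(size=4000)}                # unrelated
print(greedy_select(features, target))
```

The near-duplicate feature typically adds almost no incremental MI once its twin is selected, so it is skipped even though its standalone MI with the target is nearly as high.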

Model Parameters

The MIFS model accepts the following parameters:

Feature Grouping

| Parameter | Description |
| --- | --- |
| featureGrouping | Determines how features are grouped: "bins" (equal frequency bins) or "tails" (most extreme values) |
| featureBinsOrTails | For bins: the number of bins to create. For tails: a percentage (0-1) defining the tail size, e.g., 0.1 for the top and bottom 10% |
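A sketch of the two grouping modes, using equal-frequency bins for illustration (the service's exact binning scheme may differ):

```python
import numpy as np

def group_tails(x, tail=0.1):
    """'tails' grouping: bottom tail -> 0, middle -> 1, top tail -> 2."""
    lo, hi = np.quantile(x, [tail, 1 - tail])
    return np.where(x <= lo, 0, np.where(x >= hi, 2, 1))

def group_bins(x, n_bins=3):
    """'bins' grouping: assign each observation to one of n_bins bins
    (equal-frequency here, via quantile edges)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

x = np.arange(100.0)
print(np.bincount(group_tails(x, 0.1)))  # [10 80 10]
print(np.bincount(group_bins(x, 4)))     # [25 25 25 25]
```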

Target Grouping

| Parameter | Description |
| --- | --- |
| targetGrouping | Determines how the target is grouped: "bins" (equal bins) or "sign" (positive/negative return) |
| targetBins | When targetGrouping="bins", specifies the number of bins for the target |

CSV Content

The CSV data containing the features and target variable must be provided in the request body.

Results Format

The MIFS model returns results ordered by selection priority.

The Result Object

The API returns a list of MifsResult objects (the Features array in the response), each containing the following fields:

| Field | Type | Description |
| --- | --- | --- |
| Feature | string | The name of the selected feature |
| Score | decimal | Cumulative mutual information score: the total predictive power achieved by including this feature and all previously selected features |
| CramersV | decimal | Cramer's V measure of association between this feature and the target, on a scale from 0 (no relationship) to 100 (perfect relationship) |
| MCpValue | decimal? | Monte Carlo Permutation Test p-value: the probability that the observed improvement could occur by random chance. Lower values indicate stronger statistical significance |

Interpreting Results

  • Selection Order: Features are listed in the order they were selected, with the first being the single most important predictor
  • Cumulative Scoring: The Score field represents cumulative mutual information, showing the total predictive power achieved by including all features up to and including the current one
  • Statistical Significance: MCpValue provides statistical validation of each feature's contribution. Values below 0.05 typically indicate statistically significant predictive power

API Endpoint

POST /models/v1/mifs

Request Format

The request requires the following query parameters and JSON body:

Query Parameters:

  • featureGrouping: "Bins" or "Tails"
  • featureBinsOrTails: Number (bins count) or decimal (tail percentage)
  • targetGrouping: "Bins" or "Sign"
  • targetBins: Integer (required when targetGrouping is "Bins")

Request Body: The CSV content containing the features and the target variable:

DateTime,Symbol,StrategyNAV,Feature1,Feature2,Feature3
2024-01-02,STRAT1,9999.9,0.9993,0.1113,0.210000
2024-01-03,STRAT1,9991.5,0.9965,0.9991,0.210000
2024-01-04,STRAT1,9708.0,0.9945,0.9926,0.19991
2024-01-05,STRAT1,9938.9,0.9945,0.10021,0.210031
2024-01-08,STRAT1,9963.9,0.9945,0.10276,0.210350
2024-01-09,STRAT1,9736.4,0.9945,0.10371,0.210555
2024-01-10,STRAT1,9963.9,0.9945,0.10473,0.210751
...
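For example, the full request URL can be assembled like this (standard-library sketch; the host, authentication, and HTTP client are deployment-specific and omitted):

```python
from urllib.parse import urlencode

# Query parameters per the request format above
params = {"featureGrouping": "bins", "featureBinsOrTails": 2,
          "targetGrouping": "bins", "targetBins": 3}
endpoint = "/models/v1/mifs?" + urlencode(params)
print(endpoint)

# The CSV shown above is sent verbatim as the request body, e.g.:
# requests.post(base_url + endpoint, data=csv_text)  # hypothetical client call
```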

Response Format

The response contains an ordered list of selected features with their statistical measures:

{
  "Features": [
    {
      "Feature": "FEAT_1",
      "Score": 0.245,
      "CramersV": 78.5,
      "MCpValue": 0.02
    },
    {
      "Feature": "FEAT_2",
      "Score": 0.387,
      "CramersV": 45.2,
      "MCpValue": 0.023
    }
  ]
}

Notes and Limitations

  • Serial Correlation Bias: If the target variable has significant serial correlation (common with multi-period look-aheads), the computed p-values may be biased downward, potentially overestimating statistical significance
  • Computational Complexity: The Monte Carlo Permutation Test requires extensive computation; processing can take several minutes, especially for large datasets or many features
  • Sample Size Sensitivity: Statistical significance testing requires adequate sample sizes for reliable p-value estimation

Practical Applications

MIFS works well as a first-pass filter before building models such as SETS.

A good starting point is to group both the features and target into bins using 2 and 3 bins, respectively:

  POST /models/v1/mifs?featureGrouping=bins&featureBinsOrTails=2&targetGrouping=bins&targetBins=3

Interpreting the results

| Criterion | Recommended Threshold | Purpose |
| --- | --- | --- |
| MCpValue | ≤ 0.05 or ≤ 0.10 | Keeps only features whose added predictive power is unlikely under the null hypothesis |
| Cramer's V | ≥ 0.30 (optional) | Screens for predictors with a practically useful effect size |

Use the MCpValue filter first; add the Cramer's V threshold when you need an extra safeguard against weak but statistically significant features. Adjust both cut-offs to match dataset size, model complexity, and the cost of including unnecessary predictors.
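Applied to a parsed response, the two filters and the per-feature MI gains look like this. FEAT_3 is a hypothetical entry added for illustration, and note that CramersV in the response is on the 0-100 scale, so a 0.30 cut-off corresponds to 30:

```python
# Sketch: apply the recommended thresholds to a parsed MIFS response.
results = [
    {"Feature": "FEAT_1", "Score": 0.245, "CramersV": 78.5, "MCpValue": 0.02},
    {"Feature": "FEAT_2", "Score": 0.387, "CramersV": 45.2, "MCpValue": 0.023},
    {"Feature": "FEAT_3", "Score": 0.401, "CramersV": 12.0, "MCpValue": 0.30},
]

P_MAX, V_MIN = 0.05, 30.0  # CramersV is on the 0-100 scale in the response

kept = [r["Feature"] for r in results
        if r["MCpValue"] is not None and r["MCpValue"] <= P_MAX
        and r["CramersV"] >= V_MIN]
print(kept)  # ['FEAT_1', 'FEAT_2']

# Incremental contribution of each feature from the cumulative Score field
gains = [r["Score"] - prev["Score"] if prev else r["Score"]
         for prev, r in zip([None] + results[:-1], results)]
print([round(g, 3) for g in gains])  # [0.245, 0.142, 0.014]
```

The differenced gains show how quickly the cumulative score saturates, which is useful when deciding where to truncate the selected set.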