MHTSK
Multihead Takagi–Sugeno–Kang (MHTSK) builds multiple sparse subantecedents from random feature subsets and jointly optimizes their rule consequents.
Reference
Z. Bian, Q. Chang, J. Wang and N. R. Pal, "Multihead Takagi–Sugeno–Kang Fuzzy System," in IEEE Transactions on Fuzzy Systems, vol. 33, no. 8, pp. 2561-2573, Aug. 2025, doi: 10.1109/TFUZZ.2025.3569227.
Mathematical Formulation
Subantecedent construction
MHTSK randomly samples S features without replacement to create a subantecedent and repeats this process T times. Each subantecedent is trained by Fuzzy C-Means clustering on the selected feature subset.
For a dataset with D inputs, the selected feature subset for head t is:

$$
\mathcal{S}_t \subset \{1, \dots, D\}, \qquad |\mathcal{S}_t| = S, \qquad t = 1, \dots, T,
$$

and a sampled instance subset of size n is used to fit FCM for that head.
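The sampling step above can be sketched as follows. This is a minimal illustration, not the highFIS implementation: `sample_heads` and its arguments are hypothetical names, and the sketch covers only the feature subsets (instance subsampling and FCM fitting are left out).

```python
import numpy as np

def sample_heads(n_features, head_size, n_heads, seed=0):
    """Draw one feature subset per head (hypothetical helper, not highFIS API)."""
    rng = np.random.default_rng(seed)
    # Within a head, features are drawn without replacement.
    # Heads are sampled independently, so different heads may share features.
    return [
        np.sort(rng.choice(n_features, size=head_size, replace=False))
        for _ in range(n_heads)
    ]

# D = 20 features, S = 4 features per head, T = 5 heads
subsets = sample_heads(n_features=20, head_size=4, n_heads=5)
for t, subset in enumerate(subsets):
    print(f"head {t}: features {subset.tolist()}")
```

Each head would then run FCM on the columns listed in its subset.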
Rule generation
Each head generates K fuzzy rules by applying FCM to the selected features. A rule from head t is defined only on the selected features, while all remaining features use a constant "don't care" membership function.
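For rule r from head t, the antecedent is:

$$
\text{IF } x_d \text{ is } A_{t,r,d} \text{ for all } d \in \mathcal{S}_t, \qquad f_{t,r}(\mathbf{x}) = \prod_{d \in \mathcal{S}_t} \mu_{t,r,d}(x_d),
$$

where $\mathcal{S}_t$ are the indices of the S features chosen by head t, $\mu_{t,r,d}$ is the Gaussian membership function of the fuzzy set $A_{t,r,d}$, and every feature outside $\mathcal{S}_t$ contributes a constant membership of 1 to the product.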
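As a rough illustration of how don't-care features drop out of a rule, the sketch below computes the firing strengths of one head. It assumes the Gaussian form exp(-(x - c)^2 / (2 sigma^2)) and a product t-norm; the function name, argument layout, and the exact Gaussian parameterization are assumptions made for this example, not the highFIS code.

```python
import numpy as np

def firing_strengths(X, centers, subset, sigma=1.0):
    """Unnormalized firing strengths of one head's K rules (illustrative only).

    X:       (N, D) input matrix
    centers: (K, S) cluster centers on the selected features
    subset:  (S,) indices of the selected features
    """
    # Only the selected columns enter the computation.
    diff = X[:, subset][:, None, :] - centers[None, :, :]   # (N, K, S)
    log_mu = -(diff ** 2) / (2.0 * sigma ** 2)
    # Inactive features have membership 1 (log-membership 0),
    # so the product over all D features reduces to a product over the subset.
    return np.exp(log_mu.sum(axis=2))                        # (N, K)

X = np.array([[0.0, 1.0, 5.0],
              [0.0, 1.0, -5.0]])       # the two rows differ only in feature 2
centers = np.array([[0.0, 1.0],
                    [1.0, 1.0]])       # K = 2 rules on features 0 and 1
subset = np.array([0, 1])
f = firing_strengths(X, centers, subset)
print(f)
```

Both rows produce the same firing strengths because feature 2 is not part of this head.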
Normalization
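All $T \times K$ rules are normalized together using standard TSK normalization:

$$
\bar{f}_{t,r}(\mathbf{x}) = \frac{f_{t,r}(\mathbf{x})}{\sum_{t'=1}^{T} \sum_{r'=1}^{K} f_{t',r'}(\mathbf{x})},
$$

where $f_{t,r}(\mathbf{x})$ is the firing strength of rule r from head t.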
This joint normalization is the same as the paper's final rule aggregation, ensuring the sparse rules compete globally.
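A small sketch of joint normalization, assuming each head's firing strengths are available as an (N, K) array. The helper name and the head-by-head concatenation order are assumptions made for illustration.

```python
import numpy as np

def normalize_jointly(head_strengths):
    """Normalize the rules of all heads together (illustrative helper).

    head_strengths: list of T arrays, each of shape (N, K)
    returns:        (N, T*K) array whose rows sum to 1
    """
    f = np.concatenate([np.asarray(h, dtype=float) for h in head_strengths], axis=1)
    total = f.sum(axis=1, keepdims=True)
    # Guard against division by zero when every rule underflows to 0.
    return f / np.maximum(total, np.finfo(float).tiny)

head_a = np.array([[1.0, 1.0]])   # head 1, K = 2 rules, one sample
head_b = np.array([[2.0, 0.0]])   # head 2, K = 2 rules, same sample
f_bar = normalize_jointly([head_a, head_b])
print(f_bar)
```

Because the denominator runs over every head, a strongly firing rule in one head lowers the normalized weight of the rules in all other heads.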
Sparse consequent
Each rule consequent depends only on the features used by that rule, creating a naturally sparse consequent structure:

$$
y_{t,r}(\mathbf{x}) = b_{t,r} + \sum_{d \in \mathcal{S}_t} w_{t,r,d}\, x_d .
$$
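For classification, the output is:

$$
\hat{\mathbf{y}}(\mathbf{x}) = \sum_{t=1}^{T} \sum_{r=1}^{K} \bar{f}_{t,r}(\mathbf{x})\, \mathbf{y}_{t,r}(\mathbf{x}),
$$

where $\bar{f}_{t,r}$ is the normalized firing strength, $\mathbf{y}_{t,r}(\mathbf{x})$ is the vector of per-class consequent outputs of rule r from head t, and the predicted class is the one with the largest score.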
For regression, the result is the scalar weighted sum of sparse consequents.
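The weighted sum of sparse consequents can be sketched with a masked linear layer. The tensor shapes, the bias term, and the function name below are assumptions chosen for this example; they are not taken from the highFIS layers.

```python
import numpy as np

def mhtsk_output(X, f_bar, weights, bias, mask):
    """Weighted sum of masked first-order consequents (illustrative only).

    X:       (N, D) inputs
    f_bar:   (N, R) normalized firing strengths, R = T * K rules
    weights: (R, D, C) consequent weights (C = 1 for regression)
    bias:    (R, C) consequent biases
    mask:    (R, D) 1 where a rule uses a feature, 0 otherwise
    """
    sparse_w = weights * mask[:, :, None]                   # zero out inactive features
    rule_out = np.einsum("nd,rdc->nrc", X, sparse_w) + bias[None, :, :]
    return np.einsum("nr,nrc->nc", f_bar, rule_out)         # (N, C)

rng = np.random.default_rng(0)
mask = np.array([[1.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0]])     # feature 2 is unused by both rules
weights = rng.normal(size=(2, 3, 2))
bias = rng.normal(size=(2, 2))
X = rng.normal(size=(4, 3))
f_bar = np.full((4, 2), 0.5)
out = mhtsk_output(X, f_bar, weights, bias, mask)
print(out.shape)
```

For classification, taking `out.argmax(axis=1)` would give the predicted class index; for regression, `C = 1` and the single column is the prediction.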
Code ↔ Paper Correspondence
| Equation / Concept | Class / Method | Description |
|---|---|---|
| Subset feature sampling S, T | `highfis.estimators._build_mhtsk_input_mfs` | Random head construction with feature subsets and FCM on each head |
| Feature coverage rate | `highfis.estimators.feature_coverage_rate` | Computes 1 - (1 - S/D)^T from the paper |
| Scale parameter defaults | `highfis.estimators._resolve_mhtsk_scale_parameters` | Derives S and T from `h_value`, `fcr_target`, `xi`, and `sigma` |
| Sparse antecedents | `highfis.models.MHTSKClassifierModel`, `highfis.models.MHTSKRegressorModel` | Build rules with constant don't-care MFs on inactive features |
| Sparse consequent | `highfis.layers.SparseClassificationConsequentLayer`, `highfis.layers.SparseRegressionConsequentLayer` | Applies a mask so each rule only uses active input dimensions |
| Rule extraction | `highfis.estimators._extract_mhtsk_rule_indices` | Combines unsupervised max-firing strength and supervised Mann–Whitney selection |
Implementation notes
- `highfis` represents each head with a partial rule base created by `FuzzyCMeans` on the sampled feature subset.
- Every input feature has a constant `ConstantMF(1.0)` to support don't-care membership when the feature is not selected by a head.
- The total number of rules is `n_heads * n_rules`, because each head produces `n_rules` cluster-based rules.
- The sparse consequent is implemented by masking the weight tensor with `rule_feature_mask` inside `SparseClassificationConsequentLayer` and `SparseRegressionConsequentLayer`.
- The rule masks are derived from the selected feature subsets and the per-head cluster indices.
- `rule_sigma` controls the Gaussian spread used for all selected feature MFs; the paper fixes this value to preserve interpretability and avoid numeric underflow.
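To make the mask derivation concrete, here is a minimal sketch that expands per-head feature subsets into a per-rule boolean mask. The function name and the head-major rule ordering (all rules of head 0 first, then head 1, and so on) are assumptions for illustration; the actual layout of `rule_feature_mask` in highFIS may differ.

```python
import numpy as np

def build_rule_feature_mask(subsets, n_features, n_rules):
    """Expand per-head feature subsets into a (T * K, D) boolean mask."""
    mask = np.zeros((len(subsets) * n_rules, n_features), dtype=bool)
    for t, subset in enumerate(subsets):
        # All K rules of head t share the same active features.
        mask[t * n_rules:(t + 1) * n_rules, subset] = True
    return mask

subsets = [[0, 2], [1, 3]]            # T = 2 heads, S = 2 features each
mask = build_rule_feature_mask(subsets, n_features=4, n_rules=3)
print(mask.astype(int))
```

Multiplying the consequent weight tensor by such a mask zeroes every weight attached to a feature the rule does not use.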
Model classes
- `highfis.models.MHTSKClassifierModel`
- `highfis.models.MHTSKRegressorModel`

These classes extend `BaseTSKClassifier` and `BaseTSKRegressor`, respectively, and use `rule_base="custom"` with explicit rule definitions and a sparse consequent.
MHTSKClassifier
- Uses `SparseClassificationConsequentLayer`
- Default antecedent t-norm: `prod`
- Default defuzzifier: `SumBasedDefuzzifier`
MHTSKRegressor
- Uses `SparseRegressionConsequentLayer`
- Default antecedent t-norm: `prod`
- Default defuzzifier: `SumBasedDefuzzifier`
Estimator wrappers
- `highfis.estimators.MHTSKClassifier`
- `highfis.estimators.MHTSKRegressor`
These sklearn-style wrappers build the MHTSK rule base from raw data and expose the paper's scale parameter defaults.
Key estimator parameters
- `n_rules`: Number of FCM clusters per head (K). Default: `3`.
- `n_heads`: Number of heads (T). When `None`, defaults are resolved from `head_size`, `fcr_target`, and `h_value`.
- `head_size`: Number of features per head (S). When `None`, defaults to `max(1, round(D * 0.02))` for `D <= 5000` or `max(1, round(D * 0.01))` otherwise.
- `head_size_ratio`: Alternative way to specify `head_size` as a fraction of `D`.
- `fcr_target`: Target feature coverage rate. Default: `0.85`.
- `h_value`: Paper-derived scale constant `H`. If provided, it overrides `fcr_target`.
- `xi`: Numeric underflow threshold constant. Default: `743.0`.
- `rule_sigma`: Gaussian sigma used for FCM-derived MFs. Default: `1.0`.
- `fcm_m`: Fuzzy C-means fuzziness parameter. Default: `2.0`.
- `instance_sample_fraction`: Fraction of training samples used per head. Default: `0.8`.
- `rule_extraction`: Enable MHTSK_RE-style rule extraction. Default: `False`.
- `crcr_us`: Unsupervised cumulative rule contribution rate target. Default: `0.5`.
- `crcr_s`: Supervised cumulative rule contribution rate target (classifier only). Default: `0.5`.
- `retrain_after_extraction`: Retrain after extraction. Default: `True`.
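The `head_size` default quoted above is easy to reproduce, and a head count can be obtained by inverting the coverage formula 1 - (1 - S/D)^T for a target rate. The sketch below does both. The second function is an assumption about one plausible way to resolve `n_heads` from `fcr_target`; it ignores `h_value`, `xi`, and `sigma`, and neither function is the highFIS resolver.

```python
import math

def default_head_size(n_features):
    """Default S as documented: 2% of D up to 5000 features, 1% beyond."""
    ratio = 0.02 if n_features <= 5000 else 0.01
    return max(1, round(n_features * ratio))

def heads_for_coverage(n_features, head_size, fcr_target=0.85):
    """Smallest T with 1 - (1 - S/D)^T >= fcr_target (hypothetical helper)."""
    if head_size >= n_features:
        return 1  # a single head already covers every feature
    miss = 1.0 - head_size / n_features   # chance a feature is missed by one head
    return math.ceil(math.log(1.0 - fcr_target) / math.log(miss))

D = 1000
S = default_head_size(D)
T = heads_for_coverage(D, S, fcr_target=0.85)
print(S, T, 1.0 - (1.0 - S / D) ** T)
```

With small heads relative to D, many heads are needed to reach a high coverage rate, which is why the total rule count grows with `n_heads * n_rules`.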
Membership functions
MHTSK uses standard Gaussian membership functions for active features:
- `highfis.memberships.GaussianMF` is used for selected features.
- `highfis.memberships.ConstantMF(1.0)` is used for inactive features, enabling don't-care semantics.
Training in the paper vs. highFIS
- The paper trains MHTSK end-to-end with gradient-based optimization over joint rule weights and consequents.
- highFIS preserves this approach via `BaseTSK.fit()`, with mini-batch Adam training and optional early stopping.
- The main difference is that highFIS builds the sparse model structure explicitly before training, while the paper describes the same structure in algorithmic form.
- highFIS also supports optional uniform-rule regularization (`ur_weight`, `ur_target`) to encourage balanced rule activations.
Rule extraction (MHTSK_RE)
The MHTSK variant with extraction uses two complementary selection strategies:
- Unsupervised: select rules with the largest maximum normalized firing strength across training samples.
- Supervised: for classifiers, select rules with the smallest Mann–Whitney pairwise p-value across class groups, using $1 - p_{\min}$ as a score.
Selected rules are merged and the model is retrained on the reduced rule base.
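The unsupervised strategy can be sketched as a ranking followed by a cumulative cut-off. The exact definition of the cumulative rule contribution rate is not given here, so the sketch assumes it is the running share of the per-rule scores after sorting. That definition, the function name, and the tie handling are all assumptions, not the paper's or highFIS's procedure.

```python
import numpy as np

def select_rules_unsupervised(f_bar, crcr=0.5):
    """Keep the top-scoring rules until their cumulative share reaches `crcr`.

    f_bar: (N, R) normalized firing strengths on the training set
    """
    score = f_bar.max(axis=0)                        # max normalized firing strength per rule
    order = np.argsort(-score)                       # strongest rules first
    share = np.cumsum(score[order]) / score.sum()    # assumed contribution-rate definition
    n_keep = min(int(np.searchsorted(share, crcr)) + 1, len(order))
    return np.sort(order[:n_keep])

f_bar = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
kept = select_rules_unsupervised(f_bar, crcr=0.5)
print(kept)
```

A supervised counterpart would replace `score` with a per-rule value of 1 - p_min computed from pairwise Mann–Whitney tests, then merge both index sets before retraining.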
Alignment with the paper
- highFIS implements the random head construction and sparse consequent exactly as described in the MHTSK paper.
- `feature_coverage_rate()` matches the paper's FCR equation:

  $$
  \mathrm{FCR} = 1 - \left(1 - \frac{S}{D}\right)^{T}
  $$

- The default `head_size` and `n_heads` resolution reproduces the paper's recommended scale parameter strategy using `xi`, `sigma`, `fcr_target`, and `h_value`.
- The sparse consequent layers mirror the paper's per-rule subspace-specific linear consequents.
- The `MHTSKClassifier` and `MHTSKRegressor` provide a user-facing API that reflects the paper's `S`, `T`, `K`, and rule extraction workflow.
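As a sanity check on the coverage formula 1 - (1 - S/D)^T, the sketch below compares it with a small simulation of random head sampling. Both function names are hypothetical, and the simulation assumes features are drawn without replacement within a head and independently across heads.

```python
import numpy as np

def fcr(n_features, head_size, n_heads):
    """Closed-form feature coverage rate 1 - (1 - S/D)^T."""
    return 1.0 - (1.0 - head_size / n_features) ** n_heads

def empirical_coverage(n_features, head_size, n_heads, n_trials=500, seed=0):
    """Average fraction of features hit by at least one head."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_trials):
        covered = np.zeros(n_features, dtype=bool)
        for _ in range(n_heads):
            picked = rng.choice(n_features, size=head_size, replace=False)
            covered[picked] = True
        fractions.append(covered.mean())
    return float(np.mean(fractions))

print(fcr(50, 5, 10), empirical_coverage(50, 5, 10))
```

The two numbers should agree closely, since each feature is missed by a single head with probability 1 - S/D and the heads are independent.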