DG-ALETSK
DG-ALETSK combines adaptive Ln-Exp softmin aggregation with feature and rule gating, enabling simultaneous feature selection and rule extraction in high-dimensional fuzzy models.
Reference
G. Xue, J. Wang, B. Yuan and C. Dai, "DG-ALETSK: A High-Dimensional Fuzzy Approach With Simultaneous Feature Selection and Rule Extraction," in IEEE Transactions on Fuzzy Systems, vol. 31, no. 11, pp. 3866–3880, Nov. 2023, doi: 10.1109/TFUZZ.2023.3270445.
Overview
DG-ALETSK is a high-dimensional TSK fuzzy model that jointly learns:
- feature selection via antecedent gates;
- rule extraction via consequent gates;
- a differentiable adaptive Ln-Exp softmin antecedent;
- a zero-order DG phase followed by first-order fine tuning.
The highFIS implementation supports both classification and regression via
DGALETSKClassifier and DGALETSKRegressor.
Mathematical Formulation
Antecedent membership
Each antecedent uses Gaussian membership functions:
\[
\mu_{r,d}(x_d) = \exp\!\left(-\frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}\right)
\]
with learned antecedent centers \(m_{r,d}\) and spreads \(\sigma_{r,d}>0\).
Feature gating
DG-ALETSK embeds one gate per input feature in the antecedent. The paper's enhanced gate function (eq. 24) with \(k=10\) is used:
\[
M(\lambda_d) = 1 - e^{-k\lambda_d^2}
\]
The gated membership term is computed by raising the base membership value to the power of the gate value (paper eq. 12):
\[
\tilde{\mu}_{r,d} = \mu_{r,d}^{M(\lambda_d)}
\]
When \(M(\lambda_d)=1\) the gate is fully open (membership unchanged); when \(M(\lambda_d)=0\) the gate is closed (\(\tilde{\mu}_{r,d}=1\), so the feature contributes nothing to the softmin).
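The gating behavior described above can be checked numerically with a small sketch. The function names `exp_gate`, `gaussian_mf`, and `gated_membership` are hypothetical helpers written for this illustration, not highFIS API, and the \(1/(2\sigma^2)\) scaling in the Gaussian is an assumption about the membership form.

```python
import math


def exp_gate(lam: float, k: float = 10.0) -> float:
    """Enhanced gate M(lambda) = 1 - exp(-k * lambda^2); values lie in [0, 1)."""
    return 1.0 - math.exp(-k * lam * lam)


def gaussian_mf(x: float, mean: float, sigma: float) -> float:
    """Gaussian membership; the 1 / (2 * sigma^2) scaling is an assumption."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def gated_membership(mu: float, lam: float, k: float = 10.0) -> float:
    """Raise the base membership to the power of the gate value."""
    return mu ** exp_gate(lam, k)


mu = gaussian_mf(0.5, mean=-1.0, sigma=1.0)  # base membership of one feature
closed = gated_membership(mu, lam=0.0)       # gate value 0: membership becomes 1
opened = gated_membership(mu, lam=1.0)       # gate value near 1: membership stays near mu
```

With \(\lambda_d = 0\) the feature is neutral for any minimum-like aggregation, because a membership of 1 can never be the smallest term.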
Adaptive Ln-Exp softmin
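DG-ALETSK replaces the standard product T-norm with the Adaptive Ln-Exp (ALE) softmin. The firing strength \(f_r\) of rule \(r\) is computed directly from paper eq. 22, using the adaptive exponent
\[
\hat{q} = -\frac{\xi}{\max_d \tilde{\mu}_{r,d}}, \qquad \xi = 700
\]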
The adaptive exponent \(\hat{q}\) is recomputed on every forward pass: it is not a learned parameter. Choosing \(\xi = 700\) guarantees \(\exp(\hat{q}\cdot\max_d\tilde{\mu}) = e^{-700} > 0\) in IEEE 754 double precision (underflow boundary \(\approx e^{-745}\)), preventing numerical underflow while driving the softmin to closely approximate \(\min_d \tilde{\mu}_{r,d}\). Because \(\hat{q} \le -700\) and \(\tilde{\mu} \in (0,1]\), the output satisfies \(f_r \in (0, 1]\) by construction.
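A minimal sketch of the adaptive exponent at work is given below. It assumes the Ln-Exp softmin takes the mean-based form \(\frac{1}{\hat{q}}\ln\big(\frac{1}{D}\sum_d e^{\hat{q}\tilde{\mu}_{r,d}}\big)\); check this against paper eq. 22 before relying on it. The function name `ale_softmin` is hypothetical and is not the highFIS implementation.

```python
import math
from typing import Sequence

XI = 700.0  # value of xi quoted in the text


def ale_softmin(mu_tilde: Sequence[float], xi: float = XI) -> float:
    """Ln-Exp softmin with adaptive exponent q_hat = -xi / max(mu_tilde).

    Assumed form: (1 / q_hat) * ln(mean(exp(q_hat * mu))).
    Inputs are expected to lie in (0, 1].
    """
    q_hat = -xi / max(mu_tilde)  # recomputed on every call, not learned
    # The largest input gives the smallest term, exactly exp(-xi) > 0,
    # so no term underflows to zero in double precision.
    terms = [math.exp(q_hat * m) for m in mu_tilde]
    return math.log(sum(terms) / len(terms)) / q_hat


strength = ale_softmin([0.9, 0.4, 1.0])  # close to min = 0.4
```

Under this assumed form the result lies between the true minimum and the minimum plus \(\ln(D)/|\hat{q}|\), so for three inputs and \(\hat{q} = -700\) the overshoot is below 0.002.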
Rule gates and consequents
DG-ALETSK also embeds one gate per rule in the consequent. For zero-order classification, each rule \(r\) produces a vector of class logits that is multiplied by the rule's gate value.
For regression, the same gate multiplies a scalar rule output.
Output aggregation
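Normalized rule strengths are computed as:
\[
\bar{f}_r = \frac{f_r}{\sum_{j} f_j}
\]
The final model output is the weighted sum of gated rule consequents:
\[
y = \sum_{r} \bar{f}_r \, \tilde{y}_r
\]
where \(\tilde{y}_r\) denotes the gated consequent of rule \(r\) (a class-logit vector for classification, a scalar for regression).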
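The aggregation step can be illustrated with plain Python lists. The helper `aggregate` is a hypothetical name for this sketch. It takes firing strengths that are already computed and per-rule outputs that are already gated, so it makes no assumption about the gate function used in the consequent.

```python
def aggregate(strengths, gated_consequents):
    """Normalize rule firing strengths and mix the per-rule gated outputs."""
    total = sum(strengths)
    weights = [f / total for f in strengths]  # normalized strengths, sum to 1
    n_out = len(gated_consequents[0])
    return [
        sum(w * rule_out[c] for w, rule_out in zip(weights, gated_consequents))
        for c in range(n_out)
    ]


# Two rules, two classes; the second rule's gate is closed (all-zero output).
logits = aggregate([0.3, 0.1], [[2.0, -1.0], [0.0, 0.0]])
```

A rule with a closed gate still takes part in the normalization, but it contributes nothing to the output.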
Threshold search and pruning
Gate thresholds \(\tau_{\lambda}\) and \(\tau_{\theta}\) are computed from the learned gate values and two coefficients \(\zeta_{\lambda}\) and \(\zeta_{\theta}\). Features and rules with gate values below these thresholds are pruned.
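The pruning step itself is simple once a threshold is available. The sketch below takes the threshold as a given input and does not reproduce how it is derived from the coefficients. The helper `prune_gates` is a hypothetical name used only for illustration.

```python
def prune_gates(gate_values, tau):
    """Zero every gate value below tau; return the pruned list and kept indices."""
    pruned = [g if g >= tau else 0.0 for g in gate_values]
    kept = [i for i, g in enumerate(gate_values) if g >= tau]
    return pruned, kept


feature_gates = [0.98, 0.02, 0.61, 0.10]
pruned, kept = prune_gates(feature_gates, tau=0.25)
```

Here the second and fourth features fall below the threshold and are removed, while the first and third are kept unchanged.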
Code ↔ Paper Correspondence
| Concept | highFIS class / method | Notes |
|---|---|---|
| Adaptive Ln-Exp softmin antecedent | `DGALETSKRuleLayer` | Implements ALE softmin (paper eq. 22) with adaptive \(\hat{q}=-700/\max_d\tilde{\mu}_{r,d}\) |
| Feature gates | `DGALETSKRuleLayer.lambda_gates` + `ExpGate(k=10)` | \(\mu^{M(\lambda_d)}\) exponential gating (paper eqs. 12, 24) |
| Rule gates | `GatedClassificationZeroOrderConsequentLayer.theta_gates`, `GatedRegressionZeroOrderConsequentLayer.theta_gates` | Gated zero-order consequents during DG training |
| Zero-order DG phase | `fit_dg_phase()` | Jointly optimizes antecedent, feature gates, rule gates, and zero-order consequents |
| First-order conversion | `convert_to_first_order()` | Preserves learned rule gates and switches consequent form |
| Threshold computation | `compute_thresholds(zeta_lambda, zeta_theta)` | Computes pruning thresholds from gate values |
| Threshold pruning | `apply_thresholds(tau_lambda, tau_theta)` | Sets low gate values to zero |
| Threshold search | `search_thresholds(...)` | Grid-searches \(\zeta_\lambda\) / \(\zeta_\theta\) and optionally refits consequents |
Implementation notes
- The highFIS DG-ALETSK implementation uses `rule_base='coco'` by default. `use_en_frb=True` starts from an enhanced rule base (enFRB), but the paper's point-based FRB (P-FRB) is not constructed by default.
- The DG-ALETSK paper justifies P-FRB as a way to initialize an abundant candidate rule base from training samples, enabling the gate-based DG phase to perform rule extraction and feature selection in tandem.
- Estimator wrappers now support `rule_base='pfrb'`, which builds a point-based FRB from training samples and uses a CoCo rule base over the resulting sample-centered Gaussian MFs. Use `pfrb_max_rules` to cap the number of sample-based rules when the training set is large.
- `DGALETSKClassifier` and `DGALETSKRegressor` train a zero-order model in `fit_dg_phase()`. The recommended Phase 2 workflow differs by task:
    - Classification (`use_lse=False`): `search_thresholds` evaluates the zero-order model on the validation set directly. Call `fit_finetune` afterwards to convert to first-order and retrain consequents with MFs and λ-gates frozen (paper §3.3).
    - Regression (`use_lse=True`): `search_thresholds` prunes gates and fits first-order consequents via least squares in one step. The LSE result is the final model; do not call `fit_finetune` afterwards, as it would reset the LSE-fitted weights and retrain from zero.
- `fit_finetune` freezes all MF parameters (centers and spreads) and the feature-selection gates (λ) during gradient fine-tuning, retaining only the consequent layer as trainable. This implements paper §3.3: "we fix the first group of gates and the membership functions."
- Loss function: the paper uses MSE throughout, including for classification (applied to one-hot targets). highFIS uses cross-entropy for classification and MSE for regression. Cross-entropy is more appropriate for discrete class outputs and is used in both `fit_dg_phase` and `fit_finetune`.
- The feature gate uses `ExpGate(k=10)` (\(M(\lambda)=1-e^{-10\lambda^2}\), paper eq. 24), the enhanced gate function introduced in DG-ALETSK. Gate values are applied to antecedent memberships as \(\mu^{M(\lambda_d)}\) (paper eq. 12), i.e., exponential gating.
- Threshold search is implemented by deep-copying the current model, pruning candidate copies, optionally refitting first-order consequents via least squares, and selecting the best validation score.
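The copy-prune-score loop in the last note can be sketched in isolation. Everything below is a toy stand-in: `ToyGatedModel`, its `score` method, and `search` are hypothetical and written only for this illustration. The sketch grids over thresholds directly, whereas the highFIS `search_thresholds` grids over the \(\zeta\) coefficients and can also refit consequents via least squares.

```python
import copy
from itertools import product


class ToyGatedModel:
    """Hypothetical stand-in holding feature gates and rule gates."""

    def __init__(self, lambda_gates, theta_gates):
        self.lambda_gates = list(lambda_gates)
        self.theta_gates = list(theta_gates)

    def apply_thresholds(self, tau_lambda, tau_theta):
        """Set gate values below the given thresholds to zero."""
        self.lambda_gates = [g if g >= tau_lambda else 0.0 for g in self.lambda_gates]
        self.theta_gates = [g if g >= tau_theta else 0.0 for g in self.theta_gates]

    def score(self):
        """Toy validation score: reward strong gates, penalize each open gate."""
        gates = self.lambda_gates + self.theta_gates
        return sum(gates) - 0.2 * sum(1 for g in gates if g > 0.0)


def search(model, tau_lambdas, tau_thetas):
    """Try every threshold pair on a deep copy and keep the best-scoring pair."""
    best_score, best_taus = float("-inf"), None
    for tau_l, tau_t in product(tau_lambdas, tau_thetas):
        candidate = copy.deepcopy(model)  # the trained model is never mutated
        candidate.apply_thresholds(tau_l, tau_t)
        candidate_score = candidate.score()
        if candidate_score > best_score:  # strict: first of tied pairs wins
            best_score, best_taus = candidate_score, (tau_l, tau_t)
    return best_taus, best_score


model = ToyGatedModel([0.9, 0.05, 0.6], [0.8, 0.1])
best_taus, best_score = search(model, [0.0, 0.25, 0.5], [0.0, 0.25])
```

Working on deep copies keeps the search side-effect free, so the chosen thresholds can be applied to the original model in a separate, explicit step.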
highFIS API summary
- `fit_dg_phase(x, y, **kwargs)`: train the DG-ALETSK zero-order model.
- `convert_to_first_order()`: convert the trained zero-order consequent to a gated first-order consequent.
- `compute_thresholds(zeta_lambda, zeta_theta)`: compute pruning thresholds from current gate activations.
- `apply_thresholds(tau_lambda, tau_theta)`: prune low-value gates.
- `search_thresholds(x, y, *, zeta_lambda, zeta_theta, x_val, y_val, use_lse, inplace, ...)`: grid-search over `(zeta_lambda, zeta_theta)` pairs and select the best gate thresholds by validation score. When `use_lse=True`, each candidate also fits first-order consequents via least squares before scoring (recommended for regression). When `use_lse=False`, the zero-order model is scored directly (recommended for classification). With `inplace=True`, the winning thresholds are applied to `self`.
- `fit_finetune(x, y, **kwargs)`: convert the pruned zero-order model to first-order, reset consequent weights to zero, and retrain with MFs and λ-gates frozen. Call this only after `search_thresholds(use_lse=False)` (classification path). Do not call it after `search_thresholds(use_lse=True)`, which already produces final first-order consequents via LSE.
Examples
Classification
For classification, Phase 2 uses use_lse=False so that threshold candidates are
ranked by the zero-order model's accuracy on the validation set. After selecting
the best thresholds, fit_finetune converts the model to first-order and retrains
consequents with MFs and feature gates frozen.
```python
from highfis.models import DGALETSKClassifierModel
from highfis import GaussianMF

# X_train, y_train, X_val, and y_val are assumed to be loaded beforehand.

# Two Gaussian membership functions per input feature.
input_mfs = {
    "x1": [GaussianMF(mean=-1.0, sigma=1.0), GaussianMF(mean=1.0, sigma=1.0)],
    "x2": [GaussianMF(mean=-1.0, sigma=1.0), GaussianMF(mean=1.0, sigma=1.0)],
}

model = DGALETSKClassifierModel(
    input_mfs,
    n_classes=2,
    use_en_frb=False,
)

# Phase 1: train the gated zero-order model.
history = model.fit_dg_phase(X_train, y_train, epochs=30, learning_rate=1e-3)

# Phase 2a: select thresholds by evaluating the zero-order model.
result = model.search_thresholds(
    X_train,
    y_train,
    zeta_lambda=[0.0, 0.25, 0.5, 0.75, 1.0],
    zeta_theta=[0.0, 0.25, 0.5],
    x_val=X_val,
    y_val=y_val,
    use_lse=False,  # evaluate zero-order quality; do not fit first-order here
    inplace=True,
)
print(result)

# Phase 2b: convert to first-order and fine-tune (MFs and λ-gates frozen).
model.fit_finetune(X_train, y_train, epochs=60, learning_rate=1e-3)
```
Regression
For regression, Phase 2 uses use_lse=True so that each threshold candidate is
evaluated after fitting first-order consequents via least squares. The best
candidate's LSE-fitted model is the final result — do not call fit_finetune,
which would reset those weights.
```python
from highfis.models import DGALETSKRegressorModel
from highfis import GaussianMF

# X_train, y_train, X_val, and y_val are assumed to be loaded beforehand.

# Two Gaussian membership functions per input feature.
input_mfs = {
    "x1": [GaussianMF(mean=-1.0, sigma=1.0), GaussianMF(mean=1.0, sigma=1.0)],
    "x2": [GaussianMF(mean=-1.0, sigma=1.0), GaussianMF(mean=1.0, sigma=1.0)],
}

model = DGALETSKRegressorModel(
    input_mfs,
    use_en_frb=False,
)

# Phase 1: train the gated zero-order model.
history = model.fit_dg_phase(X_train, y_train, epochs=60, learning_rate=1e-3)

# Phase 2: select thresholds and fit first-order consequents via LSE in one step.
result = model.search_thresholds(
    X_train,
    y_train,
    zeta_lambda=[0.0, 0.25, 0.5, 0.75, 1.0],
    zeta_theta=[0.0, 0.25, 0.5],
    x_val=X_val,
    y_val=y_val,
    use_lse=True,  # fit first-order via LSE; this IS the final model
    inplace=True,
)
print(result)

# No fit_finetune here: the LSE-fitted consequents are the final model.
```