Hyperparameter Tuning and Scikit-Learn Integration
Every model family in highFIS provides a high-level estimator class that is fully compatible with the scikit-learn API. This compatibility means that highFIS estimators integrate natively with standard model selection, pipeline, and tuning tools such as Pipeline, cross_val_score, GridSearchCV, and RandomizedSearchCV.
1. Using highFIS in a Pipeline
TSK systems are sensitive to input scaling because membership functions are defined over the feature bounds. Preprocessing with a scaler like MinMaxScaler or StandardScaler is highly recommended.
You can build a scikit-learn Pipeline to chain preprocessing and model fitting together:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from highfis import HTSKClassifier
# Generate classification data
X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Build a pipeline
pipeline = Pipeline([
("scaler", MinMaxScaler()),
("classifier", HTSKClassifier(n_mfs=3, epochs=100, random_state=42))
])
# Fit the entire pipeline
pipeline.fit(X_train, y_train)
# Evaluate on test data
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline Test Accuracy: {accuracy:.2%}")
2. Cross-Validation
You can perform k-fold cross-validation using cross_val_score to verify model stability:
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from highfis import HTSKClassifier
# Chain scaler and estimator
model = make_pipeline(
MinMaxScaler(),
HTSKClassifier(n_mfs=3, epochs=80, random_state=42)
)
# Run stratified 5-fold cross validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("CV Accuracies:", scores)
print("Mean Accuracy:", scores.mean())
3. Hyperparameter Tuning with GridSearchCV
To find the optimal configuration for your neuro-fuzzy system, you can use GridSearchCV to test combinations of hyperparameters:
* n_mfs (number of membership functions/rules)
* mf_init (initialization strategy: "kmeans", "fcm", etc.)
* learning_rate (training step size)
When tuning parameters in a pipeline, prefix the parameter names with the pipeline step name followed by a double underscore (classifier__<parameter>).
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from highfis import HTSKClassifier
# Setup pipeline
pipe = Pipeline([
("scaler", MinMaxScaler()),
("classifier", HTSKClassifier(epochs=100, random_state=42, verbose=False))
])
# Define the parameter grid
param_grid = {
"classifier__n_mfs": [2, 3, 5],
"classifier__mf_init": ["kmeans", "fcm"],
"classifier__learning_rate": [0.01, 0.005]
}
# Run grid search
grid = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy", n_jobs=1)
grid.fit(X_train, y_train)
# Output results
print("Best parameters found:", grid.best_params_)
print("Best cross-validation accuracy:", grid.best_score_)
# Evaluate best estimator on holdout test set
best_pipeline = grid.best_estimator_
print("Test Score:", best_pipeline.score(X_test, y_test))
4. RandomizedSearchCV for Large Spaces
If you are tuning multiple parameters across a wide search space, RandomizedSearchCV is more efficient than exhaustive grid search:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform, randint
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from highfis import HTSKClassifier
# Define parameter distribution
param_dist = {
"classifier__n_mfs": randint(2, 8),
"classifier__learning_rate": loguniform(1e-3, 1e-1),
"classifier__mf_init": ["kmeans", "minibatch_kmeans", "fcm"]
}
# Run randomized search
random_search = RandomizedSearchCV(
pipe,
param_distributions=param_dist,
n_iter=10,
cv=3,
random_state=42,
n_jobs=1
)
random_search.fit(X_train, y_train)
print("Best Parameters:", random_search.best_params_)