AgreementEvaluator¶
- class skeval.evaluators.agreement.AgreementEvaluator(model: Any, scorer: Mapping[str, Any] | Any = <function accuracy_score>, verbose: bool = False, sec_model: Any | None = None)[source]¶
Bases: BaseEvaluator

Agreement-based evaluator for supervised classification models.
This evaluator compares predictions produced by a primary model (model) and a secondary model (sec_model) on an evaluation set. For each sample, an agreement indicator is defined as 1 when both models predict the same class and 0 otherwise. Using this indicator, an expected label vector is created by flipping the primary model’s prediction when the models disagree. Metric(s) are then computed comparing the expected label vector to the agreement indicator, providing an estimate of how often the primary model’s predictions would align with a plausible correction strategy based on model disagreement.

Evaluation workflow:

1. Fit both the primary and secondary models on the training data.
2. Generate predictions for both models on the evaluation data.
3. Build the agreement vector (1 = same prediction, 0 = different).
4. Produce an expected label vector, flipping predictions where disagreement occurs.
5. Compute the chosen metric(s) using the scorer(s); see the sketch after this list.
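A minimal sketch of steps 3-5, assuming binary 0/1 labels (so "flipping" is read as 0 <-> 1); agreement_score is an illustrative helper, not part of the skeval API:

>>> import numpy as np
>>> from sklearn.metrics import accuracy_score
>>> def agreement_score(pred_primary, pred_secondary, scorer=accuracy_score):
...     """Illustrative helper: agreement-based score for binary 0/1 labels."""
...     pred_primary = np.asarray(pred_primary)
...     pred_secondary = np.asarray(pred_secondary)
...     # Step 3: agreement vector, 1 where predictions match, 0 otherwise.
...     agreement = (pred_primary == pred_secondary).astype(int)
...     # Step 4: expected labels, keeping the primary prediction on
...     # agreement and flipping it (0 <-> 1) on disagreement.
...     expected = np.where(agreement == 1, pred_primary, 1 - pred_primary)
...     # Step 5: score the expected labels against the agreement indicator.
...     return scorer(expected, agreement)
>>> agreement_score([0, 1, 1, 0], [0, 1, 0, 0])
0.5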
- Parameters:
model (estimator) – A classification estimator implementing fit and predict. May be a single estimator or a pipeline created with sklearn.pipeline.make_pipeline.
scorer (callable or dict of str -> callable, default=accuracy_score) – A single scoring function or a dictionary mapping metric names to scoring callables. Each scorer must follow the signature scorer(y_true, y_pred).
verbose (bool, default=False) – If True, prints progress information during fit and estimate.
sec_model (estimator, optional) – Secondary classification model used solely to generate comparison predictions. If None, defaults to GaussianNB().
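For instance, only the primary model is required at construction; the remaining arguments take the documented defaults. A minimal sketch, assuming a scikit-learn estimator as the primary model:

>>> from sklearn.linear_model import LogisticRegression
>>> from skeval.evaluators.agreement import AgreementEvaluator
>>> # scorer defaults to accuracy_score; sec_model falls back to GaussianNB().
>>> evaluator = AgreementEvaluator(model=LogisticRegression(max_iter=1000))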
- model¶
The primary model provided at initialization.
- Type:
estimator
- sec_model¶
The secondary model used to create agreement signals.
- Type:
estimator
Notes
This evaluator assumes both models output class labels directly via predict. No probability calibration is performed. The metric(s) are computed on synthetic targets produced from model agreement, not against real ground-truth labels, so scores should be interpreted as agreement-based estimates, not actual performance metrics.

Examples
Basic usage with two RandomForest pipelines and multiple scorers:
>>> import pandas as pd
>>> from sklearn.metrics import accuracy_score, f1_score
>>> from sklearn.impute import KNNImputer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.ensemble import RandomForestClassifier
>>> from skeval.evaluators.agreement import AgreementEvaluator
>>> from skeval.utils import get_cv_and_real_scores, print_comparison
>>> df_geriatrics = pd.read_csv("geriatria.csv")
>>> df_neurology = pd.read_csv("neurologia.csv")
>>> X1, y1 = df_geriatrics.drop(columns=["Alzheimer"]), df_geriatrics["Alzheimer"]
>>> X2, y2 = df_neurology.drop(columns=["Alzheimer"]), df_neurology["Alzheimer"]
>>> model = make_pipeline(
...     KNNImputer(n_neighbors=10),
...     RandomForestClassifier(n_estimators=50, random_state=42),
... )
>>> sec_model = make_pipeline(
...     KNNImputer(n_neighbors=10),
...     RandomForestClassifier(n_estimators=100, random_state=42),
... )
>>> scorers = {
...     "accuracy": accuracy_score,
...     "f1_macro": lambda y, p: f1_score(y, p, average="macro"),
... }
>>> evaluator = AgreementEvaluator(model=model, sec_model=sec_model, scorer=scorers)
>>> evaluator.fit(X1, y1)
>>> estimated_scores = evaluator.estimate(X2)
>>> # Optionally compare with CV and real scores
>>> scores_dict = get_cv_and_real_scores(
...     model=model, scorers=scorers, train_data=(X1, y1), test_data=(X2, y2)
... )
>>> cv_scores = scores_dict["cv_scores"]
>>> real_scores = scores_dict["real_scores"]
>>> print_comparison(scorers, cv_scores, estimated_scores, real_scores)
- estimate(x_eval: Any) → Dict[str, float][source]¶
Estimate agreement-based metric values on evaluation data.
Generates predictions from both models, constructs an agreement vector and an expected label vector (flipping the primary prediction when disagreement occurs), then applies the configured scorer(s).
- Parameters:
x_eval (array-like of shape (n_samples, n_features)) – Evaluation feature matrix.
- Returns:
scores – If scorer is a dict, returns a mapping from metric name to agreement-based score. Otherwise returns {"score": float}.
- Return type:
dict
- Raises:
ValueError – If scorer is neither a callable nor a dict of callables.
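A short sketch of the two return shapes, using synthetic data from sklearn.datasets.make_classification; the commented outputs are illustrative, not actual results:

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import accuracy_score
>>> from skeval.evaluators.agreement import AgreementEvaluator
>>> X, y = make_classification(n_samples=200, random_state=0)
>>> X_train, y_train, X_eval = X[:150], y[:150], X[150:]
>>> multi = AgreementEvaluator(
...     model=LogisticRegression(max_iter=1000),
...     scorer={"accuracy": accuracy_score},
... )
>>> multi.fit(X_train, y_train)
>>> multi.estimate(X_eval)  # dict scorer -> {'accuracy': ...}
>>> single = AgreementEvaluator(
...     model=LogisticRegression(max_iter=1000),
...     scorer=accuracy_score,
... )
>>> single.fit(X_train, y_train)
>>> single.estimate(X_eval)  # single callable -> {'score': ...}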
- fit(x: Any, y: Any) → AgreementEvaluator[source]¶
Fit the evaluator by training both primary and secondary models.
- Parameters:
x (array-like of shape (n_samples, n_features)) – Feature matrix used to fit both models.
y (array-like of shape (n_samples,)) – Target labels corresponding to x.
- Returns:
self – The fitted evaluator instance.
- Return type:
AgreementEvaluator
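Because fit returns self, fitting and estimation can be chained. A minimal sketch with synthetic placeholder data:

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from skeval.evaluators.agreement import AgreementEvaluator
>>> X, y = make_classification(n_samples=200, random_state=0)
>>> scores = (
...     AgreementEvaluator(model=LogisticRegression(max_iter=1000))
...     .fit(X[:150], y[:150])
...     .estimate(X[150:])
... )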