ConfidenceThresholdEvaluator

class skeval.evaluators.confidence.ConfidenceThresholdEvaluator(model: Any, scorer: Callable[..., Any] | Mapping[str, Callable[..., Any]] = accuracy_score, verbose: bool = False)

Bases: BaseEvaluator

Confidence-based evaluator for classification models.

This evaluator filters predictions from a classification model according to a confidence threshold. Only predictions whose confidence (top-class probability, or other chosen score) is greater than or equal to the given threshold are treated as “trusted”; the remaining predictions are flipped (binary case) to build an expected label vector used for metric estimation.
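For the binary case, the flipping rule described above can be sketched with NumPy as follows. This is a minimal illustration that assumes 0/1 integer labels and an already computed confidence array; the helper name `build_expected_labels` is hypothetical and is not part of skeval's public API.

```python
import numpy as np


def build_expected_labels(y_pred, confidence, threshold=0.65):
    """Hypothetical helper: keep trusted predictions and flip the rest.

    Assumes binary labels encoded as 0/1.
    """
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence)
    # Predictions at or above the threshold are treated as trusted.
    trusted = confidence >= threshold
    # Untrusted predictions are flipped (0 -> 1, 1 -> 0) to form the expected labels.
    return np.where(trusted, y_pred, 1 - y_pred)


y_pred = np.array([1, 0, 1, 0])
confidence = np.array([0.90, 0.55, 0.70, 0.60])
y_expected = build_expected_labels(y_pred, confidence, threshold=0.65)
print(y_expected)  # [1 1 1 1]
```

If a scorer such as accuracy is then applied as scorer(y_expected, y_pred), the two low-confidence predictions disagree with their flipped counterparts, so the estimated accuracy in this toy case would be 0.5.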

Parameters:
  • model (object) – Any classifier implementing fit, predict and either predict_proba or decision_function.

  • scorer (callable or dict of str -> callable, default=accuracy_score) – Single scoring function or mapping of metric names to callables with signature scorer(y_true, y_pred).

  • verbose (bool, default=False) – If True, prints intermediate information during fitting and estimation.
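Because scorer accepts either a single callable or a mapping of names to callables, an evaluator has to handle both forms uniformly. One way to do that is sketched below. The helper `as_scorer_dict` and the default key `"score"` used for a single callable are assumptions for illustration, not skeval's documented behavior.

```python
from typing import Any, Callable, Dict, Mapping, Union

Scorer = Callable[[Any, Any], float]


def as_scorer_dict(scorer: Union[Scorer, Mapping[str, Scorer]]) -> Dict[str, Scorer]:
    """Hypothetical helper: return a name -> callable mapping for either form."""
    if callable(scorer):
        # A single callable is stored under a default key (the key name is an assumption).
        return {"score": scorer}
    # A mapping is copied so the caller's dict is not mutated later.
    return dict(scorer)


def accuracy(y_true, y_pred):
    """Plain-Python accuracy with the scorer(y_true, y_pred) signature."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


single = as_scorer_dict(accuracy)
multiple = as_scorer_dict({"accuracy": accuracy, "error": lambda t, p: 1 - accuracy(t, p)})
print({name: fn([1, 0, 1], [1, 1, 1]) for name, fn in multiple.items()})
```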

model

The primary model evaluated.

Type:

object

scorer

Scoring function(s) applied to the expected labels built from the confidence threshold.

Type:

callable or dict

verbose

Verbosity flag.

Type:

bool

Examples

Example using medical datasets and a RandomForest pipeline:

>>> import pandas as pd
>>> from sklearn.metrics import accuracy_score, f1_score
>>> from sklearn.impute import KNNImputer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.ensemble import RandomForestClassifier
>>> from skeval.evaluators.confidence import ConfidenceThresholdEvaluator
>>> from skeval.utils import get_cv_and_real_scores, print_comparison
>>> # 1. Load datasets
>>> df_geriatrics = pd.read_csv("geriatria.csv")
>>> df_neurology = pd.read_csv("neurologia.csv")
>>> # 2. Separate features and target
>>> X1, y1 = df_geriatrics.drop(columns=["Alzheimer"]), df_geriatrics["Alzheimer"]
>>> X2, y2 = df_neurology.drop(columns=["Alzheimer"]), df_neurology["Alzheimer"]
>>> # 3. Define model pipeline
>>> model = make_pipeline(
...     KNNImputer(n_neighbors=4),
...     RandomForestClassifier(n_estimators=300, random_state=42),
... )
>>> # 4. Initialize evaluator with scorers
>>> scorers = {
...     "accuracy": accuracy_score,
...     "f1_macro": lambda y, p: f1_score(y, p, average="macro"),
... }
>>> evaluator = ConfidenceThresholdEvaluator(model=model, scorer=scorers)
>>> # 5. Fit evaluator
>>> evaluator.fit(X1, y1)
>>> # 6. Estimated performance (using confidence threshold)
>>> estimated_scores = evaluator.estimate(X2, threshold=0.65, limit_to_top_class=True)
>>> # 7. Cross-validation and real performance comparison
>>> scores_dict = get_cv_and_real_scores(
...     model=model, scorers=scorers, train_data=(X1, y1), test_data=(X2, y2)
... )
>>> cv_scores = scores_dict["cv_scores"]
>>> real_scores = scores_dict["real_scores"]
>>> print_comparison(scorers, cv_scores, estimated_scores, real_scores)

estimate(x_eval: Any, threshold: float = 0.65, limit_to_top_class: bool = True) → Dict[str, float]

Estimates scores based on the confidence threshold.

This method computes the prediction confidences, determines which predictions meet the threshold, and then computes the score(s) specified by scorer.

Parameters:
  • x_eval (array-like of shape (n_samples, n_features)) – Input data for which to estimate scores.

  • threshold (float, default=0.65) – The minimum confidence required for a prediction to be treated as trusted.

  • limit_to_top_class (bool, default=True) – If True, uses only the probability of the top class as the confidence score.
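With limit_to_top_class=True, the confidence of each sample is the probability of its most likely class. The NumPy sketch below assumes the model exposes predict_proba returning an array of shape (n_samples, n_classes), as scikit-learn classifiers do. The helper `top_class_confidence` is hypothetical, and how skeval handles limit_to_top_class=False or models that only provide decision_function is not covered here.

```python
import numpy as np


def top_class_confidence(proba, classes):
    """Hypothetical helper: predicted label and its probability for each row."""
    proba = np.asarray(proba)
    # Index of the most likely class per sample (ties resolve to the first class).
    best = np.argmax(proba, axis=1)
    # Probability of that class, equivalent to proba.max(axis=1).
    confidence = proba[np.arange(len(proba)), best]
    return np.asarray(classes)[best], confidence


proba = np.array([
    [0.20, 0.80],
    [0.60, 0.40],
    [0.50, 0.50],
])
labels, confidence = top_class_confidence(proba, classes=[0, 1])
trusted = confidence >= 0.65
print(labels, confidence, trusted)
```

The third row is a tie, which np.argmax resolves to the first class; with the default threshold of 0.65, only the first prediction is trusted.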

Returns:

A dictionary with estimated scores for each scorer.

If no predictions pass the threshold, it returns 0.0 for each scorer.

Return type:

dict
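The documented return contract (one entry per scorer, and 0.0 for every scorer when no prediction reaches the threshold) can be sketched as below. This is an illustration under stated assumptions, not skeval's code: the function `scores_from_confidence` is hypothetical, labels are assumed to be binary 0/1, untrusted predictions are flipped as described in the class overview, and each scorer is assumed to be called as scorer(y_expected, y_pred).

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels, with the scorer(y_true, y_pred) signature."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def scores_from_confidence(y_pred, confidence, scorers, threshold=0.65):
    """Hypothetical helper mirroring the documented return value of estimate()."""
    y_pred = np.asarray(y_pred)
    trusted = np.asarray(confidence) >= threshold
    if not trusted.any():
        # Documented fallback: no prediction passes the threshold -> 0.0 per scorer.
        return {name: 0.0 for name in scorers}
    # Assumed binary 0/1 labels: flip the untrusted predictions.
    y_expected = np.where(trusted, y_pred, 1 - y_pred)
    return {name: float(fn(y_expected, y_pred)) for name, fn in scorers.items()}


scorers = {"accuracy": accuracy}
print(scores_from_confidence([1, 0, 1, 1], [0.9, 0.7, 0.4, 0.8], scorers))  # {'accuracy': 0.75}
print(scores_from_confidence([1, 0], [0.5, 0.6], scorers))  # {'accuracy': 0.0}
```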

fit(x: Any, y: Any) → ConfidenceThresholdEvaluator

Fits the model to the training data.

Parameters:
  • x (array-like of shape (n_samples, n_features)) – The training input samples.

  • y (array-like of shape (n_samples,)) – The target labels.

Returns:

self – Returns the instance itself.

Return type:

ConfidenceThresholdEvaluator