skeval.metrics

comparison module

skeval.metrics.comparison.score_error(real_scores: Mapping[str, float], est_scores: Mapping[str, float], comparator: Callable[[Any, Any], float] | Mapping[str, Callable[[Any, Any], float]] = mean_absolute_error, verbose: bool = False) -> Dict[str, float]

Compares estimated and real scores using a user-defined comparison function.

This function iterates over the metrics present in both the real_scores and est_scores dictionaries and, for each metric they share, computes the error between the two values using the provided comparator function(s).
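
The behaviour can be pictured with the minimal sketch below. It is illustrative only, not the actual skeval implementation; in particular, wrapping each score in a one-element list before calling the comparator is an assumption, made so that array-based comparators such as mean_absolute_error accept the values.

    def _score_error_sketch(real_scores, est_scores, comparator, verbose=False):
        errors = {}
        for metric in real_scores.keys() & est_scores.keys():
            # Use the per-metric comparator if a mapping was supplied,
            # otherwise apply the single callable to every common metric.
            fn = comparator[metric] if isinstance(comparator, dict) else comparator
            error = fn([real_scores[metric]], [est_scores[metric]])
            if verbose:
                print(f"[{metric}] Real: {real_scores[metric]}, "
                      f"Estimated: {est_scores[metric]}, Error: {error}")
            errors[metric] = error
        return errors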

Parameters:
  • real_scores (dict) – A dictionary of scores computed with true labels. Example: {'accuracy': 0.9, 'f1': 0.85}

  • est_scores (dict) – A dictionary of scores estimated without true labels. Example: {'accuracy': 0.88, 'f1': 0.82}

  • comparator (callable or dict, default=mean_absolute_error) – The function or dictionary of functions used to compare the real and estimated scores.
      - If callable, it is applied to all common metrics.
      - If dict, it maps each metric name to a specific comparator function.

  • verbose (bool, default=False) – If True, prints the real score, estimated score, and the resulting error for each metric.

Returns:

A dictionary containing the comparison results (errors) for each common metric.

Return type:

dict

Raises:

ValueError – If comparator is not a callable or a dictionary of callables.

Examples

>>> from skeval.metrics.comparison import score_error
>>> real = {'accuracy': 0.95, 'precision': 0.90, 'recall': 0.85}
>>> estimated = {'accuracy': 0.91, 'precision': 0.92, 'f1_score': 0.88}
>>> # Example 1: Using the default comparator (mean_absolute_error)
>>> errors = score_error(real, estimated)
>>> for metric, error in sorted(errors.items()):
...     print(f"{metric}: {error:.4f}")
accuracy: 0.0400
precision: 0.0200
>>> # Example 2: Using a dictionary of different comparators
>>> from sklearn.metrics import mean_absolute_error, mean_squared_error
>>> comparators = {
...     'accuracy': mean_absolute_error,
...     'precision': mean_squared_error
... }
>>> errors_custom = score_error(real, estimated, comparator=comparators, verbose=True)
[accuracy] Real: 0.95, Estimated: 0.91, Error: 0.040000000000000036
[precision] Real: 0.9, Estimated: 0.92, Error: 0.0004000000000000003
>>> for metric, error in sorted(errors_custom.items()):
...     print(f"{metric}: {error:.4f}")
accuracy: 0.0400
precision: 0.0004
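
Since a single callable is applied to every common metric, a custom comparator can also be passed directly. The abs_diff lambda below is hypothetical; it converts its inputs with numpy so it works whether the comparator receives scalars or array-likes (the exact calling convention is an assumption here).

>>> # Example 3: Using a single custom comparator for all metrics
>>> import numpy as np
>>> abs_diff = lambda real, est: float(np.max(np.abs(np.asarray(real) - np.asarray(est))))
>>> errors_single = score_error(real, estimated, comparator=abs_diff)
>>> sorted(errors_single)
['accuracy', 'precision']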

scorers module

skeval.metrics.scorers.make_scorer(func: Callable[..., float], **kwargs: Any) -> Callable[[Any, Any], float]

Wraps a metric function with fixed keyword arguments into a simple scorer.

This utility is useful for creating a unified scorer interface from metric functions that require specific arguments (like average='macro' for f1_score).
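
Conceptually, the wrapper follows the closure pattern sketched below (an illustration of the behaviour described above, not necessarily the actual skeval implementation):

    def _make_scorer_sketch(func, **kwargs):
        def scorer(y_true, y_pred):
            # Forward the fixed keyword arguments on every call.
            return func(y_true, y_pred, **kwargs)
        return scorer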

Parameters:
  • func (callable) – A metric function from a library like scikit-learn, such as accuracy_score, f1_score, etc.

  • **kwargs (dict) – Keyword arguments to be permanently passed to the metric function whenever the scorer is called.

Returns:

A new scorer function that accepts only y_true and y_pred as arguments.

Return type:

callable

Examples

>>> from skeval.metrics.scorers import make_scorer
>>> from sklearn.metrics import f1_score
>>> import numpy as np
>>> # Ground truth and predictions for a multi-class problem
>>> y_true = np.array([0, 1, 2, 0, 1, 2])
>>> y_pred = np.array([0, 2, 1, 0, 0, 1])
>>> # Create a scorer for F1-score with 'macro' averaging
>>> macro_f1_scorer = make_scorer(f1_score, average='macro')
>>> # Use the new scorer
>>> score = macro_f1_scorer(y_true, y_pred)
>>> print(f"Macro F1 Score: {score:.4f}")
Macro F1 Score: 0.2667
>>> # The result is identical to calling f1_score directly with the argument
>>> direct_call_score = f1_score(y_true, y_pred, average='macro')
>>> print(f"Direct call F1 Score: {direct_call_score:.4f}")
Direct call F1 Score: 0.2667
>>> np.isclose(score, direct_call_score)
True
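
The same pattern works for metrics that need several fixed keyword arguments. The example below uses scikit-learn's precision_score with average='macro' and zero_division=0 purely as an illustration of fixing multiple arguments at once.

>>> from sklearn.metrics import precision_score
>>> macro_precision_scorer = make_scorer(precision_score, average='macro', zero_division=0)
>>> print(f"Macro precision: {macro_precision_scorer(y_true, y_pred):.4f}")
Macro precision: 0.2222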