ShapEvaluator¶
- class skeval.evaluators.shap.ShapEvaluator(model: Any, scorer: Callable[[ArrayLike, ArrayLike], float] | Mapping[str, Callable[[ArrayLike, ArrayLike], float]] = <function accuracy_score>, verbose: bool = False, inner_clf: Any | None = None)[source]¶
Bases: BaseEvaluator
SHAP-based evaluator for supervised classification models.
This evaluator uses SHAP values computed from a tree-based classifier to train a secondary model that predicts the correctness of the original classifier’s predictions. The predicted correctness on the evaluation set is used to generate an expected label vector, which is then compared with the model predictions to estimate the chosen metric(s).
The evaluation process follows four steps: (1) compute SHAP values on the training and evaluation sets, (2) train a correctness classifier using SHAP values as input, (3) predict correctness for evaluation samples, (4) flip labels where the model is predicted to be wrong, generating an “expected label” vector used to estimate metrics.
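The sketch below illustrates these four steps for a binary, tree-based classifier that has already been fitted. It is a simplified illustration of the idea, not the evaluator's actual implementation; the helper name estimate_accuracy_sketch is hypothetical and SHAP values are assumed to come back as a (n_samples, n_features) array.
# Simplified sketch of the four-step process described above
# (assumes a fitted binary, tree-based classifier; not the library's actual code).
import numpy as np
import shap
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def estimate_accuracy_sketch(model, x_train, y_train, x_eval):
    # (1) SHAP values on the training and evaluation sets
    explainer = shap.TreeExplainer(model)
    shap_train = explainer.shap_values(x_train)
    shap_eval = explainer.shap_values(x_eval)
    # (2) correctness classifier: was the model right on each training sample?
    correct = (model.predict(x_train) == np.asarray(y_train)).astype(int)
    inner_clf = XGBClassifier(random_state=42)
    inner_clf.fit(shap_train, correct)
    # (3) predict correctness on the evaluation samples
    predicted_correct = inner_clf.predict(shap_eval)
    # (4) flip predictions where the model is expected to be wrong (binary 0/1 labels)
    y_pred = model.predict(x_eval)
    y_expected = np.where(predicted_correct == 1, y_pred, 1 - y_pred)
    return accuracy_score(y_expected, y_pred)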
- Parameters:
model (estimator) – A classification model implementing fit and predict. Its final estimator is used for SHAP computation via TreeExplainer. Compatible with sklearn.make_pipeline.
scorer (callable or dict of str -> callable, default=accuracy_score) – A scoring function or a dictionary mapping metric names to scoring functions. Scorers must follow the signature scorer(y_true, y_pred); see the short example below.
verbose (bool, default=False) – If True, prints additional progress information.
inner_clf (estimator, optional) – Classifier trained on SHAP values to estimate correctness. If None, defaults to XGBClassifier(random_state=42).
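A minimal construction example matching the scorer signature above (mirroring the full script in the Examples section; the bare XGBClassifier used as model here is just a placeholder):
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier
from skeval.evaluators.shap import ShapEvaluator

# dictionary of metric name -> scorer(y_true, y_pred)
scorers = {
    "accuracy": accuracy_score,
    "f1_macro": lambda y_true, y_pred: f1_score(y_true, y_pred, average="macro"),
}
evaluator = ShapEvaluator(model=XGBClassifier(), scorer=scorers, inner_clf=XGBClassifier(random_state=42))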
- model¶
The model provided at initialization.
- Type:
estimator
- inner_clf¶
The classifier used to model correctness from SHAP values.
- Type:
estimator
- explainer¶
Object responsible for computing SHAP values.
- Type:
shap.TreeExplainer
Notes
- SHAP computation requirement: The final estimator in model (or the estimator itself, if not a pipeline) must be compatible with shap.TreeExplainer; a quick compatibility check is sketched below.
- Estimate method: The method performs multiple correctness predictions and averages the resulting estimated metrics. This introduces stochasticity and aims to approximate uncertainty in the correctness model.
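A small, hedged way to check the TreeExplainer requirement (assumes model is the already-fitted pipeline or estimator from the Examples section):
import shap
from sklearn.pipeline import Pipeline

# take the final step of a pipeline, or the estimator itself otherwise
final_estimator = model[-1] if isinstance(model, Pipeline) else model
try:
    shap.TreeExplainer(final_estimator)  # raises if the estimator is not tree-based
    print("Compatible with shap.TreeExplainer")
except Exception as exc:
    print(f"Not compatible with shap.TreeExplainer: {exc}")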
Examples
# Authors: The scikit-autoeval developers
# SPDX-License-Identifier: BSD-3-Clause

# ==============================================================
# ShapEvaluator Example
# ==============================================================

import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from sklearn.impute import KNNImputer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

from skeval.evaluators.shap import ShapEvaluator
from skeval.utils import get_cv_and_real_scores, print_comparison

def run_shap_eval(verbose=False):
    # =====================================
    # 1. Load datasets
    # =====================================
    geriatrics = pd.read_csv("geriatria.csv")
    neurology = pd.read_csv("neurologia.csv")

    # =====================================
    # 2. Separate features and target
    # =====================================
    X1, y1 = geriatrics.drop(columns=["Alzheimer"]), geriatrics["Alzheimer"]
    X2, y2 = neurology.drop(columns=["Alzheimer"]), neurology["Alzheimer"]

    # =====================================
    # 3. Define pipeline (KNNImputer + XGBClassifier)
    # =====================================
    model = make_pipeline(KNNImputer(n_neighbors=5), XGBClassifier())

    # =====================================
    # 4. Define scorers and evaluator
    # =====================================
    scorers = {
        "accuracy": accuracy_score,
        "f1_macro": lambda y, p: f1_score(y, p, average="macro"),
    }

    evaluator = ShapEvaluator(
        model=model,
        scorer=scorers,
        verbose=False,
        inner_clf=XGBClassifier(random_state=42),
    )

    # =====================================
    # 5. Fit evaluator on geriatrics data
    # =====================================
    evaluator.fit(X1, y1)

    # =====================================
    # 6. Estimate performance (train on X1, estimate on X2)
    # =====================================
    estimated_scores = evaluator.estimate(X2)

    # =====================================
    # 7. Compute real and CV performance
    # =====================================
    train_data = X1, y1
    test_data = X2, y2
    scores_dict = get_cv_and_real_scores(
        model=model, scorers=scorers, train_data=train_data, test_data=test_data
    )
    cv_scores = scores_dict["cv_scores"]
    real_scores = scores_dict["real_scores"]

    if verbose:
        print_comparison(scorers, cv_scores, estimated_scores, real_scores)

    return {"cv": cv_scores, "estimated": estimated_scores, "real": real_scores}

if __name__ == "__main__":
    results = run_shap_eval(verbose=True)
- estimate(x_eval: ArrayLike, n_pred: int = 30, train_data: Tuple[ArrayLike, ArrayLike] | None = None) → Dict[str, float][source]¶
Estimate metric values using SHAP-based correctness prediction.
SHAP values are computed for the training and evaluation sets and used to train a correctness classifier; the resulting correctness predictions determine where to flip the model predictions before scoring. Results are averaged over n_pred iterations.
- Parameters:
x_eval (array-like) – Feature matrix for evaluation.
n_pred (int) – Number of correctness predictions to average over.
train_data (tuple of (array-like, array-like), optional) – Training data (X, y) used to fit the original model and compute SHAP values, if not already provided during fit().
- Returns:
Mapping of metric names to their estimated metric values.
- Return type:
dict
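A short usage sketch (X1, y1, X2 and evaluator reuse the placeholders from the Examples section; passing train_data is only needed if no training data was given to fit()):
# average the correctness predictions over 30 iterations (the default)
estimated = evaluator.estimate(X2, n_pred=30, train_data=(X1, y1))
print(estimated)  # e.g. {"accuracy": ..., "f1_macro": ...}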
- fit(x: ArrayLike | None = None, y: Sequence[Any] | None = None) → ShapEvaluator[source]¶
Fit the model used by the evaluator.
If x_train and y_train were provided during initialization, they take precedence. Otherwise, the provided x and y are stored and used for computing SHAP values during the estimation step.
- Parameters:
x (array-like of shape (n_samples, n_features)) – Feature matrix used to fit the original model and to compute SHAP values if x_train is not already defined.
y (array-like of shape (n_samples,)) – Labels corresponding to x. Required if no training data was provided at initialization.
- Returns:
self – The fitted evaluator instance.
- Return type:
ShapEvaluator
- Raises:
ValueError – If no training data is available, preventing the underlying model from being fitted.
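A minimal illustration of the fit contract (model, X1 and y1 are the placeholders from the Examples section; assumes no training data was supplied at initialization in the failing case):
# normal use: store the training data for SHAP computation during estimate()
evaluator.fit(X1, y1)

# per the Raises note above, fitting with no training data should fail
try:
    ShapEvaluator(model=model).fit()
except ValueError as exc:
    print(f"fit failed: {exc}")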