# ml-assert
A lightweight, chainable assertion toolkit for validating data and models in ML workflows.
## Features
- Data Validation: Schema, nulls, uniqueness, ranges, and value sets.
- Statistical Checks: Distribution drift detection via Kolmogorov-Smirnov and Chi-square tests.
- Model Performance: Accuracy, precision, recall, F1-score, and ROC AUC.
- Fairness & Explainability: Demographic parity, equal opportunity, and SHAP values.
- Integrations: MLflow, Prometheus, Slack, and DVC.
- Fluent Interface: Chain assertions for clean, readable code.
- CLI Runner: Run checks from a YAML configuration.
## Architecture Overview

```mermaid
graph TD
    A[Core Assertions] --> B[DataFrameAssertion]
    A --> C[ModelAssertion]
    A --> D[FairnessAssertion]
    B & C & D --> E[AssertionResult]
    E --> F[Integrations]
    F --> G[SlackAlerter]
    F --> H[PrometheusExporter]
    F --> I[MLflowLogger]
    F --> J[DVCArtifactChecker]
    E --> K[Plugins]
    K --> L[Custom Plugins]
    K --> M[Built-in Plugins]
    E --> N[CLI Runner]
    N --> O[YAML Config]
```
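Each assertion class in the diagram exposes the same fluent interface: check methods record a predicate and return `self`, and `validate()` runs everything that was queued. A minimal sketch of that pattern (class and method names here are illustrative, not ml_assert internals):

```python
class FluentChecks:
    """Illustrative fluent-assertion skeleton; not ml_assert's actual classes."""

    def __init__(self, df):
        self._df = df
        self._checks = []  # (name, predicate) pairs to run later

    def no_nulls(self):
        self._checks.append(("no_nulls", lambda df: not df.isnull().values.any()))
        return self  # returning self is what makes chaining work

    def validate(self):
        for name, predicate in self._checks:
            if not predicate(self._df):
                raise AssertionError(f"check failed: {name}")
```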
## Documentation Structure
- User Guide: High-level documentation with examples and use cases
  - Data Assertions
  - Statistical Assertions
  - Model Performance
  - Fairness & Explainability
  - Integrations
- API Reference: Detailed technical documentation
  - Core API
  - Data API
  - Model API
  - Stats API
  - Fairness API
  - Integrations API
  - Plugins API
## Installation

```bash
pip install ml-assert
```
## Quick Start

### Data Assertions
```python
import pandas as pd

from ml_assert import Assertion, schema

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "score": [0.9, 0.8, 0.7],
})

# Create a schema
s = schema()
s.col("id").is_unique()
s.col("score").in_range(0.0, 1.0)

# Validate
Assertion(df).satisfies(s).no_nulls().validate()
```
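For comparison, here is what those three checks amount to in plain pandas (a reference sketch, not how ml_assert reports failures):

```python
# Plain-pandas equivalents of the chained checks above.
assert df["id"].is_unique, "id column has duplicates"
assert df["score"].between(0.0, 1.0).all(), "score values outside [0.0, 1.0]"
assert not df.isnull().values.any(), "DataFrame contains nulls"
```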
### Model Performance
```python
from ml_assert import assert_model

# y_true, y_pred: ground-truth labels and hard predictions;
# y_scores: predicted probabilities (used for ROC AUC).
assert_model(y_true, y_pred, y_scores) \
    .accuracy(min_score=0.8) \
    .precision(min_score=0.8) \
    .recall(min_score=0.8) \
    .f1(min_score=0.8) \
    .roc_auc(min_score=0.9) \
    .validate()
```
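The thresholds map directly onto standard scikit-learn metrics. A standalone sketch of the same gate without ml_assert, with placeholder arrays (values chosen so every check passes):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Placeholder inputs: binary labels, hard predictions, and probability scores.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.9, 0.8, 0.2, 0.7])

assert accuracy_score(y_true, y_pred) >= 0.8
assert precision_score(y_true, y_pred) >= 0.8
assert recall_score(y_true, y_pred) >= 0.8
assert f1_score(y_true, y_pred) >= 0.8
assert roc_auc_score(y_true, y_scores) >= 0.9
```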
### Statistical Drift
```python
from ml_assert.stats.drift import assert_no_drift

assert_no_drift(df_train, df_test, alpha=0.05)
```
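This family of checks runs two-sample tests column by column. A minimal sketch of that idea for numeric columns, using SciPy's Kolmogorov-Smirnov test (ml_assert's exact column handling and test selection may differ):

```python
from scipy.stats import ks_2samp

def no_drift_sketch(df_train, df_test, alpha=0.05):
    """Raise if any shared numeric column drifts at significance level alpha."""
    numeric_cols = df_train.select_dtypes(include="number").columns
    for col in numeric_cols.intersection(df_test.columns):
        stat, p_value = ks_2samp(df_train[col], df_test[col])
        if p_value < alpha:
            raise AssertionError(f"drift detected in column '{col}' (p={p_value:.4f})")
```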
### Fairness & Explainability
```python
from ml_assert.fairness import assert_fairness

assert_fairness(
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive_features,
    metrics=["demographic_parity", "equal_opportunity"],
    threshold=0.1,
)
```
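For reference, demographic parity compares positive-prediction rates across the sensitive groups, while equal opportunity compares true-positive rates. A minimal numpy sketch of the first metric (not ml_assert's implementation):

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive_features):
    """Largest gap in positive-prediction rate between any two groups."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(sensitive_features)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Mirrors the threshold=0.1 check above.
assert demographic_parity_difference(y_pred, sensitive_features) <= 0.1
```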
## CLI Usage
Run checks from a YAML file:
```bash
ml_assert run --config /path/to/config.yaml
```
Example config:
```yaml
steps:
  - type: drift
    train: 'ref.csv'
    test: 'cur.csv'
    alpha: 0.05
  - type: model_performance
    y_true: 'y_true.csv'
    y_pred: 'y_pred.csv'
    y_scores: 'y_scores.csv'
    assertions:
      accuracy: 0.75
      roc_auc: 0.80
  - type: file_exists
    path: 'my_model.pkl'
```
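Each step type corresponds to one of the assertion families above. Since the config is plain YAML, it is easy to inspect or generate programmatically, for example with PyYAML (a convenience sketch, not part of ml_assert):

```python
import yaml

# List the step types in a runner config.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

for step in config["steps"]:
    print(step["type"])
```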
For more examples and detailed documentation, see the User Guide and API Reference.