Data Assertions
The ml_assert module provides a fluent, chainable API for validating the integrity and structure of your pandas DataFrames.
Quick Start
import pandas as pd
from ml_assert import Assertion, schema
# Create a DataFrame
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "age": [25, 30, 35],
    "plan_type": ["basic", "premium", "basic"],
})
# Create a schema
s = schema()
s.col("user_id").is_unique()
s.col("age").in_range(18, 70)
s.col("plan_type").is_type("object")
# Validate
Assertion(df).satisfies(s).no_nulls().validate()
Schema Builder
The schema() builder provides a fluent interface for defining DataFrame validation rules.
Basic Usage
from ml_assert import schema
# Create a schema
s = schema()
s.col("user_id").is_unique()
s.col("age").in_range(18, 70)
s.col("plan_type").in_set(["basic", "premium", "free"])
Available Validators
is_unique()
Checks if column values are unique.
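As a rough illustration of what a uniqueness check amounts to, the sketch below uses plain pandas rather than ml_assert itself. How is_unique() works internally is not described here, so treat this as an assumed equivalent; the sample values are made up.

```python
import pandas as pd

# Plain-pandas sketch (not ml_assert's implementation) of a uniqueness check.
user_ids = pd.Series([1, 2, 2, 3], name="user_id")

# Series.is_unique is True only when no value repeats.
all_unique = bool(user_ids.is_unique)

# duplicated(keep=False) flags every row that takes part in a repeat,
# which helps when reporting the offending values.
repeated_values = sorted(user_ids[user_ids.duplicated(keep=False)].unique().tolist())
```

Listing the repeated values alongside the boolean result tends to make a failed check easier to debug.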
in_range(min_val, max_val)
Checks if column values are within a range.
Parameters:
- min_val: Minimum allowed value
- max_val: Maximum allowed value
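The description above does not say whether the bounds are inclusive. The following plain-pandas sketch assumes they are (so 18 and 70 both pass for in_range(18, 70)); it is an illustration with made-up values, not the library's own code.

```python
import pandas as pd

# Plain-pandas sketch; assumes both bounds are inclusive, which the
# description above does not confirm.
ages = pd.Series([17, 18, 45, 70, 71], name="age")

in_bounds = ages.between(18, 70)  # inclusive on both ends by default
all_in_range = bool(in_bounds.all())
out_of_range = ages[~in_bounds].tolist()
```

If boundary values matter for your data, it may be worth testing them explicitly against the real validator.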
is_type(dtype)
Checks if column has the specified data type.
Parameters:
- dtype: Expected data type (e.g., "int64", "float64", "object")
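To find the dtype string to pass, it can help to inspect the DataFrame first. The sketch below uses plain pandas and assumes that the validator compares against the column's pandas dtype name; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; dtypes are set explicitly so the result does not
# depend on platform defaults.
df = pd.DataFrame({
    "age": pd.Series([25, 30, 35], dtype="int64"),
    "score": pd.Series([0.5, 0.7, 0.9], dtype="float64"),
})

# str(dtype) gives the name that a string such as "int64" can be compared to.
dtype_names = {column: str(dtype) for column, dtype in df.dtypes.items()}
age_is_int64 = dtype_names["age"] == "int64"
```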
in_set(allowed_values)
Checks if column values are in a set of allowed values.
Parameters:
- allowed_values: Set or list of allowed values
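A membership check of this kind can be approximated in plain pandas with Series.isin. This is a sketch of the idea rather than the library's implementation, and the "trial" value is invented to show a failure.

```python
import pandas as pd

allowed_plans = {"basic", "premium", "free"}
plan_type = pd.Series(["basic", "premium", "trial"], name="plan_type")

# isin returns a boolean mask: True where the value is in the allowed set.
is_allowed = plan_type.isin(allowed_plans)
all_allowed = bool(is_allowed.all())
unexpected = plan_type[~is_allowed].tolist()
```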
matches(pattern)
Checks if column values match a regex pattern.
Parameters:
- pattern: Regular expression pattern to match
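Whether matches() requires the pattern to cover the whole value or only its beginning is not stated above. The standard-library sketch below shows why explicit ^ and $ anchors, as in the email pattern used later in this document, are a safe choice either way. The sample strings are made up.

```python
import re

# Pattern taken from the email example in this document.
EMAIL_PATTERN = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

values = ["ada@example.com", "no-at-sign.example.com", "a@b"]

# With explicit ^ and $ anchors the pattern has to cover the whole value.
matches = {value: bool(EMAIL_PATTERN.search(value)) for value in values}

# Without anchors the result depends on the matching mode:
# re.match anchors at the start only, re.fullmatch at both ends.
prefix_only = bool(re.match(r"\d+", "123abc"))
whole_string = bool(re.fullmatch(r"\d+", "123abc"))
```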
is_not_null()
Checks if column has no null values.
is_sorted(ascending=True)
Checks if column is sorted.
Parameters:
- ascending: Whether to check ascending order (default: True)
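One detail the description leaves open is how ties are treated. The plain-pandas sketch below uses the monotonic properties, which are non-strict; whether is_sorted() behaves the same way is an assumption. The dates are hypothetical.

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["2024-01-05", "2024-02-01", "2024-02-01", "2024-03-10"]),
    name="subscription_date",
)

# Monotonic checks are non-strict: equal neighbours still count as sorted.
sorted_ascending = bool(dates.is_monotonic_increasing)
sorted_descending = bool(dates.is_monotonic_decreasing)
```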
DataFrameAssertion
The main class for DataFrame validation.
Methods
satisfies(schema)
Validates the DataFrame against a schema definition.
Parameters:
- schema: A schema object created using the schema() builder
Returns:
- self, for method chaining
no_nulls(columns=None)
Checks for null values in specified columns.
Parameters:
- columns: Optional list of column names to check. If None, checks all columns.
Returns:
- self, for method chaining
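The columns=None convention described above can be sketched in plain pandas as follows. The helper null_counts is hypothetical and exists only for this illustration; it is not part of ml_assert.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "age": [25.0, None, 35.0],
    "plan_type": ["basic", "premium", "basic"],
})


def null_counts(frame, columns=None):
    """Count nulls per column; columns=None means every column."""
    selected = frame.columns.tolist() if columns is None else list(columns)
    counts = frame[selected].isna().sum()
    return {name: int(count) for name, count in counts.items()}


all_columns = null_counts(df)
subset = null_counts(df, ["user_id", "plan_type"])
```

Here the null in age is caught when every column is checked, but not when the check is limited to user_id and plan_type.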
validate()
Executes all chained assertions.
Raises:
- AssertionError if any assertion fails
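The chaining pattern behind these methods can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the library's DataFrameAssertion, and it assumes that checks are queued by each chained call and only run inside validate(), which is one reading of "executes all chained assertions".

```python
class ChainSketch:
    """Hypothetical stand-in, not the library's DataFrameAssertion class."""

    def __init__(self, data):
        self._data = data
        self._checks = []  # (name, callable) pairs queued by chained calls

    def no_negatives(self):
        self._checks.append(("no_negatives", lambda rows: all(x >= 0 for x in rows)))
        return self  # returning self is what lets calls be chained

    def not_empty(self):
        self._checks.append(("not_empty", lambda rows: len(rows) > 0))
        return self

    def validate(self):
        # Run the queued checks in order and stop at the first failure.
        for name, check in self._checks:
            if not check(self._data):
                raise AssertionError(f"check failed: {name}")


ChainSketch([1, 2, 3]).not_empty().no_negatives().validate()  # passes silently

try:
    ChainSketch([1, -2, 3]).not_empty().no_negatives().validate()
    failure_message = None
except AssertionError as exc:
    failure_message = str(exc)
```

Because every builder method returns self, the order of the chained calls determines the order in which the checks run in this sketch.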
Error Handling & Result Reporting
- All assertion methods raise AssertionError if a check fails during chaining, unless .validate() is called.
- .validate() returns an AssertionResult object:
  - success (bool): True if all assertions passed.
  - message (str): Summary message.
  - timestamp (datetime): When the check was run.
  - metadata (dict): Details of each assertion (name, args, success, error if any).
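To make the shape of such a result concrete, here is a standard-library sketch with the same four fields. ResultSketch and summarize are hypothetical names, and the layout of metadata (assertion name mapped to a dict with args, success and error) is an assumption based on the field description above, not a documented structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ResultSketch:
    """Hypothetical stand-in mirroring the four documented fields."""
    success: bool
    message: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict = field(default_factory=dict)


def summarize(result):
    """Return the names of failed assertions, assuming metadata maps
    assertion names to dicts with a boolean "success" key."""
    return [name for name, detail in result.metadata.items() if not detail["success"]]


result = ResultSketch(
    success=False,
    message="1 of 2 assertions failed",
    metadata={
        "is_unique": {"args": {"column": "user_id"}, "success": True},
        "in_range": {
            "args": {"column": "age", "min_val": 18, "max_val": 70},
            "success": False,
            "error": "value 17 below minimum",
        },
    },
)
failed = summarize(result)
```

A result object like this lets calling code branch on success and log the per-assertion details instead of parsing an exception message.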
Examples
Basic Schema Validation
from ml_assert import Assertion, schema
# Create a schema
s = schema()
s.col("id").is_unique()
s.col("age").in_range(18, 70)
s.col("email").matches(r"^[^@]+@[^@]+\.[^@]+$")
# Validate (df is assumed to be an existing DataFrame with id, age, and email columns)
Assertion(df).satisfies(s).validate()
Complex Schema with Multiple Rules
from ml_assert import Assertion, schema
# Create a schema
s = schema()
s.col("user_id").is_unique().is_not_null()
s.col("age").in_range(18, 70).is_type("int64")
s.col("plan_type").in_set(["basic", "premium", "free"])
s.col("subscription_date").is_sorted(ascending=True)
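# Validate (df is assumed to be an existing DataFrame with user_id, age, plan_type, and subscription_date columns)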
Assertion(df).satisfies(s).no_nulls(["user_id", "plan_type"]).validate()
For a more detailed API reference, see Data API.