GX is built on four foundational pieces: Expectations, Expectation Suites, Checkpoints, and Data Docs.
1. Expectations
An Expectation is a declarative rule about your data.
For example:
PYTHON
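# Assumes `validator` is a GX Validator already bound to a batch of data.
validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_be_between("age", min_value=18, max_value=60)
validator.expect_column_values_to_match_regex("email", r"[^@]+@[^@]+\.[^@]+")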
GX includes over 100 built-in Expectations, covering:
- Schema validation
- Numeric range checks
- Regex patterns
- Uniqueness and null detection
- Custom logic through Python functions
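To make the idea of a declarative rule concrete, here is a minimal pure-Python sketch of how three such checks could be evaluated over rows of data. It is not GX's implementation: the function bodies, the `rows` sample, and the result fields are assumptions for illustration, and only the function names mirror the Expectations shown above.

```python
import re

# Hypothetical sample data; None stands in for a missing value.
rows = [
    {"customer_id": 1, "age": 34, "email": "ana@example.com"},
    {"customer_id": 2, "age": 17, "email": "bob@example"},
    {"customer_id": None, "age": 45, "email": "cy@example.org"},
]


def _result(unexpected):
    # An expectation succeeds only when no value violates the rule.
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


def expect_column_values_to_not_be_null(rows, column):
    return _result([r for r in rows if r.get(column) is None])


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    # Assumption: missing values are skipped here and left to the null check.
    values = [r[column] for r in rows if r.get(column) is not None]
    return _result([v for v in values if not min_value <= v <= max_value])


def expect_column_values_to_match_regex(rows, column, regex):
    # Assumption: a value passes if the pattern matches anywhere in it.
    values = [r[column] for r in rows if r.get(column) is not None]
    return _result([v for v in values if re.search(regex, str(v)) is None])


null_check = expect_column_values_to_not_be_null(rows, "customer_id")
age_check = expect_column_values_to_be_between(rows, "age", 18, 60)
email_check = expect_column_values_to_match_regex(rows, "email", r"[^@]+@[^@]+\.[^@]+")
print(null_check, age_check, email_check)
```

Each check returns a small result record instead of raising an error, so a caller can collect all outcomes before deciding what to do with failures.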
2. Expectation Suites
An Expectation Suite is a collection of related Expectations, grouped logically.
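For example, a suite stored as JSON looks roughly like this (field names follow the pre-1.0 GX layout):
JSON
{
  "expectation_suite_name": "customer_suite",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "customer_id"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "age", "min_value": 18, "max_value": 60}}
  ]
}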
Suites act as data quality contracts, version-controlled just like code.
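One way to picture a suite as a contract is as plain data that a small runner interprets. The sketch below assumes a hypothetical `CHECKS` registry, a hypothetical `run_suite` helper, and an `expectation_type`/`kwargs` layout for each entry. None of these names come from GX itself, which ships its own suite and validation machinery.

```python
import json

# Hypothetical registry: expectation name -> predicate applied to each value.
CHECKS = {
    "expect_column_values_to_not_be_null":
        lambda value: value is not None,
    "expect_column_values_to_be_between":
        lambda value, min_value, max_value:
            value is None or min_value <= value <= max_value,
}

# The suite is plain JSON text, so it can be diffed and reviewed like code.
SUITE_JSON = """
{
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "customer_id"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "age", "min_value": 18, "max_value": 60}}
  ]
}
"""


def run_suite(suite, rows):
    """Evaluate every expectation in the suite; the suite passes only if all do."""
    results = []
    for expectation in suite["expectations"]:
        kwargs = dict(expectation["kwargs"])
        column = kwargs.pop("column")
        check = CHECKS[expectation["expectation_type"]]
        failures = sum(1 for row in rows if not check(row.get(column), **kwargs))
        results.append({
            "expectation_type": expectation["expectation_type"],
            "column": column,
            "success": failures == 0,
            "unexpected_count": failures,
        })
    return {"success": all(r["success"] for r in results), "results": results}


suite = json.loads(SUITE_JSON)
good = run_suite(suite, [{"customer_id": 1, "age": 30}])
bad = run_suite(suite, [{"customer_id": None, "age": 75}])
print(good["success"], bad["success"])
```

Because the rules live in data and not in the runner, tightening the contract means editing the JSON file, and that change shows up in code review like any other diff.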
3. Checkpoints
A Checkpoint runs an Expectation Suite against a dataset.
You can trigger Checkpoints:
- On a schedule (via Airflow or Dagster)
- On data arrival (via AWS Lambda or S3 events)
- In CI/CD (to validate data during deployment)
Example checkpoint configuration:
YAML
name: customer_data_checkpoint
expectation_suite_name: customer_suite
validations:
  - batch_request:
      datasource_name: customer_db
      data_asset_name: customers
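The role a Checkpoint plays, binding a named suite to a named batch of data and reporting one pass/fail outcome, can be sketched in plain Python. Everything here (`DATASOURCES`, `SUITES`, `run_checkpoint`, and the in-memory rows) is a hypothetical stand-in chosen to mirror the configuration above. A real GX Checkpoint resolves these names through its own configuration instead.

```python
# Hypothetical in-memory stand-ins for configured datasources and suites.
DATASOURCES = {
    "customer_db": {
        "customers": [
            {"customer_id": 1, "age": 42},
            {"customer_id": 2, "age": 15},
        ],
    },
}
SUITES = {
    "customer_suite": [
        ("customer_id is not null", lambda row: row["customer_id"] is not None),
        ("age between 18 and 60", lambda row: 18 <= row["age"] <= 60),
    ],
}

checkpoint = {
    "name": "customer_data_checkpoint",
    "expectation_suite_name": "customer_suite",
    "validations": [
        {"batch_request": {"datasource_name": "customer_db",
                           "data_asset_name": "customers"}},
    ],
}


def run_checkpoint(config):
    """Run the configured suite against every requested batch."""
    suite = SUITES[config["expectation_suite_name"]]
    failed = []
    for validation in config["validations"]:
        request = validation["batch_request"]
        rows = DATASOURCES[request["datasource_name"]][request["data_asset_name"]]
        for label, predicate in suite:
            if not all(predicate(row) for row in rows):
                failed.append(label)
    return {"checkpoint": config["name"], "success": not failed, "failed": failed}


outcome = run_checkpoint(checkpoint)
exit_code = 0 if outcome["success"] else 1  # pass to sys.exit() in a CI job
print(outcome, exit_code)
```

Reducing the run to a single exit code is what lets a scheduler task or CI step stop the pipeline when validation fails.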
4. Data Docs
Data Docs are auto-generated HTML reports that present your validation results in a readable, browsable form.
They include:
- Which expectations passed or failed
- Validation timestamps
- Links to datasets and checkpoints
You can host Data Docs internally or share them across teams, which makes them a useful common reference for data engineers, analysts, and business users.
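As a rough illustration of what such a report contains, the sketch below turns a list of validation results into a small static HTML page using only the standard library. The `render_report` helper and the shape of `results` are assumptions for this example. GX generates Data Docs with its own renderers.

```python
from datetime import datetime, timezone
from html import escape

# Hypothetical validation results, shaped for this example only.
results = [
    {"expectation": "expect_column_values_to_not_be_null", "column": "customer_id", "success": True},
    {"expectation": "expect_column_values_to_be_between", "column": "age", "success": False},
]


def render_report(checkpoint_name, results, run_time):
    """Build a minimal HTML page listing each expectation and its outcome."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(r["expectation"]),
            escape(r["column"]),
            "passed" if r["success"] else "failed",
        )
        for r in results
    )
    return (
        "<html><body>\n"
        f"<h1>{escape(checkpoint_name)}</h1>\n"
        f"<p>Validated at {escape(run_time.isoformat())}</p>\n"
        "<table>\n"
        "<tr><th>Expectation</th><th>Column</th><th>Result</th></tr>\n"
        f"{rows}\n"
        "</table>\n"
        "</body></html>\n"
    )


page = render_report("customer_data_checkpoint", results, datetime.now(timezone.utc))
# Writing `page` to a file yields a report that any static web server can host.
print(page)
```

Escaping every value before it goes into the markup keeps column or checkpoint names containing characters such as `<` from breaking the page.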