Dataframely
============

Dataframely is a Python package to validate the schema and content of polars data frames.
Its purpose is to make data pipelines more robust by ensuring that data meet expectations, and more readable by adding schema information to data frame type hints.

Features
--------

- Declaratively define schemas as classes with arbitrary inheritance structure
- Specify column-specific validation rules (e.g., nullability, minimum string length)
- Specify cross-column and group validation rules, with built-in support for checking the primary key property of a column set
- Specify validation constraints across collections of interdependent data frames
- Validate data frames softly by filtering out rows that violate rules instead of failing hard
- Introspect detailed failure information when validation fails at run time
- Enhance type hints for validated data frames so that users can clearly express expectations about inputs and outputs (i.e., contracts) in data pipelines
- Integrate schemas with external tools (e.g., ``sqlalchemy`` or ``pyarrow``)
- Generate test data that comply with a schema or collection of schemas and their validation rules


Contents
========

.. toctree::
   :caption: Contents
   :maxdepth: 2

   Installation
   Quickstart
   Real-world Example
   Features
   FAQ
   Development Guide
   Versioning


API Documentation
=================

.. toctree::
   :caption: API Documentation
   :maxdepth: 1

   Collection <_api/dataframely.collection>
   Column Types <_api/dataframely.columns>
   Config <_api/dataframely.config>
   Random Data Generation <_api/dataframely.random>
   Failure Information <_api/dataframely.failure>
   Schema <_api/dataframely.schema>