FAQ
Whenever you discover something that surprised you or required non-trivial thinking, please add it here.
How do I define additional unique keys in a Schema?
By default, dataframely only supports defining a single non-nullable (composite) primary key in
:class:`~dataframely.Schema`.
However, in some scenarios it may be useful to define additional unique keys, for example on nullable fields or on
fields that must be unique in addition to the primary key.
Consider the following example, which demonstrates two rules: one for validating that a field is entirely unique, and another for validating that a field, when provided, is unique.
import dataframely as dy
import polars as pl


class UserSchema(dy.Schema):
    user_id = dy.UInt64(primary_key=True, nullable=False)
    username = dy.String(nullable=False)
    email = dy.String(nullable=True)  # Must be unique, or null.

    @dy.rule(group_by=["username"])
    def unique_username(cls) -> pl.Expr:
        """Username, a non-nullable field, must be unique across all rows."""
        # Evaluated once per username group: valid only if the group has one row.
        return pl.len() == 1

    @dy.rule()
    def unique_email_or_null(cls) -> pl.Expr:
        """Email must be unique, if provided."""
        # Null emails always pass; non-null emails must not repeat.
        return pl.col("email").is_null() | pl.col("email").is_unique()
How do I fix the ruff error First argument of a method should be named self?
If you use ruff and define custom rules on your schemas, ruff reports
the following lint error:
N805 First argument of a method should be named `self`
To fix this, tell ruff to treat methods decorated with @dy.rule as class methods by adding the following to your
pyproject.toml:
[tool.ruff.lint.pep8-naming]
classmethod-decorators = ["dataframely.rule"]