Struct
- class dataframely.Struct(
- inner: dict[str, Column],
- *,
- nullable: bool = False,
- primary_key: bool = False,
- unique: bool = False,
- check: Callable[[Expr], Expr] | Sequence[Callable[[Expr], Expr]] | Mapping[str, Callable[[Expr], Expr]] | None = None,
- alias: str | None = None,
- metadata: dict[str, Any] | None = None,
- description: str | None = None,
- )
A struct column.
- Parameters:
inner – The dictionary of struct fields. Struct fields may have primary_key=True set, but this setting only takes effect if the struct is nested inside a list. In this case, the list items must be unique with respect to the struct fields that have primary_key=True set.
nullable – Whether this column may contain null values. Explicitly set nullable=True if you want your column to be nullable. In a future release, nullable=False will be the default if nullable is not specified.
primary_key – Whether this column is part of the primary key of the schema.
unique – Whether this column must contain unique values. Unlike primary_key, this checks uniqueness for this column independently. Multiple columns can each have unique=True without forming a composite constraint.
check – A custom rule or multiple rules to run for this column. This can be:
- A single callable that returns a non-aggregated boolean expression. The name of the rule is derived from the callable name, or defaults to “check” for lambdas.
- A list of callables, where each callable returns a non-aggregated boolean expression. The name of the rule is derived from the callable name, or defaults to “check” for lambdas. Where multiple rules result in the same name, the suffix __i is appended to the name.
- A dictionary mapping rule names to callables, where each callable returns a non-aggregated boolean expression.
All rule names provided here are given the prefix "check_".
alias – An override for this column’s name, which allows using a column name that is not a valid Python identifier. Note that setting this option does _not_ allow referring to the column by two different names: the specified alias is the only valid name.
metadata – A dictionary of metadata to attach to the column.
description – A human-readable description of the column.
Attributes:
Obtain a Polars column expression for the column.
The polars dtype equivalent of this column definition's data type.
Get the name of the column in a schema.
Methods:
Obtain a pydantic field type for this column definition.
Sample random elements adhering to the constraints of this column.
Return a new column definition with a specified alias.
Return a new column definition with a specified check.
Return a new column definition with the specified description.
Return a new column definition with specified metadata.
Return a new column definition with specified nullability.
Return a new column definition with a specified primary key status.
Copy the current column definition while updating the provided properties.
- property col: Expr
Obtain a Polars column expression for the column.
- property dtype: DataType
The polars dtype equivalent of this column definition’s data type. This is primarily used for creating empty data frames with an appropriate schema. Thus, it should describe the default dtype equivalent if this data type encompasses multiple underlying data types.
- pydantic_field() → Any
Obtain a pydantic field type for this column definition.
- Returns:
A pydantic-compatible type annotation that includes structured constraints (such as min, max, …).
Warning
Custom checks are not translated to pydantic validators.
- sample(generator, n) → Series
Sample random elements adhering to the constraints of this column.
- Parameters:
generator – The generator to use for sampling elements.
n – The number of elements to sample.
- Returns:
A series with the predefined number of elements. All elements are guaranteed to adhere to the column’s constraints.
- Raises:
ValueError – If this column has a custom check. In this case, random values cannot be guaranteed to adhere to the column’s constraints while providing any guarantees on the computational complexity.
- with_alias(alias: str) → Self
Return a new column definition with a specified alias.
- Parameters:
alias – The alias to use for the column name.
- Returns:
A new column instance with the specified alias.
- with_check(
- check: Callable[[Expr], Expr] | Sequence[Callable[[Expr], Expr]] | Mapping[str, Callable[[Expr], Expr]],
- ) → Self
Return a new column definition with a specified check.
- Parameters:
check – A custom validation rule or rules for the column.
- Returns:
A new column instance with the specified check.
- with_description(description: str) → Self
Return a new column definition with the specified description.
- Parameters:
description – A human-readable description of the column.
- Returns:
A new column instance with the specified description.
- with_metadata(metadata: dict[str, Any]) → Self
Return a new column definition with specified metadata.
- Parameters:
metadata – A dictionary of metadata to attach to the column.
- Returns:
A new column instance with the specified metadata.
- with_nullable(nullable: bool) → Self
Return a new column definition with specified nullability.
- Parameters:
nullable – Whether the new column may contain null values.
- Returns:
A new column instance with updated nullability.
- with_primary_key(primary_key: bool) → Self
Return a new column definition with a specified primary key status.
- Parameters:
primary_key – Whether the column should be part of the primary key.
- Returns:
A new column instance with updated primary key status.
- with_properties(**kwargs: Any) → Self
Copy the current column definition while updating the provided properties.
All other properties from the original column are preserved.
- Parameters:
**kwargs – Properties to update on the new column instance. The set of allowed properties depends on the type of the column.
- Returns:
A new column instance with updated properties.
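The copy-and-update behaviour described above can be illustrated with a small stand-alone sketch. It is not dataframely's implementation: `ColumnSketch` is a hypothetical stand-in built on a frozen dataclass, and `dataclasses.replace` is swapped in as the copying mechanism purely to show that the original definition stays untouched while unspecified properties carry over.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical stand-in for a column definition (not dataframely's class)."""

    nullable: bool = False
    primary_key: bool = False
    alias: Optional[str] = None
    description: Optional[str] = None

    def with_properties(self, **kwargs: Any) -> "ColumnSketch":
        # Return a copy in which only the given properties are changed.
        return replace(self, **kwargs)

    def with_nullable(self, nullable: bool) -> "ColumnSketch":
        # Convenience wrapper that updates a single property.
        return self.with_properties(nullable=nullable)


base = ColumnSketch(description="customer details")
updated = base.with_properties(nullable=True, alias="customer info")
```

Here `base` keeps `nullable=False` and no alias, while `updated` carries the new values together with the original description.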