Schema.validate
- classmethod Schema.validate( ) → DataFrame[Self] | LazyFrame[Self]
Validate that a data frame satisfies the schema.
If an eager data frame is passed as input, validation is performed within this function. If a lazy frame is passed, the lazy frame is simply extended with the validation logic. The logic is only executed (and may raise an error) once collect() is called on it.
- Parameters:
df – The data frame to validate.
cast – Whether columns with a wrong data type in the input data frame are cast to the schema’s defined data type if possible.
eager – Whether the validation should be performed eagerly, in which case this method raises upon failure. If False, the returned lazy frame will fail to collect if the validation does not pass.
Note
If running on the streaming engine, lazy validation may not surface all validation issues because validation is aborted once the first failure is encountered. Likewise, the reported validation failure can be non-deterministic.
- Returns:
The input eager or lazy frame, wrapped in a generic version of the input’s data frame type to reflect schema adherence. Columns not defined in the schema are removed from the output. This operation is guaranteed to maintain input ordering of rows.
- Raises:
SchemaError – If eager=True and the input data frame is missing columns, or cast=False and any data type does not match the definition in this schema. Only raised upon collection if eager=False.
ValidationError – If eager=True and any rule in the schema is violated, i.e. the data does not pass the validation. When eager=False, a ComputeError is raised upon collecting.
InvalidOperationError – If eager=True, cast=True, and the cast fails for any value in the data. Only raised upon collection if eager=False.