dataframely.random module

class dataframely.random.Generator(seed: int | None = None)

Bases: object

Type that allows sampling primitive types using a random number generator.

All generator methods are called sample_<type> and, if applicable, allow specifying a lower (inclusive) and an upper (exclusive) bound for the type to be sampled.

These methods can be used to sample higher-level types. To this end, users may also directly access the underlying numpy_generator to reuse the generator’s seeding.
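The bound convention can be illustrated without dataframely at all. The following is a minimal standard-library sketch, not dataframely's implementation: the helper name `sample_int_half_open` is hypothetical, and Python's `random.Random` stands in for the underlying NumPy generator. It shows that the lower bound is attainable, the upper bound is not, and that a fixed seed reproduces the same values.

```python
import random


def sample_int_half_open(rng: random.Random, n: int, lo: int, hi: int) -> list[int]:
    """Hypothetical helper: draw n integers from the half-open interval [lo, hi)."""
    # randrange excludes the stop value, which matches an exclusive upper bound.
    return [rng.randrange(lo, hi) for _ in range(n)]


# Two generators created from the same seed.
first = sample_int_half_open(random.Random(42), 1000, 0, 3)
second = sample_int_half_open(random.Random(42), 1000, 0, 3)

# Same seed, same sequence.
assert first == second
# Only 0, 1 and 2 can occur; the upper bound 3 is excluded.
assert set(first) <= {0, 1, 2}
```

Higher-level samplers built on such primitives inherit the reproducibility as long as they draw all of their randomness from the same seeded generator.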

Methods

`sample_binary`([n, null_probability])	Sample a list of binary values in the specified length range.
`sample_bool`([n, null_probability, p_true])	Sample a list of booleans.
`sample_choice`([n, null_probability, weights])	Sample a list of elements from a list of choices with replacement.
`sample_date`([n, resolution, null_probability])	Sample a list of dates in the provided range.
`sample_datetime`([n, resolution, time_zone, ...])	Sample a list of datetimes in the provided range.
`sample_duration`([n, resolution, ...])	Sample a list of durations in the provided range.
`sample_float`([n, null_probability, ...])	Sample a list of floating point numbers in the specified range.
`sample_int`([n, null_probability])	Sample a list of integers in the specified range.
`sample_seed`()	Sample a single integer that can be used as a seed for other RNGs.
`sample_string`([n, null_probability])	Sample a list of strings adhering to the provided regex.
`sample_time`([n, resolution, null_probability])	Sample a list of times in the provided range.

sample_binary(n: int = 1, *, min_bytes: int, max_bytes: int, null_probability: float = 0.0) → Series

Sample a list of binary values in the specified length range.

Args:
    n: The number of binary values to sample.
    min_bytes: The minimum number of bytes for each value.
    max_bytes: The maximum number of bytes for each value.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Binary.

sample_bool(n: int = 1, *, null_probability: float = 0.0, p_true: float | None = None) → Series

Sample a list of booleans.

Args:
    n: The number of booleans to sample.
    null_probability: The probability of an element being null.
    p_true: Sampling probability for True within non-null samples. Defaults to 0.5 (uniform sampling) when None.

Returns:
    A series with n elements of dtype Boolean.
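The phrase "within non-null samples" implies two stages: an element is first null or not, and only non-null elements are True with probability p_true. The sketch below illustrates that reading with the standard library only. It is not dataframely's implementation, and the helper name `sample_bool_sketch` is hypothetical.

```python
import random
from typing import Optional


def sample_bool_sketch(
    rng: random.Random, n: int, null_probability: float = 0.0, p_true: float = 0.5
) -> list[Optional[bool]]:
    """Hypothetical helper: decide nullness first, then draw True/False."""
    values: list[Optional[bool]] = []
    for _ in range(n):
        if rng.random() < null_probability:
            values.append(None)  # the element is null
        else:
            values.append(rng.random() < p_true)  # True with probability p_true
    return values


sample = sample_bool_sketch(random.Random(0), 10_000, null_probability=0.2, p_true=0.9)
non_null = [v for v in sample if v is not None]

# Share of nulls among all elements, and share of True among non-null elements.
null_share = 1 - len(non_null) / len(sample)
true_share = sum(non_null) / len(non_null)
```

Under this reading, `null_share` approaches `null_probability` and `true_share` approaches `p_true` as n grows, independently of each other.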

sample_choice(n: int = 1, *, choices: Sequence[T], null_probability: float = 0.0, weights: Sequence[float] | None = None) → Series

Sample a list of elements from a list of choices with replacement.

Args:
    n: The number of elements to sample.
    choices: The choices to sample from.
    null_probability: The probability of an element being null.
    weights: An ordered weight vector for the different choices.

Returns:
    A series with n elements of auto-inferred dtype.
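Weighted sampling with replacement can be sketched with the standard library's `random.choices`, which follows the same idea of one weight per choice, in the same order. This is an illustration of the concept only, not dataframely's implementation, and the color names and weights are made up for the example.

```python
import random
from collections import Counter

rng = random.Random(7)
choices = ["red", "green", "blue"]
weights = [0.7, 0.2, 0.1]  # one weight per choice, in the same order

# random.choices samples with replacement, so k may exceed len(choices).
sample = rng.choices(choices, weights=weights, k=5000)

# Tally how often each choice was drawn.
counts = Counter(sample)
```

Because elements are drawn with replacement, the relative frequencies in `counts` track the weights rather than being capped by the number of choices.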

sample_date(n: int = 1, *, min: date, max: date | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of dates in the provided range.

Args:
    n: The number of dates to sample.
    min: The minimum date to sample (inclusive).
    max: The maximum date to sample (exclusive). Defaults to ‘10000-01-01’ when None.
    resolution: The resolution that dates in the column must have. This uses the formatting language used by the polars datetime round method.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Date.
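The half-open date range, including the open-ended case, can be sketched with the standard library. This is an illustration under stated assumptions, not dataframely's implementation: the helper name `sample_date_sketch` is hypothetical, `resolution` and `null_probability` are left out, and the exclusive default bound ‘10000-01-01’ is modeled as one day past `date.max` because `datetime.date` cannot represent that value.

```python
import random
from datetime import date
from typing import Optional


def sample_date_sketch(
    rng: random.Random, n: int, lo: date, hi: Optional[date] = None
) -> list[date]:
    """Hypothetical helper: draw n dates from the half-open range [lo, hi)."""
    lo_ord = lo.toordinal()
    # date.max is 9999-12-31, so one past its ordinal stands in for the
    # exclusive bound '10000-01-01'.
    hi_ord = hi.toordinal() if hi is not None else date.max.toordinal() + 1
    return [date.fromordinal(rng.randrange(lo_ord, hi_ord)) for _ in range(n)]


rng = random.Random(1)
# Bounded: any day from 2024-01-01 through 2024-01-07.
week = sample_date_sketch(rng, 500, date(2024, 1, 1), date(2024, 1, 8))
# Open-ended: only 9999-12-30 and 9999-12-31 remain above the lower bound.
open_ended = sample_date_sketch(rng, 500, date(9999, 12, 30))
```

With an exclusive upper bound, passing the first day of the next period (here 2024-01-08) covers a whole week without any off-by-one adjustment.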

sample_datetime(n: int = 1, *, min: datetime, max: datetime | None, resolution: str | None = None, time_zone: str | tzinfo | None = None, time_unit: Literal['ns', 'us', 'ms'] = 'us', null_probability: float = 0.0) → Series

Sample a list of datetimes in the provided range.

Args:
    n: The number of datetimes to sample.
    min: The minimum datetime to sample (inclusive).
    max: The maximum datetime to sample (exclusive). Defaults to ‘10000-01-01’ when None.
    resolution: The resolution that datetimes in the column must have. This uses the formatting language used by the polars datetime round method.
    time_unit: The time unit of the datetime column. Defaults to us (microseconds).
    time_zone: The time zone that datetimes in the column must have. The time zone must use a valid IANA time zone name identifier, e.g. Etc/UTC or America/New_York.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Datetime.

sample_duration(n: int = 1, *, min: timedelta, max: timedelta, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of durations in the provided range.

Args:
    n: The number of durations to sample.
    min: The minimum duration to sample (inclusive).
    max: The maximum duration to sample (exclusive).
    resolution: The resolution that durations in the column must have. This uses the formatting language used by the polars datetime round method.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Duration.
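One way to picture how a resolution interacts with the bounds is to sample only whole multiples of a fixed step inside [min, max). The sketch below does this with the standard library. It is not dataframely's implementation: the helper name `sample_duration_sketch` is hypothetical, and the resolution is passed as a `timedelta` step (15 minutes, roughly what a polars duration string such as "15m" would express) instead of being parsed from a string.

```python
import random
from datetime import timedelta


def sample_duration_sketch(
    rng: random.Random, n: int, lo: timedelta, hi: timedelta, step: timedelta
) -> list[timedelta]:
    """Hypothetical helper: draw n multiples of step from [lo, hi)."""
    first = -(-lo // step)  # ceil(lo / step): index of the first multiple >= lo
    stop = -(-hi // step)  # ceil(hi / step): index of the first multiple >= hi
    return [rng.randrange(first, stop) * step for _ in range(n)]


step = timedelta(minutes=15)
sample = sample_duration_sketch(
    random.Random(3), 200, timedelta(0), timedelta(hours=1), step
)
```

With these arguments only 0, 15, 30 and 45 minutes can occur: the lower bound is attainable, while the one-hour upper bound is excluded even though it is itself a multiple of the step.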

sample_float(n: int = 1, *, min: float, max: float, null_probability: float = 0.0, nan_probability: float = 0.0, inf_probability: float = 0.0) → Series

Sample a list of floating point numbers in the specified range.

Args:
    n: The number of floats to sample.
    min: The minimum float to sample (inclusive).
    max: The maximum float to sample (exclusive).
    null_probability: The probability of an element being null.
    nan_probability: The probability of an element being nan.
    inf_probability: The probability of an element being inf.

Returns:
    A series with n elements of dtype Float64.

sample_int(n: int = 1, *, min: int, max: int, null_probability: float = 0.0) → Series

Sample a list of integers in the specified range.

Args:
    n: The number of integers to sample.
    min: The minimum integer to sample (inclusive).
    max: The maximum integer to sample (exclusive).
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Int64.

sample_seed() → int

Sample a single integer that can be used as a seed for other RNGs.

Returns:
    A seed of type uint32.
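Drawing a seed from one generator and handing it to another keeps a whole pipeline reproducible from a single top-level seed. The sketch below shows the pattern with the standard library's `random.Random` standing in for both the parent generator and the "other RNGs". It is an illustration of the idea, not dataframely's implementation.

```python
import random

# Parent generator with a fixed top-level seed.
parent = random.Random(123)

# Draw a seed in the uint32 range [0, 2**32).
child_seed = parent.randrange(2**32)

# Any RNG seeded with the drawn value behaves deterministically:
# two consumers seeded identically produce identical streams.
child_a = random.Random(child_seed)
child_b = random.Random(child_seed)
stream_a = [child_a.random() for _ in range(5)]
stream_b = [child_b.random() for _ in range(5)]
```

Rerunning the script yields the same `child_seed` and therefore the same downstream streams, since everything derives from the parent's seed.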

sample_string(n: int = 1, *, regex: str, null_probability: float = 0.0) → Series

Sample a list of strings adhering to the provided regex.

Args:
    n: The number of strings to sample.
    regex: The regex that all elements have to adhere to.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype String.

sample_time(n: int = 1, *, min: time, max: time | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of times in the provided range.

Args:
    n: The number of times to sample.
    min: The minimum time to sample (inclusive).
    max: The maximum time to sample (exclusive). Defaults to midnight when None.
    resolution: The resolution that times in the column must have. This uses the formatting language used by the polars datetime round method.
    null_probability: The probability of an element being null.

Returns:
    A series with n elements of dtype Time.