dataframely.random module

class dataframely.random.Generator(seed: int | None = None)

Bases: object

Type that allows sampling primitive types using a random number generator.

All generator methods are called sample_<type> and, if applicable, allow specifying a lower (inclusive) and an upper (exclusive) bound for the type to be sampled.

These methods can be used to sample higher-level types. To this end, users may also directly access the underlying numpy_generator to reuse the generator’s seeding.

Methods

sample_binary([n, null_probability])

Sample a list of binary values in the specified length range.

sample_bool([n, null_probability, p_true])

Sample a list of booleans.

sample_choice([n, null_probability, weights])

Sample a list of elements from a list of choices with replacement.

sample_date([n, resolution, null_probability])

Sample a list of dates in the provided range.

sample_datetime([n, resolution, time_zone, ...])

Sample a list of datetimes in the provided range.

sample_duration([n, resolution, ...])

Sample a list of durations in the provided range.

sample_float([n, null_probability, ...])

Sample a list of floating point numbers in the specified range.

sample_int([n, null_probability])

Sample a list of integers in the specified range.

sample_seed()

Sample a single integer that can be used as a seed for other RNGs.

sample_string([n, null_probability])

Sample a list of strings adhering to the provided regex.

sample_time([n, resolution, null_probability])

Sample a list of times in the provided range.

sample_binary(n: int = 1, *, min_bytes: int, max_bytes: int, null_probability: float = 0.0) → Series

Sample a list of binary values in the specified length range.

Args:

n: The number of binary values to sample.
min_bytes: The minimum number of bytes for each value.
max_bytes: The maximum number of bytes for each value.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Binary.

sample_bool(n: int = 1, *, null_probability: float = 0.0, p_true: float | None = None) → Series

Sample a list of booleans.

Args:

n: The number of booleans to sample.
null_probability: The probability of an element being null.
p_true: The probability of sampling True for each non-null element. Defaults to 0.5 (uniform sampling) when None.

Returns:

A series with n elements of dtype Boolean.

sample_choice(n: int = 1, *, choices: Sequence[T], null_probability: float = 0.0, weights: Sequence[float] | None = None) → Series

Sample a list of elements from a list of choices with replacement.

Args:

n: The number of elements to sample.
choices: The choices to sample from.
null_probability: The probability of an element being null.
weights: The weights of the individual choices, given in the same order as choices.

Returns:

A series with n elements of auto-inferred dtype.

sample_date(n: int = 1, *, min: date, max: date | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of dates in the provided range.

Args:

n: The number of dates to sample.
min: The minimum date to sample (inclusive).
max: The maximum date to sample (exclusive). If None, ‘10000-01-01’ is used.
resolution: The resolution that dates in the column must have. This uses the formatting language of the polars datetime round method.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Date.

sample_datetime(n: int = 1, *, min: datetime, max: datetime | None, resolution: str | None = None, time_zone: str | tzinfo | None = None, time_unit: Literal['ns', 'us', 'ms'] = 'us', null_probability: float = 0.0) → Series

Sample a list of datetimes in the provided range.

Args:

n: The number of datetimes to sample.
min: The minimum datetime to sample (inclusive).
max: The maximum datetime to sample (exclusive). If None, ‘10000-01-01’ is used.
resolution: The resolution that datetimes in the column must have. This uses the formatting language of the polars datetime round method.
time_unit: The time unit of the datetime column. Defaults to us (microseconds).
time_zone: The time zone that datetimes in the column must have. It must be a valid IANA time zone identifier, e.g. Etc/UTC or America/New_York.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Datetime.

sample_duration(n: int = 1, *, min: timedelta, max: timedelta, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of durations in the provided range.

Args:

n: The number of durations to sample.
min: The minimum duration to sample (inclusive).
max: The maximum duration to sample (exclusive).
resolution: The resolution that durations in the column must have. This uses the formatting language of the polars datetime round method.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Duration.

sample_float(n: int = 1, *, min: float, max: float, null_probability: float = 0.0, nan_probability: float = 0.0, inf_probability: float = 0.0) → Series

Sample a list of floating point numbers in the specified range.

Args:

n: The number of floats to sample.
min: The minimum float to sample (inclusive).
max: The maximum float to sample (exclusive).
null_probability: The probability of an element being null.
nan_probability: The probability of an element being NaN.
inf_probability: The probability of an element being inf.

Returns:

A series with n elements of dtype Float64.

sample_int(n: int = 1, *, min: int, max: int, null_probability: float = 0.0) → Series

Sample a list of integers in the specified range.

Args:

n: The number of integers to sample.
min: The minimum integer to sample (inclusive).
max: The maximum integer to sample (exclusive).
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Int64.

sample_seed() → int

Sample a single integer that can be used as a seed for other RNGs.

Returns:

A seed of type uint32.

sample_string(n: int = 1, *, regex: str, null_probability: float = 0.0) → Series

Sample a list of strings adhering to the provided regex.

Args:

n: The number of strings to sample.
regex: The regex that all elements have to adhere to.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype String.

sample_time(n: int = 1, *, min: time, max: time | None, resolution: str | None = None, null_probability: float = 0.0) → Series

Sample a list of times in the provided range.

Args:

n: The number of times to sample.
min: The minimum time to sample (inclusive).
max: The maximum time to sample (exclusive). If None, midnight at the end of the day is used.
resolution: The resolution that times in the column must have. This uses the formatting language of the polars datetime round method.
null_probability: The probability of an element being null.

Returns:

A series with n elements of dtype Time.