dataframely.random module
- class dataframely.random.Generator(seed: int | None = None)
Bases: object
Type that allows sampling primitive types using a random number generator.
All generator methods are named sample_<type> and, where applicable, accept a lower (inclusive) and an upper (exclusive) bound for the values to be sampled.
These methods can be used to sample higher-level types. To this end, users may also directly access the underlying numpy_generator to reuse the generator's seeding.
Methods
sample_binary([n, null_probability]): Sample a list of binary values in the specified length range.
sample_bool([n, null_probability, p_true]): Sample a list of booleans in the specified range.
sample_choice([n, null_probability, weights]): Sample a list of elements from a list of choices with replacement.
sample_date([n, resolution, null_probability]): Sample a list of dates in the provided range.
sample_datetime([n, resolution, time_zone, ...]): Sample a list of datetimes in the provided range.
sample_duration([n, resolution, ...]): Sample a list of durations in the provided range.
sample_float([n, null_probability, ...]): Sample a list of floating point numbers in the specified range.
sample_int([n, null_probability]): Sample a list of integers in the specified range.
sample_seed(): Sample a single integer that can be used as a seed for other RNGs.
sample_string([n, null_probability]): Sample a list of strings adhering to the provided regex.
sample_time([n, resolution, null_probability]): Sample a list of times in the provided range.
- sample_binary(n: int = 1, *, min_bytes: int, max_bytes: int, null_probability: float = 0.0) → Series
Sample a list of binary values in the specified length range.
- Args:
n: The number of binary values to sample.
min_bytes: The minimum number of bytes for each value.
max_bytes: The maximum number of bytes for each value.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Binary.
- sample_bool(n: int = 1, *, null_probability: float = 0.0, p_true: float | None = None) → Series
Sample a list of booleans in the specified range.
- Args:
n: The number of booleans to sample.
null_probability: The probability of an element being null.
p_true: Sampling probability for True within non-null samples. Default: 0.5 (uniform sampling).
- Returns:
A series with n elements of dtype Boolean.
- sample_choice(n: int = 1, *, choices: Sequence[T], null_probability: float = 0.0, weights: Sequence[float] | None = None) → Series
Sample a list of elements from a list of choices with replacement.
- Args:
n: The number of elements to sample.
choices: The choices to sample from.
null_probability: The probability of an element being null.
weights: An ordered weight vector for the different choices.
- Returns:
A series with n elements of auto-inferred dtype.
- sample_date(n: int = 1, *, min: date, max: date | None, resolution: str | None = None, null_probability: float = 0.0) → Series
Sample a list of dates in the provided range.
- Args:
n: The number of dates to sample.
min: The minimum date to sample (inclusive).
max: The maximum date to sample (exclusive). '10000-01-01' when None.
resolution: The resolution that dates in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Date.
- sample_datetime(n: int = 1, *, min: datetime, max: datetime | None, resolution: str | None = None, time_zone: str | tzinfo | None = None, time_unit: Literal['ns', 'us', 'ms'] = 'us', null_probability: float = 0.0) → Series
Sample a list of datetimes in the provided range.
- Args:
n: The number of datetimes to sample.
min: The minimum datetime to sample (inclusive).
max: The maximum datetime to sample (exclusive). '10000-01-01' when None.
resolution: The resolution that datetimes in the column must have. This uses the formatting language used by the polars datetime round method.
time_unit: The time unit of the datetime column. Defaults to us (microseconds).
time_zone: The time zone that datetimes in the column must have. The time zone must use a valid IANA time zone name identifier, e.g. Etc/UTC or America/New_York.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Datetime.
- sample_duration(n: int = 1, *, min: timedelta, max: timedelta, resolution: str | None = None, null_probability: float = 0.0) → Series
Sample a list of durations in the provided range.
- Args:
n: The number of durations to sample.
min: The minimum duration to sample (inclusive).
max: The maximum duration to sample (exclusive).
resolution: The resolution that durations in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Duration.
- sample_float(n: int = 1, *, min: float, max: float, null_probability: float = 0.0, nan_probability: float = 0.0, inf_probability: float = 0.0) → Series
Sample a list of floating point numbers in the specified range.
- Args:
n: The number of floats to sample.
min: The minimum float to sample (inclusive).
max: The maximum float to sample (exclusive).
null_probability: The probability of an element being null.
nan_probability: The probability of an element being nan.
inf_probability: The probability of an element being inf.
- Returns:
A series with n elements of dtype Float64.
- sample_int(n: int = 1, *, min: int, max: int, null_probability: float = 0.0) → Series
Sample a list of integers in the specified range.
- Args:
n: The number of integers to sample.
min: The minimum integer to sample (inclusive).
max: The maximum integer to sample (exclusive).
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Int64.
- sample_seed() → int
Sample a single integer that can be used as a seed for other RNGs.
- Returns:
A seed of type uint32.
- sample_string(n: int = 1, *, regex: str, null_probability: float = 0.0) → Series
Sample a list of strings adhering to the provided regex.
- Args:
n: The number of strings to sample.
regex: The regex that all elements have to adhere to.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype String.
- sample_time(n: int = 1, *, min: time, max: time | None, resolution: str | None = None, null_probability: float = 0.0) → Series
Sample a list of times in the provided range.
- Args:
n: The number of times to sample.
min: The minimum time to sample (inclusive).
max: The maximum time to sample (exclusive). Midnight when None.
resolution: The resolution that times in the column must have. This uses the formatting language used by the polars datetime round method.
null_probability: The probability of an element being null.
- Returns:
A series with n elements of dtype Time.