-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: respect both mulitple_of and minimum/maximum constraints #505
base: main
Are you sure you want to change the base?
fix: respect both mulitple_of and minimum/maximum constraints #505
Conversation
@richardxia thanks for the writeup! I haven't had a chance to look into it in detail yet, but I'll do that soon :) Also, feel free to change the API if it's necessary. Or add a new function to handle these cases. For example, you mentioned |
@richardxia sorry for the delay, but I've had a chance to look into the PR. So I think the biggest issues that we're facing here is the opaque nature of However, these would have to be deterministic since users have to be able to reliably create the same instances with the same values if needed (based on the seeding of the I think we have room for making whatever changes we need here, and deprecate accordingly, without too many issues since most, if not all, users wouldn't be using these functions but only the APIs directly related to the factories. |
Also, I think we should be fine as long as reasonable scenarios are working, If hypothesis keeps coming up with extreme cases that won't work, then we can just restrict hypothesis or something along those lines and just document that such extreme cases may fail. |
@guacs, thanks for taking the time to review my PR and leaving comments. What you said sounds good to me, in terms of deprecating the |
Previously, `generate_constrained_number()` would potentially generate invalid numbers when `multiple_of` is not None and exactly one of either `minimum` or `maximum` is not None, since it would just return `multiple_of` without respecting the upper or lower bound. This significantly changes the implementation of the code to correctly handle this code. The `generate_constrained_number()` method has been completely removed, being replaced with a `generate_constrained_multiple_of()` function. A major difference between the old function and the new function is that the new one does not accept a `method` argument for generating random numbers. This is because in the new function, we always use `create_random_integer()`, since the problem reduces to generating a random integer multiplier. The high-level algorithm behind `generate_constrained_multiple_of()` is that we need to constrain the random integer generator to generate numbers such that when they are multiplied with `multiple_of`, they still fit within the original bounds constraints. This simplify involves dividing the original bounds by `multiple_of`, with some special handling for negative `multiple_of` numbers as well as carefully chosen rounding behavior. We also need to make some changes to other functions. `get_increment()` needs to take an additional argument for the actual value that the increment is for. This is because floating-point numbers can't use a static increment or else it might get rounded away if the numbers are too large. Python fortunately provides a `math.ulp()` function for computing this for a given float value, so we make use of that function. We still use the original `float_info.epsilon` constant as a lower bound on the increment, though, since in the case that the value is too close to zero, we still need to make sure that the increment doesn't disappear when used against other numbers. Finally, we rename and modify `passes_pydantic_multiple_validator()` to `is_almost_multiple_of()`, modifying its implementation to defer the casting of values to `float()` to minimize rounding errors. This specifically affects Decimal numbers, where casting to float too early causes too much loss of precision. A significant number of changes were made to the tests as well, since the original tests missed the bug being fixed here. Each of the integer, floating-point, and decimal tests has been updated to assert that the result is actually within the minimum and maximum constraints. In addition, we remove some unnecessary sorting of the randomly generated test input values, since this was unnecessarily constraining `multiple_of` to be greater than or less than the minimum and maximum values. This was causing a lot of the scenarios involving negative values to be skipped. Lastly, the floating-point and decimal tests need additional constraints to avoid unrealistic extreme values from hitting precision issues. This was done by adding a number of constraints on the number of significant digits in the input numbers and on the relative magnitudes of the input numbers.
a65482d
to
8881603
Compare
@@ -1,8 +1,9 @@ | |||
from __future__ import annotations | |||
|
|||
from decimal import Decimal | |||
from math import ceil, floor, ulp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like I'm failing the Python 3.8 tests because I'm importing ulp
, which was only added in Python 3.9. I'll probably have to implement an alternative way of computing the correct increment for a given floating-point number.
I pushed a completely rewritten solution to this branch, but it looks like I'm currently failing the Python 3.8 tests because I accidentally used a function that was only implemented in Python 3.9. I'll have to revisit this tomorrow. In the meantime, please let me know if the new changes I made look like they're in the right direction. I made some breaking changes to the functions in |
Python 3.8 doesn't have the ulp() function, so we need to manually compute an increment ourselves.
Documentation preview will be available shortly at https://litestar-org.github.io/polyfactory-docs-preview/505 |
} | ||
return cast("T", values[t_type]) | ||
# See https://github.com/python/mypy/issues/17045 for why the redundant casts are ignored. | ||
if t_type == int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these checks should maybe use is_safe_subclass
so that we handle custom int
as well. Same for float
and Decimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardxia thanks for this! I think this looks good. We just have to make sure that any functions that we're changing (such as names or the arguments it takes) should be deprecated and we create new functions as replacements for them. Once that's done, I think this is good to go :)
# When ``value`` is large in magnitude, we need to choose an increment that is large enough | ||
# to not be rounded away, but when ``value`` small in magnitude, we need to prevent the | ||
# incerement from vanishing. ``float_info.epsilon`` is defined as the smallest delta that | ||
# can be represented between 1.0 and the next largest number, but it's not sufficient for | ||
# larger values. We instead open up the floating-point representation to grab the exponent | ||
# and calculate our own increment. This can be replaced with ``math.ulp`` in Python 3.9 and | ||
# later. | ||
_, exp = frexp(value) | ||
increment = float_info.radix ** (exp - float_info.mant_dig + 1) | ||
return cast("T", max(increment, float_info.epsilon)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably fine to use math.ulp
in the case of Python 3.9 and then fall back to this in case it's not 3.8. We have a constant called PY_38
in polyfactory.constants.py
that you can use to handle this. Or maybe even something like:
if PY_38:
def get_float_increment(value: float) -> float:
# the current logic
else:
def get_float_increment(value: float) -> float:
# math.ulp based logic
Then we can just call this function without having to worry about the Python version.
|
||
|
||
def get_increment(t_type: type[T]) -> T: | ||
def get_increment(value: T, t_type: type[T]) -> T: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a breaking change. Instead, we should have another get_increment_v2
function and deprecate this one.
@@ -99,8 +100,8 @@ def is_multiply_of_multiple_of_in_range( | |||
return False | |||
|
|||
|
|||
def passes_pydantic_multiple_validator(value: T, multiple_of: T) -> bool: | |||
"""Determine whether a given value passes the pydantic multiple_of validation. | |||
def is_almost_multiple_of(value: T, multiple_of: T) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the passes_pydantic_multiple_)validator
as it is and deprecate it and then create a new is_almost_multiple_of
function as a replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we could just do something like this:
def passes_pydantic_multiple_validator(value: T, multiple_of: T) -> bool:
return is_almost_mulitple_of(value, multiple_of)
That is, changing the logic in it is fine since it's doing the same, but we would need to keep the old name around.
@@ -210,33 +225,36 @@ def get_constrained_number_range( | |||
return minimum, maximum | |||
|
|||
|
|||
def generate_constrained_number( | |||
def generate_constrained_multiple_of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above regarding it being breaking changes.
Pull Request Checklist
Description
Library Code Changes
Previously,
generate_constrained_number()
would potentially generate invalid numbers whenmultiple_of
is not None and exactly one of eitherminimum
ormaximum
is not None, since it would just returnmultiple_of
without respecting the upper or lower bound.This significantly changes the implementation of the code to correctly handle this code. The
generate_constrained_number()
method has been completely removed, being replaced with agenerate_constrained_multiple_of()
function. A major difference between the old function and the new function is that the new one does not accept amethod
argument for generating random numbers. This is because in the new function, we always usecreate_random_integer()
, since the problem reduces to generating a random integer multiplier.The high-level algorithm behind
generate_constrained_multiple_of()
is that we need to constrain the random integer generator to generate numbers such that when they are multiplied withmultiple_of
, they still fit within the original bounds constraints. This simplify involves dividing the original bounds bymultiple_of
, with some special handling for negativemultiple_of
numbers as well as carefully chosen rounding behavior.We also need to make some changes to other functions.
get_increment()
needs to take an additional argument for the actual value that the increment is for. This is because floating-point numbers can't use a static increment or else it might get rounded away if the numbers are too large. Python fortunately provides amath.ulp()
function for computing this for a given float value, so we make use of that function. We still use the originalfloat_info.epsilon
constant as a lower bound on the increment, though, since in the case that the value is too close to zero, we still need to make sure that the increment doesn't disappear when used against other numbers.Finally, we rename and modify
passes_pydantic_multiple_validator()
tois_almost_multiple_of()
, modifying its implementation to defer the casting of values tofloat()
to minimize rounding errors. This specifically affects Decimal numbers, where casting to float too early causes too much loss of precision.Test Changes
A significant number of changes were made to the tests as well, since the original tests missed the bug being fixed here. Each of the integer, floating-point, and decimal tests has been updated to assert that the result is actually within the minimum and maximum constraints. In addition, we remove some unnecessary sorting of the randomly generated test input values, since this was unnecessarily constraining
multiple_of
to be greater than or less than the minimum and maximum values. This was causing a lot of the scenarios involving negative values to be skipped.Lastly, the floating-point and decimal tests need additional constraints to avoid unrealistic extreme values from hitting precision issues. This was done by adding a number of constraints on the number of significant digits in the input numbers and on the relative magnitudes of the input numbers.
Close Issue(s)
ge
andmultiple_of
but notle
constraints causes invalid values to be generated #503