Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time coding refactor #9906

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Dec 18, 2024

This introduces some changes in the encoding/decoding pipeline to prepare for other units beside ns. It also fixes some inconsistencies wrt the exact ns.resolution datetime64 bounds.

Finally it will reduce the "noise" over in #9618 quite substantial.

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@kmuehlbauer kmuehlbauer marked this pull request as ready for review December 18, 2024 13:47
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/coding/times.py Outdated Show resolved Hide resolved
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting this off @kmuehlbauer—these changes look good. Just a few very minor questions.

delta = date * np.timedelta64(1, unit)
if not np.isnan(delta):
# this will raise on dtype overflow for integer dtypes
if date.dtype.kind in "iu" and not np.int64(delta) == date:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not only a risk for uint64 values?

# this should be safe, as errors would have been raised above
ns_time_unit = _NS_PER_TIME_DELTA[time_unit]
ns_ref_date_unit = _NS_PER_TIME_DELTA[ref_date.unit]
if flat_num_dates.dtype.kind in "iuf" and (ns_time_unit > ns_ref_date_unit):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I could be missing something—is flat_num_dates ever anything other than an integer, unsigned integer, or float dtype?

Comment on lines +382 to +387
# retrieve cftype
cftype = type(dates[np.nanargmin(num_dates)])
# "ns" borders
# between ['1677-09-21T00:12:43.145224193', '2262-04-11T23:47:16.854775807']
lower = cftype(1677, 9, 21, 0, 12, 43, 145224)
upper = cftype(2262, 4, 11, 23, 47, 16, 854775)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, good to make this more exact 👍.

Comment on lines +281 to +283
# if date is exactly NaT (np.iinfo("int64").min) return refdate
# to make follow-up checks work
return ref_date
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to return pd.Timestamp("NaT")?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants