Correctly interpolate seasons in Grouper #2019

saschahofmann · 2024-12-11T11:21:42Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes Linear interpolation for seasonal bias adjustment leads to non-smooth distribution #2014
Tests for the changes have been added (for bug fixes / features)
CHANGELOG.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

This PR adds a line to correctly interpolate seasonal values. It also changes the test_timeseries function that now accepts a calendar argument instead of cftime. Not providing it or providing None is equivalent to cftime=False and calendar='standard to the previous cftime=True. This allows for testing different calendar implementations e.g. 360_day calendars

saschahofmann · 2024-12-11T16:04:40Z

I just realised that the factor of 1/6 is assuming that all seasons have the same length which in gregorian calendars is not necessarily true but I am not sure it matters too much at least the function should be smooth.

saschahofmann · 2024-12-11T16:22:20Z

Just to prove that this leads to a smooth result, same input as in the issue:

github-actions · 2024-12-11T16:53:52Z

Warning

This Pull Request is coming from a fork and must be manually tagged approved in order to perform additional testing.

saschahofmann · 2024-12-12T09:46:16Z

Weirdly and contrary to what I showed yesterday, today I am still getting clear transitions as if there still wasn't any linear interpolation.

Zeitsperre · 2024-12-12T15:00:42Z

@saschahofmann We recently changed the layout of xclim to use a src structure. It might be worthwhile to try reinstalling the library.

…into fix-Ouranosinc#2014

saschahofmann · 2024-12-12T22:52:17Z

I reinstalled xclim but I am still getting very similar results to before the "fix". You have any advice on where else I could look?

Zeitsperre · 2024-12-13T13:27:13Z

I reinstalled xclim but I am still getting very similar results to before the "fix". You have any advice on where else I could look?

Could it be that you have obsolete __pycache__ folders still among your cloned folders? @coxipi is looking into recreating your example based on your branch for validation, but if the tests are working as intended on CI, then it's likely a caching/installation issue.

coxipi · 2024-12-13T13:38:30Z

I managed to install the environment, for some reason I only had the branch "main" when I cloned the fork yesterday

I confirmed that the function has the appropriate modifications inside the notebook I'm using

import inspect
print(inspect.getsource(sdba.base.Grouper.get_index))

I also find that the interpolation is wrong.

I'll try to have look later. Maybe the interp boolean condition is not triggered properly?

saschahofmann · 2024-12-16T11:05:38Z

I am pretty sure that the get_index function is updated in my notebook. Either I am wrong in expecting a smoother result (it seems to have changed slightly to what I got earlier) or there is something else going on. I will keep investigating

coxipi · 2024-12-16T18:00:34Z

It's simply interp which can't be "nearest", otherwise no interpolation takes place ... I think our only other option is linear.

from xclim import sdba
QM = sdba.EmpiricalQuantileMapping.train(
    ref, hist, nquantiles=15, group="time.season", kind="+"
)

scen = QM.adjust(sim, extrapolation="constant", interp="nearest")
scen_interp = QM.adjust(sim, extrapolation="constant", interp="linear")
outd = {
    "Reference":ref,
    "Model - biased":hist,
    "Model - adjusted - no interp":scen, 
    "Model - adjusted - linear interp":scen_interp, 
}
for k,da in outd.items(): 
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()

This doesn't reproduce your figure however. It seems your figure above was matching the reference very well, better than what I have even with the linear interpolation. But it does get rid of obvious discontinuities.

coxipi · 2024-12-16T19:00:51Z

There is clearly something wrong going on. Comparing
hist - scen_month
scen_time - scen_month
scen_season - scen_month
scen_month - scen_month

scen_season is way off

saschahofmann · 2024-12-18T13:36:18Z

@coxipi I think only mention this in the original issue: my analysis is done with QuantileDeltaMapping instead of EmpiricalQuantileMapping. Here the equivalent chart to yours for that:

season still seem kinda weird

saschahofmann · 2024-12-18T13:58:08Z

A similar trend becomes apparent when looking at the adjusted - historical (now for EmpiricalQuantileMapping)

coxipi · 2024-12-18T14:01:28Z

Yes, I have seen simlilar things by playing with the choice of how get_index. I feel this should not be this sensitive. Let me try and get this back

coxipi · 2024-12-19T16:43:45Z

In any case, the mean over the 15-year period should be closer to the target, time.season is way off (and in fact, I maintain, it should be equal to the time adjustment.)

Sorry my bad, interpolation in this case means nearest interpolation for each time value, as opposed to the af value provided by the training (with dimensions quantile [and season])

Got it, sorry I read too fast. Good observation. It seems _interp_on_quantiles_2D is messing up season even without interpolation, I will have to look into this

saschahofmann · 2024-12-19T16:49:25Z

I think I am getting closer. I tried to replicate whats happening inside _interp_on_quantiles_2D, for a single value on 2000-04-15. Above that value would be -1.23223145

from scipy.interpolate import griddata
oldx = qdm_season.ds.hist_q.sel(season='MAM')
oldg = np.ones(20)
oldy = qdm_season.ds.af.sel(season='MAM')
value = hist.sel(time='2000-04-15')
newx = value
newg = u.map_season_to_int(value.time.dt.season)
griddata(
    (oldx.values, oldg),
    oldy.values,
    (newx.values, newg),
    method='nearest'
)

If I didn't make mistake that should mimick the function but it leads to 0.20145109.

saschahofmann · 2024-12-20T15:55:14Z

Ok I just need to find the difference between this code and the one running _interp_on_quantiles_2D. Chart shows the af obtained from EQM with group='time' as above and the one manually computed.

from scipy.interpolate import griddata

oldx = qdm_season.ds.hist_q.sel(season="MAM")
oldg = np.ones(20)
oldy = qdm_season.ds.af.sel(season="MAM")
value = hist_spring
newx = value
newg = u.map_season_to_int(value.time.dt.season)
afs = griddata((oldx.values, oldg), oldy.values, (newx.values, newg), method="nearest")

af.groupby("time.dayofyear").mean().plot(label="interpolated af time")
af_corrected = xr.DataArray(afs, coords=dict(time=hist_spring.time))
af_corrected.groupby('time.dayofyear').mean().plot( linestyle="--",
                                                   label='seasonal af manually computed with griddata')
plt.legend()

coxipi · 2024-12-20T17:28:11Z

Great, I have reproduced your example

def _interp_on_quantiles_2d(newx, newg, oldx, oldy, oldg, method, extrap):
    mask_new = np.isnan(newx) | np.isnan(newg)
    mask_old = np.isnan(oldy) | np.isnan(oldx) | np.isnan(oldg)
    out = np.full_like(newx, np.nan, dtype=f"float{oldy.dtype.itemsize * 8}")
    if np.all(mask_new) or np.all(mask_old):
        warn(
            "All-nan slice encountered in interp_on_quantiles",
            category=RuntimeWarning,
        )
        return out
    out[~mask_new] = griddata(
        (oldx[~mask_old], oldg[~mask_old]),
        oldy[~mask_old],
        (newx[~mask_new], newg[~mask_new]),
        method=method,
    )
    # if method == "nearest" or extrap != "nan":
    #     # 'nan' extrapolation implicit for cubic and linear interpolation.
    #     out = _extrapolate_on_quantiles(out, oldx, oldg, oldy, newx, newg, extrap)
    return out

def bad_af_season(afq, hist):
    oldg = np.ones(20)
    oldy = afq.sel(season="MAM")
    value = hist.where(hist.time.dt.month.isin([3,4,5]), drop=True)
    newx = value
    newg = u.map_season_to_int(value.time.dt.season)
    afs = _interp_on_quantiles_2d(newx,newg,oldx,oldy,oldg, "nearest", "nan")
    return xr.DataArray(afs, coords=dict(time=value.time))
    
af.groupby("time.dayofyear").mean().plot(label="interpolated af time")
good_af_season(qdm_season.ds.af,hist).groupby('time.dayofyear').mean().plot( linestyle="--",
                                                   label='seasonal af manually computed with griddata')
bad_af_season(qdm_season.ds.af,hist).groupby('time.dayofyear').mean().plot( linestyle="--",
                                                   label='seasonal af manually computed with xclim internals')
plt.legend()

And I get the same (the functions are quite similar, not surprised, but I still wanted to confirm quickly). I was having some problems with numba, I commented the "extrapolate", but we have problems when method!="nearest" too, so that should not be the problem. So now I suspect that it might be because of the arguments
newx, newg, oldx, oldy, oldg, I've built them in your way, I have to check what is done in xclim

saschahofmann · 2024-12-20T20:18:06Z

Yes I'm also still trying to figure out the difference but I guess I ran into the same problem as you with the extrapolation

saschahofmann · 2025-01-07T14:39:52Z

Happy new year everyone! Sorry for disappearing. The holidays came out of nowhere 😏 but I am back now and really wanna get to the bottom of this and I am think I am closing in on at least one issue:

the behaviour changes a lot on whether or not _extrapolate_on_quantiles is called. If you comment back in the if statement that you removed above it will show again the quite constant interpolated af as we normally see.

if method == "nearest" or extrap != "nan":
        # 'nan' extrapolation implicit for cubic and linear interpolation.
        out = u._extrapolate_on_quantiles(out, oldx, oldg, oldy, newx, newg, extrap)

I will now try to understand what this function does but I am pretty sure we will find something there. However, this function is only called for nearest so probably this is not the end of it.

saschahofmann · 2025-01-07T14:57:47Z

Turns out that a large part of the newx values are out of bounds (either toolow or toohigh)

    bnds = _first_and_last_nonnull(oldx)
    xp = np.arange(bnds.shape[0])
    toolow = newx < np.interp(newg, xp, bnds[:, 0])
    toohigh = newx > np.interp(newg, xp, bnds[:, 1])

saschahofmann · 2025-01-07T15:09:41Z

I am trying to understand how it works and I am a little concerned about this xp = np.arange(bnds.shape[0]) (src/xclim/sdba/nbutils.pyL396). If I understand correctly, this creates new coordinates corresponding to the different groups starting at 0 (e.g. month or season 0) BUT oldx has added cyclic bound so that coordinates start at -1. Could it be that we are setting it wrong here? I can see that the this function was last edited 3years ago by @aulemahal but add_cyclic_bounds was added over 5 years ago, so maybe I don't understand fully how it works?

saschahofmann · 2025-01-07T15:16:50Z

I think this might solve the first issue 🎉

I replaced the suspicious line with this xp = oldg[:, 0] (pushing it now) and now the adjustments for time reduced to the 3 months and adjustment for that season correspond as expected:

…into fix-Ouranosinc#2014

coxipi · 2025-01-07T15:36:46Z

Happy new year!

Nice work! The months are from 1..12, then with cyclic coords become 0..13 which works correctly with the xp=np.arange(...). But the seasons starting at -1 didn't work with xp=np.arange(...) as you pointed out. I had tried to replace the season range 0..3 -> 1..4, but and I had not seen an improvement, but perhaps I had not made the change correctly.

I'm a bit rusty after holidays, you mention a first issue being solved, what would be the second? That's related to your comment:

However, this function is only called for nearest so probably this is not the end of it.

?

saschahofmann · 2025-01-07T15:45:25Z

This finally also looks as expected:

and the EQM with nearest now shows now spikes:

coxipi · 2025-01-07T15:47:41Z

Hmm, in your last plot, the biased and adjusted models are the same?

saschahofmann · 2025-01-07T15:49:01Z

I'm a bit rusty after holidays, you mention a first issue being solved, what would be the second?

In theory, the line should only be called when method='nearest' so I am worried that this is not covering the full picture since we also saw issues with interp='linear'. In fact, I started this whole journey having problems with linear 😅 but lets see. will check now.

saschahofmann · 2025-01-07T15:49:58Z

Sorry was plotting the wrong thing, edited the comment.

coxipi · 2025-01-07T15:52:26Z

I'm hopeful this will work. The condition is an OR:
method == "nearest" OR extrap != "nan"
I don't think you ever set extrap to nan? So the extrapolate routine was always being called, even with method=="linear"

saschahofmann · 2025-01-07T15:58:12Z

Ah of course I saw the OR but not !='nan' 🤦 .

Indeed this resolves EQM:

I am quite surprised that there are no more spikes in the nearest after all I would have thought that this would kinda be expected.

coxipi · 2025-01-07T16:03:50Z

I am quite surprised that there are no more spikes in the nearest after all I would have thought that this would kinda be expected.

I see your point. Maybe the sufficiently high number of quantiles (20 or 50) and the fact that you average over 15 years is enough to make this smooth. If you look directly at the time series, the "nearest" should be less smooth?

Anyways, great work, thanks a lot!

saschahofmann · 2025-01-07T16:05:15Z

I am also finally getting smooth results for QDM 🎉 :

and one of our other checks also finally looks as expected:

I officially declare this the greatest detective work since Sherlock Holmes solved the Mistery of the hound of Baskerville 😂
Will keep playing around with this but I think this might be ready now!

coxipi · 2025-01-07T16:06:04Z

Ah of course I saw the OR but not !='nan' 🤦 .

I made the same error above when I commented the extrapolate haha... maybe I influenced your reading

saschahofmann · 2025-01-07T16:08:58Z

I am quite surprised that there are no more spikes in the nearest after all I would have thought that this would kinda be expected.

I see your point. Maybe the sufficiently high number of quantiles (20 or 50) and the fact that you average over 15 years is enough to make this smooth. If you look directly at the time series, the "nearest" should be less smooth?

Anyways, great work, thanks a lot!

Hm I am looking at timeseries but looks smooth as well:

but I won't complain about a graph looking smoother than expected.

coxipi · 2025-01-07T16:09:30Z

Hum, in the QDM case, the linear interpolation seems to have some issues?

saschahofmann · 2025-01-07T16:20:18Z

Dang now I see it too somehow I was focusing on the nearest. Let's see.

saschahofmann · 2025-01-07T16:31:57Z

Ok here the linearly interpolated afs for QDM:

as comparison nearest:

I believe this is due to the problem I mentioned in December and better summarised by you

Gonna think about this tomorrow.

saschahofmann · 2025-01-08T14:41:26Z

Do you have any resources on someone else doing this? The originally paper Cannon et al. (2015 ) doesn't seem to look at monthly or seasonal adjustments/ the only thing I could find was them saying to use a sliding window:

To correct biases in the seasonal cycle, the quantile mapping algorithms are applied to pooled daily data falling within sliding 3-month windows centered on the month of interest. For example, correction of data from December would include days from November to January, correction of data from January would include days from December to February, and so forth. Time-dependent means and empirical quantiles of projected data are calculated over 30-yr sliding windows centered on the year of interest

coxipi · 2025-01-08T15:00:32Z

No, unfortunately, I'm searching right now. @aulemahal , Sascha fixed one problem, in the extrapolation, it was assumed that values of season/month would start at 0, but for season with periodic condition, it can go below zero, this needed a change.

But now, we have the problem I describe , e.g.:

To give a concrete example, consider tasmax on Feb.28. Its rank is computed in DJF. But if we included it in MAM, it would be a lower rank. We are saying that we have an interpolated continuous function that we want to apply on those ranks, but these values are still segmented, there are four groups of ranks in a year.

Do you agree this is a problem? Do you know if people explored specifically the use of QDM with seasons in the litterature?

saschahofmann · 2025-01-08T15:05:56Z

I was also thinking the other option for a fix for the extrapolation without needing to change the extrapolate function is changing the mapping of the seasons to start at 1 (so that cyclic_bounds would add 0 and 5). Not sure which of the two you prefer. I think the current fix might be more robust to future changes because it simply uses the coordinate values.

aulemahal · 2025-01-08T15:34:18Z

Do you know if people explored specifically the use of QDM with seasons in the litterature?

I don't think I have ever read such a paper (neither for QDM nor for any other QMs). Maybe @huard remembers if we had sources in mind when implementing it ?

My "fear" is that we implemented it because it was possible, because time.dt.season existed in the code.

saschahofmann added 2 commits December 11, 2024 12:17

Correctly interpolate seasonal in Grouper

e8ba3f0

Update Changelog

26a9b0f

github-actions bot added the sdba Issues concerning the sdba submodule. label Dec 11, 2024

saschahofmann added 3 commits December 11, 2024 12:24

Fix test_make_hourly_temps

8661313

Fix linting

2c18c22

Add comment explaining the factor 1/6

18687b3

Fix typo in Changelog

ca2fc5f

Zeitsperre requested a review from aulemahal December 11, 2024 16:53

Zeitsperre added bug Something isn't working standards / conventions Suggestions on ways forward labels Dec 11, 2024

Zeitsperre added 2 commits December 11, 2024 15:57

Merge branch 'main' into fix-Ouranosinc#2014

4d8ee67

Merge branch 'main' into fix-Ouranosinc#2014

0ce6dfa

Zeitsperre mentioned this pull request Dec 12, 2024

Prepare v0.54.0 #2016

Merged

5 tasks

saschahofmann added 2 commits December 12, 2024 23:44

Change order of params for pip install

c111abd

Merge branch 'fix-Ouranosinc#2014' of github.com:saschahofmann/xclim …

ce8ff0c

…into fix-Ouranosinc#2014

github-actions bot added the docs Improvements to documenation label Dec 12, 2024

Merge branch 'main' into fix-Ouranosinc#2014

94091fc

saschahofmann added 2 commits January 7, 2025 16:17

Fix xp creation when extrapolating on quantiles

1046620

Merge branch 'fix-Ouranosinc#2014' of github.com:saschahofmann/xclim …

1c7d023

…into fix-Ouranosinc#2014

Correctly interpolate seasons in Grouper #2019

Are you sure you want to change the base?

Correctly interpolate seasons in Grouper #2019

Conversation

saschahofmann commented Dec 11, 2024

Pull Request Checklist:

What kind of change does this PR introduce?

saschahofmann commented Dec 11, 2024

saschahofmann commented Dec 11, 2024

github-actions bot commented Dec 11, 2024

saschahofmann commented Dec 12, 2024

Zeitsperre commented Dec 12, 2024

saschahofmann commented Dec 12, 2024

Zeitsperre commented Dec 13, 2024

coxipi commented Dec 13, 2024 • edited Loading

saschahofmann commented Dec 16, 2024

coxipi commented Dec 16, 2024 • edited Loading

coxipi commented Dec 16, 2024

saschahofmann commented Dec 18, 2024

saschahofmann commented Dec 18, 2024 • edited Loading

coxipi commented Dec 18, 2024

coxipi commented Dec 19, 2024

saschahofmann commented Dec 19, 2024

saschahofmann commented Dec 20, 2024

coxipi commented Dec 20, 2024 • edited Loading

saschahofmann commented Dec 20, 2024

saschahofmann commented Jan 7, 2025 • edited Loading

saschahofmann commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

coxipi commented Jan 7, 2025 • edited Loading

saschahofmann commented Jan 7, 2025 • edited Loading

coxipi commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

coxipi commented Jan 7, 2025 • edited Loading

saschahofmann commented Jan 7, 2025

coxipi commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

coxipi commented Jan 7, 2025 • edited Loading

saschahofmann commented Jan 7, 2025

coxipi commented Jan 7, 2025

saschahofmann commented Jan 7, 2025

saschahofmann commented Jan 7, 2025 • edited Loading

saschahofmann commented Jan 8, 2025

coxipi commented Jan 8, 2025

saschahofmann commented Jan 8, 2025

aulemahal commented Jan 8, 2025

coxipi commented Dec 13, 2024 •

edited

Loading

coxipi commented Dec 16, 2024 •

edited

Loading

saschahofmann commented Dec 18, 2024 •

edited

Loading

coxipi commented Dec 20, 2024 •

edited

Loading

saschahofmann commented Jan 7, 2025 •

edited

Loading

coxipi commented Jan 7, 2025 •

edited

Loading

saschahofmann commented Jan 7, 2025 •

edited

Loading

coxipi commented Jan 7, 2025 •

edited

Loading

coxipi commented Jan 7, 2025 •

edited

Loading

saschahofmann commented Jan 7, 2025 •

edited

Loading