Emulating SpeedyWeather for the RainMaker challenge #47
Great, this seems like it could work! I am curious what the bounds are that we can sample from. I tried to CC Chris on this but I don't think he has access to the issue?
On bounds: those are not necessarily obvious and can result from physics as well as numerics. E.g. a negative temperature likely causes issues, but even a 100 K atmosphere is far from realistic and may push the numerical stability (there's an assumption about a reference temperature that is violated more and more the further away we are from it). I could guess some reasonable ranges for all those parameters if you wish, but there's no guarantee that the optimum is within this range or that the range is always stable. Also, we should constrain some parameters: for the RainMaker challenge I ran with students two months ago we had the rule that sea surface temperature cannot exceed 305 K anywhere. This is because a really hot ocean, and therefore atmosphere, would be the easiest way to make it rain like crazy, but a 400 K ocean is far from realistic. Would you need such a constraint to be formulated as an inequality of the parameters already, or would it be enough to throw an error if it's violated?
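For illustration, the two options could look roughly like the sketch below. This is only a minimal Julia sketch; `sea_surface_temperature(params)` is a hypothetical helper returning the SST field for a parameter vector, not an actual RainMaker/SpeedyWeather function.

# Option (a): a hard check that errors when the rule is violated
# (`sea_surface_temperature` is a hypothetical helper, not part of RainMaker)
function check_sst(params)
    maximum(sea_surface_temperature(params)) <= 305 ||
        error("sea surface temperature exceeds 305 K somewhere")
    return nothing
end

# Option (b): the same rule as an inequality constraint g(params) <= 0,
# which a constrained optimiser could consume directly
sst_constraint(params) = maximum(sea_surface_temperature(params)) - 305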
Also, we can run the simulation longer (less about chaos, more about getting its statistics) or at higher resolution. This simulation takes 5 s on my laptop, but we can make it as expensive as you want. Regarding time series: what exactly do you need? The value we want to maximise is accumulated precipitation at one location, so do you need the time series whose sum is the value we maximise? (It's part of the …)
So in terms of the details of generating data for the parameter-based surrogate, which is either
Full disclosure... I have no idea which one it is 😅 - pinging @ChrisRackauckas on this, as I think he has the vision of what the scenario should look like. In either case, however, the typical workflow one has to go through is:
Initially I was under the impression that we were learning a surrogate of parameters -> time series, i.e. learning a surrogate for a parameterised ODE, which is why I was inquiring about that particular detail 😅. I think if we can fill out the details of the workflow above, there will be a tight scope in place that we can work in. My understanding is that you and Chris have already discussed some of this relating to the vision of the hackathon, so perhaps one final review of that to get us on the same page would be great.

Re: simulation approach: coming from the SciML world, the mental model I have is something like setting up an ODE problem, remaking it for every sample of the parameters, and simulating it; the returned solve object has a retcode that we can then use to filter out samples that didn't succeed (a sketch of what I mean is below). Does such a workflow exist here? I want to go for the solution that is minimal in work for you and friction-free for us or others to use at the workshop. Looking forward to hearing your thoughts!
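A minimal sketch of that SciML-style pattern with a toy ODE, just to show the shape of the workflow I mean (this is not SpeedyWeather's API):

using OrdinaryDiffEq, SciMLBase

# toy scalar ODE du/dt = p[1]*u standing in for the real simulator
prob = ODEProblem((u, p, t) -> p[1] .* u, [1.0], (0.0, 1.0), [-0.5])

samples = [randn(1) for _ in 1:10]             # hypothetical parameter samples

results = map(samples) do p
    sol = solve(remake(prob; p = p), Tsit5())  # remake the problem with new parameters
    # keep only simulations that finished successfully
    SciMLBase.successful_retcode(sol) ? (p, sol.u[end]) : nothing
end
filter!(!isnothing, results)                   # drop failed runs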
I would say the surrogate should just learn the total precipitation (a scalar) over 15 days (maybe more, to make it less dependent on chaos and more representative of the statistics?). I would not try to learn the time series, as the 10 parameters do not give you enough information about it. The initial conditions (O(10,000) values at this resolution) would need to be known, and then you're back to the weather forecasting problem, which is too complex (think GraphCast, NeuralGCM, GenCast etc.).
Optimize the parameters to maximise total precipitation over the Azores.
I would start with the following that are physically not too unreasonable
Yes, you can
We could just …
By the way, I'm happy to upload by tomorrow evening a notebook that outlines the problem and shows how one can simulate SpeedyWeather (without the surrogate). This should probably be the introduction to that session on Fri 11 anyway, which I'm happy to cover for 10-15 min, if you agree? We can then still update the notebook before the session; the participants will only download it just before the session anyway.
Many thanks for your efforts by the way! Excited to see how you approach this surrogate modelling problem!!
Okay, so I took some time to go over and plan out a basic implementation of the full workflow in code, and I also had a chance to run it so that it at least produces something sensible as a first pass. After going through this exercise, a few important discussion points came up that I think we should try to figure out, but more on that below.

Generating Data
using Distributed
addprocs(64)
using QuasiMonteCarlo
@everywhere using RainMaker
@everywhere using SpeedyWeather
include("utils.jl")
@everywhere const PARAMETER_KEYS = (
:orography_scale, # [1], default: 1, scale of global orography
:mountain_height, # [m], default: 0, height of an additional azores mountain
:mountain_size, # [˚], default: 1, horizontal size of an additional azores mountain
:mountain_lon, # [˚E], default: -27.25, longitude of an additional azores mountain
:mountain_lat, # [˚N], default: 38.7, latitude of an additional azores mountain
:temperature_equator, # [K], default: 300, sea surface temperature at the equator
:temperature_pole, # [K], default: 273, sea surface temperature at the poles
:temperature_atlantic, # [K], default: 0, sea surface temperature anomaly in the atlantic
:temperature_azores, # [K], default: 0, sea surface temperature anomaly at the azores
:zonal_wind, # [m/s], default: 35, zonal wind speed
)
n = 10000
lb = [0, -2000, 0, -180, -90, 270, 270, -5, -5, 5]
ub = [2, 5000, 30, 180, 90, 300, 300, 5, 5, 50]
s = QuasiMonteCarlo.sample(n, lb, ub, LatinHypercubeSample())
sols = pmap(eachcol(s)) do sample
sol = max_precipitation(sample)
(sample,sol)
end
sampled_params = reduce(hcat,first.(sols))
sampled_outputs = last.(sols)
#save the data
using JLSO
JLSO.save("10kdata.jlso", Dict(:d=>(inputs = sampled_params, outputs = sampled_outputs)))
# We can then do some exploration of the data statistics, e.g.:

julia> mean(output_data)
11.152389f0
julia> var(output_data)
137.91365f0
julia> minimum(output_data)
0.0f0
julia> maximum(output_data)
227.97934f0

Discussion points:
The data appears to have a lot of variability, with a standard deviation of ~11.7, about the same size as the mean (CV ≈ 1). This, together with the histogram, suggests the parameter space produces a wide range of outputs, which means that sampling-wise we will need a strong start (north of 10,000 samples). If that is the case, do we intend for anyone to go through this data-generation process at the session? And if not, there is still the requirement of training the surrogate; if we are not expecting the participants to have any significant hardware, I don't know if anyone will get a surrogate done by the end of the session, even if we pre-generate datasets north of 100k samples. This calls into question whether it would be more productive to make the parameter space smaller in the interest of teaching purposes and time. I was able to generate 10,000 samples in about ~30 minutes with 64 processes. Pretty cool!
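For reference, the coefficient of variation can be computed directly from the statistics printed above:

using Statistics
cv = std(output_data) / mean(output_data)   # ≈ 11.74 / 11.15 ≈ 1.05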
Preparing Data for Surrogate Training

# packages used in this part of the workflow
using JLSO, Flux, CUDA, Optimisers, Statistics
# if the data was just generated above, wrap it in the same named tuple that JLSO stores
data = (inputs = sampled_params, outputs = sampled_outputs)
# or load previously generated data instead
#data = JLSO.load("10kdata.jlso")[:d]
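# `normalise`/`denormalise` are provided by utils.jl (not shown in this thread);
# a plausible min-max implementation, assuming lb/ub hold per-parameter bounds:
normalise(x, lb, ub) = (x .- lb) ./ (ub .- lb)
denormalise(x, lb, ub) = x .* (ub .- lb) .+ lb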
# Normalise inputs
input_data = data.inputs
inputs_lb = [0, -2000, 0, -180, -90, 270, 270, -5, -5, 5]
inputs_ub = [2, 5000, 30, 180, 90, 300, 300, 5, 5, 50]
input_data_norm = normalise(input_data, inputs_lb, inputs_ub)
# Normalise outputs
output_data = reshape(data.outputs, (1,:))
outputs_lb, outputs_ub = extrema(output_data)
output_data_norm = normalise(output_data, outputs_lb, outputs_ub)
# Split data
tsplit = 9000
input_data_norm_train = input_data_norm[:, 1:tsplit]
input_data_norm_valid = input_data_norm[:, tsplit+1:end]
output_data_norm_train = output_data_norm[:, 1:tsplit]
output_data_norm_valid = output_data_norm[:, tsplit+1:end]
#Loading data into gpu
bs = 512*4
dataloader = Flux.DataLoader(
(input_data = input_data_norm_train |> gpu,
output_data = output_data_norm_train |> gpu);
batchsize = bs
)

Init a surrogate model
act = gelu
HSIZE = 256
INSIZE = length(inputs_lb)
OUTSIZE = length(outputs_lb)
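# `NNLayer` also comes from utils.jl (not shown); assumed here to be a thin
# wrapper around a Flux Dense layer:
NNLayer(in, out, act = identity) = Dense(in => out, act)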
surrogate = Chain(
NNLayer(INSIZE, HSIZE, act),
SkipConnection(Chain(NNLayer(HSIZE, HSIZE, act),
NNLayer(HSIZE, HSIZE, act)), +),
SkipConnection(Chain(NNLayer(HSIZE, HSIZE, act),
NNLayer(HSIZE, HSIZE, act)), +),
NNLayer(HSIZE, OUTSIZE)
)
surrogate_gpu = surrogate |> gpu

Configure training hyperparameters
opt = OptimiserChain(Adam())
lrs = [1e-3, 5e-4, 3e-4, 1e-4, 5e-5]
epochs = [10000 for i in 1:length(lrs)]
st = Optimisers.setup(opt, surrogate_gpu)
# Note: a Flux.params handle captured outside the explicit `gradient(model) do ...` call
# would receive no gradient, so the L2 penalty is computed from `model` inside the loss
# in the training loop below.
lambda_reg = 1e-4  # L2 regularisation strength

Train the model!
# Training Loop
for (e, lr) in zip(epochs, lrs)
for epoch in 1:e
Optimisers.adjust!(st, lr)
for batch in dataloader
x, y = batch.input_data, batch.output_data
gs = gradient(surrogate_gpu) do model
# L2 penalty over the model's trainable arrays, so it is differentiated
# together with the data term (Flux.trainables requires a recent Flux)
Flux.mae(model(x), y) + lambda_reg * sum(p -> sum(abs2, p), Flux.trainables(model))
end
st, surrogate_gpu = Optimisers.update(st, surrogate_gpu, gs...)
end
if epoch % 100 == 0
surrogate_cpu = surrogate_gpu |> cpu
l_t = Flux.mae(surrogate_cpu(input_data_norm_train), output_data_norm_train)
surr_v_pred = denormalise(surrogate_cpu(input_data_norm_valid), outputs_lb, outputs_ub)
gt_v = denormalise(output_data_norm_valid, outputs_lb, outputs_ub)
l_v = Flux.mae(surr_v_pred, gt_v)
@info "Epoch $epoch lr:$lr Training Loss: $l_t Validation Loss:$l_v"
end
end
end

[Figures: resulting training and validation loss curves]
[Figure: parallel coordinate plot for the training data]
[Figure: parallel coordinate plot for the validation data]

Discussion Points
As can be seen, we can easily fit the training data to great accuracy (predictions practically indistinguishable from the training set), and by selecting the region of high loss in the parallel coordinate plot we can also see where within the training data the surrogate performs poorly. On validation, however, it is clear we still haven't learned something that generalizes. In particular, the variance of the data sampled earlier and the parallel coordinate plot for the validation data show that there is a lot of behaviour still not captured with the 10k dataset across the board (most of the errors are above 5 or 10, which is still large). The worst-performing cases also cluster near the extrema of the parameter ranges, which need some extra attention as well. This part of the surrogate training shows how you can diagnose what you've learned and then perform active learning to generate more samples. What do we think about this?

Downstream Optimization
The key idea here is to demonstrate the power of having a surrogate that is incredibly fast, fully differentiable, and batchable. Therefore we can do the following:
K = 5 # number of candidate solutions
X_candidates = CUDA.rand(INSIZE, K) # random init in [0,1]
function input_objective(surr, X; λ=100.0)
y_pred = surr(X)
obj = -sum(y_pred) #since we have to minimize the objective, the -sum will work here
# Heavy differentiable penalty if X < 0 or X > 1
lower_violation = @. max(0.0f0, -X)
upper_violation = @. max(0.0f0, X - 1.0f0)
penalty = sum(lower_violation.^2) + sum(upper_violation.^2)
return obj + λ * penalty
end
total_steps = 10000
opt_in = Flux.Adam(1e-4)
st = Flux.setup(opt_in, X_candidates)
for step in 1:total_steps
gs = gradient(X_candidates) do guesses
input_objective(surrogate_gpu,guesses)
end
st, X_candidates = Optimisers.update(st, X_candidates, gs...)
# Print progress occasionally
if step % 200 == 0
@info "Step $step, objective() = $(input_objective(surrogate_gpu,X_candidates))"
end
end

This is kind of like a global optimization with tons of different starts. In the end, you take the parameters that gave the largest value and run them in the simulation to check the answer. Of course this all depends on how accurate your surrogate is to begin with, but in a very interesting turn of events, this particular surrogate produced sensible results despite not being ready to be used yet.
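The selection step between the optimisation loop above and the REPL output below wasn't shown; a sketch of what it might look like, reusing the variable names that appear in the REPL session:

pred_vals_norm = surrogate_gpu(X_candidates) |> cpu     # surrogate predictions (normalised)
idx = argmax(vec(pred_vals_norm))                       # best candidate according to the surrogate
best_inputs_norm = Array(X_candidates)[:, idx:idx]
best_inputs_orig = denormalise(vec(best_inputs_norm), inputs_lb, inputs_ub)
pred_vals = denormalise(pred_vals_norm, outputs_lb, outputs_ub)[idx]
max_precipitation(best_inputs_orig)                     # verify with the real simulation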
julia> best_inputs_orig
10-element Vector{Float32}:
0.15383114
3693.732
20.740133
-84.24525
61.337418
287.06415
294.8209
7.4847403
4.0663996
50.17178
julia> pred_vals = denormalise(pred_vals_norm, outputs_lb, outputs_ub)[idx]
255.56555f0
julia> max_precipitation(best_inputs_orig)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_7RgC
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (990.60 years/day)
71.64633f0

The surrogate prediction was off, but it was correct in pointing to a part of parameter space that does give lots of precipitation (70 mm is like a crazy storm, right?).
That's probably sufficient prep for the hackathon. We just need then:
and then let people play with it.
Probably should mention a few more things. Benchmarking a single simulation vs the surrogate:

julia> @benchmark max_precipitation(best_inputs_orig)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_r9ld
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (978.08 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_QGbi
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (959.20 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_IZog
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (975.92 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_o4WT
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (971.07 years/day)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max): 4.982 s … 4.990 s ┊ GC (min … max): 0.93% … 1.52%
Time (median): 4.986 s ┊ GC (median): 1.23%
Time (mean ± σ): 4.986 s ± 5.726 ms ┊ GC (mean ± σ): 1.23% ± 0.42%
█ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
4.98 s Histogram: frequency by time 4.99 s <
Memory estimate: 806.23 MiB, allocs estimate: 8323135.
julia> @benchmark surrogate_cpu(best_inputs_norm[:,1:1])
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 19.671 μs … 447.771 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 24.760 μs ┊ GC (median): 0.00%
Time (mean ± σ): 25.069 μs ± 8.258 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▇█▄▂
▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄▅▄▃▂▂▁▁▁▁▁▂▄▇█████▇▇▆▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▂
19.7 μs Histogram: frequency by time 29.6 μs <
Memory estimate: 7.69 KiB, allocs estimate: 12.
With speedups like this you could probably sample 1000s or 10s of 1000s of parameter sets during the downstream optimization, unlocking some serious throughput for the procedure.

Also, after catching the curiosity bug with this problem and digging deeper, I tried sampling 100k samples to generate a new surrogate, but along the way one of the simulations ran into a domain error with log, ending the whole thing :( We should probably use NaNMath somewhere there, but if we are expecting people to generate data we should also guard the data-generation step against individual failures (see the sketch below). Either way, this is a very cool simulator @milankl, and I learned a lot about weather dynamics. I am excited to help deliver all this. I think it is a very cool introduction for climate scientists to a workflow that involves real-world machine learning in their domain.
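A minimal sketch of such a guard, reusing the names from the data-generation block above, so that a single DomainError no longer aborts the whole pmap:

sols = pmap(eachcol(s)) do sample
    try
        (collect(sample), max_precipitation(sample))
    catch err
        @warn "simulation failed for this sample" err
        nothing
    end
end
filter!(!isnothing, sols)   # drop failed simulations before saving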
Re: the notebook: I don't know if I can have one ready by tomorrow, I'm going to be tied up with some things. @milankl I think the idea of submitting something temporary that is still relevant, so we have time to update the notebook later, is a great one.

Re: how to filter the simulations: IIUC, you're saying that failed simulations do not yield large amounts of rain, and since the point of the surrogate is to find the parameters that maximize the amount of rain, their mapping to low values means they won't be targets of interest anyway, so including them is harmless. However, depending on the nature of their values (e.g. if many parameter sets share the same output), this might cause more nuanced issues downstream during training, so it might be better to filter them out as well. More experimentation would be necessary to determine the nature of this effect, but I am not sure we have to go that far.
So it turns out I won't be at the hackathon because my flight is at 11 am on Friday, sad news indeed. @milankl do you think you have everything you need here to turn this into a notebook? Happy to sit with you on one of the days to clean anything up. I could help add the plotting utilities too.
I think this is already great!! I will start formulating the challenge for 10 min, given this is a hackathon where the participants have 1-2 hours for this challenge. I'll state the rules and that they have to submit to the RainMaker challenge before the end of the hackathon using some non-ML experimentation as outlined in the RainMaker docs, but then I will spend 10 min on your workflow and how they could optimise this problem with a surrogate model. With the notebook they should have enough guidance to submit some surrogate-powered maximised precipitation!!
Just to illustrate, some plots of what the 20 days of precipitation on the Azores look like: mostly convective rain, and the highest rain was between day 5 and 10, where on average 60 mm/day is a strong continuous downpour. In comparison, the 2024 Spanish floods had areas with 500 mm/day, but that is not something easily simulated with such a low-resolution model as we're using here!

Putting this on a map, you can see a large patch of precipitation over the North Atlantic with the Azores in the middle of it. So the location/height of that additional mountain and the sea surface temperatures were placed by the surrogate model so as to create this! As you can see, there's now a big 3000-4000 m mountain plateau over North America and the true Earth's orography is somewhat shrunk; the Himalayas are only some 2000 m (…).

So the surrogate's best sample increased the temperatures in the North Atlantic and especially around the Azores, to destabilise the atmosphere and cause convection! The boxes are because this is how I defined "North Atlantic" and "Azores"; technically this is an array of thousands of parameters one could optimise, but that was obviously overkill here!
This is so COOL 😍 I'm so impressed by the simulator outputs and the surrogate's choice. There was also a second set of parameters I found that gave even more rain when I ran the optimization for a bit longer. It's remarkable how useful the output is given the surrogate isn't trained to a decent level of performance. Being able to just AD your way to an answer is so satisfying, especially when the original simulator is so advanced. Definitely a very cool hackathon. Are we sure we only want to give them 10 mins?
Ah no, I mean I'll try to explain only for 10-20 min in the beginning; then they should have as much time as possible to independently play with SpeedyWeather + surrogate!! They have 1 hour before lunch and 2h30min afterwards -- I envisage this is enough for them to initially submit one simulation with hand-changed parameters to understand the problem of the non-surrogate approach, and then use the surrogate to find an optimised set of parameters!! I hope that after these 3h30min in total there will be many submissions to the RainMaker challenge with 300 mm and more of rainfall.
This is perfect. I am curious to know whether it is easy to generate these images for a given parameter vector? I want to include the run I did that got 220 mm as a real-world example for my intro-to-surrogates presentation. It is just too good not to have. These were the parameters if you're curious: best_param_sample = [0.243467,
I did exactly that above! Here are the instructions: https://speedyweather.github.io/RainMaker.jl/dev/rain_gauge/#Visualising-RainGauge-measurements I only had to …
@AnasAbdelR To emulate SpeedyWeather for the RainMaker challenge (10 parameters, 1 output parameter) I suggest the following
So that for example
yields 12.6 mm of rain, which is the value we want to maximise. So you can simply pass a vector of length 10 to explore the parameter space. Not all parameter values make sense or are even stable. When the model blows up, a warning is printed and the precipitation will not be accumulated further. E.g.
blows up immediately, likely because of negative temperatures. Hope this is a good starting point. We can probably iterate here over the next days? Also very happy to reserve some time at the workshop, given we're only presenting on the Friday.
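For reference, a call with the default values from the parameter list earlier in the thread would look something like this (illustrative only, not the example that yields 12.6 mm):

# defaults taken from the PARAMETER_KEYS comments above:
# orography_scale, mountain_height, mountain_size, mountain_lon, mountain_lat,
# temperature_equator, temperature_pole, temperature_atlantic, temperature_azores, zonal_wind
default_params = [1, 0, 1, -27.25, 38.7, 300, 273, 0, 0, 35]
max_precipitation(default_params)   # returns accumulated precipitation in mm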