
Emulating SpeedyWeather for the RainMaker challenge #47

Open
milankl opened this issue Dec 30, 2024 · 19 comments
Comments

@milankl
Member

milankl commented Dec 30, 2024

@AnasAbdelR To emulate SpeedyWeather for the RainMaker challenge (10 parameters, 1 output parameter) I suggest the following

using RainMaker    # current main, I will release a new version at JuliaEO
using SpeedyWeather

const PARAMETER_KEYS = (
    :orography_scale,           # [1],      default: 1, scale of global orography
    :mountain_height,           # [m],      default: 0, height of an additional azores mountain
    :mountain_size,             # [˚],      default: 1, horizontal size of an additional azores mountain
    :mountain_lon,              # [˚E],     default: -27.25, longitude of an additional azores mountain
    :mountain_lat,              # [˚N],     default: 38.7, latitude of an additional azores mountain
    :temperature_equator,       # [K],      default: 300, sea surface temperature at the equator
    :temperature_pole,          # [K],      default: 273, sea surface temperature at the poles
    :temperature_atlantic,      # [K],      default: 0, sea surface temperature anomaly in the atlantic
    :temperature_azores,        # [K],      default: 0, sea surface temperature anomaly at the azores
    :zonal_wind,                # [m/s],    default: 35, zonal wind speed
)

const PARAMETER_DEFAULTS = [1, 0, 1, -27.25, 38.7, 300, 273, 0, 0, 35]

function max_precipitation(parameters::AbstractVector)
    parameter_tuple = NamedTuple{PARAMETER_KEYS}(parameters)
    return max_precipitation(parameter_tuple)
end

function max_precipitation(parameters::NamedTuple)

    # define resolution. Use trunc=42, 63, 85, 127, ... for higher resolution, cubically slower
    spectral_grid = SpectralGrid(trunc=31, nlayers=8)

    # Define AquaPlanet ocean, for idealised sea surface temperatures
    # but don't change land-sea mask = retain real ocean basins
    ocean = AquaPlanet(spectral_grid,
                temp_equator=parameters.temperature_equator,
                temp_poles=parameters.temperature_pole)

    initial_conditions = InitialConditions(
        vordiv = ZonalWind(u₀=parameters.zonal_wind),
        temp = JablonowskiTemperature(u₀=parameters.zonal_wind),
        pres = PressureOnOrography(),
        humid = ConstantRelativeHumidity())

    orography = EarthOrography(spectral_grid, scale=parameters.orography_scale)

    # construct model
    model = PrimitiveWetModel(spectral_grid; ocean, initial_conditions, orography)

    # Add rain gauge, locate on Terceira Island
    rain_gauge = RainGauge(spectral_grid, lond=-27.25, latd=38.7)
    add!(model, rain_gauge)

    # Initialize
    simulation = initialize!(model, time=DateTime(2025, 1, 10))

    # Add an additional mountain
    H = parameters.mountain_height
    λ₀, φ₀, σ = parameters.mountain_lon, parameters.mountain_lat, parameters.mountain_size  
    set!(model, orography=(λ,φ) -> H*exp(-spherical_distance((λ,φ), (λ₀,φ₀), radius=360/2π)^2/2σ^2), add=true)

    # set sea surface temperature anomalies
    # 1. Atlantic
    set!(simulation, sea_surface_temperature=
        (λ, φ) -> (30 < φ < 60) && (270 < λ < 360) ? parameters.temperature_atlantic : 0, add=true)

    # 2. Azores
    A = parameters.temperature_azores
    λ_az, φ_az, σ_az = -27.25, 38.7, 4    # location [˚], size [˚] of Azores
    set!(simulation, sea_surface_temperature=
        (λ, φ) -> A*exp(-spherical_distance((λ,φ), (λ_az,φ_az), radius=360/2π)^2/2σ_az^2), add=true)

    # Run simulation for 20 days, maybe longer for more stable statistics? Could be increased to 30, 40, ... days ?
    run!(simulation, period=Day(20))

    # skip first 5 days, as is done in the RainMaker challenge
    RainMaker.skip!(rain_gauge, Day(5))

    # evaluate rain gauge
    lsc = rain_gauge.accumulated_rain_large_scale
    conv = rain_gauge.accumulated_rain_convection
    total_precip = maximum(lsc) + maximum(conv)
    return total_precip
end

So that for example

julia> max_precipitation(PARAMETER_DEFAULTS)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_rZgV
Weather is speedy: 100%|█████████████████████| Time: 0:00:06 (775.84 years/day)
12.624438f0

yields 12.6 mm of rain, which is the value we want to maximise. So you can simply pass a vector of length 10 to explore the parameter space. Not all parameter values make sense or are even stable. When the model blows up, a warning is printed and precipitation is no longer accumulated. E.g.

julia> max_precipitation(randn(10))
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_l8vG
┌ Warning: NaN or Inf detected at time step 1
└ @ SpeedyWeather ~/git/SpeedyWeather.jl/src/output/feedback.jl:144
0.0f0

blows up immediately, likely because of negative temperatures. I hope this is a good starting point; we can probably iterate here over the next days. I'm also very happy to reserve some time at the workshop, given we're only presenting on the Friday.

@AnasAbdelR

Great, this seems like it could work!

I am curious what the bounds are that we can sample from for PARAMETER_DEFAULTS, i.e. what the lower and upper bounds for those 10 numbers are. Also, after a simulation has been conducted, i.e. max_precipitation(randn(10)), how do I collect the state-over-time values? I think we can sort out the major things through the issue, and I would love to set aside some time during the days of the workshop for any minor touch-ups!

I tried to CC chris on this but I don't think he has access to the issue?

@milankl
Member Author

milankl commented Dec 31, 2024

On bounds: those are not necessarily obvious and can result from physics as well as numerics. E.g. a negative temperature likely causes issues, but even a 100K atmosphere is far from realistic and may push the numerical stability (there's an assumption about a reference temperature that is increasingly violated the further away we are from it). I could guess some reasonable ranges for all those parameters if you wish, but there's no guarantee that the optimum is within this range or that the range is always stable.

Also we should constrain some parameters: for the RainMaker challenge I ran with students two months ago we had the rule that sea surface temperature cannot exceed 305K anywhere. This is because a really hot ocean, and therefore atmosphere, would be the easiest way to make it rain like crazy, but a 400K ocean is far from realistic. Would you need such a constraint to be formulated as an inequality of the parameters already, or would it be enough to throw an error if it's violated?

@milankl
Member Author

milankl commented Dec 31, 2024

Also we can run the simulation longer (less about chaos more about statistics of it) or at higher resolution. This simulation takes 5s on my laptop but we can make it as expensive as you want.

Regarding time series: what exactly do you need? The value we want to maximise is accumulated precipitation at one location. So you need the time series whose sum is the value we maximise? (It's part of the rain_gauge, although there's a diff missing since its vectors are a cumsum.)
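To spell that out: the gauge's vectors are cumulative, so the time series is one `diff` away. A minimal sketch (`rain_series` is a hypothetical helper name; shown on a toy cumulative record rather than an actual `rain_gauge`):

```julia
# the rain gauge stores accumulated (cumsum) precipitation;
# diff recovers the per-step time series whose sum is the value we maximise
rain_series(accumulated::AbstractVector) = diff(accumulated)

# toy cumulative record in mm:
rain_series([0.0, 1.5, 1.5, 4.0])   # per-step increments: [1.5, 0.0, 2.5]
```

With a real gauge this would be applied to `rain_gauge.accumulated_rain_large_scale` and `rain_gauge.accumulated_rain_convection` separately.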

@AnasAbdelR

AnasAbdelR commented Dec 31, 2024

So in terms of details of generating data for the parameter based surrogate, which is either

  1. parameters -> vector/scalar value
  2. parameters -> time series

Full disclosure.. I have no idea which one it is 😅 - pinging @ChrisRackauckas on this as I think he has the vision of what the scenario should look like.

In either case however, the typical workflow one has to go through is:

  1. Choose the downstream application of the surrogate (parameter estimation etc..)
  2. Scope the bounds of the parameter space in which the surrogate is expected to be used in.
  3. Sample this space, and for every sample, run a simulation (preferably parallelized if possible 😍)
  4. Organize the dataset into input, output pairs of parameter -> output(s)_of_interest and any other preprocessing
  5. Pick your architecture of choice
  6. Train it, validate it etc..
  7. Use it in your downstream process!

Initially I was under the impression that we were learning a surrogate of parameters -> time series, i.e. learning a surrogate for a parameterised ODE, which is why I was inquiring about that particular detail 😅. I think if we can fill out the details of the workflow above, there will be a tight scope in place that we can work in. My understanding is that you and Chris have already discussed some of this relating to the vision of the hackathon, so perhaps one final review of that to get us on the same page would be great.

Re: simulation approach:

Coming from the SciML world, the mental model I have is something like setting up an ODE problem, remaking it for every sample of the parameters, and simulating it. The returned solve object has a retcode that we can then use to filter out runs that didn't succeed. Does such a workflow exist here? I want to go for the solution that is minimal in work for you and friction-free for us or others to use at the workshop. Looking forward to hearing your thoughts!

@milankl
Member Author

milankl commented Jan 1, 2025

parameters -> vector/scalar value
parameters -> time series

I would say the surrogate should just learn the total precipitation (a scalar) over 15 days (maybe more, to make it less dependent on chaos and represent the statistics better?). I would not try to learn the time series, as the 10 parameters do not give you enough information about it. The initial conditions, O(10,000) parameters at this resolution, would need to be known, and then you're back to the weather-forecasting problem, which is too complex (think GraphCast, NeuralGCM, GenCast etc.).

  1. Choose the downstream application of the surrogate (parameter estimation etc..)

Optimize the parameters to maximise total precipitation over the Azores.

  1. Scope the bounds of the parameter space in which the surrogate is expected to be used in.

I would start with the following that are physically not too unreasonable

  • orography_scale in $$[0, 2]$$ to remove all mountains (=0) or make them twice the height (=2)
  • mountain_height in $$[-2000, 5000]$$ (meters) to add a mountain that can also be a dip
  • mountain_size in $$[0, 30]$$ (degrees) more than 30˚ in lat/lon is just an enormous mountain range
  • mountain_lon in $$[-180, 180]$$ (degrees)
  • mountain_lat in $$[-90, 90]$$ (degrees)
  • temperature_equator in $$[270, 300]$$ (Kelvin) between freezing and 30-ish degrees
  • temperature_pole in $$[270, 300]$$ (Kelvin)
  • temperature_atlantic in $$[-5, 5]$$ (Kelvin) making the Atlantic/Azores only a few degrees warmer/colder
  • temperature_azores in $$[-5, 5]$$ (Kelvin)
  • zonal_wind in $$[5, 50]$$ (m/s) this is in balance with a meridional temperature profile, values outside these ranges may make the poles too warm or the equator too cold
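Collected into vectors in the order of `PARAMETER_KEYS`, these bounds could be written as follows (a sketch; `lb`/`ub` are names I'm introducing here):

```julia
# lower/upper bounds in PARAMETER_KEYS order:
# orography_scale, mountain_height, mountain_size, mountain_lon, mountain_lat,
# temperature_equator, temperature_pole, temperature_atlantic,
# temperature_azores, zonal_wind
lb = [0.0, -2000.0,  0.0, -180.0, -90.0, 270.0, 270.0, -5.0, -5.0,  5.0]
ub = [2.0,  5000.0, 30.0,  180.0,  90.0, 300.0, 300.0,  5.0,  5.0, 50.0]

# one uniform random draw from this box:
p = lb .+ rand(length(lb)) .* (ub .- lb)
```

No guarantee, as noted above, that every point in this box is stable or that the optimum lies inside it.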

@milankl
Member Author

milankl commented Jan 1, 2025

preferably parallelized if possible

Yes, you can `@distributed` this function (or use whatever parallelism you like); there's no race condition in SpeedyWeather as long as you're dealing with independent instances of model and simulation (which are created within the max_precipitation function I defined above).

The returned solve object has a retcode that we can then use to filter it out if it didn't succeed. Does such a workflow exist here?

We could just return total_precip, simulation above; then simulation.model.feedback.nars_detected is true when the model blew up and false when it ran through successfully. However, total_precip also stops increasing when the model blows up, so maybe that's enough penalisation in the loss function? E.g. for parameter p=1 the model doesn't blow up and returns 30 (mm of precipitation); for parameter p=2 the model blows up and returns 0 (or, say, ~15 if it blew up halfway through the simulation).
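As a generic sketch of carrying that flag through (assuming, as suggested, `max_precipitation` is modified to return `total_precip, simulation` and the `simulation.model.feedback.nars_detected` flag is stored alongside the value; the `RunResult` type and `successful` helper are hypothetical names):

```julia
# hypothetical container for one run: the precipitation value plus a
# retcode-like success flag (would come from simulation.model.feedback.nars_detected)
struct RunResult
    precip::Float64
    blew_up::Bool
end

# keep only runs that finished without NaNs/Infs
successful(results) = filter(r -> !r.blew_up, results)

rs = [RunResult(30.0, false), RunResult(0.0, true)]
successful(rs)   # only the first run survives the filter
```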

@milankl
Member Author

milankl commented Jan 1, 2025

By the way, I'm happy to upload a notebook by tomorrow evening that outlines the problem and shows how one can simulate SpeedyWeather (without the surrogate). This should probably be the introduction to that session on Fri 11 anyway, which I'm happy to cover for 10-15 min, if you agree? We can still update the notebook before the session, since the participants will only download it just beforehand.

@milankl
Member Author

milankl commented Jan 1, 2025

Many thanks for your efforts by the way! Excited to see how you approach this surrogate modelling problem!!

@AnasAbdelR

Okay, so I took some time to go over and plan out a basic implementation of the full workflow in code, and I also had a chance to run it, so it at least produces something sensible as a first pass. After going through this exercise, a few important discussion points came up that I think we should try to figure out, but more on that below.

Generating Data

  1. Specify upper and lower bounds, including number of samples
  2. Use a Quasi Monte Carlo scheme such as LatinHypercubeSample to sample this space
  3. Simulate each sample in parallel and return a tuple that is (params,output)
using Distributed
addprocs(64)
using QuasiMonteCarlo
@everywhere using RainMaker  
@everywhere using SpeedyWeather
include("utils.jl")


@everywhere const PARAMETER_KEYS = (
    :orography_scale,           # [1],      default: 1, scale of global orography
    :mountain_height,           # [m],      default: 0, height of an additional azores mountain
    :mountain_size,             # [˚],      default: 1, horizontal size of an additional azores mountain
    :mountain_lon,              # [˚E],     default: -27.25, longitude of an additional azores mountain
    :mountain_lat,              # [˚N],     default: 38.7, latitude of an additional azores mountain
    :temperature_equator,       # [K],      default: 300, sea surface temperature at the equator
    :temperature_pole,          # [K],      default: 273, sea surface temperature at the poles
    :temperature_atlantic,      # [K],      default: 0, sea surface temperature anomaly in the atlantic
    :temperature_azores,        # [K],      default: 0, sea surface temperature anomaly at the azores
    :zonal_wind,                # [m/s],    default: 35, zonal wind speed
)

n = 10000
lb = [0, -2000, 0, -180, -90, 270, 270, -5, -5, 5]
ub = [2, 5000, 30, 180, 90, 300, 300, 5, 5, 50]

s = QuasiMonteCarlo.sample(n, lb, ub, LatinHypercubeSample())

sols = pmap(eachcol(s)) do sample
    sol = max_precipitation(sample)
    (sample,sol)
end

sampled_params = reduce(hcat,first.(sols))
sampled_outputs = last.(sols)

#save the data
using JLSO
JLSO.save("100kdata.jlso", Dict(:d=>(inputs = sampled_params, outputs = sampled_outputs)))
#We can do exploration of data statistics etc.. 

(figure: histogram of total precipitation)

julia> mean(output_data)
11.152389f0

julia> var(output_data)
137.91365f0

julia> minimum(output_data)
0.0f0

julia> maximum(output_data)
227.97934f0

Discussion points:

The data appears to have a lot of variability: the standard deviation (~11.7 mm) is on par with the mean (~11.2 mm). This and the histogram suggest the parameter space maps to a wide range of outputs, which means that, sampling-wise, we will need a strong start (north of 10,000 samples). If this is the case, do we intend for anyone to go through this process there? And if not, there is still the requirement of training the surrogate; if we are not expecting the participants to have any significant hardware, I don't know if anyone will get a surrogate done by the end of the session even if we pre-generate datasets north of 100k samples. This calls into question whether it would be productive to make the parameter space smaller in the interest of teaching purposes and time. I was able to generate 10,000 samples in about ~30 minutes with 64 processes. Pretty cool!

Preparing Data for Surrogate Training

  1. Normalize inputs and outputs to be between 0 and 1
  2. Split the data into train and validation
  3. Package data to be in batches and place on the gpu
# load the data if we don't decide to generate data but use something already generated
#data = JLSO.load("10kdata.jlso")[:d] 

# Normalise inputs
input_data = data.inputs
inputs_lb = [0, -2000, 0, -180, -90, 270, 270, -5, -5, 5]
inputs_ub = [2, 5000, 30, 180, 90, 300, 300, 5, 5, 50]
input_data_norm = normalise(input_data, inputs_lb, inputs_ub)

# Normalise outputs
output_data = reshape(data.outputs, (1,:))
outputs_lb, outputs_ub = extrema(output_data)
output_data_norm = normalise(output_data, outputs_lb, outputs_ub)

# Split data
tsplit = 9000
input_data_norm_train = input_data_norm[:, 1:tsplit]
input_data_norm_valid = input_data_norm[:, tsplit+1:end]
output_data_norm_train = output_data_norm[:, 1:tsplit]
output_data_norm_valid = output_data_norm[:, tsplit+1:end]


#Loading data into gpu
bs = 512*4
dataloader = Flux.DataLoader(
    (input_data = input_data_norm_train |> gpu,
     output_data = output_data_norm_train |> gpu);
    batchsize = bs
)
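The `normalise`/`denormalise` helpers live in the unshown `utils.jl`; given how they are called, a minimal min-max implementation might look like this (an assumption, not the actual file):

```julia
# min-max scaling to [0, 1] and back; lb/ub broadcast across columns,
# so this works for both the 10×N input matrix and the 1×N output matrix
normalise(x, lb, ub)   = (x .- lb) ./ (ub .- lb)
denormalise(x, lb, ub) = x .* (ub .- lb) .+ lb

normalise([270.0 285.0 300.0], 270.0, 300.0)   # [0.0 0.5 1.0]
```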

Init a surrogate model

  1. Choose architecture (we defer to something simple for now)
  2. Specify hyperparameters
  3. Load onto the gpu
act = gelu
HSIZE = 256
INSIZE = length(inputs_lb)
OUTSIZE = length(outputs_lb)

surrogate = Chain(
    NNLayer(INSIZE, HSIZE, act),
    SkipConnection(Chain(NNLayer(HSIZE, HSIZE, act),
                         NNLayer(HSIZE, HSIZE, act)), +),
    SkipConnection(Chain(NNLayer(HSIZE, HSIZE, act),
                         NNLayer(HSIZE, HSIZE, act)), +),
    NNLayer(HSIZE, OUTSIZE)
)

surrogate_gpu = surrogate |> gpu
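`NNLayer` also comes from `utils.jl` and isn't shown; from its call sites (`NNLayer(in, out[, act])`) it's presumably just a dense layer with an optional activation, e.g. `Dense(in => out, act)` in Flux. A dependency-free sketch of such a layer (using `tanh` in the example to avoid pulling in `gelu` from NNlib):

```julia
# hypothetical stand-in for the unshown NNLayer: y = σ.(W*x .+ b)
struct NNLayer{M,V,F}
    W::M
    b::V
    σ::F
end
NNLayer(in::Int, out::Int, σ=identity) =
    NNLayer(randn(out, in) ./ sqrt(in), zeros(out), σ)
(l::NNLayer)(x) = l.σ.(l.W * x .+ l.b)

layer = NNLayer(10, 256, tanh)
size(layer(zeros(10)))   # (256,)
```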

Configure training hyperparameters

  1. Use a basic schedule to decay learning rate over time
  2. Decide on epochs per learning rate decrement
  3. Choose the optimiser
  4. track parameters for L2 regularization
opt = OptimiserChain(Adam())
lrs = [1e-3, 5e-4, 3e-4, 1e-4, 5e-5]
epochs = [10000 for i in 1:length(lrs)]
st = Optimisers.setup(opt, surrogate_gpu)
ps = Flux.params(surrogate_gpu) # to use for L2 reg
lambda_reg = 1e-4

Train the model!

  1. Run the model through the loop
  2. Sprinkle some standard logging (we can definitely flesh this out way more)
# Training Loop
for (e, lr) in zip(epochs, lrs)
    for epoch in 1:e
        Optimisers.adjust!(st, lr)
        for batch in dataloader
            x, y = batch.input_data, batch.output_data
            gs = gradient(surrogate_gpu) do model
                Flux.mae(model(x), y) + lambda_reg * sum(x_ -> sum(abs2, x_), ps) #l2 reg
            end
            st, surrogate_gpu = Optimisers.update(st, surrogate_gpu, gs...)
        end
        if epoch % 100 == 0
            surrogate_cpu = surrogate_gpu |> cpu
            l_t = Flux.mae(surrogate_cpu(input_data_norm_train), output_data_norm_train)
            surr_v_pred = denormalise(surrogate_cpu(input_data_norm_valid), outputs_lb, outputs_ub)
            gt_v = denormalise(output_data_norm_valid, outputs_lb, outputs_ub)
            l_v = Flux.mae(surr_v_pred, gt_v)
            @info "Epoch $epoch lr:$lr Training Loss: $l_t Validation Loss:$l_v"
        end
    end
end

Resulting training and validation loss curves

(figure: training and validation loss curves)

Parallel Coordinate Plot for Training Data

(figures: parallel coordinate plots for the training data)

Parallel Coordinate Plot for Validation Data

(figures: parallel coordinate plots for the validation data)

Discussion Points

As can be seen, we can fit the training data to great accuracy (practically indistinguishable from the training set), and we can also see where the model performs poorly within the training data by selecting the region of high loss in the parallel coordinate plot. For validation, however, it is clear we still haven't learned something that generalizes. In particular, the variance of the data sampled earlier and the parallel coordinate plot for the validation data show there is a lot of behaviour still not captured with the 10k data across the board (most of the errors are above 5 or 10 mm, which is still large). The worst-behaved patterns show that the extrema are also a sore point that requires some love. This part of surrogate training shows how you can diagnose what you've learned and then perform active learning to generate more samples. What do we think about this?

Downstream Optimization

The key idea here is to demonstrate the power of having a surrogate that is incredibly fast, fully differentiable, and batchable. Therefore we can do the following:

  1. Randomly sample a series of parameters from the parameter space as starting points
  2. Specify the objective of finding the parameters that result in the most total precipitation, and that the parameters stay in the bounds the network was trained on
  3. Perform gradient descent to update the input parameters to the network that minimize this objective.
  4. Choose from the sampled parameters that gave the largest output
  5. Simulate it with the simulator to verify
K = 5  # number of candidate solutions
X_candidates = CUDA.rand(INSIZE, K)  # random init in [0,1]
function input_objective(surr, X; λ=100.0)
    y_pred = surr(X) 
    obj = -sum(y_pred) #since we have to minimize the objective, the -sum will work here
    # Heavy differentiable penalty if X < 0 or X > 1
    lower_violation = @. max(0.0f0, -X)      
    upper_violation = @. max(0.0f0, X - 1.0f0) 
    penalty = sum(lower_violation.^2) + sum(upper_violation.^2)
    return obj + λ * penalty
end
total_steps = 10000
opt_in = Flux.Adam(1e-4)
st = Flux.setup(opt_in, X_candidates)
for step in 1:total_steps
    gs = gradient(X_candidates) do guesses
        input_objective(surrogate_gpu,guesses)
    end
    st, X_candidates = Optimisers.update(st, X_candidates, gs...)
    # Print progress occasionally
    if step % 200 == 0
        @info "Step $step, objective() = $(input_objective(surrogate_gpu,X_candidates))"
    end
end

This is kind of like a global optimization with tons of different starts. In the end, you take the parameter set that gave the largest value and run it in simulation to check the answer. Of course this all depends on how accurate your surrogate is to begin with, but in a very interesting turn of events, this particular surrogate produced sensible results despite not being ready for use yet.
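That selection step could look like this (a sketch with toy numbers; `best_column` is a name I'm introducing, and in the real pipeline you'd feed it `surrogate_cpu(X_candidates)`, then `denormalise` the winning column before verifying with `max_precipitation`):

```julia
# pick the candidate (column) whose surrogate prediction is largest
best_column(pred::AbstractMatrix, X::AbstractMatrix) = X[:, argmax(vec(pred))]

X    = [0.1 0.9;           # two toy candidates, one per column
        0.2 0.8]
pred = [0.3 0.7]           # 1×K surrogate outputs
best_column(pred, X)       # second column: [0.9, 0.8]
```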

julia> best_inputs_orig
10-element Vector{Float32}:
    0.15383114
 3693.732
   20.740133
  -84.24525
   61.337418
  287.06415
  294.8209
    7.4847403
    4.0663996
   50.17178

julia> pred_vals = denormalise(pred_vals_norm, outputs_lb, outputs_ub)[idx]
255.56555f0

julia> max_precipitation(best_inputs_orig)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_7RgC
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (990.60 years/day)
71.64633f0

The surrogate prediction was off, but it was correct in being in the part of parameter space that does give lots of precipitation (70mm is like a crazy storm, right?)

@ChrisRackauckas

That's probably sufficient prep for the hackathon. We just need then:

  1. A few slides introducing the problem.
  2. A notebook that walks through this first attempt
  3. Some suggested other methods:
    a. XGBoost.jl
    b. MLJ.jl
    c. https://github.com/JuliaML/LIBSVM.jl
    d. Tuning the neural network
    e. Improving the data sampling (adaptive sampling)

and then let people play with it.

@AnasAbdelR

Probably should mention a few more things. Benchmarking a single simulation vs the surrogate:

julia> @benchmark max_precipitation(best_inputs_orig)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_r9ld
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (978.08 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_QGbi
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (959.20 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_IZog
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (975.92 years/day)
[ Info: RainGauge{Float32, AnvilInterpolator{Float32, OctahedralGaussianGrid}} callback added with key callback_o4WT
Weather is speedy: 100%|████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:04 (971.07 years/day)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max):  4.982 s …  4.990 s  ┊ GC (min … max): 0.93% … 1.52%
Time  (median):     4.986 s             ┊ GC (median):    1.23%
Time  (mean ± σ):   4.986 s ± 5.726 ms  ┊ GC (mean ± σ):  1.23% ± 0.42%

█                                                      █  
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
4.98 s        Histogram: frequency by time        4.99 s <

Memory estimate: 806.23 MiB, allocs estimate: 8323135.


julia> @benchmark surrogate_cpu(best_inputs_norm[:,1:1])                                                                                              
BenchmarkTools.Trial: 10000 samples with 1 evaluation.                                                                                                
Range (min … max):  19.671 μs … 447.771 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     24.760 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   25.069 μs ±   8.258 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                            ▄▇█▄▂                              
▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄▅▄▃▂▂▁▁▁▁▁▂▄▇█████▇▇▆▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▂
19.7 μs         Histogram: frequency by time         29.6 μs <

Memory estimate: 7.69 KiB, allocs estimate: 12.

With speedups like this you could probably sample thousands or tens of thousands of parameters during the downstream optimization, unlocking some serious throughput for the procedure.

Also, after catching the bug of curiosity with this problem and digging deeper, I tried sampling 100k samples to generate a new surrogate, but along the way one of the simulations ran into a DomainError from log, ending the whole thing :( We should probably use NaNMath somewhere there, but if we expect people to generate data we should update the data-gen part with a try/catch. Providing the data IMO feels like the play.
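The try/catch guard mentioned here could be as simple as the following sketch (`safe_max_precipitation` is a hypothetical wrapper name; any thrown error, such as the DomainError from log, is mapped to 0.0, consistent with how blow-ups are already scored):

```julia
# wrap each simulation so a single error doesn't kill the whole pmap
function safe_max_precipitation(f, sample)
    try
        return f(sample)
    catch err
        @warn "simulation failed, returning 0.0" err
        return 0.0
    end
end

safe_max_precipitation(x -> log(x[1]), [-1.0])   # DomainError caught → 0.0
```

In the data-generation loop, `pmap` would then call `safe_max_precipitation(max_precipitation, sample)` instead of `max_precipitation(sample)` directly.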

Either way, this is a very cool simulator @milankl, and I learned a lot about weather dynamics. I am excited to help deliver all this. I think it is a very cool introduction for climate scientists to a workflow that involves real-world machine learning in their domain.

@AnasAbdelR

AnasAbdelR commented Jan 2, 2025

Re: the notebook

I don't know if I can have one ready by tomorrow; I'm going to be tied up with some things. @milankl, I think the idea of submitting something temporary that is still relevant, so we have time to update the notebook later, is a great one.

Re: how to filter the simulations

IIUC, you're saying that failed simulations do not yield large amounts of rain, and since the purpose of the surrogate is to find the parameters that maximize rain, their mapping to low values means they won't be targets of interest anyway, so including them is harmless. However, depending on the nature of their values, i.e. many parameters sharing the same output, this might cause more nuanced issues downstream during training, so it might be better to filter them as well. More experimentation would be needed to determine the nature of this effect, but I am not sure we have to go that far.

@AnasAbdelR

So it turns out I won't be at the hackathon because my flight is at 11 am on Friday, sad news indeed. @milankl, do you think you have everything you need here to turn this into a notebook? Happy to sit with you on one of the days to clean anything up. I could help add the plotting utilities too.

@milankl
Member Author

milankl commented Jan 3, 2025

I think this is already great!! I will formulate the challenge in 10 min, given this is a hackathon where the participants have 1-2 hours for it. I'll state the rules, and that they have to submit to the RainMaker challenge before the end of the hackathon using some non-ML experimentation as outlined in the RainMaker docs, but then I'll spend 10 min on your workflow and how they could optimise this problem with a surrogate model. With the notebook they should have enough guidance to submit some surrogate-powered maximised precipitation!!

@milankl
Member Author

milankl commented Jan 3, 2025

Just to illustrate, here are some plots of what the best_param_sample looks like.

(figure: 20-day precipitation time series at the Azores rain gauge)

20 days of precipitation on the Azores: mostly convective rain, with the highest rainfall between days 5 and 10, where roughly 60mm/day on average is a strong continuous downpour. In comparison, the 2024 Spanish floods had areas with 500mm/day, but that is not something easily simulated with the low-resolution model we're using here!

Putting this on a map

(figure: precipitation map over the North Atlantic)

You can see a large patch of precipitation over the North Atlantic with the Azores in the middle of it. So the location/height of that additional mountain and the sea surface temperatures were placed by the surrogate model so as to create this!
Our orography now looks like

(figure: modified orography)

As you can see, there's now a big 3000-4000m mountain plateau over North America, and the true Earth's orography is somewhat shrunk: the Himalayas are only some 2000m (orography_scale is 1/4!). Sea surface temperature now looks like

(figure: modified sea surface temperature)

So the surrogate's best sample increased the temperatures in the North Atlantic and especially around the Azores to destabilise the atmosphere and cause convection! The boxes are because this is how I defined "North Atlantic" and "Azores". Technically this is an array of thousands of parameters one could optimise, but that was obviously overkill here!

@AnasAbdelR

This is so COOL 😍 I'm so impressed by the simulator outputs and the surrogate's choice. There was also a second set of parameters I found that gave even more rain when I ran the optimization for a bit longer. It's remarkable how useful the output is, given the surrogate isn't trained to a decent level of performance. Being able to just AD your way to an answer is so satisfying, especially when the original simulator is so advanced. Definitely a very cool hackathon. Are we sure we only want to give them 10 mins?

@milankl
Member Author

milankl commented Jan 3, 2025

Ah no, I mean I'll try to explain only for 10-20 min at the beginning; then they should have as much time as possible to play independently with SpeedyWeather + surrogate!! They have 1 hour before lunch and 2h30min afterwards -- I envisage this is enough for them to initially submit one simulation with hand-changed parameters to understand the problem of the non-surrogate approach, and then use the surrogate to find an optimised set of parameters!! I hope after these 3h30min in total there will be many submissions to the RainMaker challenge with 300mm and more of rainfall.

@AnasAbdelR

AnasAbdelR commented Jan 3, 2025

This is perfect. I am curious to know whether it is easy to generate these images for a given parameter vector? I want to include the run I did that got 220 mm as a real-world example for my intro-to-surrogates presentation. It is just too good not to have.

These were the parameters if you're curious

best_param_sample = [0.243467,
3456.64,
21.2324,
-90.1826,
58.4064,
287.617,
295.99,
4.98632,
3.45722,
50.0179]

@milankl
Member Author

milankl commented Jan 3, 2025

I did exactly that above! Here are the instructions: https://speedyweather.github.io/RainMaker.jl/dev/rain_gauge/#Visualising-RainGauge-measurements

I only had to return simulation instead of just total_precip, but you can also just unpack that function and run the steps individually.
