Evaluate live models #132
Comments
I'd be glad to work on this. Just for clarification: am I to follow the method here but with the updated 5-minute parquet? And by 'run forecast over 2024-05 and see which ones does the best', do you mean run the forecast for each model (gradient boosting ICON |
Thanks for reaching out on this issue. Yes, try to run this for each of the models and compare the results.
You might need to use https://open-meteo.com/en/docs/previous-runs-api in order to get NWP forecasts that were made in the past.
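As a rough illustration of "run this for each of the models and compare the results", the sketch below ranks models by mean absolute error against measured generation. It is not the repository's evaluation code: `rank_models`, the model names, and the constant stand-in forecasters are hypothetical, and in practice each callable would wrap a real forecast call.

```python
from typing import Callable, Dict

import pandas as pd


def rank_models(
    forecasters: Dict[str, Callable[[pd.Timestamp], float]],
    actuals: pd.Series,
) -> pd.Series:
    """Return the mean absolute error per model, best (lowest) first.

    `actuals` is measured power indexed by timestamp; each forecaster maps a
    timestamp to a predicted power value in the same unit.
    """
    mae = {}
    for name, forecast in forecasters.items():
        preds = pd.Series([forecast(ts) for ts in actuals.index], index=actuals.index)
        mae[name] = (preds - actuals).abs().mean()
    return pd.Series(mae, name="mae").sort_values()


# Toy data: three hourly measurements and two constant stand-in "models".
actuals = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-05-01 10:00", periods=3, freq=pd.Timedelta(hours=1)),
)
ranking = rank_models({"model_a": lambda ts: 2.0, "model_b": lambda ts: 0.0}, actuals)
print(ranking)
```

The first entry of `ranking` is the model with the lowest error over the evaluated timestamps.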
Hi Peter,

Sorry for the delay. I've been working on this using a combination of my code and the repository's code, but I've encountered some issues. I'll explain my process:

• I created a test file with the first 10 PV IDs from the 5-minute parquet and then generated random timestamps for each of those systems.
predictions_df = run_forecast(site, model="xgb", ts=timestamp, nwp_source=all_nwp_data)
• I merged the predictions with the 5-minute data to get the actual power generated and ensured that the columns were correct for passing to the metrics function.

I have a question here: is the horizon hour supposed to be just the hour? So, for the timestamp 2024-05-03 10:00:00, is the horizon hour 10?

Here are the metrics:

However, the cloud coverage data was not accurate, which could be contributing to these high error rates. Could you provide some guidance on these issues?
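The merge-and-score step described in this comment might look roughly like the following. This is a minimal sketch, not the repository's metrics function: the column names (`pv_id`, `timestamp`, `power_kw`, `generation_kw`) and the helper `score_predictions` are assumptions made for illustration.

```python
import pandas as pd


def score_predictions(predictions: pd.DataFrame, actuals: pd.DataFrame) -> pd.DataFrame:
    """Join forecasts to measured generation and return the MAE per PV system."""
    # Inner join keeps only timestamps that have both a forecast and a measurement.
    merged = predictions.merge(actuals, on=["pv_id", "timestamp"], how="inner")
    merged["abs_error"] = (merged["power_kw"] - merged["generation_kw"]).abs()
    return (
        merged.groupby("pv_id", as_index=False)["abs_error"]
        .mean()
        .rename(columns={"abs_error": "mae_kw"})
    )


# Toy data standing in for the forecast output and the 5-minute generation data.
t0 = pd.Timestamp("2024-05-03 10:00:00")
t1 = pd.Timestamp("2024-05-03 10:05:00")
predictions = pd.DataFrame(
    {"pv_id": [1, 1, 2], "timestamp": [t0, t1, t0], "power_kw": [1.0, 2.0, 0.5]}
)
actuals = pd.DataFrame(
    {"pv_id": [1, 1, 2], "timestamp": [t0, t1, t0], "generation_kw": [1.5, 2.0, 1.5]}
)
scores = score_predictions(predictions, actuals)
print(scores)
```

Both frames need timestamps in the same timezone convention (both naive or both UTC) before merging, otherwise the join silently drops rows.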
Thanks @Jacqueline-J for this work. What do you mean by the API here? The forecast horizon is the number of hours or minutes after the forecast is made. So horizon 10 (I think) is 10 minutes after the forecast is made: if the forecast is made at 2024-06-10 09:00:00, then the horizon 10 value would be for 2024-06-10 09:10:00. Does this make sense? I'm sure the cloud coverage does affect it, but there might be ways to get round that. Which function do you use to get the NWP data? It might be worth plotting a few forecasts and the truths to understand a bit more about what's going on.
Thank you for clarifying the horizon hour, I've amended my code. In terms of the NWP data, I was using the Python script generated from here https://open-meteo.com/en/docs/previous-runs-api and just entering mock variables for cloud coverage.
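The horizon arithmetic above can be written out as a small sketch. It assumes the horizon is measured in minutes, as tentatively suggested in the comment; the helper names `target_time` and `horizon_minutes` are hypothetical.

```python
import pandas as pd


def target_time(init_time: str, horizon_minutes: int) -> pd.Timestamp:
    """Time a forecast value is valid for, given when the forecast was made."""
    return pd.Timestamp(init_time) + pd.Timedelta(minutes=horizon_minutes)


def horizon_minutes(init_time: str, valid_time: str) -> int:
    """Inverse: minutes elapsed between forecast creation and the valid time."""
    delta = pd.Timestamp(valid_time) - pd.Timestamp(init_time)
    return int(delta.total_seconds() // 60)


valid = target_time("2024-06-10 09:00:00", 10)
print(valid)  # 2024-06-10 09:10:00
```

Under this definition the horizon is unrelated to the hour of day: a forecast made at 10:00 for 10:00 has horizon 0.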
This is how I'm generating the NWP data:

import openmeteo_requests
import requests_cache
import pandas as pd
from retry_requests import retry

# pv_data from the test df (test_df is the test file described earlier)
pv_df = test_df.copy()
pv_df['timestamp'] = pd.to_datetime(pv_df['timestamp']).dt.tz_localize(None)

# Set up the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after=3600)
retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
openmeteo = openmeteo_requests.Client(session=retry_session)

url = "https://previous-runs-api.open-meteo.com/v1/forecast"

# Collect one DataFrame per PV row, then concatenate once at the end
nwp_frames = []

# Loop through each row in the pv_df
for idx, pv_row in pv_df.iterrows():
    # Request the day containing the PV timestamp.
    # Assumption: the API accepts start_date/end_date; they replace
    # "past_days": 0, which does not select the day of the PV timestamp.
    day = pv_row['timestamp'].strftime('%Y-%m-%d')
    params = {
        "latitude": pv_row['latitude'],
        "longitude": pv_row['longitude'],
        "hourly": ["temperature_2m",
                   "precipitation",
                   "cloud_cover",
                   "shortwave_radiation"],
        "start_date": day,
        "end_date": day
    }
    responses = openmeteo.weather_api(url, params=params)

    # Process response (variables come back in the order requested)
    response = responses[0]
    hourly = response.Hourly()
    hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
    hourly_precipitation = hourly.Variables(1).ValuesAsNumpy()
    hourly_cloud_cover = hourly.Variables(2).ValuesAsNumpy()
    hourly_shortwave_radiation = hourly.Variables(3).ValuesAsNumpy()

    # Build the hourly time axis from the response itself. Using the PV
    # timestamp as both start and end with inclusive="left" gave an empty
    # range, so every array was sliced to length zero.
    hourly_data = {"timestamp": pd.date_range(
        start=pd.to_datetime(hourly.Time(), unit="s"),
        end=pd.to_datetime(hourly.TimeEnd(), unit="s"),
        freq=pd.Timedelta(seconds=hourly.Interval()),
        inclusive="left"
    )}
    hourly_data["temperature_2m"] = hourly_temperature_2m
    hourly_data["precipitation"] = hourly_precipitation
    hourly_data["cloud_cover"] = hourly_cloud_cover
    hourly_data["shortwave_radiation"] = hourly_shortwave_radiation

    # Mock cloud cover layers derived from total cloud cover (placeholders)
    hourly_data['cloudcover_low'] = hourly_data['cloud_cover'] / 5
    hourly_data['cloudcover_mid'] = hourly_data['cloud_cover'] / 2
    hourly_data['cloudcover_high'] = hourly_data['cloud_cover'] / 1

    # Convert to DataFrame and attach the site identifiers
    nwp_df = pd.DataFrame(data=hourly_data)
    nwp_df['pv_id'] = pv_row['pv_id']
    nwp_df['latitude'] = pv_row['latitude']
    nwp_df['longitude'] = pv_row['longitude']
    nwp_df = nwp_df.drop('cloud_cover', axis=1)
    nwp_frames.append(nwp_df)

# Combine all rows and display the first two
all_nwp_data = pd.concat(nwp_frames, ignore_index=True)
all_nwp_data.head(2)
It's probably worth noting: is this getting the final NWP for that timestamp, or is it getting the NWP at the time of that timestamp? These NWPs are forecasts, which makes them slightly confusing.
They are now saved in https://huggingface.co/openclimatefix/open-source-quartz-solar-forecast/tree/main/data
The generation data for August is now here - https://huggingface.co/datasets/openclimatefix/uk_pv/tree/main/data/2024/08
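One way to make the forecast run explicit: as far as I understand the Open-Meteo Previous Runs API documentation, each hourly variable can also be requested with a suffix such as `_previous_day1` or `_previous_day2`, meaning the value as forecast that many days before the valid time. The sketch below only builds the request parameters and makes no network call. `previous_run_params` is a hypothetical helper, and both the suffix naming and the `start_date`/`end_date` parameters are assumptions to check against the documentation.

```python
from typing import Dict, List


def previous_run_params(
    latitude: float,
    longitude: float,
    variables: List[str],
    days_before: int,
    date: str,
) -> Dict[str, object]:
    """Build query parameters asking for forecasts issued `days_before` days earlier."""
    if days_before < 1:
        raise ValueError("days_before must be at least 1")
    # Assumed naming convention: "<variable>_previous_day<N>"
    hourly = [f"{name}_previous_day{days_before}" for name in variables]
    return {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": hourly,
        "start_date": date,  # assumed to be supported by this endpoint
        "end_date": date,
    }


params = previous_run_params(
    51.5, -0.1, ["cloud_cover", "shortwave_radiation"], days_before=1, date="2024-05-03"
)
print(params["hourly"])
```

Requesting the unsuffixed variable alongside the suffixed ones would make it possible to compare the most recent values with what was forecast a day earlier.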
Detailed Description
It would be great to evaluate the live models.
Context
Possible Implementation