Add release commits and release notes for 8.0.2 #1508

madolson · 2025-01-06T07:04:38Z

See release notes in PR.

…failover wait / client pause (valkey-io#1131) We have an assert in propagateNow. If the primary node receives a CLUSTER UPDATE such as dirty slots during SIGTERM waitting or during a manual failover pausing or during a client pause, the delKeysInSlot call will trigger this assert and cause primary crash. In this case, we added a new server_del_keys_in_slot state just like client_pause_in_transaction to track the state to avoid the assert in propagateNow, the dirty slots will be deleted in the end without affecting the data consistency. Signed-off-by: Binbin <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>

… categories (valkey-io#1140) The module commands which were added to acl categories were getting skipped when `ACL CAT category` command was executed. This PR fixes the bug. Before: ``` 127.0.0.1:6379> ACL CAT foocategory (empty array) ``` After: ``` 127.0.0.1:6379> ACL CAT foocategory aclcheck.module.command.test.add.new.aclcategories ``` --------- Signed-off-by: Roshan Khatri <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]>

…y-io#1171) The client that was killed by FUNCTION KILL received a reply of SCRIPT KILL and the server log also showed SCRIPT KILL. Signed-off-by: Binbin <[email protected]>

…o#1155) valkey-io#1145 First part of a two-step effort to add `WithSlot` API for expiry. This PR is to fix a crash that occurs when a RANDOMKEY uses a different slot than the cached slot of a client during a multi-exec. The next part will be to utilize the new API as an optimization to prevent duplicate work when calculating the slot for a key. --------- Signed-off-by: Nadav Levanoni <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Nadav Levanoni <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>

…y-io#1182) This special pattern '#' is used to get the element itself, it does not actually participate in the slot check. In this case, passing `GET #` will cause '#' to participate in the slot check, causing the command to get an `pattern may be in different slots` error. Signed-off-by: Binbin <[email protected]>

…memory usage (valkey-io#1213) The command argument strings created while parsing inline commands (see `processInlineBuffer()`) can contain free capacity. Since some commands ,such as `SET`, store these strings in the database, that free capacity increases the memory usage. In the worst case, it could double the memory usage. This only occurs if the inline command format is used. The argument strings are built by appending character by character in `sdssplitargs()`. Regular RESP commands are not affected. This change trims the strings within `processInlineBuffer()`. ### Why `trimStringObjectIfNeeded()` within `object.c` is not solving this? When the command argument string is packed into an object, `trimStringObjectIfNeeded()` is called. This does only trim the string if it is larger than `PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would ever need trimming if the command it sent using the bulk string format. We could modify this condition, but that would potentially have a performance impact on commands using the bulk format. Since those make up for the vast majority of executed commands, limiting this change to inline commands seems prudent. ### Experiment Results * 1 million `SET [key] [value]` commands * Random keys (16 bytes) * 600 bytes values Memory usage without this change: ``` used_memory:1089327888 used_memory_human:1.01G used_memory_rss:1131696128 used_memory_rss_human:1.05G used_memory_peak:1089348264 used_memory_peak_human:1.01G used_memory_peak_perc:100.00% used_memory_overhead:49302800 used_memory_startup:911808 used_memory_dataset:1040025088 used_memory_dataset_perc:95.55% ``` Memory usage with this change: ``` used_memory:705327888 used_memory_human:672.65M used_memory_rss:718802944 used_memory_rss_human:685.50M used_memory_peak:705348256 used_memory_peak_human:672.67M used_memory_peak_perc:100.00% used_memory_overhead:49302800 used_memory_startup:911808 used_memory_dataset:656025088 used_memory_dataset_perc:93.13% ``` If the same experiment is repeated using the normal RESP array of bulk string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB with and without of this change. If a replica is attached, its memory usage is 672MB with and without this change, since the replication link never uses inline commands. Signed-off-by: Stefan Mueller <[email protected]>

The CI job was introduced in valkey-io#1363, we should skip it in forks. Signed-off-by: Binbin <[email protected]>

…ccess_key` (valkey-io#1363) ## Summary This PR fixes valkey-io#1346 where we can get rid of the long term credentials by using OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions workflows to access resources in Amazon Web Services (AWS), without needing to store the AWS credentials as long-lived GitHub secrets. --------- Signed-off-by: vudiep411 <[email protected]>

We have set the secret as `AWS_S3_TEST_BUCKET` for test bucket and I missed it in the initial review. Signed-off-by: Roshan Khatri <[email protected]>

Introduced in valkey-io#1363, the file name does not match. Signed-off-by: Binbin <[email protected]>

- Moves `build-config.json` to workflow dir to build old versions with new configs. - Enables contributors to test release Wf on private repo by adding `github.event_name == 'workflow_dispatch' ||` --------- Signed-off-by: Roshan Khatri <[email protected]>

### Problem Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1 hashes, but it needs a way to know which scripts are least recently used. It uses an LRU list (lua_scripts_lru_list) to keep track of scripts in usage order. When the list reaches a maximum length, Valkey evicts the oldest scripts to free memory in both the list and dictionary. The problem here is that the sds from the LRU list can be pointing to already freed/moved memory by active defrag that the sds in the dictionary used to point to. It results in assertion error at [this line](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L519) ### Solution If we duplicate the sds when adding it to the LRU list, we can create an independent copy of the script identifier (sha). This duplication ensures that the sha string in the LRU list remains stable and unaffected by any defragmentation that could alter or free the original sds. In addition, dictUnlink doesn't require exact pointer match([ref](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L71-L78)) so this change makes sense to unlink the right dictEntry with the copy of the sds. ### Reproduce To reproduce it with tcl test: 1. Disable je_get_defrag_hint in defrag.c to trigger defrag often 2. Execute test script ``` start_server {tags {"auth external:skip"}} { test {Regression for script LRU crash} { r config set activedefrag yes r config set active-defrag-ignore-bytes 1 r config set active-defrag-threshold-lower 0 r config set active-defrag-threshold-upper 1 r config set active-defrag-cycle-min 99 r config set active-defrag-cycle-max 99 for {set i 0} {$i < 100000} {incr i} { r eval "return $i" 0 } after 5000; } } ``` ### Crash info Crash report: ``` === REDIS BUG REPORT START: Cut & paste starting from here === 14044:M 12 Nov 2024 14:51:27.054 # === ASSERTION FAILED === 14044:M 12 Nov 2024 14:51:27.054 # ==> eval.c:556 'de' is not true ------ STACK TRACE ------ Backtrace: /usr/bin/redis-server 127.0.0.1:6379 [cluster](luaDeleteFunction+0x148)[0x723708] /usr/bin/redis-server 127.0.0.1:6379 [cluster](luaCreateFunction+0x26c)[0x724450] /usr/bin/redis-server 127.0.0.1:6379 [cluster](evalCommand+0x2bc)[0x7254dc] /usr/bin/redis-server 127.0.0.1:6379 [cluster](call+0x574)[0x5b8d14] /usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommand+0xc84)[0x5b9b10] /usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommandAndResetClient+0x11c)[0x6db63c] /usr/bin/redis-server 127.0.0.1:6379 [cluster](processInputBuffer+0x1b0)[0x6dffd4] /usr/bin/redis-server 127.0.0.1:6379 [cluster][0x6bd968] /usr/bin/redis-server 127.0.0.1:6379 [cluster][0x659634] /usr/bin/redis-server 127.0.0.1:6379 [cluster](amzTLSEventHandler+0x194)[0x6588d8] /usr/bin/redis-server 127.0.0.1:6379 [cluster][0x750c88] /usr/bin/redis-server 127.0.0.1:6379 [cluster](aeProcessEvents+0x228)[0x757fa8] /usr/bin/redis-server 127.0.0.1:6379 [cluster](redisMain+0x478)[0x7786b8] /lib64/libc.so.6(__libc_start_main+0xe4)[0xffffa7763da4] /usr/bin/redis-server 127.0.0.1:6379 [cluster][0x5ad3b0] ``` Defrag info: ``` mem_fragmentation_ratio:1.18 mem_fragmentation_bytes:47229992 active_defrag_hits:20561 active_defrag_misses:5878518 active_defrag_key_hits:77 active_defrag_key_misses:212 total_active_defrag_time:29009 ``` ### Test: Run the test script to push 100,000 scripts to ensure the LRU list keeps 500 maximum length without any crash. ``` 27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500 [ok]: Regression for script LRU crash (6811 ms) [1/1 done]: unit/test (7 seconds) ``` --------- Signed-off-by: Seungmin Lee <[email protected]> Signed-off-by: Seungmin Lee <[email protected]> Co-authored-by: Seungmin Lee <[email protected]> Co-authored-by: Binbin <[email protected]>

…alkey-io#1265) This PR goal is to revert the changes on PR valkey-io#759 Recently, we got some reports that in Valkey 8.0 the PR valkey-io#759 (Decline unsubscribe related command in non-subscribed mode) causes break change. (valkey-io#1228) Although from my thought, call commands "unsubscribeCommand", "sunsubscribeCommand", "punsubscribeCommand" in request-response mode make no sense. This is why I created PR valkey-io#759 But breaking change is always no good, @valkey-io/core-team How do you think we revert this PR code changes? Signed-off-by: hwware <[email protected]>

) functionsLibCtxClear should clear the provided lib_ctx parameter, not the static variable curr_functions_lib_ctx, as this contradicts the function's intended purpose. The impact i guess is minor, like in some unhappy paths (diskless load fails, function restore fails?), we will mess up the functions_caches field, which is used in used_memory_functions / used_memory_scripts fileds in INFO. Signed-off-by: Binbin <[email protected]>

This PR fixes the missing stat update for `total_net_repl_output_bytes` that was removed during the refactoring in PR valkey-io#758. The metric was not being updated when writing to replica connections. Changes: - Restored the stat update in postWriteToClient for replica connections - Added integration test to verify the metric is properly updated Signed-off-by: Uri Yagelnik <[email protected]> Co-authored-by: Binbin <[email protected]>

…upKey (valkey-io#1499) When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid updating the last access time in `lookupKey`. An exception must be made for the `TOUCH` command which must always update the key. When called from a script, `server.executing_client` will point to the `TOUCH` command, while `server.current_client` will point to e.g. an `EVAL` command. So, we must use the former to find out the currently executing command if defined. This fix addresses the issue where TOUCH wasn't updating key access times when called from scripts like EVAL. Fixes valkey-io#1498 Signed-off-by: Simon Baatz <[email protected]> Co-authored-by: Binbin <[email protected]>

The code is ok before 2de544c, but now we will set server.repl_transfer_fd right after dfd was initiated, and in here we have a double close error since dfd and server.repl_transfer_fd are the same fd. Also move the declaration of dfd/maxtries to a small scope to avoid the confusion since they are only used in this code. Signed-off-by: Binbin <[email protected]>

…a side (valkey-io#1199) This PR addresses two issues: 1. Performance Degradation Fix - Resolves a significant performance issue during RDB load on replica nodes. - The problem was causing replicas to rehash multiple times during the load process. Local testing demonstrated up to 50% degradation in BGSAVE time. - The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues. - This fix aims to optimize the RDB load process and restore expected performance levels. 2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster mode- - Use the shard's master slots count when processing this opcode, as `clusterNodeCoversSlot` is not initialized for the currently syncing replica. - Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB` had no practical impact (due to 1). These improvements will enhance overall system performance and ensure smoother upgrades to Valkey 8.0 in the future. Testing: - Conducted local tests to verify the performance improvement during RDB load. - Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively impact functionality in the current version. Signed-off-by: naglera <[email protected]> Co-authored-by: Binbin <[email protected]>

We should reset repl_down_since only on state change, in the current code, if the rdb channel in the dual channel is normal, that is, rdb is loaded normally, but the psync channel is abnormal, we will set repl_down_since 0 here. If the primary is down at this time, the replica may be abnormal when calculating data_age in cluster failover, since repl_state != REPL_STATE_CONNECTED, this causes the replica to be unable to initiate an election due to the old data_age. In dualChannelSyncHandleRdbLoadCompletion, if the psync channel is not established, the function will return. We will set repl_state to REPL_STATE_CONNECTED and set repl_down_since to 0 in dualChannelSyncSuccess, that is, in establishPrimaryConnection. See also 677d10b for more details. Signed-off-by: Binbin <[email protected]>

This PR introduces a new mechanism for temporarily changing the server's loading_rio context during RDB loading operations. The new `RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the `server.loading_rio` value, ensuring that it's automatically restored to its original value when the scope ends. Introduces a dedicated flag to `rio` to signal immediate abort, preventing potential use-after-free scenarios during replication disconnection in dual-channel load. This ensures proper termination of `rdbLoadRioWithLoadingCtx` when replication is cancelled due to connection loss on main connection. Fixes valkey-io#1152 --------- Signed-off-by: naglera <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Amit Nagler <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: ranshid <[email protected]>

Signed-off-by: Madelyn Olson <[email protected]>

codecov · 2025-01-06T07:44:24Z

Codecov Report

Attention: Patch coverage is 94.38202% with 5 lines in your changes missing coverage. Please review.

Project coverage is 70.68%. Comparing base (4fbab57) to head (ee5d99d).
Report is 1 commits behind head on 8.0.

Files with missing lines	Patch %	Lines
src/rio.h	40.00%	3 Missing ⚠️
src/db.c	95.12%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              8.0    #1508      +/-   ##
==========================================
+ Coverage   70.66%   70.68%   +0.02%     
==========================================
  Files         114      114              
  Lines       61681    62947    +1266     
==========================================
+ Hits        43585    44494     +909     
- Misses      18096    18453     +357

Files with missing lines	Coverage Δ
src/acl.c	`88.82% <100.00%> (-0.21%)`	⬇️
src/cluster_legacy.c	`86.29% <100.00%> (+0.13%)`	⬆️
src/eval.c	`57.15% <100.00%> (+0.56%)`	⬆️
src/functions.c	`95.70% <100.00%> (+0.10%)`	⬆️
src/kvstore.c	`96.29% <100.00%> (+0.15%)`	⬆️
src/networking.c	`88.06% <100.00%> (+0.07%)`	⬆️
src/rdb.c	`75.52% <100.00%> (-0.51%)`	⬇️
src/replication.c	`87.59% <100.00%> (+0.25%)`	⬆️
src/script_lua.c	`90.30% <100.00%> (+0.14%)`	⬆️
src/server.c	`88.66% <100.00%> (+0.03%)`	⬆️
... and 4 more

... and 85 files with indirect coverage changes

zuiderkwast

LGTM. Good job!

How did you keep track? The "To be backported" column in https://github.com/orgs/valkey-io/projects/2 doesn't seem to be used. Do you go through all PRs merged since the last release?

hpatro · 2025-01-06T19:45:18Z

Failures observed on daily run https://github.com/valkey-io/valkey/actions/runs/12628417919/:

https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35184526162?pr=1508 (Cherry picking change the container image to ubuntu:plucky #1359 should fix this issue)

Run apt-get update && apt-get install -y make gcc-13
Ign:1 http://archive.ubuntu.com/ubuntu lunar InRelease
Ign:2 http://archive.ubuntu.com/ubuntu lunar-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu lunar-backports InRelease
Err:4 http://archive.ubuntu.com/ubuntu lunar Release
  404  Not Found [IP: 91.189.91.82 80]
Err:5 http://archive.ubuntu.com/ubuntu lunar-updates Release
  404  Not Found [IP: 91.189.91.82 80]
Err:[6](https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35184526162?pr=1508#step:5:7) http://archive.ubuntu.com/ubuntu lunar-backports Release
  404  Not Found [IP: 91.189.91.82 80]
Ign:7 http://security.ubuntu.com/ubuntu lunar-security InRelease
Err:8 http://security.ubuntu.com/ubuntu lunar-security Release
  404  Not Found [IP: 185.125.190.83 80]
Reading package lists...
E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file.
E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file.
update-alternatives: error: alternative path /usr/bin/gcc-[13](https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35184526162?pr=1508#step:5:14) doesn't exist
Error: Process completed with exit code 2.

https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35216795413?pr=1508 (Cherry picking Try to stabilize the failover call in the slot migration test #1078 should fix this issue)

[exception]: Executing test client: ERR Master is down or failed, please use CLUSTER FAILOVER FORCE.
ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
    while executing
"[Rn $n] {*}$args"
    (procedure "R" line 2)
    invoked from within
"R 1 cluster failover"
    ("uplevel" body line 12)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test "Migration target is auto-updated after failover in target shard" {
        # Trigger an auto-failover from R1 to R4
        fail_server 1
      ..."
    ("uplevel" body line 32)
    invoked from within
"uplevel 1 $code"
    (procedure "cluster_setup" line 35)
    invoked from within
"cluster_setup 3 3 6 continuous_slot_allocation default_replica_allocation {

    set node_timeout [lindex [R 0 CONFIG GET cluster-node-timeout] 1]
   ..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000} tags {external:skip cluster} overrides {cluster-allo..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $code"
    (procedure "start_multiple_servers" line 5)
    invoked from within
"start_multiple_servers $node_count $options $code"
    (procedure "start_cluster" line 16)
    invoked from within
"start_cluster 3 3 {tags {external:skip cluster} overrides {cluster-allow-replica-migration no cluster-node-timeout 1000} } {

    set node_timeout [li..."
    (file "tests/unit/cluster/slot-migration.tcl" line 53)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "

https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35216796290?pr=1508 (Cherry picking Update info.tcl test to revert client output limits sooner #1462 should fix this issue)

[exception]: Executing test client: I/O error reading reply.
I/O error reading reply
    while executing
"[srv $level "client"] {*}$args"
    (procedure "r" line 7)
    invoked from within
"r info stats"
    ("uplevel" body line 18)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test {stats: client input and output buffer limit disconnections} {
            r config resetstat
            set info [r info stats]
            ass..."
    ("uplevel" body line 373)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {} {

        test {latencystats: disable/enable} {
            r config resetstat
            r CONFIG SET latency-tracking no
         ..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {tags {"info" "external:skip"}} {
    start_server {} {

        test {latencystats: disable/enable} {
            r config resetstat
   ..."
    (file "tests/unit/info.tcl" line 13)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
Killing still running Valkey server 7222
Killing still running Valkey server 7223
Killing still running Valkey server 7252
Killing still running Valkey server 7253
Killing still running Valkey server 7282
Killing still running Valkey server 7283
Killing still running Valkey server 7298
Killing still running Valkey server 7467
Killing still running Valkey server 7525
Killing still running Valkey server 7641
Killing still running Valkey server 7949
Killing still running Valkey server 8047
Killing still running Valkey server 8147
Killing still running Valkey server 8159
I/O error reading reply
Killing still running Valkey server 8297
    while executing
Killing still running Valkey server 8314
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 8)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/gen_write_load.tcl" line 24)
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 8)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/gen_write_load.tcl" line 24)
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 8)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/gen_write_load.tcl" line 24)
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 8)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/gen_write_load.tcl" line 24)
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 8)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/gen_write_load.tcl" line 24)
Killing still running Valkey server 8343

https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35216802218?pr=1508 (Cherry picking Resolve bounds checks on cluster_legacy.c #1463 should fix this issue)

In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterSendFail’ at cluster_legacy.c:4125:37,
    inlined from ‘markNodeAsFailingIfNeeded’ at cluster_legacy.c:2042:5:
cluster_legacy.c:1293:24: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2312]’ [-Werror=array-bounds=]
 1293 |     msgblock->refcount = 1;
      |     ~~~~~~~~~~~~~~~~~~~^~~
cluster_legacy.c:1292:37: note: object of size 2312 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c:1294:22: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2312]’ [-Werror=array-bounds=]
 1294 |     msgblock->totlen = blocklen;
      |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~
cluster_legacy.c:1292:37: note: object of size 2312 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterSendFail’ at cluster_legacy.c:4125:37:
cluster_legacy.c:1293:24: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2312]’ [-Werror=array-bounds=]
 1293 |     msgblock->refcount = 1;
      |     ~~~~~~~~~~~~~~~~~~~^~~
cluster_legacy.c:1292:37: note: object of size 2312 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c:1294:22: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2312]’ [-Werror=array-bounds=]
 1294 |     msgblock->totlen = blocklen;
      |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~
cluster_legacy.c:1292:37: note: object of size 2312 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterRequestFailoverAuth’ at cluster_legacy.c:4252:37:
cluster_legacy.c:1293:24: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1293 |     msgblock->refcount = 1;
      |     ~~~~~~~~~~~~~~~~~~~^~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c:1294:22: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1294 |     msgblock->totlen = blocklen;
      |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c: In function ‘clusterRequestFailoverAuth’:
cluster_legacy.c:4258:36: error: array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 4258 |     if (server.cluster->mf_end) hdr->mflags[0] |= CLUSTERMSG_FLAG0_FORCEACK;
      |                                    ^~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterRequestFailoverAuth’ at cluster_legacy.c:4252:37:
cluster_legacy.c:1292:37: note: at offset 16 into object of size [227](https://github.com/valkey-io/valkey/actions/runs/12628417919/job/35216802218?pr=1508#step:4:228)2 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c: In function ‘clusterRequestFailoverAuth’:
cluster_legacy.c:4258:36: error: array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 4258 |     if (server.cluster->mf_end) hdr->mflags[0] |= CLUSTERMSG_FLAG0_FORCEACK;
      |                                    ^~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterRequestFailoverAuth’ at cluster_legacy.c:4252:37:
cluster_legacy.c:1292:37: note: at offset 16 into object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterSendFailoverAuth’ at cluster_legacy.c:4268:37:
cluster_legacy.c:1293:24: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1293 |     msgblock->refcount = 1;
      |     ~~~~~~~~~~~~~~~~~~~^~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c:1294:22: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1294 |     msgblock->totlen = blocklen;
      |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
In function ‘createClusterMsgSendBlock’,
    inlined from ‘clusterSendMFStart’ at cluster_legacy.c:4279:37:
cluster_legacy.c:1293:24: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1293 |     msgblock->refcount = 1;
      |     ~~~~~~~~~~~~~~~~~~~^~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cluster_legacy.c:1294:22: error: array subscript ‘clusterMsgSendBlock[0]’ is partly outside array bounds of ‘unsigned char[2272]’ [-Werror=array-bounds=]
 1294 |     msgblock->totlen = blocklen;
      |     ~~~~~~~~~~~~~~~~~^~~~~~~~~~
cluster_legacy.c:1292:37: note: object of size 2272 allocated by ‘zcalloc’
 1292 |     clusterMsgSendBlock *msgblock = zcalloc(blocklen);
      |                                     ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:534: cluster_legacy.o] Error 1
make[1]: Leaving directory '/home/runner/work/valkey/valkey/src'
make: *** [Makefile:6: all-with-unit-tests] Error 2

hpatro · 2025-01-06T20:11:37Z

@madolson There are four failures on the daily run and seems like all of it has been already fixed on unstable. Should I go ahead and cherry pick all of them? One of the change is in the server code and remaining are test/setup fixes. I've linked each failure with the corresponding fix in the above comment. PTAL.

madolson · 2025-01-06T21:26:09Z

@zuiderkwast >How did you keep track? The "To be backported" column in https://github.com/orgs/valkey-io/projects/2 doesn't seem to be used. Do you go through all PRs merged since the last release?

When I have them merged I move it from "To be backported" to Done.

Our fortify workflow is running on ubuntu lunar container that is EOL since [January 25, 2024(January 25, 2024](https://lists.ubuntu.com/archives/ubuntu-announce/2024-January/000298.html). This case cause the workflow to fail during update actions like: ``` apt-get update && apt-get install -y make gcc-13 update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-1[3](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:3) 100 make all-with-unit-tests CC=gcc OPT=-O3 SERVER_CFLAGS='-Werror -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3' shell: sh -e {0} Ign:1 http://security.ubuntu.com/ubuntu lunar-security InRelease Err:2 http://security.ubuntu.com/ubuntu lunar-security Release [4](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:4)04 Not Found [IP: 91.189.91.82 80] Ign:3 http://archive.ubuntu.com/ubuntu lunar InRelease Ign:4 http://archive.ubuntu.com/ubuntu lunar-updates InRelease Ign:[5](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:5) http://archive.ubuntu.com/ubuntu lunar-backports InRelease Err:[6](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:7) http://archive.ubuntu.com/ubuntu lunar Release 404 Not Found [IP: 185.125.190.81 80] Err:7 http://archive.ubuntu.com/ubuntu lunar-updates Release 404 Not Found [IP: 185.125.190.81 80] Err:8 http://archive.ubuntu.com/ubuntu lunar-backports Release 404 Not Found [IP: 185.125.190.81 80] Reading package lists... E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file. update-alternatives: error: alternative path /usr/bin/gcc-[13](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:14) doesn't exist Error: Process completed with exit code 2. ``` example: https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209 This pr uses the latest stable ubuntu image release [plucky](https://hub.docker.com/layers/library/ubuntu/plucky/images/sha256-dc4565c7636f006c26d54c988faae576465e825ea349fef6fd3af6bf5100e8b6?context=explore) Signed-off-by: Ran Shidlansik <[email protected]>

…-io#1078) The CI report replica will return the error when performing CLUSTER FAILOVER: ``` -ERR Master is down or failed, please use CLUSTER FAILOVER FORCE ``` This may because the primary state is fail or the cluster connection is disconnected during the primary pause. In this PR, we added some waits in wait_for_role, if the role is replica, we will wait for the replication link and the cluster link to be ok. Signed-off-by: Binbin <[email protected]>

…#1462) We set the client output buffer limits to 10 bytes, and then execute `info stats` which produces more than 10 bytes of output, which can cause that command to throw an error. I'm not sure why it wasn't consistently erroring before, might have been some change related to the ubuntu upgrade though. Issues related to ubuntu-tls are hopefully resolved now. Signed-off-by: Madelyn Olson <[email protected]>

The explanation on the original commit was wrong. Key based access must have a `~` in order to correctly configure whey key prefixes to apply the selector to. If this is missing, a server assert will be triggered later. Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: YaacovHazan <[email protected]>

Reset GC state before closing the lua VM to prevent user data to be wrongly freed while still might be used on destructor callbacks. Created and publish by Redis in their OSS branch. Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: YaacovHazan <[email protected]>

00-RELEASENOTES

PingXie · 2025-01-07T00:00:32Z

Should I go ahead and cherry pick all of them?

@hpatro, all these 4 fixes look good to me.

hpatro · 2025-01-07T00:05:46Z

Should I go ahead and cherry pick all of them?

@hpatro, all these 4 fixes look good to me.

Discussed this with @madolson offline. I've skipped the one change on the server side - #1463. We talked we can rely on clang + sanitizer for the patch release..

PingXie · 2025-01-07T00:16:47Z

Should I go ahead and cherry pick all of them?

@hpatro, all these 4 fixes look good to me.

Discussed this with @madolson offline. I've skipped the one change on the server side - #1463. We talked we can rely on clang + sanitizer for the patch release..

Not sure I follow. Don't we get build breaks without #1463?

hpatro · 2025-01-07T04:43:02Z

Not sure I follow. Don't we get build breaks without #1463?

That's right. However, cherry picking that change felt unnecessary for the patch release (security fixes + bugs). We were trying to get the release out today and I wanted to keep the payload minimum.

Signed-off-by: Madelyn Olson <[email protected]>

enjoy-binbin and others added 21 commits January 5, 2025 21:53

Fix FUNCTION KILL error message being displayed as SCRIPT KILL (valke…

c3bdb3c

…y-io#1171) The client that was killed by FUNCTION KILL received a reply of SCRIPT KILL and the server log also showed SCRIPT KILL. Signed-off-by: Binbin <[email protected]>

Skip build-release-packages CI job in forks (valkey-io#1438)

e35fffa

The CI job was introduced in valkey-io#1363, we should skip it in forks. Signed-off-by: Binbin <[email protected]>

Fix the secrete for test bucket. (valkey-io#1447)

8d17f46

We have set the secret as `AWS_S3_TEST_BUCKET` for test bucket and I missed it in the initial review. Signed-off-by: Roshan Khatri <[email protected]>

Fix wrong file name in build-release-packages.yml (valkey-io#1437)

ce86ecb

Introduced in valkey-io#1363, the file name does not match. Signed-off-by: Binbin <[email protected]>

Add release notes for 8.0.2

9537d69

Signed-off-by: Madelyn Olson <[email protected]>

madolson force-pushed the 8.0 branch from de642a2 to 9537d69 Compare January 6, 2025 07:15

Remove test that relies on functionality from unstable

eaf6b8d

Signed-off-by: Madelyn Olson <[email protected]>

zuiderkwast approved these changes Jan 6, 2025

View reviewed changes

ranshid and others added 2 commits January 6, 2025 21:27

madolson and others added 3 commits January 6, 2025 21:28

madolson requested review from PingXie and hpatro January 6, 2025 22:50

hpatro reviewed Jan 6, 2025

View reviewed changes

00-RELEASENOTES Show resolved Hide resolved

hpatro reviewed Jan 6, 2025

View reviewed changes

00-RELEASENOTES Show resolved Hide resolved

PingXie approved these changes Jan 6, 2025

View reviewed changes

00-RELEASENOTES Show resolved Hide resolved

00-RELEASENOTES Show resolved Hide resolved

madolson commented Jan 6, 2025

View reviewed changes

00-RELEASENOTES Show resolved Hide resolved

enjoy-binbin approved these changes Jan 7, 2025

View reviewed changes

Update release notes with final security information

8e33420

Signed-off-by: Madelyn Olson <[email protected]>

madolson force-pushed the 8.0 branch from f36b6f7 to 8e33420 Compare January 7, 2025 18:29

madolson merged commit 8a870ed into valkey-io:8.0 Jan 7, 2025
48 of 59 checks passed

madolson deleted the 8.0 branch January 7, 2025 20:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add release commits and release notes for 8.0.2 #1508

Add release commits and release notes for 8.0.2 #1508

madolson commented Jan 6, 2025

codecov bot commented Jan 6, 2025 •

edited

Loading

zuiderkwast left a comment

hpatro commented Jan 6, 2025 •

edited

Loading

hpatro commented Jan 6, 2025 •

edited

Loading

madolson commented Jan 6, 2025

PingXie commented Jan 7, 2025

hpatro commented Jan 7, 2025

PingXie commented Jan 7, 2025

hpatro commented Jan 7, 2025

Add release commits and release notes for 8.0.2 #1508

Add release commits and release notes for 8.0.2 #1508

Conversation

madolson commented Jan 6, 2025

codecov bot commented Jan 6, 2025 • edited Loading

Codecov Report

zuiderkwast left a comment

Choose a reason for hiding this comment

hpatro commented Jan 6, 2025 • edited Loading

hpatro commented Jan 6, 2025 • edited Loading

madolson commented Jan 6, 2025

PingXie commented Jan 7, 2025

hpatro commented Jan 7, 2025

PingXie commented Jan 7, 2025

hpatro commented Jan 7, 2025

codecov bot commented Jan 6, 2025 •

edited

Loading

hpatro commented Jan 6, 2025 •

edited

Loading

hpatro commented Jan 6, 2025 •

edited

Loading