Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Cluster: Inconsistent MOVED/READONLY errors for read-only client #869

Open
gmbnomis opened this issue Aug 4, 2024 · 10 comments
Open

Comments

@gmbnomis
Copy link
Contributor

gmbnomis commented Aug 4, 2024

Describe the bug

Issuing a write command in Valkey cluster (7.2.6 and 8.0) while in read-only mode (i.e. after a "READONLY" command) returns
different errors depending on whether the command is part of a script or not.

(I think this issue is not very significant, but it is an inconsistency)

To reproduce

Use the following commands on a cluster replica node. In this particular example, key "a" is served by a different node, key "b" is served by the primary of the current node:

127.0.0.1:30006> READONLY
OK
127.0.0.1:30006> SET a a
(error) MOVED 15495 127.0.0.1:30003
127.0.0.1:30006> SET b a
(error) MOVED 3300 127.0.0.1:30001
127.0.0.1:30006> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 a a
(error) MOVED 15495 127.0.0.1:30003
127.0.0.1:30006> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 b a
(error) READONLY You can't write against a read only replica. script: d8f2fad9f8e86a53d2a6ebd960b33c4972cacc37, on @user_script:1.

Expected behavior

I think the description of the "READONLY" command does not make fully clear what is supposed to happen if a client uses a writing command, but I suppose we should see "MOVED" in all four cases here.

@madolson madolson moved this to Optional for rc1 in Valkey 8.0 Aug 5, 2024
@zuiderkwast
Copy link
Contributor

Makes sense. MOVED if all cases would be better.

I don't think this is a very severe bug, because clients are not expected to use READONLY with write commands. It is not clearly documented what will happen if they do. I don't think it needs to be a blocker for 8.0.0.

@proost
Copy link

proost commented Oct 30, 2024

In my case, it works differently. I use Valkey 8.0.1.

127.0.0.1:7004 has slots from 10923 to 16383

127.0.0.1:7004> READONLY
OK
127.0.0.1:7004> CLUSTER KEYSLOT a
(integer) 15495
127.0.0.1:7004> CLUSTER KEYSLOT b
(integer) 3300
127.0.0.1:7004> SET a a
-> Redirected to slot [15495] located at 127.0.0.1:7003
OK
127.0.0.1:7004> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 a a
(error) READONLY You can't write against a read only replica. script: d8f2fad9f8e86a53d2a6ebd960b33c4972cacc37, on @user_script:1.
127.0.0.1:7004> SET b a
-> Redirected to slot [3300] located at 127.0.0.1:7005
OK
127.0.0.1:7004> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 b a
-> Redirected to slot [3300] located at 127.0.0.1:7005
OK

Is change to all cases to REDIRECTED ?

@zuiderkwast
Copy link
Contributor

@soloestoy are you familiar with these differences?

@gmbnomis
Copy link
Contributor Author

gmbnomis commented Oct 31, 2024

I think the difference is just that @proost started valkey-cli in cluster mode (which supports and follows MOVED/ASK) whereas I connected to the replica node directly without enabling cluster mode to get the verbatim errors (I probably should have mentioned that).

In interactive mode, valkey-cli prints the "Redirected to slot..." message when seeing a MOVED error. As far as I know, valkey-cli does not support REDIRECT (yet?).

Thus, no REDIRECT errors are happening here. This is just a different view on the same problem and, as expected, valkey-cli can't redirect in the EVAL case, as it gets back a READONLY error instead of a MOVED.

@gmbnomis
Copy link
Contributor Author

gmbnomis commented Dec 19, 2024

I had a look at the PR #1284 and this issue again. On a more in-depth look, I stand corrected: MOVED is the wrong error code here.

The reason is that Valkey issues MOVED for EVAL before running the script. The error is caused by the key arguments of the EVAL command and is issued by the generic check here.

In contrast to this, a MOVED error is never returned for a command called from a script, see:

/* error_code == CLUSTER_REDIR_MOVED || error_code == CLUSTER_REDIR_ASK */

This makes sense, as MOVED indicates to the client that it is safe to re-issue the entire EVAL command on another node (and this is not the case if a part of the script already ran).

The problem with READONLY mode is that we can't check beforehand whether it is safe to execute the script on the current node. For the keys we serve on the current replica node, read commands are ok, but write commands are not.

So, ERR Script attempted to access a non local key in a cluster node (or a more specific error message) would probably be better than READONLY You can't write against a read only replica. But MOVED is definitively wrong IMHO.

@zuiderkwast
Copy link
Contributor

READONLY You can't write against a read only replica

Isn't this error returned before the script even starts, because EVAL is a flagged as a write command?

There is a readonly version of EVAL called EVAL_RO that is not flagged as a write command and can't do writes.

https://valkey.io/commands/eval_ro/

@gmbnomis
Copy link
Contributor Author

gmbnomis commented Dec 20, 2024

Isn't this error returned before the script even starts, because EVAL is a flagged as a write command?

In "READWRITE" mode, yes. Although it will be a MOVED.

In "READONLY" mode, as the key's slot is handled by the replica, the script will begin to run. For example:

EVAL "redis.call('PUBLISH', 'channel', 'message'); return redis.call('SET', KEYS[1], ARGV[1])" 1 b a

publishes the message and fails with READONLY ... then.

@proost
Copy link

proost commented Dec 26, 2024

@gmbnomis
But as you said, when key is handled by other shard, then return MOVED. Doing same thing on same shard is more clear and consistent to me.

@gmbnomis
Copy link
Contributor Author

While it may look more consistent at first glance, the information conveyed by a "MOVED" to the client does not allow to use it in this situation:

In READWRITE mode, Valkey can determine if the EVAL command should be executed on the current node or redirected before starting the script execution. This is why we see a MOVED error in this case.

The situation in READONLY mode is more complex:

  1. Keys on the current node: When all keys are on the replica node, the script starts executing. However, we can't guarantee that all commands within the script are read-only operations.

  2. Keys on other nodes: If any key is on a different node, Valkey returns a MOVED error before script execution, similar to READWRITE mode.

The critical point is that once a script starts executing, we can't safely return a MOVED error. Returning MOVED could lead to the entire script being re-executed on another node, which could result in duplicate operations (e.g. messages published multiple times).

Thus, if a script accesses a key it is not supposed to access, Valkey deliberately does not return MOVED (as indicated, see

/* error_code == CLUSTER_REDIR_MOVED || error_code == CLUSTER_REDIR_ASK */
). The case we are discussing here is just another of those cases and thus, must be treated the same.

@proost
Copy link

proost commented Dec 29, 2024

@gmbnomis
OK, now i see. It makes sense to me. so i change error message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Optional for next RC
Development

No branches or pull requests

3 participants