Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print the number of lua scripts if any when doing check-rdb #1448

Open
wants to merge 2 commits into
base: unstable
Choose a base branch
from

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Dec 17, 2024

In old versions, if we are saving the replication info on disk, we will
save the lua scripts in the RDB see f11a758.

Currently check-rdb prints all the aux fields, this including lua scripts.
Although we no longer save lua scripts to RDB file, we may use the new
check-rdb tool to check the old RDB files, and the printing of lua scripts
will pollute the output. If it is an RDB file that abuses lua eval scripts,
it will print a lot of content.

Although it is not an actually aux field, we will still print it since
it's easy to filter using external grep.

At the end, if there is a lua script, we will print an info to show the
number of lua scripts.

In old versions, if we are saving the replication info on disk, we will
save the lua scripts in the RDB see f11a758.

Currently check-rdb prints all the aux fields, this including lua scripts.
Although we no longer save lua scripts to RDB file, we may use the new
check-rdb tool to check the old RDB files, and the printing of lua scripts
will pollute the output. If it is an RDB file that abuses lua eval scripts,
it will print a lot of content. At the end, if there is a lua script, we
will print an info to show the number of lua scripts.

This may be a potential breaking change, but i dont think someone is using
it to get the lua scripts output, people should not rely on it and we should
not print lua body anyway.

Signed-off-by: Binbin <[email protected]>
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.85%. Comparing base (980a801) to head (812b10e).
Report is 13 commits behind head on unstable.

Files with missing lines Patch % Lines
src/valkey-check-rdb.c 50.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1448      +/-   ##
============================================
+ Coverage     70.78%   70.85%   +0.06%     
============================================
  Files           119      119              
  Lines         64691    64697       +6     
============================================
+ Hits          45790    45838      +48     
+ Misses        18901    18859      -42     
Files with missing lines Coverage Δ
src/valkey-check-rdb.c 63.84% <50.00%> (-0.27%) ⬇️

... and 25 files with indirect coverage changes

@xbasel
Copy link
Member

xbasel commented Dec 17, 2024

This breaks existing behavior. Why not to keep all the info in the output and use tools like grep to filter output instead? (I'm not sure how LUA appears in the output and if it is possible to filter lua though)

@enjoy-binbin
Copy link
Member Author

The existing behavior is undocumented and i think we never meant to be like this, so i think it should be fine.

The output example:

[offset 0] Checking RDB file /xxx/dump.rdb
[offset 26] AUX FIELD redis-ver = '6.2.9'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1734443263'
[offset 67] AUX FIELD used-mem = '954056'
[offset 83] AUX FIELD aof-preamble = '0'
[offset 85] Selecting DB ID 0
[offset 135] AUX FIELD lua = 'return 000004108401'
[offset 160] AUX FIELD lua = 'return 000005382894'
[offset 185] AUX FIELD lua = 'return 000006673011'
[offset 210] AUX FIELD lua = 'return 000004618768'
[offset 235] AUX FIELD lua = 'return 000003805264'
[offset 260] AUX FIELD lua = 'return 000004309420'
[offset 285] AUX FIELD lua = 'return 000000562021'
[offset 310] AUX FIELD lua = 'return 000003148230'
[offset 335] AUX FIELD lua = 'return 000009161891'
...
[offset 2619] Checksum OK
[offset 2619] \o/ RDB looks OK! \o/
[info] 1 keys read
[info] 0 expires
[info] 0 already expired

@xbasel
Copy link
Member

xbasel commented Dec 17, 2024

Even if the behavior is undocumented, clients might still rely on it, although I don’t have a strong opinion on this

@zuiderkwast
Copy link
Contributor

A hard-coded exception for some field name sees odd. I can imagine that in some cases it can be useful too see all aux fields to understand why the file is very large, for example.

Can you just grep it like grep -E -v '^\[offset [0-9]+\] AUX FIELD lua ='?

If you have multiline scripts, it's not too hard to use one-liner in sed or perl.

@enjoy-binbin
Copy link
Member Author

I can imagine that in some cases it can be useful too see all aux fields to understand why the file is very large

yean, that why i added a count that will show the number.

    if (rdbstate.lua_scripts) {
        printf("[info] %lu lua scripts read\n", rdbstate.lua_scripts);
    }

I just think we should not let people rely on it, they will see the lua scripts in a replication RDB but unable to see the lua script in the normal RDB.

@zuiderkwast
Copy link
Contributor

they will see the lua scripts in a replication RDB but unable to see the lua script in the normal RDB

We store the Lua scripts in replication RDB? Let's remove that! :)

@hwware
Copy link
Member

hwware commented Dec 17, 2024

If you just remove the Lua script, how client can understand the huge gap for the offset number?

@enjoy-binbin
Copy link
Member Author

enjoy-binbin commented Dec 18, 2024

We store the Lua scripts in replication RDB? Let's remove that! :)

yes, we only save the lua in replication RDB, see the commit and we can see there is a rsi. Back then we meant to fix the psync2 issue with the evalsha noscript problem.

    /* If we are storing the replication information on disk, persist
     * the script cache as well: on successful PSYNC after a restart, we need
     * to be able to process any EVALSHA inside the replication backlog the
     * master will send us. */
    if (rsi && dictSize(server.lua_scripts)) {

If you just remove the Lua script, how client can understand the huge gap for the offset number?

does client rely on the aux field offset? these lua script fields are in the very end of the RDB file, so they cant understand it if we remove it, but why do they need to understand the offset?

another point is that, lua script is not a real aux field at all, i think it is just a workaround, at the time psync2 was introduced to avoid replication hitting evalsha noscript errors, using this backwards-compatible aux field (server will ignore it if it doesn't understand the aux). It is not a useful aux field we should print in the check-rdb mode.

@zuiderkwast
Copy link
Contributor

I am not convinced that we should filter the output of check-rdb, because it's easy to filter using external grep.

Regarding the old code comment:

/* If we are storing the replication information on disk, persist
 * the script cache as well: on successful PSYNC after a restart, we need
 * to be able to process any EVALSHA inside the replication backlog the
 * master will send us. */

In an upgrade scenario, a new replica connected to an old version primary can still receive EVALSHA in the replication stream, right? In this case, it should also load scripts that come from the replication stream, to make sure it can correctly handle EVALSHA in the replication stream. So, for this scenario the lua fields have some significance.

But I see now in rdb.c we just ignore it

        } else if (!strcasecmp(auxkey->ptr, "lua")) {
            /* Won't load the script back in memory anymore. */

Does this mean that upgrade using replication from an old version with verbatim script replication is broken?

@enjoy-binbin
Copy link
Member Author

I am not convinced that we should filter the output of check-rdb, because it's easy to filter using external grep.

ok, i did try my best. it doesn't matter since we drop the lua script replication in 7.0, and replication RDB is also not always common (people should pay more attention to a bgsave RDB file), so it is not easy for people to find this problem (or rely on it)

Does this mean that upgrade using replication from an old version with verbatim script replication is broken?

yes, maybe, that is the reason we marked 1b0968d with a breaking change label. in this case, the user need to use lua-replicate-commands in order to use the effects replication.

@zuiderkwast
Copy link
Contributor

ok, i did try my best.

Yes, it was a good try! 😃

in this case, the user need to use lua-replicate-commands in order to use the effects replication.

Right, of course, users just change this config before they upgrade, so the upgrade is possible.

@enjoy-binbin enjoy-binbin changed the title Dont print lua script body when doing check-rdb Print the number of lua scripts if any when doing check-rdb Dec 20, 2024
@@ -280,6 +284,13 @@ int redis_check_rdb(char *rdbfilename, FILE *fp) {
goto eoferr;
}

if (!strcasecmp(auxkey->ptr, "lua")) {
/* In older version before 7.0, we may save lua scripts in a replication RDB,
* although it is not an actually aux field, we will still print it in here since
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "it is not an actually aux field"? It is an aux field.

Aux field doens't mean it's not used for anything. I just means that a server ignores it if it doesn't understands it.

Btw, isn't it equally easy to count the lua scripts using grep --count?

Copy link
Member Author

@enjoy-binbin enjoy-binbin Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I just wanted to say that it's not really aux, i.e. it's different from the other aux fields (unlike other aux that provide useful info like metadata, lua is considered a data here (client data or other)). But i can see the wording is ugly indeed.

I am just bother that we are talking about external grep here a lot, it should be friendly and easy to read on its own without external help. Do you want me to drop it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants