Adds support for scripting engines as Valkey modules #1277

Merged
merged 38 commits into from
Dec 21, 2024

rjd15372
Contributor

@rjd15372 rjd15372 commented Nov 8, 2024

This PR extends the module API to support the addition of different scripting engines to execute user-defined functions.

The scripting engine can be implemented as a Valkey module, and can be dynamically loaded with the loadmodule config directive, or with the MODULE LOAD command.

This PR also adds an example of a dummy scripting engine module, to show how to use the new module API. The dummy module is implemented in tests/modules/helloscripting.c.

With the current module API support, scripting engines can only be loaded to run functions via the FCALL command.

The additions to the module API are the following:

/* This struct represents a scripting engine function that results from the
 * compilation of a script by the engine implementation. */
typedef struct ValkeyModuleScriptingEngineCompiledFunction ValkeyModuleScriptingEngineCompiledFunction;

typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    const char *code,
    size_t timeout,
    size_t *out_num_compiled_functions,
    char **err);

typedef void (*ValkeyModuleScriptingEngineCallFunctionFunc)(
    ValkeyModuleCtx *module_ctx,
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    ValkeyModuleScriptingEngineFunctionCtx *func_ctx,
    void *compiled_function,
    ValkeyModuleString **keys,
    size_t nkeys,
    ValkeyModuleString **args,
    size_t nargs);

typedef size_t (*ValkeyModuleScriptingEngineGetUsedMemoryFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx);

typedef size_t (*ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc)(
    void *compiled_function);

typedef size_t (*ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx);

typedef void (*ValkeyModuleScriptingEngineFreeFunctionFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    void *compiled_function);

/* This struct stores the callback functions implemented by the scripting
 * engine to provide the functionality for the `FUNCTION *` commands. */
typedef struct ValkeyModuleScriptingEngineMethodsV1 {
    uint64_t version; /* Version of this structure for ABI compat. */

    /* Library create function callback. When a new script is loaded, this
     * callback is called with the script code, and returns an array of
     * ValkeyModuleScriptingEngineCompiledFunction objects. */
    ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library;

    /* Callback invoked when the `FCALL` command is called on a function
     * registered in this engine. */
    ValkeyModuleScriptingEngineCallFunctionFunc call_function;

    /* Callback that returns the memory currently used by the engine. */
    ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory;

    /* Function callback to return memory overhead for a given function. */
    ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead;

    /* Function callback to return memory overhead of the engine. */
    ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead;

    /* Function callback to free the memory of a registered engine function. */
    ValkeyModuleScriptingEngineFreeFunctionFunc free_function;
} ValkeyModuleScriptingEngineMethodsV1;

/* Registers a new scripting engine in the server.
 *
 * - `engine_name`: the name of the scripting engine. This name will match
 *   against the engine name specified in the script header using a shebang.
 *
 * - `engine_ctx`: engine specific context pointer.
 *
 * - `engine_methods`: the struct with the scripting engine's callback
 *   function pointers.
 */
int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx *ctx,
                                         const char *engine_name,
                                         void *engine_ctx,
                                         ValkeyModuleScriptingEngineMethods engine_methods);

/* Removes the scripting engine from the server.
 *
 * `engine_name` is the name of the scripting engine.
 *
 */
int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name);
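The `engine_name` given at registration is matched against the engine named in the script's shebang header (for example `#!lua name=mylib`, as in the `FUNCTION LOAD` session later in this thread). A minimal sketch of that lookup, assuming the name is simply the text between `#!` and the first whitespace character. `parse_engine_name` and `engine_matches` are hypothetical helpers, not Valkey's actual parser, and the exact, case-sensitive comparison is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the engine name from a header such as
 * "#!lua name=mylib\n..." into out. Returns 0 on success, or -1 if the code
 * has no shebang, the name is empty, or the name does not fit in out. */
int parse_engine_name(const char *code, char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    if (code == NULL || strncmp(code, "#!", 2) != 0) return -1;
    const char *start = code + 2;
    /* The engine name ends at the first space, tab, CR or LF. */
    size_t len = strcspn(start, " \t\r\n");
    if (len == 0 || len >= out_size) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical check: does this script target the engine registered under
 * engine_name? (Exact, case-sensitive match in this sketch.) */
int engine_matches(const char *code, const char *engine_name) {
    char name[64];
    if (parse_engine_name(code, name, sizeof(name)) != 0) return 0;
    return strcmp(name, engine_name) == 0;
}
```

With this sketch, a library starting with `#!lua name=mylib` would be routed to the engine registered as `lua`, and code without a shebang would match no engine.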

@rjd15372
Contributor Author

rjd15372 commented Nov 8, 2024

I'm opening this PR in draft mode because I still need to fix test failures and add new tests as well.


codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 79.09091% with 46 lines in your changes missing coverage. Please review.

Project coverage is 70.79%. Comparing base (e024b4b) to head (3cfea1c).
Report is 9 commits behind head on unstable.

Files with missing lines   Patch %   Lines
src/functions.c            76.27%    28 Missing ⚠️
src/module.c               10.52%    17 Missing ⚠️
src/function_lua.c         98.57%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1277      +/-   ##
============================================
- Coverage     70.82%   70.79%   -0.03%     
============================================
  Files           119      119              
  Lines         64691    64884     +193     
============================================
+ Hits          45818    45937     +119     
- Misses        18873    18947      +74     
Files with missing lines   Coverage Δ
src/script_lua.c           90.31% <100.00%> (ø)
src/util.c                 71.74% <100.00%> (+0.42%) ⬆️
src/function_lua.c         98.89% <98.57%> (-0.30%) ⬇️
src/module.c               9.62% <10.52%> (+<0.01%) ⬆️
src/functions.c            91.98% <76.27%> (-3.56%) ⬇️

... and 23 files with indirect coverage changes

@rjd15372 rjd15372 marked this pull request as ready for review November 8, 2024 14:52
@rjd15372 rjd15372 force-pushed the engine-api-1261 branch 2 times, most recently from 4a2e223 to 1a808b2 November 8, 2024 16:13
@rjd15372
Contributor Author

rjd15372 commented Nov 8, 2024

I just realized I still need to implement the MODULE UNLOAD support. The existing code was not prepared to allow the deletion of a scripting engine.
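Supporting `MODULE UNLOAD` means the server has to drop everything an engine owns before the engine goes away. A minimal sketch of one ingredient, assuming the server keeps a flat list of compiled functions tagged with the engine that produced them: each function of the departing engine is handed back through that engine's `free_function` callback, since only the engine knows how the handle was allocated. `EngineSketch`, `FunctionEntrySketch`, `drop_engine_functions` and `counting_free` are hypothetical names, not the PR's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the server's bookkeeping. */
typedef void (*FreeFunctionSketch)(void *engine_ctx, void *compiled_function);

typedef struct EngineSketch {
    const char *name;
    void *engine_ctx;                 /* engine-specific context */
    FreeFunctionSketch free_function; /* from the engine's methods table */
} EngineSketch;

typedef struct FunctionEntrySketch {
    EngineSketch *engine;    /* engine that compiled this function */
    void *compiled_function; /* opaque handle owned by that engine */
} FunctionEntrySketch;

/* Frees every function compiled by `engine` through its own free_function
 * callback and compacts the remaining entries in place.
 * Returns the number of entries left in `funcs`. */
size_t drop_engine_functions(FunctionEntrySketch *funcs, size_t n, const EngineSketch *engine) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (funcs[i].engine == engine) {
            engine->free_function(engine->engine_ctx, funcs[i].compiled_function);
        } else {
            funcs[kept++] = funcs[i];
        }
    }
    return kept;
}

/* Example free_function for trying the sketch: counts calls via engine_ctx. */
void counting_free(void *engine_ctx, void *compiled_function) {
    (void)compiled_function; /* nothing to release in this toy example */
    (*(int *)engine_ctx)++;
}
```

Only after this step would it be safe to remove the engine from the registry and let the module's code be unloaded.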

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Nov 11, 2024
@madolson
Member

The core team said we are directionally aligned with this; we still want someone to take a deeper look to make sure the details make sense.

@rjd15372 rjd15372 force-pushed the engine-api-1261 branch 5 times, most recently from fe56090 to 9d4b296 November 11, 2024 17:35
@rjd15372
Contributor Author

The PR is ready for review.

Contributor

@zuiderkwast zuiderkwast left a comment


This looks like great progress. Is it ready for a proper review or are you going to change much more?

Shall we do EVAL as a follow-up PR?

Each engine has only one context which is used for all calls, right? Would it be possible for an engine to use a context (or sub-context, etc.) for each client? This idea came up in the LuaJIT discussion by @secwall in #1229. A separate Lua context per client would isolate each client from each other, which compensates for the fact that Lua isn't completely sandboxed by design. Maybe the engine can organize this by itself, if the engine just has access to the current client at the time of a FCALL or EVAL?

tests/modules/helloscripting.c (outdated)
src/functions.c (outdated)
tests/unit/moduleapi/scriptingengine.tcl
src/valkeymodule.h (outdated)
@rjd15372
Contributor Author

> This looks like great progress. Is it ready for a proper review or are you going to change much more?

It's ready. I don't think I'll add more changes apart from the changes asked by the reviewers.

> Shall we do EVAL as a follow-up PR?

Yes, that's the plan.

> Each engine has only one context which is used for all calls, right? Would it be possible for an engine to use a context (or sub-context, etc.) for each client? This idea came up in the LuaJIT discussion by @secwall in #1229. A separate Lua context per client would isolate each client from each other, which compensates for the fact that Lua isn't completely sandboxed by design. Maybe the engine can organize this by itself, if the engine just has access to the current client at the time of a FCALL or EVAL?

I'll take a look if it's possible, and will open a follow-up PR with the changes.

This commit extends the module API to support the addition of different
scripting engines to run user defined functions.

The scripting engine can be implemented as a Valkey module, and can be
dynamically loaded with the `loadmodule` config directive, or with
the `MODULE LOAD` command.

With the current module API support, scripting engines can only be loaded
to run functions via the `FCALL` command.

In a follow-up PR, we will move the Lua scripting engine implementation
into its own module.

Signed-off-by: Ricardo Dias <[email protected]>
This commit adds a module with a very simple stack-based scripting
language implementation to test the new module API that allows
implementing new scripting engines as modules.

Signed-off-by: Ricardo Dias <[email protected]>
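To give a feel for how small such a test engine's core can be, here is a toy stack-based evaluator. Its instruction set (`PUSH <int>`, `ADD`, `MUL`, `RETURN`, one per line) is invented for this sketch and is not the language implemented in `tests/modules/helloscripting.c`; `run_stack_program` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

/* Runs a toy program with one instruction per line: "PUSH <int>", "ADD",
 * "MUL" or "RETURN". On RETURN, stores the top of the stack in *result and
 * returns 0. Returns -1 on an unknown instruction, stack underflow/overflow,
 * or if the program ends without RETURN. */
int run_stack_program(const char *program, long *result) {
    long stack[STACK_MAX];
    size_t sp = 0; /* number of values on the stack */
    const char *p = program;

    while (*p != '\0') {
        size_t len = strcspn(p, "\n"); /* length of the current line */
        if (len >= 5 && strncmp(p, "PUSH ", 5) == 0) {
            char *end;
            long v = strtol(p + 5, &end, 10);
            /* Reject a missing operand or one spilling onto the next line. */
            if (end == p + 5 || end > p + len || sp == STACK_MAX) return -1;
            stack[sp++] = v;
        } else if (len == 3 && strncmp(p, "ADD", 3) == 0) {
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] += stack[sp];
        } else if (len == 3 && strncmp(p, "MUL", 3) == 0) {
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
        } else if (len == 6 && strncmp(p, "RETURN", 6) == 0) {
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        } else if (len != 0) {
            return -1; /* unknown instruction (empty lines are skipped) */
        }
        p += len;
        if (*p == '\n') p++;
    }
    return -1; /* reached the end without RETURN */
}
```

For example, `"PUSH 2\nPUSH 3\nADD\nPUSH 4\nMUL\nRETURN\n"` evaluates to 20. In a real engine module, a loop like this would sit behind the `call_function` callback.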
@zuiderkwast
Contributor

Can you avoid force-pushing? It's easier to review what's changed since last time if you just push small commits. You can even merge unstable to it, rather than rebase. We squash-merge PRs anyway in the end and we use the PR title and description as the commit message.

@rjd15372
Contributor Author

> Can you avoid force-pushing? It's easier to review what's changed since last time if you just push small commits. You can even merge unstable to it, rather than rebase. We squash-merge PRs anyway in the end and we use the PR title and description as the commit message.

Ah sorry. I wasn't aware that we were squashing the commits upon merge. I'll stop force-pushing.

@rjd15372
Contributor Author

Today I added more test cases to test the error code paths.

@rjd15372
Contributor Author

rjd15372 commented Dec 19, 2024

@zuiderkwast @PingXie I've addressed all comments and pushed the commits with the changes.

Besides the comments, I've also added a new module API function in commit 3c60980.

The new module API function is used by the scripting engine implementation to return the result of the execution of a compiled function. This allows a clear separation between the scripting engine implementation and the client object that called FCALL (or, in the future, EVAL).

@zuiderkwast
Contributor

> I've also added a new module API function in commit 3c60980
>
> The new module API function is used by the scripting engine implementation to return the result of the execution of a compiled function. This allows a clear separation between the scripting engine implementation and the client object that called FCALL (or, in the future, EVAL).

This is basically a preparation for calling scripts from modules, right?

It's a good idea, but it exposes even more API. If it's not usable from the module API yet, then it's perhaps better to hide it from valkeymodule.h so we don't commit to this exact API just yet...?

For example, I see you named a field long double d128 but I'm not sure a long double is always 128 bits. I think it can be 80 or 96 bits too. All these field names are stuck in the module API once we add it.
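The width of `long double` is implementation-defined, which is why a name like `d128` would bake in a promise the C standard does not make. For instance, GCC on x86-64 Linux uses the x87 80-bit extended format padded to 16 bytes, 32-bit x86 Linux pads the same format to 12 bytes (96 bits), MSVC makes `long double` identical to `double`, and AArch64 Linux uses 128-bit IEEE quad precision. A small program to check a given toolchain (`describe_long_double` is just an illustrative name):

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Print how this compiler/ABI represents long double. sizeof gives the
 * storage size (including any padding); LDBL_MANT_DIG gives the number of
 * mantissa bits, which identifies the actual format (53 = same as double,
 * 64 = x87 extended, 113 = IEEE quad). */
void describe_long_double(void) {
    printf("sizeof(long double) = %zu bytes (%zu bits of storage)\n",
           sizeof(long double), sizeof(long double) * 8);
    printf("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG);
}
```

The only portable guarantee is that `long double` is at least as precise as `double`.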

@rjd15372
Contributor Author

> This is basically a preparation for calling scripts from modules, right?

> It's a good idea, but it exposes even more API. If it's not usable from the module API yet, then it's perhaps better to hide it from valkeymodule.h so we don't commit to this exact API just yet...?

> For example, I see you named a field long double d128 but I'm not sure a long double is always 128 bits. I think it can be 80 or 96 bits too. All these field names are stuck in the module API once we add it.

Actually, I think I misunderstood how the module API is implemented. I was under the impression that if the scripting engine wanted to call another command using ValkeyModule_Call, it would have to pass a ValkeyModuleCtx object that already held a fake client connection. That is not the case: all API functions that need an internal connection through a fake client either create a temporary one or reuse an existing one from a pool.
Other API functions don't use the client pointer from the context object at all.

Therefore, I'm going to revert this change.

…xtElement callback should be implemented"

This reverts commit 4a2cb12.

Signed-off-by: Ricardo Dias <[email protected]>
…function execution"

This reverts commit 3c60980.

Signed-off-by: Ricardo Dias <[email protected]>
@rjd15372
Contributor Author

Revert is done. I think now everything is addressed.

Comment on lines +820 to +826
typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
    ValkeyModuleCtx *module_ctx,
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    const char *code,
    size_t timeout,
    size_t *out_num_compiled_functions,
    char **err);
Member

@PingXie PingXie Dec 20, 2024


Thinking about this some more, I'd like to propose returning an integer error code instead:

Suggested change
-typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
+typedef int (*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
     ValkeyModuleCtx *module_ctx,
     ValkeyModuleScriptingEngineCtx *engine_ctx,
     const char *code,
     size_t timeout,
     size_t *out_num_compiled_functions,
-    char **err);
+    ValkeyModuleScriptingEngineCompiledFunction **compiled_functions);

This would

  1. avoid the assumption that a scripting module allocates these strings using zmalloc, which can't be enforced by the compiler
  2. eliminate the "char * to sds" conversion
  3. help ensure consistency of the error strings across different scripting engines.
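A minimal sketch of the status-code style proposed above, assuming the engine hands back a newly allocated array of compiled-function pointers through an out parameter (which takes one more level of indirection than the array type itself). The status codes, `CompiledFunctionSketch` and `create_functions_library_sketch` are hypothetical; the real struct's fields are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes; the proposal does not fix specific values. */
enum { ENGINE_OK = 0, ENGINE_ERR_EMPTY_CODE = 1, ENGINE_ERR_OOM = 2 };

/* Stand-in for ValkeyModuleScriptingEngineCompiledFunction. */
typedef struct CompiledFunctionSketch {
    const char *name;
} CompiledFunctionSketch;

/* Status-code style: the return value reports success or failure, and the
 * results come back through out parameters. Handing back a newly allocated
 * array of pointers requires a pointer to that array, hence `***`. */
int create_functions_library_sketch(const char *code,
                                    size_t *out_num_compiled_functions,
                                    CompiledFunctionSketch ***out_functions) {
    *out_num_compiled_functions = 0;
    *out_functions = NULL;
    if (code == NULL || code[0] == '\0') return ENGINE_ERR_EMPTY_CODE;

    /* Pretend every library compiles to exactly one function, "foo". */
    CompiledFunctionSketch **arr = malloc(sizeof(*arr));
    if (arr == NULL) return ENGINE_ERR_OOM;
    arr[0] = malloc(sizeof(**arr));
    if (arr[0] == NULL) {
        free(arr);
        return ENGINE_ERR_OOM;
    }
    arr[0]->name = "foo";

    *out_functions = arr;
    *out_num_compiled_functions = 1;
    return ENGINE_OK;
}
```

The trade-off, raised in the replies, is that a bare status code cannot carry the engine's own diagnostic, such as the location of a syntax error.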

Contributor Author


I like the suggestion, but I see one problem with 3: how would the scripting engine communicate back to the client the kind of error that made this function fail?

For instance, most of the time this function might fail because of a syntax error in the script, and the scripting engine can give a helpful message specifying where the syntax error is located; with only an error code, that message would not be sent to the client.

We might log the error in the server log, but I assume the client of valkey might not have access to the server logs.

Member


Interesting. Do you have an example? How does the built-in Lua interpreter handle syntax errors? I wonder if we could generalize an outgoing line number and an outgoing column number. Or would we like to provide more detailed context, like a real compiler?

Contributor Author


Currently, when there's a syntax or runtime error, we send the error message returned by Lua:

127.0.0.1:6379> function load replace  "#!lua name=mylib\nlocal function foo()\nretrn 1\nend\nserver.register_function('foo', bar)\n"
(error) ERR Error compiling function: user_function:3: '=' expected near '1'

127.0.0.1:6379> function load replace  "#!lua name=mylib\nlocal function foo()\nreturn 1\nend\nserver.register_function('foo', bar)\n"
(error) ERR Error registering functions: ERR user_function:5: Script attempted to access nonexistent global variable 'bar'

I think the more generic approach is to allow the scripting engine to return a C string with the error message.
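A minimal sketch of the C-string approach, assuming the engine allocates the message with an allocator agreed with the server (plain `malloc` stands in for `ValkeyModule_Alloc` here, in line with the documentation note further down) and the server wraps it in the reply format shown in the session above before freeing it. `engine_make_error` and `server_format_compile_error` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Engine side (hypothetical): build a "<chunk>:<line>: <detail>" message on
 * the heap. malloc stands in for ValkeyModule_Alloc in this sketch. */
char *engine_make_error(const char *chunk, int line, const char *detail) {
    int n = snprintf(NULL, 0, "%s:%d: %s", chunk, line, detail);
    if (n < 0) return NULL;
    char *err = malloc((size_t)n + 1);
    if (err == NULL) return NULL;
    snprintf(err, (size_t)n + 1, "%s:%d: %s", chunk, line, detail);
    return err;
}

/* Server side (hypothetical): prefix the engine's message for the client
 * reply, then release the engine's buffer. The free must match the
 * allocator the engine used, which is why the allocator has to be agreed on. */
char *server_format_compile_error(char *engine_err) {
    const char *prefix = "ERR Error compiling function: ";
    const char *msg = engine_err != NULL ? engine_err : "unknown error";
    size_t len = strlen(prefix) + strlen(msg) + 1;
    char *reply = malloc(len);
    if (reply != NULL) snprintf(reply, len, "%s%s", prefix, msg);
    free(engine_err);
    return reply;
}
```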

Contributor


@PingXie How about we get this PR merged and we can still change the API a bit more before 8.1 is released? Ricardo has the follow-up PR for EVAL waiting to be added. It will slightly modify the API already.

Contributor

@zuiderkwast zuiderkwast Dec 20, 2024


@rjd15372 We should document this in some way for the module API docs, for example that the module needs to allocate the err using ValkeyModule_Alloc (a wrapper for zmalloc).

The comments before each VM_ function definition get extracted to produce this page: https://valkey.io/topics/modules-api-ref/

We also put comments for structs in valkeymodule.h, but it's not perfect for documentation. For example, Rust module authors don't directly use this file.

We can consider creating a new documentation topic page for engines later.

Member


> How about we get this PR merged and we can still change the API a bit more before 8.1 is released?

Agreed.

Member


> I think the more generic approach is to allow the scripting engine to return a C string with the error message.

Yeah, if we would like to provide detailed compilation errors then we need to keep this capability, but I would suggest robj over C strings here too.

src/valkeymodule.h (outdated)
src/valkeymodule.h (outdated)
Member

@PingXie PingXie left a comment


Great job 🎉 Thanks @rjd15372!

@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes major-decision-approved Major decision approved by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. to-be-merged Almost ready to merge and removed major-decision-pending Major decision pending by TSC team labels Dec 21, 2024
@zuiderkwast zuiderkwast merged commit 6adef8e into valkey-io:unstable Dec 21, 2024
47 of 48 checks passed
@zuiderkwast zuiderkwast removed the to-be-merged Almost ready to merge label Dec 21, 2024

/* This struct is used to return the memory information of the scripting
 * engine. */
typedef struct ValkeyModuleScriptingEngineMemoryInfo {
Member


This info struct needs to be versioned. See ValkeyModuleClientInfo.

Contributor Author

@rjd15372 rjd15372 Dec 23, 2024


I don't think it requires versioning. The version number in ValkeyModuleScriptingEngineMethodsV1 should be enough. If we need to change the structure of ValkeyModuleScriptingEngineMemoryInfo, we can bump the version of VALKEYMODULE_SCRIPTING_ENGINE_ABI_VERSION.

We're doing the same for the ValkeyModuleScriptingEngineCompiledFunction struct.
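A sketch of how a single version field at the top of the methods struct can gate everything behind it, assuming `VALKEYMODULE_SCRIPTING_ENGINE_ABI_VERSION` is 1 (the value is an assumption) and that the server refuses tables newer than it understands. The callback types are collapsed to one generic function-pointer type, and `EngineMethodsV1Sketch` and `check_engine_methods` (including which callbacks it treats as mandatory) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for this sketch; the real constant lives in valkeymodule.h. */
#define VALKEYMODULE_SCRIPTING_ENGINE_ABI_VERSION 1

/* Generic stand-in for the individual callback typedefs. */
typedef void (*EngineCallbackSketch)(void);

/* Simplified stand-in for ValkeyModuleScriptingEngineMethodsV1. */
typedef struct EngineMethodsV1Sketch {
    uint64_t version; /* must stay the first field */
    EngineCallbackSketch create_functions_library;
    EngineCallbackSketch call_function;
    EngineCallbackSketch get_used_memory;
    EngineCallbackSketch get_function_memory_overhead;
    EngineCallbackSketch get_engine_memory_overhead;
    EngineCallbackSketch free_function;
} EngineMethodsV1Sketch;

/* Hypothetical server-side check run at registration time.
 * Returns 0 if the table is usable, -1 otherwise. */
int check_engine_methods(const EngineMethodsV1Sketch *m) {
    if (m == NULL) return -1;
    /* A table from a newer header may carry fields this server cannot read. */
    if (m->version == 0 || m->version > VALKEYMODULE_SCRIPTING_ENGINE_ABI_VERSION) return -1;
    /* Assumed minimum for FCALL to work: compile and call. */
    if (m->create_functions_library == NULL || m->call_function == NULL) return -1;
    return 0;
}
```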

Member


Sometimes we might just need to add a new field to the struct. Bumping the ABI version would be more disruptive and might be breaking. If we version the struct, we can just add the new field to the end of the struct. A module that only understands the V1 struct would still be able to parse the returned info (all the way to the end of the V1 field list).
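The append-at-the-end argument can be made concrete. In this sketch (struct names and fields are hypothetical), V2 only adds a field after the V1 fields, so a consumer compiled against V1 still finds every field it knows at the same offset; the C11 `_Static_assert`s check that at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical V1 layout of a memory-info struct. */
typedef struct MemoryInfoV1Sketch {
    uint64_t version;
    size_t used_memory;
    size_t engine_memory_overhead;
} MemoryInfoV1Sketch;

/* Hypothetical V2: identical prefix, one field appended at the end. */
typedef struct MemoryInfoV2Sketch {
    uint64_t version;
    size_t used_memory;
    size_t engine_memory_overhead;
    size_t new_field;
} MemoryInfoV2Sketch;

/* Compile-time checks that V2 only extends V1. */
_Static_assert(offsetof(MemoryInfoV2Sketch, used_memory) ==
                   offsetof(MemoryInfoV1Sketch, used_memory),
               "V1 prefix moved");
_Static_assert(offsetof(MemoryInfoV2Sketch, engine_memory_overhead) ==
                   offsetof(MemoryInfoV1Sketch, engine_memory_overhead),
               "V1 prefix moved");

/* A consumer built against V1: copies only the V1-sized prefix, so it works
 * whether `info` points at a V1 or a V2 struct. */
size_t v1_total_memory(const void *info) {
    MemoryInfoV1Sketch v1;
    memcpy(&v1, info, sizeof(v1));
    return v1.used_memory + v1.engine_memory_overhead;
}
```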

Contributor


> Sometimes we might just need to add a new field to the struct. Bumping the ABI version would be more disruptive and might be breaking. If we version the struct, we can just add the new field to the end of the struct. A module that only understands the V1 struct would still be able to parse the returned info (all the way to the end of the V1 field list).

Yes, and we already version the struct ValkeyModuleScriptingEngineMethodsV1, which is the entry point for all scripting engine structs. If we add a field in any of these structs, we bump the version in the main struct ValkeyModuleScriptingEngineMethods.

@PingXie is there any benefit in versioning every single struct? And what do you mean by "bumping the ABI version would be more disruptive"? All we're talking about here is just versioned structs...

    ValkeyModuleScriptingEngineCtx *engine_ctx,
    void *compiled_function);

typedef ValkeyModuleScriptingEngineMemoryInfo (*ValkeyModuleScriptingEngineGetMemoryInfoFunc)(
Member


The info buffer needs to be allocated by the caller, with its size provided, for future-proofing. See ValkeyModule_GetClientInfoById.
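A minimal sketch of that caller-allocates pattern, modeled loosely on how `ValkeyModule_GetClientInfoById` expects the caller to pass a struct whose `version` is already set: the callee reads the version and writes only the fields that version defines, so it never writes past the caller's buffer. All names and the placeholder numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical versioned layouts; the caller owns the memory. */
typedef struct EngineMemoryInfoV1Sketch {
    uint64_t version; /* set by the caller before the call */
    size_t used_memory;
    size_t engine_memory_overhead;
} EngineMemoryInfoV1Sketch;

typedef struct EngineMemoryInfoV2Sketch {
    uint64_t version;
    size_t used_memory;
    size_t engine_memory_overhead;
    size_t peak_memory; /* field added in V2 */
} EngineMemoryInfoV2Sketch;

/* Hypothetical callee: inspects the caller's version and fills only the
 * fields that version defines. Returns 0 on success, -1 for an unknown
 * version or a NULL buffer. */
int get_memory_info_sketch(void *info) {
    uint64_t version;
    if (info == NULL) return -1;
    memcpy(&version, info, sizeof(version));
    if (version == 1) {
        EngineMemoryInfoV1Sketch *v1 = info;
        v1->used_memory = 4096; /* placeholder numbers */
        v1->engine_memory_overhead = 128;
        return 0;
    }
    if (version == 2) {
        EngineMemoryInfoV2Sketch *v2 = info;
        v2->used_memory = 4096;
        v2->engine_memory_overhead = 128;
        v2->peak_memory = 8192;
        return 0;
    }
    return -1;
}
```

An older caller that only knows V1 passes a V1-sized buffer with `version = 1` and keeps working after the callee learns about V2.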


Labels
major-decision-approved Major decision approved by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes
4 participants