Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About ping command routing in validateConnection with Redis Cluster #3081

Closed
zchazc opened this issue Dec 30, 2024 · 3 comments
Closed

About ping command routing in validateConnection with Redis Cluster #3081

zchazc opened this issue Dec 30, 2024 · 3 comments
Labels
for: stackoverflow A question that's better suited to stackoverflow.com

Comments

@zchazc
Copy link

zchazc commented Dec 30, 2024

Environments:

  • spring-data-redis 2.2.0.RELEASE
  • lettuce 5.2.0.RELEASE (with validateConnection and autoTopologyRefresh enabled - periodic and all triggers)
  • redis 4.0.14 (3 masters, 3 slaves)

Issue:

When using RedisTemplate.execute(conn->conn.incrBy(key,delt)), I observed that some microservices PING a slave node before sending the INCR command to the correct master node.

Through debugging, I found the following logic: The PING command uses the defaultWriter because it has no parameters, unlike the INCR command which is routed to a specific master node.

    return dispatch(commandBuilder.ping());

https://github.com/redis/lettuce/blob/55ebef8b5c095869aecf8cea8eb9eb5da585badf/src/main/java/io/lettuce/core/AbstractRedisAsyncCommands.java#L999

// exclude CLIENT commands from cluster routing
        if (args != null && !CommandType.CLIENT.equals(commandToSend.getType())) {

https://github.com/redis/lettuce/blob/55ebef8b5c095869aecf8cea8eb9eb5da585badf/src/main/java/io/lettuce/core/cluster/ClusterDistributionChannelWriter.java#L114-L115

  • 6.5.1.RELEASE
    return dispatch(commandBuilder.ping());

https://github.com/redis/lettuce/blob/b396f621cbc6e88cd50e9e117addcaecf040abca/src/main/java/io/lettuce/core/AbstractRedisAsyncCommands.java#L1902

// exclude CLIENT commands from cluster routing
        if (args != null && !CommandType.CLIENT.equals(commandToSend.getType())) {

https://github.com/redis/lettuce/blob/b396f621cbc6e88cd50e9e117addcaecf040abca/src/main/java/io/lettuce/core/cluster/ClusterDistributionChannelWriter.java#L172-L173

I suspect this behavior is due to the routing logic shown above.
I've been searched Google for "spring-data-redis validateConnection ping" and "spring-data-redis validateConnection ping slave", and also searched for "validateConnection" in the issues, but found no relevant information.

Questions

  • Is my analysis of the routing mechanism correct?
  • Is it intended that validateConnection sends PING command through defaultWriter instead of using the same connection as the subsequent command?

I'm currently diving deeper into the source code. Any insights or confirmation would be greatly appreciated.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Dec 30, 2024
@zchazc
Copy link
Author

zchazc commented Jan 3, 2025

After further read of related issues, I found this comment #1470 (comment) :

Mark Paluch commented

To your questions (these are Lettuce-specific):

  1. You can provide [ClusterTopologyRefreshOptions](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.html) via [{{ClusterClientOptions](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterClientOptions.html)}}%7D%7D) and LettuceClientConfiguration
  2. No, setValidateConnection is only intended for standalone connections. If your Redis Cluster is down, then connection validation won't really help. If you reconfigure your cluster, Lettuce takes care of this itself if topology refresh is enabled.
  3. Enable Adaptive and Periodic Refresh. Ideally, you provide an endpoint in your application to get hold of RedisClusterClient and call reloadPartitions() during such an event. Lettuce's reference docs will provide you with additional details.
  4. Yes, Lettuce does all the command routing

This led me to review the setValidateConnectionAPI document :

Enables validation of the shared native Lettuce connection on calls to getConnection(). A new connection will be created and used if validation fails.
Lettuce will automatically reconnect until close is called, which should never happen through LettuceConnection if a shared native connection is used, therefore the default is false.
Setting this to true will result in a round-trip call to the server on each new connection, so this setting should only be used if connection sharing is enabled and there is code that is actively closing the native Lettuce connection.

While the API documentation doesn't explicitly state "only intended for standalone",but my application doesn't actively close connections, so it seems I can safely disable validateConnection.

The validateConnection was enabled by another committer. Since its literal meaning didn't raise any uncommon, so I hadn't checked the API documentation before.
Additionally, I'm curious: Was this feature really designed only for standalone Redis?

@mp911de
Copy link
Member

mp911de commented Jan 7, 2025

validateConnection has been introduced to fail early if Redis isn't reachable eagerly. Without further context, it is impossible to assume what is meant with "Redis is not reachable" in a clustered environment. Also, with the validation, we do not get any access to the later operations that are being invoked so we cannot assume which node is going to be accessed. The PING command uses the default (the first connected node) node for its inspection.

@mp911de mp911de closed this as not planned Won't fix, can't repro, duplicate, stale Jan 7, 2025
@mp911de mp911de added for: stackoverflow A question that's better suited to stackoverflow.com and removed status: waiting-for-triage An issue we've not yet triaged labels Jan 7, 2025
@zchazc
Copy link
Author

zchazc commented Jan 7, 2025

validateConnection has been introduced to fail early if Redis isn't reachable eagerly. Without further context, it is impossible to assume what is meant with "Redis is not reachable" in a clustered environment. Also, with the validation, we do not get any access to the later operations that are being invoked so we cannot assume which node is going to be accessed. The PING command uses the default (the first connected node) node for its inspection.

Thank you for taking the time to reply to this issue.

Initially, because validateConnection appears to PING a different Redis node than the one used for subsequent command calls in a Redis cluster scenario, So I chose to open this issue. However, if validateConnection is specifically designed for Redis clusters, then it may not be an issue after all.

Upon further investigation, I confirmed the default node selection logic for PING, which selects the Redis node with the fewest client connections, and in most cases, this is a Slave node. Therefore, in a Redis cluster context, it is indeed unnecessary to enable this feature, as it may continuously PING a Redis node that is irrelevant to subsequent commands, or even a node that is no longer part of the Redis cluster.

    private <K, V> CompletableFuture<StatefulRedisClusterConnection<K, V>> connectClusterAsync(RedisCodec<K, V> codec) {

        if (partitions == null) {
            return Futures.failed(new IllegalStateException(
                    "Partitions not initialized. Initialize via RedisClusterClient.getPartitions()."));
        }

        activateTopologyRefreshIfNeeded();

        logger.debug("connectCluster(" + initialUris + ")");
        //sortByClientCount
        Mono<SocketAddress> socketAddressSupplier = getSocketAddressSupplier(TopologyComparators::sortByClientCount);
       //defualt writer
        DefaultEndpoint endpoint = new DefaultEndpoint(clientOptions, clientResources);
        RedisChannelWriter writer = endpoint;

        if (CommandExpiryWriter.isSupported(clientOptions)) {
            writer = new CommandExpiryWriter(writer, clientOptions, clientResources);
        }

        ClusterDistributionChannelWriter clusterWriter = new ClusterDistributionChannelWriter(clientOptions, writer,
                clusterTopologyRefreshScheduler);
        PooledClusterConnectionProvider<K, V> pooledClusterConnectionProvider = new PooledClusterConnectionProvider<>(this,
                clusterWriter, codec, clusterTopologyRefreshScheduler);//defualt writer

        clusterWriter.setClusterConnectionProvider(pooledClusterConnectionProvider);

        StatefulRedisClusterConnectionImpl<K, V> connection = new StatefulRedisClusterConnectionImpl<>(clusterWriter, codec,
                timeout);

        connection.setReadFrom(ReadFrom.MASTER);
        connection.setPartitions(partitions);

        Supplier<CommandHandler> commandHandlerSupplier = () -> new CommandHandler(clientOptions, clientResources, endpoint);

        Mono<StatefulRedisClusterConnectionImpl<K, V>> connectionMono = Mono
                .defer(() -> connect(socketAddressSupplier, codec, endpoint, connection, commandHandlerSupplier));

        for (int i = 1; i < getConnectionAttempts(); i++) {
            connectionMono = connectionMono
                    .onErrorResume(t -> connect(socketAddressSupplier, codec, endpoint, connection, commandHandlerSupplier));
        }

        return connectionMono.flatMap(c -> c.reactive().command().collectList() //
                .map(CommandDetailParser::parse) //
                .doOnNext(detail -> c.setState(new RedisState(detail))) //
                .doOnError(e -> c.setState(new RedisState(Collections.emptyList()))).then(Mono.just(c))
                .onErrorResume(RedisCommandExecutionException.class, e -> Mono.just(c)))
                .doOnNext(
                        c -> connection.registerCloseables(closeableResources, clusterWriter, pooledClusterConnectionProvider))
                .map(it -> (StatefulRedisClusterConnection<K, V>) it).toFuture();
    }

https://github.com/redis/lettuce/blob/55ebef8b5c095869aecf8cea8eb9eb5da585badf/src/main/java/io/lettuce/core/cluster/RedisClusterClient.java#L558-L609

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: stackoverflow A question that's better suited to stackoverflow.com
Projects
None yet
Development

No branches or pull requests

3 participants