On startup aerospike client get operations take much more time than usually #135

Closed
Aloren opened this issue May 7, 2019 · 9 comments

@Aloren
Contributor

Aloren commented May 7, 2019

We observe increased latency on aerospike-client read operations after service startup. Latency is usually around 2 ms, but after startup it rises to as much as 100 ms. As I understand it, this might be a connection pool issue, since each request needs a new connection to Aerospike. Until the pool reaches its average size, latency is higher than expected. We are thinking about warming up the pool. Would it make sense to have such an option in aerospike-client (if that is applicable)? What are your thoughts?

Thanks.

[Attached screenshot: Screen Shot 2019-05-07 at 3 43 17 PM]

@Aloren
Contributor Author

Aloren commented May 7, 2019

Looks like it is related to #123

@BrianNichols
Member

One way to warm up the pool is to issue sync or async read commands. In addition to creating connections, this loads and initializes the applicable Java code, which is helpful with large libraries like Netty that create a large number of classes. I think issuing these read commands is better done outside the client, because the user has a better understanding of what to read and whether to use async and/or sync reads.

We are considering adding new ClientPolicy minConnsPerNodeSync and minConnsPerNodeAsync arguments that would pre-allocate connections and never drop connections below the minimum, but this does not address the warmup of the java code path.
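
A minimal sketch of the application-side warm-up suggested above, assuming the caller supplies the read as a `Runnable`. The class name `PoolWarmup` and the `warmUp` method are hypothetical and not part of the Aerospike client. The reads are released at the same moment on several threads, because a pool typically only opens additional connections when commands overlap:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the Aerospike client API.
final class PoolWarmup {
    /**
     * Runs `read` on `concurrency` threads at once, `rounds` times per thread.
     * Returns the number of reads that completed without throwing.
     */
    static int warmUp(Runnable read, int concurrency, int rounds) {
        ExecutorService executor = Executors.newFixedThreadPool(concurrency);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger succeeded = new AtomicInteger();
        try {
            for (int t = 0; t < concurrency; t++) {
                executor.execute(() -> {
                    try {
                        start.await(); // hold every thread until all are queued
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int r = 0; r < rounds; r++) {
                        try {
                            read.run();
                            succeeded.incrementAndGet();
                        } catch (RuntimeException e) {
                            // A failed warm-up read is ignored; it is only a best-effort step.
                        }
                    }
                });
            }
        } finally {
            start.countDown();   // release all threads together
            executor.shutdown(); // accept no new tasks, let queued ones finish
        }
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }
}
```

In an application, `read` would wrap a sync read of a key that is known to exist, and `concurrency` would be chosen close to the number of connections expected under normal load. Both choices depend on the workload, which is the reason given above for keeping this logic outside the client.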

@BrianNichols
Member

We have decided on an alternate approach already used in our Go client. A warmup method will be added to AerospikeClient. warmup(ConnectionType type, int count) will initialize "count" connections on each node and put those connections into each node's connection pool. ConnectionType indicates sync or async. These connections will still be subject to removal if they are idle for more than "ClientPolicy.maxSocketIdle". warmup can be called anytime after AerospikeClient instantiation.

We are currently busy with other projects and will implement when time becomes available.

@mrozk

mrozk commented Mar 11, 2021

Hi! I think there is a problem with minConnsPerNode in a LIFO connection pool. Assume we set minConnsPerNode=25, so we have 25 connections in the LIFO pool, each with the default TTL of 55 seconds. Load is light at startup, so we keep reusing the top 5 connections in the pool and leave the other 20 unused for a long time. Those 20 connections therefore become invalid. When a traffic spike arrives, instead of warmed-up connections the pool holds invalid ones, and the Aerospike client executes the following code:

    if (conn != null) {
        // Found socket.
        // Verify that socket is active.
        if (cluster.isConnCurrentTran(conn.getLastUsed())) {
            try {
                conn.setTimeout(timeoutMillis);
                return conn;
            }
            catch (Exception e) {
                // Set timeout failed. Something is probably wrong with timeout
                // value itself, so don't empty queue retrying.  Just get out.
                closeConnection(conn);
                throw new AerospikeException.Connection(e);
            }
        }
        closeConnection(conn);
    }

The client closes the invalid connections and then has to create new ones to execute our requests. I think we need some kind of asynchronous health check for the connection pool when a minimum connection count is configured.
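
The scenario described in this comment can be reproduced with a small simulation. This is a sketch under stated assumptions, not the client's actual pool: `LifoPoolDemo`, `Conn`, and `spikeReconnects` are hypothetical names, time is a simulated millisecond counter, and `acquire` is assumed to keep discarding expired connections until it finds a fresh one or the pool is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a LIFO pool with idle expiry; not Aerospike client code.
final class LifoPoolDemo {
    /** A pooled connection, reduced to the time it was last used. */
    static final class Conn {
        long lastUsedMillis;
        Conn(long now) { lastUsedMillis = now; }
    }

    private final Deque<Conn> stack = new ArrayDeque<>();
    private final long maxIdleMillis;
    int created = 0; // connections opened so far, including the initial ones

    LifoPoolDemo(int minConns, long maxIdleMillis, long now) {
        this.maxIdleMillis = maxIdleMillis;
        for (int i = 0; i < minConns; i++) {
            stack.push(new Conn(now));
            created++;
        }
    }

    Conn acquire(long now) {
        Conn conn;
        while ((conn = stack.poll()) != null) { // head = most recently released
            if (now - conn.lastUsedMillis <= maxIdleMillis) {
                return conn;
            }
            // Idle for too long: discard it and try the next one.
        }
        created++; // pool exhausted, open a new connection
        return new Conn(now);
    }

    void release(Conn conn, long now) {
        conn.lastUsedMillis = now;
        stack.push(conn); // LIFO: goes back on top
    }

    /** Returns how many new connections a spike of minConns requests has to open. */
    static int spikeReconnects(int minConns, int steadyConns, long maxIdleMillis, int steadySeconds) {
        long now = 0;
        LifoPoolDemo pool = new LifoPoolDemo(minConns, maxIdleMillis, now);
        List<Conn> inUse = new ArrayList<>();
        for (int s = 0; s < steadySeconds; s++) { // light load, once per second
            now += 1000;
            for (int i = 0; i < steadyConns; i++) inUse.add(pool.acquire(now));
            for (Conn c : inUse) pool.release(c, now);
            inUse.clear();
        }
        now += 1000;
        int before = pool.created;
        for (int i = 0; i < minConns; i++) pool.acquire(now); // spike: all held at once
        return pool.created - before;
    }
}
```

With the numbers from the comment, `LifoPoolDemo.spikeReconnects(25, 5, 55_000L, 120)` returns 20: only the 5 connections on top of the stack stay fresh, and the other 20 have to be reopened during the spike.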

@BrianNichols
Member

If client minConnsPerNode > 0, it's highly recommended that client maxSocketIdle and server proto-fd-idle-ms be set to zero. This will prevent valid connections from being discarded due to expiration. The javadocs explicitly mention this:

https://www.aerospike.com/apidocs/java/com/aerospike/client/policy/ClientPolicy.html#minConnsPerNode

The server employs TCP keep-alive, so it can still detect and reap peer-closed sockets without an expiration.

@yarosman

yarosman commented Oct 6, 2022

Hello.
@BrianNichols therefore the current solution is

    clientPolicy.asyncMinConnsPerNode = clientPolicy.maxConnsPerNode
    clientPolicy.maxSocketIdle        = 0

Am I right?

@BrianNichols
Member

Yes. Also, make sure proto-fd-idle-ms is 0 on the server nodes.

@mrozk

mrozk commented Oct 12, 2022

@BrianNichols Hello. It looks like minConnsPerNode does not work properly for us, because it is sometimes not possible to set proto-fd-idle-ms to 0 on the cluster side, and in that case we can't handle spikes.
I have an idea for how we could fix this. The connection pool is currently implemented as LIFO. If we configure minConnsPerNode=25 and normally use only 5 connections, the other 20 in the pool stop working because they expire. When a spike arrives, the application starts recreating connections and its response time degrades. With a LILO (that is, FIFO) connection pool we could handle spikes, but we would have another problem: how to close unused connections after the spike is over. For example, with minConnsPerNode=25 in a LILO pool, everything works fine while we use 25 connections. After a spike the pool grows to 50 connections and never shrinks back to 25. What if we combine the two approaches? When minConnsPerNode is configured, the pool could use two data structures: the first 25 connections are stored in a LILO structure, and any connections beyond 25 are stored in a LIFO structure, which can shrink. The Aerospike Java client would then be able to handle spikes without touching cluster configuration.
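
The two-structure idea could look roughly like the sketch below. All names (`HybridPoolDemo`, `Conn`, `trimOverflow`, `run`) are hypothetical, time is simulated, and the choice to replace an expired core connection on acquire is an assumption made for the example rather than something proposed in the comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a FIFO core plus LIFO overflow pool; not Aerospike client code.
final class HybridPoolDemo {
    static final class Conn {
        final boolean core; // true for the first minConns connections
        long lastUsedMillis;
        Conn(boolean core, long now) { this.core = core; this.lastUsedMillis = now; }
    }

    private final Deque<Conn> coreQueue = new ArrayDeque<>();     // FIFO: rotates through all core connections
    private final Deque<Conn> overflowStack = new ArrayDeque<>(); // LIFO: extras go idle and can be trimmed
    private final long maxIdleMillis;
    int created = 0;

    HybridPoolDemo(int minConns, long maxIdleMillis, long now) {
        this.maxIdleMillis = maxIdleMillis;
        for (int i = 0; i < minConns; i++) coreQueue.addLast(open(true, now));
    }

    private Conn open(boolean core, long now) { created++; return new Conn(core, now); }
    private boolean expired(Conn c, long now) { return now - c.lastUsedMillis > maxIdleMillis; }

    Conn acquire(long now) {
        Conn c = coreQueue.pollFirst(); // least recently released core connection
        if (c != null) {
            // Assumption: an expired core connection is replaced so the core set keeps its size.
            return expired(c, now) ? open(true, now) : c;
        }
        while ((c = overflowStack.pollFirst()) != null) {
            if (!expired(c, now)) return c;
        }
        return open(false, now); // both structures empty: grow beyond the minimum
    }

    void release(Conn c, long now) {
        c.lastUsedMillis = now;
        if (c.core) coreQueue.addLast(c); else overflowStack.addFirst(c);
    }

    /** Closes expired overflow connections, oldest first. Returns how many were closed. */
    int trimOverflow(long now) {
        int closed = 0;
        while (!overflowStack.isEmpty() && expired(overflowStack.peekLast(), now)) {
            overflowStack.pollLast();
            closed++;
        }
        return closed;
    }

    private long steady(long now, int conns, int seconds) {
        List<Conn> inUse = new ArrayList<>();
        for (int s = 0; s < seconds; s++) {
            now += 1000;
            for (int i = 0; i < conns; i++) inUse.add(acquire(now));
            for (Conn c : inUse) release(c, now);
            inUse.clear();
        }
        return now;
    }

    /** Returns {connections opened during the spike, overflow connections closed afterwards}. */
    static int[] run(int minConns, int steadyConns, int spikeConns, long maxIdleMillis, int steadySeconds) {
        HybridPoolDemo pool = new HybridPoolDemo(minConns, maxIdleMillis, 0);
        long now = pool.steady(0, steadyConns, steadySeconds) + 1000;
        int before = pool.created;
        List<Conn> held = new ArrayList<>();
        for (int i = 0; i < spikeConns; i++) held.add(pool.acquire(now));
        int opened = pool.created - before;
        for (Conn c : held) pool.release(c, now);
        now = pool.steady(now, steadyConns, steadySeconds);
        return new int[] { opened, pool.trimOverflow(now) };
    }
}
```

In this model, `run(25, 5, 25, 55_000L, 120)` opens no new connections during the spike, because the FIFO rotation touches every core connection every 5 seconds. `run(25, 5, 50, 55_000L, 120)` opens 25 overflow connections during the spike and closes all 25 once they have been idle longer than the limit, so the pool shrinks back to its minimum.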

@BrianNichols
Member

Why is setting proto-fd-idle-ms to 0 not possible?

The proto-fd-idle-ms default has been 0 since at least server version 4.9.
