SOLR-16505: Switch UpdateShardHandler.getRecoveryOnlyHttpClient to Jetty HTTP2 #2276

Merged
Commits (39)
c2e578e
SOLR-16505: Switch UpdateShardHandler.getRecoveryOnlyHttpClient to Je…
iamsanjay Feb 18, 2024
d5bc8c4
Merge main
iamsanjay Feb 20, 2024
3572dec
SOLR-16505: Switch UpdateShardHandler.getRecoveryOnlyHttpClient to Je…
iamsanjay Feb 20, 2024
9588fdf
Merge main
iamsanjay Feb 27, 2024
0655f11
Using FutureTask to send PREPRECOVERY, without executor
iamsanjay Feb 27, 2024
e7c346f
Merge main
iamsanjay Feb 29, 2024
0611782
Null check for FutureTask, removed try-catch
iamsanjay Feb 29, 2024
91f3c9d
code format, added test case
iamsanjay Mar 1, 2024
6a0b54b
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay Mar 3, 2024
1c6798b
Remove comment, create method for cancel recovery
iamsanjay Mar 3, 2024
5a62766
Merge main
iamsanjay Mar 7, 2024
ae368b5
Update IndexFetcher Class to Use Http2SolrClient
iamsanjay Mar 7, 2024
9e9b5f7
Adding header for compression to SolrRequests
iamsanjay Mar 7, 2024
8deebcd
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay Mar 12, 2024
625a364
Enable testing replication handler for externalCompression
iamsanjay Mar 12, 2024
901ef51
Renaming method to more appropriate name
iamsanjay Mar 12, 2024
d574b42
Merge main
iamsanjay Mar 13, 2024
cc4011b
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay Mar 13, 2024
26a4c0e
Resolve conflicts Http2SolrClient
iamsanjay Mar 13, 2024
de5a40c
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay Mar 18, 2024
43dda16
Restoring the old auth of IndexFetcher
iamsanjay Mar 18, 2024
b48c0b9
Merge main
iamsanjay Mar 27, 2024
851109f
Fix retry fetch() IndexFetcher
iamsanjay Mar 27, 2024
6af3d76
Merge main
iamsanjay Mar 30, 2024
73c5ba8
Avoid closing InputStream before receiving zero-length Data field
iamsanjay Mar 30, 2024
4c16404
Read till end-of-file
iamsanjay Mar 30, 2024
19ec489
read till end-of-file
iamsanjay Mar 30, 2024
917509f
Merge main
iamsanjay Apr 18, 2024
c54cd5b
Added Test case for User managed replication with basic auth enabled
iamsanjay Apr 18, 2024
bb1c3b3
Removed isContentDownloaded and updated listener factory setting mech…
iamsanjay Apr 22, 2024
38f532a
Merge main
iamsanjay Apr 22, 2024
377eaa3
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay Apr 30, 2024
1b9b7fc
Change return code when downloaded successfully
iamsanjay Apr 30, 2024
fdb1d1f
Merge branch 'main' into SOLR-16367_getRecoveryOnlyHttpClient_to_Jett…
iamsanjay May 8, 2024
ef92b6b
tidy code
iamsanjay May 8, 2024
6279656
Update basic-authentication-plugin.adoc (#2446)
gspgsp May 8, 2024
b789a71
group operators together (#2450)
epugh May 8, 2024
f39e8ba
SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas (#…
gerlowskija May 8, 2024
6bde352
CHANGES.txt
dsmiley May 8, 2024
2 changes: 2 additions & 0 deletions solr/CHANGES.txt
@@ -140,6 +140,8 @@ Other Changes
* SOLR-17066: GenericSolrRequest now has a `setRequiresCollection` setter that allows it to specify whether
it should make use of the client-level default collection/core. (Jason Gerlowski)

* SOLR-16505: Switch internal replica recovery commands to Jetty HTTP2 (Sanjay Dutt, David Smiley)

================== 9.5.0 ==================
New Features
---------------------
38 changes: 13 additions & 25 deletions solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
@@ -24,15 +24,14 @@
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient.HttpUriRequestResponse;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.CoreAdminRequest.WaitForState;
import org.apache.solr.client.solrj.request.UpdateRequest;
@@ -124,7 +123,7 @@ public static interface RecoveryListener {
private int retries;
private boolean recoveringAfterStartup;
private CoreContainer cc;
private volatile HttpUriRequest prevSendPreRecoveryHttpUriRequest;
private volatile FutureTask<NamedList<Object>> prevSendPreRecoveryHttpUriRequest;
private final Replica.Type replicaType;

private CoreDescriptor coreDescriptor;
@@ -175,24 +174,19 @@ public final void setRecoveringAfterStartup(boolean recoveringAfterStartup) {
this.recoveringAfterStartup = recoveringAfterStartup;
}

/** Builds a new HttpSolrClient for use in recovery. Caller must close */
private HttpSolrClient.Builder recoverySolrClientBuilder(String baseUrl, String leaderCoreName) {
// workaround for SOLR-13605: get the configured timeouts & set them directly
// (even though getRecoveryOnlyHttpClient() already has them set)
private Http2SolrClient.Builder recoverySolrClientBuilder(String baseUrl, String leaderCoreName) {
final UpdateShardHandlerConfig cfg = cc.getConfig().getUpdateShardHandlerConfig();
return (new HttpSolrClient.Builder(baseUrl)
return new Http2SolrClient.Builder(baseUrl)
.withDefaultCollection(leaderCoreName)
.withConnectionTimeout(cfg.getDistributedConnectionTimeout(), TimeUnit.MILLISECONDS)
.withSocketTimeout(cfg.getDistributedSocketTimeout(), TimeUnit.MILLISECONDS)
.withHttpClient(cc.getUpdateShardHandler().getRecoveryOnlyHttpClient()));
.withHttpClient(cc.getUpdateShardHandler().getRecoveryOnlyHttpClient());
}

// make sure any threads stop retrying
@Override
public final void close() {
close = true;
if (prevSendPreRecoveryHttpUriRequest != null) {
prevSendPreRecoveryHttpUriRequest.abort();
prevSendPreRecoveryHttpUriRequest.cancel(true);
}
log.warn("Stopping recovery for core=[{}] coreNodeName=[{}]", coreName, coreZkNodeName);
}
@@ -634,10 +628,8 @@ public final void doSyncOrReplicateRecovery(SolrCore core) throws Exception {
.getCollection(cloudDesc.getCollectionName())
.getSlice(cloudDesc.getShardId());

try {
prevSendPreRecoveryHttpUriRequest.abort();
} catch (NullPointerException e) {
// okay
if (prevSendPreRecoveryHttpUriRequest != null) {
prevSendPreRecoveryHttpUriRequest.cancel(true);
}

if (isClosed()) {
@@ -894,7 +886,6 @@ public final boolean isClosed() {

private final void sendPrepRecoveryCmd(String leaderBaseUrl, String leaderCoreName, Slice slice)
throws SolrServerException, IOException, InterruptedException, ExecutionException {

WaitForState prepCmd = new WaitForState();
prepCmd.setCoreName(leaderCoreName);
prepCmd.setNodeName(zkController.getNodeName());
@@ -915,18 +906,15 @@ private final void sendPrepRecoveryCmd(String leaderBaseUrl, String leaderCoreName, Slice slice)
int readTimeout =
conflictWaitMs
+ Integer.parseInt(System.getProperty("prepRecoveryReadTimeoutExtraWait", "8000"));
try (HttpSolrClient client =
try (SolrClient client =
recoverySolrClientBuilder(
leaderBaseUrl,
null) // leader core omitted since client only used for 'admin' request
.withSocketTimeout(readTimeout, TimeUnit.MILLISECONDS)
.withIdleTimeout(readTimeout, TimeUnit.MILLISECONDS)
.build()) {
HttpUriRequestResponse mrr = client.httpUriRequest(prepCmd);
prevSendPreRecoveryHttpUriRequest = mrr.httpUriRequest;

prevSendPreRecoveryHttpUriRequest = new FutureTask<>(() -> client.request(prepCmd));
log.info("Sending prep recovery command to [{}]; [{}]", leaderBaseUrl, prepCmd);

mrr.future.get();
prevSendPreRecoveryHttpUriRequest.run();
}
}
}
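
For context, the RecoveryStrategy change above replaces HttpUriRequest.abort() with a FutureTask that wraps the blocking SolrJ call, so close() can cancel (interrupt) an in-flight PrepRecovery request. A minimal sketch of that pattern follows; the class and method names are illustrative, not actual Solr code:

    import java.util.concurrent.FutureTask;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.common.util.NamedList;

    class CancellablePrepRecoverySketch {
      private volatile FutureTask<NamedList<Object>> inFlight;

      void send(SolrClient client, SolrRequest<?> prepCmd) {
        // Wrap the blocking request in a FutureTask; created but not yet started.
        inFlight = new FutureTask<>(() -> client.request(prepCmd));
        inFlight.run(); // executes on the calling thread, no executor needed
      }

      void close() {
        FutureTask<NamedList<Object>> task = inFlight;
        if (task != null) {
          // Interrupts a running request, or prevents an unstarted one from running.
          task.cancel(true);
        }
      }
    }
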
27 changes: 17 additions & 10 deletions solr/core/src/java/org/apache/solr/update/UpdateShardHandler.java
@@ -75,7 +75,7 @@ public class UpdateShardHandler implements SolrInfoBean {

private final Http2SolrClient updateOnlyClient;

private final CloseableHttpClient recoveryOnlyClient;
private final Http2SolrClient recoveryOnlyClient;

private final CloseableHttpClient defaultClient;

@@ -85,7 +85,7 @@ public class UpdateShardHandler implements SolrInfoBean {

private final InstrumentedHttpRequestExecutor httpRequestExecutor;

private final InstrumentedHttpListenerFactory updateHttpListenerFactory;
private final InstrumentedHttpListenerFactory trackHttpSolrMetrics;

private SolrMetricsContext solrMetricsContext;

@@ -120,10 +120,8 @@ public UpdateShardHandler(UpdateShardHandlerConfig cfg) {
log.debug("Created default UpdateShardHandler HTTP client with params: {}", clientParams);

httpRequestExecutor = new InstrumentedHttpRequestExecutor(getMetricNameStrategy(cfg));
updateHttpListenerFactory = new InstrumentedHttpListenerFactory(getNameStrategy(cfg));
recoveryOnlyClient =
HttpClientUtil.createClient(
clientParams, recoveryOnlyConnectionManager, false, httpRequestExecutor);
trackHttpSolrMetrics = new InstrumentedHttpListenerFactory(getNameStrategy(cfg));

defaultClient =
HttpClientUtil.createClient(
clientParams, defaultConnectionManager, false, httpRequestExecutor);
@@ -133,15 +131,24 @@ public UpdateShardHandler(UpdateShardHandlerConfig cfg) {
DistributedUpdateProcessor.DISTRIB_FROM,
DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM);
Http2SolrClient.Builder updateOnlyClientBuilder = new Http2SolrClient.Builder();
Http2SolrClient.Builder recoveryOnlyClientBuilder = new Http2SolrClient.Builder();
Reviewer comment (Contributor): Come to think of it, updateOnly & recoveryOnly clients are configured the same except for withTheseParamNamesInTheUrl, but it's benign to have that shared. Ultimately, the point I think is for both clients to be separated so that saturation in one (particularly updates) doesn't block the other (recovery). Hopefully I can get Mark Miller to weigh in.

Reviewer comment (Contributor): @markrmiller do you recall why there are separate clients for "updateOnly" vs "recoveryOnly"?
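
As a rough illustration of the separation described in that comment (the accessor names come from this PR; the call site itself is hypothetical):

    // Each accessor returns an independently pooled Http2SolrClient, so update traffic
    // saturating one connection pool cannot starve recovery requests on the other.
    UpdateShardHandler handler = coreContainer.getUpdateShardHandler();
    Http2SolrClient updateClient = handler.getUpdateOnlyHttpClient();     // distributed updates
    Http2SolrClient recoveryClient = handler.getRecoveryOnlyHttpClient(); // replica recovery only
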

if (cfg != null) {
updateOnlyClientBuilder
.withConnectionTimeout(cfg.getDistributedConnectionTimeout(), TimeUnit.MILLISECONDS)
.withIdleTimeout(cfg.getDistributedSocketTimeout(), TimeUnit.MILLISECONDS)
.withMaxConnectionsPerHost(cfg.getMaxUpdateConnectionsPerHost());
recoveryOnlyClientBuilder
.withConnectionTimeout(cfg.getDistributedConnectionTimeout(), TimeUnit.MILLISECONDS)
.withIdleTimeout(cfg.getDistributedSocketTimeout(), TimeUnit.MILLISECONDS)
.withMaxConnectionsPerHost(cfg.getMaxUpdateConnectionsPerHost());
}

updateOnlyClientBuilder.withTheseParamNamesInTheUrl(urlParamNames);
updateOnlyClient = updateOnlyClientBuilder.build();
updateOnlyClient.addListenerFactory(updateHttpListenerFactory);
updateOnlyClient.addListenerFactory(trackHttpSolrMetrics);

recoveryOnlyClient = recoveryOnlyClientBuilder.build();
recoveryOnlyClient.addListenerFactory(trackHttpSolrMetrics);

ThreadFactory recoveryThreadFactory = new SolrNamedThreadFactory("recoveryExecutor");
if (cfg != null && cfg.getMaxRecoveryThreads() > 0) {
@@ -205,7 +212,7 @@ public String getName() {
public void initializeMetrics(SolrMetricsContext parentContext, String scope) {
solrMetricsContext = parentContext.getChildContext(this);
String expandedScope = SolrMetricManager.mkName(scope, getCategory().name());
updateHttpListenerFactory.initializeMetrics(solrMetricsContext, expandedScope);
trackHttpSolrMetrics.initializeMetrics(solrMetricsContext, expandedScope);
defaultConnectionManager.initializeMetrics(solrMetricsContext, expandedScope);
updateExecutor =
MetricUtils.instrumentedExecutorService(
@@ -247,7 +254,7 @@ public Http2SolrClient getUpdateOnlyHttpClient() {
}

// don't introduce a bug, this client is for recovery ops only!
public HttpClient getRecoveryOnlyHttpClient() {
public Http2SolrClient getRecoveryOnlyHttpClient() {
return recoveryOnlyClient;
}

@@ -290,7 +297,7 @@ public void close() {
// do nothing
}
IOUtils.closeQuietly(updateOnlyClient);
HttpClientUtil.close(recoveryOnlyClient);
IOUtils.closeQuietly(recoveryOnlyClient);
HttpClientUtil.close(defaultClient);
defaultConnectionManager.close();
recoveryOnlyConnectionManager.close();
135 changes: 135 additions & 0 deletions org/apache/solr/cloud/RecoveryStrategyStressTest.java (new file)
@@ -0,0 +1,135 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.cloud;

import com.carrotsearch.randomizedtesting.annotations.Nightly;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudLegacySolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.util.SolrNamedThreadFactory;
import org.apache.solr.embedded.JettySolrRunner;
import org.junit.BeforeClass;
import org.junit.Test;

@Nightly
public class RecoveryStrategyStressTest extends SolrCloudTestCase {

@BeforeClass
public static void setupCluster() throws Exception {
cluster = configureCluster(4).addConfig("conf", configset("cloud-minimal")).configure();
}

@Test
public void stressTestRecovery() throws Exception {
final String collection = "recoveryStressTest";
CollectionAdminRequest.createCollection(collection, "conf", 1, 4)
.process(cluster.getSolrClient());
waitForState(
"Expected a collection with one shard and two replicas", collection, clusterShape(1, 4));
final var scheduledExecutorService =
Executors.newScheduledThreadPool(1, new SolrNamedThreadFactory("stressTestRecovery"));
try (SolrClient solrClient =
cluster.basicSolrClientBuilder().withDefaultCollection(collection).build()) {
final StoppableIndexingThread indexThread =
new StoppableIndexingThread(null, solrClient, "1", true, 10, 1, true);

final var startAndStopCount = new CountDownLatch(50);
final Thread startAndStopRandomReplicas =
new Thread(
() -> {
try {
while (startAndStopCount.getCount() > 0) {
DocCollection state = getCollectionState(collection);
Replica leader = state.getLeader("shard1");
Replica replica =
getRandomReplica(state.getSlice("shard1"), (r) -> !leader.equals(r));

JettySolrRunner jetty = cluster.getReplicaJetty(replica);
jetty.stop();
Thread.sleep(100);
jetty.start();
startAndStopCount.countDown();
}
} catch (Exception e) {
throw new RuntimeException(e);
}
});
startAndStopRandomReplicas.start();
// index docs and commit at a fixed 10-second interval
scheduledExecutorService.scheduleWithFixedDelay(
indexThread, 1000, 10000, TimeUnit.MILLISECONDS);
scheduledExecutorService.scheduleWithFixedDelay(
() -> {
try {
new UpdateRequest().commit(solrClient, collection);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (SolrServerException e) {
throw new RuntimeException(e);
}
},
100,
10000,
TimeUnit.MILLISECONDS);

startAndStopCount.await();
scheduledExecutorService.shutdownNow();
// final commit to make documents visible for replicas
new UpdateRequest().commit(solrClient, collection);
}
cluster.getZkStateReader().waitForState(collection, 120, TimeUnit.SECONDS, clusterShape(1, 4));

// test that leader and replica have same doc count
DocCollection state = getCollectionState(collection);
assertShardConsistency(state.getSlice("shard1"), true);
}

private void assertShardConsistency(Slice shard, boolean expectDocs) throws Exception {
List<Replica> replicas = shard.getReplicas(r -> r.getState() == Replica.State.ACTIVE);
long[] numCounts = new long[replicas.size()];
int i = 0;
for (Replica replica : replicas) {
try (var client =
new HttpSolrClient.Builder(replica.getBaseUrl())
.withDefaultCollection(replica.getCoreName())
.withHttpClient(((CloudLegacySolrClient) cluster.getSolrClient()).getHttpClient())
.build()) {
numCounts[i] =
client.query(new SolrQuery("*:*").add("distrib", "false")).getResults().getNumFound();
i++;
}
}
for (int j = 1; j < replicas.size(); j++) {
if (numCounts[j] != numCounts[j - 1])
fail("Mismatch in counts between replicas"); // TODO improve this!
if (numCounts[j] == 0 && expectDocs)
fail("Expected docs on shard " + shard.getName() + " but found none");
}
}
}
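
Note: because of the @Nightly annotation this stress test is skipped in normal runs; with the Lucene/Solr randomizedtesting setup it is typically enabled by passing -Dtests.nightly=true to the Gradle test task, e.g. something like ./gradlew -p solr/core test --tests RecoveryStrategyStressTest -Dtests.nightly=true (the exact invocation may vary).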