Received server error (500) from primary and could not load the entire response body from endpoint #1331

zinebtabet · 2023-07-10T12:16:36Z

[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body. See https://eu-west-3.console.aws.amazon.com/cloudwatch/home?region=eu-west-3#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2023-07-10-09-51-02-299 in account 086892845792 for more information.

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 442, in lambda_handler
    pred_prob = invoke_endpoint_with_idx(endpointname = ENDPOINT_NAME, target_id = transaction_id, subgraph_dict = subgraph_dict, n_feats = transaction_embed_value_dict)
  File "/var/task/lambda_function.py", line 314, in invoke_endpoint_with_idx
    response = runtime.invoke_endpoint(EndpointName=endpointname,
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 911, in _make_api_call
    raise error_class(parsed_response, operation_name)

Please help: I got this error while running the code at https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/tree/main/src/sagemaker.
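The error message points at the endpoint's CloudWatch log group, which is where the serving container's own exception would typically be logged. A minimal sketch for pulling those logs, assuming boto3 with logs:FilterLogEvents permission and that the endpoint uses the default `/aws/sagemaker/Endpoints/<endpoint-name>` group named in the message; `endpoint_log_group` and `endpoint_logs_since` are hypothetical helper names:

```python
import time

# Assumption: SageMaker endpoints log to this default group, matching the
# group named in the ModelError message above.
LOG_GROUP_PREFIX = "/aws/sagemaker/Endpoints/"


def endpoint_log_group(endpoint_name: str) -> str:
    """Return the default CloudWatch Logs group name for a SageMaker endpoint."""
    if not endpoint_name:
        raise ValueError("endpoint_name must be non-empty")
    return LOG_GROUP_PREFIX + endpoint_name


def endpoint_logs_since(endpoint_name: str, region: str,
                        lookback_minutes: int = 30, limit: int = 100) -> list:
    """Return up to `limit` log messages from the last `lookback_minutes`.

    Events come back oldest first; only the first page of results is read.
    """
    import boto3  # lazy import so endpoint_log_group() works without boto3

    logs = boto3.client("logs", region_name=region)
    start_ms = int((time.time() - lookback_minutes * 60) * 1000)  # epoch ms
    resp = logs.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"] for event in resp.get("events", [])]


if __name__ == "__main__":
    print(endpoint_log_group("pytorch-inference-2023-07-10-09-51-02-299"))
```

Calling `endpoint_logs_since("pytorch-inference-2023-07-10-09-51-02-299", "eu-west-3")` shortly after a failed invocation should surface the container-side traceback, if one was logged.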

zxkane · 2023-07-11T01:19:51Z

@zinebtabet Could you add detailed steps to reproduce how you are using the code?

zinebtabet · 2023-07-11T07:25:56Z

I used the same code as in your repository's SageMaker folder. The only thing I modified was the Dockerfile, since I am in eu-west-3. I set it up like this:

ARG IMAGE_REPO=763104351884.dkr.ecr.eu-west-3.amazonaws.com
FROM $IMAGE_REPO/pytorch-training:1.11.0-cpu-py38-ubuntu20.04-sagemaker
ENV PATH="/opt/ml/model:${PATH}"
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/model
COPY * /opt/ml/code/
ENV SAGEMAKER_PROGRAM fd_sl_train_entry_point.py
RUN pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html
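The only regional change in this Dockerfile is the ECR registry host in `IMAGE_REPO`. A small sketch of how that base-image URI is assembled, so the same tag can be checked for another region; it assumes the AWS Deep Learning Containers account 763104351884 used above, which serves many but not all regions, and `pytorch_training_dlc_uri` is a hypothetical name. The SageMaker Python SDK's `sagemaker.image_uris.retrieve` is the supported way to look these URIs up.

```python
def pytorch_training_dlc_uri(region: str,
                             version: str = "1.11.0",
                             device: str = "cpu",
                             py_version: str = "py38",
                             os_tag: str = "ubuntu20.04") -> str:
    """Build a PyTorch training DLC image URI for `region`.

    Assumption: the image is hosted in DLC account 763104351884, as in the
    Dockerfile above; some regions use a different account.
    """
    registry = f"763104351884.dkr.ecr.{region}.amazonaws.com"
    tag = f"{version}-{device}-{py_version}-{os_tag}-sagemaker"
    return f"{registry}/pytorch-training:{tag}"


if __name__ == "__main__":
    # Reproduces the FROM line of the Dockerfile above for eu-west-3.
    print(pytorch_training_dlc_uri("eu-west-3"))
```

For eu-west-3 this prints 763104351884.dkr.ecr.eu-west-3.amazonaws.com/pytorch-training:1.11.0-cpu-py38-ubuntu20.04-sagemaker, the same image as the FROM line.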

I used the same version I specified in the Dockerfile for the deployment as well. Then I invoked my endpoint from the Lambda function, and when I ran the test event I received this error. I have had this error for over a month now. Here is the error:

The error from the Lambda test event:

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 442, in lambda_handler
    pred_prob = invoke_endpoint_with_idx(endpointname = ENDPOINT_NAME, target_id = transaction_id, subgraph_dict = subgraph_dict, n_feats = transaction_embed_value_dict)
  File "/var/task/lambda_function.py", line 314, in invoke_endpoint_with_idx
    response = runtime.invoke_endpoint(EndpointName=endpointname,
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 911, in _make_api_call
    raise error_class(parsed_response, operation_name)
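The Lambda traceback only shows that botocore raised ModelError; the container-side details live in the parsed error response. A minimal sketch of a wrapper around `invoke_endpoint` that prints those details before re-raising. It assumes the SageMaker Runtime ModelError fields `OriginalStatusCode`, `OriginalMessage` and `LogStreamArn` appear at the top level of botocore's parsed error dict (read through the `.response` attribute that botocore's `ClientError` carries), and that the payload is JSON; `invoke_with_diagnostics` and `describe_model_error` are hypothetical names, not functions from the repository.

```python
def describe_model_error(response: dict) -> str:
    """Summarize a parsed SageMaker Runtime ModelError response dict."""
    error = response.get("Error", {})
    parts = [f"{error.get('Code', 'Unknown')}: {error.get('Message', '')}"]
    # Assumption: these ModelError fields sit at the top level of the dict.
    if "OriginalStatusCode" in response:
        parts.append(f"container status={response['OriginalStatusCode']}")
    if response.get("OriginalMessage"):
        parts.append(f"container message={response['OriginalMessage']}")
    if response.get("LogStreamArn"):
        parts.append(f"log stream={response['LogStreamArn']}")
    return " | ".join(parts)


def invoke_with_diagnostics(runtime, endpoint_name: str, body: bytes,
                            content_type: str = "application/json") -> bytes:
    """Call invoke_endpoint and print ModelError details before re-raising.

    `runtime` is expected to be boto3.client("sagemaker-runtime").
    """
    try:
        resp = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=content_type,  # assumption: JSON payload
            Body=body,
        )
        return resp["Body"].read()
    except Exception as err:
        # botocore's ClientError exposes the parsed error dict as err.response.
        response = getattr(err, "response", None)
        if isinstance(response, dict) and response.get("Error", {}).get("Code") == "ModelError":
            print(describe_model_error(response))
        raise
```

Checking the error by attribute rather than importing `ClientError` keeps the helper importable without botocore; inside Lambda, catching `botocore.exceptions.ClientError` directly would work just as well.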

aminHelkinz · 2023-07-28T15:13:22Z

Hello @zxkane,

Thank you for this great project; I have learned a lot from it.

We used a SageMaker notebook and SageMaker Studio to reproduce the project. The models were created and repackaged successfully, and their endpoints worked well. Then suddenly, in the middle of a demo, the endpoint stopped responding.

Right now, only one of our models still has a working endpoint: the one trained and repackaged with the SageMaker notebook.

Since then, none of our other endpoints (for models created with either the notebook or Studio) have worked.

I would appreciate any help or suggestions!
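When endpoints that used to work stop responding, a first check is whether SageMaker still reports them as InService, which separates an endpoint that failed or was deleted from one that is up but whose model code returns errors. A minimal sketch, assuming boto3 and sagemaker:ListEndpoints permission; `non_inservice_endpoints`, `list_all_endpoints` and the sample endpoint names are hypothetical:

```python
def non_inservice_endpoints(endpoints: list) -> list:
    """Return (name, status) for every endpoint that is not InService."""
    return [
        (e["EndpointName"], e.get("EndpointStatus"))
        for e in endpoints
        if e.get("EndpointStatus") != "InService"
    ]


def list_all_endpoints(region: str) -> list:
    """Return endpoint summaries for the region, following pagination."""
    import boto3  # lazy import so the pure helper above works without boto3

    sm = boto3.client("sagemaker", region_name=region)
    endpoints = []
    for page in sm.get_paginator("list_endpoints").paginate():
        endpoints.extend(page["Endpoints"])
    return endpoints


if __name__ == "__main__":
    sample = [
        {"EndpointName": "fraud-a", "EndpointStatus": "InService"},
        {"EndpointName": "fraud-b", "EndpointStatus": "Failed"},
    ]
    print(non_inservice_endpoints(sample))  # [('fraud-b', 'Failed')]
```

An endpoint that is InService but still returns the 500 error points back at the model container rather than at the endpoint infrastructure.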

zxkane · 2023-08-02T03:01:47Z

@zhjwy9343 is the data scientist who authored those notebooks. James, could you have a look at these problems?

zinebtabet added the labels "bug" (Something isn't working) and "needs-triage" (Triage required) on Jul 10, 2023.
