AmazonKendraRetriever #28739
Replies: 1 comment 7 replies
-
I found a similar discussion that might be relevant to your issue: AmazonKendraRetriever, which is currently unsolved [1]. To ensure that all top 10 documents retrieved from Amazon Kendra are used by the LLM to summarize the question, you should verify that the retriever and the document-combining chain are configured correctly. Here's a brief overview of how:
Ensure that your To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
Beta Was this translation helpful? Give feedback.
-
from langchain_community.retrievers import AmazonKendraRetriever
import orjson
import boto3
from langchain.chains import RetrievalQA ,ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import OpenAI
from langchain_openai import AzureChatOpenAI
import json
from openai import AzureOpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import HumanMessage
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
import os
from new_promts import prompt1
# In-memory per-session chat histories, keyed by session id
# (populated lazily by get_session_history; not persisted across cold starts).
store = {}
def lambda_handler(event, context):
    """AWS Lambda entry point: answer ``event['query']`` with a
    conversational RAG chain backed by Amazon Kendra and Azure OpenAI.

    Required environment variables: REGION, KENDRA_INDEX, azure_endpoint,
    openai_api_key.

    Args:
        event: Lambda event dict; reads ``query`` (the user question) and
            ``sessionId`` (key for the per-session chat history).
        context: Lambda context object (logged for debugging only).

    Returns:
        The LLM's answer string.
    """
    region = os.environ["REGION"]
    kendra_index_id = os.environ["KENDRA_INDEX"]
    print("Event: ", event)
    print("Context: ", context)

    # --- Azure OpenAI configuration ---
    # Base URL for the Azure OpenAI resource, e.g. "https://<name>.openai.azure.com"
    azure_endpoint = os.environ["azure_endpoint"]
    # API version, e.g. "2023-07-01-preview"
    openai_api_version = "2024-03-01-preview"
    # Name of the Azure OpenAI chat-model deployment, e.g. "gpt-35-turbo-0613"
    deployment_name = "gpt-4o"
    # API key for the Azure OpenAI resource.
    openai_api_key = os.environ["openai_api_key"]
    openai_api_type = "azure"

    llm = AzureChatOpenAI(
        azure_endpoint=azure_endpoint,
        openai_api_version=openai_api_version,
        deployment_name=deployment_name,
        openai_api_key=openai_api_key,
        openai_api_type=openai_api_type,
    )

    # Restrict retrieval to English documents from a single data source.
    # NOTE(review): consider moving this id into an environment variable.
    data_source_id = "1bf6a354-7aef-471e-b5b9-9ce733e9d085"
    # BUG FIX: the original paste left the attribute_filter dict and the
    # retriever call unclosed (syntax error); brackets restored here.
    retriever = AmazonKendraRetriever(
        index_id=kendra_index_id,
        top_k=10,  # fetch up to 10 documents per query
        attribute_filter={
            "AndAllFilters": [
                {"EqualsTo": {"Key": "_language_code", "Value": {"StringValue": "en"}}},
                {"EqualsTo": {"Key": "_data_source_id", "Value": {"StringValue": data_source_id}}},
            ]
        },
    )

    # Rewrite the latest user question into a standalone question so the
    # retriever does not need the chat history to understand it.
    contextualize_q_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is."""
    contextualize_q_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", contextualize_q_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )

    # Answering prompt: project system prompt + history + current question.
    qa_system_prompt = prompt1
    qa_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", qa_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )
    # "stuff" chain: every retrieved document is inserted into the prompt.
    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    query = event.get("query", "SELECT 1")
    sessionId = event.get("sessionId", "SELECT 1")
    question = query

    # Wrap the chain with per-session message history, keyed by sessionId.
    conversational_rag_chain = RunnableWithMessageHistory(
        rag_chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    ai_msg_1 = conversational_rag_chain.invoke(
        {"input": question},
        config={"configurable": {"session_id": sessionId}},
    )["answer"]
    print("ai_msg_1 :", ai_msg_1)
    print("********", store[sessionId].messages)

    # Debug: extract and show the documents the retriever returns for this
    # question (independent second retrieval; not the chain's own context).
    print(retriever)
    retrieved_docs = retriever.invoke(question)
    print("Fetched Documents:")
    for i, doc in enumerate(retrieved_docs):
        print(f"Document {i + 1}:")
        print(f"Page Content: {doc.page_content}")
        print(f"Metadata: {doc.metadata}")
    return ai_msg_1
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the chat history for *session_id*, creating one on first use.

    Histories live in the module-level ``store`` dict, so they persist for
    the lifetime of the process.
    """
    try:
        return store[session_id]
    except KeyError:
        history = ChatMessageHistory()
        store[session_id] = history
        return history
Can you please help me pass all the retrieved documents to the LLM?
Beta Was this translation helpful? Give feedback.
All reactions