[Bug]: graphrag index creates 637 AsyncAzureOpenAI on gutenberg QuickStart #1517

Open · 3 tasks done
mmaitre314 opened this issue Dec 15, 2024 · 3 comments
Labels: awaiting_response, bug, stale, triage

mmaitre314 (Contributor)

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

GraphRAG v1.0.0 repeatedly calls fnllm.openai.create_openai_client() during indexing instead of reusing a single OpenAI client. Since fnllm creates a new DefaultAzureCredential for each create_openai_client() call (code), every call restarts the authentication flow and adds to indexing runtime.
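For illustration, a minimal sketch of the reuse pattern this report asks for, written directly against the azure-identity and openai packages (the helper name build_shared_client is hypothetical, not a GraphRAG or fnllm API): the credential and token provider are created once and shared by every client, so the Entra flow runs a single time.

# Hypothetical sketch of the expected reuse pattern, not GraphRAG/fnllm code.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AsyncAzureOpenAI

# One credential + token provider for the whole indexing run.
_credential = DefaultAzureCredential()
_token_provider = get_bearer_token_provider(
    _credential, "https://cognitiveservices.azure.com/.default"
)

def build_shared_client(endpoint: str, api_version: str) -> AsyncAzureOpenAI:
    """Build a client that reuses the shared Entra token provider."""
    return AsyncAzureOpenAI(
        azure_endpoint=endpoint,
        api_version=api_version,
        azure_ad_token_provider=_token_provider,
    )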

Steps to reproduce

Follow the GraphRAG quickstart (https://microsoft.github.io/graphrag/get_started/) up to the graphrag index --root ./ragtest step, using Entra authentication instead of an API key. Open indexing-engine.log and observe repeated log lines like this:

azure.identity._credentials.managed_identity INFO ManagedIdentityCredential will use IMDS
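To quantify the repetition (the 637 clients mentioned in the title), these log lines can simply be counted; a minimal sketch, assuming the log location shown below and the exact message text above:

# Count how many times the credential was re-created during indexing.
from pathlib import Path

log_path = Path("ragtest/logs/indexing-engine.log")  # assumed location, adjust to your layout
needle = "ManagedIdentityCredential will use IMDS"

count = sum(needle in line for line in log_path.read_text(encoding="utf-8").splitlines())
print(f"{needle!r} appeared {count} times")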

Expected Behavior

The OpenAI client is reused and only one Entra access token is acquired for authentication.

GraphRAG Config Used

encoding_model: o200k_base

llm:
  type: azure_openai_chat
  model: gpt-4o-mini
  model_supports_json: true
  api_base: https://<snip>.openai.azure.com
  api_version: 2024-08-01-preview
  deployment_name: gpt-4o-mini

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output\lancedb'
    container_name: default
    overwrite: true
  llm:
    type: azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://<snip>.openai.azure.com
    api_version: "2023-05-15"
    deployment_name: text-embedding-3-small

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "../../inputs/gutenberg"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # or blob
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "../../prompts/default/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "../../prompts/default/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "../../prompts/default/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "../../prompts/default/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "../../prompts/default/local_search_system_prompt.txt"

global_search:
  map_prompt: "../../prompts/default/global_search_map_system_prompt.txt"
  reduce_prompt: "../../prompts/default/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "../../prompts/default/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "../../prompts/default/drift_search_system_prompt.txt"

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 1.0.0
  • Operating System: Windows 11
  • Python Version: 3.12
  • Related Issues:
mmaitre314 added the bug and triage labels on Dec 15, 2024
mmaitre314 (Contributor, Author) commented Dec 15, 2024

I found a workaround by overriding the LLM loaders in graphrag.index.llm.load_llm.loaders to ensure the OpenAI clients get reused across queries:

import os
from pathlib import Path
from typing import Any
from fnllm import JsonStrategy, LLMEvents
from fnllm.openai import (
    AzureOpenAIConfig,
    create_openai_chat_llm,
    create_openai_client,
    create_openai_embeddings_llm,
)
from fnllm.openai.types.chat.parameters import OpenAIChatParameters
from graphrag.cli.index import index_cli
from graphrag.config.enums import LLMType
from graphrag.logger.types import LoggerType
from graphrag.index.llm.load_llm import loaders
from graphrag.index.typing import ErrorHandlerFn
import graphrag.config.defaults as defs
import tiktoken

def main():

    _initialize_llm_loader(
        LLMType.AzureOpenAIChat,
        model="gpt-4o-mini",
        model_supports_json=True,
        api_base="https://<snip>.openai.azure.com",
        api_version="2024-08-01-preview",
        deployment_name="gpt-4o-mini",
    )

    _initialize_llm_loader(
        LLMType.AzureOpenAIEmbedding,
        model="text-embedding-3-small",
        model_supports_json=False,
        api_base="https://<snip>.openai.azure.com",
        api_version="2023-05-15",
        deployment_name="text-embedding-3-small",
    )

    index_cli(
        root_dir=Path(os.path.dirname(__file__)),
        verbose=True,
        resume=None,
        memprofile=False,
        cache=True,
        logger=LoggerType.PRINT,
        config_filepath=None,
        dry_run=False,
        skip_validation=False,
        output_dir=None,
    )

def _initialize_llm_loader(
        type: LLMType,
        model: str,
        model_supports_json: bool,
        api_base: str,
        api_version: str,
        deployment_name: str,
        ) -> None:
    
    openai_config=AzureOpenAIConfig(
        model=model,
        encoding=tiktoken.encoding_name_for_model(model),
        deployment=deployment_name,
        endpoint=api_base,
        json_strategy=JsonStrategy.VALID if model_supports_json else JsonStrategy.LOOSE,
        api_version=api_version,
        max_retries=defs.LLM_MAX_RETRIES,
        max_retry_wait=defs.LLM_MAX_RETRY_WAIT,
        requests_per_minute=defs.LLM_REQUESTS_PER_MINUTE,
        tokens_per_minute=defs.LLM_TOKENS_PER_MINUTE,
        timeout=defs.LLM_REQUEST_TIMEOUT,
        max_concurrency=defs.LLM_CONCURRENT_REQUESTS,
        chat_parameters=OpenAIChatParameters(
            frequency_penalty=defs.LLM_FREQUENCY_PENALTY,
            presence_penalty=defs.LLM_PRESENCE_PENALTY,
            top_p=defs.LLM_TOP_P,
            max_tokens=defs.LLM_MAX_TOKENS,
            n=defs.LLM_N,
            temperature=defs.LLM_TEMPERATURE,
        ),
    )

    openai_client = create_openai_client(openai_config)

    if type == LLMType.AzureOpenAIChat:
        loaders[type]["load"] = lambda on_error, cache, _: create_openai_chat_llm(
            openai_config,
            client=openai_client,
            cache=cache,
            events=GraphRagLLMEvents(on_error),
        )
    elif type == LLMType.AzureOpenAIEmbedding:
        loaders[type]["load"] = lambda on_error, cache, _: create_openai_embeddings_llm(
            openai_config,
            client=openai_client,
            cache=cache,
            events=GraphRagLLMEvents(on_error),
        )
    else:
        raise ValueError(f"Unsupported LLM type: {type}")

class GraphRagLLMEvents(LLMEvents):
    def __init__(self, on_error: ErrorHandlerFn):
        self._on_error = on_error

    async def on_error(
        self,
        error: BaseException | None,
        traceback: str | None = None,
        arguments: dict[str, Any] | None = None,
    ) -> None:
        self._on_error(error, traceback, arguments)

if __name__ == "__main__":
    main()
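The key point of the workaround is that create_openai_client() now runs only once per LLM type; the lambdas registered in loaders close over that shared openai_client, so every subsequent load returns an LLM bound to the already-authenticated client instead of constructing a new one.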

natoverse (Collaborator) commented

We believe this was a bug introduced during our adoption of fnllm as the underlying LLM library. We just pushed out a 1.0.1 patch today; please let us know if your problem still exists with that version.
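If helpful, the installed version can be confirmed before retrying with a standard-library call (nothing GraphRAG-specific assumed here):

# Check which GraphRAG version is installed; expect 1.0.1 or later after upgrading.
from importlib.metadata import version

print(version("graphrag"))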

natoverse added the awaiting_response label on Dec 18, 2024
github-actions (bot) commented

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

github-actions bot added the stale label on Dec 26, 2024