[Bug]: NotADirectoryError: [Errno 20] Not a directory: '/Users/username/ragtest/output/.DS_Store/artifacts/create_final_nodes.parquet' #891
Comments
.DS_Store is a macOS file, not a directory. The query library tries to find the most recent timestamped run, and for some reason it is picking up this .DS_Store file instead. We'll take a look at the selection code to ensure it only considers folders. As an immediate fix, you should be able to delete that file (it is just an OS cache of view settings) and re-run. If you'd like to be more precise each time you run, you can add the --data flag to point directly at a specific run's artifacts folder.
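For reference, the change we have in mind is roughly the sketch below (illustrative only, not the actual GraphRAG selection code): only entries that are real directories are considered when picking the latest run, so stray files like .DS_Store are skipped.

from pathlib import Path

def latest_run_dir(output_root: str) -> Path:
    # Only keep real directories; files such as .DS_Store are ignored.
    runs = [p for p in Path(output_root).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run folders found under {output_root}")
    # Timestamped names such as 20240820-232301 sort chronologically as strings.
    return max(runs, key=lambda p: p.name)

# e.g. latest_run_dir("./ragtest/output") / "artifacts"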
Thanks! This time it moved further, but a new error appeared:
The command was the same:
We've seen a fair bit of reporting on non-OpenAI model JSON formats. 0.2.1 included some improvements to the fallback parsing when a model returns malformed JSON, but it may still have issues that we are unaware of. Unfortunately there's not a lot we can do to help diagnose these since it would be a lot of work to test all the models available. My best recommendation is to search through issues linked to #657 to see what solutions folks have found.
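For context, the fallback parsing is along these lines; a minimal sketch (not the exact parser shipped in graphrag) that tolerates fenced or noisy JSON from a model:

import json
import re

def parse_llm_json(text: str) -> dict:
    # Try the raw text first, then the text with markdown fences stripped,
    # then the first {...} span we can find.
    candidates = [text, re.sub(r"```(?:json)?", "", text)]
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            return json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
    raise ValueError("model output could not be parsed as JSON")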
It seems you might be using Ollama to run the Mistral model locally. To help resolve the issue, the following setup may be helpful:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: mistral
  model_supports_json: true
  api_base: http://localhost:11434/v1
parallelization:
  stagger: 120
async_mode: threaded
embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF
    api_base: http://localhost:8001/v1/
    concurrent_requests: 2
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id]
input:
  type: file
  file_type: text
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file
  base_dir: "cache"
storage:
  type: file
  base_dir: "output/${timestamp}/artifacts"
reporting:
  type: file
  base_dir: "output/${timestamp}/reports"
entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0
summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: false
# Clone and start the embedding adapter (assumed to expose an
# OpenAI-compatible endpoint on port 8001, matching the api_base above):
git clone https://github.com/9prodhi/EmbedAdapter
cd EmbedAdapter
python ollama_serv.py
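Once the adapter is running, you can sanity-check that it answers OpenAI-style embedding requests before re-running the pipeline. A minimal sketch in Python, assuming the adapter exposes the standard /v1/embeddings route on port 8001 as configured above:

import json
import urllib.request

# Build an OpenAI-style embeddings request for the configured model.
payload = json.dumps({
    "model": "nomic-ai/nomic-embed-text-v1.5-GGUF",
    "input": "hello world",
}).encode()
request = urllib.request.Request(
    "http://localhost:8001/v1/embeddings",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    body = json.load(response)
print(len(body["data"][0]["embedding"]), "dimensions returned")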
By following these steps, you should be able to resolve the JSON parsing issues and get your local setup working correctly with the Mistral model for LLM tasks and the nomic-embed-text model for embeddings. If you continue to experience problems, please provide more details about the specific error you're encountering, and I'll be happy to assist further.
Dear All, I solved the problem! Here are my steps. First, I got the same "NotADirectoryError: [Errno 20] Not a directory: '/Users/xxxxx/ragtest/output/.DS_Store/artifacts/create_final_nodes.parquet'" error. Then, as @natoverse said, .DS_Store is just a temp file, and the system is actually looking for a file named create_final_nodes.parquet; to point it at that file, we can use --data. Third, I got the usage from the CLI help: "python -m graphrag.query [-h] [--config CONFIG] [--data DATA] [--root ROOT] --method {local,global} [--community_level COMMUNITY_LEVEL] [--response_type RESPONSE_TYPE] query". Then I ran python -m graphrag.query with --data pointing at my artifacts directory, and it works!! Note that 20240820-232301 is my run's timestamp. Just find the artifacts directory and pass its path to --data!
Do you need to file an issue?
Describe the bug
I managed to follow your example here and got the message "All workflows completed successfully", even though I saw the "Errors occurred during the pipeline run, see logs for more details" message a couple of times at the "create_base_entity_graph" step.
However, when I ran the first command to interact with the graph (it took my M2 about 5 hours to build it for "A Christmas Carol"), I got a Python error.
Command I'm trying:
Error I get:
Steps to reproduce
python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"
Expected Behavior
I expect to get the answer for the question: "What are the top themes in this story?"
GraphRAG Config Used
Logs and screenshots
Additional Information