This repo contains a Streamlit app that introduces and teaches example GraphRAG patterns.
Follow the steps below to run the sample app:
The sample app uses OpenAI to demonstrate embedding and LLM capabilities. To get an OpenAI API key:
- Create an OpenAI account if you don't have one already. Otherwise, sign in.
- Navigate to the API key page and select "Create new secret key", optionally naming the key. Save the key somewhere safe, and do not share it with anyone.
This app uses two datasets:
- The classic Northwind Database: Sales data for Northwind Traders, a fictitious specialty foods export/import company.
- A sample of the H&M Fashion Dataset: Real-world retail data, including customer purchases and rich information around products such as names, types, descriptions, department sections, etc.
The app has 4 pages in total, reflecting 4 GraphRAG patterns. Each page relies on one of the above datasets:
Page / GraphRAG Pattern | Dataset Used | Pattern Description |
---|---|---|
Vector Search With Graph Context | Northwind | Use graph traversals to retrieve items related to vector search results |
Text2Cypher | Northwind | Convert natural language prompts to explicit Cypher queries for retrieval |
Graph Vectors | H&M Fashion Dataset | Use graph embeddings for retrieval, incorporating both structured and unstructured data in vector similarity search |
Graph Filtering | H&M Fashion Dataset | Use graph patterns and properties to pre- or post-filter vector search results (can also include hybrid search) |
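
To make the first pattern in the table concrete, here is a minimal sketch of "Vector Search With Graph Context": a vector index lookup followed by a one-hop traversal from each hit. It relies on Neo4j's `db.index.vector.queryNodes` procedure. The index name, the `PART_OF` relationship, the `Category` label, and the property names are illustrative assumptions and may not match what the load scripts in this repo create.

```python
from __future__ import annotations

from typing import Any

# Assumption: the index name, relationship type, label, and property names
# below are hypothetical and may differ from the schema built by this repo.
VECTOR_GRAPH_QUERY = """
CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector)
YIELD node, score
OPTIONAL MATCH (node)-[:PART_OF]->(c:Category)
RETURN node.productName AS product,
       score,
       collect(c.categoryName) AS categories
ORDER BY score DESC
"""


def build_vector_graph_query(
    query_vector: list[float],
    top_k: int = 5,
    index_name: str = "product_text_embeddings",  # hypothetical index name
) -> tuple[str, dict[str, Any]]:
    """Return the Cypher text and the parameters for one retrieval call."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    params = {
        "index_name": index_name,
        "top_k": top_k,
        "query_vector": query_vector,
    }
    return VECTOR_GRAPH_QUERY, params


# A toy 3-dimensional vector; a real query vector would be the embedding
# of the user's question and must match the index's dimensions.
query, params = build_vector_graph_query([0.1, 0.2, 0.3], top_k=3)
```

With the official Neo4j Python driver, the pair could then be passed to `driver.execute_query(query, parameters_=params)`. Keeping the index name and `top_k` as parameters rather than formatting them into the string avoids building Cypher from user input.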
For the entire app to work, each dataset must be loaded into its own Neo4j database. If you choose not to load one of the datasets, the associated pages will not function, which may be acceptable if those pages are not of interest to you.
To Load Northwind:
- Create an empty database on a Neo4j deployment type of your choosing. Good options include a blank Neo4j Sandbox or an Aura Free instance.
- Run the Cypher from `load-data/northwind-data.cypher` on that database through Neo4j Browser. At the top of that script, you will need to replace `<your OpenAI API Key>` with your own OpenAI API key.
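
If you would rather not edit the script by hand, the placeholder substitution described above can be sketched in a few lines of Python. This assumes the placeholder appears in the script exactly as `<your OpenAI API Key>`; the sample Cypher line is a hypothetical stand-in, not the actual contents of `load-data/northwind-data.cypher`.

```python
PLACEHOLDER = "<your OpenAI API Key>"  # assumed to match the script exactly


def fill_api_key(script_text: str, api_key: str) -> str:
    """Return the script text with the placeholder replaced by the key."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    if PLACEHOLDER not in script_text:
        raise ValueError("placeholder not found; check the script text")
    return script_text.replace(PLACEHOLDER, api_key)


# Hypothetical one-line stand-in for the real script.
sample = 'WITH "<your OpenAI API Key>" AS openAIKey RETURN openAIKey'
print(fill_api_key(sample, "sk-example"))
# prints: WITH "sk-example" AS openAIKey RETURN openAIKey
```

In practice you could read the real script with `pathlib.Path("load-data/northwind-data.cypher").read_text()`, pass it through `fill_api_key`, and paste the result into Neo4j Browser. Avoid committing a copy of the script that contains your key.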
To Load the H&M Fashion Dataset:
- This dataset relies on graph machine learning, so you will need to create an empty Neo4j database with Graph Data Science enabled. There is no Aura Free option for this. A couple of good options include:
  - (free) Start a blank Graph Data Science Neo4j Sandbox, which should be sufficient for learning and exploration.
  - (paid) Use an AuraDS instance. This is a paid option ($1.00 USD per hour) but should run significantly faster for loading, indexing, querying, and running GDS algorithms.
- Run the notebook `load-data/hm-data.ipynb`. It will attempt to read Neo4j and OpenAI credentials from a `secrets.toml` file. You can create that file per the directions below or replace the lookups with hard-coded credentials in the notebook.
- Create a `secrets.toml` file using `secrets.toml.example` as a template:

      cp .streamlit/secrets.toml.example .streamlit/secrets.toml
      vi .streamlit/secrets.toml

- Fill in the credentials below in the `secrets.toml` file:

      # OpenAI
      OPENAI_API_KEY = "sk-..."

      # NEO4J
      NORTHWIND_NEO4J_URI = "neo4j+s://<xxxxx>.databases.neo4j.io"
      NORTHWIND_NEO4J_USERNAME = "neo4j"
      NORTHWIND_NEO4J_PASSWORD = "<password>"

      HM_NEO4J_URI = "neo4j+s://<xxxxx>.databases.neo4j.io"
      HM_NEO4J_USERNAME = "neo4j"
      HM_NEO4J_PASSWORD = "<password>"
      HM_AURA_DS = false

- Install the requirements (an isolated Python virtual environment is recommended):

      pip install -r requirements.txt

Run the app with the command:

    streamlit run Home.py --server.port=80
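
Because each page depends on one dataset, it can help to check which pages your credentials cover before launching. The sketch below takes the secrets as a plain mapping (for example, the parsed contents of `secrets.toml`) and lists the pages whose settings are all present. The helper names are hypothetical, and it assumes every page needs `OPENAI_API_KEY`. It only checks that values are filled in, not that the databases are reachable or loaded.

```python
from __future__ import annotations

from typing import Any, Mapping

COMMON_KEYS = ("OPENAI_API_KEY",)  # assumed to be required by every page

# Credential keys per dataset, as listed in secrets.toml.
# HM_AURA_DS is a boolean flag and is not checked here.
DATASET_KEYS = {
    "Northwind": (
        "NORTHWIND_NEO4J_URI",
        "NORTHWIND_NEO4J_USERNAME",
        "NORTHWIND_NEO4J_PASSWORD",
    ),
    "H&M Fashion Dataset": (
        "HM_NEO4J_URI",
        "HM_NEO4J_USERNAME",
        "HM_NEO4J_PASSWORD",
    ),
}

# Pages per dataset, taken from the table of GraphRAG patterns.
PAGES = {
    "Northwind": ("Vector Search With Graph Context", "Text2Cypher"),
    "H&M Fashion Dataset": ("Graph Vectors", "Graph Filtering"),
}


def missing_keys(secrets: Mapping[str, Any], keys: tuple[str, ...]) -> list[str]:
    """Return the keys that are absent or blank in the secrets mapping."""
    return [k for k in keys if not str(secrets.get(k, "")).strip()]


def usable_pages(secrets: Mapping[str, Any]) -> list[str]:
    """List the pages whose required settings are all filled in."""
    if missing_keys(secrets, COMMON_KEYS):
        return []
    pages: list[str] = []
    for dataset, keys in DATASET_KEYS.items():
        if not missing_keys(secrets, keys):
            pages.extend(PAGES[dataset])
    return pages


# Example: only the Northwind credentials are filled in.
example = {
    "OPENAI_API_KEY": "sk-...",
    "NORTHWIND_NEO4J_URI": "neo4j+s://example.databases.neo4j.io",
    "NORTHWIND_NEO4J_USERNAME": "neo4j",
    "NORTHWIND_NEO4J_PASSWORD": "secret",
}
print(usable_pages(example))
# prints: ['Vector Search With Graph Context', 'Text2Cypher']
```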