- Search Architecture
This document describes the architecture and system requirements for the Search application. The Search application is a web-based application that lets users search for information related to life sciences and healthcare.
This application is divided into two main components:
- Data Ingestion Services
- Search Web Services
The Data Ingestion Services component is responsible for ingesting data from various sources and storing it in the appropriate database. The component is divided into the following sub-components:
- Clinical Trials Data Ingestion
- Pubmed Data Ingestion
- More to come...
Link to Excalidraw Canvas for High Level Architecture: https://excalidraw.com/#room=fc98786bbbb1ff061bb2,wnOTLykNfm9eZsrxGhagAg
The Search Web Services component is responsible for providing search functionality to users. The component is divided into the following sub-components:
- Frontend
- API Server
- Agency Modules
  - Clinical Trials
  - Drugs
  - Pubmed
  - Web Search
  - More to come...
- Sign Up: Users can sign up for the application using their email address or Google account. After signing up, users can optionally provide personal information, e.g. about, location, profile picture, and survey questions.
- Sign In: Users can sign in to the application using their email address or Google account.
- Search: Users can search with any query and the system will display the search results and related sources.
- Threads: All searches are stored as threads. Users can run further queries in an existing thread or start a new search thread. Users can rename a thread.
- History: Users can view their search history as threads.
- Collections: Users can create collections and view them on a separate page. Users can add search results and sources to collections, share collections with other users, and modify or delete collections.
- Profile: Users can view their profile and update their personal information.
- Settings: Users can update their password, preferences and other settings.
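The relationship between searches and threads described above can be sketched as a small data model. This is a minimal illustration, not the application's actual schema: the class names, field names, and the `record_search` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Search:
    # One query plus the IDs of the sources shown with its results
    # (field names are assumptions).
    query: str
    source_ids: List[str] = field(default_factory=list)


@dataclass
class Thread:
    # A thread groups related searches and carries a user-editable title.
    title: str
    searches: List[Search] = field(default_factory=list)

    def rename(self, new_title: str) -> None:
        self.title = new_title


def record_search(history: List[Thread], query: str, source_ids: List[str],
                  thread: Optional[Thread] = None) -> Thread:
    """Store a search in an existing thread, or start a new thread for it."""
    if thread is None:
        # Assumption: a new thread is initially titled after its first query.
        thread = Thread(title=query)
        history.append(thread)
    thread.searches.append(Search(query=query, source_ids=list(source_ids)))
    return thread
```

With this shape, the History feature is simply the list of threads, and a follow-up query passes the existing thread instead of creating a new one.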
Frontend technologies:
- Next.js
- Tailwind CSS
- Axios API Client
API Server technologies:
- Rust
- Axum Web Framework
- Postgres Database
- Redis Cache
- gRPC Client
Agency Modules technologies:
- Python
- gRPC Server
- Llama Index
- Qdrant Vector Database
- Postgres Database
- Nebula Graph Database
Usage estimates:
- Users - 5K DAU, 50K Total Users
- Search - 50K Per Day, 20 RPS at Peak, 10M Total Searches
- Sources - 5 Unique Sources Per Search, 250K Daily Unique Sources, 50M Total Sources
- New Collection - 2K Per Day, 0.5M Total Collections
- New Thread - 5K Per Day, 1M Total Threads
- Other Actions (View/Modify History, Collections, Profile) - 100K Per Day
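As a quick sanity check on the search figures above, the average request rate can be derived from the daily volume and compared with the stated peak. The variable names are illustrative, and the calculation assumes searches are spread over a full 24-hour day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

searches_per_day = 50_000      # from the estimates above
peak_rps = 20                  # from the estimates above
sources_per_search = 5         # from the estimates above

# Average load if searches were spread evenly across the day.
average_rps = searches_per_day / SECONDS_PER_DAY

# How much burstier the stated peak is than the daily average.
peak_to_average = peak_rps / average_rps

# Daily unique sources implied by the per-search figure.
sources_per_day = searches_per_day * sources_per_search

print(f"average search RPS: {average_rps:.2f}")
print(f"peak / average ratio: {peak_to_average:.1f}")
print(f"sources per day: {sources_per_day}")
```

The average comes out well under 1 RPS, so the 20 RPS peak figure budgets for traffic that is heavily concentrated in busy periods.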
Record size estimates:
- User - 0.1 KB
- Thread - 0.05 KB
- Search - 1 KB (1K Words)
- Sources - 0.2 KB (100 Words)
- Collection - 0.1 KB
Storage estimates:
- User - 5 MB
- Thread - 50 MB
- Search - 10 GB
- Sources - 10 GB
- Collection - 50 MB
- Other Tables - 1 GB
Total Storage in 6 Months: 25 GB
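The per-table storage figures follow from multiplying each total record count by its per-record size. The sketch below reproduces that arithmetic. It assumes decimal units (1 MB = 1,000 KB, 1 GB = 1,000 MB), and the dictionary layout is illustrative only.

```python
KB = 1
MB = 1_000 * KB
GB = 1_000 * MB

# table name -> (total record count, size per record in KB), from the figures above
tables = {
    "User": (50_000, 0.1),
    "Thread": (1_000_000, 0.05),
    "Search": (10_000_000, 1.0),
    "Sources": (50_000_000, 0.2),
    "Collection": (500_000, 0.1),
}
other_tables_kb = 1 * GB

# Storage per table is record count times record size.
sizes_kb = {name: count * size for name, (count, size) in tables.items()}
total_kb = sum(sizes_kb.values()) + other_tables_kb

for name, size in sizes_kb.items():
    print(f"{name}: {size / MB:,.0f} MB")
print(f"Total: {total_kb / GB:.1f} GB")
```

The itemized sum is roughly 21 GB, which suggests the 25 GB total includes some rounding or headroom.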
API endpoints:
POST /signup
POST /signin
GET /users/profile
PATCH /users/profile
PATCH /users/password
PATCH /users/preferences
GET /search?query=<query>&thread_id=<thread_id>
GET /history/search?search_id=<search_id>
GET /history/threads?thread_id=<thread_id>&limit=<limit>&offset=<offset>
GET /history/threads/all?limit=<limit>&offset=<offset>
GET /sources?source_id=<source_id>
PATCH /threads?thread_id=<thread_id>
POST /collections
GET /collections?collection_id=<collection_id>&limit=<limit>&offset=<offset>
PATCH /collections?collection_id=<collection_id>
DELETE /collections?collection_id=<collection_id>
GET /collections/all?limit=<limit>&offset=<offset>
PUT /collections/items?collection_id=<collection_id>
DELETE /collections/items?collection_id=<collection_id>
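Several of the endpoints above take query parameters, including `limit` and `offset` for pagination. The following sketch shows one way a client might build those URLs. The host name and helper functions are hypothetical, and it assumes that `thread_id` may be omitted on `/search` to start a new thread and that `offset` counts items rather than pages.

```python
from urllib.parse import urlencode

BASE_URL = "https://search.example.com"  # hypothetical host


def build_url(path, **params):
    """Build a request URL, skipping parameters that were not supplied."""
    query = {key: value for key, value in params.items() if value is not None}
    if not query:
        return f"{BASE_URL}{path}"
    return f"{BASE_URL}{path}?{urlencode(query)}"


def page_offsets(total_items, limit):
    """Offsets needed to walk through total_items in pages of size limit."""
    return list(range(0, total_items, limit))


# A search without a thread_id (assumed to start a new thread).
print(build_url("/search", query="clinical trials", thread_id=None))

# Walking through 45 threads, 20 at a time.
for offset in page_offsets(45, 20):
    print(build_url("/history/threads/all", limit=20, offset=offset))
```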