Search Architecture

Introduction

This document describes the detailed architecture and system requirements for the Search application. The Search application is a web-based application that allows users to search for anything related to life sciences and healthcare.

This application is divided into two main components:

Data Ingestion Services
Search Web Services

Data Ingestion Services

The Data Ingestion Services component is responsible for ingesting data from various sources and storing it in the appropriate database. The component is divided into the following sub-components:

Clinical Trials Data Ingestion
Pubmed Data Ingestion
More to come...

Clinical Trials

Link to Excalidraw Canvas for High Level Architecture: https://excalidraw.com/#room=fc98786bbbb1ff061bb2,wnOTLykNfm9eZsrxGhagAg

High Level Architecture

Pubmed

High Level Architecture

Infrastructure High Level Architecture

Search Web Services

The Search Web Services component is responsible for providing the search functionality to the users. The component is divided into the following sub-components:

Frontend
API Server
Agency Modules
1. Clinical Trials
2. Drugs
3. Pubmed
4. Web Search
5. More to come...

User Flow

Features and User Actions

Sign Up: Users can sign up for the application using their email address or Google account. After signing up, users can optionally provide personal information, e.g. an about section, location, profile picture, and answers to survey questions.
Sign In: Users can sign in to the application using their email address or Google account.
Search: Users can search with any query and the system will display the search results and related sources.
Threads: Every search is stored as part of a thread. Users can run more queries in an existing thread or start a new search thread. Users can also rename a thread.
History: Users can view their search history as threads.
Collections: Users can create collections and view them on a separate page. Users can add search results and sources to collections, share collections with other users, and modify or delete their collections.
Profile: Users can view their profile and update their personal information.
Settings: Users can update their password, preferences and other settings.

High Level Architecture

Single Search Query Flow

Data Models
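
As a rough illustration of the entities implied by the features above (threads, searches, sources, collections), the following sketch models them as Python dataclasses. Every class and field name here is an assumption made for illustration; it is not the actual schema used by the API Server or the Agency Modules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


def _now() -> datetime:
    """Current UTC time, used as the default creation timestamp."""
    return datetime.now(timezone.utc)


@dataclass
class Source:
    # Hypothetical fields: a source is assumed to be a titled link.
    url: str
    title: str
    id: UUID = field(default_factory=uuid4)


@dataclass
class Search:
    # One query and its result, always attached to a thread.
    thread_id: UUID
    query: str
    result: str = ""
    sources: list[Source] = field(default_factory=list)
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=_now)


@dataclass
class Thread:
    # A thread groups the searches a user ran in one conversation.
    user_id: UUID
    title: str
    searches: list[Search] = field(default_factory=list)
    id: UUID = field(default_factory=uuid4)

    def add_search(self, query: str) -> Search:
        """Append a follow-up search to this thread and return it."""
        search = Search(thread_id=self.id, query=query)
        self.searches.append(search)
        return search

    def rename(self, title: str) -> None:
        """Rename the thread, as allowed by the Threads feature."""
        self.title = title


@dataclass
class Collection:
    # A user-curated set of saved searches and sources that can be shared.
    user_id: UUID
    name: str
    search_ids: set[UUID] = field(default_factory=set)
    source_ids: set[UUID] = field(default_factory=set)
    shared_with: set[UUID] = field(default_factory=set)
    id: UUID = field(default_factory=uuid4)


# Usage: start a thread, add a follow-up search, then save it to a collection.
user_id = uuid4()
thread = Thread(user_id=user_id, title="Untitled")
first = thread.add_search("aspirin clinical trials")
thread.rename("Aspirin trials")
collection = Collection(user_id=user_id, name="Cardiology")
collection.search_ids.add(first.id)
```

Keeping the thread ID on each search mirrors the thread_id parameter that appears in the search and history endpoints.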

Frontend

Technologies:

Next.js
Tailwind CSS
Axios API Client

API Server

Technologies:

Rust
Axum Web Framework
Postgres Database
Redis Cache
gRPC Client

Agency Modules

Technologies:

Python
gRPC Server
Llama Index
Qdrant Vector Database
Postgres Database
Nebula Graph Database

System Requirements

6 Months Usage Projection:

Users - 5K DAU, 50K Total Users
Search - 50K Per Day, 20 RPS at Peak, 10M Total Searches
Sources - 5 Unique Sources Per Search, 250K Daily Unique Sources, 50M Total Sources
New Collection - 2K Per Day, 0.5M Total Collections
New Thread - 5K Per Day, 1M Total Threads
Other Actions (View/Modify History, Collections, Profile) - 100K Per Day
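
The projection above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes six months means 180 days (the document does not say); the daily figures are taken from the list above.

```python
# Assumption: "6 months" is treated as 180 days.
DAYS = 180
SECONDS_PER_DAY = 24 * 60 * 60

searches_per_day = 50_000
sources_per_search = 5
peak_rps = 20

# Average request rate if searches were spread evenly over the day.
avg_rps = searches_per_day / SECONDS_PER_DAY
# How much burstier the stated peak is than the even-spread average.
peak_to_avg = peak_rps / avg_rps

total_searches = searches_per_day * DAYS
daily_sources = searches_per_day * sources_per_search
total_sources = daily_sources * DAYS

print(f"average search RPS: {avg_rps:.2f}")
print(f"peak / average:     {peak_to_avg:.1f}x")
print(f"total searches:     {total_searches:,}")
print(f"daily sources:      {daily_sources:,}")
print(f"total sources:      {total_sources:,}")
```

Under that assumption the totals come to 9M searches and 45M sources, which the projection rounds up to 10M and 50M. The 20 RPS peak is roughly 35 times the even-spread average of about 0.58 RPS.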

Data Entities and Size:

User - 0.1 KB
Thread - 0.05 KB
Search - 1 KB (1K Words)
Sources - 0.2 KB (100 Words)
Collection - 0.1 KB

Data Storage

User - 5 MB
Thread - 50 MB
Search - 10 GB
Sources - 10 GB
Collection - 50 MB
Other Tables - 1 GB

Total Storage in 6 Months: 25 GB
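
The per-table figures follow from multiplying each entity size by its projected row count. The sketch below reproduces that arithmetic, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which the document does not state explicitly.

```python
# Entity sizes in bytes (0.1 KB = 100 bytes under the decimal-unit assumption).
entity_size_bytes = {
    "User": 100,
    "Thread": 50,
    "Search": 1_000,
    "Sources": 200,
    "Collection": 100,
}

# Row counts taken from the six-month usage projection.
entity_count = {
    "User": 50_000,
    "Thread": 1_000_000,
    "Search": 10_000_000,
    "Sources": 50_000_000,
    "Collection": 500_000,
}

OTHER_TABLES_BYTES = 1_000_000_000  # "Other Tables - 1 GB"

storage_bytes = {
    name: entity_size_bytes[name] * entity_count[name]
    for name in entity_size_bytes
}
total_bytes = sum(storage_bytes.values()) + OTHER_TABLES_BYTES
total_gb = total_bytes / 1_000_000_000

for name, size in storage_bytes.items():
    print(f"{name:<10} {size / 1_000_000:>10,.0f} MB")
print(f"Total      {total_gb:.3f} GB")
```

The itemized tables sum to about 21.1 GB, so the 25 GB total presumably includes some rounding or headroom beyond the listed entries.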

API Endpoints

Users

POST /signup
POST /signin
GET /users/profile
PATCH /users/profile
PATCH /users/password
PATCH /users/preferences

Search

GET /search?query=<query>&thread_id=<thread_id>
GET /history/search?search_id=<search_id>
GET /history/threads?thread_id=<thread_id>&limit=<limit>&offset=<offset>
GET /history/threads/all?limit=<limit>&offset=<offset>
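
The search and history endpoints take their arguments as query parameters, with limit and offset for pagination. A minimal sketch of how a client might build these URLs follows. BASE_URL is a hypothetical host, the default page size is arbitrary, and treating thread_id as optional on /search (to start a new thread) is an assumption based on the Threads feature rather than something the endpoint list confirms.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://search.example.com"  # hypothetical host


def search_url(query: str, thread_id: Optional[str] = None) -> str:
    """Build GET /search; thread_id is omitted to start a new thread (assumed)."""
    params = {"query": query}
    if thread_id is not None:
        params["thread_id"] = thread_id
    return f"{BASE_URL}/search?{urlencode(params)}"


def thread_history_url(thread_id: str, limit: int = 20, offset: int = 0) -> str:
    """Build GET /history/threads for one page of a thread's searches."""
    params = {"thread_id": thread_id, "limit": limit, "offset": offset}
    return f"{BASE_URL}/history/threads?{urlencode(params)}"


def all_threads_page_urls(total_threads: int, limit: int = 20) -> Iterator[str]:
    """Yield one GET /history/threads/all URL per page of `limit` threads."""
    for offset in range(0, total_threads, limit):
        params = {"limit": limit, "offset": offset}
        yield f"{BASE_URL}/history/threads/all?{urlencode(params)}"


print(search_url("aspirin dosage"))
print(thread_history_url("t1", limit=10, offset=20))
print(list(all_threads_page_urls(45, limit=20)))
```

urlencode percent-encodes the query, so free-text searches with spaces or special characters stay valid in the URL.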

Sources

GET /sources?source_id=<source_id>

Threads

PATCH /threads?thread_id=<thread_id>

Collections

POST /collections
GET /collections?collection_id=<collection_id>&limit=<limit>&offset=<offset>
PATCH /collections?collection_id=<collection_id>
DELETE /collections?collection_id=<collection_id>
GET /collections/all?limit=<limit>&offset=<offset>
PUT /collections/items?collection_id=<collection_id>
DELETE /collections/items?collection_id=<collection_id>
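
The collection endpoints identify the target through the collection_id query parameter and vary only by HTTP method. The sketch below builds (but does not send) such requests with the standard library. The host and the JSON body shapes ("name", "source_id") are hypothetical, since the document does not specify request bodies.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://search.example.com"  # hypothetical host


def collection_request(
    method: str,
    path: str,
    collection_id: Optional[str] = None,
    body: Optional[dict] = None,
) -> Request:
    """Build an unsent HTTP request for one of the collection endpoints."""
    url = f"{BASE_URL}{path}"
    if collection_id is not None:
        url += "?" + urlencode({"collection_id": collection_id})
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# Body fields below are illustrative assumptions, not the real API contract.
create = collection_request("POST", "/collections", body={"name": "Oncology"})
add_item = collection_request(
    "PUT", "/collections/items", "c42", body={"source_id": "s7"}
)
delete = collection_request("DELETE", "/collections", "c42")

for req in (create, add_item, delete):
    print(req.get_method(), req.full_url)
```

Each Request could be passed to urllib.request.urlopen once a real host and authentication scheme are known.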
