Inspired by imanoop7/Ollama-OCR.
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.
- LLaVA: A multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities in the spirit of the multimodal GPT-4. (LLaVA can occasionally produce incorrect output.)
- Llama 3.2 Vision: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
- MiniCPM-V 2.6: A GPT-4V-level MLLM for single-image, multi-image, and video understanding on your phone.
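All of these models are driven through Ollama's local HTTP API. As a rough illustration of that mechanism (not this repository's actual code), the sketch below sends a base64-encoded image to Ollama's `/api/generate` endpoint and returns whatever text the model produces; the model name, prompt, and file path are placeholder choices.

```typescript
// Minimal sketch (not this repo's code): call Ollama's /api/generate endpoint
// directly with a vision model to extract text from an image.
// Assumes Ollama is running locally on its default port (11434).
import { readFile } from "node:fs/promises";

async function extractText(imagePath: string): Promise<string> {
  // Ollama accepts images as base64 strings in the `images` array.
  const image = (await readFile(imagePath)).toString("base64");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2-vision:11b",         // any pulled vision model works
      prompt: "Extract all text from this image as plain text.",
      images: [image],
      stream: false,                        // return a single JSON response
    }),
  });

  const data = await res.json();
  return data.response;                     // the model's extracted text
}

extractText("./sample.png").then(console.log).catch(console.error);
```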
- Install Ollama
- Pull the required models:
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b
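If you want to confirm the pulls succeeded programmatically, Ollama's `/api/tags` endpoint lists the locally available models. A minimal check, assuming Ollama is running on its default port, looks roughly like this; the model tags mirror the pull commands above.

```typescript
// Minimal sketch: verify the required vision models have been pulled
// by querying Ollama's /api/tags endpoint.
const REQUIRED = ["llama3.2-vision:11b", "llava:13b", "minicpm-v:8b"];

async function checkModels(): Promise<void> {
  const res = await fetch("http://localhost:11434/api/tags");
  const { models } = await res.json();      // [{ name, size, ... }, ...]
  const installed = new Set(models.map((m: { name: string }) => m.name));

  for (const name of REQUIRED) {
    console.log(`${installed.has(name) ? "ok" : "MISSING"}  ${name}`);
  }
}

checkModels().catch(console.error);
```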
Then run the following commands:
git clone [email protected]:dwqs/ollama-ocr.git
cd ollama-ocr
yarn # or: npm i
yarn dev # or: npm run dev
You can also run the demo with the Docker image debounce/ollama-ocr.
- Markdown Format: The output is a markdown string containing the extracted text from the image.
- Text Format: The output is a plain text string containing the extracted text from the image.
- JSON Format: The output is a JSON object containing the extracted text from the image.
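How the chosen format is honored ultimately depends on the prompt sent to the model and on how its reply is post-processed. The helper below is a hypothetical sketch of that post-processing step, not this repository's actual implementation: markdown and plain text pass through unchanged, while JSON output is parsed with a fallback for replies that are not valid JSON.

```typescript
// Hypothetical helper (not part of this repo's public API): normalize a raw
// model reply into the selected output format. JSON replies are parsed
// defensively, since models sometimes wrap JSON in surrounding prose.
type OutputFormat = "markdown" | "text" | "json";

function formatResult(raw: string, format: OutputFormat): string | object {
  switch (format) {
    case "markdown":
    case "text":
      return raw.trim();                     // returned as-is, as a string
    case "json": {
      // Grab the first {...} block in case the model added extra text.
      const match = raw.match(/\{[\s\S]*\}/);
      try {
        return JSON.parse(match ? match[0] : raw);
      } catch {
        return { text: raw.trim() };         // fall back to wrapping the text
      }
    }
  }
}
```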
MIT