
Inspired by imanoop7/Ollama-OCR

Ollama OCR for web

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.

Supported Models

  • LLaVA: a multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, with chat capabilities approaching those of the multimodal GPT-4. (LLaVA can occasionally produce incorrect output.)
  • Llama 3.2 Vision: instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
  • MiniCPM-V 2.6: a GPT-4V-level MLLM for single-image, multi-image, and video understanding on your phone.

Quick Start

Prerequisites

  1. Install Ollama
  2. Pull the required models:
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b

Then run the following commands:

git clone [email protected]:dwqs/ollama-ocr.git
cd ollama-ocr
yarn install   # or: npm i
yarn dev       # or: npm run dev

Docker Support

You can also run the demo from Docker using the debounce/ollama-ocr image.

Examples

Input Image 1

input-image

Output Markdown

output-markdown.png

Input Image 2

input-image

Output JSON

output-json.png

Output Format Details

  • Markdown Format: the extracted text as a markdown string.
  • Text Format: the extracted text as a plain string.
  • JSON Format: the extracted text as a JSON object.

License

MIT
