
Inspired by imanoop7/Ollama-OCR

Ollama OCR for web

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.

Supported Models

  • LLaVA: a multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, with chat capabilities approaching those of the multimodal GPT-4. (LLaVA can occasionally produce incorrect output.)
  • Llama 3.2 Vision: instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
  • MiniCPM-V 2.6: a GPT-4V-level MLLM for single-image, multi-image, and video understanding on your phone.

Quick Start

Prerequisites

  1. Install Ollama
  2. Pull the required models:
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b

Then run the following commands:

git clone [email protected]:dwqs/ollama-ocr.git
cd ollama-ocr
yarn install   # or: npm i
yarn dev       # or: npm run dev

Docker Support

You can also run the demo from Docker using the debounce/ollama-ocr image.

Examples

Input Image 1

input-image

Output Markdown

output-markdown.png

Input Image 2

input-image

Output JSON

output-json.png

Output Format Details

  • Markdown Format: the extracted text as a markdown string.
  • Text Format: the extracted text as a plain string.
  • JSON Format: the extracted text as a JSON object.

License

MIT
