# extract-text

Here are 48 public repositories matching this topic...

dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

nodejs extraction extract-text

Updated Oct 5, 2022
HTML

pd3f / pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

python pdf machine-learning ocr pipeline text-extraction pdf-to-text language-model extract-text parsr pd3f

Updated Oct 13, 2023
HTML

ropensci-archive / fulltext

⚠️ ARCHIVED ⚠️ Search across and get full text for OA & closed journals

metadata pdf r xml open-access rstats text-ming r-package crossref extract-text

Updated Sep 9, 2022
R

opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database

Updated Oct 9, 2022
Python

KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform

tika extract-text

Updated Apr 13, 2024
Rich Text Format

ahmedkhemiri95 / PDFs-TextExtract

Multiple and Large PDF Documents Text Extraction.

python pdf parser data-science pdf-document text-analytics pdfs pypdf2 extract-text pdfminer pdf-processing pdfs-textextract

Updated Feb 2, 2024
Python

nojimage / twitter-text-php

Twitter text processing library (auto linking and extraction of usernames, lists and hashtags). Based on the Ruby and Java implementations by Matt Sanford

hashtag twitter php-library autolink extract-text

Updated Jul 12, 2023
PHP

lu4p / cat

Extract text from plaintext, .docx, .odt and .rtf files. Pure go.

cat go golang cross-platform text-extraction extract-text pdftotext docx2txt textextracting rtf-to-text pdf2txt odt2txt

Updated Nov 25, 2023
Go

zetahernandez / pdf-to-text

Read PDF files in JavaScript

javascript pdf extract-text pdftotext text-pdf

Updated Mar 11, 2020
JavaScript

BitMiracle / Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

Updated Dec 23, 2024
Visual Basic .NET

ropensci / antiword

R wrapper for antiword utility

r rstats r-package extract-text antiword

Updated Oct 3, 2024
C

ropensci / rtika

R Interface to Apache Tika

java r parse tika tesseract rstats pdf-files r-package extract-text extract-metadata peer-reviewed

Updated May 4, 2023
R

ApryseSDK / pdftron-document-search

Build search across multiple documents client-side in your file storage

extract-text algolia-instantsearch seach-documents search-pdf search-office-text

Updated Mar 30, 2023
JavaScript

OpenJarbas / simple_NER

simple rule based named entity recognition

nlp extract-information information-extraction named-entity-recognition keywords annotator ner nlp-library extract-text nlp-keywords-extraction annotation-tool ner-entities

Updated Feb 14, 2022
Python

AllanCameron / PDFR

An R package to extract text from pdf.

pdf data-scientists extract-text pdf-format

Updated May 5, 2023
C++

maxim2266 / OCR

A collection of tools for OCR (optical character recognition).

c linux ocr tesseract bash-script extract-text ocr-recognition

Updated Oct 17, 2024
C

Zoltanar / Happy-Reader

VNDB explorer and VNR-like text hooker.

wpf visual-novels extract-text vndb game-launcher vnr ithvnr translation-apis text-hooking

Updated Nov 20, 2024
C#

bhattbhavesh91 / google-vision-api-for-ocr-demo

Repo which contains a small demo to Extract Text from image OCR using Google Vision API in Python

python demo google-vision-api extract-text google-vision google-ocr image-ocr

Updated Jun 21, 2021
Jupyter Notebook

sgerwk / pdftoroff

view pdf on X11 and the Linux framebuffer; resize pdf; convert pdf to text, html, TeX, groff

html pdf tex accessibility framebuffer pdf-viewer pdf-files extract-text small-screen groff linux-framebuffer two-columns pdf-scale small-page pdf-resize

Updated Dec 15, 2023
C

devmehq / extract-text

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

pdf ocr extractor tesseract-ocr extract-text tessaract

Updated Dec 28, 2024
HTML
