Revert "Merge pull request #852 from project-anuvaad/develop"
1 parent 456d721
commit 1fcb7d8
Showing 2 changed files with 25 additions and 55 deletions.
58 changes: 14 additions & 44 deletions
anuvaad-etl/anuvaad-extractor/document-processor/ocr/tesseract_ulca_v2/README.md
@@ -1,59 +1,29 @@

# Anuvaad OCR

Open source OCR models for Indic Languages (Printed), developed and used as part of project Anuvaad.
Repo contains tesseract service with REST interface, which is ULCA compliant:

A tesseract service with rest interface:
input : image url
ouput : [sentences]

Hindi and Tamil use custom weights

detection of language and downloads tess-best weights if not already avilable
detection of language and downloading tess-best weights if not already avilable

**Sample curl** :
sample curl :


curl --location 'http://localhost:5000/anuvaad/ocr/v0/ulca-ocr' \
--header 'Content-Type: application/json' \
--data '{
"image" : [
{
"imageUri": "https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_hindi.jpg"
}
],
"config": {
"languages": [{
"sourceLanguage" : "hi"
}]
}
}'
'

**Sample Response:**
```json
{
"output" : [
{
"source" : "बिपिन रावत का एक माचिस की डिबिया के कारण हुआ था"
curl --location --request POST 'http://0.0.0.0:5000/anuvaad/ocr/v0/ulca-ocr' \
--header 'Content-Type: application/json' \
--data-raw '{
"config": {
"language": {
"sourceLanguage": "en"
}
],
"status" : {
"statusCode" : 200 ,
"message" : "success"
},
"imageUri": ["https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_english.jpg"

]
}
}

```
**Deployment**
## **Deployment**

```shell
'

docker build -t anuvaad_ocr_ulca_v2 .
docker run --name anuvaad_ocr_ulca_v2 -d --network host anuvaad_ocr_ulca_v2
```
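
For readers who would rather call the service from a script than from curl, here is a minimal Python sketch. It assumes the request body shape and endpoint shown in the `localhost:5000` sample curl in the diff above, and the `output[].source` response shape from the sample response. The diff interleaves two versions of the README, so which request shape the service accepts after this revert is not established here. The helper names `build_ocr_request`, `extract_sentences`, and `post_ocr_request` are hypothetical and are not part of the repository.

```python
import json
import urllib.request

# Assumption: endpoint copied from the localhost sample curl in the diff.
OCR_URL = "http://localhost:5000/anuvaad/ocr/v0/ulca-ocr"


def build_ocr_request(image_uri, source_language):
    """Build a request body mirroring the localhost sample curl (assumed shape)."""
    return {
        "image": [{"imageUri": image_uri}],
        "config": {"languages": [{"sourceLanguage": source_language}]},
    }


def extract_sentences(response):
    """Collect the 'source' strings from a response shaped like the sample response."""
    return [item["source"] for item in response.get("output", [])]


def post_ocr_request(payload, url=OCR_URL, timeout=30):
    """POST the payload as JSON and return the decoded JSON reply.

    Requires a running service; nothing in this sketch calls it automatically.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; no network call is made here.
    payload = build_ocr_request(
        "https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_hindi.jpg",
        "hi",
    )
    print(json.dumps(payload, indent=2, ensure_ascii=False))
```

With the container from the Deployment section running, `extract_sentences(post_ocr_request(payload))` would return the recognized sentences as a list of strings, provided the service replies in the sample response shape.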