lucidworks/ocr-parser-tool

Name: ocr-parser-tool

Owner: Lucidworks

Description: Tool for multi-threaded OCR scanning of a folder with Tika and Tesseract

Created: 2016-04-12 18:54:28.0

Updated: 2018-05-17 21:50:44.0

Pushed: 2016-04-15 16:50:35.0

Size: 47143

Language: Java

README

ocr-parser-tool

To build, run ./gradlew clean build shadowJar on Linux, or gradlew.bat clean build shadowJar on Windows. This produces the jar file ocr-parser-tool-1.0-SNAPSHOT-all.jar in the build/libs/ directory.

Usage

Run java -jar ocr-parser-tool-1.0-SNAPSHOT-all.jar -f folder-to-scan -w 8 to scan the folder folder-to-scan using 8 threads.

To list additional settings, run java -jar ocr-parser-tool-1.0-SNAPSHOT-all.jar -h.
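The tool's own source is not reproduced here. As a rough illustration of what a thread count like -w 8 implies, the following is a minimal sketch, assuming a simple one-task-per-file design, of fanning every file under a folder out to a fixed-size thread pool. The class FolderScanSketch, its scan method, and the readText placeholder extractor are hypothetical names, not part of the tool; the actual Tika/Tesseract OCR call is deliberately left out and replaced by plain UTF-8 text reading so the sketch runs with only the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not the tool's implementation.
public class FolderScanSketch {

    // Walks `root`, then runs `extractor` on every regular file using a fixed
    // pool of `workers` threads (assumed to be the role of a thread-count option).
    public static Map<Path, String> scan(Path root, int workers, Function<Path, String> extractor)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // One task per file; invokeAll blocks until every task has finished.
            List<Callable<String>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> extractor.apply(file));
            }
            List<Future<String>> futures = pool.invokeAll(tasks);

            // Futures come back in task order, so index i matches files.get(i).
            // get() rethrows any failure from a worker as an ExecutionException.
            Map<Path, String> results = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                results.put(files.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder extractor: reads the file as UTF-8 text.
    // An OCR tool would call Tika (configured for Tesseract) here instead.
    public static String readText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny throwaway folder so the demo runs without arguments.
        Path dir = Files.createTempDirectory("scan-demo");
        Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "world".getBytes(StandardCharsets.UTF_8));

        Map<Path, String> results = scan(dir, 2, FolderScanSketch::readText);
        results.forEach((path, text) -> System.out.println(path.getFileName() + ": " + text));
    }
}
```

Using invokeAll and then calling get() on each Future means a failure in any worker surfaces to the caller instead of being silently swallowed, which matters when OCR on a single bad file can throw.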


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.