Name: ocr-parser-tool
Owner: Lucidworks
Description: Tool for multi-threaded OCR scanning of a folder with Tika and Tesseract
Created: 2016-04-12 18:54:28.0
Updated: 2018-05-17 21:50:44.0
Pushed: 2016-04-15 16:50:35.0
Size: 47143
Language: Java
To build, use ./gradlew clean build shadowJar on Linux, or gradlew.bat clean build shadowJar on Windows. This will produce a jar file called ocr-parser-tool-1.0-SNAPSHOT-all.jar in the directory build/libs/.
Run java -jar ocr-parser-tool-1.0-SNAPSHOT-all.jar -f folder-to-scan -w 8 to scan the folder folder-to-scan using 8 threads.
For additional settings, use java -jar ocr-parser-tool-1.0-SNAPSHOT-all.jar -h
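The README does not say how the work is split across the -w threads. As a rough illustration only, assuming the tool walks the folder given by -f and hands each file to a fixed-size pool of -w worker threads, the pattern could look like the following sketch. The class FolderScanSketch and the method ocrFile are hypothetical, and ocrFile is a placeholder standing in for the real Tika/Tesseract call, not the tool's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the work distribution implied by -f and -w:
 * every regular file under the folder is submitted to a fixed pool of
 * worker threads. Not the tool's actual implementation.
 */
public class FolderScanSketch {

    /** Hypothetical placeholder for the per-file OCR step (the real tool uses Tika and Tesseract). */
    static String ocrFile(Path file) {
        return "scanned " + file.getFileName();
    }

    /** Scans every regular file under folder using a pool of the given size. */
    public static Map<Path, String> scanFolder(Path folder, int workers)
            throws IOException, InterruptedException {
        // Collect all regular files, recursing into subfolders.
        List<Path> files;
        try (Stream<Path> paths = Files.walk(folder)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        // Thread-safe map because several workers write to it concurrently.
        Map<Path, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path file : files) {
                futures.add(pool.submit(() -> results.put(file, ocrFile(file))));
            }
            // Wait for every task and surface the first failure, if any.
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IllegalStateException("OCR failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Mirrors "-f folder-to-scan -w 8" with positional arguments (an assumption).
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        int workers = args.length > 1 ? Integer.parseInt(args[1]) : 8;
        scanFolder(folder, workers).forEach((p, text) -> System.out.println(p + ": " + text));
    }
}
```

For example, FolderScanSketch.scanFolder(Paths.get("folder-to-scan"), 8) corresponds to -f folder-to-scan -w 8. A fixed pool caps concurrency at the requested thread count no matter how many files the folder holds, which is a sensible choice when each OCR job is CPU-heavy.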