Name: ocrobin
Owner: NVIDIA Research Projects
Description: null
Created: 2018-04-11 19:28:40.0
Updated: 2018-05-08 06:11:07.0
Pushed: 2018-04-22 04:30:10.0
Homepage: null
Size: 25107
Language: Jupyter Notebook
Automatic binarization using deep learning.
This implements a grayscale-to-binary pixel-for-pixel transformation. The models it is usually used with perform some denoising and deblurring, but they are small enough not to contain any significant shape priors. The use of 2D LSTMs in the binarization model allows for some modeling of global noise and intensity properties.
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
Populating the interactive namespace from numpy and matplotlib
import ocrobin
bm = ocrobin.Binarizer("bin-000000046-005393.pt")
bm.model
Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(8, 4, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(8, 4, bidirectional=1)
    )
  )
  (4): Conv2d(8, 1, kernel_size=(1, 1), stride=(1, 1))
  (5): Sigmoid()
)
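The printed structure can be approximated in current PyTorch as follows. This is only a sketch under stated assumptions: the layer sizes are taken from the printout above, but how LSTM2 combines its two passes is inferred from the channel counts (8 in, 8 out, feeding Conv2d(8, 1)), so the horizontal-then-vertical ordering, the tensor layout, and the name build_binarizer are assumptions, not the actual ocrobin code.

```python
import torch
from torch import nn


class RowwiseLSTM(nn.Module):
    """Bidirectional LSTM run along each row of a (B, C, H, W) feature map."""

    def __init__(self, ninput, nhidden):
        super().__init__()
        self.lstm = nn.LSTM(ninput, nhidden, bidirectional=True)

    def forward(self, x):
        b, c, h, w = x.shape
        # Every row becomes an independent sequence of length W: (W, B*H, C).
        seq = x.permute(3, 0, 2, 1).reshape(w, b * h, c)
        out, _ = self.lstm(seq)  # (W, B*H, 2 * nhidden)
        # Back to image layout: (B, 2 * nhidden, H, W).
        return out.reshape(w, b, h, -1).permute(1, 3, 2, 0)


class LSTM2(nn.Module):
    """Assumed composition: horizontal pass followed by a vertical pass."""

    def __init__(self, ninput, nhidden):
        super().__init__()
        self.hlstm = RowwiseLSTM(ninput, nhidden)
        self.vlstm = RowwiseLSTM(2 * nhidden, nhidden)

    def forward(self, x):
        horizontal = self.hlstm(x)
        # Swap H and W so the row-wise module scans columns instead.
        vertical = self.vlstm(horizontal.transpose(2, 3))
        return vertical.transpose(2, 3)


def build_binarizer():
    """Hypothetical helper mirroring the layer sizes in the printout."""
    return nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
        LSTM2(8, 4),
        nn.Conv2d(8, 1, kernel_size=1),
        nn.Sigmoid(),
    )
```

Because the convolutions preserve spatial size and the LSTMs scan whole rows and columns, the output has the same height and width as the input, with each pixel's value in [0, 1] depending on context from across the entire image.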
figsize(10, 10)
image = mean(imread("testdata/sample.png")[:, :, :3], 2)
binary = bm.binarize(image)
subplot(121); imshow(image)
subplot(122); imshow(binary)
<matplotlib.image.AxesImage at 0x7fdca6651790>
subplot(121); imshow(image[400:600, 400:600])
subplot(122); imshow(1-binary[400:600, 400:600])
<matplotlib.image.AxesImage at 0x7fdca6576910>
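Outside of the pylab namespace, the array handling around the binarizer can be written with explicit NumPy calls. The sketch below only covers the steps visible above (dropping the alpha channel, averaging the color channels, and inverting the result for display). The helper names are made up for illustration, and the hard threshold at 0.5 is an assumption: the source does not say whether binarize already returns a thresholded image.

```python
import numpy as np


def to_grayscale(rgba):
    """Average the first three channels of an (H, W, 3 or 4) image."""
    return np.mean(rgba[:, :, :3], axis=2)


def invert(binary):
    """Flip foreground and background, as in `1 - binary` above."""
    return 1.0 - binary


def hard_threshold(probabilities, threshold=0.5):
    """Assumed post-processing: turn [0, 1] values into a 0/1 image."""
    return (probabilities > threshold).astype(np.float64)
```

For example, to_grayscale applied to an array of shape (H, W, 4) returns an array of shape (H, W), which is the input shape used in the binarize call above.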
Training data for ocrobin-train is stored in tarfiles, with binary images and corresponding grayscale images.
%%bash
tar -ztvf testdata/bindata.tgz | sed 5q
drwxrwxr-x tmb/tmb 0 2018-04-17 10:27 ./
-rw-rw-r-- tmb/tmb 391766 2018-04-10 09:35 ./A001BIN.bin.png
-rw-rw-r-- tmb/tmb 6021129 2018-04-10 09:35 ./A001BIN.gray.png
-rw-rw-r-- tmb/tmb 226629 2018-04-10 09:36 ./A002BIN.bin.png
-rw-rw-r-- tmb/tmb 2685607 2018-04-10 09:36 ./A002BIN.gray.png
tar: write error
The training data is actually artificially generated; document image degradation for this kind of training works quite well at simulating real data.
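To give a rough idea of what degradation-based training data looks like, the sketch below turns a clean binary image into a synthetic grayscale input by blurring it and adding noise. It is a generic illustration, not the generator used to build the data in this repository; the function name, the box blur, and all parameter values are assumptions.

```python
import numpy as np


def degrade(binary, blur_radius=1, noise_sigma=0.1, seed=0):
    """Return a blurred, noisy grayscale version of a 0/1 image."""
    rng = np.random.default_rng(seed)
    image = binary.astype(np.float64)
    height, width = image.shape

    # Box blur: average all shifted copies of the edge-padded image.
    size = 2 * blur_radius + 1
    padded = np.pad(image, blur_radius, mode="edge")
    blurred = np.zeros_like(image)
    for dy in range(size):
        for dx in range(size):
            blurred += padded[dy:dy + height, dx:dx + width]
    blurred /= size * size

    # Additive Gaussian noise, clipped back into the valid intensity range.
    noisy = blurred + rng.normal(0.0, noise_sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Each (degraded, clean) pair then plays the role of one gray.png / bin.png training sample.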
from dlinputs import tarrecords
sample = tarrecords.tariterator(open("testdata/bindata.tgz")).next()
sample["__key__"]
subplot(121); imshow(sample["gray.png"])
subplot(122); imshow(sample["bin.png"])
<matplotlib.image.AxesImage at 0x7fdc386ec390>
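The layout of these tarfiles can also be read with the standard library alone. The sketch below groups consecutive archive members that share a key (the part of the file name before the first dot) into one dictionary per sample, using the remainder of the name as the field. It is an approximation of the convention visible in the listing above, not the dlinputs implementation: iter_samples is a made-up name, and the values are raw PNG bytes rather than decoded images.

```python
import io
import tarfile


def iter_samples(fileobj):
    """Yield one dict per group of consecutive members sharing a key."""
    current = None
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directory entries such as "./"
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            # "A001BIN.gray.png" -> key "A001BIN", field "gray.png"
            key, _, field = name.partition(".")
            if current is not None and current["__key__"] != key:
                yield current
                current = None
            if current is None:
                current = {"__key__": key}
            current[field] = archive.extractfile(member).read()
    if current is not None:
        yield current
```

A file opened in binary mode, for example open("testdata/bindata.tgz", "rb"), can be passed as fileobj. The grouping relies on the members of one sample being adjacent in the archive, as they are in the listing above.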
You can use the ocrobin-train binary to carry out the training.
%%bash
./ocrobin-train -d testdata/bindata.tgz -o temp