The Optical Character Recognition (OCR) system is a neural network built with Python and TensorFlow and trained on over 115,000 word images from the IAM On-Line Handwriting Database (IAM-OnDB). The network consists of 5 Convolutional Neural Network (CNN) layers, 2 Recurrent Neural Network (RNN) layers, and a final Connectionist Temporal Classification (CTC) layer. The CNN layers extract relevant features from the input image, and a non-linear ReLU activation is applied within these layers. ReLU is preferred over a sigmoid function because its gradient does not saturate for positive inputs, which makes a vanishing gradient less likely. The RNN layers use the Long Short-Term Memory (LSTM) implementation because of its ability to propagate information over long distances in the sequence. During training, the CTC layer is given the RNN output matrix and the ground truth text and computes a loss value; the mean of the loss values over the batch elements is used to train the OCR system. This mean is minimized with an RMSProp optimizer. For inference, the CTC layer decodes the RNN output matrix into the final text. The OCR system reports an accuracy rate of 95.7% on the IAM test dataset, but this accuracy falls to 89.4% on unseen handwritten doctors’ prescriptions.
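To make the inference step more concrete, the following is a minimal sketch of how a CTC layer can turn an RNN output matrix into text. It assumes best-path (greedy) decoding, which is one common option and not necessarily the decoder used in this project. The function name `ctc_best_path_decode`, the toy character set, and the convention that the CTC blank is the last index are all assumptions made for illustration.

```python
from typing import Sequence


def ctc_best_path_decode(probs: Sequence[Sequence[float]], charset: str) -> str:
    """Greedy (best-path) CTC decoding, written in plain Python.

    probs   -- one row per time step; each row holds len(charset) + 1 scores.
               The CTC blank is ASSUMED to be the last entry of each row.
    charset -- characters for label indices 0 .. len(charset) - 1.
    """
    blank = len(charset)

    # Step 1: pick the most likely label at every time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]

    # Step 2: collapse consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(charset[label])
        previous = label
    return "".join(decoded)


# Toy RNN output: 6 time steps over the hypothetical charset "ab" plus blank.
example = [
    [0.8, 0.1, 0.1],  # a
    [0.6, 0.2, 0.2],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.7, 0.2, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
    [0.2, 0.7, 0.1],  # b (repeat, collapsed)
]
print(ctc_best_path_decode(example, "ab"))  # prints: aab
```

The blank label is what allows a doubled character to survive decoding: without the blank between the second and third time steps, the three `a` frames would collapse into a single `a`.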