History and the future: Deep-learning-based OCR
19 January 2021

Written by Jiangbo Yu, AI-R&D Lead


An introduction to OCR

Optical Character Recognition (OCR) is a technology that automatically recognises text in images or videos and converts it into machine-readable, structured character information. It acts as the "computer's eye", an essential basis for machine interaction with the real world.

OCR technology is known as the most "grounded" artificial intelligence technology. It is widely used in the financial sector, government services and industries such as logistics, healthcare and education. In the wave of digitalisation, OCR plays a vital role in enterprises' digital transformation. It improves information collection efficiency, reduces labour costs, and significantly accelerates industrial transformation.

Scientists first proposed the concept of OCR in 1929, and in-depth research began worldwide in the 1960s and 1970s. In the early phase of OCR research, however, scientists mainly focused on "number recognition". With the development of the internet and the emergence of deep learning algorithms, OCR has improved in leaps and bounds as models are "trained" on massive amounts of accumulated data.

The scope of OCR usage has broadened significantly in modern times. In this blog, let's discover the difference between "Deep OCR" (deep-learning-based OCR) and "Traditional OCR" (the early-phase OCR technology that was not combined with deep learning algorithms).

  • Traditional OCR

    Prior to AlexNet's win in the ImageNet competition, traditional Computer Vision (CV) technology dominated OCR research. At that time, a standard OCR processing flow consisted of image pre-processing[1], text (character) detection (a "bottom-up" process)[2], character segmentation, character recognition[3], and post-processing of the recognition results.

    Because of the limitations of traditional CV algorithms, traditional OCR only performs well on good-quality printed documents. It lacks versatility and requires a lot of manual fine-tuning to adapt to different business scenarios. Its text recognition performance in complex scenarios (e.g., low resolution, blurred images, image degradation) is less than ideal.

  • Deep OCR

    Deep learning algorithms have made great progress in image classification and object recognition research in the 21st century.

    In 2012, scientists introduced deep learning algorithms to OCR research, using Convolutional Neural Networks (CNNs) to replace traditional manual feature design. Deep OCR methods fall into two kinds: the "independent two-stage method" and the "end-to-end text recognition method".

    The "independent two-stage method" models text detection and text recognition separately. The text detection module is mainly responsible for detecting text regions and their orientation. Commonly used text detection algorithms include classical object detection algorithms (SSD), segmentation algorithms (PixelLink) and dedicated text detection algorithms (CTPN). Different algorithms have different characteristics. For example, segmentation-based algorithms are not limited by text size or shape, but they are prone to text line adhesion, merging text lines that lie close to each other.

    The text recognition module performs text recognition on the detected text boxes. The primary method is to extract image features from the text region with a CNN and feed them to LSTM+CTC or an attention mechanism to recognise text of variable length.

    The CTC technique can effectively capture context dependency in the input sequence and solve the alignment between image features and text characters. However, because CTC decoding is ambiguous, a large amount of training data is required to reach the desired accuracy.

    Text recognition based on the attention mechanism is naturally interpretable. It effectively improves OCR models' feature representation capability by learning weights that locate the corresponding feature vectors.

    The "end-to-end text recognition method" integrates text detection and recognition into one network for training, which shares weights better, optimises the overall model, and avoids the information loss of the independent two-stage method.

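The two recognition approaches described above can be illustrated with a minimal sketch. It assumes a recogniser that outputs one probability vector per image column, a blank symbol written as "-", and greedy (best-path) decoding; production systems may use beam search and learned attention scores instead. All names here are illustrative and do not come from any particular OCR library.

```python
import math

BLANK = "-"  # assumed blank symbol; real models reserve an index for it


def ctc_greedy_decode(frame_probs, alphabet):
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)]
                 for p in frame_probs]
    decoded = []
    previous = None
    for symbol in best_path:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def attention_context(query, features):
    """Weight each feature vector by softmax(dot(query, feature))."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(len(query))]
    return weights, context


# Six frames whose best path is "c c - a t t".
alphabet = [BLANK, "c", "a", "t"]
frames = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.1, 0.1, 0.1, 0.7],
]
print(ctc_greedy_decode(frames, alphabet))  # prints "cat"
```

Many different frame paths (for example "c c - a t t" and "c - a a t -") collapse to the same string, which is one way to see the decoding ambiguity mentioned above. In `attention_context`, the returned weights show which feature vectors the model attends to for a given query, which is where the interpretability comes from.
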
OCR challenges

The emergence of deep learning algorithms has significantly advanced OCR technology. However, OCR's cognitive ability still cannot match human ability, particularly in complex usage scenarios.

Everyday use scenarios are complex and variable. Factors that can affect OCR's recognition ability include complex backgrounds, low resolution, character distortion, multilingual mixing, image degradation, and damaged or deformed characters. The growing number of OCR-embedded applications also places higher demands on OCR performance than before. For example, cloud-based OCR requires low latency and high concurrency, while mobile-based OCR requires compatibility and operational efficiency.

Although Deep OCR can recognise text better, business scenarios also require us to structure the text in a picture (e.g., the text of a card, the digits of a form) and to improve OCR technology with document format analysis. Fortunately, after 2017, Natural Language Processing (NLP) technology was introduced into OCR. The combination of OCR and NLP gives OCR technology the ability to truly understand text content, and the association of semantic information can improve end-to-end OCR solutions.
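
To make "structuring the text" concrete, here is a minimal sketch that turns recognised text lines into named fields. It is a simple rule-based stand-in, not the NLP-based approach described above, and the card layout, the labels in `FIELD_LABELS` and the function name are all hypothetical.

```python
import re

# Hypothetical label-to-field mapping for an illustrative ID card layout.
FIELD_LABELS = {
    "name": "name",
    "id no": "id_number",
    "date of birth": "date_of_birth",
}

# Matches lines of the form "Label: value", tolerating stray spaces.
LINE_PATTERN = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def structure_lines(ocr_lines):
    """Turn raw 'Label: value' OCR lines into a field dictionary."""
    record = {}
    for line in ocr_lines:
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # skip lines without a label/value separator
        field = FIELD_LABELS.get(match.group("label").lower())
        if field:
            record[field] = match.group("value")
    return record


sample = [
    "REPUBLIC OF EXAMPLE",
    "Name: JANE DOE",
    "ID No : 1234 5678",
    "Date of Birth: 01-02-1990",
    "Blood Type: O",
]
print(structure_lines(sample))
```

Hand-written rules like these have to be maintained per document type, which is the kind of manual effort that layout analysis and semantic models aim to reduce.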

Breakthroughs of ADVANCE OCR

ADVANCE OCR, developed independently by ADVANCE.AI, is widely used in ID document recognition and business form recognition. It also provides mobile-OCR quality inspection services.

ADVANCE OCR has earned a sound reputation, from Southeast Asia to the global market, for being quick, highly accurate and reliable. However, the research and development of ADVANCE OCR has not been an easy journey. As we know, ID documents in Southeast Asia come in great variety, usually with low-quality print, and the majority of local people still use low-end phones. All of this places high demands on the scalability of our OCR research.

ADVANCE.AI is committed to providing top-quality OCR service. The breakthroughs of ADVANCE OCR include:

  • Streamlined OCR development: ADVANCE.AI has developed an in-house auto OCR system that brings data annotation, model development and complex data mining together into a complete ecosystem after data collection, accelerating model iteration.

  • Automatic layout analysis replaces traditional manually designed rules to support various cards, reducing the reliance on algorithm staff and providing greater scalability.

  • An OCR quality inspection SDK completes the whole OCR process. The SDK covers the entire process from user photo capture to cloud recognition, enabling more flexible control of capture quality, meeting users' customisation needs and improving the quality of OCR services.

The future of Deep OCR

As a part of the "computer's eye", OCR will be used in more fields as the technology develops. When OCR is combined with NLP technology, the machine will be able to "understand" the text content and structure the text better than before.

In the future, the carriers of OCR services will become more diversified, including terminals such as smartphones and smart electronics as well as cloud services. As a result, the cost of use will fall.

At present, Deep OCR is widely used in traditional fields such as card recognition, receipt recognition, automobile-related recognition (e.g., driver's license, driving license, license plate) and business form recognition. In emerging industries such as internet advertising, Deep OCR has also begun to serve image content extraction, advertisement review and user understanding analysis.

There is still room to improve end-to-end OCR technology. When computers can "understand" real-world text content better than they do today, they will gradually take over tedious manual data entry tasks.


[1] Note: The pre-processing of OCR includes image correction, geometric transformation (perspective, distortion, rotation), deblurring and light correction.

[2] Note: Traditional OCR text detection followed two directions, connected domain and sliding window, both based on manually designed features.

[3] Note: Character recognition technology of OCR includes image classification and template matching.
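
As an illustration of the connected domain approach in note [2], the following minimal sketch groups touching "ink" pixels of a binarised image into regions and returns one bounding box per region. It assumes the image is a list of rows with 1 for ink and 0 for background, and uses 4-connectivity; the function name is hypothetical, and real systems would add filtering and merging rules on top.

```python
from collections import deque


def find_text_boxes(binary_image):
    """Return (top, left, bottom, right) boxes of 4-connected ink regions."""
    height = len(binary_image)
    width = len(binary_image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for row in range(height):
        for col in range(width):
            if binary_image[row][col] != 1 or seen[row][col]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(row, col)])
            seen[row][col] = True
            top, left, bottom, right = row, col, row, col
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and binary_image[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(find_text_boxes(image))  # prints [(0, 0, 1, 1), (1, 3, 2, 4)]
```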
