In the ‘AI on the road’ section, our Artificial Intelligence (AI) expert Krijn Logister looks at questions customers ask him about AI during his meetings with them. In this edition: ‘What’s so AI about Optical Character Recognition (OCR)?’

Krijn, what’s so AI about OCR when OCR has been around for years?
This is indeed a question I’m often asked by organisations I visit to explain what Artificial Intelligence can do for their archive. OCR, the technology that converts records into readable and searchable text, is not new; it has been around for years. What is different today is that present-day OCR applications integrate AI technology, which makes it possible to search even the oldest archives quickly. Because it uses machine learning, today’s OCR engine is considerably more accurate and faster than it was several years ago.

Could you give us an example to explain this?
We provide our OCR/AI application, Hyarchis Search-It, to various organisations in the mortgage industry. Mortgage cases are kept on file for a long time; twenty to thirty years is quite normal. The documents in these archives were often scanned at a time when fewer technological aids were available and there was less quality control. As a result, they often have flaws, such as coffee stains, that make them harder to convert into searchable text. The human eye has no difficulty seeing through these flaws, but a computer struggles.

So, where does Artificial Intelligence come in?
Hyarchis Search-It focuses not only on the OCR step itself, but also on preprocessing and postprocessing. The tool recognises documents, automatically determines what each one needs for optimal processing, and optimises it for OCR. The text is then recognised and a ‘blind’ OCR layer is applied to the document. Finally, the document is put back together, all without making any changes to the original record.
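To make the three stages concrete, here is a minimal Python sketch of a preprocess, recognise, postprocess pipeline that leaves the original scan untouched. It is an illustration only, not how Search-It is implemented: the names `binarize`, `fix_common_errors`, `process`, `OcrResult` and `fake_engine` are all hypothetical, the "scan" is a tiny grid of grayscale values, and the OCR engine is a stub standing in for a real one.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # grayscale pixels: 0 = black, 255 = white


@dataclass(frozen=True)
class OcrResult:
    original: Image   # the scan exactly as it came in
    text_layer: str   # searchable text kept alongside the scan


def binarize(image: Image, threshold: int = 128) -> Image:
    """Preprocessing: return a cleaned copy in which light stains turn white."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def fix_common_errors(text: str) -> str:
    """Postprocessing: repair a few typical OCR confusions (assumed examples)."""
    corrections = {"m0rtgage": "mortgage", "1oan": "loan"}
    return " ".join(corrections.get(word, word) for word in text.split())


def process(image: Image, engine: Callable[[Image], str]) -> OcrResult:
    """Run the three stages; only a copy of the image is ever modified."""
    cleaned = binarize(image)      # builds new lists, input is not mutated
    raw_text = engine(cleaned)     # recognition happens on the cleaned copy
    return OcrResult(original=image, text_layer=fix_common_errors(raw_text))


def fake_engine(image: Image) -> str:
    """Stub for a real OCR engine; always returns the same noisy text."""
    return "m0rtgage 1oan agreement"


scan = [[255, 200, 30], [180, 20, 255]]  # 200 and 180 play the coffee stain
result = process(scan, fake_engine)
print(result.text_layer)
```

In this sketch the stain-coloured pixels are pushed to white before recognition, the stubbed recognition errors are corrected afterwards, and `result.original` still holds the scan as it was delivered.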

What’s the ultimate purpose of this kind of OCR?
AI helps to make the most of the documents’ content. Search-It lets you make smart use of AI, offering a relatively easy and fast way to make your archive fully searchable. This, in turn, lets you use your content in an intelligent way. Knowing exactly what you have in your archive, you can start turning the complex data into insights.

Let’s go back to the practical side. How and where do you use this technology?
Take Quion, a large company that handles administrative processes for the mortgage industry. They manage complete mortgage cases for lenders, which means archives containing millions of documents. AI is used to convert the still unstructured content of these documents into structured data. That data, in turn, offers a wide range of possibilities for further service optimisation. Hyarchis and Quion have put together a roadmap on which Search-It is the first project, followed by testing of the Hyarchis Classify tool.

And what does Quion get out of this in the end?
Whether the request is to add a home construction account or to send a mortgage case, all emails and attachments are opened and the right process is triggered. Our AI tools make all documents searchable, after which Hyarchis Classify classifies them and automatically assigns them to the right workflow. The proof of concept with Hyarchis Search-It has since been approved. It will go into production in the second quarter of 2020.
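The idea of classifying a document and handing it to the right workflow can be sketched in a few lines of Python. This is a deliberately simple, keyword-based stand-in and not the way Hyarchis Classify works; a production classifier would typically be a trained model. The labels, keywords and workflow names below are all assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical keywords per document class (illustrative only)
KEYWORDS: Dict[str, List[str]] = {
    "construction_account": ["construction account", "building deposit"],
    "case_request": ["send the mortgage case", "request case file"],
}

# Hypothetical workflow names per document class
WORKFLOWS: Dict[str, str] = {
    "construction_account": "add-construction-account",
    "case_request": "send-mortgage-case",
}


def classify(text: str) -> str:
    """Return the class whose keywords occur most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        label: sum(lowered.count(keyword) for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] > 0 else "unknown"


def route(text: str) -> str:
    """Map the searchable text of a document to a workflow name."""
    return WORKFLOWS.get(classify(text), "manual-review")


print(route("Please add a construction account to this mortgage."))
print(route("Could you send the mortgage case to our office?"))
print(route("Happy holidays"))
```

Anything the sketch cannot classify falls back to a manual review queue, so an unrecognised email is never silently dropped.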

Can we conclude that the current OCR tools are definitely a form of AI?
Yes, we can. The AI technology in OCR tools ensures better and faster recognition and reading of document content. In addition, AI lets you turn data into knowledge. So, yes, the answer to the question is that present-day OCR functionality is indeed AI.

Author: Krijn Logister

