Applied artificial intelligence: classification and error correction

In their work processes, our customers always need to be able to record documents from external sources in the document management system (DMS). The documents also need to be classified and indexed, which is a manual operation in most processes and is therefore expensive. Because manual work is also prone to errors, demand for automating it is high.

In 2005, Hyarchis started testing software for automatic document classification, at that time using an external system. A drawback of this software was that it could not handle the wide variety of documents, so its limitations quickly became apparent. Over the years, Hyarchis has tried out various systems and applied some of them to particular areas. Each of these systems has its advantages and disadvantages.

With the knowledge it had acquired of the various packages and, especially, having seen what doesn’t work, Hyarchis started a Proof of Concept (PoC) for document classification and data mining of key information in the mortgage process. A prerequisite for this PoC was the ability to distinguish between a deed for a closed-end mortgage (vaste hypotheek) and a deed for an open-end mortgage (bankhypotheek). The difference between these two forms of deed is very small, and they can only be told apart using very sophisticated logic.

For this PoC, Hyarchis developed its own technology and logic, which it devised by putting 3 million documents (13 million pages in total) through various systems. After determining the type of mortgage, the system searches for the effective date of the deed and the associated contract number and checks these values against known values in the external customer systems. With the enhanced logic of this “cross-system consistency check”, we were able to guarantee the reliability of the information found.
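The steps described above (determine the deed type, find the effective date and contract number, then compare them with known values) can be illustrated with a minimal Python sketch. It is not the Hyarchis implementation: the keyword cues, the dd-mm-yyyy date format, the "Contract number:" label and the `known_records` dictionary standing in for an external customer system are all assumptions made for illustration, and the real distinction between the two deed types requires far more sophisticated logic than a keyword match.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple

# Hypothetical keyword cues; the actual classification logic is not public.
OPEN_END_CUES = ("bankhypotheek",)
CLOSED_END_CUES = ("vaste hypotheek",)

# Assumed formats: dates as dd-mm-yyyy, contract numbers after a fixed label.
DATE_RE = re.compile(r"\b(\d{2})-(\d{2})-(\d{4})\b")
CONTRACT_RE = re.compile(r"[Cc]ontract number:\s*([A-Z0-9-]+)")


@dataclass
class Extraction:
    deed_type: Optional[str]
    effective_date: Optional[date]
    contract_number: Optional[str]


def classify_deed(text: str) -> Optional[str]:
    """Return 'open-end', 'closed-end', or None if no cue is found."""
    lowered = text.lower()
    if any(cue in lowered for cue in OPEN_END_CUES):
        return "open-end"
    if any(cue in lowered for cue in CLOSED_END_CUES):
        return "closed-end"
    return None


def extract(text: str) -> Extraction:
    """Classify the deed and pull out the first date and contract number."""
    effective_date = None
    date_match = DATE_RE.search(text)
    if date_match:
        day, month, year = (int(part) for part in date_match.groups())
        try:
            effective_date = date(year, month, day)
        except ValueError:
            effective_date = None  # e.g. 31-02-2015 is not a real date
    contract_match = CONTRACT_RE.search(text)
    contract_number = contract_match.group(1) if contract_match else None
    return Extraction(classify_deed(text), effective_date, contract_number)


def consistency_check(
    extraction: Extraction,
    known_records: Dict[str, Tuple[str, date]],
) -> bool:
    """Accept the extraction only if it matches the externally known record."""
    if None in (
        extraction.deed_type,
        extraction.effective_date,
        extraction.contract_number,
    ):
        return False
    known = known_records.get(extraction.contract_number)
    return known == (extraction.deed_type, extraction.effective_date)
```

In this sketch, a document is only treated as reliably indexed when all three extracted values agree with the record held in the other system; anything else would be routed to manual review.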

Where can this process be used?

In this PoC, we used this process to enrich existing data. The system could, however, also be used to verify data that has already been gathered, and so to correct classification and indexing errors.
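As a rough illustration of the verification use case, the sketch below compares index values already stored in a DMS with freshly extracted values and lists the fields that disagree. The function name and the field names are hypothetical, and whether a mismatch should be corrected automatically or reviewed by a person is a process decision this sketch leaves open.

```python
from typing import Dict, List, Optional, Tuple


def find_index_corrections(
    stored: Dict[str, str],
    extracted: Dict[str, str],
) -> List[Tuple[str, Optional[str], str]]:
    """Return (field, stored_value, extracted_value) for each disagreement.

    A field missing from `stored` is reported with a stored value of None.
    """
    corrections = []
    for field, new_value in extracted.items():
        old_value = stored.get(field)
        if old_value != new_value:
            corrections.append((field, old_value, new_value))
    return corrections


# Hypothetical example: the stored document type was indexed incorrectly.
stored_index = {
    "document_type": "closed-end mortgage deed",
    "contract_number": "H-12345",
}
extracted_index = {
    "document_type": "open-end mortgage deed",
    "contract_number": "H-12345",
}
proposed = find_index_corrections(stored_index, extracted_index)
```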

The larger the volume of documents the system can analyse, the more it learns, and with this enhanced logic the better it can process the document flow automatically. Having people look through 13 million pages would be a Herculean task. What’s more, people make mistakes that an automated system simply would not.

What are the advantages of this system for the customer?

Two major advantages of this process are lower costs and time savings. To save even more time, multiple processes can be run in parallel. Besides the applications mentioned above for this PoC, there are many more possibilities. Anything related to classification, data extraction, lookup, verification and making connections using the underlying logic can be carried out automatically.

Can this also be carried out using the DMS full-text search option?

No. Searching for text is just one part of the process. This is a system into which enhanced logic and additional process steps have been built. The PoC has been so successful that we have decided to develop the tools further for use in a new module for the Hyarchis DMS solution. If you have any questions, we’d be happy to discuss them with you.