This article describes how to process identity documents automatically with myBiros, focusing on data extraction. The service is useful in many application scenarios, such as customer onboarding, where the automatic processing of identity documents simplifies the customer’s registration process.
Use case description
Identity documents (identity card, passport, driver’s license, health card) are classified as structured documents. For this reason, they can be processed with traditional solutions based on rules and templates applied to the output of an OCR engine.
However, several complications prevent conventional solutions from fully automating this use case:
- Document acquisition is not always guided, which can produce low-quality or rotated images that are hard to read. Rule-based approaches break down on such input;
- Even if a single identity document is easy to interpret, every country issues at least one. A worldwide use case involves hundreds of different documents, and traditional solutions need a separate set of rules and templates for each;
- Documents change format over time, which requires additional rules and templates.
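To make the fragility of rules and templates concrete, here is a minimal Python sketch of a single template rule applied to OCR output. The rule, the sample OCR strings and the function name are all invented for illustration; they are not taken from myBiros or from any real OCR engine.

```python
import re
from typing import Optional

# Hypothetical template rule: the surname is whatever follows the
# label "COGNOME" on the same line of the OCR output.
SURNAME_RULE = re.compile(r"^COGNOME[ :]+(?P<value>[A-Z' ]+)$", re.MULTILINE)


def extract_surname(ocr_text: str) -> Optional[str]:
    """Return the surname found by the template rule, or None if it fails."""
    match = SURNAME_RULE.search(ocr_text)
    return match.group("value").strip() if match else None


# Made-up OCR outputs for the same card under different conditions.
clean_scan = "COGNOME: ROSSI\nNOME: MARIO"                    # layout the rule expects
noisy_scan = "C0GNOME\nROSSI\nNOME: MARIO"                    # misread label, split line
new_layout = "COGNOME / SURNAME\nROSSI\nNOME / NAME\nMARIO"   # reissued format

results = {
    "clean": extract_surname(clean_scan),
    "noisy": extract_surname(noisy_scan),
    "new_layout": extract_surname(new_layout),
}
print(results)
```

The rule only succeeds on the clean scan. Covering the other two cases would require more rules, and the same effort would have to be repeated for every field and every document type.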
Deep learning techniques can address these problems without the need to create rules and templates. The following sections show how myBiros does it.
The article focuses on the use case of the Italian identity card.
Build a use case with myBiros
The deep learning techniques used by myBiros follow a data-driven approach that avoids the need for rules and templates. To extract information, myBiros analyzes not only the position of a field but also its semantics, geometry and layout.
Creating a use case with myBiros is straightforward and consists of the following steps:
- Collection of documents;
- Data annotation;
- Service release and performance test.
Collection of documents
The first step is to collect a small sample of reference documents, in this case 10 Italian identity cards. Documents must be collected because AI algorithms learn from training data.
Data annotation
Training data is generated during the annotation stage. With the myBiros no-code interface, you mark the information of interest by simply clicking on the data you want to extract. In addition, the myBiros AI suggests information that may be of interest, which speeds up the process.
At this point, the algorithm is trained on the data prepared during the annotation phase. This stage is fully automatic, and within a few hours you get the newly trained model. For training, myBiros offers two options: train the AI algorithm from scratch, or start from one of the preconfigured models to speed up training and increase accuracy.
During this step, the model's evaluation metrics can be monitored to check the accuracy of the extracted data.
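The article does not say which evaluation metrics myBiros reports. As one plausible illustration, the sketch below computes per-field exact-match accuracy between extracted values and annotated values. The field names, the sample records and the normalization rule are assumptions made for this example.

```python
def normalize(value: str) -> str:
    """Ignore case and surrounding or repeated whitespace when comparing."""
    return " ".join(value.split()).casefold()


def field_accuracy(predictions: list, ground_truth: list) -> dict:
    """Per-field exact-match accuracy over paired lists of documents."""
    correct: dict = {}
    total: dict = {}
    for predicted, expected in zip(predictions, ground_truth):
        for field, expected_value in expected.items():
            total[field] = total.get(field, 0) + 1
            # A field the model did not return counts as an empty value.
            predicted_value = predicted.get(field, "")
            if normalize(predicted_value) == normalize(expected_value):
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / total[field] for field in total}


# Invented annotations and model outputs for two documents.
ground_truth = [
    {"surname": "ROSSI", "name": "MARIO", "birth_date": "01/01/1980"},
    {"surname": "BIANCHI", "name": "ANNA", "birth_date": "15/06/1992"},
]
predictions = [
    {"surname": "Rossi ", "name": "MARIO", "birth_date": "01/01/1980"},
    {"surname": "BIANCHI", "name": "ANNA", "birth_date": "15/06/1997"},
]

scores = field_accuracy(predictions, ground_truth)
print(scores)
```

Reporting accuracy per field rather than as one overall number makes it easier to see which fields need more annotated examples.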
Service release and performance test
At the end of training, the service can be tested through a simple interface that shows the results, which lets you evaluate the performance of the model. The interface also exposes the new API for the created use case and shows a few examples of possible integration.
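An integration with a remote extraction API could look roughly like the Python sketch below. It does not use the real myBiros API: the endpoint URL, the authorization header, the JSON payload and the response shape are all placeholders assumed for illustration. The actual values are the ones shown in the service interface.

```python
import base64
import json
import urllib.request

# Placeholder values, NOT the real myBiros endpoint or credentials.
API_URL = "https://api.example.com/v1/extract"
API_KEY = "YOUR_API_KEY"


def build_extraction_request(image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one document image."""
    # Assumed payload shape: the image as base64 text inside a JSON object.
    payload = json.dumps(
        {"document": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def parse_extraction_response(body: bytes) -> dict:
    """Flatten an assumed response shape into {field_name: value}."""
    data = json.loads(body)
    return {item["name"]: item["value"] for item in data.get("fields", [])}


request = build_extraction_request(b"fake image bytes")

# Invented response body, used here instead of a real network call.
sample_response = b'{"fields": [{"name": "surname", "value": "ROSSI"}]}'
fields = parse_extraction_response(sample_response)
print(request.get_method(), request.full_url, fields)
```

With a real endpoint, the request would be sent with `urllib.request.urlopen(request)` and the response body passed to the parsing function.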
Results and benefits
The result is an artificial intelligence model that can recognize the relevant information specified in the training phase. Everything is encapsulated in a remote API.
The benefits of setting up a use case for extracting data from documents are:
- fewer errors;
- elimination of repetitive and alienating activities;
- time and cost savings;
- more accurate and reliable data;
- lean processes.