Artificial intelligence makes it possible to process utility bills automatically, extracting the key information they contain for use in a variety of processes.
This article shows how to do so, focusing on the supply and consumption information that bills carry and that many processes depend on. The main aspects covered in the article are summarized in the following list:
Let's start with the scenarios in which processing bills automatically pays off.
Having the information contained in utility bills available in a structured format proves useful in many processes and application scenarios, for example:
Bills fall into the category of semi-structured documents: each provider is free to define its own format. The various formats typically contain a very similar, if not identical, set of information. Even a single provider may use different formats depending on the type of supply, and may change them over time. Given the complexity of the documents involved and the large number of different formats, numerous complications arise that penalize traditional solutions, limiting their accuracy and thus the degree of automation of the entire system. The following are some examples:
These are just some of the problems associated with automated bill processing. The rest of the article describes methodological alternatives for solving the problem. For ease of discussion, it deals with Italian energy bills, without loss of generality.
The most important information to extract concerns consumption, supply details, and the contract holder. These values typically appear in different formats and units. The amount of relevant information complicates the use case further: more than 25 different fields are involved. The main fields of interest are: tariff, type of consumption, total consumption, cost of the energy commodity, reporting period, POD, total payable, provider data, consumption bands (such as f0, f1, f2, f3), recipient and holder data, and supply data (voltage, committed power, etc.).
To process such a document properly, therefore, several capabilities must work together: extraction of key-value information, interpretation of tabular data, and classification of the bill type (gas, electricity, etc.).
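To make the idea of a structured target format concrete, here is a minimal Python sketch of a record holding a handful of the fields listed above. The class name, field names, and the `parse_italian_amount` helper are illustrative assumptions and do not come from any specific product. The helper assumes the usual Italian number convention (dot as thousands separator, comma as decimal separator), and the POD value in the usage lines is a placeholder, not a real code.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class EnergyBill:
    """Hypothetical target schema covering a subset of the bill fields."""
    pod: str                                   # point-of-delivery identifier
    tariff: Optional[str]
    period_start: date                         # reporting period
    period_end: date
    total_consumption_kwh: Decimal
    # Consumption per band, keyed by band label such as "f1", "f2", "f3".
    consumption_by_band_kwh: Dict[str, Decimal] = field(default_factory=dict)
    energy_cost_eur: Optional[Decimal] = None
    total_payable_eur: Optional[Decimal] = None


def parse_italian_amount(text: str) -> Decimal:
    """Normalize an amount such as '1.234,56' or '€ 87,10' to a Decimal.

    Assumes '.' is a thousands separator and ',' the decimal separator.
    """
    cleaned = text.strip().replace("€", "").replace(" ", "")
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


# Usage with made-up values.
bill = EnergyBill(
    pod="POD-PLACEHOLDER",
    tariff=None,
    period_start=date(2023, 1, 1),
    period_end=date(2023, 1, 31),
    total_consumption_kwh=Decimal("250"),
    total_payable_eur=parse_italian_amount("1.234,56"),
)
```

Normalizing amounts and units at this stage means every downstream process receives the same representation, whatever format the original bill used.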
Manual extraction of data from energy bills (but the same is true for any bill) is costly, time-consuming, and error-prone. The processing steps require skilled people who can identify relevant information in the document and extract it consistently from sometimes complex layouts. Some challenges and issues related to manual processing include:
Processing bills with traditional OCR techniques combined with template matching or regular expressions is ill-advised and wasteful, because each document type needs its own ad hoc set of rules and templates. The formats are many, and the number of providers is not known in advance. A solution that must serve processes with global reach also has to handle numerous languages. As a result, the number of rules or templates required is large and keeps changing as new formats and countries are added. All this leads to high setup and maintenance costs and often to poor performance. In addition, the solution must be configured and maintained by technically trained staff.
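The brittleness of rule-based extraction can be illustrated with a small sketch. The two bill snippets and the label wording below are invented for illustration. The rule is written for the first layout only, so it silently finds nothing when a second provider phrases the same field differently.

```python
import re
from typing import Optional

# Rule written for one hypothetical layout: "Totale da pagare: 123,45 €".
TOTAL_RULE = re.compile(r"Totale da pagare:\s*([\d.]+,\d{2})\s*€")


def extract_total(text: str) -> Optional[str]:
    """Return the amount captured by the rule, or None if the rule does not match."""
    match = TOTAL_RULE.search(text)
    return match.group(1) if match else None


bill_a = "Totale da pagare: 123,45 €"          # layout the rule was written for
bill_b = "Importo totale fattura EUR 123,45"   # same field, different wording

print(extract_total(bill_a))  # 123,45
print(extract_total(bill_b))  # None
```

Covering the second layout means writing and maintaining another rule, and the same happens for every new provider, format revision, and language.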
In general, all the problems presented in the use case description affect both manual and traditional approaches. This has created the need for higher-performance solutions that address the complications described so far. Thanks to recent developments in AI, and in Deep Learning in particular, higher-quality results can be achieved while time and cost are lowered at each step of the pipeline. The improvements range from OCR that can learn, improve over time, and transcribe even handwritten documents, to semantic analysis and the interpretation of tabular data (and much more). The set of techniques based on artificial neural networks for comprehensive document processing is commonly called Intelligent Document Processing (IDP).
A modern approach based on Deep Learning techniques is the best choice for solving such problems. Combining Computer Vision techniques for document analysis and reading with NLP for natural language understanding makes it possible to solve the problems described above. There is no need to adapt the solution each time by writing new rules or configuring new templates: a sufficient amount of data from the process is enough to train the system.
Another advantage is the ability to apply the same approach to different tasks, such as key-value data extraction, tabular data extraction, and document classification. Such an approach also benefits strongly from a human validation step, which not only corrects the errors made by the system but also enables continuous learning. In this way, the algorithm improves over time and calibrates itself to the specific process.
Compared to traditional solutions, maintaining and evolving the system is also simpler. Adding a new field to extract, a new document category to classify, or a new supported language does not involve writing code. Collecting new documents is sufficient, and the subsequent retraining can easily be carried out even by non-technical staff. Finally, the most effective IDP solutions achieve accuracy that far surpasses traditional approaches.
myBiros is a performant, easy-to-use and versatile Intelligent Document Processing solution that enables automatic document processing. Core functionalities are information extraction and automatic document classification. All this is offered through a prebuilt set of ready-to-use APIs with pre-trained templates for common use cases and the ability to retrain the entire pipeline (both the OCR engine and the document interpretation system) for custom cases.
By leveraging advanced deep learning techniques that analyze multimodal features, it is possible to process all document types with a single solution. The system uses pre-trained models and data-augmentation techniques, so it can be trained with a small volume of data, which allows even processes involving few documents to be automated.
The solution includes a scoring mechanism that reduces false positives: low-confidence data can be sent for review, minimizing errors. Interaction with a human user lets errors be corrected while the system continues to train, so that past mistakes are not repeated (human in the loop and continuous learning). Finally, the high scalability of the cloud-based architecture makes it possible to process highly variable volumes of documents without having to allocate expensive resources in advance.
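The review step driven by confidence scores can be sketched as follows. This is a generic illustration of confidence thresholding, not the myBiros API: the `ExtractedField` type, the `route_fields` function, and the 0.85 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExtractedField:
    """Hypothetical extraction result: a value plus the model's confidence."""
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def route_fields(
    fields: List[ExtractedField], threshold: float = 0.85
) -> Tuple[Dict[str, str], List[ExtractedField]]:
    """Accept fields at or above the threshold; queue the rest for human review."""
    accepted: Dict[str, str] = {}
    to_review: List[ExtractedField] = []
    for item in fields:
        if item.confidence >= threshold:
            accepted[item.name] = item.value
        else:
            to_review.append(item)
    return accepted, to_review


# Usage with made-up values.
results = [
    ExtractedField("total_payable", "123,45", 0.98),
    ExtractedField("tariff", "example tariff", 0.61),
]
accepted, to_review = route_fields(results)
```

Values corrected by the reviewer can then be stored as new training examples, which is what makes continuous learning possible. The threshold trades review workload against the risk of accepting a wrong value, so it would need tuning per process.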
Additional features include the ability to process tabular data, to identify artifacts present in the image, and to process heterogeneous and multi-language documents with a single pipeline.
The features mentioned so far enable myBiros to perform optimally in bill processing by identifying all relevant information effectively and quickly. If you are curious about how myBiros simplifies bill processing, please contact us and try our demo. We are ready to help you!