How to process energy bills automatically

Artificial intelligence now makes it possible to process energy bills automatically. The technology extracts the key information from each bill and makes it available in a structured form to the processes that depend on it.

Francesco Cavina
CEO & Co-Founder

This article explains how to process bills automatically, specifically focusing on extracting key information related to supply and consumption, which is useful for numerous processes. The main aspects covered in the article are summarized as follows:

  • Application scenarios
  • Description of the use case and its challenges
  • Relevant information to extract
  • Processing alternatives
  • myBiros and its benefits
  • Conclusions

Let's dive into the scenarios where automatic bill processing can be especially beneficial.

Application scenarios

Having the information from energy bills available in a structured format is beneficial for many processes and application scenarios, including:

  • Bill comparison
  • Signing new energy contracts
  • Processing and checking all received energy bills, so that the data is available to support supply tenders
  • Producing analyses, reports, and KPIs to support energy efficiency initiatives
  • Automatic accounting and reporting of all invoices
  • Assisting in the creation of forecast budgets and monthly expenditure allocations


Description of the use case

Energy bills are semi-structured documents: each provider is free to define its own format. Different formats typically contain a similar set of information, but even a single provider may use different layouts depending on the type of supply, and may change them over time. Given the complexity of these documents and the large number of formats, traditional solutions face numerous challenges that limit their accuracy and, in turn, how much of the process can be automated. Here are some examples:

  • Non-standardized document acquisition: Bills can be acquired from various sources, resulting in heterogeneous formats, such as digital PDFs, scans, or photographs. This diversity complicates processing, especially when documents are captured via smartphones, which often produce blurry or rotated images that are difficult to read. These issues reduce the effectiveness of traditional approaches and limit data entry efficiency. To address this, image normalization steps are required, adding complexity to the pipeline and reducing the flexibility of traditional methods.
  • Variety of bill formats: Each provider, and even each type of supply, has its own bill format. The relevant information is abundant, and its position within the document varies. In an international context, it is also necessary to handle changes in language and units of measurement. The number of formats and field positions to consider can become quite large, further complicating the process. Moreover, bill formats frequently change over time.
  • Data within tables or images: Some crucial information is embedded within tables or images, adding another layer of complexity to the processing.
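As a small illustration of the unit-of-measurement issue mentioned above, the sketch below normalizes electricity readings to kWh before they are compared or stored. The function name `to_kwh` and the conversion table are assumptions made for this example, not part of any specific product. Gas volumes are left out on purpose, since converting them to energy needs a supply-specific coefficient.

```python
# Illustrative conversion table (assumption): watt-hours contained in one unit.
_WH_PER_UNIT = {"wh": 1, "kwh": 1_000, "mwh": 1_000_000}


def to_kwh(value: float, unit: str) -> float:
    """Convert an energy reading to kWh; raise on units the table does not know."""
    key = unit.strip().lower()
    if key not in _WH_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * _WH_PER_UNIT[key] / 1_000


# Readings as they might appear on bills from different providers.
readings = [(1.5, "MWh"), (320, "kWh"), (500, "Wh")]
normalized = [to_kwh(value, unit) for value, unit in readings]
```

Raising on an unknown unit, rather than guessing, keeps a misread unit from silently corrupting consumption figures.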

These are just a few of the challenges associated with automating bill processing. The following sections discuss methodological alternatives for addressing them. For simplicity, the focus is on Italian energy bills, though the discussion applies to other contexts as well.

Relevant information to extract

The most important information to extract from energy bills includes consumption data, supply details, and contract holder information, which may appear in various formats and units of measurement. One of the main challenges is the volume of relevant information, with over 35 different fields to consider. Below is a list of the key information typically extracted:

  • Rate
  • Type of consumption
  • Total consumption
  • Cost of the raw energy component
  • Reference period
  • POD (point of delivery)
  • Total amount due
  • Provider details
  • Consumption by time band (such as F0, F1, F2, F3)
  • Recipient and contract holder details
  • Supply details (voltage, committed power, etc.)

To efficiently process a document of this nature, multiple functionalities must work in synergy, including key-value information extraction, tabular data interpretation, and classification of the bill type (e.g., gas, electricity, etc.).
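To make the target of the extraction concrete, here is a minimal sketch of a structured record for a handful of the fields discussed in this section, together with the bill type produced by classification. The class names, field names, and the consistency check are assumptions chosen for illustration; a real schema would cover many more fields.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, Optional


class BillType(Enum):
    """Output of the classification step (illustrative categories)."""
    ELECTRICITY = "electricity"
    GAS = "gas"


@dataclass
class EnergyBill:
    """A small subset of the fields a bill extraction could return."""
    bill_type: BillType
    pod: str                       # point-of-delivery code of the supply
    provider_name: str
    holder_name: str
    period_start: date
    period_end: date
    total_consumption_kwh: float
    total_to_pay_eur: float
    # Consumption per time band, typically read from a table on the bill.
    consumption_by_band_kwh: Dict[str, float] = field(default_factory=dict)
    committed_power_kw: Optional[float] = None

    def bands_match_total(self, tolerance_kwh: float = 1.0) -> bool:
        """Check that the per-band values add up to the stated total."""
        if not self.consumption_by_band_kwh:
            return True  # nothing to cross-check
        band_sum = sum(self.consumption_by_band_kwh.values())
        return abs(band_sum - self.total_consumption_kwh) <= tolerance_kwh


# Example record with made-up values.
bill = EnergyBill(
    bill_type=BillType.ELECTRICITY,
    pod="IT001E00000000",
    provider_name="Example Energia",
    holder_name="Mario Rossi",
    period_start=date(2024, 1, 1),
    period_end=date(2024, 1, 31),
    total_consumption_kwh=300.0,
    total_to_pay_eur=95.40,
    consumption_by_band_kwh={"F1": 120.0, "F2": 80.0, "F3": 100.0},
)
```

A cross-check such as `bands_match_total` is one simple way to combine key-value and tabular results: values read from a table are validated against a total read elsewhere on the bill.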

Processing alternatives

Manual approach

Manually extracting data from energy bills (or any type of bill) is a time-consuming and error-prone process that can become increasingly costly as a business grows. Qualified personnel are required to consistently identify and extract relevant information from often complex layouts. Some of the challenges associated with manual processing include:

  1. Cost issues
    While manual data extraction may be feasible for small businesses with limited document volumes, it becomes expensive as the business scales. Beyond the costs of hiring additional staff for data extraction, there are hidden costs such as coordination issues when expanding the workforce. As more employees are involved, the chances of errors increase, particularly in identifying and entering data. Data validation, a critical phase of the process, adds further costs. Without verification steps, error rates can reach up to 4%. According to the 1-10-100 rule in data entry, verifying data accuracy at the point of entry costs about $1, correcting errors by rechecking the entire batch costs $10, and undetected errors can cost the company $100 or more.
  2. Time issues
    Manual data extraction is also highly time-consuming, especially in global supply chains where more controls, approvals, and coordination across teams in different countries are required. The involvement of multiple stakeholders across various levels of hierarchy adds complexity to the integration of processing and verification steps, further slowing the workflow.
  3. Human fallibility
    The repetitive and tedious nature of data entry can be demoralizing, increasing the likelihood of mistakes. Additionally, energy bills lack a standard format. While the essential information is present in all bills, each vendor uses a different layout with significant spatial variability, making manual extraction difficult. Linguistic variations between the place of issue and the place of delivery add another layer of complexity to data comprehension. These factors combined increase the chances of introducing errors into the process.
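The 1-10-100 rule cited under cost issues can be turned into a rough back-of-the-envelope comparison. The sketch below is one possible reading of the rule, assuming $1 per record verified at entry and $100 per error that goes undetected; the function names and the volume of 10,000 records are hypothetical.

```python
def cost_verify_at_entry(n_records: int, cost_per_record: float = 1.0) -> float:
    """Cost of verifying every record at the point of entry."""
    return n_records * cost_per_record


def cost_undetected_errors(
    n_records: int, error_rate: float, cost_per_error: float = 100.0
) -> float:
    """Expected cost when no verification is done and errors go undetected."""
    return n_records * error_rate * cost_per_error


def break_even_error_rate(
    cost_per_record: float = 1.0, cost_per_error: float = 100.0
) -> float:
    """Error rate above which verifying at entry is the cheaper option."""
    return cost_per_record / cost_per_error


n_records = 10_000
verified = cost_verify_at_entry(n_records)
unverified = cost_undetected_errors(n_records, error_rate=0.04)
```

Under these assumptions, verifying 10,000 records costs about $10,000, while leaving a 4% error rate undetected costs about $40,000. Verification pays for itself as soon as the error rate exceeds 1%.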

Traditional OCR solutions

Using traditional OCR techniques combined with template matching or regular expressions for bill processing is costly and best avoided. This approach requires a specific set of rules and templates for each document type, which quickly becomes unsustainable. There are numerous formats, and the number of vendors is not known in advance, especially for global operations. With multiple languages to handle as well, the set of rules and templates grows very large and keeps changing as new formats and countries are introduced. This leads to high setup and maintenance costs, and the performance of such systems is often subpar. Moreover, configuring and maintaining these solutions requires trained staff with technical expertise.
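The brittleness of rule-based extraction is easy to see in a small example. The sketch below assumes the OCR step has already produced plain text and uses a single regular expression written for one hypothetical layout. The labels and sample lines are invented for illustration.

```python
import re
from typing import Optional

# Rule written for one hypothetical provider layout ("Totale da pagare: ... €").
TOTAL_TEMPLATE_A = re.compile(r"Totale da pagare:\s*([\d.,]+)\s*€")


def extract_total(text: str) -> Optional[float]:
    """Return the total amount if the text matches the template, else None."""
    match = TOTAL_TEMPLATE_A.search(text)
    if match is None:
        return None
    # Italian number format: "." separates thousands, "," marks decimals.
    return float(match.group(1).replace(".", "").replace(",", "."))


bill_a = "Totale da pagare: 1.234,56 €"           # layout the rule was written for
bill_b = "Importo totale fattura EUR 1.234,56"    # same information, other layout

total_a = extract_total(bill_a)  # matches the template
total_b = extract_total(bill_b)  # no match: a new rule would be needed
```

The second bill carries the same amount, yet the rule returns nothing. Every new provider, layout change, or language calls for another rule like this one, which is where the setup and maintenance costs come from.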

In general, the issues outlined in the use case description affect both manual and traditional processing approaches. As a result, the need for more efficient, high-performing solutions has emerged. Recent advances in artificial intelligence (AI), particularly in Deep Learning, have made it possible to achieve higher-quality results while reducing time and costs at every step of the pipeline. These AI-driven methods range from OCR systems that learn and improve over time, and can even transcribe handwritten documents, to semantic analysis and the interpretation of tabular data. The collection of techniques based on artificial neural networks used for comprehensive document processing is known as Intelligent Document Processing (IDP).

Intelligent Document Processing (IDP)

A modern approach based on Deep Learning techniques is the optimal solution for addressing these challenges. By utilizing advanced Computer Vision techniques for document analysis and reading, alongside Natural Language Processing (NLP) for understanding text, this approach effectively resolves the limitations of traditional methods. There is no need to constantly adapt the system by writing new rules or configuring templates. Instead, it is sufficient to provide a relevant dataset to train the system.

One of the key advantages of this approach is its versatility—it can be applied to various tasks such as key-value data extraction, tabular data extraction, and document classification. Furthermore, this method can greatly benefit from human validation, where not only are the system's errors corrected, but the algorithm also continues to learn. Over time, this allows the algorithm to improve and become more tailored to the specific process.

Compared to traditional solutions, maintaining and evolving the system is much simpler. Adding a new field for extraction, a document category for classification, or a new supported language does not require writing any code. Instead, collecting new documents and retraining the system can be handled even by non-technical staff. Ultimately, the most advanced Intelligent Document Processing (IDP) solutions achieve unprecedented accuracy, far surpassing the performance of traditional methods.

myBiros and benefits

myBiros is a high-performance, easy-to-use, and versatile Intelligent Document Processing (IDP) solution that enables the automatic processing of documents. Its core functionalities include information extraction and automatic document classification, all delivered through a prebuilt set of ready-to-use APIs with pre-trained models for common use cases. Additionally, myBiros offers the flexibility to retrain the entire pipeline (including both the OCR engine and the document interpretation system) for custom scenarios.

Using advanced deep learning techniques to analyze multimodal features, myBiros processes all types of documents with a single solution. Its use of pre-trained models and data augmentation techniques allows the system to train on a smaller volume of data, making it suitable for automating processes with limited document volumes. The solution also provides a scoring mechanism, which reduces false positives by enabling the review of low-confidence data, thus minimizing errors. With Human-in-the-Loop interaction and continuous learning, users can correct errors while ensuring the system improves over time.
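The idea of reviewing only low-confidence data can be sketched in a few lines. The example below is not the myBiros API: the `ExtractedField` type, the field names, and the 0.90 threshold are assumptions used to show how a confidence score can route values either to automatic acceptance or to human review.

```python
from typing import List, NamedTuple, Tuple


class ExtractedField(NamedTuple):
    """One extracted value with the confidence score attached by the model."""
    name: str
    value: str
    confidence: float  # between 0.0 and 1.0


def split_for_review(
    fields: List[ExtractedField], threshold: float = 0.90
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Separate fields that can be accepted from those a person should check."""
    accepted = [f for f in fields if f.confidence >= threshold]
    to_review = [f for f in fields if f.confidence < threshold]
    return accepted, to_review


# Made-up extraction result for a single bill.
fields = [
    ExtractedField("pod", "IT001E00000000", 0.99),
    ExtractedField("total_to_pay", "95,40", 0.97),
    ExtractedField("committed_power", "3 kW", 0.62),
]
accepted, to_review = split_for_review(fields)
```

In a Human-in-the-Loop setup, the values corrected during review can then be stored alongside the original documents and reused as training data, which is how the system keeps improving over time.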

The cloud-based architecture of myBiros ensures high scalability, allowing it to process varying volumes of documents without requiring the upfront allocation of costly resources. Additional features include tabular data processing, artifact detection within images, and the ability to handle heterogeneous, multi-language documents in a single pipeline.

These features make myBiros an optimal solution for bill processing, enabling quick and accurate identification of all relevant information. If you're curious about how myBiros can simplify bill processing, contact us. We're ready to help!
