VUB research teaches AI systems to read tables more accurately

The growing volume of reports, invoices, scientific publications and other business documents increasingly confronts companies and institutions with the challenge of processing information quickly and reliably. In his doctoral research at the VUB, Willy Carlos Tchuitcheu from the Mathematics & Data Science Research Group developed an innovative method that teaches computers to handle tables more effectively. His findings represent an important asset for applications in artificial intelligence and automatic document processing.

The key data in a document are often summarised in tables, and these frequently pose a problem for current AI systems. Many large language models, such as ChatGPT, convert tables into linear text, so the two-dimensional structure, the headings and the relationships between cells are lost. This leads to errors and inaccuracies. “We discovered that many AI language models struggle with something called ‘order independence’,” says Tchuitcheu. “It means that when you swap the rows or columns of a table, the AI sees that table as a completely new table. That essentially shows that AI does not always truly understand the underlying structure of a table. As a result, the information can be misinterpreted.”
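
The order problem can be made concrete with a minimal Python sketch. It is an illustration only, not the method from the thesis: the sample table and the `linearize` helper are hypothetical. The sketch flattens a small table into text the way a naive pipeline might, then swaps two rows. The resulting string differs, although the table still carries exactly the same information.

```python
# Hypothetical example table: a header row plus two data rows.
header = ["item", "price"]
rows = [["pen", "2"], ["book", "10"]]


def linearize(header, rows):
    """Flatten a table into one line of text, row by row (naive approach)."""
    lines = [" | ".join(header)] + [" | ".join(r) for r in rows]
    return " ; ".join(lines)


original = linearize(header, rows)
swapped = linearize(header, list(reversed(rows)))

print(original)
print(swapped)
# The two strings differ even though only the row order changed.
print(original == swapped)
```

A model that only ever sees such strings has no direct signal that the two versions describe the same table.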

Understanding the structure of the table

Tchuitcheu therefore introduced the so-called Table Understanding principle, a theoretical framework that describes how humans interpret tables by automatically linking each cell to the correct row and column header. Based on that principle, he developed a method that does not merely reduce tables to plain text but also takes the structure of the table into account. “Our goal was to ensure that AI systems would understand the tables in a document more naturally,” Tchuitcheu explains. “We want to offer an alternative to simply imitating a principle determined by their training on textual data. If AI systems understand the underlying structure, just like humans do, this leads to more reliable analyses and faster, usable insights, especially in situations where table data play a strategic role.”

The new approach proves particularly robust, partly because it takes permutation invariance into account: the fact that tables usually retain their meaning even when rows or columns are rearranged. As a result, the model performs consistently, even when the form of the table changes.
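
Permutation invariance can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the representation learned in the thesis: the `cell_triples` helper and the sample table are hypothetical, and row and column headers are assumed to be unique. Each cell is linked to its row header and column header, and the table is stored as an unordered set of those triples, so rearranging rows or columns leaves the result unchanged.

```python
def cell_triples(col_headers, row_headers, cells):
    """Link every cell to its row and column header; return an unordered set."""
    return frozenset(
        (row_headers[i], col_headers[j], cells[i][j])
        for i in range(len(row_headers))
        for j in range(len(col_headers))
    )


# Hypothetical table: rows are items, columns are attributes.
cols = ["price", "stock"]
rows = ["pen", "book"]
cells = [[2, 50], [10, 7]]

# The same table with both its rows and its columns swapped.
cols_p = ["stock", "price"]
rows_p = ["book", "pen"]
cells_p = [[7, 10], [50, 2]]

# Compare the header-linked representations of the two layouts.
same = cell_triples(cols, rows, cells) == cell_triples(cols_p, rows_p, cells_p)
print(same)
```

Because the comparison is between sets of header-linked cells rather than between strings, the layout of the table no longer affects the outcome.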

Supervisor Prof. Dr Ann Dooms emphasises the importance of the research for the broader evolution of artificial intelligence. “Document processing is a crucial component in many social and economic processes,” she says. “The work of Willy Tchuitcheu shows that we can make AI systems far more reliable by fundamentally changing the way they look at tables. It opens the door to new applications in administrative automation, in scientific analysis and in data-intensive industries.”

Mathematical modelling remains necessary

The doctoral research, titled ‘Representation Learning for Table Understanding in Intelligent Document Processing’, shows strong results in two central applications: the automatic recognition of column types and answering questions based on table data. In addition, the method increases the speed and accuracy of information extraction, which is important for companies that process large volumes of documents.

Co-supervisor Prof. Dr Tan Lu: “Although document processing and automated reasoning increasingly rely on large language models (LLMs), mathematical modelling, such as the work of Tchuitcheu, remains enormously important. By teaching AI systems how to process the data in tables, the results of automated document processing become much more precise. At the same time, they can go further in interpreting that data and we also make progress in terms of transparency. This significantly increases the reliability of AI systems.”

Willy Carlos Tchuitcheu obtained his master’s degree in Mathematical Sciences in 2019 from the African Institute for Mathematical Sciences in Rwanda. He worked for three years as a research engineer at Camertronix in Cameroon and began his PhD at the VUB in 2021 within the Department of Mathematics and Data Science. His research resulted in three articles as first author in international journals, a patent application and a Best Poster Award at the Flanders AI Research Day 2021. In addition, he contributed to two further publications, one of them again as first author.

In this article:

  • Why do AI systems still often misinterpret documents today, particularly when they contain tables?
  • How does this VUB PhD research teach computers to read tables in the same way that humans do?
  • What does this mean in practical terms for faster and more reliable processing of large volumes of documents?