VQA Project: Validated Question Answering

Horizon 2020 Asse I – PON “Imprese e competitività” 2014-2020 FESR | Fondo per la Crescita Sostenibile – Sportello Fabbrica Intelligente DM 05/03/2018 – DD 20/11/2018. Project “Validated Question Answering”

Altilia is proud to be the lead partner of the VQA project. As a technology company specializing in artificial intelligence and machine learning, we have brought our expertise and innovation to the project, leading a team of talented professionals and collaborating with academic and research institutions.

Project Description

The VQA project [“Validated Question Answering” – n. F/190114/01/X44 – CUP: B28I20000040005, PON “IC” 2014-2020 FESR – Fund for sustainable growth – Sustainable manufacturing, DM 05/03/2018 – DD 20/11/2018, arts. 38, 47 and 48 of D.P.R. no. 445 of 28/12/2000] is an industrial research and experimental development activity. It aims to create new methodologies, algorithms, and approaches for question-answering technologies that leverage typical blockchain mechanisms, in order to develop software solutions for various application contexts, with a particular focus on automating processes in the financial and healthcare industries.

The goal of the project is to develop innovative tools for cognitive automation, also known as cognitive robotic process automation systems. These tools aim to simplify and automate complex tasks or entire business processes that require specific human cognitive abilities.

Project results

The VQA project has led to the creation of a prototype that uses advanced technologies and algorithms to provide semi-automatic text recognition support in various research and business fields. Its goal is to help business domain experts recognize the text portions in domain-specific documents that contain answers to specific questions of interest.

The prototype involves a Human-In-The-Loop Machine Learning (HITL ML) workflow for the cyclical execution of data extraction and data enrichment activities, aimed at training ML models. In this scenario, human experts interact with the machine on reading comprehension tasks, providing support through annotations and answer validation, with the aim of fine-tuning ML models for Open-Domain Question Answering. The prototype also allows users to deploy trained models into execution workflows to automate the extraction of labelled data from new documents.
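One pass of such a cycle can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the prototype's implementation: `hitl_round`, `first_sentence_model`, and `simulated_expert` are hypothetical names, the "model" is a trivial baseline, the expert is simulated with a lookup table, and the fine-tuning step on the collected examples is left out.

```python
from typing import Callable, Dict, List, Optional


def hitl_round(
    documents: List[str],
    question: str,
    predict: Callable[[str, str], str],
    review: Callable[[str, str, str], Optional[str]],
) -> List[Dict[str, object]]:
    """Run one human-in-the-loop pass and return the validated examples."""
    validated_examples = []
    for doc in documents:
        proposed = predict(doc, question)        # machine suggestion
        final = review(doc, question, proposed)  # expert confirms or corrects
        if final is None:                        # expert: no answer in this document
            continue
        validated_examples.append({
            "document": doc,
            "question": question,
            "answer": final,
            "model_was_right": proposed == final,
        })
    return validated_examples


# Hypothetical baseline "model": always proposes the first sentence.
def first_sentence_model(doc: str, question: str) -> str:
    return doc.split(". ")[0].rstrip(".") + "."


d1 = "Revenue grew by 8 percent. The company employs 120 people."
d2 = "The company employs 45 people. It was founded in 2010."
d3 = "No staffing data is disclosed."

# Simulated expert: a lookup table standing in for a human reviewer.
EXPERT_ANSWERS = {
    d1: "The company employs 120 people.",
    d2: "The company employs 45 people.",
}


def simulated_expert(doc: str, question: str, proposed: str) -> Optional[str]:
    return EXPERT_ANSWERS.get(doc)


training_examples = hitl_round(
    [d1, d2, d3],
    "How many people does the company employ?",
    first_sentence_model,
    simulated_expert,
)
for example in training_examples:
    print(example["answer"], "| model was right:", example["model_was_right"])
```

In a real workflow the validated examples would feed the next fine-tuning run, and the `model_was_right` flag gives a rough view of how much expert correction each round still needs.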

The VQA prototype has been tested in financial and clinical-health scenarios, where it has supported banks and other financial institutions in credit scoring assessments and supported dataset labelling operations in the medical field. To ensure the integrity of the labelled dataset, the prototype uses distributed ledger techniques for traceability and tamper resistance, and it has been designed to comply with the certification requirements of auditing systems.
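The idea of making a labelled dataset tamper-evident can be illustrated with a small hash chain in Python. This is a simplified, single-process sketch and not a distributed ledger; the project description does not say which ledger technology the prototype uses, and the function names and record fields below are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry


def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a labelled record together with the hash of the previous entry."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": record_hash(record, prev_hash),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != record_hash(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_record(ledger, {"doc": "report-001", "answer": "12 percent", "validated_by": "expert-a"})
append_record(ledger, {"doc": "report-002", "answer": "nine members", "validated_by": "expert-b"})
print(verify_chain(ledger))
```

Because each entry's hash covers the previous hash, changing an earlier annotation after the fact makes `verify_chain` fail, which is the property an auditor relies on.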

Achieved goals and objectives

The VQA project has successfully achieved multiple objectives and goals in the financial domain, covering the following categories:

Democratization and explainability of AI techniques:

Adaptability of the prototype to various Layout Analysis and NLP tasks, in particular Open-Domain Question Answering, within the same HITL ML scenario. The workflow is driven by the business domain expert in cooperation with the machine: the processing complexity is hidden, and the emphasis is on automation as valuable operational support whose output must always be checked and, where appropriate, validated by domain experts.

No-code/low-code interface in a cloud-based environment and user-friendly tools designed for business domain experts to simplify the activities of search (syntactic and semantic full-text search with filters), labeling (point-and-click document annotation), supervision of the annotated dataset (annotation graphs), model training, auto-labeling (programmatic/rule-based or neural approach), and validation.

Combination of sophisticated AI paradigms (Weak Supervision, Transfer Learning, Active Learning, etc.) to facilitate the training process and reduce the manual intervention of the human labeler, minimizing time and costs and freeing resources for more complex analysis activities.

Formalization of methodologies for the use of the prototype by business domain experts in HITL ML and Document Reading and Comprehension scenarios for Question Answering.

Advanced architectural techniques and functional features, based on user requirements and demonstrators, aimed at meeting the needs of real scenarios that require the rapid and efficient extraction and processing of large amounts of data and metadata.

Approach based on a syntactic or neural retriever-reader pipeline for a powerful end-to-end Open-Domain QA solution.

Application of sophisticated certification techniques and auditing of labeled datasets.
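The retriever-reader pipeline mentioned in this list can be pictured with a toy Python sketch. It is a sketch under stated assumptions, not the prototype's code: the retriever is a purely lexical one (IDF-weighted term overlap), the "reader" is a sentence-overlap heuristic standing in for a neural reading-comprehension model, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, documents: list, top_k: int = 1) -> list:
    """Lexical retriever: rank documents by IDF-weighted term overlap."""
    n = len(documents)
    doc_tokens = [set(tokenize(d)) for d in documents]
    df = Counter(t for toks in doc_tokens for t in toks)  # document frequency
    q_tokens = set(tokenize(question))
    scored = []
    for i, toks in enumerate(doc_tokens):
        score = sum(math.log(1 + n / df[t]) for t in q_tokens & toks)
        scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [documents[i] for _, i in scored[:top_k]]


def read(question: str, passage: str) -> str:
    """Toy reader: return the sentence sharing the most terms with the question."""
    q_tokens = set(tokenize(question))
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    return max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))


def answer(question: str, documents: list) -> str:
    """End-to-end: retrieve the best passage, then extract an answer span."""
    passage = retrieve(question, documents, top_k=1)[0]
    return read(question, passage)


docs = [
    "The company published its sustainability report in March. "
    "Total CO2 emissions fell by 12 percent in 2022.",
    "The board has nine members. Four of the directors are independent.",
]
print(answer("How many members does the board have?", docs))
```

A neural variant would keep the same two-stage shape and swap the overlap scores for dense embeddings in the retriever and an extractive QA model in the reader.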

Agile Integration:

Application of interfacing and communication standards between the various modules of the modular architecture.

Extensibility of the prototype, designed to include new modules and models for extracting other data points.

Participation of business domain experts in the integration plan, by embedding the trained models in process flows conceived as simple forms of cognitive automation.

Usability in real business scenarios:

ESG (Environmental, Social and Governance) data collection: introduction of advanced methods to collect ESG data from unstructured texts and documents in different sources, such as company websites, annual reports, proxy reports, sustainability reports, corporate social reports (CSR), news, social media, company reviews, and so on.

Experimentation with the prototype within the project “ESG – Alternative data in credit management”, recognized by Banca d’Italia as one of the 10 best innovative projects selected in the “Call for Proposals 2021” for innovative applications of AI technologies in banking.

Dissemination of results:

Paper submitted to and accepted by the International Conference on Enterprise Information Systems (ICEIS 2023), entitled “ESG Data Collection with Adaptive AI”.

Presentation of experimental results during the “Salone dei Pagamenti” event promoted by ABI (Italian Banking Association) in collaboration with Banca d’Italia and Fintech Milano Hub for AI innovations in ESG and Credit Management (November 23rd to 25th 2022).

Evolution prospects:

Evolution of the prototype towards a robust and efficient Intelligent Document Processing (IDP) platform, capable of adopting taxonomic standards for representing extracted knowledge and of providing increasingly advanced and conversational forms of automation and integration in real business scenarios, also in view of possible future interactions with third-party Robotic Process Automation (RPA) applications.

Research partners:

Open Knowledge Technologies – OKT

OKT is a spin-off company of the University of Calabria that turns the results of industrial research into innovative and interoperable business solutions. It specializes in integrating and developing open-source technologies to meet the needs of different economic and industrial settings, building on proposals for new IT solutions from academia and research. OKT also aims to support companies in the market by providing tools that incorporate cutting-edge technologies.

https://www.okt-srl.com/

Università “Magna Graecia” di Catanzaro

The University “Magna Graecia” of Catanzaro is a public university established in 1998 in the Calabria region of Italy. The university offers a range of undergraduate, graduate, and post-graduate courses in various disciplines such as Medicine, Economics, Law and Social Sciences, with a strong focus on research and innovation.

https://web.unicz.it/