Trustworthy RAG in Wireless Test & Measurement: Retrieval Fine-Tuning and Tables as Images


Our use case study on Retrieval-Augmented Generation (RAG) in the test and measurement industry highlights common challenges in the technical domain and explores effective RAG evaluation techniques. We demonstrate how Large Language Models (LLMs) can be leveraged to scale up RAG evaluation reliably and to address industry-specific challenges such as multilingual data, specific in-domain data, and complex tabular structures. Our vision pipeline and retrieval fine-tuning solutions have significantly improved the accuracy of RAG, proving the value of customized RAG applications for the wireless test and measurement sector.

Retrieval-Augmented Generation (RAG) is an AI method that combines the power of Large Language Models (LLMs) with information retrieval techniques, enabling systems to generate answers based on external data sources such as proprietary company data. In this use case study, we explore common challenges in RAG for the technical domain, including multilingual and specific in-domain data, tables as images, and underspecified questions, with an overarching focus on trustworthy RAG evaluation techniques. Our proposed solutions, including a vision pipeline and retrieval fine-tuning, have led to a 46% improvement in the correctness of answers compared to the initial RAG pipeline. This demonstrates the benefits of customized RAG applications in the test and measurement industry and highlights the importance of a tailored approach to RAG evaluation.

Key Takeaways:

  1. Augmenting a text-based RAG pipeline with a vision pipeline can enhance answer correctness, though its necessity depends on the strength of the text-based pipeline and the complexity of the questions (see the first sketch after this list).
  2. Transitioning from a Q&A system to a dialogue chatbot helps resolve underspecified user questions and improves the user experience.
  3. Conducting retrieval fine-tuning in tandem with inexpensive synthetic data generation through LLMs is a robust way to enhance the retrieval performance of RAG (see the second sketch after this list).
  4. Imitating human evaluation through LLMs provides a scalable and trustworthy RAG evaluation method, enabling rapid prototyping and iteration (see the third sketch after this list).
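
To illustrate takeaway 1, here is a minimal sketch of one vision pipeline step: instead of passing lossy text extracted from a table, the table is sent to a multimodal LLM as an image alongside the user question. The sketch assumes an OpenAI-style chat API; the model name, file path, and function name are illustrative and not taken from the whitepaper's actual implementation.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_from_table_image(question: str, image_path: str) -> str:
    """Route a question about a table to a multimodal LLM by passing
    the table as an image instead of lossy extracted text."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any multimodal chat model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical usage: ask about a specification table captured from a manual page
# print(answer_from_table_image("What is the maximum output power?", "table_page.png"))
```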
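
Takeaway 3 can be sketched as follows: an LLM synthesizes one question per indexed document chunk, and the resulting (question, chunk) pairs are used to fine-tune a bi-encoder retriever with in-batch negatives. This uses the sentence-transformers library; the model choices, prompt, and hyperparameters are illustrative assumptions, not the whitepaper's actual setup.

```python
from openai import OpenAI
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def synthesize_question(chunk: str) -> str:
    """Ask an LLM to write one question that the given chunk answers."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": "Write exactly one question that the following "
                       f"passage answers:\n\n{chunk}",
        }],
    )
    return response.choices[0].message.content.strip()

# chunks: the text passages indexed by the retriever (placeholders here)
chunks = ["<documentation chunk 1>", "<documentation chunk 2>"]
pairs = [(synthesize_question(c), c) for c in chunks]

# Fine-tune a bi-encoder on (question, chunk) pairs; with in-batch
# negatives, the other chunks in each batch act as negative examples.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
examples = [InputExample(texts=[q, c]) for q, c in pairs]
loader = DataLoader(examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("retriever-finetuned")
```

Because the questions are generated cheaply from the chunks themselves, no manual labeling is needed to adapt the retriever to in-domain terminology.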
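
Finally, for takeaway 4, a minimal LLM-as-a-judge sketch: the LLM compares a generated answer against a human-written reference answer and returns a binary verdict, imitating a human grader. The prompt wording, model, and function name are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading a RAG system's answer.
Question: {question}
Reference answer: {reference}
Generated answer: {candidate}

Does the generated answer convey the same facts as the reference?
Reply with exactly one word: CORRECT or INCORRECT."""

def judge_correctness(question: str, reference: str, candidate: str) -> bool:
    """Imitate a human grader with an LLM: compare a generated answer
    against a human-written reference answer."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        temperature=0,   # deterministic grading
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
    )
    return response.choices[0].message.content.strip().upper().startswith("CORRECT")
```

Pinning the temperature to 0 and forcing a one-word verdict keeps the judgment reproducible and easy to aggregate into a correctness score across an evaluation set.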

Authors:

  • Dr. Christian Geishauser, Generative AI Engineer (appliedAI Initiative GmbH)
  • Johannes Birk, Generative AI Engineer (appliedAI Initiative GmbH)
  • Bernhard Pflugfelder, Head of Generative AI (appliedAI Initiative GmbH)

Contributors:

  • Dr. Paul Yu-Chun Chang, Senior AI Expert: Foundation Models - Large Language Models (LLMs) (appliedAI Initiative GmbH)
  • Dr. Ivan Rodriguez, Principal AI Engineer (appliedAI Initiative GmbH)
  • Dr. Sebastian Husch Lee, Solution Engineering Tech Lead (deepset GmbH)
  • Laura Luckert, Senior Applied NLP Engineer (deepset GmbH)
  • Philipp Joppich, Machine Learning Engineer (Rohde & Schwarz GmbH & Co. KG)
  • Johannes Steffens, Senior Director (Rohde & Schwarz GmbH & Co. KG)
  • Dr. Andrew Schaefer, Technology Coordinator, AI (Rohde & Schwarz GmbH & Co. KG)