Add RAG benchmarking demo notebook

validmind-library
2.8.20
documentation
enhancement
Published

April 16, 2025

We have introduced a comprehensive notebook, rag_benchmarking_demo.ipynb, to help you benchmark Retrieval-Augmented Generation (RAG) models using the ValidMind library. This notebook allows you to compare multiple configurations for the RAG RFP use case.

The notebook lets you evaluate two embedding models (OpenAI's text-embedding-3-small and text-embedding-3-large), two retrieval settings with the k parameter set to 5 and 10, and two LLM generators (gpt-3.5-turbo and gpt-4o). These options are combined into four complete RAG configurations for side-by-side testing.
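As a rough illustration, the four configurations could be captured as plain dictionaries and iterated over when running the benchmark. The specific pairings below are hypothetical, not taken from the notebook:

```python
# Hypothetical sketch of four RAG configurations to benchmark.
# The model/k pairings are illustrative only; consult
# rag_benchmarking_demo.ipynb for the actual combinations.
configs = [
    {"embedding": "text-embedding-3-small", "k": 5,  "llm": "gpt-3.5-turbo"},
    {"embedding": "text-embedding-3-small", "k": 10, "llm": "gpt-3.5-turbo"},
    {"embedding": "text-embedding-3-large", "k": 5,  "llm": "gpt-4o"},
    {"embedding": "text-embedding-3-large", "k": 10, "llm": "gpt-4o"},
]

for cfg in configs:
    # Each configuration would be evaluated on the same test set
    # so the results are directly comparable.
    print(f"{cfg['embedding']} | k={cfg['k']} | {cfg['llm']}")
```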

The notebook replicates the tests from rag_documentation_demo.ipynb: it evaluates context precision, faithfulness, and answer correctness; assesses generation quality using metrics such as ROUGE, BLEU, and BERTScore; and runs bias and toxicity evaluations across all configurations.
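To give an intuition for what the overlap-based generation metrics measure, here is a minimal pure-Python sketch of modified n-gram precision, the building block of BLEU. This is an illustration of the idea, not the implementation used by the notebook or the ValidMind library:

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Modified n-gram precision (the core of BLEU): each candidate
    n-gram count is clipped by its count in the reference."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Clip each candidate n-gram count by the reference count.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

score = ngram_precision("the cat sat on the mat", "the cat is on the mat")
print(round(score, 3))  # 5 of 6 candidate unigrams appear in the reference
```

Full BLEU combines these precisions over several n-gram orders with a brevity penalty; ROUGE flips the direction to measure recall against the reference, and BERTScore replaces exact matches with embedding similarity.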