Add RAG benchmarking demo notebook
We have introduced a comprehensive notebook, `rag_benchmarking_demo.ipynb`, to help you benchmark Retrieval-Augmented Generation (RAG) models using the ValidMind library. The notebook lets you compare multiple configurations for the RAG RFP use case.
You can evaluate two embedding models, OpenAI's `text-embedding-3-small` and `text-embedding-3-large`; two retrieval models with different `k` parameters, 5 and 10; and two LLM generators, `gpt-3.5-turbo` and `gpt-4o`. Together these yield four complete RAG configurations for thorough testing.
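As a rough illustration, the four configurations could be expressed as plain dictionaries. The model names come from the notebook description above; the exact pairing of components into the four configurations is an assumption for illustration only.

```python
# Hypothetical sketch: the four RAG configurations as dictionaries.
# Model names match the notebook; the specific pairings shown here
# are an assumption, not taken from the notebook itself.
configs = [
    {"embedding": "text-embedding-3-small", "k": 5,  "llm": "gpt-3.5-turbo"},
    {"embedding": "text-embedding-3-small", "k": 10, "llm": "gpt-4o"},
    {"embedding": "text-embedding-3-large", "k": 5,  "llm": "gpt-3.5-turbo"},
    {"embedding": "text-embedding-3-large", "k": 10, "llm": "gpt-4o"},
]

# Label each configuration so benchmark results can be compared side by side.
for cfg in configs:
    print(f"{cfg['embedding']} | k={cfg['k']} | {cfg['llm']}")
```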
The notebook replicates tests from `rag_documentation_demo.ipynb`, evaluating context precision, faithfulness, and answer correctness. It also assesses generation quality with metrics such as ROUGE, BLEU, and BERTScore, and runs bias and toxicity evaluations across all configurations.
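To give a sense of what one of these generation-quality metrics measures, here is a minimal sketch of ROUGE-1 recall: the fraction of reference unigrams that also appear in the candidate answer. This is a simplified illustration, not the implementation the ValidMind library uses.

```python
# Minimal ROUGE-1 recall sketch (unigram overlap), for illustration only.
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each reference token counts at most as often
    # as it occurs in the candidate.
    overlap = sum(min(cnt, cand_counts[tok]) for tok, cnt in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)

# 5 of the 6 reference unigrams appear in the candidate.
print(rouge1_recall("the cat sat on the mat", "the cat lay on the mat"))
```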