Faithfulness
Evaluates the faithfulness of the generated answers with respect to retrieved contexts.
This metric uses a judge LLM to measure the factual consistency of the generated answer against the given context(s). It is calculated from the answer generated by the LLM and the retrieved contexts produced by a RAG process. The score is a value between 0 and 1, where a higher score indicates that the generated answer is more faithful to the given context(s).
The generated answer is regarded as faithful if all the claims made in the answer can be inferred from the given context. To calculate the score, a set of claims is first identified from the generated answer. Each claim is then cross-checked against the given context to determine whether it can be inferred from that context. The faithfulness score formula is as follows:
\[ \text{Faithfulness score} = \frac{|\text{Number of claims in the generated answer that can be inferred from given context}|}{|\text{Total number of claims in the generated answer}|} \]
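As a minimal sketch of the formula above, assume the judge LLM has already returned one boolean verdict per claim (True when the claim can be inferred from the context). The function name `faithfulness_score` and the way it handles an answer with no claims are illustrative assumptions, not the metric's actual implementation.

```python
from typing import List


def faithfulness_score(verdicts: List[bool]) -> float:
    """Fraction of claims that the context supports.

    `verdicts` holds one entry per claim extracted from the answer:
    True if the claim can be inferred from the retrieved context.
    """
    if not verdicts:
        # Assumption: an answer with no claims has no defined score,
        # so this sketch raises instead of guessing a value.
        raise ValueError("no claims to score")
    return sum(verdicts) / len(verdicts)


# Three claims, two of them supported by the context.
score = faithfulness_score([True, True, False])
print(round(score, 2))  # 0.67
```

An answer whose claims are all supported scores 1.0, and an answer with no supported claims scores 0.0.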
Configuring Columns
This metric requires the following columns in your dataset:
user_input (str): The user input that the model is responding to.
retrieved_contexts (List[str]): A list of text contexts which are retrieved to generate the answer.
response (str): The response generated by the model which will be evaluated for faithfulness against the given contexts.
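For illustration, a single dataset row using the default column names might look like the sketch below. The text values are made up, and `missing_columns` is a hypothetical helper written for this example, not part of any library.

```python
# Default column names and the Python type each one is expected to hold.
REQUIRED_COLUMNS = {
    "user_input": str,
    "retrieved_contexts": list,
    "response": str,
}

row = {
    "user_input": "When was the Eiffel Tower completed?",
    "retrieved_contexts": [
        "The Eiffel Tower was completed in 1889 for the World's Fair.",
    ],
    "response": "The Eiffel Tower was completed in 1889.",
}


def missing_columns(record: dict) -> list:
    """Return required columns that are absent or hold the wrong type."""
    return [
        name
        for name, expected in REQUIRED_COLUMNS.items()
        if not isinstance(record.get(name), expected)
    ]


print(missing_columns(row))  # []
```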
If your data is stored under different column names, you can point the metric at them with the parameters user_input_column, retrieved_contexts_column, and response_column.
For example, if your dataset has this data stored in different columns, you can pass the following parameters:
{
    "retrieved_contexts_column": "context_info",
    "response_column": "my_answer_col",
    "user_input_column": "user_input",
}
If the data is stored as a dictionary in another column, specify the column and key like this:
pred_col = dataset.prediction_column(model)
params = {
    "retrieved_contexts_column": f"{pred_col}.retrieved_contexts",
    "response_column": f"{pred_col}.response",
    "user_input_column": "user_input",
}
For more complex situations, you can use a function to extract the data:
pred_col = dataset.prediction_column(model)
params = {
    "retrieved_contexts_column": lambda row: [row[pred_col]["context_message"]],
    "response_column": lambda row: "\n\n".join(row[pred_col]["messages"]),
    "user_input_column": "user_input",
}
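The three ways of pointing at the data (a plain column name, a column.key string, or a function of the row) can be pictured with the resolver sketched below. `resolve_field` and the `model_prediction` column are hypothetical names chosen for illustration, and the sketch assumes a dot separates the column from the key. It is not the library's implementation.

```python
from typing import Any, Callable, Mapping, Union

Spec = Union[str, Callable[[Mapping[str, Any]], Any]]


def resolve_field(row: Mapping[str, Any], spec: Spec) -> Any:
    """Look up a value in `row` from a column spec (illustrative only)."""
    if callable(spec):
        # A function receives the whole row and returns the value.
        return spec(row)
    if spec in row:
        # A plain column name.
        return row[spec]
    # Assumed "column.key" form: the column holds a dictionary.
    column, _, key = spec.partition(".")
    return row[column][key]


row = {
    "user_input": "What is the capital of France?",
    "model_prediction": {
        "retrieved_contexts": ["Paris is the capital of France."],
        "response": "Paris.",
    },
}

print(resolve_field(row, "user_input"))
print(resolve_field(row, "model_prediction.response"))
print(resolve_field(row, lambda r: r["model_prediction"]["retrieved_contexts"]))
```

Checking for a callable first keeps the string handling simple, and checking for an exact column match before splitting on the dot lets column names that themselves contain a dot still resolve.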